Dying HDD

I've recently noticed that my laptop's hard drive is making strange clicking noises.
I also ran a smartctl test:


smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.5-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HTS723232A7A364
Serial Number: E3834523G28NXN
LU WWN Device Id: 5 000cca 6d5c1094e
Firmware Version: EC2OA60W
User Capacity: 320,072,933,376 bytes [320 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon Nov 23 10:47:49 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   45) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  75) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0025   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0023   239   100   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1715
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       4955
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1659
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       4299227136
188 Command_Timeout         0x0032   099   094   000    Old_age   Always       -       2125
190 Airflow_Temperature_Cel 0x0022   081   050   045    Old_age   Always       -       19 (Min/Max 15/36)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       720
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       7077996
193 Load_Cycle_Count        0x0032   088   088   000    Old_age   Always       -       120953
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x002a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4953         -
# 2  Short offline       Completed without error       00%      3337         -
# 3  Short offline       Completed without error       00%      3337         -
# 4  Short offline       Completed without error       00%      3337         -
# 5  Short offline       Completed without error       00%      3337         -
# 6  Short offline       Completed without error       00%      3337         -
# 7  Short offline       Completed without error       00%      3337         -
# 8  Short offline       Completed without error       00%      3337         -
# 9  Short offline       Completed without error       00%      3337         -
#10  Short offline       Completed without error       00%      3337         -
#11  Short offline       Completed without error       00%      3337         -
#12  Short offline       Completed without error       00%      3337         -
#13  Short offline       Completed without error       00%      3337         -
#14  Short offline       Completed without error       00%      3336         -
#15  Short offline       Completed without error       00%      3336         -
#16  Short offline       Completed without error       00%      3336         -
#17  Short offline       Completed without error       00%      3336         -
#18  Short offline       Completed without error       00%      3336         -
#19  Short offline       Completed without error       00%      3336         -
#20  Short offline       Completed without error       00%      3336         -
#21  Short offline       Completed without error       00%      3336         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I don't see anything in it that points to a fault. Is there anything in there that could be related to the sound linked above?

Comments

From experience, we have found the following 5 SMART metrics indicate impending disk drive failure:

SMART 5 – Reallocated_Sector_Count.
SMART 187 – Reported_Uncorrectable_Errors.
SMART 188 – Command_Timeout.
SMART 197 – Current_Pending_Sector_Count.
SMART 198 – Offline_Uncorrectable.

In your case, if I'm reading it right, it's this one:

187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 4299227136

The following is also quite high:

192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 7077996

This shows how many times the HDD shut down uncleanly because it lost power (e.g. a sudden power loss or a hard reset). I assume it didn't happen that many times. The drive may really be acting up.
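
As a rough sketch of the check in the quoted list, the following Python snippet scans `smartctl -A` style attribute rows and reports any of the five listed IDs whose raw value is non-zero. The function name `flag_suspect_attributes` and the sample rows are only illustrative, and it assumes the raw value is the tenth whitespace-separated field, which holds for the output above but not necessarily for every drive.

```python
# IDs taken from the quoted list: 5, 187, 188, 197, 198.
WATCHED_IDS = {5, 187, 188, 197, 198}

def flag_suspect_attributes(lines):
    """Return {id: (name, raw)} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in lines:
        fields = line.split()
        # An attribute row has at least 10 fields and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in WATCHED_IDS:
            continue
        raw = fields[9]
        # Only plain integer raw values are considered in this sketch.
        if raw.isdigit() and int(raw) != 0:
            flagged[attr_id] = (fields[1], int(raw))
    return flagged

# Rows copied from the Hitachi output above.
sample = [
    "5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0",
    "187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 4299227136",
    "188 Command_Timeout 0x0032 099 094 000 Old_age Always - 2125",
    "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0",
    "198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0",
]

for attr_id, (name, raw) in sorted(flag_suspect_attributes(sample).items()):
    print(attr_id, name, raw)
```

For this drive it reports attributes 187 and 188. A non-zero raw value is only a prompt to look closer, since raw fields are vendor-specific.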

The clicking may be related to "193 Load_Cycle_Count 0x0032 088 088 000 Old_age Always - 120953", but clicking alone doesn't mean the drive is dying. Many HDDs click from minute zero and keep incrementing Load_Cycle_Count unless they are given a special setting.

If it didn't click before, then it's obviously worth keeping an eye on. The other values would worry me more...
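
To put the Load_Cycle_Count figure in perspective, here is a small calculation using the numbers from the output above (120953 load cycles over 4955 power-on hours). The helper name `load_cycles_per_hour` is made up for illustration.

```python
def load_cycles_per_hour(load_cycle_count, power_on_hours):
    """Average number of head load/unload cycles per power-on hour."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycle_count / power_on_hours

# Attributes 193 (Load_Cycle_Count) and 9 (Power_On_Hours) from the output above.
rate = load_cycles_per_hour(120953, 4955)
print(f"{rate:.1f} cycles/hour, one roughly every {3600 / rate:.0f} seconds")
```

That works out to roughly 24 load cycles per hour, or one about every two and a half minutes of power-on time.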

--
trey @ gépház

The following is also quite high

RAW_VALUE figures are not plain numbers for every manufacturer; sometimes they are bit combinations. In that case you should (in fact, must) look at the VALUE/WORST/THRESH triple instead (depending on the manufacturer, the value starts from 100/200/255 and can only decrease). Here you can see that the manufacturer starts the values at 100, and both the current and the worst value are well above the threshold, so the fault is probably somewhere else. I would check the power supply and the power connector.
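
The point about raw values being bit combinations can be illustrated with a short sketch. It assumes, purely as a guess about this drive's encoding, that the 48-bit raw field packs three 16-bit words; `split_raw16` and `above_threshold` are hypothetical helper names, and the second one is a bare comparison that does not special-case a threshold of 000.

```python
def split_raw16(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low)."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)

def above_threshold(value, worst, thresh):
    """True if both the current and the worst normalized value exceed the threshold."""
    return value > thresh and worst > thresh

# Raw values of attributes 187 and 192 from the Hitachi output above.
print(split_raw16(4299227136))  # (1, 65, 0)
print(split_raw16(7077996))     # (0, 108, 108)

# VALUE / WORST / THRESH of attribute 1 (Raw_Read_Error_Rate): 100 100 062.
print(above_threshold(100, 100, 62))  # True
```

Under that assumption the value of attribute 192 decodes to two identical small counters (108 and 108) rather than seven million events, which would be consistent with the raw field not being a single number.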

Do you think I should be worried with data like this? 0/6156147 shows up in several places.


smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-19-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GT
Serial Number:    12287938000015020498
LU WWN Device Id: 0 000000 000000000
Firmware Version: 5.02
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Nov 23 20:57:24 2015 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  48) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x0021)	SCT Status supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   102   102   050    Pre-fail  Always       -       0/6156147
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       10366h+17m+23.170s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1990
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       171
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       1
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   026   041   000    Old_age   Always       -       26 (Min/Max 0/41)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/6156147
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/6156147
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/6156147
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       1049
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       894
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       894
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       4129

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

here is a munin graph of it too

P.S.: apologies to the blog owner

hdparm -B 255 /dev/device

Does this help? If not, there may be a manufacturer-provided utility for the disk that can turn off advanced power management. (I have a WD disk that came with such a program (wdidle), and I used it to turn the feature off.)
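
A minimal sketch of how that command could be wrapped in Python. `apm_command` and `describe_apm_level` are hypothetical helpers, the level ranges follow the hdparm manual page as I understand it (1-127 permit spin-down, 128-254 do not, 255 disables APM), and `/dev/sdX` is a placeholder device path.

```python
def describe_apm_level(level):
    """Describe an `hdparm -B` level using the ranges from the hdparm man page."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        return "APM disabled"
    if level >= 128:
        return "power management without spin-down"
    return "aggressive power management, spin-down permitted"

def apm_command(device, level):
    """Build the argument list for setting the APM level; nothing is executed here."""
    describe_apm_level(level)  # called only to validate the range
    return ["hdparm", "-B", str(level), device]

print(apm_command("/dev/sdX", 255))
print(describe_apm_level(255))
```

The returned list could be handed to `subprocess.run(..., check=True)` with root privileges. Keeping the command construction separate from execution makes it easy to inspect what would be run first.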

As I said, a growing Load Cycle Count is not a fatal fault by itself. Some disks do it from minute zero, e.g. when they meet a buggy BIOS and the OS workaround is inadequate. That alone won't kill the drive, but it is a risk factor.

It is damn annoying to listen to, though, so I did something about it too. I switched to an SSD a long time ago. That doesn't click. :)

--
trey @ gépház

Mine clicked too, and it drove me up the wall.
It stopped in the middle of a backup and has been dead ever since.
Even the BIOS doesn't see it anymore.

---
--- The trouble starts when everything appears to be working. ---
---

Nobody has brought it up so far, so I will. Please close the code tag, because it messes up the blog listing.

Reported Uncorrect is not relevant for Hitachi.
I'll take a look at Command_Timeout; that one would worry me more.