System: Debian Lenny, Linux fw 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 x86_64
Yesterday I was working on the RAID array and got piles of CRC errors. I ran "mdadm --detail /dev/md0", but it reported no problem, and no e-mail arrived saying anything was wrong.
Then I looked at the hdsentinel output and the syslog, which showed problems on the sdc disk (Health: 87%), so as an experiment I ran the following commands:
mdadm --fail /dev/md0 /dev/sdc5
mdadm --remove /dev/md0 /dev/sdc5
mdadm --add /dev/md0 /dev/sdc5
This started a rebuild of md0, which is currently at around 82%, and interestingly the Health value has risen to 92%?!
How is that possible? The disk temperatures stay at roughly the values below all the time, 2 of the disks are barely 84 days old (uptime is 76 days), the machine is never moved, and it is on a UPS.
Is this a controller, motherboard, or memory problem? Why doesn't the RAID report that something is wrong and kick out the disk that appears to be faulty?
hddtemp:
--------
/dev/sdb: WDC WD15EARS-00Z5B1: 42°C
/dev/sdc: WDC WD15EARS-00Z5B1: 43°C
/dev/sdd: WDC WD15EARS-00Z5B1: 44°C
hdsentinel:
-----------
HDD Device 1: /dev/sdb
HDD Model ID : WDC WD15EARS-00Z5B1
HDD Serial No: WD-WMAVU1303580
HDD Revision : 80.00A80
HDD Size : 1430799 MB
Interface : S-ATA II
Temperature : 42 °C
Health : 100 %
Performance : 100 %
Power on time: 142 days, 13 hours
Est. lifetime: more than 1000 days
HDD Device 2: /dev/sdc
HDD Model ID : WDC WD15EARS-00Z5B1
HDD Serial No: WD-WMAVU2807624
HDD Revision : 80.00A80
HDD Size : 1430799 MB
Interface : S-ATA II
Temperature : 43 °C
Health : 92 %
Performance : 100 %
Power on time: 84 days, 9 hours
Est. lifetime: more than 1000 days
HDD Device 3: /dev/sdd
HDD Model ID : WDC WD15EARS-00Z5B1
HDD Serial No: WD-WMAVU2901326
HDD Revision : 80.00A80
HDD Size : 1430799 MB
Interface : S-ATA II
Temperature : 44 °C
Health : 100 %
Performance : 100 %
Power on time: 84 days, 12 hours
Est. lifetime: more than 1000 days
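To keep an eye on the Health value without reading the whole report each time, something like the following could be used. It is only a minimal sketch: it assumes the hdsentinel text layout shown above ("HDD Device N: /dev/..." followed by a "Health : NN %" line), and the function names health_by_device and below_threshold are made up for this example.

```python
import re

# Trimmed sample in the same layout as the hdsentinel report above.
SAMPLE = """\
HDD Device 1: /dev/sdb
Health : 100 %
HDD Device 2: /dev/sdc
Health : 92 %
HDD Device 3: /dev/sdd
Health : 100 %
"""


def health_by_device(text):
    """Map each device path to the Health percentage reported for it."""
    result = {}
    device = None
    for line in text.splitlines():
        m = re.match(r"HDD Device\s+\d+:\s*(\S+)", line)
        if m:
            device = m.group(1)
            continue
        m = re.match(r"Health\s*:\s*(\d+)\s*%", line)
        if m and device is not None:
            result[device] = int(m.group(1))
    return result


def below_threshold(text, threshold=100):
    """Return the devices whose Health is below the given percentage."""
    health = health_by_device(text)
    return sorted(dev for dev, pct in health.items() if pct < threshold)


if __name__ == "__main__":
    print(below_threshold(SAMPLE))
```

With the report above, only /dev/sdc falls below 100 %.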
/proc/mdstat:
-------------
md0 : active raid5 sdc5[3] sdb5[0] sdd5[2]
2930271744 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
[================>....] recovery = 81.7% (1197531144/1465135872) finish=299.5min speed=14886K/sec
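The status lines above can also be checked by a script. The sketch below assumes the /proc/mdstat layout shown here ("[3/2] [U_U]" followed by a "recovery = ..." line with block counts and a speed in K/sec); parse_mdstat and its dictionary keys are names invented for this example. It reports whether the array is degraded and recomputes the progress and remaining time from the raw numbers.

```python
import re

# The three status lines pasted above.
SAMPLE = """\
md0 : active raid5 sdc5[3] sdb5[0] sdd5[2]
2930271744 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
[================>....] recovery = 81.7% (1197531144/1465135872) finish=299.5min speed=14886K/sec
"""


def parse_mdstat(text):
    """Extract member counts and rebuild progress from mdstat-style text."""
    info = {}
    # "[3/2] [U_U]": 3 member slots, 2 in sync; "_" marks a slot not in sync.
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", text)
    if m:
        info["raid_devices"] = int(m.group(1))
        info["in_sync"] = int(m.group(2))
        info["degraded"] = "_" in m.group(3)
    # "(done/total)" and the speed are both expressed in 1K blocks.
    m = re.search(
        r"recovery\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\).*?speed=(\d+)K/sec", text
    )
    if m:
        done, total, speed = (int(g) for g in m.groups())
        info["percent"] = 100.0 * done / total
        info["eta_min"] = (total - done) / speed / 60.0
    return info


if __name__ == "__main__":
    for key, value in parse_mdstat(SAMPLE).items():
        print(key, value)
```

The recomputed values agree with what the kernel printed: about 81.7 % done and roughly 300 minutes left at the current speed.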
syslog:
-------
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: irq_stat 0x40000001
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: cmd 60/30:00:ce:63:6c/00:00:0d:00:00/40 tag 0 ncq 24576 in
Jul 18 19:04:07 fw kernel: [9465188.652163] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: status: { DRDY ERR }
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: error: { UNC }
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: cmd 60/70:08:fe:63:6c/00:00:0d:00:00/40 tag 1 ncq 57344 in
Jul 18 19:04:07 fw kernel: [9465188.652163] res 41/40:00:00:00:00/24:00:00:00:00/00 Emask 0x9 (media error)
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: status: { DRDY ERR }
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: error: { UNC }
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: cmd 60/08:10:76:74:98/00:00:0d:00:00/40 tag 2 ncq 4096 in
Jul 18 19:04:07 fw kernel: [9465188.652163] res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: status: { DRDY ERR }
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: error: { UNC }
Jul 18 19:04:07 fw kernel: [9465188.659497] ata2.00: configured for UDMA/33
Jul 18 19:04:07 fw kernel: [9465188.659497] ata2: EH complete
Jul 18 19:04:07 fw kernel: [9465188.659497] sd 2:0:0:0: [sdc] 2930277168 512-byte hardware sectors (1500302 MB)
Jul 18 19:04:07 fw kernel: [9465188.659497] sd 2:0:0:0: [sdc] Write Protect is off
Jul 18 19:04:07 fw kernel: [9465188.659497] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jul 18 19:04:07 fw kernel: [9465188.659497] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: irq_stat 0x40000008
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: cmd 60/30:10:ce:63:6c/00:00:0d:00:00/40 tag 2 ncq 24576 in
Jul 18 19:04:10 fw kernel: [9465192.225164] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: status: { DRDY ERR }
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: error: { UNC }
Jul 18 19:04:10 fw kernel: [9465192.236390] ata2.00: configured for UDMA/33
Jul 18 19:04:10 fw kernel: [9465192.236407] ata2: EH complete
Jul 18 19:04:10 fw kernel: [9465192.236390] sd 2:0:0:0: [sdc] 2930277168 512-byte hardware sectors (1500302 MB)
Jul 18 19:04:10 fw kernel: [9465192.236390] sd 2:0:0:0: [sdc] Write Protect is off
Jul 18 19:04:10 fw kernel: [9465192.236390] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jul 18 19:04:10 fw kernel: [9465192.236390] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 18 19:04:13 fw kernel: [9465195.740082] ata2.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
Jul 18 19:04:13 fw kernel: [9465195.740087] ata2.00: irq_stat 0x40000008
Jul 18 19:04:13 fw kernel: [9465195.740094] ata2.00: cmd 60/30:00:ce:63:6c/00:00:0d:00:00/40 tag 0 ncq 24576 in
Jul 18 19:04:13 fw kernel: [9465195.740096] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
Jul 18 19:04:13 fw kernel: [9465195.740099] ata2.00: status: { DRDY ERR }
Jul 18 19:04:13 fw kernel: [9465195.740102] ata2.00: error: { UNC }
Jul 18 19:04:13 fw kernel: [9465195.745985] ata2.00: configured for UDMA/33
Jul 18 19:04:13 fw kernel: [9465195.745985] ata2: EH complete
Jul 18 19:04:13 fw kernel: [9465195.746792] sd 2:0:0:0: [sdc] 2930277168 512-byte hardware sectors (1500302 MB)
Jul 18 19:04:13 fw kernel: [9465195.746792] sd 2:0:0:0: [sdc] Write Protect is off
Jul 18 19:04:13 fw kernel: [9465195.746792] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jul 18 19:04:13 fw kernel: [9465195.746792] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
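The "cmd"/"res" lines in the log above carry the sector address, so it is possible to see which sector the disk could not read. The sketch below rests on my understanding of the libata message format, which should be treated as an assumption: the twelve hex bytes are command or status, feature or error, nsect, lbal, lbam, lbah, then the five "hob" (high-order) bytes in the same order, then device. It also assumes that Emask bit 0x400 marks the NCQ command that actually failed and that error bit 0x40 is UNC. The names lba48 and failing_lbas are made up for this example.

```python
import re

# cmd/res pairs copied from the log above.
SAMPLE = """\
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: cmd 60/30:00:ce:63:6c/00:00:0d:00:00/40 tag 0 ncq 24576 in
Jul 18 19:04:07 fw kernel: [9465188.652163] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
Jul 18 19:04:07 fw kernel: [9465188.652163] ata2.00: cmd 60/70:08:fe:63:6c/00:00:0d:00:00/40 tag 1 ncq 57344 in
Jul 18 19:04:07 fw kernel: [9465188.652163] res 41/40:00:00:00:00/24:00:00:00:00/00 Emask 0x9 (media error)
Jul 18 19:04:10 fw kernel: [9465192.225164] ata2.00: cmd 60/30:10:ce:63:6c/00:00:0d:00:00/40 tag 2 ncq 24576 in
Jul 18 19:04:10 fw kernel: [9465192.225164] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
Jul 18 19:04:13 fw kernel: [9465195.740094] ata2.00: cmd 60/30:00:ce:63:6c/00:00:0d:00:00/40 tag 0 ncq 24576 in
Jul 18 19:04:13 fw kernel: [9465195.740096] res 41/40:00:f8:63:6c/24:00:0d:00:00/40 Emask 0x409 (media error)
"""

# Twelve two-digit hex bytes separated by "/" or ":", then the Emask value.
RES = re.compile(
    r"\bres ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2}) Emask 0x([0-9a-f]+)"
)

AC_ERR_NCQ = 0x400   # assumed: Emask bit flagging the offending NCQ command
ATA_ERR_UNC = 0x40   # assumed: "uncorrectable data" bit in the error byte


def lba48(taskfile):
    """Assemble the 48-bit LBA from a libata cmd/res byte string."""
    b = [int(x, 16) for x in re.split(r"[/:]", taskfile)]
    low = b[3] | (b[4] << 8) | (b[5] << 16)     # lbal, lbam, lbah
    high = b[8] | (b[9] << 8) | (b[10] << 16)   # hob_lbal, hob_lbam, hob_lbah
    return (high << 24) | low


def failing_lbas(log):
    """Return the LBA of every res line flagged as the failed UNC command."""
    found = []
    for m in RES.finditer(log):
        error_byte = int(re.split(r"[/:]", m.group(1))[1], 16)
        emask = int(m.group(2), 16)
        if emask & AC_ERR_NCQ and error_byte & ATA_ERR_UNC:
            found.append(lba48(m.group(1)))
    return found


if __name__ == "__main__":
    print(sorted(set(failing_lbas(SAMPLE))))
```

Under these assumptions, all three error reports decode to the same sector, LBA 225207288, which lies inside the 48-sector read that starts at LBA 225207246. That would mean the log shows repeated retries of one unreadable sector rather than many different bad sectors.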