09-08-2009 12:11 AM - edited 09-10-2009 12:31 AM
Hi
I have 7 Barracuda LP 1TB ST31000520AS. 5 of them are used in a software RAID6 on Ubuntu Server (kernel 2.6.30-02063005-generic). I have set the spindown time to 1h with
hdparm -S242 /dev/sdb
/dev/sdb:
setting standby to 242 (1 hours)
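For reference, hdparm does not take minutes directly for -S; it encodes the timeout. The sketch below decodes an -S argument following the ranges described in the hdparm man page (1 to 240 in 5-second steps, 241 to 251 in 30-minute steps, plus a few special values). `standby_timeout_seconds` is a hypothetical helper name. It shows why 242 means 1 hour, and likewise that -S120 corresponds to 10 minutes.

```python
from typing import Optional


def standby_timeout_seconds(value: int) -> Optional[int]:
    """Decode an `hdparm -S` argument into a standby timeout in seconds.

    Hypothetical helper; ranges follow the hdparm man page. Returns 0 when
    the timer is disabled and None for values with no fixed duration.
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value from 0 to 255")
    if value == 0:
        return 0                        # 0: standby timer disabled
    if value <= 240:
        return value * 5                # 1..240: multiples of 5 seconds
    if value <= 251:
        return (value - 240) * 30 * 60  # 241..251: 1..11 units of 30 minutes
    if value == 252:
        return 21 * 60                  # 252: 21 minutes
    if value == 255:
        return 21 * 60 + 15             # 255: 21 minutes 15 seconds
    return None                         # 253: vendor-defined, 254: reserved


print(standby_timeout_seconds(242))  # 3600 (1 hour, as hdparm reports above)
print(standby_timeout_seconds(120))  # 600 (10 minutes)
```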
While using the drives I noticed a very frequent clicking sound. When I investigated further, I found that 4 of the 5 disks in the array had a Start_Stop_Count above 25K, which is half of the rated value in only 90 days!
sudo smartctl --all /dev/sdc
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 130780386
3 Spin_Up_Time 0x0003 096 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 075 075 020 Old_age Always - 26557
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 9998207
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2191
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 26
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 099 000 Old_age Always - 1
189 High_Fly_Writes 0x003a 077 077 000 Old_age Always - 23
190 Airflow_Temperature_Cel 0x0022 063 044 045 Old_age Always In_the_past 37 (0 1 37 37)
194 Temperature_Celsius 0x0022 037 056 000 Old_age Always - 37 (0 23 0 0)
195 Hardware_ECC_Recovered 0x001a 036 028 000 Old_age Always - 130780386
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 233019155678609
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 4099986043
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 1907871403
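The raw values in this dump can be turned into an average rate. Below is a minimal parser for `smartctl -A` attribute rows, assuming the column layout shown above; it keys rows by attribute ID because several rows share the name Unknown_Attribute. With Start_Stop_Count = 26557 and Power_On_Hours = 2191 (about 91 days), this works out to roughly 12 start/stop events per powered-on hour. The function names are hypothetical, and the 50,000 limit in `hours_until` is an assumption inferred from "half of the rated value", not a figure taken from Seagate documentation.

```python
import re
from typing import Dict, Tuple

# One attribute row of `smartctl -A` output:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text: str) -> Dict[int, Tuple[str, int]]:
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs


def per_hour(attrs: Dict[int, Tuple[str, int]], attr_id: int = 4) -> float:
    """Average increments per powered-on hour (attribute 9 = Power_On_Hours)."""
    return attrs[attr_id][1] / attrs[9][1]


def hours_until(attrs: Dict[int, Tuple[str, int]], limit: int = 50_000,
                attr_id: int = 4) -> float:
    """Hours left before attr_id reaches `limit` at the average rate so far.
    The default limit of 50,000 is an assumption, not a datasheet value."""
    return (limit - attrs[attr_id][1]) / per_hour(attrs, attr_id)


sample = """
  4 Start_Stop_Count        0x0032 075 075 020 Old_age Always - 26557
  9 Power_On_Hours          0x0032 098 098 000 Old_age Always - 2191
190 Airflow_Temperature_Cel 0x0022 063 044 045 Old_age Always In_the_past 37 (0 1 37 37)
"""
attrs = parse_attributes(sample)
print(round(per_hour(attrs), 1))       # 12.1 start/stops per hour
print(round(hours_until(attrs) / 24))  # 81 days to the assumed limit
```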
If I read information about one of the 4 problem drives with hdparm:
sudo hdparm -I /dev/sdc
/dev/sdc:
ATA device, with non-removable media
Model Number: ST31000520AS
Serial Number: 5VX--------
Firmware Revision: CC32
Transport: Serial
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns...
There is no "Advanced power management level" line! Doing the same on the fifth drive gives:
...
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 254
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
...
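Checking for the missing line can be scripted when comparing several drives. A minimal sketch, assuming the `hdparm -I` wording shown in the two excerpts above; `apm_level` and `identify` are hypothetical helper names, and `identify` needs root to run hdparm.

```python
import re
import subprocess
from typing import Optional

APM_LINE = re.compile(r"Advanced power management level:\s*(.+)")


def apm_level(hdparm_i_output: str) -> Optional[str]:
    """Text after 'Advanced power management level:' in `hdparm -I` output,
    or None if the line is absent (as on the 4 problem drives)."""
    m = APM_LINE.search(hdparm_i_output)
    return m.group(1).strip() if m else None


def identify(device: str) -> str:
    """Run `hdparm -I` on a device (requires root) and return its output."""
    return subprocess.run(["hdparm", "-I", device], capture_output=True,
                          text=True, check=True).stdout


# Usage on a live system, e.g.:
# for dev in ["/dev/sdb", "/dev/sdc", "/dev/sdf"]:
#     print(dev, apm_level(identify(dev)))
```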
If I try to set APM on one of the 4 problem drives I get:
hdparm -B254 /dev/sdb
/dev/sdb:
setting Advanced Power Management level to 0xfe (254)
HDIO_DRIVE_CMD failed: Input/output error
and on the fifth, healthy drive:
sudo hdparm -B254 /dev/sdf
/dev/sdf:
setting Advanced Power Management level to 0xfe (254)
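For context on the value being set: the hdparm man page describes -B levels 1 to 127 as permitting spin-down, 128 to 254 as not permitting it (254 giving the best I/O performance), and 255 as a request to disable APM, which not every drive supports. The small sketch below only encodes those ranges; `describe_apm` is an illustrative name.

```python
def describe_apm(level: int) -> str:
    """Classify an `hdparm -B` level using the ranges in the hdparm man page."""
    if not 1 <= level <= 255:
        raise ValueError("hdparm -B takes a value from 1 to 255")
    if level == 255:
        return "disable APM (not supported by every drive)"
    if level >= 128:
        return "APM on, spin-down not permitted"
    return "APM on, spin-down permitted"


for level in (1, 127, 128, 254, 255):
    print(level, describe_apm(level))
```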
This is very frustrating, and if the Start_Stop_Count keeps skyrocketing, the 4 drives will soon die.
All 5 drives have the same firmware revision ("CC32"). The fifth drive was bought in a different batch than the first 4, though.
The system is an Asus M4A78-VM motherboard (AMD 780G + SB700 chipsets) with an AMD 4850e processor. I have tried both SATA mode and AHCI mode in the BIOS.
How do I fix this? Why can't I set APM on 4 of the drives? How is it possible for the Start_Stop_Count to reach 25k in 90 days with a spindown time of 1h?
/ Tuomaz
09-08-2009 02:19 AM
09-08-2009 02:46 AM
09-08-2009 01:06 PM
09-09-2009 07:21 AM
Now I have done some further testing. Yesterday I set the sleep time to 1h on one of the first 6 drives (bought in one batch), one that is not used in my RAID array. After a few hours I checked it with
sudo hdparm -C /dev/sda
/dev/sda:
drive state is: standby
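Because `hdparm -C` only queries the power mode (it issues the ATA CHECK POWER MODE command, which is intended to report the state without changing it), it can be polled to log when a drive leaves standby, without the wake-up that smartctl causes. A rough sketch; the function names, interval and sample count are arbitrary choices, and polling only sees the state at each sample, so short wake-ups or head unloads between samples will be missed.

```python
import subprocess
import time
from typing import Optional


def parse_power_state(hdparm_c_output: str) -> Optional[str]:
    """Return the state from `hdparm -C` output, e.g. 'standby'."""
    marker = "drive state is:"
    for line in hdparm_c_output.splitlines():
        if marker in line:
            return line.split(marker, 1)[1].strip()
    return None


def log_state_changes(device: str, interval_s: float = 60,
                      samples: int = 60) -> None:
    """Poll `hdparm -C` (requires root) and print each change of state."""
    previous = None
    for _ in range(samples):
        out = subprocess.run(["hdparm", "-C", device],
                             capture_output=True, text=True).stdout
        state = parse_power_state(out)
        if state != previous:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), device, state)
            previous = state
        time.sleep(interval_s)


# e.g. log_state_changes("/dev/sda") samples once a minute for one hour
```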
"standby" means that the spindle has stopped. This drive had a Start_Stop_Count of 12 before I set the sleep time to 1h. Today I checked it with smartctl (smartctl wakes the drive from standby):
sudo smartctl -A /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 234129350
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 898
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
In 24h of sleep the drive has done 886 head load/unload cycles (898 - 12). Why?
For safety reasons I have now disabled sleep for all my Seagate drives. I have tested the same thing on my 7th drive, but only for 3h rather than 24h. Its Start_Stop_Count increased by just 2, which is the expected value.
I suspect that Seagate has done something to correct this issue on newer drives (like my 7th). The question is whether the drives actually perform all these reported load/unload cycles, or whether it is just a bug in the code that counts Start_Stop_Count.
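One way to quantify the 24h result: each spin-down triggered by the standby timer needs a full timeout of idle time after the previous spin-up, so a window of W hours with a timeout of T hours allows at most floor(W/T) + 1 timer spin-downs, and at most twice that if the counter also counted each spin-up. The sketch below applies that bound to the two observations in this post; the function names are illustrative. It only shows that the counter rises faster than the timer alone can explain; it cannot tell whether the heads really move or the counter is wrong.

```python
import math


def mean_interval_minutes(events: int, window_h: float) -> float:
    """Average minutes between counter increments over the window."""
    if events <= 0:
        raise ValueError("no events in window")
    return window_h * 60 / events


def timer_bound(window_h: float, timeout_h: float) -> int:
    """Most counter increments the standby timer alone could explain.

    Consecutive timer spin-downs are at least one timeout apart, so a window
    holds at most floor(window / timeout) + 1 of them; doubling allows for a
    counter that also counts each spin-up.
    """
    return 2 * (math.floor(window_h / timeout_h) + 1)


observations = {
    # name: (counter increase, window in hours, standby timeout in hours)
    "problem drive": (898 - 12, 24, 1),
    "7th drive": (2, 3, 1),
}
for name, (events, window_h, timeout_h) in observations.items():
    bound = timer_bound(window_h, timeout_h)
    verdict = "consistent" if events <= bound else "too many"
    print(f"{name}: {events} events, one per "
          f"{mean_interval_minutes(events, window_h):.1f} min, "
          f"timer explains at most {bound}: {verdict}")
```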
09-09-2009 08:12 AM - edited 09-09-2009 08:24 AM
09-10-2009 02:02 AM
09-10-2009 04:19 AM
Here are before and after values from 2h of sleep for one of my drives:
cat 090910-1106 090910-1305
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 945
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970203
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1956
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 13
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 3
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 070 045 Old_age Always - 18 (Lifetime Min/Max 18/28)
194 Temperature_Celsius 0x0022 018 040 000 Old_age Always - 18 (Lifetime Min/Max 0/18)
195 Hardware_ECC_Recovered 0x001a 038 022 000 Old_age Always - 132241
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 186118112806821
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2122586099
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2800954763
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1003
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970208
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1958
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 13
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 4
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 070 045 Old_age Always - 18 (Lifetime Min/Max 18/28)
194 Temperature_Celsius 0x0022 018 040 000 Old_age Always - 18 (Lifetime Min/Max 0/18)
195 Hardware_ECC_Recovered 0x001a 038 022 000 Old_age Always - 132241
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 66052302047143
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2122586099
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2800954763
Commands were:
smartctl -A /dev/sda
hdparm -S120 /dev/sda
hdparm -Y /dev/sda
smartctl -A /dev/sda
(no hdparm -Y, it seems like it wakes the drive)
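Comparing two saved snapshots by eye is error-prone. A minimal sketch that reports only the attributes whose raw value changed, assuming each file holds the output of one `smartctl -A` run; it splits rows on whitespace and keys them by attribute ID. The helper names are hypothetical; the file names are the ones used above. On these two snapshots it would show Start_Stop_Count going from 945 to 1003 (58 increments in 2 Power_On_Hours with a 10-minute timer from -S120), along with changes in Seek_Error_Rate, Power_On_Hours, attribute 188 and Head_Flying_Hours.

```python
from typing import Dict, Tuple


def raw_by_id(text: str) -> Dict[int, int]:
    """Attribute ID -> first token of RAW_VALUE, from `smartctl -A` output."""
    values = {}
    for line in text.splitlines():
        f = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if (len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x")
                and f[9].isdigit()):
            values[int(f[0])] = int(f[9])
    return values


def changed(before: str, after: str) -> Dict[int, Tuple[int, int]]:
    """Attributes present in both snapshots whose raw value differs."""
    a, b = raw_by_id(before), raw_by_id(after)
    return {i: (a[i], b[i]) for i in sorted(a.keys() & b.keys())
            if a[i] != b[i]}


# Usage with the two snapshot files from this post:
# with open("090910-1106") as f1, open("090910-1305") as f2:
#     for attr_id, (old, new) in changed(f1.read(), f2.read()).items():
#         print(f"{attr_id:>3}: {old} -> {new} ({new - old:+d})")
```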
I haven't had time to read your last post carefully yet, fzabkar, but it seems you know a great deal about hard drives and the different ATA standards. Thank you.
09-10-2009 05:43 AM
09-28-2009 07:16 PM
©2012 Seagate Technology LLC