Byte
Tuomaz
Posts: 5
Registered: ‎09-07-2009
0

Sleeping ST31000520AS Barracuda LP 1TB drives results in massive head load/unload, strange APM issue


Hi

I have 7 Barracuda LP 1TB ST31000520AS. 5 of them are used in a software RAID6 on Ubuntu Server (kernel 2.6.30-02063005-generic). I have set the spindown time to 1h with

 

hdparm -S242 /dev/sdb

/dev/sdb:
setting standby to 242 (1 hours)
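The -S argument is an encoded value, not a number of minutes. As a minimal sketch (the function name is made up for illustration; the encoding follows the hdparm man page as I understand it), this decodes a value into seconds:

```python
def standby_timeout_seconds(value):
    """Decode an `hdparm -S` argument into a timeout in seconds.

    Encoding as described in the hdparm man page:
      0        standby timer disabled
      1..240   multiples of 5 seconds
      241..251 1..11 units of 30 minutes
      252      21 minutes
      255      21 minutes 15 seconds
    253 (vendor-defined) and 254 (reserved) have no fixed duration.
    """
    if value == 0:
        return None  # timer disabled
    if 1 <= value <= 240:
        return value * 5
    if 241 <= value <= 251:
        return (value - 240) * 30 * 60
    if value == 252:
        return 21 * 60
    if value == 255:
        return 21 * 60 + 15
    raise ValueError(f"no fixed timeout for -S value {value}")


if __name__ == "__main__":
    for v in (242, 120):
        print(v, "->", standby_timeout_seconds(v), "seconds")
```

So -S242 is 2 units of 30 minutes, i.e. the 1 hour reported above, while a value such as 120 would mean 10 minutes.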

 

During use of the drives I have noticed a very frequent clicking sound. When I investigated this further, I realized that 4 of the 5 disks in the array had a Start_Stop_Count above 25K. That is half of the rated value in about 90 days:

 

sudo smartctl --all /dev/sdc

smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 130780386
3 Spin_Up_Time 0x0003 096 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 075 075 020 Old_age Always - 26557
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 9998207
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2191
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 26
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 099 000 Old_age Always - 1
189 High_Fly_Writes 0x003a 077 077 000 Old_age Always - 23
190 Airflow_Temperature_Cel 0x0022 063 044 045 Old_age Always In_the_past 37 (0 1 37 37)
194 Temperature_Celsius 0x0022 037 056 000 Old_age Always - 37 (0 23 0 0)
195 Hardware_ECC_Recovered 0x001a 036 028 000 Old_age Always - 130780386
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 233019155678609
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 4099986043
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 1907871403
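To put the Start_Stop_Count in perspective, here is a rough sketch that parses a few of the attribute lines above and works out the rate. The helper name is hypothetical, and the 50,000-cycle rating is only an assumption inferred from "half of the rated value" for about 25K; check the product manual for the real figure.

```python
# A few attribute lines copied from the smartctl output above.
SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
4 Start_Stop_Count 0x0032 075 075 020 Old_age Always - 26557
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2191
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 26
190 Airflow_Temperature_Cel 0x0022 063 044 045 Old_age Always In_the_past 37 (0 1 37 37)
"""

# ASSUMPTION: rated start/stop cycles, inferred from the post, not from a datasheet.
ASSUMED_RATED_CYCLES = 50_000


def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value)} from `smartctl -A` style lines."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the 10th field keeps any trailing "(...)" part
        if len(fields) == 10 and fields[0].isdigit():
            raw = int(fields[9].split()[0])  # first token of RAW_VALUE
            attrs[int(fields[0])] = (fields[1], raw)
    return attrs


attrs = parse_smart_attributes(SMARTCTL_OUTPUT)
start_stops = attrs[4][1]
power_on_hours = attrs[9][1]

per_hour = start_stops / power_on_hours
hours_to_rated = ASSUMED_RATED_CYCLES / per_hour

if __name__ == "__main__":
    print(f"{per_hour:.1f} start/stops per power-on hour")
    print(f"assumed rating reached after about {hours_to_rated:.0f} power-on hours")
```

With these numbers the drive averages roughly 12 start/stops per power-on hour, even though Power_Cycle_Count is only 26.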

 

If I read information about one of the 4 problem drives with hdparm:

 

sudo hdparm -I /dev/sdc

/dev/sdc:

ATA device, with non-removable media
Model Number: ST31000520AS
Serial Number: 5VX--------

Firmware Revision: CC32
Transport: Serial
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns

...

 

There is no line "Advanced power management level"! If I do the same on the fifth drive I get:

 

...
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 254
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
...

 

If I try to set APM on one of the 4 problem drives I get:

 

hdparm -B254 /dev/sdb

/dev/sdb:
setting Advanced Power Management level to 0xfe (254)
HDIO_DRIVE_CMD failed: Input/output error

 and on the fifth, healthy drive:

 

sudo hdparm -B254 /dev/sdf

/dev/sdf:
setting Advanced Power Management level to 0xfe (254)

 

 

This is very frustrating and if the Start_Stop_Count continues to skyrocket the 4 drives will soon die.

 

All 5 drives have the same firmware revision ("CC32"). The fifth drive was bought in a different batch than the first 4, though.

 

The system is an Asus M4A78-VM motherboard (AMD 780G+SB700 chipsets) with a 4850e AMD processor. I have tried both the SATA mode and the AHCI mode in the BIOS.

 

How do I fix this? Why can't I set APM on 4 of the drives? How is it possible for the Start_Stop_Count to reach 25k in 90 days with a spindown time of 1h?

 

 / Tuomaz

 

 

Message Edited by Tuomaz on 09-10-2009 12:31 AM
Yottabyte
fzabkar
Posts: 4,658
Registered: ‎01-27-2009

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Barracuda LP Series SATA Product Manual, Rev. A:

http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%20LP/100564361a.pdf

Page 29 of the manual (page 35 of the PDF) states that "Advanced Power Management (APM) and Automatic Acoustic Management (AAM) features are not supported."

Page 12 (18) has this to say about Standby mode:

"The drive enters Standby mode when the host sends a Standby Immediate command. If the host has set the standby timer, the drive can also enter Standby mode automatically after the drive has been inactive for a specifiable length of time. The standby timer delay is established using a Standby or Idle command. In Standby mode, the drive buffer is enabled, the heads are parked and the spindle is at rest. The drive accepts all commands and returns to Active mode any time disc access is necessary."

See pages 264 (STANDBY) and 171 (Table 40) of the "AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)":

http://www.t13.org/Documents/UploadedDocuments/docs2008/D1699r6a-ATA8-ACS.pdf

Byte
Tuomaz
Posts: 5
Registered: ‎09-07-2009
0

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Hmm. This just raises two new questions: 1. Why can I set APM on one of the drives? and 2. How do I stop the Start_Stop_Count from increasing so dramatically?
Yottabyte
fzabkar
Posts: 4,658
Registered: ‎01-27-2009

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

No doubt everyone is just as puzzled as you are. :-(

One thing I would suggest is to capture the 512 bytes (= 256 words) returned by each drive in response to an ATA Identify Device command. Then look for differences.

You can retrieve these data with smartmontools in debug mode, eg ...

smartctl -a -r ioctl,2 -d usbjmicron scsi10

The above example generates the following log file for my AMD K6-2 Win98SE box using a generic Microsoft USB mass storage driver, with a Seagate 320GB drive in an external USB enclosure, behind a JMicron USB-ATA bridge chip:

http://www.users.on.net/~fzabkar/Smartctl/320GB_all.log

The Identify Device command is documented on page 114 of the ATA-8 spec.
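As a rough sketch of the comparison suggested above: given the raw 512-byte IDENTIFY DEVICE blocks from two drives, list the words that differ and decode the APM bits. The function names are made up, the two blocks below are synthetic (not captured from real drives), and the word/bit positions (word 83 bit 3 = APM supported, word 86 bit 3 = APM enabled, word 91 low byte = current APM level) are my reading of the ATA8-ACS IDENTIFY DEVICE table, so verify them against the spec.

```python
import struct

APM_BIT = 1 << 3          # bit 3 in words 83 and 86 (assumed from ATA8-ACS)
WORD_SUPPORTED = 83
WORD_ENABLED = 86
WORD_APM_LEVEL = 91


def identify_words(data):
    """Split a 512-byte IDENTIFY DEVICE block into 256 little-endian words."""
    if len(data) != 512:
        raise ValueError("IDENTIFY DEVICE data must be exactly 512 bytes")
    return struct.unpack("<256H", data)


def differing_words(a, b):
    """Return [(word_index, value_a, value_b)] for every word that differs."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def apm_status(words):
    """Return (supported, enabled, level); level is None when APM is not enabled."""
    supported = bool(words[WORD_SUPPORTED] & APM_BIT)
    enabled = bool(words[WORD_ENABLED] & APM_BIT)
    level = (words[WORD_APM_LEVEL] & 0xFF) if enabled else None
    return supported, enabled, level


def fake_identify(apm):
    """Build a SYNTHETIC identify block for demonstration only."""
    words = [0] * 256
    if apm:
        words[WORD_SUPPORTED] |= APM_BIT
        words[WORD_ENABLED] |= APM_BIT
        words[WORD_APM_LEVEL] = 0xFE
    return struct.pack("<256H", *words)


drive_a = identify_words(fake_identify(apm=False))
drive_b = identify_words(fake_identify(apm=True))

if __name__ == "__main__":
    for index, a, b in differing_words(drive_a, drive_b):
        print(f"word {index}: 0x{a:04x} vs 0x{b:04x}")
    print("drive A APM:", apm_status(drive_a))
    print("drive B APM:", apm_status(drive_b))
```

With real captures in place of the synthetic blocks, any word that differs between a problem drive and the healthy one is a candidate for explaining the APM difference.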
Byte
Tuomaz
Posts: 5
Registered: ‎09-07-2009
0

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

I have now done some further testing. Yesterday I set the sleep time to 1h on one of the first 6 drives (all bought in one batch) that is not used in my RAID array. After some hours I checked with

 

 

sudo hdparm -C /dev/sda

/dev/sda:
drive state is: standby

"standby" means that the spindle has stopped. This drive had a Start_Stop_Count value of 12 before I set the sleep time to 1h. Today I checked it with smartctl (smartctl wakes the drive from standby) :

sudo smartctl -A /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 234129350
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 898
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0

In 24h of sleeping, the drive has done 886 load/unload cycles of the heads. Why?

 

For safety reasons I have now disabled sleep for all my Seagate drives. I have tested the same thing on my 7th drive, but only for 3h, not 24h. Its Start_Stop_Count increased by just 2 (which is the expected value).

 

I suspect that Seagate has done something to correct this issue on newer drives (like my 7th). The question is whether the drives actually perform all these reported load/unload cycles, or whether it is just an error in the code that counts the start/stops.

Yottabyte
fzabkar
Posts: 4,658
Registered: ‎01-27-2009

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Interesting experiment.

886 load/unload cycles in 24 hours amounts to about 1 every 100 seconds.
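The arithmetic behind that figure, as a small sketch (the function name is just illustrative):

```python
def seconds_per_event(events, hours):
    """Average spacing between events, assuming they are evenly spread."""
    return hours * 3600 / events


# Start_Stop_Count went from 12 to 898 over roughly 24 hours.
cycles = 898 - 12
interval = seconds_per_event(cycles, 24)

if __name__ == "__main__":
    print(f"{cycles} cycles, one every {interval:.1f} seconds")
```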

Have any of the other SMART attributes changed in value during that time?

In particular I'd compare the Seek Error Rate before and after. The raw value actually counts the number of seeks -- it is not an error rate.

Also look at the Head Flying Hours.

For example, a raw value of 233019155678609 equates to a hex value of 0xd3ee00000591. I suspect that the actual flying time is 0x00000591 hours (the lower 32 bits), ie 1425. Power_On_Hours is 2191.
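Here is a minimal sketch of that decoding. The split into an upper part and a lower 32-bit hours field is only the guess described above, not documented Seagate behaviour, and the function name is made up.

```python
def split_head_flying_hours(raw):
    """Split the raw value into (upper bits, lower 32 bits).

    ASSUMPTION: the lower 32 bits hold the flying hours; the meaning of the
    upper bits is unknown.
    """
    return raw >> 32, raw & 0xFFFFFFFF


raw = 233019155678609
upper, hours = split_head_flying_hours(raw)

if __name__ == "__main__":
    print(hex(raw))                       # full raw value in hex
    print(f"upper=0x{upper:x} hours={hours}")
```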

BTW, have a look at this [unrelated] thread:

http://forums.seagate.com/stx/board/message?board.id=ata_drives&message.id=14681#M14681

You'll see that you're not the only one with firmware issues.

If you need a hex calculator, try this one:

http://www.mrcalculator.com/hexdec.html

For small numbers you can use Google's calculator, eg ...

http://www.google.com/search?hl=en&client=opera&rls=en&hs=PQP&num=25&newwindow=1&q=0x591+in+decimal&btnG=Search

http://www.google.com/search?hl=en&client=opera&rls=en&hs=9k4&num=25&newwindow=1&q=1425+in+hex&btnG=Search
Message Edited by fzabkar on 10-09-2009 01:24 AM
Yottabyte
fzabkar
Posts: 4,658
Registered: ‎01-27-2009

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

I found the following information in the ATA-8 command spec. Perhaps the APM feature set can be enabled using ATA DCO commands.

"The optional DCO [Device Configuration Overlay] feature set allows a utility program to reduce the capability of the device by modifying some of the optional commands, modes, and feature sets that a device reports as supported in the IDENTIFY DEVICE or IDENTIFY PACKET DEVICE data as well as the capacity reported."

7.10.3 DEVICE CONFIGURATION IDENTIFY - B1h/C2h, PIO Data-In

"The DEVICE CONFIGURATION IDENTIFY command causes a device to return a 512-byte data structure. The content of this data structure indicates the selectable commands, modes, and feature sets that the device is capable of disabling or modifying through processing of a DEVICE CONFIGURATION SET command. If a DEVICE CONFIGURATION SET command reducing a device's capabilities has completed without error, then:

a) the response by a device to an IDENTIFY DEVICE, IDENTIFY PACKET DEVICE, and other commands, except the DEVICE CONFIGURATION IDENTIFY command, shall reflect the reduced set of capabilities; and

b) the response by a device to a DEVICE CONFIGURATION IDENTIFY command shall reflect the entire set of selectable capabilities."
Byte
Tuomaz
Posts: 5
Registered: ‎09-07-2009
0

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Here are before and after values from 2h of sleep for one of my drives:

 

cat 090910-1106 090910-1305
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 945
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970203
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1956
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 13
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 3
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 070 045 Old_age Always - 18 (Lifetime Min/Max 18/28)
194 Temperature_Celsius 0x0022 018 040 000 Old_age Always - 18 (Lifetime Min/Max 0/18)
195 Hardware_ECC_Recovered 0x001a 038 022 000 Old_age Always - 132241
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 186118112806821
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2122586099
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2800954763

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1003
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970208
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1958
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 13
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 4
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 070 045 Old_age Always - 18 (Lifetime Min/Max 18/28)
194 Temperature_Celsius 0x0022 018 040 000 Old_age Always - 18 (Lifetime Min/Max 0/18)
195 Hardware_ECC_Recovered 0x001a 038 022 000 Old_age Always - 132241
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 66052302047143
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2122586099
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2800954763

Commands were:

smartctl -A /dev/sda
hdparm -S120 /dev/sda
hdparm -Y /dev/sda
smartctl -A /dev/sda

 (Correction: I did not actually run hdparm -Y, since it seems to wake the drive.)

 

I haven't got time right now to carefully read your last post, fzabkar, but it seems like you know a great deal about hard drives and the different ATA standards. Thank you.

 

 

 

 

Yottabyte
fzabkar
Posts: 4,658
Registered: ‎01-27-2009
0

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Thanks for the compliment, but I assure you I'm reading the ATA-8 standard for the first time.

I notice this time that the Start_Stop_Count increased by 58 in 2 hours. That's one event every 124 seconds. Assuming the events are regularly spaced, then you would only need to listen for activity for about 2 minutes.

I also notice that the Raw_Read_Error_Rate value does not change, and that the Seek_Error_Rate increases by only 5 counts. The 5 seeks would be incurred when the drive retrieves SMART data from the platters. Otherwise there does not appear to be any sign of software induced activity.

Power_On_Hours increases from 1956 to 1958. At the same time Head_Flying_Hours goes from 186118112806821 to 66052302047143.

In hexadecimal these numbers are 0xa946000007a5 and 0x3c13000007a7.

If I'm right, then the actual flying hours are 0x7a5 (= 1957) and 0x7a7 (= 1959) which of course doesn't make too much sense. It certainly doesn't tally with what we observed for your other drive.

The other curiosity is "188 Unknown_Attribute" which increases from 3 to 4.

According to this article ...

http://en.wikipedia.org/wiki/S.M.A.R.T.

... attribute 188 represents Command Timeouts.
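Rather than comparing the two dumps by eye, the comparison above can be sketched as a small script that reports which raw values changed. The helper names are hypothetical, and only a handful of lines from the two snapshots are included here to keep it short.

```python
BEFORE = """\
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 945
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970203
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1956
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 3
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 186118112806821
"""

AFTER = """\
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 132241
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1003
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2970208
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1958
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 4
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 66052302047143
"""


def raw_values(text):
    """Return {attribute_id: raw_value} from `smartctl -A` style lines."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            values[int(fields[0])] = int(fields[9].split()[0])
    return values


def changed(before, after):
    """Return {attribute_id: (old, new)} for raw values that differ."""
    return {
        attr_id: (before[attr_id], after[attr_id])
        for attr_id in before
        if attr_id in after and before[attr_id] != after[attr_id]
    }


changes = changed(raw_values(BEFORE), raw_values(AFTER))

if __name__ == "__main__":
    for attr_id, (old, new) in sorted(changes.items()):
        print(f"attribute {attr_id}: {old} -> {new} (delta {new - old})")
```

Attributes are keyed by ID rather than name because several of them are reported as "Unknown_Attribute" by this smartctl version.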
Byte
acranox
Posts: 4
Registered: ‎09-28-2009
0

Re: Cannot set APM on ST31000520AS Barracuda LP 1TB

Did you find out anything more on this issue?  I used `hdparm -y` to put one of my disks into standby today, and in 8 hours of being spun down and inactive (I kept checking it with hdparm -C, it remained in standby the whole time) my Start_Stop_Count went up by over 150.  This just doesn't make any sense.  I can hear that the drive is inactive, so why is the Start_Stop_Count increasing?