01-27-2009 07:49 PM - last edited on 01-28-2009 05:46 AM by BradC
That's not true; my old drives do not have any reallocated sectors.
I have had, in my possession, seven 1.5TB drives. Out of the seven, one started off with 0 reallocated sectors but went up to 36 shortly after. Two still have 0 sectors so far, after about 100 hours. The rest have had anywhere from a moderate to a very large number of reallocated sectors (two of the drives even had bad blocks).
I will keep RMA'ing until I get drives that are to my satisfaction. If my old HDDs that ran for 16,000 hours can have 0, a new drive can have 0 as well. I believe Google has said that a drive is 17 times more likely to fail soon after 1 reallocated sector is found, which is unacceptable to me.
[Edited in compliance of the community rules and regulations.]
01-28-2009 03:14 AM
Your call, f00kie, if you want to be without storage for longer than necessary.
I am a little more perturbed by drives that develop a reallocated block early in their lifespan, but I guess things can happen to a drive in transit that make marginal blocks more likely to fail. As an aside, I strongly suspect that as areal density increases, the chance of blocks going bad increases, so comparing current-model drives with older, less dense drives isn't a fair comparison.
Do you run SMART long tests or RAID scrubs at all?
As for your drives with read errors, sometimes these are merely soft errors. Have you tried locating the affected blocks and writing to them (e.g. by removing any file that occupies them and filling the partition with a large dummy file)? If the reallocated block count doesn't increase, they were indeed soft errors (high-fly writes and power problems are common causes). If the reallocated block count does increase, then they were hard errors after all.
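Roughly what I have in mind, on Linux with smartmontools and md RAID. This is only a sketch: the device names, array name and mount point are examples, and the dd step assumes the filesystem can be temporarily filled with a throwaway file.

# extended (long) offline self-test, then check the result log
smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb

# scrub an md RAID array so every sector gets read
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat

# note the counts before rewriting anything
smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'

# fill the free space so pending (soft-error) blocks get rewritten; dd will stop
# with "No space left on device", which is expected, then remove the file
dd if=/dev/zero of=/mnt/data/dummyfile bs=1M
rm /mnt/data/dummyfile

# re-check: if Reallocated_Sector_Ct didn't move, the errors were soft
smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'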
Oh, and Google's paper says the most significant indicator of failure is a scan error (section 3.5.1). They reckon a reallocation count >= 1 raises the annualized failure rate by a factor of 3-6, rather than the 17x you state. Furthermore, the elevated risk drops off after 8.5 months, and 85% of drives that have a reallocation event go on to survive past those 8.5 months.
01-29-2009 10:51 AM - edited 01-29-2009 10:53 AM
Experiencing the same thing on my 1.5TB. It has been happening since the drive was brand new; every week I check, and the reallocated sector count has increased. Last time I checked it was 31. Currently using SD17 firmware with no problems other than the reallocated sectors. The Seagate serial number checker says my drive is unaffected and doesn't require a firmware update. It currently passes all the SeaTools tests, but as soon as SeaTools gives an error, the drive is going back.
Edit: Applying the firmware update would have no effect on reallocated sectors, would it?
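(For reference, the number I'm watching is the raw value of SMART attribute 5, Reallocated_Sector_Ct. With smartmontools it's something like the line below; the device name is just an example.)

smartctl -A /dev/sda | grep -i Reallocated_Sector_Ct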
02-16-2009 01:42 PM - edited 03-12-2009 06:27 AM
Just chiming in with a "me, too." I recently purchased 18 ST31500341AS 1.5TB drives that shipped with firmware CC1H. All of the drives started out with a reallocated sector count of 0. The reallocated sector counts have increased to 1, 3, 19 and 51 on 4 of the 8 drives that I've been exercising (long SMART tests, RAID 5 array builds/rebuilds) over the course of a few days. My sample and sampling time are small, but the counts seem to increase over time, rather than increasing, then stabilizing as I continue to exercise the drives.

I agree with cowtub that the chances for sector reallocation are probably higher for these dense drives. Maybe we're comparing apples and oranges when we expect 1.5 TB drives to behave like an 80 GB drive that's been spinning for years and has never reallocated a single block. Who knows? But it's disturbing to me when a brand new drive starts reallocating sectors right out of the box. And I'm not keen on entrusting valuable data to drives that might fail in a few weeks or months.

I'm not really sure how to proceed. If I RMA the drives, I have a feeling the replacements will behave similarly. I also have a bad feeling that if I keep beating on the 4 drives that currently have not reallocated any sectors, they'll begin reallocating sectors eventually, too. But maybe I'm being too skeptical.
added on 02-26-2009: I've completed testing of 16 of the 18 drives and 5 of them reallocated at least one sector (1, 3, 3, 19 and 51). One of the drives that reallocated sectors was actually a replacement for the first drive I returned to the vendor. What really bothers me about this is that I can have a drive up and running for a number of days, run a long SMARTCTL test, rebuild my RAID5 array onto the drive, then read heavily from the drive as I rebuild the array onto the next 3 drives I plug in for testing, and the drive may not reallocate any sectors until the last step. That's what leads me to say that I have a feeling that if I kept beating on any of these drives, they'd probably all start reallocating sectors eventually.
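(For anyone trying to reproduce this, my test cycle is roughly the following. The device names and array layout are only illustrative, and I snapshot the attribute 5 raw value between stages.)

# reallocated count before the cycle
smartctl -A /dev/sdc | awk '$1 == 5 {print "reallocated:", $10}'

# long SMART self-test, then check the log (takes hours on a 1.5TB drive)
smartctl -t long /dev/sdc
smartctl -l selftest /dev/sdc

# build a 4-disk RAID 5 array and let it sync, which touches every sector
# (this destroys anything already on those disks)
mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[cdef]
cat /proc/mdstat

# reallocated count after the cycle
smartctl -A /dev/sdc | awk '$1 == 5 {print "reallocated:", $10}'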
I also had an ST3500320NS (ES.2) drive, updated to SN06 and in service for roughly 2 months, fail last weekend, and noted that it had reallocated 826 sectors. My faith in Seagate was already shaken by the ongoing ES.2/7200.11 firmware issues, but now I'm starting to question the reliability of the hardware as well. This is very discouraging.
added on 02-27-2009: I decided to pull an ST31000340AS updated to SD1A that has been spinning in a Mac for under 90 days so I could take a look at it using smartctl. The drive has already reallocated 10 sectors and has 4 current pending sectors. Is this drive headed down the path of the ES.2? It appears that Seagate has issues with the 7200.11 and ES.2 lines that are independent of the firmware issues that have been discussed at length. Are others considering replacing 7200.11 and ES.2 drives in mission critical situations with drives manufactured by another company?
added on 03-12-2009: After more testing of the 18 x ST31500341AS drives, I ended up with another set of 4 drives with non-zero reallocated sector counts and intended to return them to the vendor for exchange. I planned to attend a local security conference last week, but got tied up with other things. But someone who did attend informed me that an FBI agent stated during a presentation that the ST31500341AS drives were unreliable and should be avoided, suggesting that WD drives be used instead. That was the last straw for me. I decided to try to return the whole lot of 18 drives to the vendor and wait for the WD15EADS to hit the retail channels. Not surprisingly, the vendor wasn't happy about taking back 18 opened drives, but they were kind enough to do it. I'm sure I'll have to deal with similar problems with other 7200.11 and ES.2 drives that I've purchased, but at least the scale will be smaller.
03-08-2009 09:03 AM - last edited on 03-08-2009 09:38 AM by BradC
Another "me too" - which unfortunately seems to be the common story with these 7200.11 1.5TB drives.
I have SD1A firmware (upgraded myself immediately after purchase). Have not tried later firmware yet.
4 drives in RAID 10 with 1 spare (so luckily very little chance of data loss).
Drives have been in service only 1 month, but already 2 out of the 4 have reallocated sectors (1 reallocated sector on one drive, and 2 on another drive). The spare I just added yesterday so no reallocated sectors (yet).
The first drive developed its reallocated sector only 1 day after I started it up. The second drive developed its reallocated sectors yesterday when I added the spare. Incidentally, and possibly a pattern, both developed their reallocated sectors during a shutdown and reboot of the server (which I rarely do). I note that the firmware bugs that have plagued these drives cause data loss on power down, so I wonder if there is a connection.
In any case, I just wanted to add that I ran 1TB Hitachi 7K1000 drives for one and a half years of continuous 24x7 service before "upgrading" to the Seagate 1.5TBs. This was in the identical server, with zero changes other than migrating to the new drives. When I took them out of service, the Hitachis had zero SMART errors and zero reallocated sectors. So I find it hard to believe that the Seagate errors are merely a result of increased density rather than poor quality control. I also note that these types of QC problems plagued Maxtor, and it seems no coincidence to me that they are appearing after Seagate bought Maxtor.
Of equal concern to me is the SMART errors being reported by the drives:
Tray 1 - 14885 ECC hardware errors recovered (no reallocated sectors, brand new drive)
Tray 2 - 54594643 ECC hardware errors recovered (1 reallocated sector)
Tray 3 - 93414255 ECC hardware errors recovered (2 reallocated sectors)
Tray 4 - 1028957 ECC hardware errors recovered (no reallocated sectors)
Tray 5 - 139657452 ECC hardware errors recovered (no reallocated sectors)
I have not listed read errors but they are similarly unusually high.
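(Those per-tray numbers come from running something like the loop below against each drive; the device names are just examples.)

for d in /dev/sd[a-e]; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Hardware_ECC_Recovered|Raw_Read_Error_Rate'
done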
I have read posts from Seagate's support folks on this board claiming we should ignore SMART errors. I don't think that's right: (1) SMART is an industry standard supported by Seagate, and the errors are reported by the drive itself, not by any external software; (2) if the errors are bogus, that is itself a firmware problem/defect; (3) while the errors are generally recoverable (as indicated above), their magnitude means the drives are having problems of some kind, and the recovery process is at a minimum slowing down performance; (4) I have never seen errors of this magnitude on ANY other drive; and (5) the errors correlate somewhat with reallocated sectors (except tray 5), which makes me think they are at least indicative of future drive problems.
I have always trusted Seagate - I hope they can restore all of our faith in them soon because it is very quickly evaporating from my perspective.
[Edited in compliance of the community rules and regulations.]
03-08-2009 09:27 AM
The firmware bug that Seagate disclosed, and that the famous controversial patch fixes, is a lockup caused by the drive's logging code. It has nothing to do with errors on the actual medium.
Naively, I am concerned about any hardware errors. I say "naively" because I have no idea what is acceptable and what isn't. The rate of hardware errors you report startles and concerns me. 139 million recovered ECC errors would seem to be enough that one would start to worry that some errors actually go undetected (the ECC codes are only so robust). Tray 5 seems to be more than two orders of magnitude worse than tray 4, but all of them seem very high.
Is there a chance that those numbers are misreported?
Do those numbers go up monotonically? At a rate consistent with the totals being reported?
What proportion of the block reads seem to have errors?
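A cheap way to get at the first two questions would be to sample the raw value periodically while the drive is busy; something along these lines (the device name and interval are arbitrary):

# log the Hardware_ECC_Recovered raw value once a minute, with a timestamp
while true; do
    printf '%s ' "$(date +%FT%T)"
    smartctl -A /dev/sdf | awk '$1 == 195 {print $10}'
    sleep 60
done | tee ecc_recovered.log

If the value climbs steadily and roughly in proportion to the I/O you're doing, the counter is at least behaving like a real count; if it jumps around wildly, that would support the misreporting theory.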
03-08-2009 01:50 PM
HughR - no, I do not think the numbers are misreported. I have used other Seagate drives, Samsung SpinPoint drives, and Hitachi drives and never seen anything more than a few errors. If it's a misreporting issue, I think the misreporting must be at the drive hardware level. The server is running Linux kernel 2.6.13.
I thought I saw somewhere that the firmware bugs had to do with Linux RAID arrays and data loss on power-off? Maybe I misremembered that... Anyway, I do not think it is system-related unless it is a reporting/SMART issue that is absolutely unique to the 7200.11 drive (which seems very unlikely).
03-10-2009 12:09 AM - edited 03-10-2009 12:10 AM
I've got a ST31500341AS drive with firmware SD1A. The serial number checker says it doesn't need SD1B.
I'm not using it much. I did a bunch of long tests with SeaTools for DOS. I've got it holding perhaps 3 hours of TV recorded by MythTV.
The Ubuntu 8.10 "smartctl -a /dev/sdf" command shows:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
195 Hardware_ECC_Recovered 0x001a 037 031 000 Old_age Always - 170255562
Since that is hard to read, I will point out that the raw number of Hardware_ECC_Recovered is 170,255,562.
The same command on a Hitachi Deskstar T7K500 shows no line with ID 195.
It is interesting that the normalized value is 037 which is better than the worst normalized value (031) and quite a bit better than the normalized threshold (000).
I strongly suspect that 170255562 is not the number of ECC failures. Googling turns up some hints that others doubt these numbers too (certainly nothing conclusive).
PPS: in half an hour, the number hasn't changed.
03-10-2009 02:04 PM
I just did a long S.M.A.R.T. selftest on the drive mentioned in the post immediately above and no problems showed up.
In fact the Hardware_ECC_Recovered count was unchanged too.