06-26-2009 10:12 PM
Because of my very poor english :-(
> I wonder why it is stopping at 90%. Is this with smartmontools? Under Linux? Which distro & version?
Yes, smartmontools. Yes, Linux. Gentoo, so no version number, but it is up to date.
> If it is under Linux, could you look at the output of dmesg (kernel logging) to see if there are suspicious messages about the drive?
No messages. The test just stays at 90% remaining, even after hours...
(I'm not the first one to see this; you can find other reports on Google.)
> Are you using this drive in a hardware RAID array? (I don't actually know if the Intel controller + Linux is capable of hardware RAID.)
Software RAID, no hardware RAID. The motherboard has a RAID mode, but it is not really hardware RAID, so I prefer my software one.
> Have you done any motherboard BIOS updates that are suggested?
I haven't done any BIOS update because, when I bought the machine, the seller was supposed to take care of all that: checking the hardware and updating the BIOS. The BIOS date is December 2008. I'm not sure I'll find anything newer, but I can try.
PS: for your information, I already have a ticket open with Seagate support. Their first answer was:
"If the drive has failed our diagnostic Seatools utility or has proven to be defective, here is a web link where you can replace the drive provided the drive is still under warranty"
I was afraid of having 2 defective drives (which work very well apart from the smartd self-test), of losing all the work I did this week installing my server, and of having to wait for weeks to continue if I return them.
Their second answer :
"you said that your drives were not completing in our Seatools application. It is strange to see two drives failing however I have seen it before. If possible I would like you to download the Seatools for Dos and test the drive. As long as you have an Intel based machine (non Mac) the application will run. Below is a link to Seatools."
But this tool doesn't recognize my hard drives...
06-27-2009 06:55 AM
What does your drive's SMART log say? I've been looking at wolfgangr's log (from the first message of this thread) and assuming that yours was the same. That's silly, even though you said "Same problem here", your log is probably different.
My own struggles with SeaTools for DOS are documented in this thread: http://forums.seagate.com/stx/board/message?board.
How many drives are misbehaving for you? In message #3 you said that you have two ST31000528AS drives. Are they both not working?
Do you have another machine that you could try the drives in? Then, perhaps, SeaTools for DOS might work.
It might be good to collect similar reports and add pointers to them here.
Here's WolfgangR's report in the Red Hat Bugzilla for Fedora 11 (what I run on my desktop, but not with one of these drives): https://bugzilla.redhat.com/show_bug.cgi?id=503344
He also reported it on the Fedora mailing list. In his Bugzilla entry he points at two other reports: http://www.mail-archive.com/freebsd-hackers@freebs
and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=5
I doubt that you had two drives fail in the same way unless there is a systematic design error, for example, a firmware bug. wolfgangr is using an Intel controller too, and smartmontools. I think that the other reports are as well. It would be good to experiment, changing one of those factors at a time.
06-27-2009 09:13 AM
> What does your drive's SMART log say? I've been looking at wolfgangr's log (from the first message of this thread) and assuming that yours was the same. That's silly, even though you said "Same problem here", your log is probably different.
The logs are very similar to the one in the first message:
# smartctl -l selftest /dev/sdb
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 102 -
# 2 Short offline Completed without error 00% 98 -
# 3 Extended offline Aborted by host 90% 89 -
# 4 Short offline Aborted by host 90% 75 -
# 5 Short offline Completed without error 00% 40 -
# smartctl -l selftest /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Self-test routine in progress 90% 99 -
# 2 Short offline Completed without error 00% 95 -
# 3 Extended offline Aborted by host 90% 86 -
# 4 Extended offline Aborted by host 90% 86 -
# 5 Conveyance offline Completed without error 00% 73 -
# 6 Short offline Aborted by host 60% 73 -
# 7 Extended offline Aborted by host 90% 73 -
# 8 Short offline Completed without error 00% 72 -
# 9 Short offline Aborted by host 90% 72 -
#10 Extended offline Aborted by host 90% 38 -
#11 Extended offline Aborted by host 90% 37 -
#12 Short offline Completed without error 00% 37 -
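With this many rows it can be easier to compare the two drives by counting outcomes than by reading the logs line by line. The following is only a minimal sketch, assuming the row layout shown above; `summarize_selftests` is a hypothetical helper name and not part of smartmontools.

```shell
#!/bin/sh
# Summarize a smartctl self-test log read from standard input.
# Hypothetical helper; it assumes log rows start with "# 1", "#10", etc.
summarize_selftests() {
    awk '
        /^# *[0-9]+ / {
            total++
            if ($0 ~ /Aborted by host/)              aborted++
            else if ($0 ~ /Completed without error/) completed++
            else if ($0 ~ /in progress/)             running++
        }
        END {
            printf "total=%d completed=%d aborted=%d in_progress=%d\n",
                   total, completed, aborted, running
        }
    '
}
```

It could be used as `smartctl -l selftest /dev/sdb | summarize_selftests`. The header line that starts with `# smartctl` is skipped because no digit follows the `#`.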
> How many drives are misbehaving for you? In message #3 you said that you have two ST31000528AS drives. Are they both not working?
I have 2 identical drives, in software RAID, bought at the same time. Both give me the same problem: the long self-test never gets past 90% remaining, and the drives are not detected by SeaTools for DOS.
> Do you have another machine that you could try the drives in? Then, perhaps, SeaTools for DOS might work.
I could try that, but it's not very simple (my other, older PCs don't have SATA).
> I doubt that you had two drives fail in the same way
So do I; that's why I don't want to lose all my work by returning them to Seagate.
> unless there is a systematic design error, for example, a firmware bug.
Possible, but no update is available on the Seagate site (as far as I can see).
> wolfgangr is using an Intel controller too, and smartmontools. I think that the other reports are as well. It would be good to experiment, changing one of those factors at a time.
Of course.
I'll see what I can do along those lines.
06-27-2009 10:56 PM
I wonder if some time-out is happening. For example, is the driver or the drive shutting down after a fixed amount of "inactivity" (ignoring the fact that testing is going on)?
This idea is based on a report about a problem with testing an external drive with a USB interface.
One way to get around this might be to have a task periodically access the disk while testing is going on. I don't know if, say, "dd if=/dev/sda count=1 of=/dev/null" would keep a drive awake. If so, then this would be useful, running in parallel with the test:
while sleep 60
do
dd if=/dev/sda count=1 of=/dev/null
done
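One caveat with the loop above: a repeated read of the same first sector may be served from the kernel's page cache and never reach the drive. A possible variant, offered only as a sketch, asks GNU dd to bypass the cache with `iflag=direct` and stops by itself once the self-test log no longer shows a test "in progress". `keep_awake` is a hypothetical helper name, and whether such reads actually prevent the stall is untested.

```shell
#!/bin/sh
# keep_awake DEVICE [INTERVAL_SECONDS]
# Hypothetical helper: read one sector at a fixed interval for as long as
# the self-test log reports a test "in progress", then stop.
keep_awake() {
    dev=$1
    interval=${2:-60}
    while smartctl -l selftest "$dev" | grep -q 'in progress'
    do
        # iflag=direct (GNU dd) bypasses the page cache, so the read
        # should reach the drive itself.
        dd if="$dev" of=/dev/null count=1 iflag=direct 2>/dev/null
        sleep "$interval"
    done
}
```

Run as root after starting the test, for example `keep_awake /dev/sda 60 &`.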
Still another thing to try might be testing from a live CD of another distro of Linux. I used to use knoppix for this but seem to have gravitated to Ubuntu. Of course there are many others available.
06-27-2009 11:58 PM
> I wonder if some time-out is happening
That could be it, and I'm trying it now, but I don't think it's possible given all the services already started:
rc-status | grep started | wc -l
36
It's impossible that there is no activity for 60 seconds...
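The number of started services does not by itself show that this particular disk sees traffic, since reads can be absorbed by caches. One way to check, sketched here under the assumption of a Linux `/proc/diskstats` file, is to compare the completed I/O counters for the device before and after an interval. `disk_io_count` and `was_idle` are hypothetical helper names.

```shell
#!/bin/sh
# disk_io_count DEVICE [STATS_FILE]
# Print reads completed + writes completed for DEVICE (e.g. "sda").
# In /proc/diskstats, field 3 is the device name, field 4 the reads
# completed and field 8 the writes completed.
disk_io_count() {
    awk -v dev="$1" '$3 == dev { printf "%.0f\n", $4 + $8 }' \
        "${2:-/proc/diskstats}"
}

# was_idle DEVICE SECONDS
# Succeed if no read or write completed on DEVICE during the interval.
was_idle() {
    before=$(disk_io_count "$1")
    sleep "$2"
    after=$(disk_io_count "$1")
    [ "$before" = "$after" ]
}
```

For example, `was_idle sda 60 && echo "no I/O on sda for 60 seconds"`.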
> Still another thing to try might be testing from a live CD of another distro of Linux. I used to use knoppix for this but seem to have gravitated to Ubuntu. Of course there are many others available.
I can try that, of course. But I'm nearly sure that it's not due to Gentoo.
I'll try it when I have a moment to shut down the server again.
06-28-2009 08:19 AM
I should have explained why I asked about a time-out.
So many of your tests seem to be "Aborted by host" after 90%. Why is it so often the same percentage?
Perhaps the percentage is a gross approximation and this isn't a strong correlation.
Or perhaps something is happening "like clockwork" -- a timeout of some kind. One such timeout might be a timer for entering some kind of reduced power mode.
I had not realized that your drives were in service while you were testing. That makes it all the more interesting to try testing without any concurrent activity. Since SeaTools for DOS won't work for you, a live Linux CD would be useful as a test platform.
I don't understand all the IDE/ATA drive settings, but perhaps hdparm -Z is useful. Or something like it.
Good luck. I'm kind of running out of ideas.
06-28-2009 10:53 AM
The percentage starts out at 90 percent remaining, even the first time one looks at the result as soon as the test starts. It never changes from that. The timeout idea is interesting, the disk is very lightly used. I see the disk access light blinking perhaps once or twice a minute. I should check to see if the disk has any powersave mode engaged (as soon as I remember how to do that. It has been years since I explicitly set those spin-down power save modes on any drive.) All the drives in that box are at factory defaults.
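For checking whether a power-save mode is engaged, hdparm has query options that might help. This is a minimal sketch, assuming hdparm is installed and the drive answers these queries; `show_power_state` is a hypothetical wrapper name.

```shell
#!/bin/sh
# show_power_state DEVICE
# Hypothetical wrapper around two hdparm queries; run as root.
show_power_state() {
    # -C reports the drive's current power mode status.
    hdparm -C "$1"
    # -B with no value queries the Advanced Power Management setting,
    # if the drive supports APM.
    hdparm -B "$1"
}
```

For example, `show_power_state /dev/sda`. If a standby timer does turn out to be set, `hdparm -S 0 /dev/sda` is documented as disabling the standby (spin-down) timeout.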
I do suspect this bug has something to do with the low-level drivers. One web page I found mentioned the problem going away on the author's BSD or Linux system when he switched to a different driver mode via a BIOS setting. I assume it set a flag that made the SATA controller chip act like an earlier model.
08-07-2009 11:49 AM
WolfgangR's Fedora Bugzilla report has been closed "CLOSED CANTFIX". Not surprising. https://bugzilla.redhat.com/show_bug.cgi?id=50334
Anybody make any progress on this?
11-09-2009 03:39 AM
Same problem here on one of two Seagate 500G drives in a Linux software RAID 1 array when running an extended offline self-test using smartd.
This drive gets stuck at 90%:
Model Family: Seagate Barracuda 7200.12
Device Model: ST3500410AS
Firmware Version: CC34
This drive completes the test OK:
Model Family: Seagate Barracuda 7200.11
Device Model: ST3500320AS
Firmware Version: SD15
SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
This is on a moderately busy web & mail server, test running at a relatively quiet time but still activity going on.
Dave
11-09-2009 11:52 AM