09-29-2008 12:23 AM - edited 09-29-2008 12:25 AM
10-17-2008 08:33 AM
You're not alone :/
Oct 17 17:48:33 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 17 17:48:33 kernel: ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Oct 17 17:48:33 kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 17 17:48:33 kernel: ata9.00: status: { DRDY }
Oct 17 17:48:33 kernel: ata9: hard resetting link
Oct 17 17:48:34 kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 17:48:34 kernel: ata9.00: max_sectors limited to 256 for NCQ
Oct 17 17:48:34 kernel: ata9.00: max_sectors limited to 256 for NCQ
Oct 17 17:48:34 kernel: ata9.00: configured for UDMA/133
Oct 17 17:48:34 kernel: ata9: EH complete
Oct 17 17:48:34 kernel: sd 8:0:0:0: [sdf] 2930277168 512-byte hardware sectors (1500302 MB)
Oct 17 17:48:34 kernel: end_request: I/O error, dev sdf, sector 2930271935
Oct 17 17:48:34 kernel: md: super_written gets error=-5, uptodate=0
Oct 17 17:48:34 kernel: raid5: Disk failure on sdf1, disabling device.
Oct 17 17:48:34 kernel: raid5: Operation continuing on 4 devices.
This is happening on the newest 2.6.27.1 kernel.
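For anyone wanting to sanity-check the numbers in that log, here is a minimal Python sketch (the variable names are my own). It recomputes the capacity the kernel reports and shows how close the failing sector is to the end of the disk. A failure that near the end would be consistent with the md superblock write error on the following line, assuming the array keeps its metadata at the end of the device, which the log itself does not confirm.

```python
# Values copied from the kernel log lines for sdf.
TOTAL_SECTORS = 2930277168   # "2930277168 512-byte hardware sectors"
SECTOR_SIZE = 512            # bytes per sector
FAILED_SECTOR = 2930271935   # "I/O error, dev sdf, sector 2930271935"

# Capacity in decimal megabytes, the unit the log line uses.
capacity_bytes = TOTAL_SECTORS * SECTOR_SIZE
capacity_mb = round(capacity_bytes / 10**6)

# Distance of the failing sector from the end of the disk.
sectors_from_end = TOTAL_SECTORS - FAILED_SECTOR
mib_from_end = sectors_from_end * SECTOR_SIZE / 2**20

print(f"capacity: {capacity_mb} MB")
print(f"failed sector is {sectors_from_end} sectors "
      f"({mib_from_end:.2f} MiB) from the end of the disk")
```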
10-17-2008 09:11 AM
Hi,
Crude, but it avoids the problem: disable the write cache on the drives with hdparm -W0 /dev/sd
I have been running with cache disabled for a few weeks now without error.
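If several drives need the same treatment, the workaround can be scripted. This is only a sketch: the helper names and the device names /dev/sdX and /dev/sdY are placeholders of mine, and the only real tool involved is hdparm with its -W0 / -W1 switch. It defaults to a dry run that just builds the commands, since actually running them needs root.

```python
import subprocess

def write_cache_cmd(device, enable):
    """Build the hdparm command that turns a drive's write cache on or off."""
    flag = "-W1" if enable else "-W0"
    return ["hdparm", flag, device]

def set_write_cache(devices, enable, dry_run=True):
    """Return the commands for every device; run them only if dry_run is False."""
    cmds = [write_cache_cmd(dev, enable) for dev in devices]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # requires root privileges
    return cmds

# Hypothetical device names; substitute the real array members.
for cmd in set_write_cache(["/dev/sdX", "/dev/sdY"], enable=False):
    print(" ".join(cmd))
```

The setting may not survive a power cycle on every drive, so it is worth re-checking after a reboot.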
With the 2.6.24 kernel the error would occur every ~20 minutes while watching MythTV and recover after 20-30 seconds each time. The drive did not drop out of the RAID array (a 2-drive mirror using the RAID 5 engine).
With the 2.6.26 kernel the error took a lot longer to occur (good for 12+ hours the first day) but when it hit it did not recover (physical reset required).
Other tests:
As part of the diagnostics I also tried using a SATA card (different chipset to the motherboard) - it did not change the behavior.
Reducing the configuration to a single drive (still RAID but a degraded mirror) did not change the behavior.
Previously I was running 3 x 500GB (Seagate, 32MB cache) and did not have this problem.
This suggests either a timing bug that has persisted through several kernel releases, or that the 1.5TB drive can wedge itself while executing a cache-flush command.
10-17-2008 09:19 AM - edited 10-17-2008 09:35 AM
I tried disabling the write cache, but performance drops so much! ~10MiB/drive :/
Have you tried disabling NCQ with the libata.force=noncq kernel switch?
Are the drives well grounded (do you have vibration dampeners on the HDDs)?
Have you tried a different power supply?
Those I haven't tried yet.
10-17-2008 09:30 AM
I'm running into the same issues with Seagate 1TB HDDs.
They keep dropping out of arrays. See my post http://forums.seagate.com/stx/board/message?board.
for logs including smartctl -a /dev/sdx output of errors caused by this.
10-17-2008 09:41 AM
Hi,
The noncq option was only used on older kernels.
I tried setting the NCQ depth to 1, but that did not avoid the problem.
Nick
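The post does not say how the depth was changed. One common way on Linux is the per-device queue_depth file in sysfs, so the sketch below assumes that mechanism. The function names are mine, and sysfs_root is only a parameter so the helpers can be tried against a scratch directory.

```python
from pathlib import Path

def queue_depth_path(disk, sysfs_root="/sys"):
    """Location of the queue depth setting for a disk such as 'sdf'."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"

def read_queue_depth(disk, sysfs_root="/sys"):
    """Return the current queue depth as an integer."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())

def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Write a new queue depth; on a real system this requires root."""
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")
```

Calling set_queue_depth("sdf", 1) and then read_queue_depth("sdf") would request a depth of 1 and read the value back, with sdf standing in for whichever disk is affected.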
10-17-2008 09:45 AM
The 2.6.27 kernel has the noncq switch
in Documentation/kernel-parameters.txt:
libata.force= [LIBATA] Force configurations.....
* [no]ncq: Turn on or off NCQ.
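To confirm that the switch actually reached the running kernel, the boot command line can be inspected. A rough Python sketch, assuming the option takes the comma-separated [ID:]VAL form; the function name and the sample command line are made up for illustration.

```python
def parse_libata_force(cmdline):
    """Return libata.force entries from a kernel command line as (id, value) pairs.

    id is None when the entry has no port prefix and applies everywhere.
    """
    prefix = "libata.force="
    entries = []
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        for item in token[len(prefix):].split(","):
            if not item:
                continue
            # Split on the last colon so any ID prefix stays in one piece.
            port_id, sep, value = item.rpartition(":")
            entries.append((port_id if sep else None, value))
    return entries

# Hypothetical command line; on a live system read /proc/cmdline instead.
sample = "root=/dev/md0 ro libata.force=noncq"
print(parse_libata_force(sample))
```

On a running machine the input would come from open("/proc/cmdline").read().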
10-17-2008 09:54 AM
Hi LotoBak,
I think you are seeing a different problem... the cmd from your log was a DMA-related command. The symptom I am seeing is always triggered by a flush-cache command.
I will follow up with your thread.
Nick
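To tell the two cases apart in a log, look at the first byte after "cmd" on the libata exception line: it is the ATA command opcode, and 0xEA is FLUSH CACHE EXT. The Python sketch below decodes it. The opcode table is a small subset of the ATA command set, and the regular expression and function name are my own, so treat it as a starting point.

```python
import re

# A few ATA command opcodes; not an exhaustive table.
ATA_OPCODES = {
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
}

# Matches e.g. "ata9.00: cmd ea/00:00:..." and captures the port and opcode.
CMD_RE = re.compile(r"(ata\d+(?:\.\d+)?): cmd ([0-9a-f]{2})/")

def decode_cmd(line):
    """Return (port, opcode, name) for a libata 'cmd' log line, else None."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(2), 16)
    return match.group(1), opcode, ATA_OPCODES.get(opcode, "unknown")

# Line taken from the log posted earlier in this thread.
line = "Oct 17 17:48:33 kernel: ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
print(decode_cmd(line))
```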
10-18-2008 08:31 PM
I am also having a problem with the ST31500341AS 1.5TB drive, but I'm not running Ubuntu. I'm running OS X and I have my drives installed in a Drobo enclosure. The drives exhibit the same kind of problem as described by the OP: freezes/inaccessibility for 30 seconds or so. I'm not using this as a system drive, but it's causing all kinds of havoc for my data.
The drive checks out perfectly well using Drive Genius 2 (I can't run SeaTools on my Mac Pro). No bad sectors, and the SMART data are just fine.
Is there something that is wrong with this hard drive? Drobo support doesn't know enough about the drive to say for sure and some people out there have reported no issues whatsoever! I'm very frustrated with these 1.5TB drives.
10-19-2008 04:01 AM
Having the same problem with the latest Debian 2.6.24 kernel.
I'm not going to put much effort into figuring out which part of the kernel is causing this, since I can live with running the drives with the cache off.
Out of curiosity, has anyone been getting the same kinds of errors under OSes other than Linux?