03-24-2009 11:02 AM - edited 11-10-2009 11:01 AM
There have been lots of reports of 7200.11 drives dropping out of RAID arrays. Typically, the drive reappears when the system is power cycled.
If you have similar problems but are not using RAID, see 7200.11 non-RAID drives going offline: unified thread
The purpose of this thread is to try to figure out what is going on. The method of attack is to gather reports in one place and to encourage more complete reports.
I previously encouraged people to add to Large file data collapse in Intel Raid-5 with 702.11 500GB HDs but perhaps that had too specific a title.
What to include in your report:
Details about your system: drive model and capacity, firmware version, RAID controller, motherboard.
What did you observe that seemed wrong?
When a disk hangs, what do you need to do to get it going again? Reboot? Power cycle? Tell the OS to try again?
Have you tried any diagnostic programs? What did they find?
03-24-2009 11:03 AM - edited 12-12-2009 02:09 PM
I will try to add interesting links in this message as I find them.
matt_callaghan reports in message 267 of this thread that Seagate has offered to exchange 7200.12 drives for his 7200.11 drives. This should solve his RAID problems. We will see how willing they are to extend this offer to others. http://forums.seagate.com/stx/board/message?board.
Large file data collapse in Intel Raid-5 with 702.11 500GB HDs: http://forums.seagate.com/stx/board/message?board.
Really really long thread about the firmware upgrade. Some of the postings are about RAID but finding them takes a lot of reading: Seagate Barracuda 7200.11 Firmware Issues
I've been having trouble with SeaTools for DOS; it seems to happen on one machine but not another: http://forums.seagate.com/stx/board/message?board.
These threads talk about modifying the drive's time limit on error recovery. This may well be relevant: a long delay caused by a drive's error recovery procedure may fool the RAID system into thinking that the drive is dead.
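As a sketch of what those threads discuss (my own illustration, not from this thread): on a Linux box with smartmontools installed, and assuming a drive that supports SCT Error Recovery Control, you can inspect and cap the drive's internal error-recovery timeout. `/dev/sda` is a placeholder for your actual device.

```shell
# Query the drive's current SCT Error Recovery Control (ERC) timeouts.
# A desktop drive with ERC disabled may retry a bad sector internally for
# a minute or more -- longer than most RAID controllers will wait before
# declaring the drive dead.
smartctl -l scterc /dev/sda

# Cap read and write error recovery at 7.0 seconds (values are in tenths
# of a second), a setting commonly used for drives in RAID arrays.
smartctl -l scterc,70,70 /dev/sda
```

Note that on many drives this setting does not survive a power cycle, so it has to be reapplied at every boot.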
03-24-2009 11:21 AM - edited 08-05-2009 10:19 PM
What causes the RAID problems? There could well be more than one cause.
[Added 2009 August 6] Intel Matrix Storage driver version 8.9 seems to cause some problems that 8.8 does not. See message 98 in this thread and other messages after it. There remain a number of reports for which this cannot be the problem.
The drives appear to become unresponsive for a time. Then the RAID controller decides that the drive is dead.
What causes the drive to appear dead? Perhaps error recovery. Perhaps a firmware crash. Perhaps the host adapter and the drive get out of sync.
How long is the pause? Does it only end when the drive is power-cycled?
The diagnostic messages provided don't seem to give any useful detail. Still, do report any diagnostic information.
Has anyone observed a likely pause during the operation of a diagnostic program? If so, what did the program report?
Some people have reported problems with Drive Self Test on these drives with SeaTools for DOS, but there is no obvious link with the "RAID drop-out problem". http://forums.seagate.com/stx/board/message?board.
Do all your drives have this problem or just a subset? Some reports suggest SDxx and CCxx firmware differ. Others suggest SD15 differs from SD1A.
Although RAID makes the problem painfully evident, the enormous firmware rant thread includes a few reports of non-RAIDed disks disappearing temporarily. I don't remember if the reports mention what controller is being used. (I expect that RAID users, on average, are more sophisticated and give better problem reports.) Don't assume that the underlying problem has anything to do with RAID.
03-24-2009 03:08 PM - edited 03-24-2009 03:09 PM
What did you observe that seemed wrong? A few disks keep falling out of the RAID 5, and out of the RAID 0+1 after I switched. Intel Matrix manager marks the drive as Failed. However, I can rebuild it. After a few days, it drops out again...
When a disk hangs, what do you need to do to get it going again? Reboot? Power cycle? Tell the OS to try again? I come back and it either tells me the RAID has been degraded in Windows, or it can't find the OS because I lost 2 drives. Note: I was able to recover from RAID 0+1 with 2 failed drives by rebuilding them using Intel Matrix manager onto a separate hard disk outside the RAID. Thank god I didn't lose it!
Have you tried any diagnostic programs? What did they find? SeaTools for DOS 2.13b: SMART tripped on the drive that kept dropping out, and errors were found and repaired. However, the other drive that dropped out had no errors detected.
So far I've sent 3 7200.11 drives back to Seagate, mainly because of errors and disks dropping out of the RAID. It's getting really frustrating!!!
03-25-2009 06:00 PM
Disk drive models: Four 1.5TB Seagate 7200.11's, ST31500341AS
Disk drive firmware: CC1J, CC1J, CC1J, LC1A
Disk Controllers: Areca ARC-1230. Sweet buttery goodness!
Motherboard: Gigabyte GA-EZX58-UD4P
What seems wrong? First, I have two arrays on the controller: one using 500GB Samsung drives, which has no problems, and one using 4x 1.5TB 7200.11's in RAID 5. I cannot watch movies from that array, as the movie will stutter/skip/buffer for 5-20 seconds at a time, every minute. As a result, every movie stored on this array that I'd like to watch must be copied to another array first and deleted afterwards. The array can copy huge files with no problem. I have only noticed the stuttering while watching movies and occasionally while playing lossless audio files at roughly 1000 kbps.
When a disk hangs? The RAID controller has not marked any drives as bad yet; the issue seems to resolve itself if I'm patient. However, when watching a movie there is no time to be patient while it buffers.
Diagnostic programs? Checkdisk is fine and the event viewer is clean. After checking Seagate's website thoroughly, the LC1A firmware is the latest.
03-25-2009 08:36 PM
I have a similar issue with ES.2 drives; they are running the latest firmware, SN06.
I have 10 blade servers with 2 drives each; the setup is RAID 1 using the Intel codebase.
Randomly, the RAID is degraded with missing hard drives.
Warm reboot of the OS does not revive the array.
Cold reboot of the OS does.
Not sure if this is related to your issue, or whether these drives are similar in design.
03-29-2009 08:41 AM
I have 2 of the 7200.11 500GB drives with SD15 firmware running RAID 0. One locked up and was sent away to Seagate to be fixed. The drive was returned and plugged back into the computer, and now the onboard RAID controller no longer detects the fixed drive as part of the array. The RAID array is still offline.
I contacted Seagate support and told them about the problem, and the reply was that it's a problem with the RAID controller, not their drive. All they did was update the firmware. The kind of support you get from them really sucks, when the problem with the drives was created by Seagate in the 1st place.
Is it still possible for me to retrieve the data that's in the RAID 0 array? The BIOS detects the drive which had its firmware updated and unlocked by Seagate, but it's no longer part of the RAID array.
The motherboard is an Asus P4, Intel chipset with onboard RAID controller. I'm stuck and out of ideas on how I can get the array running again, as I have valuable information that needs to be retrieved.
03-30-2009 03:21 AM - edited 03-30-2009 03:27 AM
In RAID0 half of your data is stored on each drive. There is NO redundancy, so if you lose one of your drives, all of your data is gone. RAID1 makes a mirror image, which lets you lose either of the two drives and still have all the data; of course, you don't get a write speed improvement, as both drives write all of the same data during writes. Reads can be twice as fast, because each drive need only read half of what's required.
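To make the trade-off concrete, here is a small sketch (my own illustration, not from the thread) of the usable capacity and failure tolerance for two 1500 GB drives:

```shell
drives=2
size_gb=1500

# RAID 0 stripes data across both drives: full combined capacity,
# but losing any one drive loses everything.
raid0_gb=$((drives * size_gb))
echo "RAID 0: ${raid0_gb} GB usable, survives 0 drive failures"

# RAID 1 mirrors: only one drive's worth of capacity is usable,
# but the array survives the loss of either drive.
raid1_gb=$size_gb
echo "RAID 1: ${raid1_gb} GB usable, survives 1 drive failure"
```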
In any case, I am also having problems with my RAID-6 config and 8x 7200.11 drives.
4 SD1A 1000340AS drives
4 CC1F 1000333AS drives
Adaptec 31205 controller
I know I shouldn't mix and match model numbers, but I've seen this same problem happen when I built my array from either group of 4 identical drives.
Random lockups (most noticeable during RAID rebuilds or expansions), which cause the "Aborted commands" counter for the affected drives to go up. I emailed this problem to support and they closed my ticket without even reading it. I've since re-opened a ticket to tell them to look at my old one, but I'm not holding out much hope.
03-30-2009 05:11 AM
My Raid0+1 failed again!!! This time, it's the other 2 drives that failed. This is so frustrating!
From the looks of it, Seagate isn't doing anything to fix the RAID issue. Maybe it's time to move on.