06-13-2009 04:17 AM - last edited on 06-13-2009 08:59 AM by BradC
FYI, the RAID card I'm using is the Highpoint 4322, which uses an Intel processor.
I got another answer from Seagate stating that SSC is OFF on the CCxx firmware, and that it is not the problem. Of course, they did not give any other lead to pursue.
Highpoint opened another case, and I hope they will sort this out directly with Seagate.
Whatever, I'm more inclined to trust Highpoint than Seagate, where every support answer sounds like.
HughR, unfortunately, I haven't been able to find a way to determine whether SSC is off or on on the drive. This would require low-level SATA driver tweaking that I'm not smart enough to do, not to mention that I also have a life and no time for that.
I just bought 8 drives from Seagate that I can't use just because they did not "tune it" for RAID... Not being tuned should not mean not working.
FYI, I'm trying, through my company, to get a French or European manager to answer those questions as, again, this sounds like a big issue to me. I hope doing so will have them find a real solution...
[Edited in compliance of the community rules and regulations.]
06-14-2009 07:33 AM - edited 06-16-2009 07:10 AM
-- EDITED 6/16/09 --
This does not appear to have fixed my problems after all. I will leave the original message here for historical purposes, but for the record, I am still getting errors from Windows.
----- Original Post -----
Ok, a post from Kaliena gave me some testing ideas... See:
I'm using multiple ST31500341AS drives with an LSI SAS3800x controller in JBOD (no RAID) mode.
When formatted at full capacity, 1397.14 GB, I start getting inconsistent error messages and write failures. Windows Disk Manager starts labeling the disk as "At Risk" as the drive is obviously reporting problems. Unlike Kaliena, I was able to format the volume just fine -- but it was reporting errors immediately after formatting.
1: I tried Kaliena's approach: deleted all volumes, created a smaller volume (Y: ) of 1013.76 GB and left the remaining capacity unpartitioned. The new 1013.76 GB volume works FINE!! No errors, reasonably fast writes and reads (anywhere from 45-130 MB/sec) with LARGE test files (200 GB). It even passes the SeaTools Windows generic diagnostics.
2: Then, in order to verify that this was the problem, I created a single additional partition (Z: ) in the remaining 407.12 GB space -- it formatted without any errors. However, as soon as the format was complete and the volume mounted, Windows Disk Manager started reporting the Z: volume "At Risk" -- the Y: volume was still "Healthy" -- but was no longer mounted. I was able to copy the same 200 GB file onto the Z: volume, but Windows was still complaining that the volume was "At Risk".
3: Delete the Z: partition, Deactivate Disk, Reactivate Disk -- ok, as expected, Z: volume is gone and Y: volume is back and working fine. Y: volume is "Healthy" in Windows Disk Manager.
4: Try something a little different... I used Disk Manager's "Extend Volume" command to grow the Y: volume into the remaining (now unused) 407.12 GB space. The extension command is successful, but again, the Y: drive starts reporting itself as "At Risk" -- clearly errors are being thrown again. Could NOT even complete the large file copy... I couldn't find a way to undo this, so I deleted both the original and the extended partitions for the Y: volume.
5: Ok, time to prove that the drive mechanism is not physically defective/broken... Using the SAME server, controller card, array chassis, array slot, etc. -- created a NEW single 1013.76 GB volume (Y: ) on the drive. Remaining 407.12 GB is unused. No surprise, Y: volume works fine.
So -- do you think my LSI SAS3800x has the same problem with > 1TB size as the nVidia controllers? LSI does not use nVidia chipsets...
Is this a Seagate problem or an LSI problem??
Any other suggestions for testing???
Has anyone else with RAID problems tried using < 1TB on the 1.5 TB drives for their RAID sets?
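To make the "> 1TB" question concrete, here is a rough sketch of the arithmetic behind the tests above. It assumes that Windows reports sizes in binary units (1 GB = 1024^3 bytes), that the drive uses 512-byte sectors, that the first volume starts at the beginning of the disk, and that the suspected limit sits at 2^31 sectors (1 TiB). That limit is only a hypothesis here, and the function names are made up for illustration.

```python
SECTOR_BYTES = 512          # assumed logical sector size
GIB = 1024 ** 3             # Windows "GB" assumed to be binary gigabytes
BOUNDARY_SECTORS = 2 ** 31  # hypothetical limit: 2^31 sectors * 512 B = 1 TiB


def gib_to_sectors(size_gib):
    """Convert a size in binary GB to a whole number of sectors."""
    return int(size_gib * GIB) // SECTOR_BYTES


def crosses_boundary(start_gib, size_gib):
    """True if a volume starting at start_gib extends past the boundary."""
    end_sector = gib_to_sectors(start_gib + size_gib)
    return end_sector > BOUNDARY_SECTORS


# Layouts taken from the test steps in this post.
layouts = {
    "Y: alone (1013.76 GB)": (0.0, 1013.76),
    "Z: in the remaining space (407.12 GB)": (1013.76, 407.12),
    "full-capacity volume (1397.14 GB)": (0.0, 1397.14),
}

for name, (start, size) in layouts.items():
    print(name, "-> crosses 1 TiB:", crosses_boundary(start, size))
```

Under those assumptions, only the 1013.76 GB volume stays entirely below the boundary, which lines up with which volumes stayed "Healthy" in the steps above. It does not show whether the controller, the driver, or the drive is at fault.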
BTW -- my hardware config:
- Server: HP Proliant DL380 G3
- HBA: LSI SAS3800x
- Array: Norco 1240DS JBOD chassis (Infiniband/SAS)
- OS: Windows 2003 Server
- Seagate ST31500341AS (8x CC1H)
06-14-2009 07:59 AM
(Interestingly, I was the one to give Kaliena the solution, but I never guessed it would be the key to making your setup work.)
It seems very odd that file-system formatting would work but general use would not (true for Kaliena too). It suggests that at some level the OS drivers are working but not at other levels. I don't know about Windows drivers, but it sure sounds like the Windows Driver architecture is badly designed (in Linux, if a driver works for block access, it should work for any block access).
It sure looks like a problem for LSI to address. I'd be surprised if you were the first to hit this.
(The reason that I've pointed out the nVidia driver issue on this forum so often is that it is so bad: the symptoms don't point at the problem, Windows Update won't fix it, and the problem is not widely announced. Your experience seems to parallel this, only worse.)
06-16-2009 07:01 AM
Unfortunately, I have bad news.
I thought that the ST31500341AS ( 1.5TB ) formatted down to 1TB was going to run reliably. But that no longer seems to be the case. Last night, when checking the server, I noticed that the Seagate drive I am testing went to "At Risk" state in the Windows Disk Manager. So, obviously it is throwing errors...
I checked the Windows Event Viewer... It seems that I have MANY of these error messages in my Event Viewer logs. The same error message has been posted almost exactly every 11 seconds, starting from 7:48pm last night until 9:32pm. It also stopped by itself.
Just before this flurry of messages, I got what looks like 3 related errors logged:
It is no longer reporting any error messages at this time -- I did nothing to change the configuration... The server was experiencing "normal" load at the time (which is moderate email and light file sharing). There is a Hitachi 250GB drive connected to the same controller in an adjacent array slot -- no errors coming from that drive.
Does this give anyone any insight into what might be happening??
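For anyone comparing logs, a quick estimate of how many entries an 11-second repeat between 7:48pm and 9:32pm should produce can be sketched as below. The calendar date is an assumption (the evening before this post), and the function name is made up for illustration. A very different count in the actual Event Viewer log would suggest the interval was not as regular as it looked.

```python
from datetime import datetime

INTERVAL_SECONDS = 11  # "almost exactly every 11 seconds"


def estimate_event_count(start, end, interval_seconds):
    """Estimate how many evenly spaced events fit between start and end."""
    duration = (end - start).total_seconds()
    # +1 because an event is logged at the start of the window as well.
    return int(duration // interval_seconds) + 1


# Date is assumed; only the times of day come from the post.
start = datetime(2009, 6, 15, 19, 48)  # 7:48pm
end = datetime(2009, 6, 15, 21, 32)    # 9:32pm

minutes = (end - start).total_seconds() / 60
count = estimate_event_count(start, end, INTERVAL_SECONDS)
print(f"window: {minutes:.0f} min, roughly {count} events expected")
```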
06-20-2009 06:57 AM - edited 06-20-2009 06:59 AM
Just checked the Drobo website, as new hardware is out. I also tried to check which drives are compatible with it.
I found some interesting things, which I will quote:
Article ID 0235: Do Drobo and DroboPro work with Seagate 1.5TB drives?
Data Robotics customers have had good results with some Seagate 1.5TB drives and less success with others. We recommend that you explore the lists below of Seagate 1.5TB drives that have (and have not) undergone Data Robotics’ testing and qualifying process. In addition, check the recommendations about Seagate firmware updates, also described below.
Data Robotics has verified that the following Seagate 1.5TB drives work smoothly with Drobo and DroboPro. These qualified drives’ model numbers, associated part numbers, and firmware versions are as follows:
Model Number: ST31500341AS
Part Numbers: 9JU138-300 and 9JU138-336
Firmware version: SD1A
Data Robotics recommends that you avoid using Seagate 1.5TB drives with firmware versions that we have not qualified, including:
SD15, SD16, SD17, SD18, SD19, LC1A and CC1J.
Data Robotics has explicitly disqualified firmware version CC1H.
This is further proof that the 1.5TB drive is fickle and that the CC1H firmware IS special and not working as it should!!
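The quoted article boils down to a small lookup table. A minimal sketch, using only the firmware versions listed in the quote (the function name and the status strings are made up for illustration):

```python
# Firmware versions exactly as listed in the quoted Data Robotics article.
QUALIFIED = {"SD1A"}
NOT_QUALIFIED = {"SD15", "SD16", "SD17", "SD18", "SD19", "LC1A", "CC1J"}
DISQUALIFIED = {"CC1H"}


def drobo_status(firmware):
    """Classify an ST31500341AS firmware version per the quoted article."""
    fw = firmware.strip().upper()
    if fw in QUALIFIED:
        return "qualified"
    if fw in DISQUALIFIED:
        return "explicitly disqualified"
    if fw in NOT_QUALIFIED:
        return "not qualified (avoid)"
    return "not listed"


for fw in ("SD1A", "CC1H", "CC1J"):
    print(fw, "->", drobo_status(fw))
```

Any version not mentioned in the quote comes back as "not listed", since the article says nothing about it either way.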
06-21-2009 11:40 PM
What did you observe that seemed wrong? A few disks keep falling out of the RAID0. Intel Matrix manager marks it as Failed. However, I can rebuild it. After a few days, it drops out again...
When a disk hangs, what do you need to do to get it going again? Reboot? Power cycle? Tell the OS to try again? I come back and it tells me either that the RAID has been degraded (in Windows) or that it can't find the OS because I lost a drive.
I was unable to run diagnostics since they couldn't see the individual drives (since they are in a RAID array). But I did get messages in Windows about S.M.A.R.T. errors and that the drive was going to fail. I'm not sure if this is the right place to post, but it seemed like my experience was similar to others' and I wanted to report it. I will try to contact Seagate support soon (even though my s/n is not listed as bad).
06-22-2009 10:37 AM
Thanks for your useful post.
I don't know if your drive is in the 7200.11 family or how MX15 relates to Seagate firmware versions. I'm not saying that they are not related.
It would seem surprisingly dumb if better diagnostic tools were not available for checking the health of a RAID drive while it is in service. Certainly I can run a SMART drive self-test under Linux while the drive is in service (although I've not tried it in hardware RAID contexts). RAID installations are often ones where both the expertise and the need for deeper diagnostic tools exist.
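For reference, an in-service SMART self-test under Linux is normally driven through smartmontools' `smartctl`. The sketch below only builds the command lines and does not run them. The device path and helper name are placeholders, and the optional `-d` device type (for example `megaraid,0` for a disk behind some LSI RAID controllers) is an assumption that depends entirely on the controller, so it may not apply to the setups in this thread.

```python
def smartctl_commands(device, device_type=None):
    """Build smartctl command lines for a short self-test and its results.

    device      -- placeholder block device path, e.g. "/dev/sda"
    device_type -- optional value for smartctl's -d option (assumption:
                   only needed when the disk sits behind a RAID controller)
    """
    base = ["smartctl"]
    if device_type:
        base += ["-d", device_type]
    return {
        "start_short_test": base + ["-t", "short", device],
        "self_test_log": base + ["-l", "selftest", device],
        "health_summary": base + ["-H", device],
    }


for name, argv in smartctl_commands("/dev/sda").items():
    print(name + ":", " ".join(argv))
```

The short test runs in the background while the drive stays in use. The self-test log is where its result shows up once it finishes.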
What SMART errors were reported? Were they usefully specific?
I didn't understand your response to the stock question "When a disk hangs, what do you need to do to get it going again?". Do the drives come back only after a power cycle?
The Serial Number checker is for a different bug, not the one you are experiencing. Don't worry about it beyond what you've already done.
06-22-2009 10:54 AM - last edited on 06-22-2009 11:32 AM by AlanM
I have good news for you. I'm still working with the Seagate support team to try to understand WHY the heck this drive is not working as it should. Well, the support team, which is not helping much, just asked me to STOP ASKING!
Dear Mr.
As stated before, there is not problems with CC1H. There is no firmware updates, that is the latest update for that drive. If you have any other questions about something else I would be glad to answer those. But again there is no problems with CC1H. Forums are written by the customers, those are not Seagate statements.
Regards,
Taylor
I hope every one of you has an open case with support?
If not, please open one, and ask for escalation to the engineering team on this problem!
As far as I know, people with SD1A firmware have NO PROBLEM, while a lot of people with CCxx do. Maybe we don't all have the same problem, but it really seems to be related to the CCxx firmware.
(Edited: Please do not evade the profanity filter.)
06-22-2009 10:57 AM
PRBERG's drive is a Maxtor 1TB drive...
Given the model designation, it sounds like it is simply a re-labeled ST31000340AS, but who knows...
I find it odd that Seagate would have a separate firmware tree for Maxtor drives, but who knows -- we already know they have 3 trees for their OWN drives (SDxx, LCxx, CCxx)...
What a mess...
06-22-2009 11:24 AM
People are having trouble with SDxx drives too. I just did a quick scan of the first page of this thread to confirm this.
You should probably complain to support about drive problems that you are experiencing. Just asking for an updated firmware is easy to deflect with the response "there is none".
When I've been a support person, one of the difficulties was dealing with people who wanted particular (wrong) solutions rather than just presenting me with the problem.
+1 for the idea that everyone experiencing a problem that hasn't been solved should report it to Seagate Support. If Seagate Support sees enough of these problems they should figure out that a solution is necessary. Furthermore, more data usually helps engineers zero in on the problem.