[HW] Slack - hda lost interrupt
Lately I've been having more and more trouble with hda.
lspci -v
Code:
00:00.0 Host bridge: ALi Corporation M1531 [Aladdin IV] (rev b3)
Subsystem: ALi Corporation M1531 [Aladdin IV]
Flags: bus master, slow devsel, latency 32
00:02.0 ISA bridge: ALi Corporation M1533 PCI to ISA Bridge [Aladdin IV] (rev b4)
Flags: bus master, medium devsel, latency 0
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Edimax Computer Co. EN-9130TX
Flags: bus master, medium devsel, latency 64, IRQ 11
I/O ports at 6400 [size=256]
Memory at e0000000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
00:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Compex FN22-3(A) LinxPRO Ethernet Adapter
Flags: bus master, medium devsel, latency 64, IRQ 10
I/O ports at 6500 [size=256]
Memory at e0001000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
00:0b.0 IDE interface: ALi Corporation M5229 IDE (rev 20) (prog-if fa)
Flags: bus master, medium devsel, latency 64, IRQ 10
I/O ports at f000 [size=16]
hdparm -i /dev/hda
Code:
/dev/hda:
Model=ST34311A, FwRev=8.01, SerialNo=5BF2A168
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=8944/15/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=256kB, MaxMultSect=16, MultSect=off
CurCHS=8944/15/63, CurSects=8452080, LBA=yes, LBAsects=8452080
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 *mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version: 1 2 3 4
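A side note on reading the listing above: hdparm marks the currently selected transfer mode with an asterisk. A minimal sketch that pulls the starred mode out of the three mode lines (the helper name `selected_modes` is made up for illustration):

```python
import re

# Mode lines copied from the hdparm -i output above.
HDPARM_MODES = """\
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 *mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4
"""

def selected_modes(text):
    """Return the mode names that carry a leading '*' in the listing."""
    return re.findall(r"\*(\w+)", text)

print(selected_modes(HDPARM_MODES))  # ['mdma2']
```

So at the time of this listing the drive reports multiword DMA mode 2 as selected, not a PIO mode.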
dmesg
Code:
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ALI15X3: IDE controller at PCI slot 00:0b.0
PCI: Assigned IRQ 10 for device 00:0b.0
ALI15X3: chipset revision 32
ALI15X3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: ST34311A, ATA DISK drive
hdc: CD-540E, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 8452080 sectors (4327 MB) w/256KiB Cache, CHS=526/255/63
hdc: attached ide-cdrom driver.
hdc: ATAPI 40X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
Partition check:
hda: hda1 hda2 hda3
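The numbers in the hdparm and dmesg listings can be cross-checked with a bit of arithmetic. A small sketch, assuming 512-byte sectors (the sector size is not stated in the output, so that is an assumption):

```python
SECTOR_BYTES = 512  # assumption: not reported in the listings above

lba_sectors = 8452080        # LBAsects from hdparm -i, also printed in dmesg
raw_chs = (8944, 15, 63)     # RawCHS from hdparm -i
kernel_chs = (526, 255, 63)  # CHS printed by the kernel in dmesg

def chs_sectors(cylinders, heads, sectors):
    """Number of sectors addressable with the given CHS geometry."""
    return cylinders * heads * sectors

print(chs_sectors(*raw_chs))                  # 8452080, same as LBAsects
print(chs_sectors(*kernel_chs))               # 8450190, slightly below LBAsects
print(lba_sectors * SECTOR_BYTES // 10**6)    # 4327, the MB figure in dmesg
```

The raw geometry matches the LBA sector count exactly, and the translated 526/255/63 geometry covers slightly fewer sectors, so nothing here looks inconsistent.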
In lilo.conf I have
Code:
image = /boot/vmlinuz-2.4.29
root = /dev/hda1
label = Linux-2-4-29
append = "ide=nodma"
read-only
I also tried hdparm -d0 /dev/hda, disabling DMA and enabling PIO in various ways in the BIOS, replacing the cable, ...
My syslog is full of messages like:
Code:
Aug 2 04:52:02 gw1 kernel: hda: lost interrupt
Aug 2 04:59:54 gw1 kernel: hda: lost interrupt
Aug 2 05:11:52 gw1 kernel: hda: lost interrupt
Aug 2 05:13:48 gw1 kernel: hda: lost interrupt
Aug 2 13:12:12 gw1 kernel: hda: lost interrupt
Aug 2 13:12:12 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 13:12:12 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 13:12:12 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 13:12:12 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 13:12:12 gw1 kernel: hda: status timeout: status=0xd0 { Busy }
Aug 2 13:12:12 gw1 kernel: hda: no DRQ after issuing WRITE
Aug 2 13:20:14 gw1 kernel: hda: lost interrupt
Aug 2 13:32:07 gw1 smartd[144]: Device: /dev/hda, ATA error count increased from 41 to 43
Aug 2 13:50:12 gw1 kernel: hda: lost interrupt
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 2 15:21:48 gw1 kernel: hda: multwrite_intr: error=0x00 { }
Aug 2 15:21:48 gw1 kernel: ide0: reset: success
Aug 2 15:48:39 gw1 smartd[144]: Device: /dev/hda, ATA error count increased from 43 to 47
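For anyone reading the status= values in the log above: they are the ATA status register, one flag per bit. A rough sketch of the decoding, assuming the standard ATA bit layout. Only DriveReady, SeekComplete, Error and Busy appear in the log; the other names are my assumption of what the driver would print. Reporting only Busy when that bit is set is my reading of the 0xd0 line.

```python
# Standard ATA status register bits. Names not seen in the log above
# (DeviceFault, DataRequest, CorrectedError, Index) are assumptions.
BUSY = 0x80
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(status):
    """Return the flag names set in an ATA status byte."""
    if status & BUSY:
        # While the drive is busy the remaining bits are not meaningful.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]

print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_status(0xd0))  # ['Busy']
```

Both results match what the kernel printed, so the log lines say the drive flagged an error on a multi-sector write and later stayed busy until the driver timed out.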
I don't know exactly which motherboard it is (at most I can find out from the BIOS string) - P55C 200, 3C509 ISA, 2x Realtek 8139 PCI, 1x ATAPI CD-ROM, 1x HDD ST34311A - each drive on its own IDE channel, 2x DIMM totalling 128 MB
It runs 24/7 (router, DNS, DHCP, mail + anti-spam, www, ...) - I just don't know whether it's a HW problem or whether a newer kernel could fix it...
Re: [HW] Slack - hda lost interrupt
Try whether it also happens with a different disk.
Re: [HW] Slack - hda lost interrupt
Quote:
Aug 2 15:48:39 gw1 smartd[144]: Device: /dev/hda, ATA error count increased from 43 to 47
Please post the output here as well.
Quote:
I also tried ..., disabling DMA and enabling PIO in various ways in the BIOS ...
That doesn't make much sense on Linux: once it gets its claws on the disk controller, it couldn't care less about the BIOS. A case in point is running various large disks (whatever the controller can handle, i.e. LBA32) on Pentium boards without problems. ;-) It only makes sense to set PIO and DMA via hdparm.
Re: [HW] Slack - hda lost interrupt
Quote:
Originally posted by David Jaša
It only makes sense to set PIO and DMA via hdparm.
That's not entirely true - on one server I wondered why the disk was only running in UDMA33 - I thought someone had fitted a 40-wire cable. When I went there in person, I looked in the BIOS and DMA was completely disabled there. So Linux managed to set at least UDMA33, but nothing higher. After enabling it in the BIOS, UDMA100 works too.
Re: [HW] Slack - hda lost interrupt
If the count keeps growing, don't risk it. On Linux servers I replace disks like that right away. I'd trust such disks with data caching at most...
Re: [HW] Slack - hda lost interrupt
2Rainbow: It's possible that switching to "PIO" mode meant some kind of safe/legacy mode. IMHO UDMA2 (ATA33) would then fit with that.
Re: [HW] Slack - hda lost interrupt
Well, SMART is screaming its head off - I have smartd set up to send mail notifications:
The following warning/error was logged by the smartd daemon:
Device: /dev/hda, ATA error count increased from 52 to 54
For details see host's SYSLOG (default: /var/log/messages).
Code:
smartctl -a /dev/hda
smartctl version 5.36 [i486-slackware-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate U4 family
Device Model: ST34311A
Serial Number: 5BF2A168
Firmware Version: 8.01
User Capacity: 4 327 464 960 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 4
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Aug 2 22:34:17 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (3120) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0008 111 099 000 Old_age Offline - 33216423
3 Spin_Up_Time 0x0006 097 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0013 100 100 020 Pre-fail Always - 90
5 Reallocated_Sector_Ct 0x0013 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x0009 075 060 030 Pre-fail Offline - 4335450923
10 Spin_Retry_Count 0x0013 100 100 090 Pre-fail Always - 0
12 Power_Cycle_Count 0x0013 099 099 020 Pre-fail Always - 1368
197 Current_Pending_Sector 0x0010 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 54 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 54 occurred at disk power-on lifetime: 9572 hours (398 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
01 51 28 5f 0f 0c e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
Error 53 occurred at disk power-on lifetime: 9572 hours (398 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
01 51 30 00 45 3c e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
Error 52 occurred at disk power-on lifetime: 9572 hours (398 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
01 51 80 00 c1 00 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
Error 51 occurred at disk power-on lifetime: 9572 hours (398 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
01 51 80 80 86 00 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
Error 50 occurred at disk power-on lifetime: 9572 hours (398 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
01 51 80 00 8b 00 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Device does not support Selective Self Tests/Logging
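One way to read the attribute table above is to compare each normalized WORST value against its THRESH. A minimal sketch with the rows copied from the listing; the helper names are made up for illustration, and treating "WORST <= THRESH with a non-zero threshold" as a failure is the usual reading of these columns, not something taken from this thread:

```python
# Rows copied from the smartctl -a attribute table above.
ATTRIBUTES = """\
1 Raw_Read_Error_Rate 0x0008 111 099 000 Old_age Offline - 33216423
3 Spin_Up_Time 0x0006 097 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0013 100 100 020 Pre-fail Always - 90
5 Reallocated_Sector_Ct 0x0013 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x0009 075 060 030 Pre-fail Offline - 4335450923
10 Spin_Retry_Count 0x0013 100 100 090 Pre-fail Always - 0
12 Power_Cycle_Count 0x0013 099 099 020 Pre-fail Always - 1368
197 Current_Pending_Sector 0x0010 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
"""

def parse_attributes(text):
    """Split each 10-column row into a small dict."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue
        rows.append({
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
        })
    return rows

def below_threshold(rows):
    """Names of attributes whose WORST has reached a non-zero THRESH."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["worst"] <= r["thresh"]]

print(below_threshold(parse_attributes(ATTRIBUTES)))  # []
```

No attribute has reached its threshold, which fits the PASSED health result, while the ATA error count in the error log keeps climbing anyway.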
Otherwise, I've already run several disks and even motherboards in that router - unfortunately I don't remember all the models - the bigger problems only started with this one
Kernel 2.4.29, MB string 05/27/98-Ali-154x-2a5kib09c-00
Re: [HW] Slack - hda lost interrupt
Well, there's obviously a problem with data transfer over the cable - the question is whether the disk is bad or not...
btw. use [code], I've edited your first post into a readable form
Re: [HW] Slack - hda lost interrupt
yeah - I've gotten used to the phpBB buttons... so, edited
I've already replaced the cable.. the disk hopefully isn't bad... I tried googling (pfff http://www.sysopt.com/forum/archive/...p/t-57098.html -
Quote:
Point being this: The ALi "1543" south bridge contains an ALi "5229" IDE controller unit, rev. 20h or 32 decimal, that was designed to be UDMA33 capable, but was MUCH later found to cause all kinds of problems at that speed. I have one of those too, on an earlier board using the Aladdin IV north.
), whether the ALi IV IDE has a problem with DMA... it rather seems that the whole board is one big problem... but until I get hold of, say, some i430TX motherboard, I'd need to tweak/fix this somehow
According to the BIOS string it should be a Biostar M5ATA
Anyway, I think these boards aren't built for 24/7 operation at all... especially when you want somewhat more intensive I/O from them... :(
Re: [HW] Slack - hda lost interrupt
Yes, the M1543 has all sorts of problems, but the Linux driver should have all of that handled.
Really, first check whether or not the disk is causing it before you replace the board...
Re: [HW] Slack - hda lost interrupt
How about this one?
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 117 117 021 Pre-fail Always - 4675
4 Start_Stop_Count 0x0032 100 100 040 Old_age Always - 338
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 1
7 Seek_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 20233
10 Spin_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 177
194 Temperature_Celsius 0x0022 111 253 000 Old_age Always - 39
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 200 200 000 Old_age Always - 1
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0009 200 155 051 Pre-fail Offline - 0
SMART Error Log Version: 1
ATA Error Count: 10 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 10 occurred at disk power-on lifetime: 571 hours (23 days + 19 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 9f 94 30 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 25 00 00 08 00 00 46d+00:46:50.108 NOP [Abort queued commands]
16 00 30 00 00 98 94 00 46d+00:46:50.108 RECALIBRATE [RET-4]
00 00 25 00 00 08 00 00 46d+00:46:50.108 NOP [Abort queued commands]
16 00 30 00 00 88 94 00 46d+00:46:50.108 RECALIBRATE [RET-4]
00 00 35 00 00 48 00 00 46d+00:46:50.108 NOP [Abort queued commands]
Error 9 occurred at disk power-on lifetime: 571 hours (23 days + 19 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 9f 94 30 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 25 00 00 08 00 00 46d+00:46:48.358 NOP [Abort queued commands]
16 00 30 00 00 88 95 00 46d+00:46:48.358 RECALIBRATE [RET-4]
00 00 25 00 00 08 00 00 46d+00:46:48.358 NOP [Abort queued commands]
00 00 35 00 00 08 00 00 46d+00:46:48.358 NOP [Abort queued commands]
16 00 a7 00 00 20 53 00 46d+00:46:48.358 RECALIBRATE [RET-4]
Error 8 occurred at disk power-on lifetime: 571 hours (23 days + 19 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 f0 9f 94 30 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 25 00 00 f0 00 00 46d+00:46:46.558 NOP [Abort queued commands]
00 00 25 00 00 00 01 00 46d+00:46:46.558 NOP [Abort queued commands]
16 00 30 00 00 88 92 00 46d+00:46:46.558 RECALIBRATE [RET-4]
00 00 25 00 00 00 01 00 46d+00:46:46.558 NOP [Abort queued commands]
16 00 30 00 00 88 8f 00 46d+00:46:46.558 RECALIBRATE [RET-4]
Error 7 occurred at disk power-on lifetime: 571 hours (23 days + 19 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 f8 9f 94 30 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 25 00 00 f8 00 00 46d+00:46:44.808 NOP [Abort queued commands]
00 00 25 00 00 00 01 00 46d+00:46:44.808 NOP [Abort queued commands]
16 00 30 00 00 88 92 00 46d+00:46:44.808 RECALIBRATE [RET-4]
00 00 25 00 00 00 01 00 46d+00:46:44.808 NOP [Abort queued commands]
16 00 30 00 00 88 8f 00 46d+00:46:44.808 RECALIBRATE [RET-4]
Error 6 occurred at disk power-on lifetime: 571 hours (23 days + 19 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 9f 94 30 e0 Error:
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 25 00 00 00 01 00 46d+00:46:43.058 NOP [Abort queued commands]
16 00 30 00 00 88 92 00 46d+00:46:43.058 RECALIBRATE [RET-4]
00 00 25 00 00 00 01 00 46d+00:46:43.058 NOP [Abort queued commands]
16 00 30 00 00 88 8f 00 46d+00:46:43.058 RECALIBRATE [RET-4]
00 00 25 00 00 00 01 00 46d+00:46:43.058 NOP [Abort queued commands]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 572 372282527
# 2 Short offline Completed: read failure 90% 572 372282527
# 3 Conveyance offline Completed without error 00% 711 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
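The error-log registers in this second listing can be tied to the self-test result. A small sketch that rebuilds the 28-bit LBA from SN/CL/CH and the low nibble of DH (the standard ATA layout) and compares it with LBA_of_first_error. The function name is made up for illustration:

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA task-file registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Registers from "40 51 08 9f 94 30 e0" (ER ST SC SN CL CH DH) above.
low_bits = lba28(sn=0x9F, cl=0x94, ch=0x30, dh=0xE0)
selftest_lba = 372282527  # LBA_of_first_error from the self-test log

print(hex(low_bits))                          # 0x30949f
print(hex(selftest_lba))                      # 0x1630949f
print(selftest_lba & 0xFFFFFF == low_bits)    # True
```

The low 24 bits agree, so all five logged errors and both failed short self-tests point at the same sector. The self-test LBA is larger than 28 bits can hold, so the upper bits are presumably just not shown in this register view (my assumption: 48-bit addressing).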