Wed Apr 29 05:03:41 CEST 2015
Code Hygiene
Some convenient Makefile targets that make it very easy to keep code clean:
    scan:
    	scan-build clang foo.c -o foo

    indent:
    	indent -linux *.c

scan-build is llvm/clang's static analyzer and generates some decent warnings. Using clang to build (in addition to 'default' gcc in my case) helps diversity and sometimes catches different errors.
indent makes code pretty. The 'linux' default settings are not exactly what I want, but close enough that I don't care to fine-tune yet.
Every commit should be properly indented and not cause more warnings to appear!
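The "properly indented" rule can be checked mechanically before a commit. The following is only a sketch: the function name needs_indent and the FORMATTER variable are made up for illustration, and it assumes GNU indent, whose -st option writes the result to standard output instead of rewriting the file.

```shell
#!/bin/sh
# Sketch: list the files that the formatter would change.
# FORMATTER is a hypothetical knob; any command that takes a file name and
# prints the formatted result to stdout will do. The default assumes GNU indent.
FORMATTER=${FORMATTER:-"indent -linux -st"}

needs_indent() {
    rc=0
    for f in "$@"; do
        # $FORMATTER is left unquoted on purpose so its options split into words.
        # cmp -s compares the formatted output (stdin) with the file on disk.
        if ! $FORMATTER "$f" | cmp -s - "$f"; then
            echo "$f"
            rc=1
        fi
    done
    return $rc
}
```

Something like `needs_indent *.c` prints every file that is not yet clean and returns non-zero if there is at least one, so it could be hooked into a Makefile target or a git pre-commit hook.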
Sat Apr 11 13:06:54 CEST 2015
Almost quiet dataloss
Some harddisk manufacturers have interesting ideas ... using some old Samsung disks in a RAID5 config:
    [15343.451517] ata3.00: exception Emask 0x0 SAct 0x40008410 SErr 0x0 action 0x6 frozen
    [15343.451522] ata3.00: failed command: WRITE FPDMA QUEUED
    [15343.451527] ata3.00: cmd 61/20:20:d8:7d:6c/01:00:07:00:00/40 tag 4 ncq 147456 out res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
    [15343.451530] ata3.00: status: { DRDY }
    [15343.451532] ata3.00: failed command: WRITE FPDMA QUEUED
    [15343.451536] ata3.00: cmd 61/30:50:d0:2f:40/00:00:0d:00:00/40 tag 10 ncq 24576 out res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [15343.451538] ata3.00: status: { DRDY }
    [15343.451540] ata3.00: failed command: WRITE FPDMA QUEUED
    [15343.451544] ata3.00: cmd 61/a8:78:90:be:da/00:00:0b:00:00/40 tag 15 ncq 86016 out res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [15343.451546] ata3.00: status: { DRDY }
    [15343.451549] ata3.00: failed command: READ FPDMA QUEUED
    [15343.451552] ata3.00: cmd 60/38:f0:c0:2b:d6/00:00:0e:00:00/40 tag 30 ncq 28672 in res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
    [15343.451555] ata3.00: status: { DRDY }
    [15343.451557] ata3: hard resetting link
    [15343.911891] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    [15344.062112] ata3.00: configured for UDMA/133
    [15344.062130] ata3.00: device reported invalid CHS sector 0
    [15344.062139] ata3.00: device reported invalid CHS sector 0
    [15344.062146] ata3.00: device reported invalid CHS sector 0
    [15344.062153] ata3.00: device reported invalid CHS sector 0
    [15344.062169] ata3: EH complete

Hmm, that doesn't look too good ... but mdadm still believes the RAID is functional.
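One way to get a second opinion on whether the array is still consistent is an md scrub. The sketch below relies on the md sysfs files sync_action and mismatch_cnt. The function names are made up for illustration, md4 is simply the array from this post, and the optional second argument (a sysfs root other than /sys) is an assumption added so the functions can be tried without a real array.

```shell
#!/bin/sh
# Sketch: start an md consistency check and read back the result.
# $1 = md device name (e.g. md4), $2 = optional sysfs root (default /sys).

md_start_check() {
    # Writing "check" asks md to read all members and compare them.
    # Needs root on a real system.
    echo check > "${2:-/sys}/block/$1/md/sync_action"
}

md_mismatches() {
    # mismatch_cnt holds the count found by the most recent check.
    cat "${2:-/sys}/block/$1/md/mismatch_cnt"
}

md_report() {
    root=${2:-/sys}
    state=$(cat "$root/block/$1/md/sync_action") || return 2
    if [ "$state" != "idle" ]; then
        echo "$1: $state still running, try again later"
        return 2
    fi
    n=$(md_mismatches "$1" "$root") || return 2
    if [ "$n" -gt 0 ]; then
        echo "$1: mismatch_cnt is $n, array is not consistent"
        return 1
    fi
    echo "$1: no mismatches reported"
}
```

Typical use would be `md_start_check md4`, waiting for the check to finish, then `md_report md4`. A non-zero mismatch count only says that the members disagree, not which one is wrong.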
And a while later things like this happen:
    [ 2968.701999] XFS (md4): Metadata corruption detected at xfs_dir3_data_read_verify+0x72/0x77 [xfs], block 0x36900a0
    [ 2968.702004] XFS (md4): Unmount and run xfs_repair
    [ 2968.702007] XFS (md4): First 64 bytes of corrupted metadata buffer:
    [ 2968.702011] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702015] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00 ................
    [ 2968.702018] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00 ................
    [ 2968.702021] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702048] XFS (md4): metadata I/O error: block 0x36900a0 ("xfs_trans_read_buf_map") error 117 numblks 8
    [ 2968.702476] XFS (md4): Metadata corruption detected at xfs_dir3_data_reada_verify+0x69/0x6d [xfs], block 0x36900a0
    [ 2968.702491] XFS (md4): Unmount and run xfs_repair
    [ 2968.702494] XFS (md4): First 64 bytes of corrupted metadata buffer:
    [ 2968.702498] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702501] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00 ................
    [ 2968.702505] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00 ................
    [ 2968.702508] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702825] XFS (md4): Metadata corruption detected at xfs_dir3_data_read_verify+0x72/0x77 [xfs], block 0x36900a0
    [ 2968.702831] XFS (md4): Unmount and run xfs_repair
    [ 2968.702834] XFS (md4): First 64 bytes of corrupted metadata buffer:
    [ 2968.702839] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702842] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00 ................
    [ 2968.702866] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00 ................
    [ 2968.702871] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff ................
    [ 2968.702888] XFS (md4): metadata I/O error: block 0x36900a0 ("xfs_trans_read_buf_map") error 117 numblks 8

fsck finds quite a lot of data not being where it should be.
I'm not sure who to blame here. The kernel should actively punch out any harddisk that is flopping around like a fish on land, and the md layer should hate on any device that even looks weird, but somehow "just doing a link reset" is considered enough.
I'm not really upset that an old cheap disk that is now ~9 years old decides to have dementia, but I'm quite unhappy with the firmware programming that doesn't seem to consider data loss as a problem ... (but at least it's not Seagate!)
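Until the kernel or md gets stricter, the kernel log can at least be watched for ports that keep getting reset. This is a rough sketch, not something the kernel or mdadm provides: the function names and the default threshold of 3 are arbitrary choices, and it only matches the "hard resetting link" message seen in the log above.

```shell
#!/bin/sh
# Sketch: count "hard resetting link" messages per ATA port.
# Reads kernel log text on stdin, prints "port count" lines sorted by port.
count_link_resets() {
    awk '/hard resetting link/ {
             # Find the "ataN:" field regardless of how the timestamp was split.
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9.]+:$/) {
                     p = $i
                     sub(/:$/, "", p)
                     n[p]++
                 }
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Print only the ports with at least $1 resets (hypothetical default: 3).
flaky_ports() {
    count_link_resets | awk -v max="${1:-3}" '$2 >= max { print $1 }'
}
```

For example, `dmesg | flaky_ports 3` lists the ports that were reset three or more times since boot. Mapping a port such as ata3 back to a block device and kicking it out of the array (for instance with mdadm's --fail) is deliberately left as a manual step here.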