Wed Apr 29 05:03:41 CEST 2015

Code Hygiene

Some convenient Makefile targets that make it very easy to keep code clean:
scan:
        scan-build clang foo.c -o foo

indent:
        indent -linux *.c
scan-build is llvm/clang's static analyzer and generates some decent warnings. Using clang to build (in addition to 'default' gcc in my case) helps diversity and sometimes catches different errors.

indent makes code pretty, the 'linux' default settings are not exactly what I want, but close enough that I don't care to finetune yet.

Every commit should be properly indented and not cause more warnings to appear!

Posted by Patrick | Permalink

Sat Apr 11 13:06:54 CEST 2015

Almost quiet dataloss

Some harddisk manufacturers have interesting ideas ... using some old Samsung disks in a RAID5 config:
[15343.451517] ata3.00: exception Emask 0x0 SAct 0x40008410 SErr 0x0 action 0x6 frozen
[15343.451522] ata3.00: failed command: WRITE FPDMA QUEUED
[15343.451527] ata3.00: cmd 61/20:20:d8:7d:6c/01:00:07:00:00/40 tag 4 ncq 147456 out
                        res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)                                                                                                                                                                                            
[15343.451530] ata3.00: status: { DRDY }
[15343.451532] ata3.00: failed command: WRITE FPDMA QUEUED
[15343.451536] ata3.00: cmd 61/30:50:d0:2f:40/00:00:0d:00:00/40 tag 10 ncq 24576 out
                        res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)                                                                                                                                                                                            
[15343.451538] ata3.00: status: { DRDY }
[15343.451540] ata3.00: failed command: WRITE FPDMA QUEUED
[15343.451544] ata3.00: cmd 61/a8:78:90:be:da/00:00:0b:00:00/40 tag 15 ncq 86016 out
                        res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)                                                                                                                                                                                            
[15343.451546] ata3.00: status: { DRDY }
[15343.451549] ata3.00: failed command: READ FPDMA QUEUED
[15343.451552] ata3.00: cmd 60/38:f0:c0:2b:d6/00:00:0e:00:00/40 tag 30 ncq 28672 in
                        res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)                                                                                                                                                                                            
[15343.451555] ata3.00: status: { DRDY }
[15343.451557] ata3: hard resetting link
[15343.911891] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[15344.062112] ata3.00: configured for UDMA/133
[15344.062130] ata3.00: device reported invalid CHS sector 0
[15344.062139] ata3.00: device reported invalid CHS sector 0
[15344.062146] ata3.00: device reported invalid CHS sector 0
[15344.062153] ata3.00: device reported invalid CHS sector 0
[15344.062169] ata3: EH complete
Hmm, that doesn't look too good ... but mdadm still believes the RAID is functional.

And a while later things like this happen:
[ 2968.701999] XFS (md4): Metadata corruption detected at xfs_dir3_data_read_verify+0x72/0x77 [xfs], block 0x36900a0
[ 2968.702004] XFS (md4): Unmount and run xfs_repair
[ 2968.702007] XFS (md4): First 64 bytes of corrupted metadata buffer:
[ 2968.702011] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702015] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00  ................
[ 2968.702018] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00  ................
[ 2968.702021] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702048] XFS (md4): metadata I/O error: block 0x36900a0 ("xfs_trans_read_buf_map") error 117 numblks 8
[ 2968.702476] XFS (md4): Metadata corruption detected at xfs_dir3_data_reada_verify+0x69/0x6d [xfs], block 0x36900a0
[ 2968.702491] XFS (md4): Unmount and run xfs_repair
[ 2968.702494] XFS (md4): First 64 bytes of corrupted metadata buffer:
[ 2968.702498] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702501] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00  ................
[ 2968.702505] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00  ................
[ 2968.702508] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702825] XFS (md4): Metadata corruption detected at xfs_dir3_data_read_verify+0x72/0x77 [xfs], block 0x36900a0
[ 2968.702831] XFS (md4): Unmount and run xfs_repair
[ 2968.702834] XFS (md4): First 64 bytes of corrupted metadata buffer:
[ 2968.702839] ffff8802ab5cf000: 04 00 00 00 99 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702842] ffff8802ab5cf010: 03 00 00 00 00 00 00 00 02 00 00 00 9e 00 00 00  ................
[ 2968.702866] ffff8802ab5cf020: 0c 00 00 00 00 00 00 00 13 00 00 00 00 00 00 00  ................
[ 2968.702871] ffff8802ab5cf030: 04 00 00 00 82 00 00 00 fc ff ff ff ff ff ff ff  ................
[ 2968.702888] XFS (md4): metadata I/O error: block 0x36900a0 ("xfs_trans_read_buf_map") error 117 numblks 8
fsck finds quite a lot of data not being where it should be.
I'm not sure who to blame here - the kernel should actively punch out any harddisk that is fish-on-land flopping around like that, the md layer should hate on any device that even looks weirdly, but somehow "just doing a link reset" is considered enough.

I'm not really upset that an old cheap disk that is now ~9 years old decides to have dementia, but I'm quite unhappy with the firmware programming that doesn't seem to consider data loss as a problem ... (but at least it's not Seagate!)

Posted by Patrick | Permalink