Fri Feb 28 08:37:26 CET 2014

INSTALL_MASK'ing for a better future

So today I was pointed at a funny one:
/etc/systemd/system/ntpdate.service.d/00gentoo.conf
Now instead of being wrongly installed in /usr/lib (whuarghllaaaaaaaawwreghhh!?!$?) there are some config files for systemd bleeding into /etc.

Apart from being inconsistent with itself, this evades all previous ways of keeping useless files from being installed. The proper response thus looks like this now:
INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd"
And on the upside this will break udev unless you carefully move config to /etc (lolwat ur no haz EUNICHS system operation?) - which just motivated me to shift everything I can to eudev.
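
To sanity-check a mask like that against a given path, here's a minimal sketch. It does a plain prefix comparison, which is an assumption on my part and not how Portage itself evaluates INSTALL_MASK, and is_masked is a made-up helper name:

```shell
# Same mask list as above; it normally lives in /etc/portage/make.conf.
INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd"

# is_masked PATH: succeed if PATH is one of the masked entries or lies below one.
# Hypothetical helper; simple prefix matching, not Portage's own matching code.
is_masked() {
    for prefix in $INSTALL_MASK; do
        case "$1" in
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done
    return 1
}
```

For example, "is_masked /etc/systemd/system/ntpdate.service.d/00gentoo.conf && echo masked" should report the file from this post as covered.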

Reading recommendation: FHS

Posted by Patrick | Permalink

Thu Feb 20 09:32:05 CET 2014

gentoo-x86 to git, round two

After my not-so-good experiments with cvs2git I was pointed at cvsps. The currently masked 3.13 release (plus the latest ~arch version of cvs) seems to do the trick quite well. It throws a handful of warnings about timestamps that appear to be harmless to me.
What I haven't figured out yet is how to "fix" the email addresses, but that's a minor thing.
Take the raw cvs repo as in the first blogpost, then:
$ time cvsps --root :local:/var/tmp/git-test/gentoo-x86-raw/ --fast-export gentoo-x86 > git-fast-export-stream
cvsps: NOTICE: used alternate strip path /var/tmp/git-test/gentoo-x86-raw/gentoo-x86/
cvsps: broken revision date: 2003-02-18 13:46:55 +0000 -> 2003-02-18 13:46:55 file: dev-php/PEAR-Date/PEAR-HTML_Common-1.0.ebuild, repairing.

[SNIP]

real    212m56.219s
user    12m11.170s
sys     6m59.110s
So this step takes about 3.5h walltime, and consumes ~10GB RAM. It generates about 17GB of temporary data.
To get performance up you'd need a machine with 32GB+ RAM so that you can do that in tmpfs (and don't forget to make /tmp a tmpfs too, because tmpfile() creates lots and lots of temporary files there) - and the tmpfs needs to be >18GB.
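
Before kicking off a multi-hour run it may be worth checking that the target actually has that much room. A rough sketch, assuming POSIX df output; has_space is a hypothetical helper, and the mount line in the comment uses an assumed size:

```shell
# Setting up a tmpfs needs root; 20g is an assumed size just above the
# 18GB mentioned in the text:
#   mount -t tmpfs -o size=20g tmpfs /tmp

# has_space DIR GIB: succeed if the filesystem holding DIR has at least
# GIB GiB available. Hypothetical helper built on "df -Pk" (1024-byte blocks).
has_space() {
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}
```

Something like "has_space /tmp 18 || echo 'not enough room in /tmp'" would then catch the obvious case before cvsps starts writing.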

In theory you can pipe that directly into git-fast-import. To make testing easier I didn't do that.
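
Since the stream is plain text, one possible way to approach the email addresses would be a small filter between cvsps and git fast-import. This is only a sketch under assumptions: that identity lines look like "committer NAME <ADDR> WHEN", that bare usernames should get a fixed domain appended (gentoo.org here is my guess), and fix_emails is a made-up name. It also blindly rewrites any matching line, so a commit message or file that happens to contain such a line would be altered and its "data" byte count would no longer fit.

```shell
# Assumed domain to append to bare CVS usernames; adjust as needed.
EMAIL_DOMAIN="gentoo.org"

# fix_emails: read a fast-export stream on stdin, write it to stdout with
# "@$EMAIL_DOMAIN" appended to author/committer addresses that contain no "@".
# Hypothetical helper; LC_ALL=C keeps sed from tripping over non-UTF-8 bytes.
fix_emails() {
    LC_ALL=C sed -E "s/^(author|committer) (.*) <([^@<>]+)> ([0-9]+ [-+][0-9]{4})\$/\1 \2 <\3@${EMAIL_DOMAIN}> \4/"
}
```

Run inside an empty git repository, the pipe would then look like "cvsps --root :local:/var/tmp/git-test/gentoo-x86-raw/ --fast-export gentoo-x86 | fix_emails | git fast-import".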
Throwing everything into git takes "a while" (forgot to time it, about 20 minutes I think):
Alloc'd objects:    9680000
Total objects:      9675121 (    190979 duplicates                  )
      blobs  :      3020032 (    158366 duplicates    1389088 deltas of    2989578 attempts)
      trees  :      5150778 (     32613 duplicates    4633675 deltas of    4709477 attempts)
      commits:      1504311 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           8 (         3 loads     )
      marks:     1073741824 (   4682709 unique    )
      atoms:         431658
Memory total:        516969 KiB
       pools:         63219 KiB
     objects:        453750 KiB

pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =    7139457
pack_report: pack_mmap_calls          =    1976288
pack_report: pack_open_windows        =          3 /          9
pack_report: pack_mapped              = 2545679911 / 8589934592
And then run git gc (warning: another mem-hungry operation, peaking at ~8GB).
The result is a git repository of about 7.2GB that appears to have the full history.

Files to play around with:
Raw copy of the CVS repo (~440MB)
The git-fast-importable stream created by cvsps (biiig)
The mangled compressed git repository that results from it (~6GB)
Edit:
The same repo recompressed (~1.7GB)
"git repack -a -d -f --max-pack-size=10g --depth=100 --window=250" takes ~3 CPU-hours and collapses the size nicely. Thanks, Mr.Klausmann!

Posted by Patrick | Permalink

Wed Feb 19 07:43:21 CET 2014

Thunderbird - double sending is better sending

So here's something brilliant I've found while debugging some PGP-issues:
0q2CYNVFEz6wXHAGYArfO/F/faOL5L6fQw9f93FurZgx7Y+iR1J7Civaa7LHxQ8h
FzstP7BYEhCx2HmEZuDf18htDsTBZAlNVGsI0DMb2wFKudCaI7hXhMHpYBQF/rdZ
=3Dw1hZ
-- --  END PGP MESSAGE     


--  -- --  --  -- 070107010101000406000609
Content Type: text/html; charset=ISO 8859 1
Content Transfer Encoding: 8bit

<html>
  <head>
    <meta content="text/html; charset=ISO 8859 1"
      http equiv="Content Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
         BEGIN PGP MESSAGE      <br>
    Charset: ISO 8859 1 <br>
    Version: GnuPG v2.0.22 (MingW32) <br>
    Comment: Using GnuPG with Thunderbird   <a class="moz txt link freetext" href="http://www.enigmail.net/">http://www.enigmail.net/</a> <br>
     <br>
    hQIMA0dhXCfgRaeBAQ/+P2NCYSVE7vxW742D9eYJmJ/7g7xHSvPFuYvGSZk2gRaJ <br>
    JoZ98x+TPjSlvYVWuS+Y2Fz04ydhi4vNcK+QAqImVO0nO6dFvxUfmZiERBcYGs4C <br>
    Lhe+B/I0P/hEDl+Zu/QJ/v+SEcFoXKv2iclrXwWF6RyLlO97iu8UsLYUjLIZ7Y+r <br>
    YGqphoIdJLfVZ9bb05RIb0ZKnYX5dzunpqu6V6zRpwckWCkos7qBOZ9hfBjaFkvD <br>
    ZQAoJM78qQ0//vV6qyxSpXXFEFbDZuJjPjjDfIF+qyNbcW657bDHQH2ctcyvdcTf <br>
(Modulo some dashes, but you get the idea)

So, uhm, there's a multipart-mime mail, with a PGP-encrypted attachment, and then there's a properly quoted HTML attachment, CONTAINING the same PGP attachment BASE64 encoded. Or something. The funny thing is that Thunderbird itself fails to display the body directly, but displays it in the editor window when you reply.
In vino veritas, and tonight I will need lots of veritas to unremember this madness.

Posted by Patrick | Permalink

Tue Feb 18 06:14:37 CET 2014

Converting gentoo-x86 to git, first attempt

A first attempt at cvs-to-git conversion of gentoo-x86; not yet complete. Needs: ~4GB storage for cvs repo, a few GB for temporary files, and a few GB for the git repo

Where possible using tmpfs is recommended as this whole operation is very IO-heavy.
Acquire a complete (server-side) copy of the CVS repo:
mkdir cvs; cd cvs
mkdir CVSROOT
rsync anoncvs.gentoo.org::vcs-public-cvsroot/gentoo-x86 . -r --stats
WIP: Use ferringb's modifications to cvs2git to transform the repo
git clone git://pkgcore.org/git-conversion-tools
I haven't figured out why it fails for me yet, but that would make the whole thing a lot easier.

Naive cvs2git run on one category to demonstrate that it works in theory:
cvs2git --encoding=utf_8 --fallback-encoding=ascii \
        --trunk-only --blobfile=./blob --dumpfile=./dump \
        --username=derp cvs/gentoo-x86/app-emulation/
This does work, but it's really slow and doesn't do things like rewriting committer names etc.
cvs2svn Statistics:

Total CVS Files:              6569
Total CVS Revisions:         37696
Total CVS Branches:              0
Total CVS Tags:                  0
Total Unique Tags:               0
Total Unique Branches:           0
CVS Repos Size in KB:        37135
Total SVN Commits:           11385
First Revision Date:    Thu Oct 26 15:02:06 2000
Last Revision Date:     Mon Feb 10 06:58:17 2014

Timings (seconds):

   6   pass1    CollectRevsPass
   0   pass2    CleanMetadataPass
   0   pass3    CollateSymbolsPass
1100   pass4    FilterSymbolsPass
   0   pass5    SortRevisionsPass
   0   pass6    SortSymbolsPass
   2   pass7    InitializeChangesetsPass
   2   pass8    BreakRevisionChangesetCyclesPass
   2   pass9    RevisionTopologicalSortPass
   0   pass10   BreakSymbolChangesetCyclesPass
   2   pass11   BreakAllChangesetCyclesPass
   1   pass12   TopologicalSortPass
   3   pass13   CreateRevsPass
   0   pass14   SortSymbolOpeningsClosingsPass
   0   pass15   IndexSymbolsPass
   3   pass16   OutputPass
1121   total
This creates some temporary files which we feed to git fast-import:
$ git init --bare git-test; cd git-test
$ git fast-import --export-marks=../cvs2git-tmp/git-marks.dat <../blob                    
[snip]
$ git fast-import --import-marks=../cvs2git-tmp/git-marks.dat <../dump 
git-fast-import statistics:

Alloc'd objects:      70000
Total objects:        40469 (       253 duplicates                  )
      blobs  :            0 (         0 duplicates          0 deltas of          0 attempts)
      trees  :        29085 (       253 duplicates      26014 deltas of      26667 attempts)
      commits:        11384 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:     1073741824 (     43450 unique    )
      atoms:           5742
Memory total:          5454 KiB
       pools:          2173 KiB
     objects:          3281 KiB

pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =      28072
pack_report: pack_mmap_calls          =          2
pack_report: pack_open_windows        =          2 /          2
pack_report: pack_mapped              =   19836762 /   19836762
And there's our converted category. Runtime of the cvs2git step is ~1120sec (just under 20min), nearly all of it spent in pass4; the git fast-import steps both take ~5 seconds.
There's still a lot left to figure out, but this should be enough information to allow others to attempt to do this reliably.

Posted by Patrick | Permalink