Fri Feb 28 08:37:26 CET 2014
INSTALL_MASK'ing for a better future
So today I was pointed at a funny one:
    /etc/systemd/system/ntpdate.service.d/00gentoo.conf
Now instead of being wrongly installed in /usr/lib (whuarghllaaaaaaaawwreghhh!?!$?) there's some config files for systemd bleeding into /etc.
Apart from being inconsistent with itself this eludes all previous ways to avoid useless files from being installed. The proper response thus looks like this now:
    INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd"
And on the upside this will break udev unless you carefully move config to /etc (lolwat ur no haz EUNICHS system operation?) - which just motivated me to shift everything I can to eudev.
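To see what a mask like this would catch, a small shell helper can compare file paths against the mask entries. This is only a sketch: it treats every entry as a plain directory prefix, which is a simplification of Portage's own matching, and the name `is_masked` as well as the second example path are made up for illustration.

```shell
#!/bin/sh
# Sketch: check a path against INSTALL_MASK-style entries.
# Assumption: each entry is treated as a plain directory prefix.
INSTALL_MASK="/lib/systemd /lib32/systemd /lib64/systemd /usr/lib/systemd /usr/lib32/systemd /usr/lib64/systemd /etc/systemd"

# is_masked PATH: succeeds if PATH equals a mask entry or lies below one.
is_masked() {
    for entry in $INSTALL_MASK; do
        case "$1" in
            "$entry"|"$entry"/*) return 0 ;;
        esac
    done
    return 1
}

# One path from this post, one example path outside the mask.
for f in /etc/systemd/system/ntpdate.service.d/00gentoo.conf /etc/conf.d/ntpdate; do
    if is_masked "$f"; then
        echo "masked: $f"
    else
        echo "kept:   $f"
    fi
done
```

Matching on "entry or entry/..." rather than on a bare string prefix keeps a path like /etc/systemd-foo from being caught by accident.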
Reading recommendation: FHS
Thu Feb 20 09:32:05 CET 2014
gentoo-x86 to git, round two
After my not-so-good experiments with cvs2git I was pointed at cvsps.
The currently masked 3.13 release (plus the latest ~arch version of cvs) seems to do the trick quite well.
It throws a handful of warnings about timestamps that appear to be harmless to me.
What I haven't figured out yet is how to "fix" the email addresses, but that's a minor thing.
Take the raw cvs repo as in the first blogpost, then:
    $ time cvsps --root :local:/var/tmp/git-test/gentoo-x86-raw/ --fast-export gentoo-x86 > git-fast-export-stream
    cvsps: NOTICE: used alternate strip path /var/tmp/git-test/gentoo-x86-raw/gentoo-x86/
    cvsps: broken revision date: 2003-02-18 13:46:55 +0000 -> 2003-02-18 13:46:55 file: dev-php/PEAR-Date/PEAR-HTML_Common-1.0.ebuild, repairing.
    [SNIP]
    real 212m56.219s
    user 12m11.170s
    sys 6m59.110s
So this step takes about 3.5h walltime (212 minutes), and consumes ~10GB RAM. It generates about 17GB of temporary data.
To get performance up you'd need a machine with 32GB+ RAM so that you can do that in TMPFS (and don't forget to make /tmp a tmpfs too, because tmpfile() creates lots and lots of temporary files there) - and the tmpfs needs to be >18GB
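Before kicking off a run that takes hours, it is cheap to check that the target filesystem really has the room. The following is a rough sketch, not part of the original procedure: `has_free_kib` is a made-up helper, and the mount line in the comment uses an assumed size and mount point.

```shell
#!/bin/sh
# Sketch: check free space on the filesystem holding a directory.
# has_free_kib DIR MIN_KIB: succeeds if at least MIN_KIB KiB are available.
has_free_kib() {
    # df -P prints one line per filesystem; column 4 is the available space.
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Mounting needs root; size and mount point here are assumptions:
#   mount -t tmpfs -o size=20g tmpfs /var/tmp/git-test

need=$((18 * 1024 * 1024))   # 18GB expressed in KiB
if has_free_kib /tmp "$need"; then
    echo "/tmp has room for the temporary data"
else
    echo "/tmp is too small, expect trouble"
fi
```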
In theory you can pipe that directly into git-fast-import. To make testing easier I didn't do that.
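For reference, the piped variant could look roughly like the sketch below. It was not run against the real repository; `import_stream` is a hypothetical helper, and the target path in the usage comment is an assumption.

```shell
#!/bin/sh
# Sketch: feed a fast-export stream on stdin into a fresh bare repository.
# import_stream REPO: hypothetical helper, creates REPO and imports stdin.
import_stream() {
    git init --quiet --bare "$1" &&
    git --git-dir="$1" fast-import --quiet
}

# Intended use with the cvsps invocation from above (target path assumed):
#   cvsps --root :local:/var/tmp/git-test/gentoo-x86-raw/ --fast-export gentoo-x86 \
#       | import_stream /var/tmp/git-test/gentoo-x86.git
```

Skipping the intermediate file saves disk space, but a failed import then means repeating the multi-hour cvsps step, which is a good reason to keep the stream around while testing.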
Throwing everything into git takes "a while" (forgot to time it, about 20 minutes I think):
    Alloc'd objects:    9680000
    Total objects:      9675121 (    190979 duplicates                  )
          blobs  :      3020032 (    158366 duplicates    1389088 deltas of    2989578 attempts)
          trees  :      5150778 (     32613 duplicates    4633675 deltas of    4709477 attempts)
          commits:      1504311 (         0 duplicates          0 deltas of          0 attempts)
          tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
    Total branches:           8 (         3 loads     )
          marks:     1073741824 (   4682709 unique    )
          atoms:         431658
    Memory total:        516969 KiB
           pools:         63219 KiB
         objects:        453750 KiB
    pack_report: getpagesize()            =       4096
    pack_report: core.packedGitWindowSize = 1073741824
    pack_report: core.packedGitLimit      = 8589934592
    pack_report: pack_used_ctr            =    7139457
    pack_report: pack_mmap_calls          =    1976288
    pack_report: pack_open_windows        =          3 /          9
    pack_report: pack_mapped              = 2545679911 / 8589934592
And then run git gc (warning: another mem-hungry operation, peaking at ~8GB).
The result is a git repository of about 7.2GB that appears to have full history.
Files to play around with:
Raw copy of the CVS repo (~440MB)
The git-fast-importable stream created by cvsps (biiig)
The mangled compressed git repository that results from it (~6GB)
Edit:
The same repo recompressed (~1.7GB)
"git repack -a -d -f --max-pack-size=10g --depth=100 --window=250" takes ~3 CPU-hours and collapses the size nicely. Thanks, Mr.Klausmann!
Wed Feb 19 07:43:21 CET 2014
Thunderbird - double sending is better sending
So here's something brilliant I've found while debugging some PGP-issues:
    0q2CYNVFEz6wXHAGYArfO/F/faOL5L6fQw9f93FurZgx7Y+iR1J7Civaa7LHxQ8h
    FzstP7BYEhCx2HmEZuDf18htDsTBZAlNVGsI0DMb2wFKudCaI7hXhMHpYBQF/rdZ
    =3Dw1hZ
    -- -- END PGP MESSAGE -- --
    -- -- -- 070107010101000406000609
    Content Type: text/html; charset=ISO 8859 1
    Content Transfer Encoding: 8bit

    <html>
    <head>
    <meta content="text/html; charset=ISO 8859 1" http equiv="Content Type">
    </head>
    <body bgcolor="#FFFFFF" text="#000000">
    <br>
    BEGIN PGP MESSAGE <br>
    Charset: ISO 8859 1 <br>
    Version: GnuPG v2.0.22 (MingW32) <br>
    Comment: Using GnuPG with Thunderbird <a class="moz txt link freetext" href="http://www.enigmail.net/">http://www.enigmail.net/</a> <br>
    <br>
    hQIMA0dhXCfgRaeBAQ/+P2NCYSVE7vxW742D9eYJmJ/7g7xHSvPFuYvGSZk2gRaJ <br>
    JoZ98x+TPjSlvYVWuS+Y2Fz04ydhi4vNcK+QAqImVO0nO6dFvxUfmZiERBcYGs4C <br>
    Lhe+B/I0P/hEDl+Zu/QJ/v+SEcFoXKv2iclrXwWF6RyLlO97iu8UsLYUjLIZ7Y+r <br>
    YGqphoIdJLfVZ9bb05RIb0ZKnYX5dzunpqu6V6zRpwckWCkos7qBOZ9hfBjaFkvD <br>
    ZQAoJM78qQ0//vV6qyxSpXXFEFbDZuJjPjjDfIF+qyNbcW657bDHQH2ctcyvdcTf <br>
(Modulo some dashes, but you get the idea)
So, uhm, there's a multipart-mime mail, with a PGP-encrypted attachment, and then there's a properly quoted HTML attachment, CONTAINING the same PGP attachment BASE64 encoded. Or something. The funny thing is that Thunderbird itself fails to display the body directly, but displays it in the editor window when you reply.
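A quick way to spot mails built like this is to count how often the PGP armor header shows up in the raw message. The sketch below uses a made-up two-part skeleton rather than the real mail, and `count_pgp_blocks` is a name invented here. It only catches literal copies of the header, so a copy that really is BASE64 encoded would slip through.

```shell
#!/bin/sh
# count_pgp_blocks FILE: print how many lines of FILE contain the armor header.
count_pgp_blocks() {
    grep -c 'BEGIN PGP MESSAGE' "$1"
}

# Demo on a made-up skeleton: the armored payload once as plain text,
# once more wrapped in HTML line breaks.
mail=$(mktemp)
cat > "$mail" <<'EOF'
-----BEGIN PGP MESSAGE-----
hQIMA0dh
-----END PGP MESSAGE-----
<br>-----BEGIN PGP MESSAGE-----<br>
<br>hQIMA0dh<br>
EOF

n=$(count_pgp_blocks "$mail")
if [ "$n" -gt 1 ]; then
    echo "armor header appears $n times"
fi
rm -f "$mail"
```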
In vino veritas, and tonight I will need lots of veritas to unremember this madness.
Tue Feb 18 06:14:37 CET 2014
Converting gentoo-x86 to git, first attempt
A first attempt at cvs-to-git conversion of gentoo-x86; not yet complete.
Needs: ~4GB storage for cvs repo, a few GB for temporary files, and a few GB for the git repo
Where possible using tmpfs is recommended as this whole operation is very IO-heavy.
Acquire a complete (server-side) copy of the CVS repo:
    mkdir cvs; cd cvs
    mkdir CVSROOT
    rsync anoncvs.gentoo.org::vcs-public-cvsroot/gentoo-x86 . -r --stats
WIP: Use ferringb's modifications to cvs2git to transform the repo
    git clone git://pkgcore.org/git-conversion-tools
I haven't figured out why it fails for me yet, but that would make the whole thing a lot easier.
Naive cvs2git run on one category to demonstrate that it works in theory:
    cvs2git --encoding=utf_8 --fallback-encoding=ascii --trunk-only --blobfile=./blob --dumpfile=./dump --username=derp cvs/gentoo-x86/app-emulation/
This does work, but it's really slow and doesn't do things like rewrite committer names etc.etc.
    cvs2svn Statistics:
    Total CVS Files: 6569
    Total CVS Revisions: 37696
    Total CVS Branches: 0
    Total CVS Tags: 0
    Total Unique Tags: 0
    Total Unique Branches: 0
    CVS Repos Size in KB: 37135
    Total SVN Commits: 11385
    First Revision Date: Thu Oct 26 15:02:06 2000
    Last Revision Date: Mon Feb 10 06:58:17 2014
    Timings (seconds):
       6 pass1 CollectRevsPass
       0 pass2 CleanMetadataPass
       0 pass3 CollateSymbolsPass
    1100 pass4 FilterSymbolsPass
       0 pass5 SortRevisionsPass
       0 pass6 SortSymbolsPass
       2 pass7 InitializeChangesetsPass
       2 pass8 BreakRevisionChangesetCyclesPass
       2 pass9 RevisionTopologicalSortPass
       0 pass10 BreakSymbolChangesetCyclesPass
       2 pass11 BreakAllChangesetCyclesPass
       1 pass12 TopologicalSortPass
       3 pass13 CreateRevsPass
       0 pass14 SortSymbolOpeningsClosingsPass
       0 pass15 IndexSymbolsPass
       3 pass16 OutputPass
    1121 total
This creates some temporary files which we feed to git fast-import:
    $ git init --bare git-test; cd git-test
    $ git fast-import --export-marks=../cvs2git-tmp/git-marks.dat <../blob
    [snip]
    $ git fast-import --import-marks=../cvs2git-tmp/git-marks.dat <../dump
    git-fast-import statistics:
    Alloc'd objects:      70000
    Total objects:        40469 (       253 duplicates                  )
          blobs  :            0 (         0 duplicates          0 deltas of          0 attempts)
          trees  :        29085 (       253 duplicates      26014 deltas of      26667 attempts)
          commits:        11384 (         0 duplicates          0 deltas of          0 attempts)
          tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
    Total branches:           1 (         1 loads     )
          marks:     1073741824 (     43450 unique    )
          atoms:           5742
    Memory total:          5454 KiB
           pools:          2173 KiB
         objects:          3281 KiB
    pack_report: getpagesize()            =       4096
    pack_report: core.packedGitWindowSize = 1073741824
    pack_report: core.packedGitLimit      = 8589934592
    pack_report: pack_used_ctr            =      28072
    pack_report: pack_mmap_calls          =          2
    pack_report: pack_open_windows        =          2 /          2
    pack_report: pack_mapped              =   19836762 /   19836762
And there's our converted category. Runtime of the cvs2git step is ~1200sec = 20min, the git fast-import steps both take ~5 seconds.
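The timing table from the cvs2git run already shows where those 20 minutes go. As a small sketch, the following awk filter picks the slowest pass out of such a table; `slowest_pass` is a made-up name, and the input format is assumed to be exactly the "seconds, pass number, pass name" lines quoted above.

```shell
#!/bin/sh
# slowest_pass: read timing lines ("SECONDS passN Name") on stdin and print
# the slowest pass with its share of the summed pass times.
slowest_pass() {
    awk '$2 ~ /^pass[0-9]+$/ {
             total += $1
             if ($1 + 0 > max) { max = $1 + 0; name = $3 }
         }
         END { if (total > 0) printf "%s %d%%\n", name, 100 * max / total }'
}

# The timings from the run above.
slowest_pass <<'EOF'
6 pass1 CollectRevsPass
0 pass2 CleanMetadataPass
0 pass3 CollateSymbolsPass
1100 pass4 FilterSymbolsPass
0 pass5 SortRevisionsPass
0 pass6 SortSymbolsPass
2 pass7 InitializeChangesetsPass
2 pass8 BreakRevisionChangesetCyclesPass
2 pass9 RevisionTopologicalSortPass
0 pass10 BreakSymbolChangesetCyclesPass
2 pass11 BreakAllChangesetCyclesPass
1 pass12 TopologicalSortPass
3 pass13 CreateRevsPass
0 pass14 SortSymbolOpeningsClosingsPass
0 pass15 IndexSymbolsPass
3 pass16 OutputPass
1121 total
EOF
```

For this run it reports FilterSymbolsPass at 98%, so pretty much all of the runtime sits in pass4.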
There's still a lot left to figure out, but this should be enough information to allow others to attempt to do this reliably.