Dec 03 (Sat), 2005, 05:37
Purveyor of Banjosity
Unhappy note, but an interesting fellow from the circle I occasionally run in passed away recently. Not much to say, the freeman article pretty well covers it. About the best you'll find of marty on the web, one of the more interesting tracks, although tracking down a full version of a Celia album he was in might be wise. Joys of a weekly house 'jam' session in our apt and bar conversion (mandolin, banjo, guitar, bass, occasionally keyboard and dulcimer), always was entertaining (and educational) to see a master at work with their instrument.
So, RIP marty :-/
Nov 21 (Mon), 2005, 21:01
confcache-0.3.3
Whee, another confcache bugfix release. Version 0.3.3 is up in usual location. Adding it to portage shortly, since the bugfixes against 0.3 have been pretty minor, plus 0.3.x will sit for a bit while some refactoring occurs to better handle doing tricks to avoid invalidation of the cache.
Many thanks to Ed Catmur, Christian Lemke, and Diego 'Flameeyes' Pettenò for feedback and bug reports.
Nov 16 (Tue), 2005, 10:14
confcache-0.3
Confcache, version 0.3. Fixed up a bunch of ass biter issues in dealing with the sandbox when trying to start it up (versus confcache being called within a sandbox env); pretty confident it now works fine under userpriv, and standalone as of version 0.2.
Wound up having to introduce a lovely little bit of twisty code- certain sandbox vars defined prior to starting the sandbox result in the sandbox not appending the usual sane defaults (allowing write to /dev/tty for example). The solution? Well, we shift those vars to the side, spawn an indirection script that appends them to the sandbox default, then execs the actual configure call.
Kind of nuts, but does the trick.
Aside from the quicky version 0.2 release, 0.3 was released a while after. Closed up some nasty gaps in staleness detection. At this point, I don't know any way to trip up the detection code- still get some occasional failures (mono for example), but the failures so far haven't been confcache's fault, issue has been that the package doesn't properly support --cache (which confcache uses).
Aside from that, rewrote the portage integration patch- back out the original if you have it, and raid the v3 from the usual place.
Addressing a couple of questions-
Why was configure complaining about stale build_alias/host_alias?
Bug, namely. Version 0.3 now digs through the args passed to configure, and (should) properly handle staleness detection for that facet of configure caches.
Why isn't configure caching all configure tests?
Because the configure script has to be written to cache results. A lot of default checks are cached, but a lot of package specific checks aren't written to cache (yell at upstream in other words).
Why isn't confcache working for ebuild xyz?
The current integration is active only for econf (it's the only place I can force it).
You can do a nasty little trick however to have confcache step in in the meantime. Add
function ./configure() { $CONFCACHE ./configure; }
to your /etc/portage/bashrc. Shouldn't break stuff, but if it does it will be obvious- additionally, don't blame me. It's an evil hack, you've been duly warned :) .
Why is the global cache invalidated every run when I have cpufreq enabled?
Cpufreq changes the proc frequency (good thing), this info is exposed in /proc/cpuinfo, a file configure scripts check. The md5 changes every time the proc changes frequency, thus invalidating the cache each time (bad thing). I'm aware of it, and intend to address this in 0.4.
Offhand, probably end of the weekend for 0.4- need to introduce a framework for registering triggers to override the normal md5 checks for certain files, so it'll require a bit of expansion to confcache.
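For the curious, a rough sketch of what a trigger framework along those lines might look like. This is not confcache's code: `register_trigger`, `normalize_cpuinfo`, and `checksum` are hypothetical names, and treating the "bogomips" line as volatile is an assumption on my part. The idea is just that a registered file gets its content normalized before the md5 is taken.

```python
import hashlib

# Hypothetical registry: path -> function that normalizes the file content
# before it is hashed. Not confcache's actual API.
_TRIGGERS = {}


def register_trigger(path, normalizer):
    """Override the plain md5 check for one specific file."""
    _TRIGGERS[path] = normalizer


def normalize_cpuinfo(data):
    """Drop the lines that change whenever cpufreq rescales the clock."""
    volatile = (b"cpu mhz", b"bogomips")  # assumption: these are the noisy fields
    kept = [line for line in data.splitlines()
            if not line.lower().startswith(volatile)]
    return b"\n".join(kept)


def checksum(path, data):
    """md5 of the data, after any normalizer registered for that path."""
    normalizer = _TRIGGERS.get(path)
    if normalizer is not None:
        data = normalizer(data)
    return hashlib.md5(data).hexdigest()


register_trigger("/proc/cpuinfo", normalize_cpuinfo)
```

With that in place, two reads of /proc/cpuinfo that differ only in clock speed hash the same, while a different cpu model still invalidates the cache.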
Nov 15 (Tue), 2005, 02:19
portage parallel fetching/compiling
Continuing to pillage year old patches out of the ebd portage branch, parallel-fetch.patch is available.
Roughly, this patch adds FEATURES="parallel-fetch", which when enabled (and merging more than one package), splits the buildplan execution into two processes- one building, the other doing strictly fetching.
Equivalent to (emerge -f target &> /dev/null < /dev/null &); emerge target , just saner (verifies distlocks are available for example).
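The shape of it, as a minimal sketch rather than the patch itself: the patch uses two processes, while this toy uses a background thread, and `run_plan`, `fetch`, and `build` are hypothetical names. The builder only blocks when the sources for the package it wants next haven't arrived yet.

```python
import threading


def run_plan(packages, fetch, build):
    """Fetch every package in the background while building them in order."""
    fetched = {pkg: threading.Event() for pkg in packages}
    errors = {}

    def fetcher():
        for pkg in packages:
            try:
                fetch(pkg)
            except Exception as exc:
                errors[pkg] = exc      # hand the failure to the builder side
            finally:
                fetched[pkg].set()     # always wake the builder, even on failure

    worker = threading.Thread(target=fetcher, daemon=True)
    worker.start()
    for pkg in packages:
        fetched[pkg].wait()            # blocks only if this fetch isn't done yet
        if pkg in errors:
            raise errors[pkg]
        build(pkg)
    worker.join()
```

Usage would be something like `run_plan(plan, fetch=download_sources, build=compile_pkg)`, with both callables supplied by the caller.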
Testing and feedback, as always, is desired and appreciated.
Nov 14 (Mon), 2005, 19:05
confcache-0.1.1
Got a little ticked off at the runtime cost of configure calls for package updates, so I wound up rewriting confcache from the ebd portage line as a stand alone tool (score, I can use it locally for development), and split a patch for portage stable integrating it as FEATURES="confcache". Ebuild, and portage patch are available in my devspace.
Rough example of the improvement run time wise, is a reduction of php's configure call from 80s to 25s. About what ebd was getting, if I recall correctly.
For those unfamiliar with what this feature/code is actually doing- autoconf scripts have the option to cache settings in a file, and reuse that file next time around. Save the result of checks/tests, basically, which can result in pretty massive speed ups sometimes.
The problem is that autoconf doesn't offer a way to quickly/cleanly verify the cache- for portage, this is a major issue since a glibc update should most likely invalidate the cache; you don't want configure tests using old results that are no longer correct.
Enter the sandbox. :)
Via the sandbox, we can track what the configure script accesses for its tests. Store a list of the files checked, and their md5 checksums, and you have a way to check if the cache is stale or not. It's actually a bit more complicated (need to track env vars also), but that's the basic gist of it.
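A minimal sketch of that staleness check, assuming the list of accessed files has already been pulled out of the sandbox log. `snapshot` and `is_stale` are hypothetical names, and confcache's real bookkeeping is more involved than this.

```python
import hashlib
import os


def file_md5(path):
    """md5 hex digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def snapshot(paths, env_names):
    """Record checksums of the files configure read, plus the env vars it saw."""
    return {
        "files": {p: file_md5(p) for p in paths if os.path.isfile(p)},
        "env": {name: os.environ.get(name) for name in env_names},
    }


def is_stale(snap):
    """True if any recorded file or env var differs from the snapshot."""
    for path, digest in snap["files"].items():
        if not os.path.isfile(path) or file_md5(path) != digest:
            return True
    return any(os.environ.get(name) != value
               for name, value in snap["env"].items())
```

Take the snapshot after a configure run, and call `is_stale` before reusing the cache on the next one.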
Either way, looking for testers and feedback- the gain from it should be obvious, so I'll avoid the usual pleading to get people to test stuff ;)
For the non-gentoo folk, well, grab our sandbox, install it, and install confcache and you will be able to use it to manage a global cache for configure calls. Still useful, although y'all will have a few more hoops to jump through ;)
Oct 23 (Sun), 2005, 03:53
diffball releases
Hola kiddies
Received a request earlier, which wound up with diffball 0.7 being released, and a large amount of lib work being done so that people can get access to a rather simple high level api.
At this point, the lib exposes the exact same set of runs that comprises the differ binary's algo, and exposes a reconstruct func that, again, pretty much comprises the patcher binary's internal reconstruction calls. An example of using libdiffball for doing delta compression between two files in memory, dumping the results to another array (note that the code is being anal about dealloc, hence the structure)-
#include
#include
#include

// returns 0 on success, non zero on failure
int diff_it(char *src_array, int src_len, char *ver_array, int ver_len,
            char **out_array, int *out_len)
{
    cfile src_cfh, ver_cfh, out_cfh;
    int ret = -1;
    if((ret = copen_mem(&src_cfh, src_array, src_len, NO_COMPRESSOR, CFILE_RONLY)) == 0) {
        if((ret = copen_mem(&ver_cfh, ver_array, ver_len, NO_COMPRESSOR, CFILE_RONLY)) == 0) {
            if((ret = copen_mem(&out_cfh, NULL, 0, NO_COMPRESSOR, CFILE_WONLY)) == 0) {
                if((ret = simple_difference(&src_cfh, &ver_cfh, &out_cfh, 0, 0, 0, 0)) == 0) {
                    *out_array = out_cfh.data.buff;
                    *out_len = out_cfh.data.write_end;
                } else {
                    free(out_cfh.data.buff);
                }
                cclose(&out_cfh);
            }
            cclose(&ver_cfh);
        }
        cclose(&src_cfh);
    }
    return ret;
}

So what exactly is that doing? Libdiffball internally uses cfile, the io lib that I created for it, which is intended to transparently wrap access to mem/compressed files/raw files behind a common structure and set of functions. Pretty much have the usual open/read/write/lseek/close/tell, just prefixed with a c; internally, if it's a bzip2 stream, it does the decompression on its own, and feeds data to users.
To head off concerns about buffers of buffers, preferred access to cfile structs is via a page api, that exposes the underlying buffer, so as to cut down on unnecessary cread memcpys (for example).
The copen_mem above is configuring the cfile struct so that it's working directly from the passed in array- for CFILE_RONLY, obviously, doesn't modify the buffer. :) I'm opening the out_cfh with a null buffer, so that it allocs its own.
From there... the simple_difference call is pretty straightforward. The trailing zero args are just telling it to use defaults for a couple of tunables (in this case, I have no interest in fooling with them).
What's all the nonsense involving pulling a buffer? Well, that's because I haven't yet thought up a good, *clean* api for raiding the buffer + size out; till that's in place, management of data.buff (namely free'ing), is left to the caller- this is also why I chose this example, since I would welcome suggestions for it. Remaining chunk of the code is pretty much just free'ing stuff on the way out, error or otherwise, then returning either the error or zero.
That's the beast. Might seem a bit complex, but aside from the CFILE_WONLY + free design decision, it's about as simple as I can make it. The api reconstruct equivalent, 'simple_reconstruct' is roughly on par for straightforwardness-
int simple_reconstruct(cfile *src_cfh, cfile *patch_cfh[], unsigned char patch_count,
                       cfile *out_cfh, unsigned int force_patch_id, unsigned int max_buff_size);

Again, pretty much just having the cfile handles passed in, a couple of options that most people will leave set to zero (use internal sane defaults). Nifty thing to note is that patch_cfh is an array of cfile ptrs- multiple patches applied in a single run (hard max of 256 atm), so via those funcs above you've got transparent decompression of patches/sources, multiple patches in a single run, and at least read capabilities on 6 different patch formats- xdelta, fdtu, bsdiff, and gdiff being the main external formats others may know.
So, enough whoring of that. Needed an easy way to test the new code from above, so I also wrote out some quick python bindings via pyrex- pydiffball-0.1 being the result. Exposed functionality is cfile creation (both memory and usual file based), simple_difference, and simple_reconstruct. The python module is quite a bit more friendly, since it'll generate the cfile instances on its own for all api calls, assuming it's an on disk path passed in when the arg is a basestring derivative.
Hopefully people find some use in it; personally, the python bindings will at least make my life a helluva lot easier for some extra code I occasionally work with.
Oct 12 (Wed), 2005, 23:59
backport of cache rewrite to stable
Hola all. Patch still is a bit raw, due to the fact the integration chunk of it is 24 hours old, but my cache rewrite sitting in 2.1 and 3.x I've backported to 2.0.53_rc5 for anyone interested. Thanks to antarus, zmedico, fuzzyray, and Bastian Balthazar Bux for testing the hell out of it. Initial patches were rather raw.
Did some further improvements to it, lifting a couple of classes out of 3.x so as to cut down on unnecessary io overhead. Stats via FuzzyRay (Paul Varner), a 233MHz Pentium with 256 megs and an amazing disk access speed of 9.74MB/sec. All tests are with 2.0.53_rc5 as base.
emerge --metadata with existing full cache
| Version | real | user | sys |
|---|---|---|---|
| vanilla | 9m30.580s | 4m9.230s | 0m17.850s |
| patched | 5m43.876s | 2m39.730s | 0m13.610s |
emerge --metadata without cache
| Version | real | user | sys |
|---|---|---|---|
| vanilla | 35m30.164s | 26m49.250s | 1m23.410s |
| patched | 11m59.595s | 5m6.890s | 0m36.930s |
Not too shabby. Stats source available here. Patch is available here, and further feedback would be appreciated.
Be aware if you test it out, you're going to need to run an emerge --metadata after applying the patch to /usr/lib/portage/pym. CVS $PORTDIR users, you're stuck running a regen.
Oct 01 (Sat), 2005, 19:42
portage 2.0.53 release candidate
Jason added a portage-2.0.53 release candidate p.masked to the tree earlier today- notable changes since .52
- EAPI awareness. Unset EAPI in an ebuild is EAPI=0, so no massive tree wide changes needed (grandfathered the existing tree in).
- Support for upcoming rsync metadata changes, transparent switch over.
- Glep31 checks, and enforcement.
- CDEPEND metadata key is no longer looked at by the resolver (wasn't used).
- SetUID/SetGID installed files will have their o+w yanked automatically on merging now, rather than complaining about it.
- No more has_version/portageq in global scope. Do it, and portage intentionally pukes on your ebuild during sourcing.
- prelink md5 calculation optimization via zmedico.
- Good collection of bug fixes, plus cleanup of a couple of messages emerge spits out so it's saner.
Not the full list (generate the diff if you're after it, or raid the ChangeLog), but that's the notables. Please test it :)
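For the SetUID/SetGID item, the check boils down to a bit of mode arithmetic. A sketch of the idea, not portage's code: `sanitized_mode` and `strip_world_write` are hypothetical names.

```python
import os
import stat


def sanitized_mode(mode):
    """Return the permission bits with o+w cleared for setuid/setgid modes."""
    perms = stat.S_IMODE(mode)
    if perms & (stat.S_ISUID | stat.S_ISGID) and perms & stat.S_IWOTH:
        return perms & ~stat.S_IWOTH
    return perms


def strip_world_write(path):
    """Apply sanitized_mode to a regular file; True if the mode was changed."""
    mode = os.lstat(path).st_mode
    if not stat.S_ISREG(mode):
        return False
    wanted = sanitized_mode(mode)
    if wanted == stat.S_IMODE(mode):
        return False
    os.chmod(path, wanted)
    return True
```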
Sep 27 (Tue), 2005, 13:10
python weirdness
A while back I wrote a tokenizing generator func for some core portage rewrite string processing; nothing incredibly fancy, just chunks up strings dependent on splitters passed in. This serves as the basis for chunking up and processing depset syntax. Example being: "dev-util/diffball bsdiff? ( dev-util/bsdiff )".
Now, I thought the func was fairly tight, speedy enough. Simple little sucker-
def iter_tokens(s, splitter=" "):
    """iterable yielding of splitting of a string"""
    pos = 0
    l = len(s)
    while pos < l:
        if s[pos] in splitter:
            pos += 1
            continue
        next_pos = pos + 1
        while next_pos < l and s[next_pos] not in splitter:
            next_pos += 1
        yield s[pos:next_pos]
        pos = next_pos + 1
python -m timeit -s 'x="mamma said knock you out\n mama \t knocked\t you out\n";x*=10000;' 'list(iter_tokens(x, " \t\n"))'
Using that func, is 365ms per run (roughly).
Now granted, there is room for improvement, but at first glance, the only tricky spot is linear search of splitter- using a set there actually is a bit slower, due to overhead of creating said set.
What massively gets my goat is that it's actually pretty damn slow in most real world usage compared to a seemingly primitive, and butt ugly (imo) approach.
from itertools import ifilter
def iter_tokens(s, splitter=" "):
    l = len(splitter)
    if l > 1:
        if l == 3 and " " in splitter and "\t" in splitter and "\n" in splitter:
            return iter(s.split())
        for x in splitter[:-1]:
            s = s.replace(x, splitter[-1])
    return ifilter(None, s.split(splitter[-1]))
python -m timeit -s 'x="mamma said knock you out\n mama \t knocked\t you out\n";x*=10000;' 'list(iter_tokens(x, " \t\n"))';
Is faster. Much faster. Clocks in at 38.7ms. Without the check for " \t\n", it clocks in at 61ms.
If it were a single split, still the replace hack is faster (although the difference between the two is minor enough). So... that's weird, and bugged the hell out of me last night :)
Zac Medico's comments about the yield instantiating and returning another string instance probably are fairly on par. Either way, it's not intuitive to me :)
Final comment on it, downside to the faster approach is that you have to do the processing up front, rather than JIT as the generator does- in the case of the code that uses this, it's not an issue though. Haven't dug into the underlying python source to figure out why there's such a difference, so if someone knows kindly tell me so I spend my time doing something else ;)
Update: Tweaked the replace func and updated its runtime, since the original was brain dead from experimentation at 3am; the saner/simpler version of the replace loop is courtesy of Andy Dustman. The check for " \t\n" is a quicky addition from me, mainly since that is even faster.
Note also that the faster approach I don't have issue with, I'm just rather amazed at the major difference in runtime for the two approaches.
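If anyone wants to rerun the comparison on a current interpreter, here is a Python 3 sketch of both approaches side by side. The function names are mine, the builtin `filter` stands in for `itertools.ifilter`, and the " \t\n" shortcut is left out so both functions take the same path. Timings will differ from the numbers above, so none are quoted here.

```python
import timeit


def iter_tokens_gen(s, splitter=" "):
    """Character-walking generator, same logic as the first version above."""
    pos, length = 0, len(s)
    while pos < length:
        if s[pos] in splitter:
            pos += 1
            continue
        next_pos = pos + 1
        while next_pos < length and s[next_pos] not in splitter:
            next_pos += 1
        yield s[pos:next_pos]
        pos = next_pos + 1


def iter_tokens_replace(s, splitter=" "):
    """Collapse every splitter char into the last one, then split once."""
    for char in splitter[:-1]:
        s = s.replace(char, splitter[-1])
    return filter(None, s.split(splitter[-1]))


if __name__ == "__main__":
    x = "mamma said knock you out\n mama \t knocked\t you out\n" * 10000
    # Both approaches should produce the same tokens before we time them.
    assert list(iter_tokens_gen(x, " \t\n")) == list(iter_tokens_replace(x, " \t\n"))
    for func in (iter_tokens_gen, iter_tokens_replace):
        seconds = timeit.timeit(lambda: list(func(x, " \t\n")), number=5) / 5
        print("%s: %.1f ms per run" % (func.__name__, seconds * 1000))
```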
Sep 24 (Sat), 2005, 21:33
Upcoming rsync cache changes
Committed a variation of a patch I posted in this thread to stable earlier today. Covers two things-
Detection of $PORTDIR/metadata/cache format- currently portage stable uses an ordered list of (implicit) key -> value; this makes it essentially impossible to ever remove a key, and makes addition of keys have a hard limit. Bad. So... the new format is an old format I hacked out a year back, flat_hash, (explicit) key -> value unordered. Nothing hugely fancy, but does allow us to jam stuff in without issue.
Increased flexibility requires us to version the cache entry in some way, so that we know if entries are incompatible with the version of portage reading it. Additionally, we should have been versioning the expected ebuild env (how it will be called, what funcs are available, etc) long ago. EAPI is that; additions/extensions to the ebuild spec result in a new EAPI standard, for example, src_configure addition is part of what EAPI=1 is. With EAPI in the cache, we can know whether or not the local portage version is capable of properly handling that cache entry. A higher EAPI (later portage release) may add new metadata; any portage version that doesn't support that EAPI must in some way mark the entry as "I know of it, but I can't use this ebuild".
So... in that jumble, essentially the rsync metadata/cache auto-detection allows us to move over to a more flexible format without causing cache horkages every time we change stuff (as has happened often enough), and EAPI allows us to version those entries, so that EAPI aware portage versions can protect themselves from doing something stupid.
That and a lot of emerge --metadata cleanups got stuck in, hopefully killing off any remaining failures during cache transfer ;)
Also dropped root requirement for emerge --metadata. That always bugged the heck out of me, since it wasn't needed...
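To make the flat_hash + EAPI idea a bit more concrete, a small sketch. The one `KEY=value` per line layout is my assumption for illustration, and `parse_entry`, `usable`, and `SUPPORTED_EAPIS` are hypothetical names rather than portage's. The only rule taken from portage itself is that an unset EAPI counts as 0.

```python
# Assumption for the sketch: this reader only understands EAPI 0.
SUPPORTED_EAPIS = {"0"}


def parse_entry(text):
    """Parse an unordered, explicit key -> value cache entry into a dict."""
    entry = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")   # split on the first "=" only
        if sep:
            entry[key] = value
    return entry


def usable(entry):
    """An entry with no EAPI key is EAPI 0; unknown EAPIs must be refused."""
    return entry.get("EAPI", "0") in SUPPORTED_EAPIS
```

Since keys are explicit, adding or dropping one doesn't shift anything else, and an entry carrying an EAPI the reader doesn't know gets flagged instead of misread.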
Sep 19 (Mon), 2005, 11:55
mailbox archiving
Finally got around to writing a quicky filter to yank old msgs from maildir, and slap them into mbox. Nothing massively fancy, just a quicky script since I prefer my archives in mbox, so if anyone is interested it's available here.
Does the trick for my needs, will miss a few weird headers, but doesn't lose msgs via it (or shouldn't) ;)
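For those who'd rather roll their own, the core of such a filter fits in a few lines with python's bundled mailbox module. This is a sketch under my own assumptions, not the script linked above: `archive_old` is a hypothetical name, and message age is taken from the maildir delivery date rather than from any header.

```python
import mailbox
import time


def archive_old(maildir_path, mbox_path, max_age_days=90):
    """Move messages older than max_age_days from a maildir into an mbox."""
    cutoff = time.time() - max_age_days * 86400
    maildir = mailbox.Maildir(maildir_path, create=False)
    box = mailbox.mbox(mbox_path)
    moved = 0
    box.lock()
    try:
        for key in list(maildir.keys()):
            msg = maildir[key]
            if msg.get_date() < cutoff:   # delivery date, seconds since epoch
                box.add(msg)
                box.flush()               # write to the mbox before deleting
                maildir.remove(key)
                moved += 1
    finally:
        box.close()                       # flushes and releases the lock
    return moved
```

Flushing before the remove is what keeps a crash from eating a message: worst case you end up with a duplicate, not a loss.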
Aug 24 (Wed), 2005, 20:58
build implementation in rewrite
One plus to this rewrite is that a lot of the base code is being lifted from earlier experiments, and attempts. Ebd was implemented a long while back, and is in use, and is still pretty much the same beast just with a large collection of cleanup of ebuild*sh original code (full removal of clean, package, and help support, since those should be python side). The real work ebd wise has been cleaning up the original python side handler, the ebuild_processor that talks to the running bash ebuild-daemon.sh.
The first implementation of it was pretty much pipe/comm methods written, then doebuild slapped into it and hidden away (this version is what exists in the 2.1 snapshot for example). For those who don't know, doebuild is the python side chunk of code that handles all ebuild phase execution; hell, it handles fetch and digest. It's a giant function that is rather nasty with a massive amount of if statements tagged in to special case specific phase requirements. Tiz a beast. Tagging it into the ebuild processor wasn't incredibly clean, but was done so that breaking up doebuild could be done internally, rather than breaking any api people expect out of portage.
Unfortunately, that didn't go so well, which is why it exists in 2.1 still. The problem is that it's bound pretty heavily to Config, which holds all profile/user settings, and does other nasty things involving globals and features. Need to break Config up, break the globals, then you can eye breaking up doebuild. Which pretty much is what's occurring in the rewrite, and subject of this entry.
A build operation class, which abstracts the setup->install phases, is now in the rewrite, so building (mplayer, for example) is possible. Weighs in around 240 lines, versus doebuild's 405- difference being it lacks digest/fetch functionality (fetch should be external), that and this is clean :). Haven't added elib support yet, since I'm waiting for the council to be elected and then vote on glep 33, but addition of a new command is pretty easy. Integration of confcache again also needs to be done, but that's minor (in other words, it's a feature and is implemented once the core is done). So... the beast can build. May not sound like much, but definite step beyond the previous work; next step involves finishing off the vdb class (which someone is working on in parallel to my build work), and defining a merge operation which represents the action of adding a package to a mutable database.
Resolver work can be done in parallel to it; once that's done, UI work is foremost, with custom implementations of components being done in parallel; example of non-standard component would be a cvs derivative of the ebuild class, to recreate the autoaddcvs functionality. That beast would derive from a mutable ebuild_repository derivative, that can do digest creation, etc.
Chunk the sucker up, and it's a heck of a lot easier to maintain, and extend :)
Aug 09 (Tue), 2005, 03:11
ACCEPT_LICENSE implementation, sizable chunk of use/slot deps, and config work
Busy weekend; aside from having likely too much heineken over the weekend, been taking advantage of new laptop to do a fair amount of rewrite work; aside from ongoing innard work, completed the following repository filters-
- keywords + package.keywords (hey, it's a from the ground up rewrite, so yes, adding this does matter :)
- profile package.mask and visibility limiters (look at $PORTDIR/profile/base/packages if curious about that one), + package.mask + package.unmask
- ACCEPT_LICENSE + package.license
Beyond that, finished off what I'm calling "conditional restrictions". All of the above (searching included) is based around restriction objects grouped as needed, fex "( package = diffball && category = dev-util && fullver = 0.6.5 )" which is created via
AndRestrictionSet(PackageRestriction("package", StrExactMatch("diffball")),
    PackageRestriction("category", StrExactMatch("dev-util")),
    PackageRestriction("fullver", VersionMatch("=", "0.6.5")));
or... quite a bit easier,
atom("=dev-util/diffball-0.6.5");
Atom provides a couple of nice attributes, but mainly translates (internally) the atom syntax into restriction objects when requested. Haven't explicitly mentioned it, but negation is available for each restriction object, including restriction groupings (atom is a boolean AND fex); so
atom("!=dev-util/diffball-0.6.5");
becomes
AndRestrictionSet(PackageRestriction("package", StrExactMatch("diffball")),
    PackageRestriction("category", StrExactMatch("dev-util")),
    PackageRestriction("fullver", VersionMatch("=", "0.6.5", negate=True)));
Realize it sounds a bit complex (partially because it is) but you can represent a *lot* via it, CONTENTS lookup via a ContainmentMatch for example, optionally limiting it to specific categories/packages. That said, all of the restriction stuff above is effectively boolean, either True or False in result- conditional restrictions aren't really. How do you represent a use dep? Aside from a cat/pkg/ver restriction, you need to force a change on the configuration wrapper, flipping the tk flag on for python for instance.
With potentially arbitrary grouping/construction/arrangement of restrictions, the resolver needs to be able to back out conditional changes from attempting a match down a particular restriction branch; fex, if
( category = dev-util && package = portage && ( ( use contains build && use not contains selinux ) || ( use contains xyz ) ) )
matching will hit the first chunk of the boolean or, and if selinux is on, it needs to back out the build use flag enforcement (since that branch of matching failed), and then attempt the next match (forcing xyz on).
It's kind of a thorny issue, complicated by the fact that certain conditionals cannot be toggled, your arch use flag for instance. So the use flag set needs to be protected somewhat (which is where portage.util.mappings.LimitedChangeSet comes into play). Further, if a use flag enforcement is requested, it must not be reversed in that branch of matching- you cannot require build on, and require build off. You back up to the point where build was flipped on, reverse the changes from that branch of matching, and see what other matches are possible. Basically a stack hack, with the package object (a malleable configurable package specifically) tracking changes, and the restrictions themselves pushing/popping changes as required.
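The stack hack is easier to see in miniature. A toy version, assuming flags can be modeled as a plain set: `UseFlagChanges` and `match_any` are hypothetical names, and this is not the LimitedChangeSet implementation. Each forced flag pushes an undo record, and a failed branch pops back to its checkpoint.

```python
class UseFlagChanges:
    """Use flag set that logs forced changes so a failed branch can undo them."""

    def __init__(self, enabled, immutable=()):
        self.enabled = set(enabled)
        self.immutable = frozenset(immutable)   # e.g. the arch flag
        self._forced = {}                       # flag -> state required so far
        self._log = []                          # stack of (flag, previous state)

    def checkpoint(self):
        return len(self._log)

    def force(self, flag, state):
        """Require flag to be on/off; False if that can't be honored."""
        if flag in self._forced:
            return self._forced[flag] == state  # no reversing within a branch
        current = flag in self.enabled
        if current != state and flag in self.immutable:
            return False
        self._log.append((flag, current))
        self._forced[flag] = state
        if state:
            self.enabled.add(flag)
        else:
            self.enabled.discard(flag)
        return True

    def rollback(self, mark):
        """Undo every change recorded after the given checkpoint."""
        while len(self._log) > mark:
            flag, previous = self._log.pop()
            del self._forced[flag]
            if previous:
                self.enabled.add(flag)
            else:
                self.enabled.discard(flag)


def match_any(changes, branches):
    """Boolean OR: try each branch, backing out its changes on failure."""
    for branch in branches:
        mark = changes.checkpoint()
        if all(changes.force(flag, state) for flag, state in branch):
            return True
        changes.rollback(mark)
    return False
```

With selinux locked on, `match_any(changes, [[("build", True), ("selinux", False)], [("xyz", True)]])` flips build on, fails on selinux, backs build out again, and then succeeds by forcing xyz.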
This is a different mode of matching, initiated by the repository when it detects that it's working with mutable configuration wrappers (rather than an immutable wrapper, a binpkg that has its use flags locked). That chunk of code allows for tracking and blocking unwanted configuration reversions, but aside from framework required to get to that point (something stable couldn't sanely support without a rewrite, oh gee, isn't this a rewrite? :), for the resolver to be truly package/format agnostic, it really can't know about conditionals. It deals in restrictions pulled from packages, graphing the package/restrictions out; if it were left at this point, use depends would be supported but the resolver would eat itself upon the first cycle-
- pkg a depends="x? ( b[x] )"
- pkg b depends="x? ( a )"
- USE="-x" pkg a
- USE="x" pkg b
- USE="x" pkg a
What remains use/slot dep work wise is pretty much adding a hook to the configurable layer that allows the restrictions to hand off requests, and be told whether it is possible (and the configuration change is recorded), or that it's not possible. That portion is going to be fun, since basically it's a lot of introspection/navel gazing, but it's the remaining chunk to kill off in that particular area of work.
After that's finished, still have stuff remaining, but most of it isn't as nasty (imho, at least). Implementing
- CONTENTS sets (file objs produced by building a pkg, or the contents of a built pkg)
- finish build operation class
- fetchables integration (fortunately just integration since Alec Warner has done that chunk)
- vdb (/var/db/pkg, installed pkg db) querying (mostly written)
- merge operation class
- finish vdb off (it's a modifiable repository, with merge operation representing addition to the repository)
- resolver integration into domain (another thing I can fortunately dodge, since Jason is the resolver goto, not I)
- UI work (yay)
- binpkg repo (same mutable characteristics as vdb, so can lift from that)
Any enhancements beyond that follow afterwards- which is where the new config format (samba.conf style) comes into play. First, disclaimer that it's not a forced config change, the current make.conf and make.profile will be mapped on the fly into the internal config representation, and that's functionality I don't foresee ever having a reason to drop, exempting lack of use. The samba style conf is a preferred internal, and for advanced configuration, preferred external format.
The reasons pretty much come down to the fact that how this beast is structured, determination of what classes/callables to use for a specific object (say your configured ebuild repository), is all done on the fly. The samba style config makes it *much* easier to specify domains (master obj/setting grouping), group repositories (and/or creating a repository set, aka PORTDIR + PORTDIR_OVERLAY with individual caches possible), cache groupings (slaved updates to multiple cache backends fex), use different classes for almost all objects (remote class for repo or cache or config), and in general, lots of crazy crap. Essentially, it allows you to represent trees of obj/settings streaming down from domain, and the interdeps between those objects/settings.
The samba style conf won't be forced as the only option for advanced configurations either, I'm using samba style mainly because python bundles a module that can parse the format already (ConfigParser). Alternative formats, properly implemented can use whatever config format they deem as long as the python object encapsulating the config is accessible via a ConfigParser akin api. From there, pretty much stick an autoexec section in the samba style config, or change defaults. Still fleshing out portage.config.central's capabilities, but at this point it will be possible once autoexec section support is added to central.
So the 'advanced' config basically is groupings of settings that map out to objects that central (based off of a config) arranges into what will be the internal (potentially external) api. If you note that make.conf settings are pretty much scalar, N overlays, single true ebuild repository, single binpkg db, single vdb, single set of build settings, it's not too hard to see that mapping that into the internal representation is easy; basically you just chunk the data up, and throw in whatever class overrides required.
With that disclaimer thrown out, and explanation for why a more powerful config is required, back to the extensibility bit, although frankly not a hell of a lot to say about it with details explained above. As stated, the config specifies class/callable to use, which is imported on the fly and executed. It'll allow for the guts to be replaced via config definitions, essentially plugins so via the config, you just change whatever implementation you've chosen for that grouping.
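A stripped down illustration of the "class named in the config, imported on the fly" bit. Everything here is a stand-in: the section names, the `class` key, and the `load_callable`/`instantiate` helpers are hypothetical, stdlib classes play the role of portage's, and the module is spelled `configparser` as in current python rather than ConfigParser.

```python
import configparser
import importlib

# Hypothetical samba style config; stdlib classes stand in for real backends.
EXAMPLE = """
[cache]
class = collections.OrderedDict

[repo]
class = collections.ChainMap
"""


def load_callable(dotted):
    """Import 'package.module.Name' on the fly and return the Name object."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def instantiate(config_text):
    """Build one object per section from the class that section names."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {name: load_callable(parser[name]["class"])()
            for name in parser.sections()}
```

Swapping an implementation is then a one line config change: point `class` at a different dotted path and nothing else needs to know.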
Bored peeps can do remote implementations of classes, or better implementations of existing (legacy I might add) specs; the binpkg format could stand an overhaul, and the vdb on disk layout blows, badly. Adding refcounts plus a global CONTENTS db to a vdb derivative would probably make *many* people happy who hate .keep files and slow CONTENTS lookup :)
On the plus side, no changes have been required of the portage tree (which would be design flaws, or indication of bad previous designs ;), so things are proceeding. Hell, a simple config change allowed me to dodge out of the metadata transfer post syncing, my setup runs directly off of $PORTDIR/metadata/cache, treating it (and the repository it's bound to) as unmodifiable. Frozen support, essentially. The flat_list (the database name for metadata/cache layout) format sucks, very bad on its own for --searchDesc style ops (which access each pkg's metadata), but options are open for alternatives.
Meanwhile, back to the cage till the taming of this beast of a rewrite is completed. If interested in helping, pop into #gentoo-portage on irc.freenode.net, and hunt for someone who appears active, or just email dev-portage.
Jul 13 (Wed), 2005, 02:55
users of portage cvs head take note
If you're using portage cvs head (not stable, using cvs head directly), please take note of an earlier warning/request. The short version is that the processor class that was mangled from portage.doebuild is being replaced, and ebuild*.sh will be broke in the process.
Aside from being reworked (this time around got a better grasp on how to implement it), majority of portage hardcoded paths in ebuild*.sh are being removed, and handed down the ebd pipes. Until it's finished (which will be a good chunk of time), cvs head will be broke, so if you're using cvs head kindly let the portage devs know so that we have an idea of how many people are affected, and snapshot whatever you're running, since once this starts I won't be reverting till it's finished. Note this isn't a statement that you cannot use cvs head, just that so far it's been fairly stable, and that is going to change, the developmental branch will be broke while work is being done to correct long standing flaws.
Keep in mind that cvs head *is* a developmental branch- so far nobody has had to break it, although it was expected to occur (beyond me making a dumb typo and short term breaking something :). So kindly give a yell, either ask on the ml or irc, and be wary about cvsups.
Jul 13 (Wed), 2005, 01:21
Restrictions
So... what are restrictions, and why is this under the portage category? Restriction implies a yay/nay evaluation of a chunk of data, which if you look at most portage package interactions, they're application of restrictions to data. Bad description, but should make sense as I continue.
An atom, say =dev-util/diffball-0.6.2 is viewed in stable as a string, which is split up and various tests done to it to determine if a cpv (category/package-version, the atom above is cpv dev-util/diffball-0.6.2) matches. The idea has been kicked around for a bit, but essentially an atom is 3 restrictions; a category restriction, a package restriction, and a version restriction.
Where it gets useful is that if you treat an atom as a set of restrictions (specifically portage.restriction.restrictionSets.AndRestrictionSet, and yes, the namespace needs massive shortening), you can tag in other restrictions to it. Since you've broken it down into individual checks and used a general class to abstract the boolean matches of it, slipping in a use restriction or a slot restriction really isn't that complex.
Nice little world view there, except I'm simplifying it a bit; you need to establish an api/protocol for the restrictions to work against, eg the actual data it checks. To make that sane, you need an api for package instances, which is underway.
Note that there is a lot more work involved than just slipping in a set of restrictions, this is just a design abstraction that allows for it to be easily tagged in (mainly due to Jason's work, I just broke his original Atom class down into restrictions). The resolver still needs to have a way to deal with merging of cpv's that have compatible use/slot restrictions, but that functionality probably can be pushed into the base restrictionSet, interaction/union functionality. Useful functionality anyways for restrictionSet and derivatives.
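To give a feel for the abstraction, a toy model. The class names echo the ones used in the rewrite, but the bodies are my own minimal guesses and not the portage code. The `Package` tuple is a hypothetical stand-in for the package api, and the atom parser only handles plain "=" atoms without revisions.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the package instance api the restrictions check.
Package = namedtuple("Package", "category package fullver slot")


class StrExactMatch:
    def __init__(self, value, negate=False):
        self.value, self.negate = value, negate

    def match(self, data):
        return (data == self.value) != self.negate


class PackageRestriction:
    """Apply a matcher to one attribute of a package."""

    def __init__(self, attr, matcher, negate=False):
        self.attr, self.matcher, self.negate = attr, matcher, negate

    def match(self, pkg):
        return self.matcher.match(getattr(pkg, self.attr)) != self.negate


class AndRestrictionSet:
    """Boolean AND over any number of restrictions."""

    def __init__(self, *restrictions, negate=False):
        self.restrictions, self.negate = restrictions, negate

    def match(self, pkg):
        return all(r.match(pkg) for r in self.restrictions) != self.negate


def atom_to_restrictions(atom):
    """Break '=category/package-version' into its three restrictions."""
    found = re.match(r"^=([^/]+)/(.+)-(\d[^-]*)$", atom)
    if found is None:
        raise ValueError("sketch only handles simple '=' atoms: %r" % atom)
    category, package, version = found.groups()
    return AndRestrictionSet(
        PackageRestriction("category", StrExactMatch(category)),
        PackageRestriction("package", StrExactMatch(package)),
        PackageRestriction("fullver", StrExactMatch(version)),
    )
```

Tagging in a slot restriction is then just one more object in the set: `AndRestrictionSet(atom_to_restrictions("=dev-util/diffball-0.6.2"), PackageRestriction("slot", StrExactMatch("0")))`.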
Beyond atom mangling, searchDesc queries, contents queries (which cpv(s) claim file/dir/link/dev/fifo xyz) all are essentially restrictions. You want to merge strictly built packages already? Could implement it as a repository restriction. The fun part is going to be keeping it quick, and doing optimizations/collapsing of restrictions, but so far it's seeming to work rather nicely.
If you're bored and feel like poking into it, look into gentoo-src/portage/portage; only disclaimer I'd offer is that it doesn't do cache regeneration at the moment, mainly because we haven't decided if we're going to break cvs head by importing the changed ebuild-daemon.{sh,lib}. If you're interested in docs, look at intro and layout.txt in rewrite-misc. layout.txt is background info (no longer accurate, good for background knowledge though), and intro should be pretty accurate.
Aside from that, if you're interested, pop into the channel and ask.