Aug 24 (Wed), 2005, 20:58

build implementation in rewrite

One plus to this rewrite is that a lot of the base code is being lifted from earlier experiments and attempts. Ebd was implemented a long while back and is in use; it's still pretty much the same beast, just with a large amount of cleanup of the original ebuild*.sh code (full removal of clean, package, and help support, since those belong on the python side). The real ebd-related work has been cleaning up the original python-side handler, the ebuild_processor that talks to the running bash ebuild-daemon.sh.

The first implementation of it was pretty much the pipe/communication methods, with doebuild slapped into it and hidden away (this version is what exists in the 2.1 snapshot, for example). For those who don't know, doebuild is the python-side chunk of code that handles all ebuild phase execution; hell, it even handles fetch and digest. It's a giant, rather nasty function with a massive number of if statements tacked on to special-case specific phase requirements. Tiz a beast. Tagging it into the ebuild processor wasn't incredibly clean, but it was done so that breaking up doebuild could happen internally, rather than breaking any api people expect out of portage.

Unfortunately, that didn't go so well, which is why it still exists in 2.1. The problem is that it's bound pretty heavily to Config, which holds all profile/user settings and does other nasty things involving globals and features. You need to break Config up and break the globals before you can even eye breaking up doebuild, which is pretty much what's occurring in the rewrite, and the subject of this entry.

A build operation class was added that abstracts the setup->install phases; with that wrapped into a class, building (mplayer, for example) is now possible in the rewrite. It weighs in at around 240 lines versus doebuild's 405, the difference being that it lacks digest/fetch functionality (fetch should be external), and that this one is clean :). I haven't added elib support yet, since I'm waiting for the council to be elected and then vote on glep 33, but adding a new command is pretty easy. Re-integrating confcache also still needs to be done, but that's minor (in other words, it's a feature, and gets implemented once the core is done). So... the beast can build. May not sound like much, but it's a definite step beyond the previous work. The next step involves finishing off the vdb class (which someone is working on in parallel to my build work) and defining a merge operation that represents the action of adding a package to a mutable database.
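The idea of wrapping the phase sequence in a class can be sketched roughly as follows. This is a hypothetical illustration, not the rewrite's code: the `BuildOperation` name, the `PHASES` list, and the `run_phase` callable (standing in for the ebuild_processor's conversation with ebuild-daemon.sh) are all assumptions. Fetch and digest are intentionally absent.

```python
from typing import Callable, List

# Hypothetical phase order for setup->install; the real rewrite may differ.
PHASES = ["setup", "unpack", "compile", "test", "install"]


class BuildOperation:
    """Drive one package through the setup->install phases (sketch only).

    run_phase(pkg, phase) stands in for the ebuild_processor talking to the
    running ebuild-daemon.sh; it should return True if the phase succeeded.
    """

    def __init__(self, pkg: str, run_phase: Callable[[str, str], bool]) -> None:
        self.pkg = pkg
        self.run_phase = run_phase
        self.completed: List[str] = []

    def build(self) -> bool:
        for phase in PHASES:
            if not self.run_phase(self.pkg, phase):
                # Stop at the first failing phase; later phases never run.
                return False
            self.completed.append(phase)
        return True


if __name__ == "__main__":
    def fake_daemon(pkg: str, phase: str) -> bool:
        print(f"{pkg}: running {phase}")
        return True

    BuildOperation("media-video/mplayer", fake_daemon).build()
```

Keeping fetch out of the class leaves it to the caller to decide how and when sources get downloaded, in line with the "fetch should be external" remark.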

Resolver work can be done in parallel to that; once it's done, UI work is foremost, with custom implementations of components done in parallel. An example of a non-standard component would be a cvs derivative of the ebuild class, to recreate the autoaddcvs functionality. That beast would derive from a mutable ebuild_repository derivative that can do digest creation, etc.

Chunk the sucker up, and it's a heck of a lot easier to maintain, and extend :)


Posted by Brian Harring

Aug 09 (Tue), 2005, 03:11

ACCEPT_LICENSE implementation, sizable chunk of use/slot deps, and config work

Busy weekend; aside from likely having too much Heineken, I've been taking advantage of the new laptop to do a fair amount of rewrite work. Beyond the ongoing innards work, I completed the following repository filters:

  • keywords + package.keywords (hey, it's a from-the-ground-up rewrite, so yes, adding this does matter :)
  • profile package.mask and visibility limiters (look at $PORTDIR/profiles/base/packages if curious about that one), plus package.mask and package.unmask
  • ACCEPT_LICENSE + package.license
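Conceptually, each of these filters just narrows what the repository reports as visible. A rough sketch of that stacking follows. The `Pkg` type, the `visible` function, and its parameters are hypothetical; masks are keyed by category/package rather than full atoms, and LICENSE is treated as a flat set (no `||` groups), so this is far simpler than real profile/mask handling.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Mapping, Optional, Set


@dataclass(frozen=True)
class Pkg:
    key: str                  # category/package, e.g. "dev-util/diffball"
    keywords: FrozenSet[str]  # KEYWORDS from the ebuild
    licenses: FrozenSet[str]  # LICENSE from the ebuild, flattened


def visible(
    pkgs: Iterable[Pkg],
    accept_keywords: Set[str],
    accept_license: Set[str],
    masked: Set[str] = frozenset(),
    unmasked: Set[str] = frozenset(),
    package_keywords: Optional[Mapping[str, Set[str]]] = None,
    package_license: Optional[Mapping[str, Set[str]]] = None,
) -> Iterator[Pkg]:
    """Yield only packages that pass every filter (hypothetical sketch)."""
    package_keywords = package_keywords or {}
    package_license = package_license or {}
    for pkg in pkgs:
        # package.mask hides a package unless package.unmask overrides it.
        if pkg.key in masked and pkg.key not in unmasked:
            continue
        # ACCEPT_KEYWORDS plus any per-package additions (package.keywords).
        kw = set(accept_keywords) | set(package_keywords.get(pkg.key, ()))
        if not pkg.keywords & kw:
            continue
        # ACCEPT_LICENSE plus per-package additions (package.license);
        # every license the package carries must be accepted.
        lic = set(accept_license) | set(package_license.get(pkg.key, ()))
        if not pkg.licenses <= lic:
            continue
        yield pkg
```

Because each check only removes packages, the filters can be layered in any order and the result is the same.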

Beyond that, finished off what I'm calling "conditional restrictions". All of the above (searching included) is based around restriction objects grouped as needed, fex "( package = diffball && category = dev-util && fullver = 0.6.5 )" which is created via

AndRestrictionSet(PackageRestriction("package", StrExactMatch("diffball")), PackageRestriction("category", StrExactMatch("dev-util")), PackageRestriction("fullver", VersionMatch("=", "0.6.5")))
or... quite a bit easier,
atom("=dev-util/diffball-0.6.5")
Atom provides a couple of nice attributes, but mainly it translates (internally) the atom syntax into restriction objects when requested. I haven't explicitly mentioned it, but negation is available for each restriction object, including restriction groupings (atom is a boolean AND, fex); so
atom("!=dev-util/diffball-0.6.5")
becomes
AndRestrictionSet(PackageRestriction("package", StrExactMatch("diffball")), PackageRestriction("category", StrExactMatch("dev-util")), PackageRestriction("fullver", VersionMatch("=", "0.6.5", negate=True)))
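To make the restriction/negation idea concrete, here is a toy re-implementation that borrows the class names from the examples above. These are simplified stand-ins written for illustration, not the rewrite's actual classes; the `Pkg` tuple is hypothetical, and `VersionMatch` only handles exact `=` comparison on version strings.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Any

# Toy package: just the three attributes the restrictions below inspect.
Pkg = namedtuple("Pkg", "category package fullver")


@dataclass(frozen=True)
class StrExactMatch:
    value: str
    negate: bool = False

    def match(self, val: str) -> bool:
        # XOR with negate flips the result when negation is requested.
        return (val == self.value) != self.negate


@dataclass(frozen=True)
class VersionMatch:
    op: str        # only "=" is handled in this sketch
    ver: str
    negate: bool = False

    def match(self, val: str) -> bool:
        if self.op != "=":
            raise NotImplementedError(f"operator {self.op!r} not in this sketch")
        return (val == self.ver) != self.negate


@dataclass(frozen=True)
class PackageRestriction:
    attr: str      # name of the package attribute to test
    restriction: Any

    def match(self, pkg: Any) -> bool:
        return self.restriction.match(getattr(pkg, self.attr))


class AndRestrictionSet:
    def __init__(self, *restrictions: Any, negate: bool = False) -> None:
        self.restrictions = restrictions
        self.negate = negate

    def match(self, pkg: Any) -> bool:
        # The whole grouping can be negated too, not just its members.
        return all(r.match(pkg) for r in self.restrictions) != self.negate


exact = AndRestrictionSet(
    PackageRestriction("package", StrExactMatch("diffball")),
    PackageRestriction("category", StrExactMatch("dev-util")),
    PackageRestriction("fullver", VersionMatch("=", "0.6.5")),
)
other_versions = AndRestrictionSet(
    PackageRestriction("package", StrExactMatch("diffball")),
    PackageRestriction("category", StrExactMatch("dev-util")),
    PackageRestriction("fullver", VersionMatch("=", "0.6.5", negate=True)),
)
print(exact.match(Pkg("dev-util", "diffball", "0.6.5")))           # True
print(other_versions.match(Pkg("dev-util", "diffball", "0.6.4")))  # True
```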

I realize it sounds a bit complex (partly because it is), but you can represent a *lot* with it: CONTENTS lookup via a ContainmentMatch, for example, optionally limited to specific categories/packages. That said, all of the restriction stuff above is effectively boolean, either True or False in result; conditional restrictions aren't. How do you represent a use dep? Aside from a cat/pkg/ver restriction, you need to force a change on the configuration wrapper, flipping the tk flag on for python, for instance.
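A CONTENTS lookup in the same style might look roughly like the sketch below. ContainmentMatch is the name used above, but this version, the `InstalledPkg` type, and `find_owners` are hypothetical; a real vdb would read CONTENTS from disk rather than hold it in memory.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class ContainmentMatch:
    """Matches if any wanted value appears in the tested collection."""
    wanted: FrozenSet[str]
    negate: bool = False

    def match(self, collection: Iterable[str]) -> bool:
        return bool(self.wanted.intersection(collection)) != self.negate


@dataclass(frozen=True)
class InstalledPkg:
    cpv: str
    category: str
    contents: FrozenSet[str]   # paths recorded in the package's CONTENTS


def find_owners(
    pkgs: Iterable[InstalledPkg],
    contents_match: ContainmentMatch,
    categories: Optional[Set[str]] = None,
) -> List[str]:
    """Return cpvs whose CONTENTS satisfy contents_match, optionally per category."""
    hits = []
    for pkg in pkgs:
        if categories is not None and pkg.category not in categories:
            continue
        if contents_match.match(pkg.contents):
            hits.append(pkg.cpv)
    return hits
```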

With potentially arbitrary grouping/construction/arrangement of restrictions, the resolver needs to be able to back out the conditional changes made while attempting a match down a particular restriction branch; fex, if

( category = dev-util && package = portage && ( ( use contains build && use not contains selinux ) || ( use contains xyz ) ) )
matching will hit the first chunk of the boolean or; if selinux is on, it needs to back out the build use flag enforcement (since that branch of matching failed), and then attempt the next match (forcing xyz on). It's kind of a thorny issue, complicated by the fact that certain conditionals cannot be toggled, your arch use flag for instance. So the use flag set needs to be protected somewhat (which is where portage.util.mappings.LimitedChangeSet comes into play). Further, if a use flag enforcement is requested, it must not be reversed in that branch of matching; you cannot require build on and also require build off. Instead, you back up to the point where build was flipped on, reverse the changes from that branch of matching, and see what other matches are possible. Basically a stack hack, with the package object (a malleable configurable package, specifically) tracking changes, and the restrictions themselves pushing/popping changes as required.
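The push/pop approach can be sketched with a small change-logging flag set. LimitedChangeSet is the real class named above, but this is not its API: `TrackedUseSet`, `ConflictError`, and `match_any` are hypothetical names, and each OR branch is simplified to a flat list of (flag, required state) pairs.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


class ConflictError(Exception):
    """Raised when a requested flag state cannot be enforced."""


class TrackedUseSet:
    """USE flags with a change log, so a failed match branch can be undone.

    Hypothetical sketch: locked flags (e.g. the arch flag) can never be
    toggled, and a flag forced one way cannot be forced the other way
    within the same branch.
    """

    def __init__(self, initial: Iterable[str], locked: Iterable[str]) -> None:
        self.flags: Set[str] = set(initial)
        self.locked = frozenset(locked)
        self._forced: Dict[str, bool] = {}
        self._log: List[Tuple[str, bool]] = []   # (flag, was_on) per change

    def require(self, flag: str, on: bool) -> None:
        if flag in self._forced:
            if self._forced[flag] != on:
                raise ConflictError(f"{flag} already forced the other way")
            return
        was_on = flag in self.flags
        if flag in self.locked and was_on != on:
            raise ConflictError(f"{flag} is locked")
        self._log.append((flag, was_on))
        self._forced[flag] = on
        if on:
            self.flags.add(flag)
        else:
            self.flags.discard(flag)

    def checkpoint(self) -> int:
        return len(self._log)

    def rollback(self, checkpoint: int) -> None:
        # Pop changes newest-first until we are back at the checkpoint.
        while len(self._log) > checkpoint:
            flag, was_on = self._log.pop()
            del self._forced[flag]
            if was_on:
                self.flags.add(flag)
            else:
                self.flags.discard(flag)


def match_any(use: TrackedUseSet, branches: Sequence[Sequence[Tuple[str, bool]]]) -> bool:
    """Boolean OR over branches; each branch is an AND of (flag, state) requirements."""
    for branch in branches:
        mark = use.checkpoint()
        try:
            for flag, state in branch:
                use.require(flag, state)
            return True
        except ConflictError:
            use.rollback(mark)
    return False


# ( ( build && !selinux ) || xyz ) with selinux locked on:
use = TrackedUseSet({"x86", "selinux"}, locked={"x86", "selinux"})
print(match_any(use, [[("build", True), ("selinux", False)], [("xyz", True)]]))  # True
print(sorted(use.flags))  # ['selinux', 'x86', 'xyz']
```

The first branch turns build on, then fails on the locked selinux flag, so rolling back to the checkpoint removes build before xyz is tried.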

This is a different mode of matching, initiated by the repository when it detects that it's working with mutable configuration wrappers (rather than an immutable wrapper, such as a binpkg that has its use flags locked). That chunk of code allows for tracking and blocking unwanted configuration reversions. But aside from the framework required to get to that point (something stable couldn't sanely support without a rewrite; oh gee, isn't this a rewrite? :), for the resolver to be truly package/format agnostic, it really can't know about conditionals. It deals in restrictions pulled from packages, graphing the packages/restrictions out; if it were left at this point, use depends would be supported, but the resolver would eat itself upon the first cycle:

  • pkg a depends="x? ( b[x] )"
  • pkg b depends="x? ( a )"
Innocent looking, unless you've caught Jason's comments on the half dozen ass-biters the resolver needs to handle. You can get out of that cycle, but to do so you need to find a way to build in the following order:
  1. USE="-x" pkg a
  2. USE="x" pkg b
  3. USE="x" pkg a
So... either the resolver is aware of conditionals, or it has a method to request from the repository a set of restrictions that represent "I want another wrapped pkg a, with whatever configuration changes are required to kindly drop that dependency on b". If the repository returns a match (meaning it figured out a configuration that meets the restrictions), the resolver can build that package, build b, then rebuild a in the originally requested configuration. It's a bit more complicated when you throw in the various types of depends, but that's the general approach to finding a way out of the cycle (actually, to finding any potential ways out of a cycle). This is simplifying it a bit, but the general gist is there. Unless I'm seriously crack-addled, this *should* also heavily simplify building from stages once use deps are deployed (mind you, that's my opinion, going by what I know of bootstrap*.sh).
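The "ask the repository for a configuration that drops the dependency" step can be illustrated with a toy model of the a/b cycle. Everything here is hypothetical: `REPO`, `config_without`, and `break_two_cycle` are made-up names, the search is brute force over IUSE, and the `b[x]` requirement is modelled only by building b with x.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

DepFn = Callable[[FrozenSet[str]], FrozenSet[str]]

# Toy repository: package -> (IUSE, function mapping a USE set to its deps).
# Mirrors pkg a depends="x? ( b[x] )" and pkg b depends="x? ( a )".
REPO: Dict[str, Tuple[FrozenSet[str], DepFn]] = {
    "a": (frozenset({"x"}), lambda use: frozenset({"b"}) if "x" in use else frozenset()),
    "b": (frozenset({"x"}), lambda use: frozenset({"a"}) if "x" in use else frozenset()),
}


def config_without(pkg: str, dep: str) -> Optional[FrozenSet[str]]:
    """Find a USE configuration of pkg that does not depend on dep (brute force)."""
    iuse, deps_for = REPO[pkg]
    flags = sorted(iuse)
    for bits in product([False, True], repeat=len(flags)):
        use = frozenset(f for f, on in zip(flags, bits) if on)
        if dep not in deps_for(use):
            return use
    return None


def break_two_cycle(a: str, b: str, wanted: FrozenSet[str]) -> List[Tuple[str, FrozenSet[str]]]:
    """Build order for a <-> b: relaxed a, then b, then a as originally requested."""
    relaxed = config_without(a, b)
    if relaxed is None:
        raise RuntimeError(f"no configuration of {a} drops its dependency on {b}")
    return [(a, relaxed), (b, wanted), (a, wanted)]


for name, use in break_two_cycle("a", "b", frozenset({"x"})):
    print(name, sorted(use))
```

The resulting order matches the three-step list above: a without x, b with x, then a with x.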

What remains use/slot dep work-wise is pretty much adding a hook to the configurable layer that allows the restrictions to hand off requests and be told either that the change is possible (and the configuration change is recorded), or that it's not. That portion is going to be fun, since it's basically a lot of introspection/navel gazing, but it's the remaining chunk to kill off in that particular area of work.

After that's finished, there's still stuff remaining, but most of it isn't as nasty (imho, at least). Still to implement:

  • CONTENTS sets (file objs produced by building a pkg, or the contents of a built pkg)
  • finish build operation class
  • fetchables integration (fortunately just integration, since Alec Warner has done that chunk)
  • vdb (/var/db/pkg, installed pkg db) querying (mostly written)
  • merge operation class
  • finish vdb off (it's a modifiable repository, with merge operation representing addition to the repository)
  • resolver integration into domain (another thing I can fortunately dodge, since Jason is the resolver go-to, not I)
  • UI work (yay)
  • binpkg repo (same mutable characteristics as vdb, so can lift from that)
That's a fairly high-level list, with good chunks of reusable code from experiments, but it's about right. As always, don't ask for a timeline, because frankly, it's not done till it's done :). Besides, it all sounds easy at a high level, till you get into the guts >:)

Any enhancements beyond that follow afterwards, which is where the new config format (samba.conf style) comes into play. First, a disclaimer: it's not a forced config change. The current make.conf and make.profile will be mapped on the fly into the internal config representation, and that's functionality I don't foresee ever having a reason to drop, barring lack of use. The samba style conf is the preferred internal format and, for advanced configuration, the preferred external format.

The reasons pretty much come down to how this beast is structured: determining what classes/callables to use for a specific object (say, your configured ebuild repository) is all done on the fly. The samba style config makes it *much* easier to specify domains (master object/settings groupings), group repositories (and/or create a repository set, aka PORTDIR + PORTDIR_OVERLAY with individual caches possible), group caches (slaved updates to multiple cache backends, fex), use different classes for almost all objects (a remote class for a repo, cache, or config), and in general, lots of crazy crap. Essentially, it allows you to represent trees of objects/settings streaming down from the domain, along with the interdependencies between those objects/settings.

The samba style conf won't be forced as the only option for advanced configurations either; I'm using samba style mainly because python bundles a module that can already parse the format (ConfigParser). Alternative formats, properly implemented, can use whatever syntax they like, as long as the python object encapsulating the config is accessible via a ConfigParser-like api. From there, it's pretty much a matter of sticking an autoexec section in the samba style config, or changing defaults. I'm still fleshing out portage.config.central's capabilities, but this will be possible once autoexec section support is added to central.
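A samba-style config with sections referencing other sections can be read with the stdlib `configparser` module. The section names, the `class`/`location` keys, and the `resolve` helper below are made up for illustration and are not portage.config.central's actual schema; reference cycles between sections are not handled.

```python
import configparser

# Hypothetical samba-style config; section names and keys are illustrative.
SAMPLE = """
[domain]
repositories = portdir overlay
vdb = vdb

[portdir]
class = ebuild_repository
location = /usr/portage

[overlay]
class = ebuild_repository
location = /usr/local/portage

[vdb]
class = vdb_repository
location = /var/db/pkg
"""

# Keys whose values name other sections (an assumption of this sketch).
REF_KEYS = ("repositories", "vdb")


def resolve(parser: configparser.ConfigParser, section: str) -> dict:
    """Return a section's settings, expanding section references recursively."""
    settings = dict(parser[section])
    for key in REF_KEYS:
        if key in settings:
            settings[key] = [resolve(parser, name) for name in settings[key].split()]
    return settings


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
domain = resolve(parser, "domain")
print([repo["location"] for repo in domain["repositories"]])
```

The domain ends up as a small tree of settings, which is the "objects/settings streaming down from the domain" shape described above.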

So the 'advanced' config is basically groupings of settings that map out to objects, which central (based off of a config) arranges into what will be the internal (potentially external) api. If you note that make.conf settings are pretty much scalar (N overlays, a single true ebuild repository, a single binpkg db, a single vdb, a single set of build settings), it's not too hard to see that mapping them into the internal representation is easy; basically you just chunk the data up and throw in whatever class overrides are required.
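Mapping the scalar make.conf settings into sections could look something like the sketch below. The function name, section names, and `class` values are hypothetical; only PORTDIR and PORTDIR_OVERLAY are handled, and the vdb location is hardcoded to /var/db/pkg.

```python
import configparser
from typing import Mapping


def from_make_conf(make_conf: Mapping[str, str]) -> configparser.ConfigParser:
    """Chunk scalar make.conf settings into samba-style sections (sketch)."""
    cfg = configparser.ConfigParser()
    overlays = make_conf.get("PORTDIR_OVERLAY", "").split()
    repo_names = ["portdir"] + [f"overlay{i}" for i in range(len(overlays))]
    # One domain section that references every repository section by name.
    cfg["domain"] = {"repositories": " ".join(repo_names), "vdb": "vdb"}
    cfg["portdir"] = {
        "class": "ebuild_repository",
        "location": make_conf.get("PORTDIR", "/usr/portage"),
    }
    # N overlays become N sections of the same class.
    for name, path in zip(repo_names[1:], overlays):
        cfg[name] = {"class": "ebuild_repository", "location": path}
    cfg["vdb"] = {"class": "vdb_repository", "location": "/var/db/pkg"}
    return cfg


cfg = from_make_conf({"PORTDIR": "/usr/portage", "PORTDIR_OVERLAY": "/usr/local/portage"})
print(cfg.sections())
```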

With that disclaimer thrown out, and the explanation for why a more powerful config is required, back to the extensibility bit, although frankly there isn't a hell of a lot to say about it beyond the details explained above. As stated, the config specifies the class/callable to use, which is imported on the fly and executed. This allows the guts to be replaced via config definitions; they're essentially plugins, so via the config you just change whatever implementation you've chosen for that grouping.
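Importing and calling a class named in a config section can be done with `importlib`. This is a hypothetical sketch: the `class` key holding a dotted path and passing the remaining keys as keyword arguments are assumptions, and `types.SimpleNamespace` is used only as a stand-in target so the example runs without any portage code.

```python
import importlib
from typing import Any, Mapping


def instantiate(section: Mapping[str, str]) -> Any:
    """Import the callable named by section["class"] and call it with the rest."""
    module_name, _, attr = section["class"].rpartition(".")
    if not module_name:
        raise ValueError(f"need a dotted path, got {section['class']!r}")
    target = getattr(importlib.import_module(module_name), attr)
    kwargs = {key: value for key, value in section.items() if key != "class"}
    return target(**kwargs)


repo = instantiate({"class": "types.SimpleNamespace", "location": "/usr/portage"})
print(repo.location)
```

Swapping in a different implementation is then just a matter of pointing `class` at another dotted path that accepts the same settings.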

Bored peeps can do remote implementations of classes, or better implementations of existing (legacy, I might add) specs; the binpkg format could stand an overhaul, and the vdb on-disk layout blows, badly. Adding refcounts plus a global CONTENTS db to a vdb derivative would probably make *many* people happy who hate .keep files and slow CONTENTS lookups :)
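As a rough idea of what refcounts plus a global CONTENTS index could mean, here is an in-memory sketch. `ContentsDB` and its methods are hypothetical, nothing here reflects an actual vdb format, and it assumes each package is added only once.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


class ContentsDB:
    """Global path -> owners index with directory refcounts (sketch).

    A directory is reported as removable only when no installed package
    still references it, which is the job .keep files do today.
    """

    def __init__(self) -> None:
        self.packages: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {}
        self.owners: Dict[str, Set[str]] = {}
        self.dir_refs: Counter = Counter()

    def add(self, pkg: str, files: Iterable[str], dirs: Iterable[str]) -> None:
        files, dirs = frozenset(files), frozenset(dirs)
        self.packages[pkg] = (files, dirs)
        for path in files:
            self.owners.setdefault(path, set()).add(pkg)
        for d in dirs:
            self.dir_refs[d] += 1

    def remove(self, pkg: str) -> List[str]:
        """Unregister pkg; return directories no package references any more."""
        files, dirs = self.packages.pop(pkg)
        for path in files:
            holders = self.owners.get(path)
            if holders is not None:
                holders.discard(pkg)
                if not holders:
                    del self.owners[path]
        freed = []
        for d in dirs:
            self.dir_refs[d] -= 1
            if self.dir_refs[d] <= 0:
                del self.dir_refs[d]
                freed.append(d)
        return sorted(freed)

    def who_owns(self, path: str) -> Set[str]:
        # A single dict lookup instead of scanning every package's CONTENTS.
        return set(self.owners.get(path, ()))
```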

On the plus side, no changes have been required of the portage tree (which would indicate design flaws, or bad previous designs ;), so things are proceeding. Hell, a simple config change allowed me to dodge the metadata transfer after syncing; my setup runs directly off of $PORTDIR/metadata/cache, treating it (and the repository it's bound to) as unmodifiable. Frozen support, essentially. The flat_list format (the database name for the metadata/cache layout) sucks, and is very bad on its own for --searchDesc style ops (which access each pkg's metadata), but options are open for alternatives.

Meanwhile, back to the cage till the taming of this beast of a rewrite is complete. If you're interested in helping, pop into #gentoo-portage on irc.freenode.net and hunt for someone who appears active, or just email dev-portage.


Posted by Brian Harring | Categories: General Gentoo, Portage news