Sun Nov 23 09:26:01 CET 2014

DBus, FreeDesktop, and lots of madness

For various reasons I've spent a bit of time dissecting how dbus is supposed to work. It's a rather funny game, but it is confusing to see people trying to use these things. It starts quite hilarious, the official documentation (version 0.25) says:
The D-Bus protocol is frozen (only compatible extensions are allowed) as of November 8, 2006. However, this specification could still use a fair bit of work to make interoperable reimplementation possible without reference to the D-Bus reference implementation. Thus, this specification is not marked 1.0
Ahem. After over a decade (first release in Sept. 2003!!) people still haven't documented what it actually does.
Allrighty then!
So we start reading:
"D-Bus is low-overhead because it uses a binary protocol"
then
"Immediately after connecting to the server, the client must send a single nul byte."
followed by
"A nul byte in any context other than the initial byte is an error; the protocol is ASCII-only."
Mmmh. Hmmm. Whut?
So anyway, let's not get confused ... we continue:
The string-like types are basic types with a variable length. The value of any string-like type is conceptually 0 or more Unicode codepoints encoded in UTF-8, none of which may be U+0000. The UTF-8 text must be validated strictly: in particular, it must not contain overlong sequences or codepoints above U+10FFFF. Since D-Bus Specification version 0.21, in accordance with Unicode Corrigendum #9, the "noncharacters" U+FDD0..U+FDEF, U+nFFFE and U+nFFFF are allowed in UTF-8 strings (but note that older versions of D-Bus rejected these noncharacters).
So there seems to be some confusion what things like "binary" mean, and UTF-8 seems to be quite challenging too, but no worry: At least we are endian-proof!
A block of bytes has an associated byte order. The byte order has to be discovered in some way; for D-Bus messages, the byte order is part of the message header as described in the section called “Message Format”. For now, assume that the byte order is known to be either little endian or big endian.
Hmm? Why not just define network byte order and be happy? Well ... we're even smarterer:
The signature of the header is: "yyyyuua(yv)"
Ok, so ...
1st BYTE Endianness flag; ASCII 'l' for little-endian or ASCII 'B' for big-endian. Both header and body are in this endianness.
We actually waste a BYTE on each message to encode endianness, because ... uhm ... we run on ... have been ... I don't get it. And why a full byte (with a random ASCII mnemonic) instead of a bit? The whole 32bit bytemash at the beginning of the header could be collapsed into 8 bit if it were designed. Of course performance will look silly if you use an ASCII protocol over the wire with generous padding ... so let's kdbus because performance. What teh? Just reading this "spec" makes me want to get more drunk.

Here's a radical idea: Fix the wire protocol to be more sane, then fix the configuration format to not be hilarious XML madness (which the spec says is bad, so what were the authors not thinking?)
But enough about this idiocy, let's go up the stack one layer. The freedesktop wiki uses gdbus output as an API dump (would be boring if it were an obvious format), so we have a look at it:
      @org.freedesktop.systemd1.Privileged("true")
      UnlockSessions();
Looking through the man page there's no documentation what a line beginning with "@" means. Because it's obvious!@113###
So we read through the gdbus sourcecode and start crying (thanks glib, I really needed to be reminded that there are worse coders than me). And finally we can correlate it with "Annotations"

Back to the spec:
Method, interface, property, and signal elements may have "annotations", which are generic key/value pairs of metadata. They are similar conceptually to Java's annotations and C# attributes.
I have no idea what that means, but I guess as the name implies it's just a textual hint for the developer. Or not? Great to see a specification not define its own terms.

So what I interpret into this fuzzy text is most likely very wrong, and someone should define these items in the spec in a way that can be understood before you understood the spec. Less tautological circularity and all that ...
Let's assume then that we can ignore annotations for now ... here's our next funny:
The output of gdbus is not stable. If you were to write a dbus listener based on the documentation, well, the order of functions is random, so it's very hard to compare the wiki API dump 'documentation' with your output.
Oh great ... sigh. Grumble. Let's just do fuzzy matching then. Kinda looks similar, so that must be good enough.
(Radical thought: Shouldn't a specification be a bit less ambiguous and maybe more precise?)

Anyway. Ahem. Let's just assume we figure out a way to interact with dbus that is tolerable. Now we need to figure out what the dbus calls are supposed to do. Just for fun, we read the logind 'documentation':
      @org.freedesktop.systemd1.Privileged("true")
      CreateSession(in  u arg_0,
                    in  u arg_1,
                    in  s arg_2,
                    in  s arg_3,
                    in  s arg_4,
                    in  s arg_5,
                    in  u arg_6,
                    in  s arg_7,
                    in  s arg_8,
                    in  b arg_9,
                    in  s arg_10,
                    in  s arg_11,
                    in  a(sv) arg_12,
                    out s arg_13,
                    out o arg_14,
                    out s arg_15,
                    out h arg_16,
                    out u arg_17,
                    out s arg_18,
                    out u arg_19,
                    out b arg_20);
with the details being:
CreateSession() and ReleaseSession() may be used to open or close login sessions. These calls should never be invoked directly by clients. Creating/closing sessions is exclusively the job of PAM and its pam_systemd module.
*cough*
*nudge nudge wink wink*
Zero of twenty (!) parameters are defined in the 'documentation', and the same document says that it's an internal function that accidentally ended up in the public API (instead of, like, being, ah, a private API in a different namespace?)
Since it's actively undefined, and not to be used, a valid action on calling it would be to shut down the machine.

Dogbammit. What kind of code barf is this? Oh well. Let's try to figure out the other functions -
LockSession() asks the session with the specified ID to activate the screen lock.
And then we look at the sourcecode to learn that it actually just calls:
session_send_lock(session, streq(sd_bus_message_get_member(message), "LockSession"));
Which calls a function somewhere else which then does:
        return sd_bus_emit_signal(
                        s->manager->bus,
                        p,
                        "org.freedesktop.login1.Session",
                        lock ? "Lock" : "Unlock",
                        NULL);
So in the end it internally sends a dbus message to a different part of itself, and that sends a dbus signal that "everyone" is supposed to listen to.
And the documentation doesn't define what is supposed to happen, instead it speaks in useless general terms.

      PowerOff(in  b arg_0);
      Reboot(in  b arg_0);
      Suspend(in  b arg_0);
      Hibernate(in  b arg_0);
      HybridSleep(in  b arg_0);
Here we have a new API call for each flavour of power management. And there's this stuff:
The main purpose of these calls is that they enforce PolicyKit policy
And I have absolutely no idea about the mechanism(s) involved. Do I need to query PK myself? How does the dbus API know? Oh well, just read more code, and interpret it how you think it might work. Must be good enough.

While this exercise has been quite educational in many ways I am surprised that this undocumented early-alpha quality code base is used for anything serious. Many concepts are either not defined, or defined by the behaviour of the implementation. The APIs are ad-hoc without any obvious structure, partially redundant (what's the difference between Terminate and Kill ?), and not documented in a way that allows a reimplementation.
If this is the future I'll stay firmly stuck in the past ...

Posted by Patrick | Permalink

Sat Nov 15 05:14:02 CET 2014

Having fun with networking

Since the last minor upgrade my notebook has been misbehaving in funny ways.
I presumed that it was NetworkManager being itself, but ... this is even more fun. To quote from the manpage:
If the hostname is currently blank, (null) or localhost, or force_hostname is YES or TRUE or 1 then dhcpcd
     sets the hostname to the one supplied by the DHCP server.
Guess what. Now my hostname is 192.168.0.7, I mean 192.168.0.192.168.0.7, err...
And as a bonus this even breaks X in funny ways so that starting new apps becomes impossible. The fix?
Now the hostname is set to "localhorst". Because that's the name of the machine!111 (It doesn't have an explicit name, so localhost used to be ok)

Posted by Patrick | Permalink

Sun Nov 9 12:06:17 CET 2014

gentooJoin 2004/04/11

How time flies!
gentooJoin: 2004/04/11

Now I feel ooold

Posted by Patrick | Permalink

Wed Nov 5 09:38:47 CET 2014

Just a simple webapp, they said ...

The complexity of modern software is quite insanely insane. I just realized ...
Writing a small webapp with flask, I've had to deal with the following technologies/languages:
  • System package manager, in this case portage
  • SQL DBs, both SQLite (local testing) and PostgreSQL (production)
  • python/flask, the core of this webapp
  • jinja2, the template language usually used with it
  • HTML, because the templates don't just appear magically
  • CSS (mostly hidden in Bootstrap) to make it look sane
  • JavaScript, because dynamic shizzle
  • (flask-)sqlalchemy, ORMs are easier than writing SQL by hand when you're in a hurry
  • alembic, for DB migrations and updates
  • git, because version control
So that's about a dozen things that each would take years to master. And for a 'small' project there's not much time to learn them deeply, so we staple together what we can, learning as we go along ...

And there's an insane amount of context switching going on, you go from mangling CSS to rewriting SQL in the span of a few minutes. It's an impressive polyglot marathon, but how is this supposed to generate sustainable and high-quality results?

Und then I go home in the evening and play around with OpenCL and such things. Learning never ends - but how are we going to build things that last for more than 6 months? Too many moving parts, too much change, and never enough time to really understand what we're doing :)

Posted by Patrick | Permalink