Byzantine Askemos Language Layer

Requestprocessing Walkthrough

This document gives a bird's-eye view of request processing within Askemos/BALL. It mostly explains what user space applications do not have to worry about. There is also a more [[http://askemos.org/Abb8999dd38524dcc113f977d378a9ee0/?_v=wiki&_id=617|general version with pictures]].

Let's follow what happens when I upload a new version of the source code.

Contact The Net

In my home directory at ~/Askemos there is a WebDAV file system mounted, which connects to my local peer of the network. This local peer is just another node, but it is authorised (by means of attributes in its SSL certificate) to sign requests in my name - hence such a node is often called a representative, as opposed to "anonymous" or "notarian" nodes. To send the file off I use:

      $ cp askemos-0.8.8.tar.gz ~/Askemos/CC/subdir
     

Incoming Request Dispatch

As long as the connection limit is not reached, the peer accepts new connections. HTTPS connections are forwarded to an sslmgr subprocess. Since the openssl license is not GPL compatible, we don't link the core against this library and use a small C program instead. The design makes a nice framework to plug in other connection handlers; e.g., sslmgr itself supports SOCKS5 to route outgoing SSL traffic via Tor.

The request is parsed into an internal message object. The BALL code keeps small requests in main memory. Beyond a configurable size (128k by default) the body of the request is split into sections (similar to BitTorrent) and an index file is used instead of the plain data.
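
To illustrate the idea, here is a minimal Python sketch - not the actual BALL code - of splitting a body into fixed-size sections and addressing them by hash; the 128k default comes from the paragraph above, everything else (names, hash function, index format) is made up:

      # Minimal sketch only; BALL's real section/index format is not shown.
      import hashlib

      SECTION_SIZE = 128 * 1024        # the configurable default named above

      def split_body(body: bytes, section_size: int = SECTION_SIZE):
          """Split a large body into sections addressed by their hash and
          return the index that is used instead of the plain data."""
          sections, index = {}, []
          for off in range(0, len(body), section_size):
              chunk = body[off:off + section_size]
              digest = hashlib.sha256(chunk).hexdigest()
              sections[digest] = chunk             # fetched by hash later
              index.append(digest)
          return index, sections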

The message is then sent to my personal "entry point". This entry point is an object in the Askemos space. Such objects are quite heavyweight in comparison to objects in the sense of object-oriented programming languages, but otherwise not different. To stress the difference they go under the name "place" where needed.

So the entry point object will consult its user space code to find a method to handle the request - in this case, forward it to the place it knows under the name "CC". But that is the next step already. Before that, the representative will find that my entry point has eight copies and forward the request to all of them. Once enough peers have received a copy of the request message, it is sent to the local copy too.
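
That forwarding step can be pictured roughly like this - a hypothetical sketch, not BALL's dispatch code; send_to() and the acknowledgement handling are stand-ins:

      # Hypothetical sketch of request dissemination; send_to() returning
      # True on receipt is a stand-in, not BALL's actual API.
      def disseminate(request, copies, local_copy, enough, send_to):
          acks = sum(1 for peer in copies if send_to(peer, request))
          if acks >= enough:                       # "enough peers" threshold
              local_copy.deliver(request)          # only now process locally
          return acks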

Find A Consensus

When a message is delivered to a place, the system will send an "echo" message to all peers to initiate the consensus protocol (Byzantine agreement). Then it will process the request locally:

Just as objects have a type, so do these places. The type is encoded in yet another - static - place: the contract according to which the place will act (called the 'action-document' in the source code). The system core reads the contract, finds an interpreter and invokes it with a reference to the place and the message. (A process very similar to the familiar #!-notation.)
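
In spirit the dispatch looks about like this rough sketch; the registry and the attribute names are invented here for illustration:

      # Rough sketch of contract dispatch; the registry and the attribute
      # names (action_document, interpreter_name) are invented here.
      INTERPRETERS = {}    # e.g. "xslt-scheme-sql", "cgi", "webdav", ...

      def dispatch(place, message):
          contract = place.action_document          # the static contract place
          run = INTERPRETERS[contract.interpreter_name]
          return run(place, message)                # much like a #! script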

In the case of my entry point the action-document will invoke the messy-yet-powerful XSLT/Scheme/SQL reflexive scripting environment. It could be anything, e.g., a CGI/SCGI call.

The contract in general has two parts (each of which may be a no-op, but not both): one for the "propose" phase and one for the "commit" phase.

The "propose" function must return a valid input for the "commit" part and a hash sum of the proposal to be used in the byzantine agreement. If the propose function terminates with exception (or timeout) the request processing is aborted and an error message is sent back. The "commit" part must never fail (or the node is counted as failing and may be resynchronised later - note that the reason could be a bug in the contract; the core does not care, either it resyncs to the new version or to the old one, if enough peers did not commit the transaction).

When enough "echo" messages where counted, the peer will communicate the hash of the proposal in a "ready" message to the other copies. Once a peer has seen enough "ready" messages the "commit" part is executed. It will change the state of the snapshot. (How exactly is a matter of the glue code for the particular interpreter. The CoreAPI incompletely document describes an XML implementation of the available API - TODO: document use of variables to create multiple references to result objects at once.)

Keep The Record And Continue

The new state is now saved to persistent memory. Then outgoing requests are sent out.

A configurable number of storage adaptors are updated with the new persistent state. The "fsm" (for "file system mirror"), for example, stores an XML-encoded RDF structure of the metadata for each object. This RDF structure (German description) includes a reference to the object's data (a BLOB from the system's point of view), which is stored in a second file under its hash value.
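
The layout can be pictured with this little sketch: the BLOB goes into a file named after its hash, and a metadata file references that hash. The real fsm adaptor writes XML-encoded RDF; this sketch deliberately does not:

      # Illustrative sketch of a file-system mirror: the object's BLOB is
      # stored under its hash, a second file holds the metadata and points
      # to that hash.  The real fsm adaptor writes XML-encoded RDF instead.
      import hashlib, os

      def store(mirror_dir: str, object_id: str, metadata: str, blob: bytes):
          os.makedirs(mirror_dir, exist_ok=True)
          digest = hashlib.sha256(blob).hexdigest()
          with open(os.path.join(mirror_dir, digest), "wb") as f:
              f.write(blob)                        # data, addressed by hash
          with open(os.path.join(mirror_dir, object_id), "w") as f:
              f.write(metadata + "\nblob: " + digest + "\n")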

Again

The dispatch layer will now forward the outgoing message objects to their receivers. In the case of the "CC" object, seven peers will find themselves in the set of copies and hence re-enter the consensus protocol with the new object in focus. My representative, however, is not among them. Since it does not know what is going to happen, it will wait for status reports from the other nodes.

The other peers will find the WebDAV interpreter responsible for "CC" (and all following path components) and store the message object accordingly. This WebDAV implementation keeps all data and directory structure in static documents (simply implemented: their propose phase always terminates with an exception). Hash checks are available to prove that no local tampering has happened, but since the peer should at least trust itself, they are done offline. (Such a recursive hash structure is also called a Merkle tree; it is great for implementing all kinds of version control systems.) When done, they will send an HTTP 201 response back to my representative.
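
Such a recursive hash can be computed along these lines - an illustration only, not the BALL implementation:

      # Minimal Merkle-style hash over a nested directory structure: the
      # hash of a directory covers the hashes of everything below it.
      import hashlib

      def merkle(node) -> str:
          if isinstance(node, bytes):                       # file content
              return hashlib.sha256(node).hexdigest()
          entries = sorted((name, merkle(child)) for name, child in node.items())
          listing = "\n".join(name + " " + h for name, h in entries)
          return hashlib.sha256(listing.encode()).hexdigest()

      tree = {"CC": {"subdir": {"askemos-0.8.8.tar.gz": b"..."}}}
      root = merkle(tree)    # any change below changes the root hash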

Finished

As soon as the success message arrives at my representative, the HTTP implementation will stop sending the 102 responses that keep the WebDAV client waiting and forward the response.
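
The keep-alive behaviour amounts to something like this hypothetical sketch; the real HTTP code is not shown, and client as well as the result queue are stand-ins:

      # Hypothetical sketch: keep sending "102 Processing" until the final
      # result (e.g. "201 Created") arrives, then forward it to the client.
      import queue

      def answer_webdav(client, results: "queue.Queue[str]", interval=5.0):
          while True:
              try:
                  final = results.get(timeout=interval)
              except queue.Empty:
                  client.send(b"HTTP/1.1 102 Processing\r\n\r\n")
                  continue
              client.send(("HTTP/1.1 " + final + "\r\n\r\n").encode())
              return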

By now the "cp" command is done and I get back the shell prompt.

For large files like this one, the other peers will now start to download the torrent-style sections. While I may already continue my work, including copying more files or moving the result file around, I should give the process a chance to complete before switching my representative off.

On Failure Recovery

Now let's look at a variation of the processing. Take the ticket tracker: it is used for a variety of projects, some public, some hidden. My representative is often enough offline while a ticket is submitted.

The ticket tracker "is supported" by eight peers (i.e., there are eight peers which are supposed to participate in the consensus about accepting a new ticket). The request is replicated from the sender's representative (which in turn could be a public terminal for anonymous messages, since those are deemed acceptable here) and the seven peers that are reachable accept the ticket and commit a new version of their state.

When my representative is started again - and it is configured to run in synchronous mode, as recommended for representatives, while there is a trade-off with respect to speed for notarian nodes - it will check with the quorum (the set of nodes maintaining a copy of the object), find itself lagging behind and resynchronise to the current version before it even displays its state.
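
The synchronisation check itself boils down to asking the quorum which version is current and catching up when the local copy lags behind - a hypothetical sketch, with fetch_version and fetch_state as stand-ins for the real synchronisation messages:

      # Hypothetical sketch of the start-up check; fetch_version() and
      # fetch_state() stand in for the real synchronisation messages.
      def resync(local, quorum, fetch_version, fetch_state):
          versions = [fetch_version(peer) for peer in quorum]
          current = max(versions)                  # latest committed version
          if local.version < current:
              local.state = fetch_state(quorum, current)   # catch up first
              local.version = current
          # only now does the node display the object's state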


Last modification: Sun, 21 Jul 2013 10:08:46 +0200

Author(s): jfw
