GoblinShare: Secure, Peer-to-Peer File-Sharing with Goblins

Juliana Sims

This year, instead of participating in the Autumn Lisp Game Jam, Spritely decided to make some non-game demos to show off our tech. To that end, I spent the week of November 3rd writing GoblinShare, a secure, peer-to-peer file-sharing utility using the Guile port of Magenc for storage and distribution, relying on Goblins for the peer-to-peer connection abstraction, and delivered over Tor. Thanks to Goblins, this turned out to be super easy to implement. Let me show you how!

🎶️ Demos; glorious demos! 🎶️

We at Spritely absolutely love our technology demos. We know it can be tricky to understand an unfamiliar paradigm, but we also think hands-on demos help. To that end, we build lots of usable demos to try. We particularly like building games during the Lisp Game Jam so we can connect with the broader Lisp community, encourage more developers to use our technology, and end up with something fun that people want to use. After all, we can't build the future of the social web all by ourselves! This year, though, we decided to focus not on showing off work we've already done but on pushing our work in a new direction, and it felt best to do that outside the context of the jam.

My assignment (which I did choose to accept) was tripartite: port Magenc to Guile, implement a simple file-sharing tool (which I decided to somewhat model on Magic Wormhole), and port Crystal to Guile. The Crystal work remains ahead, but I've ported Magenc and built GoblinShare as the file-sharing tool. I treated the Magenc port as prep work and GoblinShare proper as the equivalent of a jam entry, so GoblinShare will be the focus of this post. First, though, I'll provide an overview of these two projects, what they do, and how they do it.

Magenc

Magenc is an encrypted, distributed, content-addressed data store relying on magnet URLs as capabilities, inspired by Tahoe-LAFS. It does not use Goblins, but it is built with capability security concepts in mind. It consists of three components, each functioning as a subcommand under a single program: magenc serve, which starts a server; magenc put, which encrypts and POSTs files to a server; and magenc get, which GETs files from a server. (Although the architecture is designed to support arbitrary remote stores, only a web store using HTTP has been implemented so far.)

A quick summary

The core of Magenc's functionality is the magnet URL, which looks like magnet:?xt=<url-encoded-urn>&ek=<base64-url-encoded-string>&es=<string> and uniquely identifies a specific file. Decomposing the query parameters (everything after ?), xt refers to the exact topic identifying a specific file, which is simply the sha256 hash of the encrypted manifest (which may be a raw object as discussed below). ek is the encryption key used to encrypt the file. es is the encryption suite used for encryption. The Guile port of Magenc uses AES in Galois/Counter Mode (AES-GCM) for encryption; the Racket version uses AES in Counter Mode (AES-CTR). Of this information, only the exact topic is ever known by the server. Together, the information in a magnet URL is both necessary and sufficient to identify, access, and decrypt a file on a given server. This makes a magnet URL a capability.
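To make that structure concrete, here is a minimal sketch in Guile that splits a magnet URL's query parameters into an association list. It is not Magenc's own parser: the procedure name magnet-url->alist is hypothetical, the field values are placeholders, and it relies only on core string procedures and uri-decode from Guile's (web uri) module.

```scheme
(use-modules (web uri))

;; Hypothetical helper, not part of Magenc: turn the query parameters of a
;; magnet URL into an association list of (name . decoded-value) pairs.
(define (magnet-url->alist url)
  (let* ((query-start (string-index url #\?))
         (query (substring url (+ 1 query-start))))
    (map (lambda (param)
           (let ((eq-pos (string-index param #\=)))
             (cons (substring param 0 eq-pos)
                   (uri-decode (substring param (+ 1 eq-pos))))))
         (string-split query #\&))))

;; Placeholder values; a real URL carries a real hash and key.
(define example-fields
  (magnet-url->alist
   "magnet:?xt=urn%3Asha256d%3AEXAMPLETOPIC&ek=EXAMPLEKEY&es=AES-256-GCM"))

(display (assoc-ref example-fields "xt"))
(newline)
```

Decoding the xt field turns the %3A escapes back into colons, which is why the exact topic reads as a URN once extracted.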

The server only has access to encrypted binary data a client sends. When the server receives a POST request with attached binary data, it hashes the provided data using sha256, encodes the unhashed data using base64, and stores the encoded data in a key-value store where the hash ‒ the exact topic ‒ is the key. It then converts the exact topic into a URI object from Guile's (web uri) module before sending the URI back in the content-location field of the response object. As a safety redundancy, the client checks to make sure that the exact topic it gets back is what it expects. When the server receives a GET request with an exact topic in the content-location field, it looks up the associated data, decodes the base64 into binary data, and sends that back as the body of a response. That's all the server knows and does; everything else is handled by the client.
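The content-addressing idea can be sketched in a few lines of Guile. This is an illustration under loud assumptions, not Magenc's server code: toy-digest is a stand-in checksum rather than SHA-256 (Guile's core has no SHA-256), the base64 step is omitted, and every name here is hypothetical.

```scheme
(use-modules (rnrs bytevectors))

;; Stand-in digest for illustration ONLY: a sum of bytes rendered in hex.
;; It is NOT SHA-256 and is trivially easy to collide.
(define (toy-digest bv)
  (let loop ((i 0) (sum 0))
    (if (= i (bytevector-length bv))
        (number->string sum 16)
        (loop (+ i 1) (+ sum (bytevector-u8-ref bv i))))))

;; Hypothetical in-memory store keyed by the digest of what it holds.
(define (make-content-store)
  (make-hash-table))

;; Store DATA under its own digest and hand the digest back to the caller.
(define (content-put! store data)
  (let ((topic (toy-digest data)))
    (hash-set! store topic data)
    topic))

;; Look DATA up by digest; returns #f when the topic is unknown.
(define (content-get store topic)
  (hash-ref store topic))

(define demo-content-store (make-content-store))
(define demo-topic
  (content-put! demo-content-store (string->utf8 "opaque encrypted bytes")))
```

Because the key is derived from the data, the store never needs to be told what a blob is called, and storing the same bytes twice yields the same key.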

magenc put handles the encryption and generation of cryptographic inputs. When passed a file (and, optionally, server URL), Magenc first chunks the file as necessary, encrypting each chunk to transmit separately so that the server does not necessarily know the chunks are related (though correlation of connections and network traffic by a malicious server or observer could be used to reliably guess interrelation). Then, the client prints a magnet URL which identifies the entrypoint to retrieve the file ‒ a "raw" object if the file fits in one chunk or a "manifest" if it doesn't. As discussed above, the magnet URL also embeds cryptographic information.

A file's magnet URL can be used with magenc get to retrieve that file. When passed a magnet URL (and, if necessary, server URL), magenc get first retrieves and decrypts the object identified by the exact topic. If the object is a "raw" object, Magenc extracts the binary data and writes it to either standard output or a given file. If the object is a "manifest" object, Magenc extracts the exact topics for each chunk from the manifest, retrieves and decrypts each chunk in order, then writes out the complete file in the same way.
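The raw-versus-manifest dispatch can be sketched as follows. This is a simplified model, not Magenc's implementation: encryption is left out entirely, exact topics are plain strings, objects are tagged lists, and chunk contents are strings rather than binary data. All of those representations are assumptions made for the sketch.

```scheme
(use-modules (ice-9 match))

;; Toy object table. Exact topics are plain strings and objects are tagged
;; lists; both choices are assumptions made for this sketch.
(define demo-objects
  '(("topic-a" . (raw "Hello, "))
    ("topic-b" . (raw "world!"))
    ("topic-m" . (manifest "topic-a" "topic-b"))))

(define (fetch-object topic)
  (or (assoc-ref demo-objects topic)
      (error "unknown exact topic" topic)))

;; Return the complete contents reachable from TOPIC.
(define (retrieve-contents topic)
  (match (fetch-object topic)
    ;; A raw object carries its data directly.
    (('raw data) data)
    ;; A manifest lists chunk topics; fetch each in order and join them.
    (('manifest chunk-topics ...)
     (apply string-append (map retrieve-contents chunk-topics)))))

(display (retrieve-contents "topic-m"))
(newline)
```

Retrieving topic-a alone returns only the first chunk, while retrieving topic-m reassembles the whole value.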

The original version of Magenc includes a more thorough write-up. Aside from the differing encryption suites and commandline interfaces, that write-up also holds for the Guile version.

A simple example

Some things are easier to understand in action, so let's walk through a trivially simple example of storing and retrieving a file using Magenc. We will skip build instructions, which are provided in the repository. For ease of demonstration, we will use the default server configuration, which launches a process listening at http://127.0.0.1:8118 (http://localhost:8118) and stores data in memory rather than writing it to disk (Magenc also includes a backend relying on Goblins' bloblin persistence store to efficiently store files on disk). We will use what is called convergent encryption, which uses part of the unencrypted file itself as a cryptographic input, to ensure that our example file produces the same magnet URL every time it is stored. (If you are reading this in the future and Magenc's cryptography has changed, the magnet URLs may no longer match.)

First, let's start a server:

magenc serve

This should print the address where the server is listening, like:

Server running at: http://127.0.0.1:8118

We can leave that running in one terminal session and use a different one for everything else. Now let's create our example file:

echo "Hello! I'm an example file!" > example.txt

Next, we'll store the file with magenc put example.txt --convergent. This prints the magnet URL magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM. (Note that argument ordering is relevant; to simplify the implementation of the CLI, Magenc expects option arguments ‒ anything starting with - or -- ‒ after positional arguments.) Finally, we can retrieve the file:

magenc get "magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM"

Make sure to quote the magnet URL; otherwise, the shell will treat each & as a control operator, cutting the URL short and running the command in the background. This command produces the output:

Hello! I'm an example file!

And that's it! Simple! There are a few more options available for the various subcommands, each explained with magenc --help.

GoblinShare

In addition to being an application, Magenc is also a library. GoblinShare is implemented using Magenc in this capacity. All it adds is a purpose-built UI and a wrapper around the in-memory store backend which makes it easier to use through a Goblins actor. The sending peer launches a Tor netlayer, generates a sturdyref ‒ a persistent object reference which can be shared out-of-band ‒ for the store actor, and adds that to the magnet URL with the additional acceptable source (as) field. The receiving peer sets up its own netlayer, "enlivens" the sturdyref, and downloads the file. Once the file is retrieved, the sending peer terminates, removing the associated data from memory.

An example

We will perform essentially the same tasks for this example as we did for the Magenc example. There are a few differences to note. First, GoblinShare does not provide an option for convergent encryption. This simply wouldn't make sense for an ephemeral file-sharing tool. For our present purposes, this means that you will almost certainly get a different magnet URL than is produced here. Second, it is necessary to manually run the tor daemon. Build and usage instructions in the GoblinShare repository cover that, so the following example assumes a running daemon.

We will reuse the same example file as above, so feel free to reuse the previous command to create it. Then, all we have to do is send it:

goblinshare send example.txt

to get the magnet URL. The sending process will wait for the file to be retrieved, so switch to another terminal session to retrieve the file:

goblinshare receive "magnet:?xt=...&ek=...&es=...&as=..."

Because we're using Tor as our network layer, it can take a few seconds for the connection to be established and the data to be sent, even though we are connecting to a local server. Since our example file is small enough to fit into a single chunk, the delay isn't very long, but a larger file can take quite a while to transfer. Eventually, you will receive the expected output:

Hello! I'm an example file!

At this point, both the sending and receiving processes will terminate, and that's that! You've successfully shared a file with Goblins! Just like Magenc, GoblinShare has a (very) few options you can see with goblinshare --help. Unlike Magenc, option arguments can be supplied before or after positional arguments as long as they follow the associated subcommand.

A note on Magic Wormhole

I mentioned that I took initial inspiration for this project from Magic Wormhole. In practice, though, the only similarity wound up being the names of the subcommands send and receive. Magic Wormhole relies on a central relay server because it uses "wormhole codes" to facilitate oral communication of the relevant capabilities. These codes seem to be keys mapping to a fuller capability and therefore requiring a coordination point. GoblinShare, by contrast, chooses to instead provide less-human-friendly magnet URLs which allow fully peer-to-peer file transfer because they encode all necessary information.

How easy was it, really?

As I mentioned near the beginning of this post, the most surprising part of implementing GoblinShare was how easy it was. The core functionality was implemented in about half a day, though a failed attempt to get GoblinShare to manage the tor daemon stretched the initial implementation out to about a day and a half. In the end, the UI and business logic of GoblinShare together require 250 lines of code including module headers, whitespace, docstrings, and inline comments (but not license headers). That's very little code! Magenc took a bit more work, but a lot of that was getting the cryptography and URL abstractions playing nicely ‒ and, admittedly, there's still room to make Magenc more approachable as a library. Still, the port took a little under a week all told, including time to implement tests, and came out to somewhere in the neighborhood of 1400 lines of code with similar caveats.

Numbers are one thing, but the simplicity of GoblinShare really comes through in the code itself. The core logic comes down to three procedures, two in the send logic (with an additional helper) and one in the receive logic. Let's break these down and walk through them to build up the relevant logic.

As a general note, the UI layers of GoblinShare and Magenc use SRFI-37 to convert commandline arguments into an association list of options and arguments. The main procedures in (goblinshare) and (magenc) simply parse the commandline into the appropriate arguments which they pass to the procedure associated with a given subcommand.
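For readers who haven't used SRFI-37, here is a small sketch of how args-fold turns arguments into an association list. It is not GoblinShare's actual option table: demo-options and parse-commandline are hypothetical names, and only the --output option mentioned later in this post is modeled.

```scheme
(use-modules (srfi srfi-37))

;; Hypothetical option table with one option, --output, which requires an
;; argument. Its processor adds an (output . value) entry to the result.
(define demo-options
  (list (option '("output") #t #f
                (lambda (opt name arg result)
                  (acons 'output arg result)))))

;; Fold commandline arguments into an association list. Each positional
;; argument becomes an (operand . value) entry, kept in commandline order.
(define (parse-commandline args)
  (reverse
   (args-fold args
              demo-options
              ;; Called for any option missing from the table.
              (lambda (opt name arg result)
                (error "unrecognized option" name))
              ;; Called for each positional argument.
              (lambda (operand result)
                (acons 'operand operand result))
              '())))

(define parsed
  (parse-commandline
   '("receive" "--output" "example.txt" "magnet:?xt=EXAMPLE")))
```

A main procedure can then look at the first operand to pick a subcommand and pass the remaining entries along.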

Now, on to the code walkthrough!

send

First, we'll briefly discuss the relevant helper procedure, add-sref-to-magnet-url:

(define (add-sref-to-magnet-url magnet-url sref)
  (match magnet-url
    (($ <magnet-url> xt ek es #f)
     (make-magnet-url xt ek es
                      (uri-encode
                       (ocapn-id->string sref))))))

All this procedure does is decompose the magnet URL we get from Magenc then build a new magnet URL with an additional query parameter holding the sturdyref to the Goblins actor.

Most of the important logic happens in connect-to-goblinshare-server-store, named to match the convention of Magenc's store abstraction:

(define (connect-to-goblinshare-server-store done)
  (define backend (connect-to-backend #:backend-type 'memory))

  (define (put data)
    (store-backend-data backend data))

  (define (get exact-topic)
    ;; CapTP doesn't support records so we turn exact topics into strings
    (retrieve-backend-data backend (string->exact-topic exact-topic)))

  (define (close)
    (close-backend backend)
    (signal-condition! done))

  (connect-to-store* 'goblinshare-send
                     (lambda () (values get put close))))

This procedure does a few interesting things. First, it accepts a done parameter. This is a Fibers condition so that we can wait on a remote peer to collect the shared file before exiting, which is handled elsewhere.

Next, this procedure creates a Magenc memory backend. Backends are simply data stores wrapped in a Scheme record so they can be used with an abstract interface. They provide three procedures associated with three fields of the record type: backend-get, backend-put, and backend-close, accessed using retrieve-backend-data, store-backend-data, and close-backend, respectively. Here we wrap each of the underlying backend's procedures in a new procedure so we can massage our inputs to work over OCapN.

put calls the memory backend's put procedure unmodified.

get does a bit more. Because we are using OCapN to communicate between peers, and because OCapN doesn't have a dedicated type for generic records, we convert exact topics into strings when sending messages between Goblins actors. get thus converts exact topic strings back into exact topic records. We could instead have written a marshaller to convert our record into an OCapN tagged value, but there was little reason to do so.

close calls the underlying backend's close then signals the done condition.
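The record-to-string round trip that get depends on can be sketched with a stand-in record. The <toy-topic> type, its field names, and the "urn:<hash-name>:<digest>" layout are assumptions for illustration; Magenc's real exact-topic record and its exact-topic->string and string->exact-topic procedures may differ.

```scheme
(use-modules (srfi srfi-9)
             (ice-9 match))

;; Hypothetical stand-in for Magenc's exact-topic record.
(define-record-type <toy-topic>
  (make-toy-topic hash-name digest)
  toy-topic?
  (hash-name toy-topic-hash-name)
  (digest toy-topic-digest))

;; Flatten the record into a string that can cross the network.
(define (toy-topic->string topic)
  (string-append "urn:" (toy-topic-hash-name topic)
                 ":" (toy-topic-digest topic)))

;; Rebuild the record on the other side, rejecting malformed input.
(define (string->toy-topic str)
  (match (string-split str #\:)
    (("urn" hash-name digest) (make-toy-topic hash-name digest))
    (_ (error "not an exact topic string" str))))

(define round-tripped
  (toy-topic->string (string->toy-topic "urn:sha256d:EXAMPLEDIGEST")))
```

The sender calls the ->string direction before messaging and the receiver calls the string-> direction on arrival, so the record itself never has to be serialized.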

Finally, the last call in this procedure is to Magenc's connect-to-store* helper which constructs a store interface (similar in shape and function to a backend interface) to be passed to chunk-and-store-data. As you can see, it takes two arguments: a symbol identifying its type, and a thunk which returns three values: get, put, and close. (The higher-level connect-to-store supports keyword arguments and is used to construct Magenc's built-in backends.)
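To show the shape of that interface, here is a sketch of how a helper like connect-to-store* might consume a thunk returning three values. This is a guess at the pattern, not Magenc's source: <demo-store> and connect-to-demo-store* are hypothetical, and the demo put takes an explicit key where Magenc's put derives one from the data.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical store record bundling a type tag with three procedures.
(define-record-type <demo-store>
  (make-demo-store type get put close)
  demo-store?
  (type demo-store-type)
  (get demo-store-get)
  (put demo-store-put)
  (close demo-store-close))

;; Call THUNK and bundle the three procedures it returns into a record
;; tagged with TYPE.
(define (connect-to-demo-store* type thunk)
  (call-with-values thunk
    (lambda (get put close)
      (make-demo-store type get put close))))

;; The thunk closes over private state shared by all three procedures.
(define demo-store
  (connect-to-demo-store*
   'demo
   (lambda ()
     (let ((items '()))
       (values (lambda (key) (assoc-ref items key))
               (lambda (key val) (set! items (acons key val items)))
               (lambda () (set! items '())))))))

((demo-store-put demo-store) "greeting" "hello")
```

Taking a thunk rather than three arguments lets the procedures share setup work and private state that is created only when the store is connected.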

The last and likely most interesting piece of GoblinShare's send logic is the part that actually deals with Goblins. All it does is wrap a store in an actor, spawn a Tor netlayer, and return a sturdyref to the wrapper actor:

(define (spawn-client-sref store)
  (define (^client bcom store)
    (methods
     ((get id) (retrieve-data store id))
     ((put . _) (error "cannot put with client capability"))
     ((close) (close-store store))))

  (define mycapn (spawn-mycapn (spawn ^onion-netlayer)))
  ($ mycapn 'register (spawn ^client store) 'onion))

The actor is ^client. As you can see, we override the backing store's put procedure to prevent remote peers from writing unexpectedly. Otherwise, we use Magenc's store interface to manipulate the store normally.
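The same attenuation pattern can be sketched without Goblins, using plain closures. Everything here is hypothetical and only models the idea: make-toy-object stands in for the store, and make-read-only plays the role ^client plays above.

```scheme
;; A toy object that answers get, put, and close messages. It stands in for
;; a store and is not Goblins code.
(define (make-toy-object)
  (let ((items '()))
    (lambda (method . args)
      (case method
        ((get) (assoc-ref items (car args)))
        ((put) (set! items (acons (car args) (cadr args) items)) #t)
        ((close) (set! items '()) #t)
        (else (error "unknown method" method))))))

;; Hypothetical attenuator: forward only get and close, refuse the rest.
(define (make-read-only target)
  (lambda (method . args)
    (case method
      ((get close) (apply target method args))
      (else (error "method not permitted" method)))))

(define full-access (make-toy-object))
(full-access 'put "greeting" "hello")

;; Holders of read-only-access can read but cannot write.
(define read-only-access (make-read-only full-access))
```

Whoever holds only the wrapper has no path to the underlying put, which is the property the sturdyref handed to the receiving peer relies on.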

The last two lines pack a lot of logic. First, we spawn an ^onion-netlayer representing a Tor netlayer. We immediately pass this into spawn-mycapn, creating a new ^mycapn object populated with that netlayer. Then, we spawn a new ^client, immediately passing it to our ^mycapn object's register method to get a promise to a sturdyref referencing the object. Whew! All that in only two lines!

We bring all of the send logic together in gs-send:

(define* (gs-send filename #:optional
                  (out-port (current-output-port)))
  (let* ((done (make-condition))
         (store (connect-to-goblinshare-server-store done))
         (magnet-url
          (call-with-input-file filename
            (lambda (in-port)
              (chunk-and-store-data in-port store))
            #:binary #t)))
    (with-vat (spawn-vat #:name 'goblinshare-server)
      (on (spawn-client-sref store)
          (lambda (sref)
            (format out-port "~a~%"
                    (magnet-url->string
                     (add-sref-to-magnet-url magnet-url sref))))))
    (wait done)))

As you can see, gs-send requires a filename and optionally accepts an output port where it will write the resulting magnet URL. This is to facilitate tests and is not exposed in the commandline client.

The procedure starts by creating a Fibers condition which it passes to connect-to-goblinshare-server-store to create a store.

Next, it passes the resulting store to Magenc's chunk-and-store-data with default parameters ‒ so, using AES-GCM encryption and generating a new key appropriate for that cipher ‒ and a port for reading the input file. It assigns the returned magnet URL to a new variable.

Then, gs-send spawns a vat and enters a vat context. There, it spawns a client actor to wrap the store and resolves the sturdyref to the client actor. It adds the resolved sturdyref to the magnet URL and writes out the result.

Finally, gs-send waits for the done condition to be signaled.

receive

The receive command relies on only one procedure for most of its logic:

(define (connect-to-goblinshare-client-store client-sref)
  (define vat (spawn-vat #:name 'goblinshare-client))
  (define mycapn (with-vat vat (spawn-mycapn (spawn ^onion-netlayer))))
  (define client (with-vat vat ($ mycapn 'enliven client-sref)))

  (define (put . _)
    (error "cannot put with client capability"))

  (define get
    (let ((ch (make-channel)))
      (lambda (exact-topic)
        (with-vat vat
          ;; CapTP doesn't support records so we turn exact topics into strings
          (on (<- client 'get (exact-topic->string exact-topic))
              (lambda (val)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(ok ,val))))
                #t)
              #:catch
              (lambda (exn)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(error ,exn))))
                #f)))
        (match (get-message ch)
          (('ok val) val)
          (('error exn) (raise-exception exn))))))

  (define (close)
    (with-vat vat
      (<-np client 'close)))

  (connect-to-store* 'goblinshare-receive
                     (lambda () (values get put close))))

There are more lines here than in connect-to-goblinshare-server-store, but the gist is simple. connect-to-goblinshare-client-store takes a sturdyref, which may be a promise thanks to promise pipelining.

It starts by spawning a vat, spawning a ^mycapn with a new ^onion-netlayer, and enlivening the client actor. These steps mirror the steps used to spawn the client sturdyref above. Rather than registering an actor to get a sturdyref, here we enliven a sturdyref to get an actor.

Next, connect-to-goblinshare-client-store wraps each method of the client actor in a regular Scheme procedure for Magenc's store interface.

put and close are quite simple. The former errors out to avoid unneeded network access should it be called. The latter sends the close message to the remote actor.

get looks complicated, but most of the code is there to resolve a promise into a concrete Scheme value. The core functionality is (<- client 'get (exact-topic->string exact-topic)). This line encodes the exact topic as a string, messages the client actor's get method with that string, and returns the resulting promise. When that promise fulfills or breaks, the surrounding logic propagates the result as normal.

Finally, connect-to-goblinshare-client-store creates a new store with these wrapper procedures using connect-to-store*.

The resulting store is used by gs-receive to get the desired file:

(define* (gs-receive magnet-url #:optional
                     (out-port (current-output-port)))
  (let* ((store
          (connect-to-goblinshare-client-store
           (string->ocapn-id
            (uri-decode
             (magnet-url-acceptable-source magnet-url)))))
         (result
          (call-with-output-bytevector
           (lambda (out-port)
             (retrieve-and-unchunk-data
              out-port store
              #:exact-topic (magnet-url-exact-topic magnet-url)
              #:key (magnet-url-encryption-key magnet-url)
              #:cipher (magnet-url-encryption-suite magnet-url))))))
    (write-bytevector result out-port)
    (close-store store)))

This procedure requires the magnet URL to retrieve, and optionally accepts an output port where it will write the resulting data. This is exposed through goblinshare receive's --output option.

First, gs-receive creates a store as discussed above, extracting the sturdyref string from the magnet URL and converting it into the appropriate Scheme type. Then, it decomposes the magnet URL into its components and passes them, along with the store and a port receiving the result, to Magenc's retrieve-and-unchunk-data. Finally, it writes the result and informs the remote store that it's done.

And that's it! All the other code in GoblinShare is for the UI.

I told you it was simple!

Final thoughts

It's worth mentioning that neither Magenc nor GoblinShare is intended as finished, production software. Notably, the underlying cryptography has not been audited. Additionally, there are some improvements I'd like to make to Magenc's API which would make GoblinShare simpler.

That's okay, though. These projects are demonstrations. Magenc is intended to demonstrate the basic concepts of capability-secure distributed, encrypted, content-addressed data storage. GoblinShare, for its part, is supposed to show how easy it is to implement otherwise-complex functionality with Goblins, which I think it does. As a rough comparison, Magic Wormhole is about 11,500 lines of Python code, counted with similar caveats as those for Magenc and GoblinShare, and relies on a much longer list of dependencies.

All told, I am incredibly happy with how things turned out. GoblinShare is beautifully simple and useful. I hope it shows others how easy Goblins makes networked applications and inspires more neat software.

Happy hacking!