GoblinShare: Secure, Peer-to-Peer File-Sharing with Goblins
Juliana Sims
This year, instead of participating in the Autumn Lisp Game Jam, Spritely decided to make some non-game demos to show off our tech. To that end, I spent the week of November 3rd writing GoblinShare, a secure, peer-to-peer file-sharing utility using the Guile port of Magenc for storage and distribution, relying on Goblins for the peer-to-peer connection abstraction, and delivered over Tor. Thanks to Goblins, this turned out to be super easy to implement. Let me show you how!
🎶️ Demos; glorious demos! 🎶️
We at Spritely absolutely love our technology demos. We know it can be tricky to understand an unfamiliar paradigm, but we also think hands-on demos help. To that end, we build lots of usable demos to try. We particularly like building games during the Lisp Game Jam so we can connect with the broader Lisp community, encourage more developers to use our technology, and end up with something fun that people want to use. After all, we can't build the future of the social web all by ourselves! This year, though, we decided to focus not on showing off work we've already done, but pushing our work in a new direction; and it felt best to do that outside the context of the jam.
My assignment (which I did choose to accept) was tripartite: port Magenc to Guile, implement a simple file-sharing tool (which I decided to somewhat model on Magic Wormhole), and port Crystal to Guile. The Crystal work remains ahead, but I've ported Magenc and built GoblinShare as the file-sharing tool. I treated the Magenc port as prep work and GoblinShare proper as the equivalent of a jam entry, so that will be the focus of this post. First, though, I'll provide an overview of these two projects, what they do, and how they do it.
Magenc
Magenc is an encrypted, distributed, content-addressed data store relying on
magnet URLs as capabilities, inspired by
Tahoe-LAFS. It does not use Goblins,
but it is built with capability security concepts in mind. It consists of three
components, each functioning as a subcommand under a single program: magenc serve, which starts a server; magenc put, which encrypts and POSTs files to
a server; and magenc get, which GETs files from a server. (Although the
architecture is designed to support arbitrary remote stores, only a web store
using HTTP has been implemented so far.)
A quick summary
The core of Magenc's functionality is the magnet URL, which looks like
magnet:?xt=<url-encoded-urn>&ek=<base64-url-encoded-string>&es=<string> and
uniquely identifies a specific file. Decomposing the query parameters
(everything after ?), xt refers to the exact topic identifying a specific
file, which is simply the sha256 hash of the encrypted manifest (which may be a
raw object as discussed below). ek is the encryption key used to encrypt the
file. es is the encryption suite used for encryption. The Guile port of
Magenc uses AES in Galois/Counter Mode (AES-GCM) for encryption; the Racket
version uses AES in Counter Mode (AES-CTR). Of this information, only the exact
topic is ever known by the server. Together, the information in a magnet URL is
both necessary and sufficient to identify, access, and decrypt a file on a
given server. This makes a magnet URL a capability.
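To make the three fields concrete, here is a minimal sketch that splits a magnet URL string into its query parameters using only Guile's `(web uri)` module. The helper `magnet-url-params` is hypothetical and written for this illustration; Magenc has its own magnet URL record and parser, which this does not reproduce. The URL is the one produced in the example later in this post.

```scheme
(use-modules (web uri))   ; provides uri-decode

;; Hypothetical helper for illustration; not part of Magenc's API.
;; Splits the query string of a magnet URL into an association list
;; of (name . decoded-value) pairs.
(define (magnet-url-params url)
  (let* ((query (substring url (+ 1 (string-index url #\?))))
         (fields (string-split query #\&)))
    (map (lambda (field)
           (let ((i (string-index field #\=)))
             (cons (substring field 0 i)
                   (uri-decode (substring field (+ i 1))))))
         fields)))

(define example-params
  (magnet-url-params
   "magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM"))

;; Look up individual fields by name.
(assoc-ref example-params "xt")
(assoc-ref example-params "es")
```

Only the `xt` value would ever be sent to a server; `ek` and `es` stay with whoever holds the URL.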
The server only has access to encrypted binary data a client sends. When the
server receives a POST request with attached binary data, it hashes the
provided data using sha256, encodes the unhashed data using base64, and stores
the encoded data in a key-value store where the hash ‒ the exact topic ‒ is
the key. It then converts the exact topic into a URI object from Guile's (web uri) module before sending the URI back in the content-location field of the
response object. As a safety redundancy, the client checks to make sure that
the exact topic it gets back is what it expects. When the server receives a
GET request with an exact topic in the content-location field, it looks up
the associated data, decodes the base64 into binary data, and sends that back as
the body of a response. That's all the server knows and does; everything else
is handled by the client.
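The server's behavior boils down to a content-addressed key-value store. The following is a rough sketch of that idea, not Magenc's implementation: every name here is hypothetical, the base64 step is left out, and `toy-digest` is a non-cryptographic 32-bit FNV-1a hash standing in for sha256 so that the example runs with nothing but core Guile.

```scheme
(use-modules (rnrs bytevectors))

;; Stand-in digest (32-bit FNV-1a). This is NOT cryptographic; it only
;; takes the place of sha256 so the sketch has no external dependencies.
(define (toy-digest bv)
  (let loop ((i 0) (h 2166136261))
    (if (= i (bytevector-length bv))
        (number->string h 16)
        (loop (+ i 1)
              (modulo (* (logxor h (bytevector-u8-ref bv i)) 16777619)
                      4294967296)))))

;; The "server": a table from digest to the data that produced it.
(define table (make-hash-table))

;; Analogous to handling a POST: store the data under its own digest
;; and hand the key back to the caller.
(define (toy-put! data)
  (let ((key (toy-digest data)))
    (hash-set! table key data)
    key))

;; Analogous to handling a GET: look the data up by key.
(define (toy-get key)
  (hash-ref table key))

;; The "client": recompute the digest locally and compare it with the
;; key the store returned, as a safety check.
(define (put-and-verify! data)
  (let ((key (toy-put! data)))
    (unless (string=? key (toy-digest data))
      (error "store returned an unexpected key" key))
    key))

(define demo-key (put-and-verify! (string->utf8 "Hello!")))
(utf8->string (toy-get demo-key))
```

Because the key is derived from the data, storing the same bytes twice yields the same key, and the store never needs to know what the bytes mean.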
magenc put handles the encryption and generation of cryptographic inputs.
When passed a file (and, optionally, server URL), Magenc first chunks the file
as necessary, encrypting each chunk to transmit separately so that the server
does not necessarily know the chunks are related (though correlation of
connections and network traffic by a malicious server or observer could be used
to reliably guess interrelation). Then, the client prints a magnet URL which
identifies the entrypoint to retrieve the file ‒ a "raw" object if the file fits
in one chunk or a "manifest" if it doesn't. As discussed above, the magnet URL
also embeds cryptographic information.
A file's magnet URL can be used with magenc get to retrieve that file. When
passed a magnet URL (and, if necessary, server URL), magenc get first
retrieves and decrypts the object identified by the exact topic. If the object
is a "raw" object, Magenc extracts the binary data, decrypts it, and writes it
to either standard output or a given file. If the object is a "manifest"
object, Magenc extracts the exact topics for each chunk from the manifest,
retrieves and decrypts each chunk in order, then writes out the complete file in
the same way.
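The raw-versus-manifest decision follows from how many chunks a file produces. A minimal sketch of fixed-size chunking over a bytevector is below; the procedure names and the chunk size used in the demonstration are assumptions made for illustration, and Magenc's actual chunking and manifest format may differ.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: split BV into a list of bytevectors, each at
;; most CHUNK-SIZE bytes long, preserving order.
(define (chunk-bytevector bv chunk-size)
  (let loop ((start 0) (chunks '()))
    (if (>= start (bytevector-length bv))
        (reverse chunks)
        (let* ((len (min chunk-size (- (bytevector-length bv) start)))
               (chunk (make-bytevector len)))
          ;; R6RS argument order: source, source-start, target,
          ;; target-start, count.
          (bytevector-copy! bv start chunk 0 len)
          (loop (+ start len) (cons chunk chunks))))))

;; One chunk (or none) can be stored as a single "raw" object; more
;; than one needs a "manifest" listing the chunks in order.
(define (entrypoint-kind chunks)
  (if (<= (length chunks) 1) 'raw 'manifest))

(define small (chunk-bytevector (string->utf8 "tiny") 8))
(define large (chunk-bytevector (string->utf8 "0123456789") 4))

(entrypoint-kind small)
(map bytevector-length large)
```

Reassembly is the reverse: fetch each chunk named in the manifest, in order, and write the pieces out back to back.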
The original version of Magenc includes a more thorough write-up. Aside from the differing encryption suites and commandline interfaces, that write-up also holds for the Guile version.
A simple example
Some things are easier to understand in action, so let's walk through a
trivially simple example of storing and retrieving a file using Magenc. We will
skip build instructions, which are provided in the
repository. For ease of demonstration,
we will use the default server configuration which launches a process listening
at http://127.0.0.1:8118 (http://localhost:8118) and stores data in memory
rather than writing it to disk (Magenc does also include a backend relying on
Goblins' bloblin persistence store to efficiently store files on disk). We will
use what is called convergent
encryption, which uses part of the unencrypted file itself as a cryptographic
input, to ensure that our example file produces the same magnet URL every time
it is stored. (If you are reading this in the future and Magenc's cryptography
has changed, the magnet URLs may no longer match.)
First, let's start a server:
magenc serve
This should print the address where the server is listening, like:
Server running at: http://127.0.0.1:8118
We can leave that running in one terminal session and use a different one for everything else. Now let's create our example file:
echo "Hello! I'm an example file!" > example.txt
Next, we'll store the file with magenc put example.txt --convergent. This
prints the magnet URL
magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM.
(Note that argument ordering is relevant; to simplify the implementation of the
CLI, Magenc expects option arguments ‒ anything starting with - or -- ‒
after positional arguments.) Finally, we can retrieve the file:
magenc get "magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM"
Make sure to quote the magnet URL; otherwise, the shell will treat each & as a
control operator and run the text before it as a background command. This
command produces the output:
Hello! I'm an example file!
And that's it! Simple! There are a few more options available for the various
subcommands, each explained with magenc --help.
GoblinShare
In addition to being an application, Magenc is also a library. GoblinShare is
implemented using Magenc in this capacity. All it adds is a purpose-built UI
and a wrapper around the in-memory store backend which makes it easier to use
through a Goblins actor. The sending peer launches a Tor netlayer,
generates a sturdyref ‒ a persistent object reference which can be shared
out-of-band ‒ for the store actor, and adds that to the magnet URL with the
additional acceptable source (as) field. The receiving peer sets up its own
netlayer, "enlivens" the sturdyref, and downloads the file. Once the file is
retrieved, the sending peer terminates, removing the associated data from
memory.
An example
We will perform essentially the same tasks for this example as we did for the Magenc example. There are a few differences to note. First, GoblinShare does not provide an option for convergent encryption. This simply wouldn't make sense for an ephemeral file-sharing tool. For our present purposes, this means that you will almost certainly get a different magnet URL than is produced here. Second, it is necessary to manually run the tor daemon. Build and usage instructions in the GoblinShare repository cover that so the following example assumes a running daemon.
We will reuse the same example file as above, so feel free to reuse the previous command to create it. Then, all we have to do is send it:
goblinshare send example.txt
to get the magnet URL. The sending process will wait for the file to be retrieved, so switch to another terminal session to retrieve the file:
goblinshare receive "magnet:?xt=...&ek=...&es=...&as=..."
Because we're using Tor as our network layer, it can take a few seconds for the connection to be established and the data to be sent, even though we are connecting to a local server. Since our example file is small enough to fit into a single chunk, the delay isn't very long, but a larger file can take quite a while to transfer. Eventually, you will receive the expected output:
Hello! I'm an example file!
At this point, both the sending and receiving processes will terminate, and
that's that! You've successfully shared a file with Goblins! Just like Magenc,
GoblinShare has a (very) few options you can see with goblinshare --help.
Unlike Magenc, option arguments can be supplied before or after positional
arguments as long as they follow the associated subcommand.
A note on Magic Wormhole
I mentioned that I took initial inspiration for this project from Magic
Wormhole. In practice,
though, the only similarity wound up being the names of the subcommands send
and receive. Magic Wormhole relies on a central relay server because it uses
"wormhole codes" to facilitate oral communication of the relevant capabilities.
These codes seem to be keys mapping to a fuller capability and therefore
requiring a coordination point. GoblinShare, by contrast, chooses to instead
provide less-human-friendly magnet URLs which allow fully peer-to-peer file
transfer because they encode all necessary information.
How easy was it, really?
As I mentioned near the beginning of this post, the most surprising part of implementing GoblinShare was how easy it was. The core functionality was implemented in about half a day, though a failed attempt to get GoblinShare to manage the tor daemon stretched the initial implementation out to about a day and a half. In the end, the UI and business logic of GoblinShare together require 250 lines of code including module headers, whitespace, docstrings, and inline comments (but not license headers). That's very little code! Magenc took a bit more work, but a lot of that was getting the cryptography and URL abstractions playing nicely ‒ and, admittedly, there's still room to make Magenc more approachable as a library. Still, the port took a little under a week all told, including time to implement tests, and came out to somewhere in the neighborhood of 1400 lines of code with similar caveats.
Numbers are one thing, but the simplicity of GoblinShare really comes through in
the code itself. The core logic comes down to three procedures, two in the
send logic (with an additional helper) and one in the receive logic. Let's
break these down and walk through them to build up the relevant logic.
As a general note, the UI layers of GoblinShare and Magenc use
SRFI-37 to convert commandline
arguments into an association list of options and arguments. The main
procedures in (goblinshare) and (magenc) simply parse the commandline into
the appropriate arguments which they pass to the procedure associated with a
given subcommand.
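For readers who have not used SRFI-37, here is a small sketch of turning a command line into an association list with `args-fold`. The option table and the `parse-command-line` name are assumptions made for this example; they are not copied from GoblinShare or Magenc.

```scheme
(use-modules (srfi srfi-37))

;; Illustrative option table: --output/-o takes a required argument,
;; --help/-h takes none. Each processor adds an entry to the result.
(define %options
  (list (option '(#\o "output") #t #f
                (lambda (opt name arg result)
                  (acons 'output arg result)))
        (option '(#\h "help") #f #f
                (lambda (opt name arg result)
                  (acons 'help #t result)))))

;; Fold over ARGS, collecting options and positional arguments into a
;; single association list. Unknown options raise an error.
(define (parse-command-line args)
  (args-fold args
             %options
             (lambda (opt name arg result)
               (error "unrecognized option" name))
             (lambda (operand result)
               (acons 'argument operand result))
             '()))

(define parsed (parse-command-line '("example.txt" "--output=out.txt")))

(assq-ref parsed 'argument)
(assq-ref parsed 'output)
```

A subcommand procedure can then pull what it needs out of the association list with `assq-ref`.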
Now, on to the code walkthrough!
send
First, we'll briefly discuss the relevant helper procedure, add-sref-to-magnet-url:
(define (add-sref-to-magnet-url magnet-url sref)
  (match magnet-url
    (($ <magnet-url> xt ek es #f)
     (make-magnet-url xt ek es
                      (uri-encode
                       (ocapn-id->string sref))))))
All this procedure does is decompose the magnet URL we get from Magenc then build a new magnet URL with an additional query parameter holding the sturdyref to the Goblins actor.
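The reason for the `uri-encode` call is that a sturdyref is itself a URI, so characters like `/` and `:` must be escaped before it can sit inside another URL's query string. The sketch below shows the same idea at the string level using Guile's `(web uri)`. Both helpers are hypothetical, and the sturdyref string is a made-up placeholder, not a real sturdyref or a statement of its exact format.

```scheme
(use-modules (web uri))   ; provides uri-encode and uri-decode

;; Hypothetical string-level analogue of add-sref-to-magnet-url:
;; append an escaped "as" parameter to an existing magnet URL string.
(define (append-acceptable-source magnet-string sref-string)
  (string-append magnet-string "&as=" (uri-encode sref-string)))

;; Recover the original string, assuming "as" is the last parameter.
(define (acceptable-source magnet-string)
  (let ((i (string-contains magnet-string "&as=")))
    (and i (uri-decode (substring magnet-string (+ i 4))))))

;; Placeholder value for illustration only.
(define placeholder-sref "ocapn://example.onion/s/placeholder")

(define shared-url
  (append-acceptable-source "magnet:?xt=...&ek=...&es=..." placeholder-sref))

(acceptable-source shared-url)
```

The receiving side performs the matching `uri-decode` before converting the string back into a sturdyref.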
Most of the important logic happens in connect-to-goblinshare-server-store,
named to match the convention of Magenc's store abstraction:
(define (connect-to-goblinshare-server-store done)
  (define backend (connect-to-backend #:backend-type 'memory))
  (define (put data)
    (store-backend-data backend data))
  (define (get exact-topic)
    ;; CapTP doesn't support records so we turn exact topics into strings
    (retrieve-backend-data backend (string->exact-topic exact-topic)))
  (define (close)
    (close-backend backend)
    (signal-condition! done))
  (connect-to-store* 'goblinshare-send
                     (lambda () (values get put close))))
This procedure does a few interesting things. First, it accepts a done
parameter. This is a Fibers condition so
that we can wait on a remote peer to collect the shared file before exiting,
which is handled elsewhere.
Next, this procedure creates a Magenc memory backend. Backends are simply data
stores wrapped in a Scheme record so they can be used with an abstract
interface. They provide three procedures associated with three fields of the
record type: backend-get, backend-put, and backend-close, accessed using
retrieve-backend-data, store-backend-data, and close-backend,
respectively. Here we wrap each of the underlying backend's procedures in a new
procedure so we can massage our inputs to work over OCapN.
put calls the memory backend's put procedure unmodified.
get does a bit more. Because we are using OCapN to communicate between peers,
and because OCapN doesn't have a dedicated type for generic records, we convert
exact topics into strings when sending messages between Goblins actors. get
thus converts exact topic strings back into exact topic records. We could
instead have written a marshaller to convert our record into an OCapN tagged
value, but there was little reason to do so.
close calls the underlying backend's close then signals the done
condition.
Finally, the last call in this procedure is to Magenc's connect-to-store*
helper which constructs a store interface (similar in shape and function to a
backend interface) to be passed to chunk-and-store-data. As you can see, it
takes two arguments: a symbol identifying its type, and a thunk which returns
three values: get, put, and close. (The higher-level connect-to-store
supports keyword arguments and is used to construct Magenc's built-in backends.)
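The "symbol plus thunk returning three values" shape may be unfamiliar, so here is a rough, self-contained sketch of how such an interface could be built. It is not Magenc's code: the `<toy-store>` record, `toy-connect-to-store*`, and the read-only example store are all hypothetical names chosen to mirror the description above.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a store record: a type tag plus the three
;; procedures that implement the interface.
(define-record-type <toy-store>
  (make-toy-store type get put close)
  toy-store?
  (type toy-store-type)
  (get toy-store-get)
  (put toy-store-put)
  (close toy-store-close))

;; Call THUNK once, receive its three return values, and pack them
;; into a record tagged with TYPE.
(define (toy-connect-to-store* type thunk)
  (call-with-values thunk
    (lambda (get put close)
      (make-toy-store type get put close))))

;; A read-only store over an association list. Like GoblinShare's
;; receive side, it refuses `put`.
(define (connect-to-read-only-store alist)
  (define (get key) (assoc-ref alist key))
  (define (put . _) (error "cannot put with a read-only store"))
  (define (close) #t)
  (toy-connect-to-store* 'read-only
                         (lambda () (values get put close))))

(define demo-store
  (connect-to-read-only-store '(("greeting" . "hello"))))

((toy-store-get demo-store) "greeting")
```

Because callers only ever see the record's three procedures, the same calling code can work against a local table, an HTTP server, or a remote Goblins actor.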
The last and likely most interesting piece of GoblinShare's send logic is the part that actually deals with Goblins. All it does is wrap a store in an actor, spawn a Tor netlayer, and return a sturdyref to the wrapper actor:
(define (spawn-client-sref store)
  (define (^client bcom store)
    (methods
     ((get id) (retrieve-data store id))
     ((put . _) (error "cannot put with client capability"))
     ((close) (close-store store))))
  (define mycapn (spawn-mycapn (spawn ^onion-netlayer)))
  ($ mycapn 'register (spawn ^client store) 'onion))
The actor is ^client. As you can see, we override the backing store's put
procedure to prevent remote peers from writing unexpectedly. Otherwise, we use
Magenc's store interface to manipulate the store normally.
The last two lines pack a lot of logic. First, we spawn an ^onion-netlayer
representing a Tor netlayer. We immediately pass this into spawn-mycapn,
creating a new ^mycapn object populated with that netlayer. Then, we spawn a
new ^client, immediately passing it to our ^mycapn object's register
method to get a promise to a sturdyref referencing the object. Whew! All that
in only two lines!
We bring all of the send logic together in gs-send:
(define* (gs-send filename #:optional
                  (out-port (current-output-port)))
  (let* ((done (make-condition))
         (store (connect-to-goblinshare-server-store done))
         (magnet-url
          (call-with-input-file filename
            (lambda (in-port)
              (chunk-and-store-data in-port store))
            #:binary #t)))
    (with-vat (spawn-vat #:name 'goblinshare-server)
      (on (spawn-client-sref store)
          (lambda (sref)
            (format out-port "~a~%"
                    (magnet-url->string
                     (add-sref-to-magnet-url magnet-url sref))))))
    (wait done)))
As you can see, gs-send requires a filename and optionally accepts an output
port where it will write the resulting magnet URL. This is to facilitate tests
and is not exposed in the commandline client.
The procedure starts by creating a Fibers condition which it passes to
connect-to-goblinshare-server-store to create a store.
Next, it passes the resulting store to Magenc's chunk-and-store-data with
default parameters ‒ so, using AES-GCM encryption and generating a new key
appropriate for that cipher ‒ and a port for reading the input file. It
assigns the returned magnet URL to a new variable.
Then, gs-send spawns a vat and enters a vat context. There, it spawns a
client actor to wrap the store and resolves the sturdyref to the client actor.
It adds the resolved sturdyref to the magnet URL and writes out the result.
Finally, gs-send waits for the done condition to be signaled.
receive
The receive command relies on only one procedure for most of its logic:
(define (connect-to-goblinshare-client-store client-sref)
  (define vat (spawn-vat #:name 'goblinshare-client))
  (define mycapn (with-vat vat (spawn-mycapn (spawn ^onion-netlayer))))
  (define client (with-vat vat ($ mycapn 'enliven client-sref)))
  (define (put . _)
    (error "cannot put with client capability"))
  (define get
    (let ((ch (make-channel)))
      (lambda (exact-topic)
        (with-vat vat
          ;; CapTP doesn't support records so we turn exact topics into strings
          (on (<- client 'get (exact-topic->string exact-topic))
              (lambda (val)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(ok ,val))))
                #t)
              #:catch
              (lambda (exn)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(error ,exn))))
                #f)))
        (match (get-message ch)
          (('ok val) val)
          (('error exn) (raise-exception exn))))))
  (define (close)
    (with-vat vat
      (<-np client 'close)))
  (connect-to-store* 'goblinshare-receive
                     (lambda () (values get put close))))
There are more lines here than in connect-to-goblinshare-server-store, but the
gist is simple. connect-to-goblinshare-client-store takes a sturdyref, which
may be a promise thanks to promise pipelining.
It starts by spawning a vat, spawning a ^mycapn with a new ^onion-netlayer,
and enlivening the client actor. These steps mirror the steps used to spawn the
client sturdyref above. Rather than registering an actor to get a sturdyref,
here we enliven a sturdyref to get an actor.
Next, connect-to-goblinshare-client-store wraps each method of the client
actor in a regular Scheme procedure for Magenc's store interface.
put and close are quite simple. The former errors out to avoid unneeded
network access should it be called. The latter sends the close message to the
remote actor.
get looks complicated, but most of the code is there to resolve a promise into
a concrete Scheme value. The core functionality is (<- client 'get (exact-topic->string exact-topic)). This line encodes the exact topic as a
string, messages the client actor's get method with that string, and returns
the resulting promise. When that promise fulfills or breaks, the surrounding
logic propagates the result as normal.
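The "propagate the result as normal" step relies on a small pattern: tag the outcome as `(ok value)` or `(error exception)` on one side, then either return the value or re-raise on the other. The sketch below isolates that pattern without Fibers or Goblins, assuming Guile 3.0 for `with-exception-handler` and `raise-exception`. `capture-result` and `unwrap-result` are hypothetical names; in GoblinShare the tagging happens in the promise callbacks and the message travels over a channel.

```scheme
(use-modules (ice-9 match))

;; Run THUNK and describe its outcome as a tagged list instead of
;; letting an exception escape.
(define (capture-result thunk)
  (with-exception-handler
      (lambda (exn) `(error ,exn))
    (lambda () `(ok ,(thunk)))
    #:unwind? #t))

;; Turn a tagged outcome back into ordinary control flow: return the
;; value on success, re-raise the exception on failure.
(define (unwrap-result message)
  (match message
    (('ok val) val)
    (('error exn) (raise-exception exn))))

(define good (capture-result (lambda () (* 6 7))))
(define bad (capture-result (lambda () (error "remote get failed"))))

(unwrap-result good)
(car bad)
```

Splitting the work this way lets the asynchronous side stay non-blocking while the synchronous caller still sees either a plain value or a normal exception.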
Finally, connect-to-goblinshare-client-store creates a new store with these
wrapper procedures using connect-to-store*.
The resulting store is used by gs-receive to get the desired file:
(define* (gs-receive magnet-url #:optional
                     (out-port (current-output-port)))
  (let* ((store
          (connect-to-goblinshare-client-store
           (string->ocapn-id
            (uri-decode
             (magnet-url-acceptable-source magnet-url)))))
         (result
          (call-with-output-bytevector
           (lambda (out-port)
             (retrieve-and-unchunk-data
              out-port store
              #:exact-topic (magnet-url-exact-topic magnet-url)
              #:key (magnet-url-encryption-key magnet-url)
              #:cipher (magnet-url-encryption-suite magnet-url))))))
    (write-bytevector result out-port)
    (close-store store)))
This procedure requires the magnet URL to retrieve, and optionally accepts an
output port where it will write the resulting data. This is exposed through
goblinshare receive's --output option.
First, gs-receive creates a store as discussed above, extracting the sturdyref
string from the magnet URL and converting it into the appropriate Scheme type.
Then, it decomposes the magnet URL into its components and passes them, along
with the store and a port receiving the result, to Magenc's
retrieve-and-unchunk-data. Finally, it writes the result and informs the
remote store that it's done.
And that's it! All the other code in GoblinShare is for the UI.
I told you it was simple!
Final thoughts
It's worth mentioning that neither Magenc nor GoblinShare is intended as finished, production software. Notably, the underlying cryptography has not been audited. Additionally, there are some improvements I'd like to make to Magenc's API which would make GoblinShare simpler.
That's okay, though. These projects are demonstrations. Magenc is intended to demonstrate the basic concepts of capability-secure, distributed, encrypted, content-addressed data storage. GoblinShare, for its part, is supposed to show how easy it is to implement otherwise-complex functionality with Goblins, which I think it does. As a rough comparison, Magic Wormhole is about 11,500 lines of Python code, counted with caveats similar to those for Magenc and GoblinShare, and relies on a much longer list of dependencies.
All told, I am incredibly happy with how things turned out. GoblinShare is beautifully simple and useful. I hope it shows others how easy Goblins makes networked applications and inspires more neat software.
Happy hacking!