Composing capability security and conflict-free replicated data types
Dave Thompson
In August, I attended the DWeb Seminar where a small group of builders gathered to discuss the state of the art and open problems in the distributed web space. Some in the group are primarily concerned with distributed data and focus on sync algorithms and local-first use cases. I am mainly concerned with distributed behavior and focus on the object capability security model. Both areas of study are steeped in their own lore and research papers, which makes it difficult for the two camps to communicate effectively with each other.
It is in the interest of unity that I write this blog post. I will show how distributed behavior and data techniques can be composed to build local-first applications that combine the strengths of each paradigm.
Fortunately, the distinction between behavior and data is a false dichotomy; they are two sides of the same coin. This circular relationship is well understood in the Lisp world where we say that “code is data and data is code” in reference to Lisp’s homoiconic syntax.
Messages are both behavior and data. They invoke behavior but are also encoded as a string of bytes and sent across the wire. Our context within the tower of abstraction determines how we look at a message. What is treated as data at one abstraction level may be treated as behavior in another. We need to be equipped to handle both cases.
I’ve crystallized all that I’ve learned recently into a small prototype that combines the following techniques:
- On the behavior side, object capabilities (ocaps) and the actor model, both of which are common subjects on this blog.
- On the data side, conflict-free replicated data types (CRDTs) and authorization/certificate capabilities (zcaps).
Local-first chat again
I started down the well-trodden path of making a local-first group chat application. Seemingly every other DWeb-adjacent project has one, after all, so why shouldn’t Spritely? I then branched off and went down my own trail, trying to compose tools in ways I hadn’t quite seen before in this context. The result is Brassica Chat!
Brassica Chat is written in Scheme, is built upon Goblins, our distributed programming environment, and uses Hoot to compile it all to WebAssembly so it can be used on the web. Besides just posting messages, it also supports some of the usual features like editing/removing messages and emoji reacts. 🚀
Demo time!
Below is an embedded and simplified demo of Brassica Chat that simulates a conversation between Alice, Bob, and Carol. Messages are sent over a fake network and each user’s network access can be toggled with a button to simulate network partitions and offline usage. Alice is the chat room creator and has the privilege to edit/remove any post. Bob and Carol can only edit/remove their own posts. Okay, hopefully that’s enough context. Try it out!
If you’d like more screen real estate, try this demo on its own dedicated web page. Check out the source code if you’d like.
High-level design
Let’s examine the scenario modeled in the demo more closely. First, Alice creates a new chat room on her computer. She then shares a capability with her friend Bob and another (distinct) capability with Carol that grants them the privilege to send messages to her chat room. Bob and Carol reciprocate by giving Alice capabilities to their respective chat room copies. The resulting network is shown in the diagram above.
Note that Bob and Carol are not directly connected to each other but rather indirectly connected through Alice. This is because Bob and Carol did not exchange capabilities with each other. This is okay! They can all still chat with each other in real time as long as Alice is online. When Alice goes offline, Bob and Carol can still send messages locally. Everything done while in offline mode will be synchronized once Bob and Carol can connect to Alice again. Perhaps Bob and Carol will exchange capabilities with each other later so they can still chat in real time when Alice is offline. The important detail is that Brassica Chat does not try to wire everyone together directly without the active consent of its users.
Each user in the system has a cryptographic identity in the form of a public/private key pair. This key is used for signing messages. In addition to the key, an identity also contains a human-readable, self-proposed name for displaying in the user interface.
Each chat room is an eventually consistent replica of the distributed chat room state, managed using a collection of CRDTs. Each replica can propagate locally created or remotely received messages to the other replicas of the chat room for which it holds a capability. The replication process works toward eventual convergence across all reachable replicas.
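Here is a minimal sketch of anti-entropy sync in Python that makes the replication step concrete. It assumes each replica keeps its events in a dictionary keyed by event ID and exchanges missing events with the peers it holds a capability to. `Replica`, `sync_with`, and `gossip_round` are hypothetical names used only for illustration. This is not the Goblins/Scheme code Brassica Chat actually uses, and a real implementation would exchange IDs first rather than compare whole dictionaries.

```python
class Replica:
    """Hypothetical replica: events keyed by ID, plus peers it may sync with."""

    def __init__(self, name):
        self.name = name
        self.events = {}  # event ID -> event payload
        self.peers = []   # replicas this one holds a capability to

    def add_local(self, event_id, payload):
        self.events[event_id] = payload

    def sync_with(self, other):
        # Push the events `other` lacks, then pull the events we lack.
        for event_id in self.events.keys() - other.events.keys():
            other.events[event_id] = self.events[event_id]
        for event_id in other.events.keys() - self.events.keys():
            self.events[event_id] = other.events[event_id]


def gossip_round(replicas):
    """Every replica syncs once with each peer it knows about."""
    for replica in replicas:
        for peer in replica.peers:
            replica.sync_with(peer)


# Bob and Carol only know Alice, as in the demo topology.
alice, bob, carol = Replica("alice"), Replica("bob"), Replica("carol")
alice.peers = [bob, carol]
bob.peers = [alice]
carol.peers = [alice]

bob.add_local("b1", "hi from Bob")
carol.add_local("c1", "hi from Carol")
gossip_round([alice, bob, carol])
```

Bob and Carol converge without a direct link because Alice relays for them. If Alice were left out of the round, each would simply keep its local events until she returned.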
At a meta level, these replicas can be thought of as forming a single, conceptual chat room actor. To use some ocap jargon, the chat room is an unum where each presence (replica) communicates by broadcasting messages to the other presences it knows about. In the diagram above, there’s a dotted line drawn around the three replicas to indicate that the chat room is an abstract entity whose canonical form does not live on any single machine. The presences are all co-equal; no single presence has more privilege than any other.
The stack
There are four levels of abstraction in the Brassica Chat architecture. From bottom to top, they are:
- Object capabilities: online access control through reference passing.
- Actors: online, asynchronous messaging through object references.
- CRDTs: eventually consistent, offline messaging.
- Authorization capabilities: offline access control through certificate chains.
All objects in the application are represented as actors, including CRDTs. Implementing CRDTs as actors has been done elsewhere, Akka being a notable example.
A reference to an actor is an object capability. In other words, holding a reference to an actor gives you the authority to send messages to it. However, an actor needs to be online in order to receive messages. For offline usage, Brassica Chat also uses an object capability variant known as an authorization or certificate capability.
Messages are sent between machines using the Object Capability Network (OCapN) protocol, which handles the burden of secure message transport. Messages can be transported over any medium with an associated OCapN netlayer. For this prototype, I used a WebSocket netlayer with a relay in the middle. The CRDT implementation has its own messaging protocol which is defined using actors so that it automatically works over OCapN.
On capabilities
Brassica Chat’s use of capabilities stands in contrast to most existing local-first applications that use the access-control list (ACL) model. In the ACL model, users are associated with groups or roles that grant privileges. When compared to capabilities, the ACL model has many deficiencies:
- ACLs are too coarse-grained. It’s difficult to follow the principle of least authority with a limited set of role-based privilege levels, so the norm is for users to have more privilege than is necessary. By contrast, capabilities can be arbitrarily fine-grained. Want to make it so that Bob can only moderate Carol’s posts and not Alice’s? It’s easy and natural to make a capability for this but awkward to define a one-off ACL role.
- ACLs can’t be safely delegated. Only an administrator may grant or revoke privileges. As a non-admin, your only option is to share your credentials, which is unsafe and hard to audit. Credential sharing happens often in the real world due to the friction involved in doing things “the right way”. With capabilities, it is easy to delegate a subset of your authority to someone else in an auditable, revocable manner without sharing your own credentials or communicating with a central authority.
- Most importantly, ACLs have inherent vulnerabilities, such as the confused deputy problem. The “if you don’t have it, you can’t use it” approach of capabilities avoids an entire class of security bugs.
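The following minimal Python sketch illustrates the fine-grained, revocable delegation described above, using closures as stand-ins for object references. `ChatRoom`, `moderate_only`, and `revocable` are hypothetical names. `revocable` follows the caretaker pattern from the ocap literature. Python closures do not provide real ocap confinement, because any code can introspect them. Treat this as an illustration of the shape of the idea, not as a security boundary.

```python
class ChatRoom:
    """Hypothetical room holding full authority over its posts."""

    def __init__(self):
        self.posts = {}  # post ID -> (author, text)

    def add(self, post_id, author, text):
        self.posts[post_id] = (author, text)

    def delete(self, post_id):
        del self.posts[post_id]


def moderate_only(room, target_author):
    """Attenuate: a capability that can delete only target_author's posts."""
    def delete(post_id):
        author, _ = room.posts[post_id]
        if author != target_author:
            raise PermissionError(f"may only moderate {target_author}'s posts")
        room.delete(post_id)
    return delete


def revocable(capability):
    """Caretaker pattern: returns a forwarder and a function that revokes it."""
    state = {"target": capability}

    def forwarder(*args):
        if state["target"] is None:
            raise PermissionError("capability revoked")
        return state["target"](*args)

    def revoke():
        state["target"] = None

    return forwarder, revoke


room = ChatRoom()
room.add("a1", "alice", "welcome!")
room.add("c1", "carol", "spam spam spam")
room.add("c2", "carol", "more spam")

# Alice gives Bob a revocable capability to moderate Carol's posts only.
bob_moderate, revoke_bob = revocable(moderate_only(room, "carol"))
bob_moderate("c1")  # allowed: Carol authored c1
```

Bob never sees the full `room` object, only a narrowed function. Alice keeps `revoke_bob` so she can withdraw that function later without involving anyone else.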
In short, capabilities are safer, more expressive, and more decentralized than ACLs. Now, let’s move on to some implementation details.
The chat room actor
The chat room actor is implemented as a composition of several CRDTs. Rather than using one giant CRDT for the entirety of a chat room’s history, it is partitioned by time into a set of chat log CRDT actors. Each partition covers some uniform number of seconds of real time known as the “period”. This means that all presences must use the same period value in order to converge properly (30 minutes was chosen as a reasonable default). The benefit of this partitioning strategy is that it allows each replica to perform garbage collection (GC) on entire chunks of history without coordinating with the other replicas (GC within a CRDT requires coordination). This ought to keep the append-only log for any individual chunk of history quite small and manageable. Rebuilding the state of a previously deleted chunk from scratch shouldn’t take much time, assuming there is another replica online with the data. For this prototype I didn’t bother to GC old message history as the chat rooms are ephemeral and not persisted to disk (but we could use Goblins’ persistence API to do so in the future).
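The partitioning arithmetic is simple enough to sketch. The following Python makes three assumptions: timestamps are integer seconds, the period is the 30-minute default, and the names `partition_start` and `PartitionedLog` are hypothetical. Brassica Chat's events carry hybrid logical clock timestamps, so partitioning on their wall-clock component is also an assumption here. The sketch shows why this form of GC needs no coordination: whole partitions older than a local cutoff can be dropped independently.

```python
from collections import defaultdict

PERIOD_SECONDS = 30 * 60  # must be identical on every presence to converge


def partition_start(timestamp, period=PERIOD_SECONDS):
    """Start of the period-aligned window containing `timestamp`."""
    return timestamp - timestamp % period


class PartitionedLog:
    """Hypothetical chat log split into one bucket per period."""

    def __init__(self, period=PERIOD_SECONDS):
        self.period = period
        self.partitions = defaultdict(list)  # window start -> events

    def append(self, timestamp, event):
        self.partitions[partition_start(timestamp, self.period)].append(event)

    def collect_before(self, cutoff):
        """Drop every partition that ends at or before `cutoff`.

        This is a purely local decision; another replica that still holds a
        dropped window could resend it later if it is needed again.
        """
        stale = [s for s in self.partitions if s + self.period <= cutoff]
        for start in stale:
            del self.partitions[start]
        return sorted(stale)


log = PartitionedLog()
log.append(100, "first")    # window starting at 0
log.append(1900, "second")  # window starting at 1800
log.append(3700, "third")   # window starting at 3600
dropped = log.collect_before(3600)
```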
In addition to the message log partitions, there are two additional CRDT actors that make up the chat room: profiles and certificates. The profiles CRDT contains a mapping from a user’s public key to their self-proposed display name (and could later be extended to include other metadata that a user would like to share with the room). The certificates CRDT contains the set of all zcaps that have been issued for the chat room.
The CRDT actors
CRDTs can be roughly divided into two categories: state-based and operation-based. Brassica Chat uses operation-based CRDTs, which can be thought of as a Git repository with automatic conflict resolution. Each replica of an operation-based CRDT maintains an event log containing all of the operations that have occurred. Due to concurrency in distributed systems, an event may have one or more direct causal predecessors (a fancy term for “parents”). Thus, the log entries form an append-only, directed acyclic graph (DAG), as shown in the diagram above.
An event has the following immutable fields:
- ID: Unique ID of the event (SHA-256 hash).
- Parent IDs: IDs of all causal predecessors (forming a DAG).
- Timestamp: Timestamp from a hybrid logical clock indicating when the event occurred.
- Author: Creator of the event (ed25519 public key).
- Signature: Cryptographic signature of the event.
- Blob: Syrup encoded event data (Syrup is the binary serialization format used by OCapN).
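Content addressing can be illustrated with a rough Python sketch of an event whose ID is the SHA-256 hash of its other fields. Several assumptions apply. JSON stands in for Syrup as the canonical encoding. The hash covers every field except the signature. The signature is left as an opaque placeholder because ed25519 is not in Python's standard library. The field layout mirrors the list above, but `Event` and `content_bytes` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    parents: tuple          # IDs (hex strings) of direct causal predecessors
    timestamp: tuple        # hybrid logical clock reading, e.g. (wall_ms, counter)
    author: str             # hex-encoded ed25519 public key
    blob: bytes             # encoded operation (Syrup in Brassica Chat)
    signature: bytes = b""  # placeholder: not computed in this sketch

    def content_bytes(self) -> bytes:
        """Canonical encoding of the hashed fields (JSON stands in for Syrup)."""
        return json.dumps(
            {
                "parents": sorted(self.parents),
                "timestamp": list(self.timestamp),
                "author": self.author,
                "blob": self.blob.hex(),
            },
            sort_keys=True,
            separators=(",", ":"),
        ).encode()

    @property
    def id(self) -> str:
        """Content address: any change to the hashed fields changes the ID."""
        return hashlib.sha256(self.content_bytes()).hexdigest()


root = Event(parents=(), timestamp=(1000, 0), author="a11ce", blob=b"post hello")
reply = Event(parents=(root.id,), timestamp=(1005, 0), author="b0b", blob=b"react wave")
```

Because `reply` embeds `root.id` in its own hashed content, the IDs chain together the same way Git commit hashes do.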
Events are delivered in causal order, meaning that an event is not applied to the CRDT’s internal state until all of its predecessor events have been applied. Concurrent events may be applied in any order, so it’s important that operations on the CRDT state are commutative. Despite causal order being encoded in the event graph, a logical timestamp is included in each event. This is important for handling concurrent events and is used to implement common CRDT patterns like the “last write wins” register.
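Causal delivery and a last-write-wins register can be sketched together in a few lines of Python. The sketch assumes each event arrives as an ID, its parent IDs, and a payload of `(timestamp, author, value)`. It breaks timestamp ties by comparing authors, which is a common convention but an assumption here. `CausalBuffer` and `LWWRegister` are hypothetical names, not Brassica Chat's API. The register could represent, for example, the text of a post being edited.

```python
class CausalBuffer:
    """Holds events back until every parent has been applied."""

    def __init__(self, apply_fn):
        self.apply_fn = apply_fn
        self.applied = set()  # IDs of events already applied
        self.pending = {}     # ID -> (parent IDs, payload) waiting on parents

    def receive(self, event_id, parents, payload):
        if event_id in self.applied or event_id in self.pending:
            return  # duplicates are harmless
        self.pending[event_id] = (tuple(parents), payload)
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for event_id, (parents, payload) in list(self.pending.items()):
                if all(p in self.applied for p in parents):
                    del self.pending[event_id]
                    self.apply_fn(payload)
                    self.applied.add(event_id)
                    progress = True


class LWWRegister:
    """Last write wins: the highest (timestamp, author) pair is kept."""

    def __init__(self):
        self.stamp = None
        self.value = None

    def apply(self, payload):
        timestamp, author, value = payload
        if self.stamp is None or (timestamp, author) > self.stamp:
            self.stamp = (timestamp, author)
            self.value = value


# e2 and e3 are concurrent edits; both descend from e1.
events = [
    ("e1", [], ((1, 0), "bob", "Hello")),
    ("e2", ["e1"], ((2, 0), "bob", "Hello!")),
    ("e3", ["e1"], ((2, 0), "alice", "Hi")),
]

replicas = []
for order in (events, list(reversed(events))):
    register = LWWRegister()
    buffer = CausalBuffer(register.apply)
    for event_id, parents, payload in order:
        buffer.receive(event_id, parents, payload)
    replicas.append(register)
```

The two replicas receive the same events in opposite orders. Both still end with the same value, because `max` over `(timestamp, author)` pairs is commutative.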
Brassica Chat contains a generic operation-based CRDT actor with prepare, effect, and query hooks (straight out of the CRDT literature) for special-purpose CRDTs to implement. The CRDT actor is used as the basis for the chat log, certificates, and profiles actors.
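A generic operation-based CRDT with these three hooks might look roughly like the minimal Python sketch below. It follows the common textbook split. `prepare` runs only at the originating replica and turns a request into an operation. `effect` applies an operation at every replica. `query` reads the state without changing it. The class names and the grow-only emoji-react example are hypothetical. The sketch omits the causal delivery, signing, and event log that the real actor provides.

```python
class OpBasedCRDT:
    """Hypothetical generic base: subclasses supply prepare/effect/query."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.outbox = []  # prepared ops waiting to be broadcast

    def prepare(self, *args):
        raise NotImplementedError

    def effect(self, op):
        raise NotImplementedError

    def query(self, *args):
        raise NotImplementedError

    def update(self, *args):
        """Local update: prepare an op from local state, apply it, queue it."""
        op = self.prepare(*args)
        self.effect(op)
        self.outbox.append(op)
        return op

    def deliver(self, op):
        """Remote update: apply an op received from another replica."""
        self.effect(op)


class ReactionSet(OpBasedCRDT):
    """Emoji reacts as a grow-only set of (message, emoji, author) triples."""

    def __init__(self, replica_id):
        super().__init__(replica_id)
        self.reacts = set()

    def prepare(self, message_id, emoji):
        return ("react", message_id, emoji, self.replica_id)

    def effect(self, op):
        _, message_id, emoji, author = op
        self.reacts.add((message_id, emoji, author))  # set union commutes

    def query(self, message_id):
        return sorted((e, a) for (m, e, a) in self.reacts if m == message_id)


alice, bob = ReactionSet("alice"), ReactionSet("bob")
op_a = alice.update("m1", "👋")
op_b = bob.update("m1", "🚀")
bob.deliver(op_a)
alice.deliver(op_b)
```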
This CRDT implementation, though on the simple side, is Byzantine fault tolerant. A Byzantine fault is best explained by the following scenario: Mallet, a user who is up to no good, sends Alice and Bob an event with the same ID but different contents. When Alice and Bob sync data with each other, they ignore events with IDs that they already have and don’t realize that Mallet has tricked them. The result is that Alice and Bob will never converge to the correct state because their message logs contain different operations.
Divergence due to Byzantine behavior is prevented through content-addressing and cryptographic signing of events, much like Git, as described in Martin Kleppmann’s “Making CRDTs Byzantine Fault Tolerant” paper. Mallet cannot send Alice and Bob events with the same ID but different contents because the ID is the hash of the contents and if the hash doesn’t match then the event is rejected. Events are signed to associate them with the author for use with the authorization capability system and the parent IDs are incorporated into the signature to prevent replay attacks. For this prototype, SHA-256 was chosen for the hash function and ed25519 for signatures.
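The hash check at the heart of this defense fits in a few lines. The Python sketch below is hypothetical (`EventStore`, `receive`, `missing_from`, `sync`) and uses raw byte strings as event contents. It deliberately skips ed25519 signature verification, which needs a third-party library such as the `cryptography` package's `Ed25519PublicKey.verify`. The sketch shows that an equivocating Mallet cannot make two replicas disagree about what a given ID means. A mismatched ID is rejected, and correctly addressed variants simply become two distinct events that both replicas eventually hold.

```python
import hashlib


def event_id(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


class EventStore:
    """Hypothetical content-addressed event log for one replica."""

    def __init__(self):
        self.events = {}  # ID -> content bytes

    def receive(self, claimed_id: str, content: bytes) -> bool:
        """Accept an event only if its ID is the SHA-256 hash of its content."""
        if event_id(content) != claimed_id:
            return False  # forged or corrupted: reject outright
        # A real replica would also verify the author's signature here.
        self.events.setdefault(claimed_id, content)
        return True

    def missing_from(self, other: "EventStore") -> dict:
        """Events this replica has that `other` lacks."""
        return {i: c for i, c in self.events.items() if i not in other.events}


def sync(a: EventStore, b: EventStore) -> None:
    for i, c in a.missing_from(b).items():
        b.receive(i, c)
    for i, c in b.missing_from(a).items():
        a.receive(i, c)


alice, bob = EventStore(), EventStore()
hello, evil = b"post: hello", b"post: something else"

alice.receive(event_id(hello), hello)  # Mallet sends Alice one event...
bob.receive(event_id(hello), evil)     # ...and Bob different bytes under the same ID
bob.receive(event_id(evil), evil)      # a correctly addressed variant is accepted
sync(alice, bob)
```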
Any number of Byzantine replicas may be in the network, but as long as Alice and Bob can directly connect to each other, or indirectly connect through a non-Byzantine node such as Carol, the well-behaved nodes will eventually converge to the correct state. While not implemented in this prototype, detection of Byzantine behavior from a replica could be used as the basis for revoking the object capability being used to send such messages, adding a layer of accountability to the system.
Authorization capabilities
With CRDTs in the mix as an offline messaging layer, object capabilities alone are insufficient for access control. The ocap layer controls access to synchronize chat messages between two replicas, but it does not (and cannot) control what those messages contain. Why is that? Because the chat messages are at a higher level of abstraction than the actor messages to which the ocaps apply. When Bob writes the message (react alice-message-1 "👋") to his local replica, he is sending a message to the abstract chat room that doesn’t exist in any single location. What if Alice wanted to prevent Bob from reacting to messages? Who even has the authority to impose that restriction when there’s no central server? We’ve traded away strong consistency to support local-first usage, so there is no way for an administrator to install an ocap on all replicas such that they are all guaranteed to reject this message from Bob and converge to the same state. Ocaps are online capabilities, but CRDTs use offline messaging. We need an offline capability that can be used to process the offline messages.
This is where authorization capabilities (zcaps) come in. A zcap is a signed certificate that describes what actions a controller of that certificate may perform. Like ocaps, zcaps support delegation which is represented as a chain of signed certificates. A crucial property of a delegated zcap is that it cannot expand privilege, only reduce it. Certificate chains need to bottom out somewhere, so we need to decide upon a root signer. In Brassica Chat, the initiator of the chat room (Alice in our example scenario) is considered to be the root signer for all zcaps used in the chat room. This is just a convention, though, and a user could decide to place their trust in a different root signer.
Certificates in Brassica Chat are inspired by ZCAP-LD and are composed of the following immutable fields:
- ID: Unique ID of the certificate (SHA-256 hash).
- Parent ID: ID of the previous certificate in the delegation chain.
- Signer: The public key used to sign the certificate. The signer must be a controller of the parent certificate to be considered valid.
- Controllers: A list of public keys for the users who are allowed to invoke the capabilities of this certificate.
- Predicate: An expression that constrains (or attenuates, to use the ocap term) the capabilities granted by the parent certificate. For example, the expression (when-op (edit delete) (allow-self)) says that edit and delete operations can only be used on posts authored by the user invoking the capability (one of the controllers).
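A hypothetical Python sketch of chain validation shows how a chain of certificates can only narrow authority. It assumes signatures have already been checked. It models predicates as Python functions instead of the s-expression language in the list above, and `when_op_allow_self` is a rough, hypothetical analogue of `(when-op (edit delete) (allow-self))`. Any revoked certificate in the chain invalidates the whole chain. Names such as `Cert` and `authorized` are illustrative, not Brassica Chat's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# (operation, post_author, invoker) -> is the operation allowed?
Predicate = Callable[[str, str, str], bool]


def allow_all(op, post_author, invoker):
    return True


def when_op_allow_self(ops):
    """Rough analogue of (when-op (edit delete) (allow-self))."""
    def predicate(op, post_author, invoker):
        return op not in ops or post_author == invoker
    return predicate


@dataclass
class Cert:
    cert_id: str
    parent_id: Optional[str]
    signer: str             # key that signed this cert (signature assumed checked)
    controllers: frozenset  # keys allowed to invoke this cert
    predicate: Predicate
    revoked: bool = False   # only ever flips from False to True


def authorized(certs, leaf_id, root_signer, op, post_author, invoker):
    """Walk from the leaf to the root; every link must allow the operation."""
    cert = certs.get(leaf_id)
    if cert is None or invoker not in cert.controllers:
        return False
    while True:
        if cert.revoked or not cert.predicate(op, post_author, invoker):
            return False
        if cert.parent_id is None:
            return cert.signer == root_signer
        parent = certs.get(cert.parent_id)
        if parent is None or cert.signer not in parent.controllers:
            return False
        cert = parent


certs = {
    "root": Cert("root", None, "alice", frozenset({"alice"}), allow_all),
    "bob-1": Cert("bob-1", "root", "alice", frozenset({"bob"}),
                  when_op_allow_self({"edit", "delete"})),
    # Bob delegates to Dave with a permissive predicate; it cannot widen Bob's grant.
    "dave-1": Cert("dave-1", "bob-1", "bob", frozenset({"dave"}), allow_all),
}
```

Every predicate along the chain must pass, so Dave's `allow_all` cannot override Bob's `allow-self` restriction. Revocation only moves from false to true, so merging two replicas' views of a certificate's flag is just a logical OR.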
Certificates also carry one piece of mutable state: a flag that the signer can flip from false to true to revoke the certificate. Revocation cannot be reversed, making this a trivially monotonic operation within the certificates CRDT.
At first glance, zcaps might appear to have the same problem as ocaps: a zcap cannot prevent Bob from sending a message that is not permitted because there’s no strong consistency. Instead, zcaps specify the rules by which well-behaved clients should interpret the events that have occurred. For example, Bob can send a message that edits the contents of Carol’s post, but if the zcap Bob used for that operation does not grant the capability to edit posts authored by Carol then that edit will simply be ignored when updating the chat room state on a given replica. Since zcaps are encoded as certificate documents, they can be synced amongst all replicas so that the user interface can eventually render the correct view of the chat room. This is a good example of something treated as data at one level of abstraction but behavior at another.
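A replica's interpretation step can be sketched as a fold over the event log that skips any operation the invoker's capability does not allow. In this hypothetical Python sketch, `grants` maps each user to a single predicate, a deliberately flattened stand-in for a verified zcap chain. The events are assumed to already be in causal order. The unauthorized edit stays in the log but has no effect on the rendered view.

```python
def allow_self_for(ops):
    """Listed ops only apply to the invoker's own posts."""
    return lambda op, post_author, invoker: op not in ops or post_author == invoker


def allow_all(op, post_author, invoker):
    return True


def render(events, grants):
    """events: (op, post_id, invoker, text) tuples in causal order.
    grants: invoker -> predicate (flattened stand-in for a zcap chain)."""
    posts = {}  # post_id -> [author, text]
    for op, post_id, invoker, text in events:
        predicate = grants.get(invoker)
        if op == "post":
            if predicate and predicate(op, invoker, invoker):
                posts[post_id] = [invoker, text]
            continue
        if post_id not in posts:
            continue
        author = posts[post_id][0]
        if predicate is None or not predicate(op, author, invoker):
            continue  # unauthorized: stays in the log, ignored in the view
        if op == "edit":
            posts[post_id][1] = text
        elif op == "delete":
            del posts[post_id]
    return {pid: tuple(v) for pid, v in posts.items()}


grants = {
    "alice": allow_all,
    "bob": allow_self_for({"edit", "delete"}),
    "carol": allow_self_for({"edit", "delete"}),
}
log = [
    ("post", "p1", "carol", "hi"),
    ("edit", "p1", "bob", "pwned"),    # Bob may not edit Carol's post
    ("edit", "p1", "carol", "hello"),
    ("post", "p2", "bob", "yo"),
    ("delete", "p2", "alice", None),   # Alice may remove any post
]
view = render(log, grants)
```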
Security considerations
The security implications of sharing a capability to a chat room are rather large. If Alice, Bob, and Carol have replicas of the same chat room, then sending Alice a message means indirectly sending Bob and Carol messages, too. Each presence of the chat room is co-equal with all other presences, after all. As a consequence, we cannot perform administration in a centralized manner like we could if there were a single canonical chat room actor living on a single machine. Revocation, for example, is now a communal effort. If Mallet can propagate messages through Bob and Carol (because Mallet holds a capability to both), then Bob and Carol must each revoke their respective capabilities in order to prevent Mallet from sending messages to the chat room in the future. While it’s possible to create a zcap that would cause Mallet’s messages to be ignored by clients, it doesn’t change the fundamental truth that Mallet has the capability to send messages to the chat room until all previously issued ocaps have been revoked. The formation of complete networks, where each replica holds a capability to sync with every other replica, is thus discouraged in this design. The connectedness of a replica is a function of how trusted the user of that replica is in the real-world social group. The more strongly connected a user is, the harder it becomes to remove them later if the social dynamic changes. There is a tension between the risk imposed by a strongly connected network and the desire to maximize availability of the chat room for online users.
The overall security goal for this prototype was to prevent Mallet from irreparably destroying the shared state of the chat room, which was achieved through Byzantine fault tolerance. Additionally, message signing and zcaps provide a means of holding Mallet accountable for anti-social/malicious actions that the system is technically incapable of preventing, giving users some agency over what they see in their client interface. Is this good enough?
Things left undone
This prototype was focused on exploring the core of a minimum viable p2p chat built on capability security principles. It is not production software. I did not concern myself with optimizing bandwidth or memory usage. As mentioned earlier, chat history is not even saved to disk.
Some areas for improvement are:
- Decentralized identity and naming. This was deliberately left out to keep the scope of this experiment manageable. Spritely has another project, codenamed Brux, to explore this topic. See also our paper on petnames.
- Ergonomic UI/UX for the complexity introduced by decentralization and eventual consistency. What’s a user-friendly way to add and revoke ocaps and zcaps? The UI doesn’t even attempt to allow viewing or editing zcaps right now. How can we clearly communicate what the security properties are/aren’t so that users don’t get false impressions?
- History rewriting. If Mallet writes some truly terrible content to the append-only chat log, it’s stuck in there even if it’s hidden in the user interface. Introducing some amount of synchronization to deal with this scenario seems okay. We could take inspiration from Git where the commit graph is append-only but branch names are mutable pointers.
- Preventing new members from reading past messages like in Signal groups. This should be an option like it is in other secure chat programs, but it’s a complex topic and exploring it was out of scope.
Conclusion
I hope this was an interesting walkthrough of how ocaps, actors, CRDTs, and zcaps can be composed with each other! Big thanks to the DWeb Seminar organizers for providing the spark of inspiration I needed to dive into the CRDT literature and build this prototype.