Solutions for the Distributed System Challenges from Fly.io and Kyle Kingsbury
Hello world! Just echoes back what you send it.
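A minimal sketch of that handler, assuming the official Maelstrom Go package (the actual solution may differ in details):

```go
package main

import (
	"encoding/json"
	"log"

	maelstrom "github.com/jepsen-io/maelstrom/demo/go"
)

func main() {
	n := maelstrom.NewNode()

	// Reply to every "echo" message with an "echo_ok" carrying the same body.
	n.Handle("echo", func(msg maelstrom.Message) error {
		var body map[string]any
		if err := json.Unmarshal(msg.Body, &body); err != nil {
			return err
		}
		body["type"] = "echo_ok"
		return n.Reply(msg, body)
	})

	if err := n.Run(); err != nil {
		log.Fatal(err)
	}
}
```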
Generates globally-unique IDs on each node. Inspired by MongoDB's ObjectId.
Each ID is a concatenation of:
- Node ID: to avoid collisions with other nodes
- Current time: to avoid collisions with messages sent at the same time
- Count: to avoid collisions with messages sent at the same time and node
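A rough sketch of that concatenation (names and the separator are illustrative, not the repo's exact format):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// idGen builds IDs out of the three parts listed above: node ID, current time,
// and a per-node counter.
type idGen struct {
	nodeID string
	count  atomic.Uint64
}

func (g *idGen) next() string {
	return fmt.Sprintf("%s-%d-%d", g.nodeID, time.Now().UnixNano(), g.count.Add(1))
}

func main() {
	g := idGen{nodeID: "n1"}
	fmt.Println(g.next()) // e.g. n1-1700000000000000000-1
}
```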
A basic node that receives messages, saves them to a slice, and returns them on the read message.
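A minimal sketch of such a node, assuming the official Maelstrom Go package:

```go
package main

import (
	"encoding/json"
	"log"
	"sync"

	maelstrom "github.com/jepsen-io/maelstrom/demo/go"
)

func main() {
	n := maelstrom.NewNode()

	var (
		mu       sync.Mutex
		messages []int
	)

	// Save every broadcast message into a slice.
	n.Handle("broadcast", func(msg maelstrom.Message) error {
		var body struct {
			Message int `json:"message"`
		}
		if err := json.Unmarshal(msg.Body, &body); err != nil {
			return err
		}
		mu.Lock()
		messages = append(messages, body.Message)
		mu.Unlock()
		return n.Reply(msg, map[string]any{"type": "broadcast_ok"})
	})

	// Return everything seen so far on a read.
	n.Handle("read", func(msg maelstrom.Message) error {
		mu.Lock()
		defer mu.Unlock()
		return n.Reply(msg, map[string]any{"type": "read_ok", "messages": messages})
	})

	// The topology message can be ignored for the single-node case.
	n.Handle("topology", func(msg maelstrom.Message) error {
		return n.Reply(msg, map[string]any{"type": "topology_ok"})
	})

	if err := n.Run(); err != nil {
		log.Fatal(err)
	}
}
```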
Same as the previous challenge, but now it broadcasts the message to all neighbors.
The node spawns a new thread for each message sent to a neighbor and backs off exponentially if the neighbor doesn't respond.
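A rough sketch of that retry loop using only the standard library; the send function, delays, and attempt limit are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sendWithBackoff calls send until it succeeds, doubling the wait after every
// failed attempt (the exponential backoff mentioned above).
func sendWithBackoff(send func() error, maxAttempts int) error {
	delay := 100 * time.Millisecond
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err := send(); err == nil {
			return nil
		}
		time.Sleep(delay)
		delay *= 2
	}
	return errors.New("neighbor never acknowledged the message")
}

func main() {
	attempts := 0
	_ = sendWithBackoff(func() error {
		attempts++
		if attempts < 3 {
			return errors.New("timeout")
		}
		return nil
	}, 5)
	fmt.Println("acknowledged after", attempts, "attempts")
}
```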
This solution is pretty dumb and not the most efficient one; here are other ideas for how it could be done:
- Use a map to keep track of which messages are not yet acknowledged and retry them with the next messages
- Use a separate thread with a ticker to periodically distribute the messages to neighbors
The node receiving the message broadcasts it to all the other nodes in the network with a send-and-forget approach, using the gossip message type.
Results from a run:
All checks passed
Messages per op: 11.789474/30
Median latency: 84/400
Maximum latency: 105/600
The previous solution also works for this challenge, but I've taken it a step further and tried to make it send as few messages per operation as possible.
It's the same idea as #3d, but it spawns a new thread that broadcasts the pending messages every 1.5s through a buffered channel, and only broadcasts if there are new messages.
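A rough, standard-library-only sketch of that loop; the channel size and flush function are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// batcher collects pending messages and flushes them to the neighbors every
// 1.5s, but only when there is something new to send.
func batcher(pending <-chan int, flush func(batch []int)) {
	ticker := time.NewTicker(1500 * time.Millisecond)
	defer ticker.Stop()

	var batch []int
	for {
		select {
		case m := <-pending:
			batch = append(batch, m)
		case <-ticker.C:
			if len(batch) == 0 {
				continue // nothing new, skip this round
			}
			flush(batch)
			batch = nil
		}
	}
}

func main() {
	pending := make(chan int, 1024) // buffered so handlers never block
	go batcher(pending, func(batch []int) { fmt.Println("gossiping", batch) })

	for i := 0; i < 5; i++ {
		pending <- i
	}
	time.Sleep(2 * time.Second)
}
```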
Results:
All checks passed
Messages per op: 4.728111/20
Median latency: 791/1000
Maximum latency: 1584/2000
I've made two solutions for this challenge: one that uses an OT approach and another with a CRDT approach.
The OT (Operational Transformation) solution uses the sequential KV store from Maelstrom and tries to Compare-and-Swap the value of the counter.
The CRDT (Conflict-free Replicated Data Type) solution uses a map to store the counters of the other nodes and propagates its own when it receives an add message.
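A sketch of the CRDT side, a grow-only counter kept as a per-node map (the message handling around it is omitted):

```go
package main

import "fmt"

// gCounter keeps one counter per node; the read value is the sum of all slots.
type gCounter map[string]int

// add increments this node's own slot.
func (c gCounter) add(self string, delta int) { c[self] += delta }

// merge takes the element-wise maximum, so replicas converge regardless of
// the order in which gossip arrives.
func (c gCounter) merge(other gCounter) {
	for node, count := range other {
		if count > c[node] {
			c[node] = count
		}
	}
}

// value is the total returned to clients on a read.
func (c gCounter) value() int {
	total := 0
	for _, count := range c {
		total += count
	}
	return total
}

func main() {
	a, b := gCounter{}, gCounter{}
	a.add("n0", 3)
	b.add("n1", 2)
	a.merge(b)
	fmt.Println(a.value()) // 5
}
```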
Links:
- CRDTs: The Hard Parts by Martin Kleppmann - Great explanation about the differences between OTs and CRDTs
- Conflict-free Replicated Data Types on Wikipedia
A simple implementation of a Kafka-style log: store messages in a map and return the messages after a given offset.
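A rough sketch of that map-based log; types and field names are illustrative:

```go
package main

import "fmt"

// logStore keeps one append-only slice per key; offsets are slice indexes.
type logStore struct {
	topics map[string][]int
}

// send appends a message and returns its offset.
func (s *logStore) send(key string, msg int) int {
	s.topics[key] = append(s.topics[key], msg)
	return len(s.topics[key]) - 1
}

// poll returns [offset, message] pairs starting at the given offset.
func (s *logStore) poll(key string, offset int) [][2]int {
	var out [][2]int
	for i := offset; i < len(s.topics[key]); i++ {
		out = append(out, [2]int{i, s.topics[key][i]})
	}
	return out
}

func main() {
	s := &logStore{topics: map[string][]int{}}
	s.send("k1", 10)
	s.send("k1", 20)
	fmt.Println(s.poll("k1", 1)) // [[1 20]]
}
```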
Implementation using the Lin-KV store:
- Compare and Swap to get the next offset for a key
- Store each message in a separate key
- Last Write Wins to store the committed offsets
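A sketch of the Compare-and-Swap step from the list above, assuming the official Maelstrom Go package and a hypothetical offset_<key> layout:

```go
package logkv

import (
	"context"

	maelstrom "github.com/jepsen-io/maelstrom/demo/go"
)

// nextOffset reads the current offset for a key and tries to Compare-and-Swap
// it to the next value, retrying when another node wins the race.
func nextOffset(ctx context.Context, kv *maelstrom.KV, key string) (int, error) {
	offsetKey := "offset_" + key // illustrative key layout
	for {
		current, err := kv.ReadInt(ctx, offsetKey)
		if err != nil {
			current = 0 // assume the key simply doesn't exist yet
		}
		if err := kv.CompareAndSwap(ctx, offsetKey, current, current+1, true); err == nil {
			return current, nil
		}
		// Another node allocated this offset first; retry with the fresh value.
	}
}
```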
The previous solution also works for this challenge, so I tried to remove the KV completely and use a leader-follower approach.
The leader node is fixed to n0 (this could be done with a consensus algorithm instead).
The leader handles the offsets to avoid conflicts. All the nodes exchange messages to get updated messages and committed offsets.
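A small, hypothetical sketch of the leader-side bookkeeping; the message exchange between nodes is not shown:

```go
package leader

import "sync"

// leaderState: the fixed leader (n0) owns the next offset for every key, so
// followers forward their send requests to it instead of racing on a shared KV.
type leaderState struct {
	mu   sync.Mutex
	next map[string]int // key -> next offset to hand out
}

func (s *leaderState) assign(key string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	off := s.next[key]
	s.next[key] = off + 1
	return off
}
```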
Note: the txn-rw-register
workload doesn't really check for anomalies; it passes even when returning 0 for all keys. I've tried to follow the definitions from Jepsen instead. Thanks to teivah's and mchernyakov's solutions, I could check whether I was on the right track.
A simple, totally-available transactional store that keeps a single map and a lock for each read/write operation.
This one took me lots of back-and-forth because I wasn't sure how to implement the mutexes to get the right consistency model. Even though the test specifies read uncommitted, that level still doesn't allow G0/P0 anomalies.
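A rough sketch of the single map plus lock, with a transaction as a list of [op, key, value] micro-operations; the JSON decoding the real handler needs is omitted:

```go
package main

import (
	"fmt"
	"sync"
)

// store applies a whole transaction atomically under one mutex.
type store struct {
	mu   sync.Mutex
	data map[int]int
}

func (s *store) apply(txn [][3]any) [][3]any {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, op := range txn {
		key := op[1].(int)
		switch op[0].(string) {
		case "r":
			txn[i][2] = s.data[key] // fill in the read result
		case "w":
			s.data[key] = op[2].(int)
		}
	}
	return txn
}

func main() {
	s := &store{data: map[int]int{}}
	fmt.Println(s.apply([][3]any{{"w", 1, 10}, {"r", 1, nil}}))
}
```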
This solution uses a g-counter to keep track of the transactions, ensuring that G0/P0 anomalies won't happen and that the latest possible state of each key is kept, and it replicates each write operation to all the other nodes.
In case of network partitions, the key keeps being replicated until it's acknowledged by the other node.
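A hypothetical sketch of how replicated writes could be versioned with such a counter so that stale replications are ignored; this is my reading of the approach, not the repo's exact code:

```go
package replication

// versioned pairs a value with the counter value that produced it, so a
// replica can always keep the latest possible state of a key.
type versioned struct {
	value   int
	version uint64
}

type replica struct {
	data map[int]versioned
}

// applyRemote applies a replicated write only if it is newer than what we
// already have, so duplicated or reordered replications can't step backwards.
func (r *replica) applyRemote(key, value int, version uint64) {
	if current, ok := r.data[key]; ok && current.version >= version {
		return // stale replication, ignore it
	}
	r.data[key] = versioned{value: value, version: version}
}
```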
Links:
- Read Uncommitted from Jepsen docs
- A Critique of ANSI SQL Isolation Levels By Berenson, Microsoft
The solution follows the same idea as the previous one, with g-counters, but it replicates entire transactions instead of individual writes, and it first writes to a separate store and then merges the changes into the main store.
Like the previous solution, the txn is replicated until it's acknowledged by the other node. On replication, it also merges the changes using a separate store.
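A hypothetical sketch of that two-step apply: writes land in a staging map first and the whole transaction is merged into the main store at the end, mirroring the separate-store-then-merge step described above:

```go
package txns

import "sync"

// txnStore applies a transaction against a private staging map first and only
// merges it into the main store once every operation has been processed.
type txnStore struct {
	mu   sync.Mutex
	main map[int]int
}

// op is a [f, key, value] micro-operation, with f = "r" or "w".
type op struct {
	f     string
	key   int
	value int
}

func (s *txnStore) apply(txn []op) []op {
	staging := map[int]int{}

	s.mu.Lock()
	defer s.mu.Unlock()
	for i, o := range txn {
		switch o.f {
		case "r":
			if v, ok := staging[o.key]; ok {
				txn[i].value = v // read our own buffered write
			} else {
				txn[i].value = s.main[o.key]
			}
		case "w":
			staging[o.key] = o.value // buffered, merged below
		}
	}
	for k, v := range staging {
		s.main[k] = v // merge the complete transaction at once
	}
	return txn
}
```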
Links:
- Read Committed from Jepsen docs
This repo is a Go workspace where each solution is in a separate module.
Each solution is in a folder with a main.go and a main_test.go for tests.
All solutions have automated tests using custom utils as a way to verify and play with the solutions, and to learn how to write tests in Go.
This project uses Nix for packages and Taskfile to run and build the solutions.
To run the solutions, use the run-* tasks:
task run-all
task run-uniqueids
task run-broadcast-efficient-ii # Will run and check the result output
To run the tests, use the test-* tasks:
task test-all
task test-uniqueids
task test-broadcast-efficient-ii