Just like mailinator but with pokemons!
Prerequisites:
- git
- make
- docker
- docker-compose
git clone /~https://github.com/bugzmanov/mailiranitar.git
cd mailiranitar
make run
make run
make perftest
make clean
- Runtime & Language: scala on JVM
- Webserver: Finatra (https://twitter.github.io/finatra/)
- Caching library: Caffeine (/~https://github.com/ben-manes/caffeine)
- Performance tests: locust (https://locust.io/)
From a general perspective, we are talking about in-memory storage that potentially has low contention:
- mailboxes are independent
- assuming real-world scenario, concurrent access to a mailbox is relatively infrequent
Because we are using in-memory data, ideally we would want to avoid locking (and serial access) as much as possible. We should try to utilize optimistic locking (lock-free or wait-free) as retries would be less expensive than context switching (assuming low contention).
Three main components of the system:
- Rest API translation layer
- MailboxRegistry
- Individual mailbox
The main constraint is that I have only 7 hrs to finish the task.
The decision is to use finatra framework for this layer:
-
Finatra is relatively high on popular performance benchmarks (https://www.techempower.com/benchmarks/#section=data-r15&hw=ph&test=fortune ). Finatra is the topmost framework in the list that is:
a) mature (lindy-effect)
b) JVM based
c) easy to get going
d) high-level (using raw netty would get better performance, but would be impossible for me to finish the task in 7 hours)
-
Finatra provides basic monitoring capabilities out of the box (ideally I would prefer to build a monitoring system from scratch using micrometer library, but 7 hrs)
Shortcuts:
- I'm butchering REST API. Ideally resources needs to be packaged in proper message envelopes that can deliver errors as well as payload.
- For mail pagination I'm providing only forward links ("next")
- I'm simplifying a lot how a message can look like (omitting details like attachments, headers, etc.)
MailboxRegistry is responsible for keeping track of existing mailboxes, managing their lifecycle, and providing access to individual mailboxes. The structure that keeps track of mailboxes:
- needs to be concurrently read (access by key)
- concurrently modified (access by key)
Because we are expecting high contention accessing the registry, using a map with stripped locking seems to be a reasonable option.
Potential candidates:
- ConcurrentHashMap in jdk https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
- Guava cache https://guava.dev/releases/21.0/api/docs/com/google/common/cache/Cache.html
- Caffeine cache /~https://github.com/ben-manes/caffeine
- ConcurrentLinkedHashmap: /~https://github.com/ben-manes/concurrentlinkedhashmap/
From a benchmarking perspective (/~https://github.com/ben-manes/caffeine/wiki/Benchmarks) caffeine has the best throughput performance. (its design proposed significant improvements upon the idea of striped locking: /~https://github.com/ben-manes/caffeine/wiki/Design )
Design decisions/shortcuts:
- Each mailbox would expire based on TTL from time of creation or Window TinyLfu in case of reaching max mailbox limit
- Statically defined the maximum allowed live mailboxes (see "Handling Memory Pressure" section)
- Expiration policy is mailbox based (in contrast to mail messages based) as it requires less bookkeeping
Mailboxes are independent of each other from an API perspective, which implies no contention for accessing different boxes.
Access patterns:
- read by key
- random write - delete by key
- read by key and then scan
- write at predefined position - add a new message
The ideal data structure for this pattern would be doubly linked list with a hash table, to keep all access patterns under amortized O(1) (or O(page size) for pagination).
Potential candidates:
LinkedHashMap- not thread-safe, do not expose underlying linked listConcurrentLinkedHashMap- do not expose underlying linked list- Build custom implementation.
I went with creating a custom data structure (mix of linkedlist and map) that can guard concurrent access via StampedLock, we are expecting a low amount of write contention and can utilize an optimistic locking approach with retries.
Two general approaches of handling memory pressure:
- Dynamically memory control
- Statically defined memory control
In dynamically control the system actively monitors itself and makes a decision to evict data when starts running out fo memory. Benefits:
- More efficient memory utilization
- Less burden of software configuration and management
Drawbacks:
- More complex
- Actively consumes CPU resources
- Potentially introduces contention for data access
In a static control - software operator manually defines limits, which are enforced in the runtime. Overall static control is expected to have better performance (and it's also possible to implement it within 7hrs)
To have full statically memory control, it sufficient to define the following limits:
- Max number of live mailboxes the system can support
- Max number of messages a single mailbox can have
- Max size of a message allowed (I did not have time to implement this constraint)
Max number of messages limit is implemented in FIFO fashion (the oldest messages got dropped first) Max number of mailboxes liit is implemented in Window TinyLfu fashion (/~https://github.com/ben-manes/caffeine/wiki/Efficiency)
/~https://github.com/bugzmanov/mailiranitar/blob/master/locust/locustfile.py
To run perf benchmark against running mailiranitar service:
locust -f locust/locustfile.py --headless -u 3000 -r 50 -H http://<host>:8888
The test is running 300 (in this case) concurrent users. Each one of them:
- Either creates a new mailbox (with 0.3 chance) or uses an existing one
- Posts 6 new messages
- Reads a random message
- Reads a page of messages
- Removes a random message
- With p=0.1 deletes the mailbox
Benchmarking results on my MacBook Pro 2013 (using docker):
Type Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100%
------------------------------------------------------------------------------------------------------------------------------------------------------
POST /mailboxes 1402 5 6 7 8 10 14 20 27 66 67 67
DELETE delete message 4139 5 6 7 8 10 13 17 22 51 65 65
POST post message 26376 5 6 7 8 11 14 19 25 60 88 99
GET read message 4196 5 6 7 8 10 13 18 23 64 74 74
GET read messages by page 4244 5 6 7 8 10 13 17 20 53 68 68
DELETE remove mailbox 381 5 6 7 8 10 13 16 19 25 25 25
------------------------------------------------------------------------------------------------------------------------------------------------------
This looks surprisingly slow, to be honest. I would need some time to look into it
HTTP API |
Description |
---|---|
POST /mailboxes |
Create a new, random email address |
POST /mailboxes/{email_address}/messages |
Create a new message for a specific email address |
GET /mailboxes/{email_address}/messages |
Retrieve an index of messages sent to an email address, including sender, subject, and id, in recency order. Support cursor-based pagination through the index |
GET /mailboxes/{email_address}/messages/{message_id} |
Retrieve a specific message by id |
DELETE /mailboxes/{email_address} |
Delete a specific email address and any associated messages |
DELETE /mailboxes/{email_address}/messages/{message_id} |
Delete a specific message by id |
[x] Design
[x] Core
[x] Rest api
[ ] Monitoring (via finatra /admin)