Mempool Collector Stats + Discussions #1
Indeed, disk usage reports the disk space actually used by the file. Since the filesystem stores data in blocks, a non-sparse file will occupy |
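To make the difference between the two numbers concrete, here is a minimal Python sketch. The helper name `file_sizes` and the demo file are made up for illustration, and `st_blocks` is only available on Unix-like systems, so treat this as an assumption-laden example rather than how the collector measures anything. It compares the apparent size of a file with the space allocated for it on disk:

```python
import os
import tempfile


def file_sizes(path: str) -> tuple[int, int]:
    """Return (apparent_size, allocated_size) in bytes for one file."""
    st = os.stat(path)
    apparent = st.st_size           # content length, what `du --apparent-size` sums up
    allocated = st.st_blocks * 512  # st_blocks is counted in 512-byte units (Unix only)
    return apparent, allocated


# Demo with a tiny file, in the spirit of one small JSON file per transaction.
# The content is a hypothetical placeholder, not a real transaction.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tx.json")
    with open(demo_path, "w") as f:
        f.write('{"hash": "0x00"}')
    apparent, allocated = file_sizes(demo_path)
    print(f"apparent={apparent} B, allocated={allocated} B")
```

The allocated figure depends on the filesystem and its block size, which is why many small files can take noticeably more disk space than the sum of their contents.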
Some stats after creating a summary file of 1,423,508 transactions in JSON and Parquet format:
|
Perhaps the individual tx JSON file should also omit the signature, to save 20-40% of the storage space, since the signature is part of the rawTx anyway 🤔 |
Everything is part of the rawTx, apart from timestamp and chainId |
It's still convenient for the summarizer service not to have to parse every single rawTx and extract the fields, although that doesn't seem too much to ask either. I'm still undecided whether the collector should store some fields, or only rawTx + timestamp (leaning towards only rawTx + timestamp, batched and gzipped) |
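As a rough illustration of why batching before gzipping matters, here is a small Python sketch. Everything in it is synthetic: `fake_record` is a hypothetical helper that builds made-up lines from hash digests, not real mempool data, so the absolute numbers say nothing about real transactions. It only compares gzipping each record on its own with gzipping all records as one batch:

```python
import gzip
import hashlib


def fake_record(i: int) -> bytes:
    """Build one synthetic 'timestamp,hash,rawTx' line (made-up data)."""
    ts = 1691110800000 + i  # arbitrary millisecond timestamp
    tx_hash = hashlib.sha256(str(i).encode()).hexdigest()
    raw_tx = hashlib.sha512(str(i).encode()).hexdigest() * 2  # stand-in for raw tx hex
    return f"{ts},0x{tx_hash},0x{raw_tx}\n".encode()


records = [fake_record(i) for i in range(1000)]

# One gzip stream per record: each stream pays its own header/trailer overhead.
individual = sum(len(gzip.compress(r)) for r in records)

# One gzip stream for the whole batch: overhead is paid once, and the
# compressor can reuse patterns shared across records.
batched = len(gzip.compress(b"".join(records)))

raw_total = sum(len(r) for r in records)
print(f"raw={raw_total} B, individual={individual} B, batched={batched} B")
```

With real transactions the ratios will differ, but the per-file overhead of many tiny gzip streams is the same effect.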
Some early stats about transactions collected and stored with the mempool-collector:
Hourly stats:
<timestampMillis>,<hash>,<rawTx>
Extrapolated to a day:
Note 2023-08-07: The stats below are outdated as they are based on the test storage method of one JSON file per transaction. Storage has now been updated to write into one CSV file per hour, which has very different compression characteristics.
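For reference, reading such an hourly CSV file could look roughly like the Python sketch below. It assumes the `<timestampMillis>,<hash>,<rawTx>` line layout mentioned above; the `TxRecord` class, the `parse_line` helper, the `0x` prefixes and the sample values are all hypothetical and not taken from the collector's code:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TxRecord:
    timestamp_ms: int  # when the transaction was seen, in milliseconds
    tx_hash: str
    raw_tx: str

    @property
    def received_at(self) -> datetime:
        """Timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp_ms / 1000, tz=timezone.utc)


def parse_line(line: str) -> TxRecord:
    """Parse one '<timestampMillis>,<hash>,<rawTx>' line."""
    # Split on the first two commas only; hex fields contain no commas anyway.
    ts, tx_hash, raw_tx = line.rstrip("\n").split(",", 2)
    return TxRecord(int(ts), tx_hash, raw_tx)


# Made-up sample line, not a real transaction.
sample = "1691110800000,0xabc,0x02f870\n"
rec = parse_line(sample)
print(rec.received_at.isoformat(), rec.tx_hash)
```

A summarizer that needs more fields would still have to decode `raw_tx` itself, which is the trade-off discussed in the comments.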
Data collection
JSON file example: https://github.com/flashbots/mempool-archiver/blob/main/docs/example-tx-summary.json
Per hour:
Extrapolated to a day:
Data size & compression
Looking at one particular hour specifically: 2023-08-04 UTC between [01:00, 02:00[:
find ./ -type f | wc -l
du --si -s
du --si -s --apparent-size
ls -l | gawk '{sum += $5; n++;} END {print n" "sum" "sum/n;}'
gzip
individual JSON files:
more about "apparent-size": https://man7.org/linux/man-pages/man1/du.1.html
zipping an hourly folder:
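The per-folder figures gathered with the shell commands above (file count, total size, mean file size) can also be computed in a few lines of Python. This is only a sketch: `summarize_dir` is a hypothetical helper, and the demo folder with two dummy files stands in for a real hourly folder:

```python
import os
import tempfile


def summarize_dir(root: str) -> tuple[int, int, float]:
    """Return (file_count, total_bytes, mean_bytes) for all files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # getsize returns the apparent size (content length) in bytes
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    total = sum(sizes)
    mean = total / len(sizes) if sizes else 0.0
    return len(sizes), total, mean


# Demo folder with two dummy files of 100 and 300 bytes.
with tempfile.TemporaryDirectory() as tmp:
    for i, payload in enumerate([b"a" * 100, b"b" * 300]):
        with open(os.path.join(tmp, f"tx-{i}.json"), "wb") as f:
            f.write(payload)
    count, total, mean = summarize_dir(tmp)
    print(count, total, mean)
```

One detail worth keeping in mind when comparing against the one-liner: `ls -l` also prints a leading `total` line, which the gawk script counts as an extra entry, so its `n` can be one higher than the number of files.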