Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash hosts #15

Merged
merged 2 commits into from
Jan 25, 2015
Merged

Hash hosts #15

merged 2 commits into from
Jan 25, 2015

Conversation

attilaolah
Copy link
Contributor

Instead of keeping a map of hosts, we can hash them and keep the hash sums to save space. Hashes are arrays, which means a fixed size, which is easier to manage and behaves more predictably wrt memory consumption.

After some profiling with the hosts file from http://hosts-file.net/download/hosts.txt, I noticed the following:

Memory string [Mb] [8]byte [Mb] improvement
Alloc 107.87 60.56 43.86%
TotalAlloc 215.78 209.75 2.79%
Sys 131.24 67.86 48.29%

The runtime for parsing the entire host file did not change noticeably (it stayed around 3.4 seconds).

So far the only downside of this approach seems to be that there is no way to list blocked hosts in the web UI. Because of that, we can only use this approach for the public hosts files (which are the huge ones anyway, private hosts files will likely be much smaller).

Closes #9, for now (we could still investigate that later).

@attilaolah attilaolah added this to the Gopher Gala milestone Jan 25, 2015
@attilaolah
Copy link
Contributor Author

Note that I'm halving the MD5 sum here, reducing it to 8 bytes (64) bits instead of 16 bytes. This gives us a chance of collision of one in sqrt(2⁶⁴), which is roughly one in four billion (1/4294967296). A hosts file containing that many hosts would be more than a gigabyte.

@attilaolah attilaolah self-assigned this Jan 25, 2015
@attilaolah
Copy link
Contributor Author

Out of curiosity, I also tried using uint64 as the may key, which is also 8 bytes. It behaves exactly like [8]byte, both speed and memory wise.

I'll merge this, seems like nothing broke.

@attilaolah attilaolah merged commit fe1697f into master Jan 25, 2015
@attilaolah attilaolah deleted the hash branch January 25, 2015 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blocklist Tree
1 participant