Consider alternative index approaches #51

Open
jimhester opened this issue Feb 26, 2019 · 1 comment
Labels: feature (a feature request or enhancement), performance 🚀

Comments

@jimhester
Collaborator

jimhester commented Feb 26, 2019

When dealing with very large numeric-only files, the index takes up roughly the same amount of memory as the actual data.
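A rough illustration of why this happens, assuming (purely for illustration; vroom's actual layout may differ) that the index stores one unsigned 64-bit byte offset per field. `build_offset_index` is a hypothetical name. For a numeric-only file that means one 8-byte entry per value, the same cost as parsing every value into an 8-byte double:

```python
from array import array


def build_offset_index(data: bytes) -> array:
    """Record the byte offset of every field delimiter (',' or newline).

    Hypothetical simplified index: one unsigned 64-bit offset per field.
    """
    index = array("Q")
    for pos, byte in enumerate(data):
        if byte in (0x2C, 0x0A):  # ',' or '\n'
            index.append(pos)
    return index


# A small numeric-only file: 100,000 rows of three short numbers.
data = "".join(f"{i % 10},{i % 7}.5,{i % 100}\n" for i in range(100_000)).encode()
index = build_offset_index(data)

# The same values parsed into 8-byte doubles.
parsed = array("d", (float(x) for line in data.decode().splitlines() for x in line.split(",")))

raw_bytes = len(data)
index_bytes = index.itemsize * len(index)
parsed_bytes = parsed.itemsize * len(parsed)
print(f"raw file: {raw_bytes:,} bytes")
print(f"index:    {index_bytes:,} bytes")
print(f"parsed:   {parsed_bytes:,} bytes")
```

Under these assumptions the index (2,400,000 bytes on platforms where `array("Q")` items are 8 bytes) matches the parsed doubles byte for byte, and is larger than the 890,000-byte file itself, because each short field plus its delimiter takes only 2 to 4 bytes of text.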

  • We could investigate writing the index to disk and then mmapping it, which would dramatically reduce the memory requirements, since the operating system only needs to page in the parts of the index being accessed.
  • We could keep the index in memory but compress it with something like lz4 and decompress it as needed, provided we store some bookkeeping information recording which row numbers each compressed block corresponds to.
  • We could do both of the above: write the compressed index to disk and decompress blocks on demand.
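A minimal sketch of the third option (which also exercises the first two), assuming a hypothetical `CompressedOffsetIndex` that splits a per-field offset index into fixed-size row blocks. Python's stdlib `zlib` stands in for lz4 here only because it needs no extra dependency. The compressed blocks are written to a file that is then memory-mapped, and the only bookkeeping kept in RAM is each block's first row number and byte position in the file:

```python
import bisect
import mmap
import os
import tempfile
import zlib
from array import array


class CompressedOffsetIndex:
    """Hypothetical on-disk index of field offsets, compressed per row block."""

    def __init__(self, path, offsets, n_cols, rows_per_block=1024):
        self.n_cols = n_cols
        self.first_row = array("Q")    # first row number covered by each block
        self.block_start = array("Q")  # byte position of each block, plus an end sentinel
        entries_per_block = rows_per_block * n_cols
        with open(path, "wb") as f:
            for start in range(0, len(offsets), entries_per_block):
                chunk = offsets[start:start + entries_per_block]
                self.first_row.append(start // n_cols)
                self.block_start.append(f.tell())
                f.write(zlib.compress(chunk.tobytes()))
            self.block_start.append(f.tell())  # end of the last block
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def _load_block(self, b):
        # Read one compressed block through the mmap and decompress it.
        raw = zlib.decompress(self._map[self.block_start[b]:self.block_start[b + 1]])
        block = array("Q")
        block.frombytes(raw)
        return block

    def row(self, r):
        """Return the field offsets of row r, decompressing only its block."""
        b = bisect.bisect_right(self.first_row, r) - 1  # block containing row r
        block = self._load_block(b)
        i = (r - self.first_row[b]) * self.n_cols
        return list(block[i:i + self.n_cols])

    def close(self):
        self._map.close()
        self._file.close()


# Hypothetical input: 10,000 rows x 3 fields, each field starting 4 bytes apart.
offsets = array("Q", range(0, 3 * 10_000 * 4, 4))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.bin")
    idx = CompressedOffsetIndex(path, offsets, n_cols=3)
    first, middle, last = idx.row(0), idx.row(5000), idx.row(9999)
    n_blocks = len(idx.first_row)
    raw_size = offsets.itemsize * len(offsets)
    disk_size = os.path.getsize(path)
    idx.close()

print(middle, f"{raw_size:,} bytes raw -> {disk_size:,} bytes on disk")
```

Looking up a row touches a single compressed block, so resident memory is roughly the two small bookkeeping arrays plus one decompressed block at a time. Variable-sized blocks would also work, since the lookup bisects on `first_row` instead of assuming a fixed block size.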
@jimhester changed the title from "Consider mmapped index" to "Consider alternative index approaches" Mar 1, 2019
@jimhester added the feature (a feature request or enhancement) label Mar 25, 2019
@kkmann

kkmann commented Mar 11, 2020

+1 to that. Maybe it would even be possible to have a hard memory limit, to make sure the index never exceeds the available RAM (i.e. caching to disk if needed)?
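One way to read the hard-limit suggestion, sketched under assumptions: keep decompressed index blocks in a least-recently-used cache with a fixed byte budget, and go back to disk through a hypothetical `load_block` callable whenever a block has been evicted. `BudgetedBlockCache` and `fake_load_block` are illustrative names, not an existing vroom API:

```python
from collections import OrderedDict
from typing import Callable


class BudgetedBlockCache:
    """LRU cache of decompressed index blocks with a hard byte budget.

    Hypothetical sketch: `load_block` stands in for whatever reads and
    decompresses a block from disk.
    """

    def __init__(self, load_block: Callable[[int], bytes], max_bytes: int):
        self.load_block = load_block
        self.max_bytes = max_bytes
        self.used = 0                # bytes currently held in RAM
        self.blocks = OrderedDict()  # block_id -> bytes, least recently used first
        self.loads = 0               # number of trips to disk

    def get(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        data = self.load_block(block_id)
        self.loads += 1
        if len(data) > self.max_bytes:
            return data  # larger than the whole budget: hand it back uncached
        # Evict least recently used blocks until the new one fits.
        while self.used + len(data) > self.max_bytes:
            _, evicted = self.blocks.popitem(last=False)
            self.used -= len(evicted)
        self.blocks[block_id] = data
        self.used += len(data)
        return data


def fake_load_block(block_id: int) -> bytes:
    """Hypothetical stand-in for reading one 1,000-byte block from disk."""
    return bytes(1000)


cache = BudgetedBlockCache(fake_load_block, max_bytes=3000)
for block_id in [0, 1, 2, 0, 3, 1]:
    cache.get(block_id)
print(cache.loads, cache.used, list(cache.blocks))  # 5 3000 [0, 3, 1]
```

With room for three 1,000-byte blocks, the second access to block 0 is a hit, but block 1 is evicted when block 3 arrives and has to be reloaded, giving five disk loads in total while memory never exceeds the 3,000-byte budget.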
