Dynamic Approximate Nearest Neighbour Benchmarks

Install the framework

Unpack the code and set up the Python environment

> tar xvzf dyann.tar.gz
> cd dyann
> /apps/python/3.9.X/bin/python3 -m venv env
> source env/bin/activate
> pip install -r requirements.txt --no-cache-dir

Running the code

Quick test

> python download.py data=[datacol_quick]
> python run.py data=[datacol_quick] algo=[linear,hnsw]
> python plot-pareto.py data=[datacol_quick] algo=[linear,hnsw]
> python plot-algo.py data=[datacol_quick] algo=[hnsw]

Preload all datasets and pregenerate all ground truth (this can take hours; ensure at least 30 GB of free disk space)

> python download.py data=[datacol,datacol_lerp,datacol_efreq,datacol_esfreq]
> python download.py data=[featlearn,featlearn_lerp,featlearn_efreq,featlearn_esfreq]

Generate all benchmarking results (this can easily take days or weeks, so it is best run in parallel with a job scheduler)

> python run.py data=[datacol,datacol_lerp,datacol_efreq,datacol_esfreq] algo=[linear,annoy,hnsw,ivfpq,scann,kdtree]
> python run.py data=[featlearn,featlearn_lerp,featlearn_efreq,featlearn_esfreq] algo=[linear,annoy,hnsw,ivfpq,scann,kdtree]
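
One way to split this work for a job scheduler is to emit one run.py command per dataset and algorithm pair. The sketch below only builds the command strings. It assumes that run.py accepts single-item lists (as plot-algo.py does in the quick test) and that each pair can be run independently; neither is confirmed here, and submitting the commands is left to your scheduler.

```python
from itertools import product

# Dataset and algorithm names copied from the commands above.
DATASETS = [
    "datacol", "datacol_lerp", "datacol_efreq", "datacol_esfreq",
    "featlearn", "featlearn_lerp", "featlearn_efreq", "featlearn_esfreq",
]
ALGORITHMS = ["linear", "annoy", "hnsw", "ivfpq", "scann", "kdtree"]


def build_commands(datasets, algorithms):
    """Return one run.py command line per (dataset, algorithm) pair."""
    return [
        f"python run.py data=[{data}] algo=[{algo}]"
        for data, algo in product(datasets, algorithms)
    ]


if __name__ == "__main__":
    # Print one command per line, e.g. to paste into a scheduler's task list.
    for command in build_commands(DATASETS, ALGORITHMS):
        print(command)
```

Each printed line can then be wrapped in whatever submission command your cluster uses.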

Adding new datasets

A template file for new datasets is provided at ./dyann/data/template.py

Usage Instructions:

  1. Copy template.py and change the filename and class name for your new dataset
  2. Update ./dyann/data/proxy.py to include the names you have chosen
  3. Fill in each of the TODO items (refer to existing datasets for hints if needed)
  4. Create any number of configuration sets in ./conf/data/, with the name property set to this filename and the scale property providing an optional parameter sweep
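
Steps 1 and 3 can be started with a small helper like the one below. It is only a sketch: it copies the template to a new file and lists the lines that contain TODO so you can see what is left to fill in. The function name is hypothetical, and renaming the class and editing proxy.py still have to be done by hand.

```python
import shutil
from pathlib import Path


def start_from_template(template: Path, new_name: str) -> list[str]:
    """Copy the template to <new_name>.py in the same folder and list its TODO lines."""
    target = template.with_name(f"{new_name}.py")
    if target.exists():
        # Refuse to overwrite a dataset file that is already there.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(template, target)

    todos = []
    for number, line in enumerate(target.read_text().splitlines(), start=1):
        if "TODO" in line:
            todos.append(f"{target.name}:{number}: {line.strip()}")
    return todos
```

For example, calling start_from_template(Path("dyann/data/template.py"), "mydata") from the repository root would create dyann/data/mydata.py, where mydata is a made-up dataset name.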

Adding new ANN algorithms

A template file for new ANN algorithms is provided at ./dyann/algo/template.py

Usage Instructions:

  1. Copy template.py and change the filename and class name for your new ANN algorithm
  2. Update ./dyann/algo/proxy.py to include the names you have chosen
  3. Fill in each of the TODO items (refer to existing algorithms for hints if needed)
  4. Create both the build and search configuration files in ./conf/algo/ with the name property set to this filename; the lists of parameters given for the build and query properties will be swept
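
To illustrate what sweeping lists of parameters can look like, the sketch below expands each list into every combination of values. It assumes the sweep is a full Cartesian product, which this README does not confirm, and the parameter names and values are hypothetical rather than taken from the framework's configuration files.

```python
from itertools import product


def sweep(params):
    """Expand {name: [value, ...]} into one dict per combination of values."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


# Hypothetical parameter names and values, for illustration only.
build_params = {"M": [8, 16], "ef_construction": [100, 200]}
query_params = {"ef_search": [50, 100, 200]}

build_settings = sweep(build_params)  # 2 * 2 = 4 build configurations
query_settings = sweep(query_params)  # 3 query configurations

# Under this assumption, every build configuration is paired with every
# query configuration, giving 4 * 3 = 12 runs.
for build in build_settings:
    for query in query_settings:
        print(build, query)
```

Longer parameter lists multiply the number of runs quickly, which is worth keeping in mind when sizing a new configuration.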

Benchmarks for Static ANN