Maximal Exact Match Ordered (MEMO) is a pangenome indexing method based on maximal exact matches (MEMs) between genomes. A single MEMO index supports k-mer queries of arbitrary length k over pangenomic windows. MEMO answers two kinds of query: membership queries, which report per-genome k-mer presence/absence, and conservation queries, which report how many genomes contain each k-mer in a window. MEMO achieves smaller index sizes and faster queries than k-mer-based approaches such as KMC3 and PanKmer. See the small example here for running MEMO to visualize sequence conservation.
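The brute-force sketch below makes the two query types concrete. It uses a toy pangenome of three short hypothetical sequences, with the first one as the pivot. For each k-mer start position in the pivot window, it prints a membership bit string (one presence/absence bit per genome) and a conservation count (how many genomes contain that k-mer). This is only a definition-level illustration in POSIX `sh` and `awk`. It is not how MEMO computes its answers, since MEMO answers queries from MEM-based indexes, and the column layout here is ours, not MEMO's output format.

```sh
#!/bin/sh
# Brute-force illustration of membership and conservation queries.
# NOT how MEMO works: MEMO answers these queries from MEM-based indexes.
# The toy sequences are hypothetical; genome 1 is the pivot.
k=3

result=$(awk -v k="$k" 'BEGIN {
  g[1] = "ACGTACGT"   # pivot window
  g[2] = "ACGTTTTT"
  g[3] = "TTACGTAC"
  n = 3
  pivot = g[1]
  # Slide a length-k window along the pivot.
  for (i = 1; i <= length(pivot) - k + 1; i++) {
    kmer = substr(pivot, i, k)
    bits = ""     # membership: one presence/absence bit per genome
    count = 0     # conservation: number of genomes containing the k-mer
    for (j = 1; j <= n; j++) {
      hit = (index(g[j], kmer) > 0) ? 1 : 0
      bits = bits hit
      count += hit
    }
    print i, kmer, bits, count
  }
}')

# Columns: position, k-mer, membership bits, conservation count
printf '%s\n' "$result"
```

With k = 3, the output runs from `1 ACG 111 3` to `6 CGT 111 3`. Positions 3 (`GTA`) and 4 (`TAC`) report `101 2` because the second genome lacks those k-mers.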
MEMO is available as a Docker image on DockerHub.
### Docker:
```sh
docker pull hwangstephen/memo:latest
docker run hwangstephen/memo:latest memo -h
```
### Singularity:
```sh
singularity pull docker://hwangstephen/memo:latest
./memo_latest.sif memo -h
```
MEMO relies on the following dependencies:
Compile MONI from its repo:
```sh
sudo apt-get install -y build-essential cmake git python3 zlib1g-dev
git clone https://github.com/maxrossi91/moni
cd moni
mkdir build
cd build
cmake ..
make
make install
```
After downloading/building the required dependencies, clone MEMO and run it from its repo:
```sh
git clone https://github.com/StephenHwang/MEMO.git
cd MEMO/src
./memo -h
```
To create a MEMO conservation index, specify a list of genomes with `-g`, an output directory with `-o`, and an output prefix with `-p`. To create the MEMO membership index, also include the `-m` flag.

Each line of `genome_list.txt` is the path to one genome in the pangenome; the first genome listed is the pangenome pivot.
```sh
./memo index \
  -g genome_list.txt \
  -o output_dir \
  -p output_prefix
```
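Here is a minimal sketch of preparing `genome_list.txt`, assuming three hypothetical FASTA paths under `genomes/`. The pivot is listed first and each line is one genome, so the number of non-empty lines gives the genome total (inclusive of the pivot) that the query and view steps take as `-n`.

```sh
#!/bin/sh
# Hypothetical assembly paths; replace with your own FASTA files.
printf '%s\n' \
  genomes/pivot.fa \
  genomes/sample1.fa \
  genomes/sample2.fa > genome_list.txt

# The first line is the pangenome pivot.
pivot=$(head -n 1 genome_list.txt)

# Count non-empty lines: the genome total, inclusive of the pivot.
num_genomes=$(grep -c . genome_list.txt)

echo "pivot: $pivot"
echo "num_genomes: $num_genomes"
```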
Once you have created your indexes, pass the index with `-b` and specify the k-mer length `-k`, the genomic region `-r`, and the total number of genomes in your pangenome (inclusive of the pivot) `-n`. Then run `memo query` for the conservation query. To run the membership query, include the `-m` flag.
```sh
./memo query \
  -b index.parquet \
  -k k \
  -n num_genomes \
  -r chr:start-end \
  -o memo_c_out.txt
```
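A single MEMO index handles queries for arbitrary k, so one index can serve conservation queries at several k-mer lengths. The sketch below loops over k = 21, 31, and 51 using the options from the command above. The `run` helper, the dry-run default, the region, the genome count, and the output file names are all hypothetical placeholders. By default the script only prints each `memo query` command; setting `DRY_RUN=0` executes them.

```sh
#!/bin/sh
# Hypothetical helper (not part of MEMO): print each command instead of
# running it unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder values: substitute your own index, genome count, and region.
index=index.parquet
num_genomes=3
region=chr1:100000-200000

# Reuse the same index for several k-mer lengths.
for k in 21 31 51; do
  run ./memo query -b "$index" -k "$k" -n "$num_genomes" -r "$region" \
    -o "memo_c_k${k}.txt"
done
```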
*Figure: 31-mer sequence conservation of the Human Leukocyte Antigen (HLA) locus in the HPRC pangenome.*
After the conservation query, use MEMO to visualize sequence conservation:
```sh
./memo view \
  -i memo_c_out.txt \
  -o out.png \
  -n num_genomes \
  -b num_bins
```
Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, Ben Langmead. MEM-based pangenome indexing for k-mer queries (2024). bioRxiv.