diff --git a/README.md b/README.md
index f4af62b37e..a3a8372228 100644
--- a/README.md
+++ b/README.md
@@ -22,13 +22,13 @@ Anserini is packaged in a self-contained fatjar, which also provides the simples
 Assuming you've already got Java installed, fetch the fatjar:
 
 ```bash
-wget https://repo1.maven.org/maven2/io/anserini/anserini/0.35.1/anserini-0.35.1-fatjar.jar
+wget https://repo1.maven.org/maven2/io/anserini/anserini/0.36.0/anserini-0.36.0-fatjar.jar
 ```
 
 The follow commands will generate a SPLADE++ ED run with the dev queries (encoded using ONNX) on the MS MARCO passage corpus:
 
 ```bash
-java -cp anserini-0.35.1-fatjar.jar io.anserini.search.SearchCollection \
+java -cp anserini-0.36.0-fatjar.jar io.anserini.search.SearchCollection \
    -index msmarco-v1-passage.splade-pp-ed \
    -topics msmarco-v1-passage.dev \
    -encoder SpladePlusPlusEnsembleDistil \
@@ -39,16 +39,17 @@ java -cp anserini-0.35.1-fatjar.jar io.anserini.search.SearchCollection \
 To evaluate:
 
 ```bash
-wget https://raw.githubusercontent.com/castorini/anserini-tools/master/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt
-java -cp anserini-0.35.1-fatjar.jar trec_eval -c -M 10 -m recip_rank qrels.msmarco-passage.dev-subset.txt run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt
+java -cp anserini-0.36.0-fatjar.jar trec_eval -c -M 10 -m recip_rank msmarco-passage.dev-subset run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt
 ```
 
-See [detailed instructions](docs/fatjar-regressions/fatjar-regressions-v0.35.1.md) for the current fatjar release of Anserini (v0.35.1) to reproduce regression experiments on the MS MARCO V2.1 corpora for TREC 2024 RAG, on MS MARCO V1 Passage, and on BEIR, all directly from the fatjar!
-We also have [forthcoming instructions](docs/fatjar-regressions/fatjar-regressions-v0.35.2-SNAPSHOT.md) for the next release (v0.35.2-SNAPSHOT) if you're interested.
+See [detailed instructions](docs/fatjar-regressions/fatjar-regressions-v0.36.0.md) for the current fatjar release of Anserini (v0.36.0) to reproduce regression experiments on the MS MARCO V2.1 corpora for TREC 2024 RAG, on MS MARCO V1 Passage, and on BEIR, all directly from the fatjar! + +
Older instructions ++ [Anserini v0.35.1](docs/fatjar-regressions/fatjar-regressions-v0.35.1.md) + [Anserini v0.35.0](docs/fatjar-regressions/fatjar-regressions-v0.35.0.md)
@@ -447,6 +448,7 @@ Beyond that, there are always [open issues](https://github.com/castorini/anserin
 
 ## 📜️ Release History
 
++ v0.36.0: April 28, 2024 [[Release Notes](docs/release-notes/release-notes-v0.36.0.md)]
 + v0.35.1: April 24, 2024 [[Release Notes](docs/release-notes/release-notes-v0.35.1.md)]
 + v0.35.0: April 3, 2024 [[Release Notes](docs/release-notes/release-notes-v0.35.0.md)]
 + v0.25.0: March 27, 2024 [[Release Notes](docs/release-notes/release-notes-v0.25.0.md)]
diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.35.0.md b/docs/fatjar-regressions/fatjar-regressions-v0.35.0.md
index 1f518310dd..1b78835d47 100644
--- a/docs/fatjar-regressions/fatjar-regressions-v0.35.0.md
+++ b/docs/fatjar-regressions/fatjar-regressions-v0.35.0.md
@@ -1,5 +1,8 @@
 # Anserini Fatjar Regresions (v0.35.0)
 
+❗Anserini v0.35.0 is no longer the latest release.
+The latest release is always linked from the main [Anserini](http://anserini.io/) site.
+
 Fetch the fatjar:
 
 ```bash
diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.35.1.md b/docs/fatjar-regressions/fatjar-regressions-v0.35.1.md
index 5eec5c5129..b7f5161f72 100644
--- a/docs/fatjar-regressions/fatjar-regressions-v0.35.1.md
+++ b/docs/fatjar-regressions/fatjar-regressions-v0.35.1.md
@@ -1,5 +1,10 @@
 # Anserini Fatjar Regresions (v0.35.1)
 
+❗Anserini v0.35.1 is no longer the latest release.
+The latest release is always linked from the main [Anserini](http://anserini.io/) site.
+
+❗The published artifacts for Anserini v0.35.1 are problematic. See [Anserini #2468](https://github.com/castorini/anserini/pull/2468) for details.
+
 Fetch the fatjar:
 
 ```bash
diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.35.2-SNAPSHOT.md b/docs/fatjar-regressions/fatjar-regressions-v0.36.0.md
similarity index 93%
rename from docs/fatjar-regressions/fatjar-regressions-v0.35.2-SNAPSHOT.md
rename to docs/fatjar-regressions/fatjar-regressions-v0.36.0.md
index f22d56cf01..725572d739 100644
--- a/docs/fatjar-regressions/fatjar-regressions-v0.35.2-SNAPSHOT.md
+++ b/docs/fatjar-regressions/fatjar-regressions-v0.36.0.md
@@ -1,10 +1,9 @@
-# Anserini Fatjar Regresions (v0.35.2-SNAPSHOT)
+# Anserini Fatjar Regresions (v0.36.0)
 
 Fetch the fatjar:
 
 ```bash
-# Change when we publish new artifact.
-wget https://repo1.maven.org/maven2/io/anserini/anserini/0.35.1/anserini-0.35.1-fatjar.jar
+wget https://repo1.maven.org/maven2/io/anserini/anserini/0.36.0/anserini-0.36.0-fatjar.jar
 ```
 
 Note that prebuilt indexes will be downloaded to `~/.cache/pyserini/indexes/`.
@@ -14,8 +13,8 @@ If you want to change the download location, the current workaround is to use sy
 Let's start out by setting the `ANSERINI_JAR` and the `OUTPUT_DIR`:
 
 ```bash
-export ANSERINI_JAR=`ls target/*-fatjar.jar`
-export OUTPUT_DIR="runs"
+export ANSERINI_JAR="anserini-0.36.0-fatjar.jar"
+export OUTPUT_DIR="."
 ```
 
 ## TREC 2024 RAG
@@ -27,7 +26,7 @@ The `msmarco-v2.1-doc-segmented` prebuilt index is 84 GB uncompresed.
 Here are the instructions for reproducing runs on the MS MARCO V2.1 document corpus with prebuilt indexes (adjust number of threads based on available resources):
 
 ```bash
-TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl); for t in "${TOPICS[@]}"
+TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl rag24-raggy-dev); for t in "${TOPICS[@]}"
 do
     java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.${t}.txt -threads 16 -bm25
 done
@@ -56,6 +55,11 @@ java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map dl23-doc-msmarco-v2.1 $OUTPUT_
 java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
 java -cp $ANSERINI_JAR trec_eval -c -m recall.100 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
 java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
+echo ''
+java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -m recall.100 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
 ```
 
 And these are the expected scores:
@@ -81,6 +85,12 @@ recip_rank all 0.5783
 ndcg_cut_10 all 0.2914
 recall_100 all 0.2604
 recall_1000 all 0.5383
+
+map all 0.1251
+recip_rank all 0.7060
+ndcg_cut_10 all 0.3631
+recall_100 all 0.2433
+recall_1000 all 0.5317
 ```
 
 
@@ -88,7 +98,7 @@ recall_1000 all 0.5383
 Here are the instructions for reproducing runs on the MS MARCO V2.1 segmented document corpus with prebuilt indexes (adjust number of threads based on available resources):
 
 ```bash
-TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl); for t in "${TOPICS[@]}"
+TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl rag24-raggy-dev); for t in "${TOPICS[@]}"
 do
     java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.${t}.txt -threads 16 -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
 done
@@ -117,6 +127,11 @@ java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map dl23-doc-msmarco-v2.1 $OUTPUT_
 java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
 java -cp $ANSERINI_JAR trec_eval -c -m recall.100 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
 java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
+echo ''
+java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -m recall.100 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
+java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
 ```
 
 And these are the expected scores:
@@ -142,6 +157,12 @@ recip_rank all 0.6519
 ndcg_cut_10 all 0.3356
 recall_100 all 0.3049
 recall_1000 all 0.5852
+
+map all 0.1561
+recip_rank all 0.7465
+ndcg_cut_10 all 0.4227
+recall_100 all 0.2807
+recall_1000 all 0.5745
 ```
 
 
diff --git a/docs/release-notes/release-notes-v0.36.0.md b/docs/release-notes/release-notes-v0.36.0.md
new file mode 100644
index 0000000000..6f60dcdb78
--- /dev/null
+++ b/docs/release-notes/release-notes-v0.36.0.md
@@ -0,0 +1,71 @@
+# Anserini Release Notes (v0.36.0)
+
++ **Release date:** April 28, 2024
++ **Lucene version:** Lucene 9.9.1
+
+## Summary of Changes
+
++ Refactored and cleaned-up POM.
++ Added bindings for TREC 2024 RAG Track "RAGgy" topics.
++ Added regressions for MS MARCO V2.1 corpora: document + segmented document
++ Added ability to read YAML configs from fatjar.
++ Added ability to download qrels for `trec_eval` automatically based on symbol bindings.
+
+## Contributors (This Release)
+
+Sorted by number of commits:
+
++ Jimmy Lin ([lintool](https://github.com/lintool))
++ Daniel Kohn ([DanielKohn1208](https://github.com/DanielKohn1208))
++ Eric Zhang ([16BitNarwhal](https://github.com/16BitNarwhal))
+
+## All Contributors
+
+All contributors with five or more commits, sorted by number of commits, [according to GitHub](https://github.com/castorini/Anserini/graphs/contributors):
+
++ Jimmy Lin ([lintool](https://github.com/lintool))
++ Peilin Yang ([Peilin-Yang](https://github.com/Peilin-Yang))
++ Ogundepo Odunayo ([ToluClassics](https://github.com/ToluClassics))
++ Arthur Chen ([ArthurChen189](https://github.com/ArthurChen189))
++ Xueguang Ma ([MXueguang](https://github.com/MXueguang))
++ Ahmet Arslan ([iorixxx](https://github.com/iorixxx))
++ Tommaso Teofili ([tteofili](https://github.com/tteofili))
++ Edwin Zhang ([edwinzhng](https://github.com/edwinzhng))
++ Rodrigo Nogueira ([rodrigonogueira4](https://github.com/rodrigonogueira4))
++ Jheng-Hong Yang ([justram](https://github.com/justram))
++ Royal Sequiera ([rosequ](https://github.com/rosequ))
++ Emily Wang ([emmileaf](https://github.com/emmileaf))
++ Yuqi Liu ([yuki617](https://github.com/yuki617))
++ Chris Kamphuis ([Chriskamphuis](https://github.com/Chriskamphuis))
++ Victor Yang ([Victor0118](https://github.com/Victor0118))
++ Boris Lin ([borislin](https://github.com/borislin))
++ Nikhil Gupta ([nikhilro](https://github.com/nikhilro))
++ Jasper Xian ([jasper-xian](https://github.com/jasper-xian))
++ Ronak Pradeep ([ronakice](https://github.com/ronakice))
++ Stephanie Hu ([stephaniewhoo](https://github.com/stephaniewhoo))
++ Yuhao Xie ([Kytabyte](https://github.com/Kytabyte))
++ Shane Ding ([shaneding](https://github.com/shaneding))
++ Kuang Lu ([lukuang](https://github.com/lukuang))
++ Mofe Adeyemi ([Mofetoluwa](https://github.com/Mofetoluwa))
++ Joel Mackenzie ([JMMackenzie](https://github.com/JMMackenzie))
++ Xinyu (Crystina) Zhang ([crystina-z](https://github.com/crystina-z))
++ Adam Yang ([adamyy](https://github.com/adamyy))
++ Salman Mohammed ([salman1993](https://github.com/salman1993))
++ Xinyu Mavis Liu ([x389liu](https://github.com/x389liu))
++ Eric Zhang ([16BitNarwhal](https://github.com/16BitNarwhal))
++ Luchen Tan ([LuchenTan](https://github.com/LuchenTan))
++ Manveer Tamber ([manveertamber](https://github.com/manveertamber))
++ Kelvin Jiang ([kelvin-jiang](https://github.com/kelvin-jiang))
++ Hang Cui ([HangCui0510](https://github.com/HangCui0510))
++ Matt Yang ([d1shs0ap](https://github.com/d1shs0ap))
++ Zhiying Jiang ([bazingagin](https://github.com/bazingagin))
++ Johnson Han ([x65han](https://github.com/x65han))
++ Akintunde Oladipo ([theyorubayesian](https://github.com/theyorubayesian))
++ Michael Tu ([tuzhucheng](https://github.com/tuzhucheng))
++ Aileen Lin ([AileenLin](https://github.com/AileenLin))
++ Dayang Shi ([dyshi](https://github.com/dyshi))
++ Yuqing Xie ([amyxie361](https://github.com/amyxie361))
++ Nandan Thakur ([thakur-nandan](https://github.com/thakur-nandan))
++ Peng Shi ([Impavidity](https://github.com/Impavidity))
++ Zeynep Akkalyoncu Yilmaz ([zeynepakkalyoncu](https://github.com/zeynepakkalyoncu))
++ Ryan Clancy ([ryan-clancy](https://github.com/ryan-clancy))