Polymath is an agent that leverages auxiliary tools to improve performance in selected problem domains. This repository is the reproduction package for our research paper *Logic.py: Bridging the Gap between LLMs and Constraint Solvers*.
Currently, there is no default LLM inference provider available. We use an internal provider at Meta, which is not part of the open source release. To get started, create an implementation of `chat_completion.py` for your inference back end in `inference/your_inference_provider.py`. Then replace the `DummyChatCompletion` in `inference/chat_completion_factory.py` with your new provider. If your provider requires secrets, we suggest using the `dotenv` library and adding them to a `.env` file. You can use `.env-example` as a starting point.
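As a rough illustration of the steps above (the actual interface is defined in `inference/chat_completion.py`; the class and method names, and the environment variable, below are assumptions for the sketch, not the repository's real API), a custom provider might look like:

```python
# Hypothetical sketch of a custom provider. The real abstract interface in
# inference/chat_completion.py may differ; names here are illustrative.
import os
from abc import ABC, abstractmethod


class ChatCompletion(ABC):
    """Assumed abstract base class for inference providers."""

    @abstractmethod
    def complete(self, messages: list[dict[str, str]]) -> str: ...


class YourChatCompletion(ChatCompletion):
    """Example provider that reads its secret from the environment.

    With python-dotenv, calling load_dotenv() at startup would populate
    os.environ from a local .env file before this class is instantiated.
    """

    def __init__(self) -> None:
        # e.g. YOUR_PROVIDER_API_KEY=... in your .env file (hypothetical name)
        self.api_key = os.environ.get("YOUR_PROVIDER_API_KEY", "")

    def complete(self, messages: list[dict[str, str]]) -> str:
        # Call your inference back end here; this stub just echoes the
        # last user message for illustration.
        return f"stub reply to: {messages[-1]['content']}"
```

You would then construct `YourChatCompletion` wherever `DummyChatCompletion` is created in `inference/chat_completion_factory.py`.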
Set up the Conda environment:

```shell
conda env create --file environment.yml
conda activate polymath
```
Log into your Hugging Face account to download datasets:

```shell
huggingface-cli login
```
On Hugging Face, you need to request access to the following two datasets. Access is granted immediately upon filling out a form:
- https://huggingface.co/datasets/yale-nlp/FOLIO
- https://huggingface.co/datasets/allenai/ZebraLogicBench-private
Finally, install the datasets and remaining dependencies:

```shell
./scripts/setup.sh
conda env update --file environment.yml
```
Note: Some unit tests expect a working LLM inference setup.
To run all tests, use:

```shell
python -m unittest discover
```
To run only specific tests, you can run:

```shell
python -m unittest agent.symex.tests.test_module_with_type_info_factory -k test_single
```
To run the benchmark set using our logic agent, use:

```shell
python -m agent.logic.zebra_benchmark
```
This will produce an output JSON file that we evaluate using the original ZeroEval environment.
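Before handing the output file to ZeroEval, you can sanity-check it with a few lines of Python. The record schema of the output JSON is not documented here, so this sketch only inspects the file's top-level shape (the file name passed in is whatever the benchmark run produced):

```python
import json


def summarize_results(path: str) -> str:
    """Return a one-line summary of a benchmark output JSON file
    without assuming a specific record schema."""
    with open(path) as f:
        results = json.load(f)
    if isinstance(results, list):
        keys = sorted(results[0]) if results else []
        return f"{len(results)} records; first record keys: {keys}"
    return f"top-level keys: {sorted(results)}"
```

For example, `summarize_results("your_output.json")` prints how many records were produced and which fields the first record carries, which is a quick way to confirm the run completed before launching the ZeroEval evaluation.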
To set up a ZeroEval Conda environment, follow these instructions, adapted from their `README.md`:
```shell
cd lib/ZeroEval
conda create -n zeroeval python=3.10
conda activate zeroeval
pip install vllm -U
pip install -r requirements.txt
```
Afterwards, you can run their evaluation using:

```shell
python src/evaluation/zebra_grid_eval.py
```
This will update `result_dirs/zebra-grid.summary.md` to include the output JSON generated by our logic agent.
Support for FOLIO and τ-bench is a work in progress; this document will be updated once those integrations are complete.
This repository uses open source repositories, or public forks thereof, to make it easy for users to build the respective libraries and tools. This is purely for the purpose of convenience, and users are free to download these same tools and libraries from their original repositories.
Polymath is licensed under CC BY-NC 4.0, as found in the LICENSE file.