Polymath

Polymath is an AI agent that leverages symbolic reasoning and other auxiliary tools to boost its performance on logic and reasoning benchmarks, with the aim of tackling complex problems in areas such as decision-making, mathematics, and programming. This repository is the reproduction package for our research paper Logic.py: Bridging the Gap between LLMs and Constraint Solvers.

Project configuration

Initial Setup

Currently, there is no default LLM inference provider available. We use an internal provider at Meta, which is not part of the open source release. To get started, create an implementation of chat_completion.py for your inference back-end in inference/your_inference_provider.py, then replace the DummyChatCompletion in inference/chat_completion_factory.py with your new provider, as sketched below. If your provider requires secrets, we suggest using the dotenv library and adding them to a .env file. You can use the .env-example as a starting point.
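For illustration, here is a minimal sketch of such a provider backed by the OpenAI client. This is an assumption-laden example, not the project's actual API: the interface you must implement is defined in inference/chat_completion.py, and the class name MyChatCompletion, the create method and its signature, the MY_PROVIDER_API_KEY variable, and the model name are all placeholders.

# inference/your_inference_provider.py -- illustrative sketch only.
# The real interface is defined in inference/chat_completion.py; the
# method name and signature below are assumptions, not the actual API.
import os

from dotenv import load_dotenv  # secrets loaded from a .env file
from openai import OpenAI       # example back-end; any provider works

load_dotenv()  # makes MY_PROVIDER_API_KEY from .env visible via os.environ


class MyChatCompletion:
    """Hypothetical provider that forwards chat transcripts to a back-end."""

    def __init__(self) -> None:
        self._client = OpenAI(api_key=os.environ["MY_PROVIDER_API_KEY"])

    def create(self, messages: list[dict[str, str]], model: str = "gpt-4o") -> str:
        # Send the conversation to the back-end and return the reply text.
        response = self._client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content

Once your class conforms to the real interface, returning an instance of it from inference/chat_completion_factory.py in place of DummyChatCompletion is the only wiring required.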

Set up Conda environment:

conda env create --file environment.yml
conda activate polymath

Log into your huggingface account to download datasets:

huggingface-cli login

On HuggingFace, you need to request access to the two gated datasets used by the benchmarks below, ZebraLogicBench and FOLIO. Access is granted immediately upon filling out a form.

Finally, install datasets and remaining dependencies:

./scripts/setup.sh

Update dependencies

conda env update --file environment.yml

Run tests

Note: Some unit tests expect a working LLM inference setup.

To run all tests, use:

python -m unittest discover

To run only specific tests, use:

python -m unittest agent.symex.tests.test_module_with_type_info_factory -k test_single

Benchmarks

ZebraLogicBench

To run the benchmark set using our logic agent, use:

python -m agent.logic.zebra_benchmark

This will produce an output JSON file that we evaluate using the original ZeroEval environment.

To set up a ZeroEval Conda environment, follow these instructions adapted from their README.md:

cd lib/ZeroEval
conda create -n zeroeval python=3.10
conda activate zeroeval
pip install vllm -U
pip install -r requirements.txt

Afterwards, you can run their evaluation using:

python src/evaluation/zebra_grid_eval.py

This will update result_dirs/zebra-grid.summary.md to include the output JSON generated by our logic agent.

FOLIO

Support for FOLIO is a work in progress and will be updated once the integration is complete.

τ-bench

Support for τ-bench is a work in progress and will be updated once the integration is complete.

Submodules

This repository includes open source repositories, or public forks thereof, as submodules to make it easy for users to build the respective libraries and tools. This is purely for convenience; users are free to download the same tools and libraries from their original repositories.
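If you cloned without submodules, the standard git command fetches them (assuming the usual submodule workflow; the setup script may already handle this for you):

git submodule update --init --recursive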

License

Polymath is CC BY-NC 4.0 licensed, as found in the LICENSE file.
