A RAG app powered by JobRunr

This repository is a companion code example for the blog post Building Local LLM Systems: RAG Implementation with JobRunr and Spring AI. The blog post walks through the challenges of building RAG workflows and demonstrates how JobRunr streamlines the process of keeping embeddings up to date.

Retrieval-augmented generation (RAG) improves the accuracy of large language models (LLMs) by providing them with relevant domain knowledge at query time. Spring AI makes this straightforward: the features it provides let Java developers connect their app to the most popular LLMs.
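To make the "augmentation" part of RAG concrete, here is a minimal plain-Java sketch of what happens at query time: the chunks retrieved from the vector store are placed into the prompt before it is sent to the LLM. The class name, method name and prompt wording are illustrative assumptions; Spring AI handles this step with its own template, which will differ.

```java
import java.util.List;

/**
 * Minimal sketch of the augmentation step in RAG: retrieved chunks are
 * inserted into the prompt so the LLM can answer from them. The template
 * text is an assumption, not the prompt used by Spring AI or this repo.
 */
class PromptAugmenter {

    static String augment(String question, List<String> retrievedChunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the question using only the context below.\n");
        prompt.append("If the context does not contain the answer, say so.\n\n");
        prompt.append("Context:\n");
        // Number each chunk so the model (and a human reviewer) can tell them apart
        for (int i = 0; i < retrievedChunks.size(); i++) {
            prompt.append('[').append(i + 1).append("] ")
                  .append(retrievedChunks.get(i)).append('\n');
        }
        prompt.append("\nQuestion: ").append(question);
        return prompt.toString();
    }
}
```

For example, `PromptAugmenter.augment("How do I retry a job?", chunks)` returns a single prompt string that contains every chunk followed by the question.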

JobRunr can make developing a RAG app easier and more enjoyable, especially in enterprises with large numbers of frequently updated documents. JobRunr provides the tools to keep your embedding (or vector) store up to date with the changes happening in the document repository:

Background (distributed) batch processing: embeddings are updated in the background, so the chat is not blocked while this happens. Need to process millions of documents? Distributed processing makes scaling easy.
Recurring jobs: put the logic that updates the embeddings in a recurring job and execute it on a regular schedule.
Automatic retries: anything that can go wrong will go wrong (👋 network failures). With automatic retries, you don't need to re-run failed jobs manually.
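The idea behind automatic retries can be sketched in plain Java: run a task and, if it throws, wait with an exponentially growing delay before trying again. This is only an illustration under that assumption; it is not JobRunr's implementation, which also persists job state between attempts and shows failures in its dashboard. The names `RetrySketch` and `callWithRetries` are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

/**
 * Plain-Java sketch of retry with exponential back-off. Not JobRunr's
 * implementation; JobRunr does this for you at the job level.
 */
class RetrySketch {

    /**
     * Calls {@code task}, retrying up to {@code maxRetries} times after a failure.
     * {@code sleeper} performs the wait, which keeps the method easy to test.
     */
    static <T> T callWithRetries(Callable<T> task, int maxRetries, Duration baseDelay,
                                 Consumer<Duration> sleeper) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e; // give up: propagate the last failure to the caller
                }
                // Delay doubles after every failed attempt: base, 2*base, 4*base, ...
                sleeper.accept(baseDelay.multipliedBy(1L << attempt));
            }
        }
    }

    /** Real waiting strategy: blocks the current thread for the given duration. */
    static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }
}
```

A call such as `RetrySketch.callWithRetries(() -> embed(chunk), 3, Duration.ofSeconds(1), RetrySketch::sleep)`, where `embed` is a hypothetical call to an embedding model, would survive up to three transient network failures.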

Find more features at https://www.jobrunr.io/en/documentation/.

About this example

This is a console app for chatting with an LLM. At startup, the app registers a JobRunr recurring job that updates the embeddings for documents in a user-configured folder. For each document found in, or missing from, the folder, a job is created to create/update or delete the embeddings for that document. This embedding update relies heavily on APIs provided by Spring AI.
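Finding the documents to process amounts to walking the configured folder and keeping files with a supported extension. The sketch below assumes that behavior using only `java.nio.file`; the class and method names are hypothetical and not taken from this repository.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Sketch of the "find documents in app.content-dir" step: walk the folder
 * recursively and keep regular files with a supported extension.
 * Names are hypothetical.
 */
class ContentDirScanner {

    static final Set<String> SUPPORTED = Set.of(".md", ".pdf", ".txt");

    static List<Path> findDocuments(Path contentDir) throws IOException {
        // Files.walk holds open directory handles, so close the stream when done
        try (Stream<Path> paths = Files.walk(contentDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(ContentDirScanner::isSupported)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Case-insensitive extension check, so "Guide.PDF" is accepted too. */
    static boolean isSupported(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && SUPPORTED.contains(name.substring(dot));
    }
}
```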

To avoid unnecessary computation, this implementation makes use of the last modification time of a file. If the file is new or changed, a job is created to update its embeddings. The app also automatically removes embeddings if a previously processed file is deleted from the configured folder.
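The change-detection rule above boils down to comparing two maps: the last modification time of each file currently in the folder, and the time recorded when its embeddings were last generated. The sketch below assumes that bookkeeping; the types and names are hypothetical, and in the app each planned action would become its own JobRunr job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the synchronization plan: new or changed files get their
 * embeddings (re)generated, files that disappeared get them deleted.
 * Names are hypothetical.
 */
class SyncPlanner {

    enum Kind { UPSERT, DELETE }

    record Action(Kind kind, String path) { }

    /**
     * @param current   path to last-modified millis of the files now in the folder
     * @param processed path to last-modified millis recorded at the previous sync
     */
    static List<Action> plan(Map<String, Long> current, Map<String, Long> processed) {
        List<Action> actions = new ArrayList<>();
        current.forEach((path, modified) -> {
            Long seen = processed.get(path);
            // New file, or any timestamp difference since the embeddings were generated
            if (seen == null || !seen.equals(modified)) {
                actions.add(new Action(Kind.UPSERT, path));
            }
        });
        processed.keySet().forEach(path -> {
            // Previously embedded file is gone from the folder: drop its embeddings
            if (!current.containsKey(path)) {
                actions.add(new Action(Kind.DELETE, path));
            }
        });
        return actions;
    }
}
```

Unchanged files produce no action at all, which is where the savings come from: only new, modified or deleted documents cost an embedding call or a delete.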

Configuration

app.content-dir=path/to/folder
app.embedding-synchronization.cron=0 0 * * *
app.similarity-threshold=0.25
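Conceptually, app.similarity-threshold decides which stored chunks are similar enough to the question to be used as context: the higher the value, the fewer (and more similar) chunks are kept. In the app, this filtering is done by the vector store through Spring AI; the plain-Java sketch below uses cosine similarity as an assumed stand-in to show the effect. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Illustration of a similarity threshold: chunks whose similarity to the
 * question embedding is below the threshold are discarded. Cosine similarity
 * is a stand-in here; the real computation happens in the vector store.
 */
class SimilarityFilter {

    /** Cosine similarity; a zero vector gives NaN, which never passes the threshold. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns ids of chunks with similarity >= threshold, most similar first. */
    static List<String> relevant(double[] query, Map<String, double[]> chunks, double threshold) {
        return chunks.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```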

You can clone JobRunr's documentation repository and use its documentation folder as domain knowledge to try out this app. You can also change the synchronization cron expression to something more fitting: do you expect your docs to be updated every hour? Change the cron to hourly (0 * * * *)!
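To see what the cron values mean, here is a deliberately tiny matcher for 5-field cron expressions (minute, hour, day-of-month, month, day-of-week): "0 0 * * *" matches midnight every day, while "0 * * * *" matches the top of every hour. It is an illustration only, supporting just "*", "*/n" and plain numbers; JobRunr parses cron expressions itself. The class name is hypothetical.

```java
import java.time.LocalDateTime;

// Deliberately tiny matcher for 5-field cron expressions:
//   minute hour day-of-month month day-of-week
// Supports only "*", "*/n" and plain numbers. All five fields must match (AND),
// which differs from standard cron when both day fields are restricted.
class CronSketch {

    static boolean matches(String expression, LocalDateTime t) {
        String[] f = expression.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + expression);
        }
        // java.time numbers Monday=1..Sunday=7; cron uses Sunday=0 (and also accepts 7)
        int dayOfWeek = t.getDayOfWeek().getValue() % 7;
        return field(f[0], t.getMinute(), 0)
                && field(f[1], t.getHour(), 0)
                && field(f[2], t.getDayOfMonth(), 1)
                && field(f[3], t.getMonthValue(), 1)
                && (field(f[4], dayOfWeek, 0) || (dayOfWeek == 0 && f[4].equals("7")));
    }

    // Does one field spec match `value`? `min` is the field's lowest value, so that
    // "*/n" means min, min+n, min+2n, ... as in standard cron.
    private static boolean field(String spec, int value, int min) {
        if (spec.equals("*")) {
            return true;
        }
        if (spec.startsWith("*/")) {
            int step = Integer.parseInt(spec.substring(2));
            return (value - min) % step == 0;
        }
        return Integer.parseInt(spec) == value;
    }
}
```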

Supported file formats

This example supports .md, .pdf and .txt files. You can easily extend this list by implementing ContentProcessor. The processing of these documents relies entirely on Spring AI capabilities, and there is room for improvement (e.g., cleanup, better chunk sizes, etc.).
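The exact shape of ContentProcessor lives in src and is not reproduced here, so the sketch below uses a hypothetical DocumentProcessor interface as a stand-in to show what adding a format (here .csv) could look like. It also includes a naive character-based chunker to illustrate what "chunk size" means: consecutive chunks of at most chunkSize characters that share overlap characters. Spring AI's own splitters, such as TokenTextSplitter, count tokens rather than characters.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the repository's ContentProcessor; the real interface may
// have a different shape and delegates parsing and splitting to Spring AI.
interface DocumentProcessor {
    boolean supports(Path file);

    List<String> toChunks(String text);
}

// Example extension: treat .csv files as plain text split into overlapping chunks.
class CsvProcessor implements DocumentProcessor {

    private final int chunkSize;
    private final int overlap;

    CsvProcessor(int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > overlap >= 0");
        }
        this.chunkSize = chunkSize;
        this.overlap = overlap;
    }

    @Override
    public boolean supports(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".csv");
    }

    @Override
    public List<String> toChunks(String text) {
        List<String> chunks = new ArrayList<>();
        if (text.isEmpty()) {
            return chunks;
        }
        int step = chunkSize - overlap; // consecutive chunks share `overlap` characters
        for (int start = 0; ; start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, at the cost of storing some text twice.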

How to use

Start the database

docker run -it --rm --name postgres -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg17

Start RagConsoleApplication.
Head over to localhost:8000/recurring-jobs and trigger the recurring job to generate the initial embeddings. You can skip this step by changing the cron expression so the job runs sooner (see Configuration), or by running org.jobrunr.examples.embedding.service.DirectoryManager.manage when the app starts.
Wait a bit for the embeddings to be generated before asking your questions.
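Running the synchronization on startup means it executes once immediately and then on its usual schedule, so embeddings exist before the first question. The plain-Java analogy below shows that idea with a ScheduledExecutorService and an initial delay of 0; it does not use JobRunr, and in the app the task would be DirectoryManager.manage. The class name is hypothetical.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy for "run the synchronization on startup instead of waiting for
// the first cron tick". Not JobRunr: in the app, JobRunr schedules the real work.
class StartupSync {

    static ScheduledFuture<?> start(ScheduledExecutorService scheduler, Runnable sync,
                                    long period, TimeUnit unit) {
        // initialDelay = 0: the first run happens right away,
        // then the task repeats every `period`
        return scheduler.scheduleAtFixedRate(sync, 0, period, unit);
    }
}
```

For example, `StartupSync.start(Executors.newSingleThreadScheduledExecutor(), syncTask, 1, TimeUnit.HOURS)`, where `syncTask` is a hypothetical Runnable that scans the folder and creates the jobs.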

Room for improvement

We provide this code to highlight what JobRunr can bring to RAG applications. Feel free to adapt it to your use case and make it more practical for end users.
