
Commit

New data indexing docs page (#8013)
GitOrigin-RevId: d4af088f50bb592678fe5fd782139c1b938495da
tryptofanik authored and Manul from Pathway committed Jan 16, 2025
1 parent eedec56 commit 804a5f2
Showing 9 changed files with 236 additions and 388 deletions.
@@ -208,7 +208,7 @@ Keep in mind that some output connectors to external data storage system might t
title: "Live Data AI Pipelines"
---
#default
- [Data indexing pipeline and RAG.](/developers/user-guide/llm-xpack/vectorstore_pipeline)
- [Data indexing pipeline and RAG.](/developers/user-guide/llm-xpack/docs-indexing)
- [Multimodal RAG.](/developers/templates/multimodal-rag)
- [Unstructured data to SQL on-the-fly.](/developers/templates/unstructured-to-structured)
::

This file was deleted.

This file was deleted.

9 changes: 4 additions & 5 deletions docs/2.developers/4.user-guide/50.llm-xpack/10.overview.md
@@ -193,16 +193,15 @@ texts = documents.select(chunk=splitter(pw.this.text))

`TokenCountSplitter` returns data in the same format as `ParseUnstructured` - that is, for each row it returns a list of tuples, where each tuple consists of a string with the text of a chunk and a dictionary of associated metadata.
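The per-row format described above can be illustrated in plain Python, without running a Pathway pipeline. This is a minimal sketch; the sample chunk texts and metadata keys below are hypothetical, not values produced by `TokenCountSplitter` itself.

```python
# Sketch of the per-row chunking format described above: each row holds a
# list of (chunk_text, metadata) tuples. Sample values are hypothetical.
row_chunks = [
    ("Pathway is a live data processing framework.", {"source": "intro.md"}),
    ("It keeps indexes up to date on each data change.", {"source": "intro.md"}),
]

# Every element pairs a chunk string with a metadata dictionary.
for text, metadata in row_chunks:
    assert isinstance(text, str)
    assert isinstance(metadata, dict)
```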

With these tools it is easy to create in Pathway a pipeline serving as a Vector Store, but which updates on each data change. You can check such an example in [the llm-app repository](/~https://github.com/pathwaycom/llm-app/blob/main/examples/pipelines/demo-question-answering/app.py). As it is a common pipeline, Pathway provides a [class `VectorStore`](/developers/api-docs/pathway-xpacks-llm/vectorstore#pathway.xpacks.llm.vector_store.VectorStoreServer) which implements this pipeline.


## Ready-to-use Vector Store
## Ready-to-use Document Store

Pathway Vector Store enables building a document index on top of your documents and allows for easy-to-manage, always up-to-date LLM pipelines accessible using a RESTful API. It maintains an index of your documents and allows for querying for documents closest to a given query. It is implemented using two classes - [`VectorStoreServer`](/developers/api-docs/pathway-xpacks-llm/vectorstore#pathway.xpacks.llm.vector_store.VectorStoreServer) and [`VectorStoreClient`](/developers/api-docs/pathway-xpacks-llm/vectorstore#pathway.xpacks.llm.vector_store.VectorStoreClient).
With these tools it is easy to create in Pathway a pipeline serving as a [`DocumentStore`](/developers/api-docs/pathway-xpacks-llm/document_store), which automatically indexes documents and updates whenever new data arrives.

The `VectorStoreServer` class implements the pipeline for indexing your documents and runs an HTTP REST server for nearest neighbors queries. You can use `VectorStoreServer` by itself to use Pathway as a Vector Store, and you then query it using REST. Alternatively, use `VectorStoreClient` for querying `VectorStoreServer` which implements wrappers for REST calls.
To make interaction with the `DocumentStore` easier, you can also use [`DocumentStoreServer`](/developers/api-docs/pathway-xpacks-llm/servers#pathway.xpacks.llm.servers.DocumentStoreServer), which handles the API calls.

You can learn more about Vector Store in Pathway in a [dedicated tutorial](/developers/user-guide/llm-xpack/vectorstore_pipeline).
You can learn more about Document Store in Pathway in a [dedicated tutorial](/developers/user-guide/llm-xpack/docs-indexing) and check out a QA app example in [the llm-app repository](/~https://github.com/pathwaycom/llm-app/blob/main/examples/pipelines/demo-question-answering/app.py).
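Since the server described above is queried over a RESTful API, a retrieval request can be sketched with only the standard library. This is a hedged illustration: the host, port, endpoint path (`/v1/retrieve`), and payload fields here are assumptions for the sketch, so check the Pathway API docs linked above for the exact interface.

```python
import json
import urllib.request

def build_retrieve_payload(query: str, k: int = 3) -> bytes:
    """Encode a nearest-neighbors retrieval request as a JSON body.

    The {"query": ..., "k": ...} shape is an assumption for illustration.
    """
    return json.dumps({"query": query, "k": k}).encode("utf-8")

def retrieve(host: str, port: int, query: str, k: int = 3):
    """POST the request to the (assumed) /v1/retrieve endpoint of a
    running document-indexing server and decode the JSON response."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/retrieve",
        data=build_retrieve_payload(query, k),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example call (requires a server actually running on 127.0.0.1:8000):
# retrieve("127.0.0.1", 8000, "What is Pathway?", k=2)
```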

### Integrating with LlamaIndex and LangChain

