LangChain representation does not support LCEL Runnables #1564
Let me start off by saying thank you for taking the time to explore all of this! These are interesting features and definitely worth implementing. Let me go through some snippets first and then dive into the main idea of the solution.

Problems
If I am not mistaken, your suggestion is to generalize the
These are currently implemented in an open PR in #1539 and will be merged soon. So that should be resolved.
I agree. Seeing as you essentially give it all topics at once instead of iterating over them, it seems that
That's great, indeed opens it up for many interesting options!
No problem, should be quickly resolved.

Motivation
Although I agree with the general principle, the one thing that I am concerned about is user experience. Traditional chains are something that users are likely to know when using LangChain. LangChain, due to the speed of its development, has changed quite drastically a couple of times, and I want to make sure that whatever is implemented is likely to stay. Having said that, since LCEL is tightly integrated with LangChain's traditional chains (and is a major component of LangSmith), it seems that there is little to be worried about.

Solution
This looks great! Very clear and nicely implemented. Since there is already a PR open with some of the suggested implementations, it might make sense to add it there. You could also create a PR (if you are interested) after I merge that one, but I am not sure whether I will merge it this week or in a couple of weeks. What would you prefer?

Notes
That's perfect! If I am not mistaken, it would also mean that there is no change for the user compared with the previous implementation, right? In other words, they could still use their old code and the old examples would still work?

Example code using forked version
Not only is this a nice example of a more complex LangChain pipeline, it is also highly relevant to using external APIs!
It's my pleasure. BERTopic is one of my favorite libraries. Before I became aware of BERTopic, I was constructing my own [less robust] embedding / clustering pipelines. I did not have good tools in place for dimensionality reduction, cluster representation, or visualization. BERTopic offers this and more, with plenty of the modularity and flexibility that we all love and need.
That is correct. My changes would allow greater flexibility while still accommodating existing use cases. I did not introduce any massive or breaking changes; everything remains backwards-compatible, as far as I can tell. The one exception may be the
That is a possibility, and I'm sure some users would appreciate it, but I don't think it would offer a ton of benefit in most settings compared to just using
You are certainly not wrong about this. However, since the fork is backwards-compatible and I think the
Thank you. :) I do not want to make a PR with duplicate work, nor do I wish to rush you. Whatever fits with your normal development pattern would be fine with me. It probably makes the most sense to wait until you merge existing PRs and then make a new one accounting for those changes. In the meantime, users can use the
Correct. I would suggest more testing before merging changes, but I tested this to make sure existing documentation was still valid, working code.
Thank you. I think this was a compelling example of what can be done with more complex chains, and at some point, I'd love to share more about how I've been using LangChain to achieve better clustering results with BERTopic. :) Finally, thank you again for all of your hard work and diligence. I look forward to future collaboration.
I think that upping the version of LangChain would be perfectly okay. The package is known for its quick development and major changes to its API, so it would not be surprising if users needed to upgrade LangChain in order to get this new functionality. If omitting the type hints would solve this issue, then that would be my preference. Choosing between full backwards compatibility and type hints, I would definitely choose the former.
I agree. It's okay to leave it for now. If users would want this feature in a later release, then at least it would be straightforward to implement.
Agreed!
After some testing, I can merge the relevant PRs this week. After that, I can either start working on your suggested changes or, if you want, you can create the PR yourself. It would be greatly appreciated if you have the time. If not, then I'm more than okay implementing the changes myself.
That would be great! Any and all use cases are much appreciated.
I'd be particularly interested in this!
@joshuasundance-swca I merged the relevant PR a couple of days ago, so if you are interested, a PR would be very much appreciated. If you are constrained with respect to time or something else, I completely understand! I will pick it up and start implementing it in a couple of days/weeks.
Wonderful! Thank you. I will submit a PR in the near future.
I'm sorry to put a list under a single issue like this, but each of the items below is related to `bertopic/representation/_langchain.py`. I wouldn't normally make so many unsolicited changes at once, even in one file, but I was trying to make the `LangChain` representation class more robust for my intended use cases, and I think others would benefit from each change. I realize this may be a lot for one Issue or PR, but I'd love to get the discussion started. :) I would be more than happy to answer any questions and address any concerns.

Thank you.
Problems
- `bertopic.representation.LangChain` currently expects `chain` to be the output of `langchain.chains.question_answering.load_qa_chain`.
- Document selection behavior (`nr_samples`, `nr_repr_docs`, `diversity`) is not exposed to the user.
- The `.run` method is used, which is not part of the standard `Runnable` interface. `.invoke` could work, but I think `.batch` is the preferred method in this case.
- Users cannot pass a `RunnableConfig`, which could help them avoid `RateLimitError` as in Retry strategies for OpenAI RateLimitError #1560.
- A type hint in `LangChain.extract_topics` is incorrect (specifies `float`, should be `int`).

Motivation
The LangChain Expression Language (LCEL) is a powerful declarative syntax for creating `Runnable` objects in LangChain, which operate similarly to traditional chains. It is based largely around piping and modularity. In the example code below, I show how LCEL can be more powerful than ordinary chains. This is real code I used today, btw. :) It's worth noting that LCEL transformations would also enable the use of any document truncation method, including token-based ones, as long as the user implements it themselves.
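To illustrate the piping idea without requiring LangChain to be installed, here is a minimal sketch using stand-in classes: `MiniRunnable` and the pipeline steps below are hypothetical names for this example, not LangChain APIs, but the `|` composition mirrors how LCEL chains steps together.

```python
# Stand-in for LangChain's Runnable protocol: each step exposes
# invoke/batch, and `|` composes steps left-to-right, LCEL-style.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def batch(self, values):
        # Real LCEL .batch can also parallelize via a RunnableConfig;
        # here we simply map sequentially.
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # Compose: (a | b).invoke(x) == b.invoke(a.invoke(x))
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


# Hypothetical pipeline steps: truncate documents, build a prompt,
# then "call" an LLM (a stand-in that just upper-cases the prompt).
truncate = MiniRunnable(
    lambda d: {**d, "input_documents": [t[:20] for t in d["input_documents"]]}
)
to_prompt = MiniRunnable(
    lambda d: d["question"] + "\n" + "\n".join(d["input_documents"])
)
fake_llm = MiniRunnable(lambda prompt: {"output_text": prompt.upper()})

chain = truncate | to_prompt | fake_llm
result = chain.invoke(
    {"input_documents": ["doc one", "doc two"], "question": "What topic?"}
)
```

The same pattern extends to arbitrary transformations (truncation, PII scrubbing, etc.) simply by piping in another step.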
Solution

I forked the repo and solved each of these [perceived] problems in a branch. I realize this is getting ahead of the usual process, but I did so to be sure my proposed solution would actually work. :) All tests still pass, and mypy is satisfied. However, I wanted to get ample feedback and a chance to collaborate before proposing changes to tests, documentation, etc. In fact, I haven't even changed the docstring yet.
It's worth noting that in the case of simply using `Runnable` objects, a `Dummy` wrapper as seen in the code below will work. You could probably add `max_concurrency` and other `RunnableConfig` settings in the wrapper.
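The original wrapper snippet did not survive this page capture, so the sketch below is a hedged reconstruction of what such a `Dummy` wrapper might look like, not the author's actual code. The idea is to give any `.invoke`-style runnable the `.run` method that the current representation class calls; the `EchoRunnable` stand-in exists only so the example runs without LangChain installed.

```python
class Dummy:
    """Adapts a Runnable-style object (with .invoke) to the .run
    interface that the current LangChain representation class calls.

    With real LangChain, max_concurrency and other RunnableConfig
    settings could plausibly be threaded through here as well.
    """

    def __init__(self, runnable):
        self.runnable = runnable

    def run(self, **kwargs):
        # Forward keyword arguments (input_documents, question, ...)
        # as the runnable's input dict and return the output text.
        return self.runnable.invoke(kwargs)["output_text"]


# Stand-in runnable for demonstration (real code would use an LCEL chain).
class EchoRunnable:
    def invoke(self, inputs):
        n = len(inputs["input_documents"])
        return {"output_text": f"{n} docs about {inputs['question']}"}


chain = Dummy(EchoRunnable())
label = chain.run(input_documents=["a", "b"], question="cats")
```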
Notes

Existing uses and functionality should remain unchanged, because any `Runnable` objects used with the `LangChain` representation class must use the same input keys expected in the original implementation.

Required input keys:
- `input_documents: List[str]`
- `question: str`

Required output key:
- `output_text: str`
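To make the key contract concrete, here is a dependency-free sketch of a runnable that respects it: it consumes `input_documents` and `question` and emits `output_text`, and it supports a `.batch`-style call over several topics at once. The `FakeChain` class and its fabricated label are illustrative assumptions, not code from the fork.

```python
from typing import Dict, List


class FakeChain:
    """Illustrative runnable honoring the documented key contract."""

    def invoke(self, inputs: Dict) -> Dict[str, str]:
        docs: List[str] = inputs["input_documents"]  # required input key
        question: str = inputs["question"]           # required input key
        # A real chain would call an LLM here; we fabricate a label.
        label = f"{question} ({len(docs)} docs)"
        return {"output_text": label}                # required output key

    def batch(self, inputs: List[Dict]) -> List[Dict[str, str]]:
        # With real LangChain, a config such as {"max_concurrency": 5}
        # could also be passed to .batch to throttle concurrent calls.
        return [self.invoke(i) for i in inputs]


outputs = FakeChain().batch([
    {"input_documents": ["d1", "d2"], "question": "What are these about?"},
    {"input_documents": ["d3"], "question": "What are these about?"}
])
```

Any `Runnable` built this way should be drop-in compatible with the proposed representation class, since only the key names are load-bearing.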
Example code using forked version
The code below uses Microsoft Presidio to remove PII before sending text to the LLM and then puts it back when it gets the result.
This is useful if you want to be sure you're not sending PII to the LLM. Similar use cases, including any arbitrary text transformations, would still be supported.
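The actual Presidio snippet was not preserved in this capture, but the round-trip pattern is straightforward: replace PII with placeholders, run the chain, then map the placeholders back into the result. The sketch below uses a plain-Python regex scrub in place of Presidio so it runs without dependencies; the `scrub`/`restore` names and the email regex are illustrative only, not the author's code or Presidio's API.

```python
import re


def scrub(text: str):
    """Replace email-like tokens with placeholders; return the scrubbed
    text plus a mapping so the originals can be restored later."""
    mapping = {}

    def _sub(match):
        key = f"<PII_{len(mapping)}>"
        mapping[key] = match.group(0)
        return key

    return re.sub(r"\S+@\S+", _sub, text), mapping


def restore(text: str, mapping) -> str:
    # Put the original values back into the model output.
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM call; just echoes the prompt.
    return f"Summary: {prompt}"


clean, mapping = scrub("Contact alice@example.com for details")
answer = restore(fake_llm(clean), mapping)
```

In the real pipeline, `scrub` and `restore` would be Presidio-backed steps piped around the LLM call with LCEL, so the model never sees the raw PII.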