Deploy package to conda-forge #11
build package functioning
reserve build and deploy for manual run (after publishing to pypi)
Awesome, thanks for this! What do I need to do to make it work? I noticed for example there's a reference to |
I got the steps working locally and was able to publish to my account's index, https://anaconda.org/dominictarro/semchunk, but it's a bit different for conda-forge. Publishing to conda-forge is a first for me, so we'll learn as we go. Anaconda has their own process that includes forking the staged-recipes repo and us requesting to merge a branch. This happens within the workflow, but I wasn't able to test it. I'm doing this for a package at work right now and can apply some of that learning here. You only need to create an Anaconda account and get an API key with publishing permission. Set that to the repo variable |
I may have spoken too soon. I'll research this more thoroughly and get back to you. |
Keep me posted :) I don’t use Anaconda myself so it’s unfamiliar territory to me. |
conda-forge/staged-recipes#28590 Got everything set up, I just need you to comment on the PR. The whole PR process through conda-forge/staged-recipes is a one-time thing. Once that is done, they will create a repository in the conda-forge org named semchunk-feedstock, with us added as maintainers. All that really goes into maintaining it is updating the meta.yaml's SHA256 hash from PyPI (e.g. for 2.2.0) and the version number. Fork the feedstock, make the changes on a branch, and open a PR against the parent feedstock. https://conda-forge.org/docs/maintainer/updating_pkgs/#example-workflow-for-updating-a-package |
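As a rough illustration of that maintenance step, here is a minimal Python sketch that rewrites the version and SHA256 hash in a recipe's text. It assumes the recipe follows the common conda-forge layout with a `{% set version = "..." %}` line and a `sha256:` field; the real semchunk-feedstock recipe may differ. The helper names `sha256_hex` and `bump_recipe`, the sample recipe text, and the stand-in archive bytes are all hypothetical.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA256 digest of a source archive's bytes."""
    return hashlib.sha256(data).hexdigest()


def bump_recipe(meta_text: str, new_version: str, new_sha256: str) -> str:
    """Replace the version and sha256 values in a meta.yaml string.

    Assumes (not confirmed by the thread) a Jinja-style
    `{% set version = "..." %}` line and a single `sha256:` field.
    """
    meta_text, n_version = re.subn(
        r'(\{%\s*set\s+version\s*=\s*")[^"]*(")',
        lambda m: m.group(1) + new_version + m.group(2),
        meta_text,
        count=1,
    )
    meta_text, n_hash = re.subn(
        r"(sha256:\s*)[0-9a-fA-F]{64}",
        lambda m: m.group(1) + new_sha256,
        meta_text,
        count=1,
    )
    if n_version != 1 or n_hash != 1:
        raise ValueError("recipe does not match the assumed layout")
    return meta_text


# Hypothetical, heavily trimmed recipe text used only for the demo.
EXAMPLE_META = (
    '{% set version = "2.2.0" %}\n'
    "\n"
    "source:\n"
    "  sha256: " + "0" * 64 + "\n"
)

if __name__ == "__main__":
    # Stand-in bytes; in practice this would be the sdist downloaded from PyPI.
    archive_bytes = b"stand-in for the source archive"
    print(bump_recipe(EXAMPLE_META, "2.2.2", sha256_hex(archive_bytes)))
```

PyPI also lists the SHA256 of each uploaded file, so computing it locally is optional; the point is only that the recipe change comes down to these two values.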
@dominictarro So I just need to merge this into my main branch? And set |
Sorry, I should have elaborated. This PR and my fork of semchunk can be closed unmerged. They aren't necessary. I need to update dominictarro/staged-recipes with the latest changes from conda-forge/staged-recipes. I will update the meta.yaml in my fork to use semchunk v2.2.2. Since this PR can be closed unmerged, no token needs to be created. conda-forge will copy the build on PyPI to the conda-forge "channel" on conda, and they will use their own CI/CD to do so. To trigger their CI/CD, we just create a PR to the "feedstock" repo that they create, with the meta.yaml changes that we want (i.e. version, build hash). There's a tool they mention, regro-cf-autotick-bot, for automatically updating the feedstock when PyPI changes are detected. I haven't seen semchunk release frequently enough to make it useful, but it's an option. |
@dominictarro Just to clarify, right now |
@umarbutler correct, and sounds good! |
Hey @dominictarro, I'm prepping a new release and I'd like to mention how to install Sorry if this is a silly question but I just want to confirm is this the best way to install semchunk with conda:
Is it possible to just do:
|
The conda install conda-forge::semchunk
# or
conda install -c conda-forge semchunk

There's a way to set conda-forge as a default channel to install from so you don't have to specify conda-forge, but it involves modifying a config file. You can also share the conda package's listing: https://anaconda.org/conda-forge/semchunk |
Unrelated but maybe interesting to some users, I created a variant of semchunk for Rust, https://crates.io/crates/semchunk-rs |
@dominictarro I actually came across that earlier but didn't have time to look into it further. Thanks for creating that port! I'll link to it in my README. I had been thinking about trying to speed semchunk up even further by offloading to Rust but I'm not a Rust coder. I was meaning to learn more about it later. Do you think that's possible by combining our work or is your port not as fast as the Python version? |
@umarbutler Frankly, that was my first Rust package, so I'm no expert! My package is for Rust applications, not as a Rust backend with Python bindings. I used a Rust-native tokenizer library, The Rust version clocks in at 6.22s with RoBERTa against the Gutenberg corpus, but you used the GPT-4 tokenizer. I see you recently got your benchmark down from 6.69s to 2.87s. I think if I throw multiprocessing at it I can get the Rust version's numbers down. I'll have to check the changes, see what you did. |
Ah, I think I'll have to do some more research into how the interfacing would work. To be honest,
I don't think there was any change except for getting a much faster PC than what I had before 😆 You should try benchmarking the Python library on the same hardware and then compare the results. |
@dominictarro v3.0.0 is out 🍾 |
Agree. The tokenizer is probably >=90% of the compute at this point. |
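One way to check an estimate like this on your own hardware is to time the token counter separately from the rest of the chunking call. The sketch below is a minimal, assumption-laden example: `TimedCounter`, `tokenizer_share`, `word_count`, and `naive_chunk` are hypothetical names, and the stand-in chunker is not semchunk's algorithm. The chunking function is assumed to take `(text, chunk_size, token_counter)` positionally, which I believe matches `semchunk.chunk` but should be checked against the installed version.

```python
import time
from typing import Callable, List


class TimedCounter:
    """Wrap a token counter and accumulate the wall-clock time spent inside it."""

    def __init__(self, token_counter: Callable[[str], int]) -> None:
        self._token_counter = token_counter
        self.seconds = 0.0
        self.calls = 0

    def __call__(self, text: str) -> int:
        start = time.perf_counter()
        try:
            return self._token_counter(text)
        finally:
            self.seconds += time.perf_counter() - start
            self.calls += 1


def tokenizer_share(
    chunk_fn: Callable[[str, int, Callable[[str], int]], List[str]],
    token_counter: Callable[[str], int],
    texts: List[str],
    chunk_size: int,
) -> float:
    """Return the approximate fraction of total time spent in the token counter."""
    timed = TimedCounter(token_counter)
    start = time.perf_counter()
    for text in texts:
        chunk_fn(text, chunk_size, timed)
    total = time.perf_counter() - start
    return timed.seconds / total if total > 0 else 0.0


def word_count(text: str) -> int:
    """Hypothetical stand-in token counter: counts whitespace-separated words."""
    return len(text.split())


def naive_chunk(text: str, chunk_size: int, token_counter: Callable[[str], int]) -> List[str]:
    """Hypothetical stand-in chunker: greedily packs words up to chunk_size tokens."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and token_counter(candidate) > chunk_size:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    share = tokenizer_share(naive_chunk, word_count, ["a b c d e"] * 1000, 2)
    print(f"token counter share of runtime: {share:.0%}")
```

The figure is approximate, since the wrapper adds its own overhead and any caching of the token counter inside the chunker means only cache misses are timed. To compare against a real setup, swap in the actual chunking function and tokenizer and run both implementations on the same machine.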
Closes #10
A GitHub workflow for deploying the package to conda-forge.