Hi! I've noticed some unexpected behavior with gitingest when trying to fetch a subdirectory from a specific tag. It seems to grab the whole repository instead of just the subdirectory, which works fine for branches.
Expected:
When I use a URL like this (pointing to a subdirectory within a tag):
gitingest should only fetch the files in that subdirectory, just like it does for branches.
Observed:
gitingest downloads a ton of files (hitting the max file limit on big repos like PyTorch) and seems to ignore the subdirectory part of the tag URL. It's pulling the entire repository for that tag.
Observe (Tag): You'll see output like this, indicating it's processing the whole repository:
Maximum file limit (10000) reached
... (repeated many times) ...
Analysis complete! Output written to: digest.txt
Summary:
Repository: pytorch/pytorch
Files analyzed: 10000 # Should be much smaller!
Estimated tokens: 16.8M
...
Branch (Works): Now try the same subdirectory on a branch instead:
Main branch: gitingest .../tree/main/... (Correct output: 4 files)
Other branch: gitingest .../tree/gh/qqaatw/26/orig/... (Correct output: 4 files)
I've included the full commands and expected output in the original description, but the key difference is the Files analyzed count.
It seems like gitingest handles tagged subdirectories differently than branch subdirectories, leading to unexpected behavior and hitting the file limit.
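To make the expected behavior concrete, here is a minimal sketch of how a /tree/ URL could be split into a ref and a subdirectory when tags are treated the same way as branches. This is not gitingest's actual parsing code: the function name split_ref_and_subpath, the known_refs argument, and the some/dir subdirectory are all hypothetical, and I'm assuming the list of branch and tag names has already been obtained somehow.

```python
from urllib.parse import urlparse


def split_ref_and_subpath(url, known_refs):
    """Split a GitHub /tree/ URL into (ref, subpath).

    known_refs is a hypothetical collection holding BOTH branch and tag names.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected shape: owner / repo / "tree" / <ref segments> / <subpath segments>
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a /tree/ URL: {url}")
    tail = parts[3:]
    # Try the longest candidate first, so a ref containing slashes
    # (e.g. "gh/qqaatw/26/orig") is matched before a shorter prefix.
    for i in range(len(tail), 0, -1):
        candidate = "/".join(tail[:i])
        if candidate in known_refs:
            return candidate, "/".join(tail[i:])
    # Unknown ref: fall back to treating the first segment as the ref.
    return tail[0], "/".join(tail[1:])


# Hypothetical ref list mixing a branch, a tag, and a slash-containing branch.
refs = {"main", "v0.1.3", "gh/qqaatw/26/orig"}
print(split_ref_and_subpath(
    "https://github.com/cyclotruc/gitingest/tree/v0.1.3/some/dir", refs))
```

The point of the sketch is only that a tag name in known_refs should yield a non-empty subpath, exactly as a branch name does.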
I'd be happy to help investigate and potentially submit a PR if you can confirm this is a bug! Let me know what you think.
Thanks!
Hi, thanks for your response! I’m not entirely sure if this addresses the specific issue I raised, so I’d love to clarify things a bit.
To recap the two issues I submitted:
In #195, I noted that cloning large repos (like PyTorch) within 60 seconds wasn’t feasible. I found that removing --recurse-submodules from this line resolved the issue. Out of curiosity, is there a specific reason the team typically includes this flag? It seems particularly inefficient when dealing with large third-party submodules.
In #196 (the current page), I pointed out that tags (e.g., cyclotruc/gitingest/tree/v0.1.3) and their subdirectories aren't recognized properly, unlike commits or branches: instead of the tagged subdirectory, the full repository on the main branch is fetched. I've identified the root cause and am actively working on a fix.
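As a rough illustration of the two points above, the sketch below builds a git clone command in which --recurse-submodules is opt-in and a tag can be passed as the ref. It is not the real clone_repo implementation: build_clone_command and its parameters are hypothetical names, and the use of --single-branch is my own assumption rather than something taken from the gitingest source. The flags themselves (--branch, which accepts tag names as well as branch names, --single-branch, and --recurse-submodules) are standard git clone options.

```python
def build_clone_command(url, dest, ref=None, recurse_submodules=False):
    """Return the argv list for a git clone call (hypothetical helper)."""
    cmd = ["git", "clone", "--single-branch"]
    if ref:
        # git clone --branch accepts a tag name as well as a branch name.
        cmd += ["--branch", ref]
    if recurse_submodules:
        # Opt-in only: large third-party submodules can dominate clone time.
        cmd.append("--recurse-submodules")
    cmd += [url, dest]
    return cmd


# Example with a hypothetical destination directory; the tag is from #196.
print(build_clone_command(
    "https://github.com/cyclotruc/gitingest", "/tmp/gitingest", ref="v0.1.3"))
```

The resulting list could be handed to subprocess.run; keeping submodules off by default is the design choice being suggested in #195.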
Issue #195 feels like a quick fix, while I’m currently digging into #196.
Could you assign both issues to me so I can take ownership of them?