Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): Add support for multiple file exclusions and .gitingestignore file : issue #147 #150

Closed
wants to merge 12 commits into from

Conversation

AbhiRam162105
Copy link
Contributor

@AbhiRam162105 AbhiRam162105 commented Jan 21, 2025

Enhanced CLI and Web Features: Multiple File Exclusion using CLI and .gitingestignore file

Description:

This PR significantly improves the gitingest CLI tool by introducing three major features:

1. Support for Multiple Exclusion Patterns

  • New Feature: The CLI now accepts multiple file exclusion patterns via the -e option.
  • Enhancements:
    • Command line pattern parsing has been enhanced to handle space-separated patterns.
    • Users can exclude multiple files in a single command:
      gitingest -e "LICENSE README.md package.json"
      or using multiple flags:
      gitingest -e LICENSE -e README.md -e package.json

2. .gitingestignore File Support

  • New Feature: Introduced support for a .gitingestignore file to specify files and directories to be ignored.
  • Details:
    • The .gitingestignore file follows a format similar to .gitignore:
      node_modules/
      LICENSE
      package.json
      pnpm-lock.yaml
      pnpm-workspace.yaml
      tsconfig
      
    • Patterns from both command line arguments and the .gitingestignore file are combined for comprehensive exclusion.
    • Works with both local repositories and remote (web) repositories:
      # For local repositories
      gitingest ./my-local-repo --ignore-file .gitingestignore
      
      # For remote repositories
      gitingest /~https://github.com/user/repo --ignore-file .gitingestignore
    • When using with remote repositories, the tool will:
      1. First clone the repository
      2. Look for the .gitingestignore file in the cloned repository
      3. Apply the ignore patterns during analysis
      4. Clean up the temporary files after completion

3. Branch Selection Support

  • New Feature: Added ability to specify which branch to clone and analyze via the -b/--branch option.
  • Details:
    • Users can now specify a branch when analyzing a repository:
      gitingest <repo_url> --branch main
    • If no branch is specified, the default branch of the repository is used.
    • Branch selection works seamlessly with other features like pattern exclusion.

Changes Made:

  1. CLI Updates:

    • Added new --branch/-b option
    • Enhanced pattern handling in --exclude-pattern/-e and --include-pattern/-i options
    • Added --ignore-file option with default value .gitingestignore
  2. Core Functionality:

    • Updated ingest function to handle branch selection
    • Implemented pattern parsing for space-separated patterns
    • Added ignore file parsing functionality with web repository support
    • Improved error handling and parameter validation
    • Added automatic cleanup for temporary files in web repository analysis
  3. Documentation:

    • Updated help messages for all CLI options
    • Added comprehensive docstrings for new parameters
    • Included usage examples for both local and web repositories
    • Added documentation for web repository behavior

Impact:
These enhancements make the gitingest CLI more flexible and user-friendly by:

  • Providing multiple ways to specify file exclusions
  • Supporting branch-specific analysis
  • Following familiar Git-like patterns for configuration
  • Supporting web repositories with the same feature set as local repositories
  • Improving the overall user experience and efficiency

Issue Addressed:

Usage Examples:

  1. Local Repository Analysis:
gitingest ./my-local-repo \
  -b develop \
  -e "*.pyc" \
  -e "node_modules/" \
  -i "*.py" \
  --ignore-file .gitingestignore
  1. Web Repository Analysis:
gitingest /~https://github.com/user/repo \
  -b develop \
  -e "*.pyc" \
  -e "node_modules/" \
  -i "*.py" \
  --ignore-file custom.gitingestignore

Note: The .gitingestignore file functionality works seamlessly with both gitingest cli and gitingest website, making it a versatile tool for managing file exclusions regardless of the repository location.

… and .gitingestignore support

- Added the ability to specify multiple file exclusion patterns using the `-e` option.
- Introduced support for a `.gitingestignore` file to define files and directories to be excluded.
- Improved command line pattern parsing to handle space-separated patterns.
- Combined exclusion patterns from both command line arguments and the `.gitingestignore` file for comprehensive exclusion.
- Removed the unused import 'os' to satisfy pylint checks.
- Corrected docstring issues to comply with darglint requirements.
- Updated documentation with clear examples illustrating the new features.

These enhancements provide users with a more flexible and user-friendly way to exclude files and directories, either through command line options or by using a `.gitingestignore` file.
@AbhiRam162105
Copy link
Contributor Author

@cyclotruc please review this PR.

@AbhiRam162105
Copy link
Contributor Author

@cyclotruc do you have any changes to suggest for this PR.

Copy link
Owner

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal for gitingest is to provide the exact same behaviour when it's accessed from the web or from the CLI

In that regard, it would be wise to apply this ignore pattern logic also when a .gitingestignore file is included is found in the repo from the web UI

@cyclotruc
Copy link
Owner

@AbhiRam162105 Thank you for this contribution!

I requested a change and I'll be happy to merge this PR once it's done

But I want to warn you that this file might be a temporary feature, there's in fact the project to create a .gitingest file, I'm not sure about the format yet, and this file will contain ignore patterns but also all the other settings possible for gitingest

When this happens, we'll throw a depreciation warning when this file is used, until we finally make the move to .gitingest

I thought about naming this file .gitingest directly but it sounds less natural for the depreciation warning

@AbhiRam162105
Copy link
Contributor Author

AbhiRam162105 commented Jan 25, 2025

Hey @cyclotruc I have made changes such that .gitingestignore file method for file exclusion works with both CLI and when it's used from the web. Now if the .gitingestignore file exists in the repository the file is parsed in both cases and exclusion the files occurs in both CLI and web. Please review the code and feel free to continue with .gitingestignore file until the .gitignore file method is in the works.

@AbhiRam162105
Copy link
Contributor Author

Hey @cyclotruc any updates on this pull request ??

Copy link
Owner

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AbhiRam162105 , Sorry for the delay, finally got time to take a look at this PR

I appreciate the effort but unfortunatly, I don't think this implementation went in the right way, the last changes introduced some unecessary complexity that I want to avoid

I see repository cloning in the query_parser, it doesn't seem to be the right place to do that

Instead, you should just apply the parsed patterns from the .gitingestignore file to the parsed_query and then let the rest of the app handle the rest as usual

Let me know if you have any question or need help

@AbhiRam162105
Copy link
Contributor Author

okay @cyclotruc I'll look into this

… logic and enhancing ignore pattern handling
@AbhiRam162105
Copy link
Contributor Author

Hey @cyclotruc I have avoided this complication by modifying query_processor.py file where the repo is already being cloned instead of cloning it again in query_parser.

Hi @AbhiRam162105 , Sorry for the delay, finally got time to take a look at this PR

I appreciate the effort but unfortunatly, I don't think this implementation went in the right way, the last changes introduced some unecessary complexity that I want to avoid

I see repository cloning in the query_parser, it doesn't seem to be the right place to do that

Instead, you should just apply the parsed patterns from the .gitingestignore file to the parsed_query and then let the rest of the app handle the rest as usual

Let me know if you have any question or need help

@AbhiRam162105
Copy link
Contributor Author

Can you please review the code again @cyclotruc and if all is well - merge it. Let me know if anything else in the code is posing a problem.

@AbhiRam162105
Copy link
Contributor Author

@cyclotruc why is the CI test failing. Maybe this could resolve the issue-

By setting the asyncio_default_fixture_loop_scope explicitly in pyproject.toml

`[project]
name = "gitingest"
version = "0.1.2"
description="CLI tool to analyze and create text dumps of codebases for LLMs"
readme = {file = "README.md", content-type = "text/markdown" }
requires-python = ">= 3.10"
dependencies = [
"click>=8.0.0",
"fastapi-analytics",
"fastapi[standard]",
"python-dotenv",
"slowapi",
"starlette",
"tiktoken",
"uvicorn",
]
license = {file = "LICENSE"}
authors = [{name = "Romain Courtois", email = "romain@coderamp.io"}]
classifiers=[
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
]

[project.scripts]
gitingest = "gitingest.cli:main"

[project.urls]
homepage = "https://gitingest.com"
github = "/~https://github.com/cyclotruc/gitingest"

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = {find = {where = ["src"]}}
include-package-data = true

Linting configuration

[tool.pylint.format]
max-line-length = 119

[tool.pylint.'MESSAGES CONTROL']
disable = [
"too-many-arguments",
"too-many-positional-arguments",
"too-many-locals",
"too-few-public-methods",
"broad-exception-caught",
"duplicate-code",
]

[tool.pycln]
all = true

[tool.isort]
profile = "black"
line_length = 119
remove_redundant_aliases = true
float_to_top = true
order_by_type = true
filter_files = true

[tool.black]
line-length = 119

Test configuration

[tool.pytest.ini_options]
pythonpath = ["src"]
testpaths = ["tests/"]
python_files = "test_.py"
asyncio_mode = "auto"
python_classes = "Test
"
python_functions = "test_*"
asyncio_default_fixture_loop_scope = "function"`

Is this the right way to do this. Any insight regarding this from you would be helpful.

@cyclotruc
Copy link
Owner

@AbhiRam162105 Hi, So i've started looking into this PR and I must continue tomorrow, there's quite a lot of changes and I'm not sure I understand everything as of right now

Regarding the CI tests failing, to fix them you can run pre-commit install in your dev env, then run the pre-commit hooks with pre-commit run --all-files
It should automatically create changes that you can commit to comply with linter rules

@cyclotruc
Copy link
Owner

Oh, I just noticed, it seems to be a test not passing

Could you invite me to your repo so I can see if I can help you fix those? Thanks a lot

@AbhiRam162105
Copy link
Contributor Author

AbhiRam162105 commented Feb 4, 2025

@cyclotruc I have sent you an invite to my repo. Thank you for the help :)

@AbhiRam162105
Copy link
Contributor Author

@cyclotruc i guess now the tests should pass.

I have setup pre-commit and pytest ran error free

@AbhiRam162105
Copy link
Contributor Author

Hey @cyclotruc can you please review this.

Thanks :)

Copy link
Owner

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello
Unfortunatly, I won't be able to merge this PR in its current state.

It's actually not easy and very time consuming to precisely tell you what's wrong with it, but in a gist:
I suspect some of this code to be AI generated, there are weird patterns that introduce unecessary complexity into the codebase. validate_patterns or create context functions are good examples of that.

I'm very sorry because I know that I am the one who asked you to add things to this PR when it was in an "OK" state. I can provide you more feedback if needed, you have my discord

@cyclotruc
Copy link
Owner

@AbhiRam162105
There's two options from there:

  • Either we close this PR and wait for the work on .gitingest file to start (as discussed above)
  • You want to pursue this work and then I just think you might be better of by starting fresh. The current changes seem affected by the rebases that had to happen while you worked on this and that's probably how we ended up with such a complex PR

To give you guidance here's what this Pr should do (and it shouldn't take many lines changes):
1 - Check if there's a .gitingestignore file in the repo/directory
2 - Parse it and add the patterns to the config object in parsing

Either way, I apologize for pushing back this review (the complexity made it daunting for me) and I appreciate the efforts you put into this. I would have been happy to merge it

Maybe the solution would be to hop on a quick call together and peer-code it together?

@AbhiRam162105
Copy link
Contributor Author

AbhiRam162105 commented Feb 10, 2025

Hey @cyclotruc I am sorry to hear that. But yes we can hop on a call and get this done. I'll text you on discord regarding this and finish this because ideally I would not want to leave the work incomplete after working on this for so long.

@cyclotruc
Copy link
Owner

I'm closing this in favor of: #191

@cyclotruc cyclotruc closed this Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants