refactor: add only new artifacts #48
@@ -221,6 +227,27 @@ func (c *Crawler) Visit(ctx context.Context, url string) error {
return nil
}

// To avoid a large number of requests to the server, we should skip already saved artifacts (if the start date is specified).
// P.S. We do not need to check for updates, since artifacts are immutable
// see https://central.sonatype.org/publish/requirements/immutability
The document seems relevant to the sonatype repository. Is this repository also immutable?
https://mvnrepository.com/repos/central
I didn't find any official information about that, but I think the rules should be the same.
I did find an answer about it: https://stackoverflow.com/questions/40739939/dropping-a-release-from-public-maven-central
Another piece of indirect evidence is this answer (they had the same sha1 for several artifacts):
instead of changing the file, they release a new version.
Did you compare a database built from scratch with one built on top of the previous database? Did they match?
I remember we created a small script to compare two databases, but I couldn't find it...
I will do that and let you know.
I have bad news... I found a problem with this idea: it looks like this date is not the date the artifact was added to Maven Central.
So I'm closing this PR. I have another idea.
Description
Recently, we’ve been encountering `too many requests` errors more frequently (https://github.com/aquasecurity/trivy-java-db/actions/runs/12458901714).
Increasing the number of parallel goroutines or reducing the delay between requests doesn't resolve the issue.
Therefore, this PR adds functionality to add only new artifacts to the database. Whether an artifact needs to be added is determined using the `updateAt` field from the `cache/db/metadata.json` file (the date minus one day to account for the time needed to build the database).
If the `cache/db` dir is empty, the crawler will save all indexes.
Test runs:
- 2024-10-13 - https://github.com/DmitriyLewen/trivy-java-db/actions/runs/12465006507/job/34790062252
- 2024-12-22 - https://github.com/DmitriyLewen/trivy-java-db/actions/runs/12465457228/job/34791277401