Improve how geolocation DB files are downloaded/updated #2308
Part of #2124
Change the logic that determines whether the GeoLite2 db file needs to be downloaded, in an attempt to address recurring issues where people report hitting GeoLite's download API limits.
The current approach has two main problems, both seen in the past: stateful services which continue referencing an old version of the metadata, and database builds released with incorrect metadata, which make Shlink think they are outdated.
This PR introduces a new table in the database where Shlink tracks download attempts and their results (success or error).
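For illustration, such a table could look roughly like the sketch below. Only `date_created` and `date_updated` are actually mentioned in this description (in the notes further down); the table name `geolocation_db_updates` and the remaining columns are assumptions, not the PR's actual schema.

```sql
-- Hypothetical sketch of the tracking table; names other than
-- date_created/date_updated (mentioned in the notes below) are assumptions.
CREATE TABLE geolocation_db_updates (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    date_created DATETIME NOT NULL,
    date_updated DATETIME NOT NULL,
    -- result of the attempt: 'success' or 'error'
    status VARCHAR(16) NOT NULL,
    INDEX IDX_date_updated (date_updated)
);
```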
Using this, Shlink can skip further attempts after a number of consecutive download errors, ensuring there are no more than a few attempts per day. Additionally, it can use this to know when the last successful attempt happened, and download again after a reasonable number of days.
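As a rough sketch, those two checks could be expressed against the hypothetical table above as follows; the exact thresholds (number of tolerated errors, number of days) are not spelled out in this description, so the values below are placeholders.

```sql
-- Fetch the most recent attempts, newest first, to see whether the
-- latest ones were all errors (LIMIT is a placeholder for the real
-- per-day attempt cap).
SELECT status, date_updated
FROM geolocation_db_updates
ORDER BY date_updated DESC
LIMIT 3;

-- Find the last successful download; if it is older than a given
-- number of days, a new download should be attempted.
SELECT MAX(date_updated) AS last_success
FROM geolocation_db_updates
WHERE status = 'success';
```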
The exact rules of the algorithm introduced here are:
Todo
`EXPLAIN` in MySQL shows the index on `date_updated` may not be used when ordering the result (maybe `id` needs to be used instead).

EDIT: The index is in fact being used and making it very efficient. With 2M rows, sorting by `date_updated` (which is indexed) takes 0.001s, while sorting by `date_created` (which is NOT indexed) takes more than 3s.
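For reference, this kind of check can be reproduced against the hypothetical table above; in the `EXPLAIN` output, the `key` column shows whether the `date_updated` index is chosen for the sort.

```sql
-- If `key` reports the date_updated index and Extra shows no filesort,
-- the ORDER BY is resolved via the index, matching the timings above.
EXPLAIN
SELECT * FROM geolocation_db_updates
ORDER BY date_updated DESC
LIMIT 1;
```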
EDIT: I tried to lock via `SELECT ... FOR UPDATE` and a database transaction, but it didn't result in properly locked rows. I need to investigate further, but I'll keep the filesystem lock for now.
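For context, the standard row-locking pattern looks like the sketch below (again using the hypothetical table). One known pitfall, stated here as general MySQL behavior rather than as the cause of the issue above: `SELECT ... FOR UPDATE` only holds its lock for the duration of an explicit transaction, so with autocommit enabled the lock is released immediately.

```sql
-- Standard row-locking sketch. The lock is held until COMMIT/ROLLBACK,
-- so this must run inside an explicit transaction; with autocommit on,
-- the lock would be released right after the SELECT.
START TRANSACTION;

SELECT *
FROM geolocation_db_updates
WHERE id = 1
FOR UPDATE;  -- concurrent transactions block on this row until COMMIT

-- ... perform the download attempt, then record the result ...
UPDATE geolocation_db_updates
SET status = 'error', date_updated = NOW()
WHERE id = 1;

COMMIT;
```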