Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve how geolocation DB files are downloaded/updated #2308

Merged
merged 7 commits into from
Dec 16, 2024

Conversation

acelaya
Copy link
Member

@acelaya acelaya commented Dec 15, 2024

Part of #2124

Change logic to determine if the GeoLite2 db file needs to be downloaded, in an attempt to address recurrent issues where people report they are hitting GeoLite's download API limits.

Current approach has two main problems:

  1. The condition to perform the download is based on the existing GeoLite db file metadata, which includes the time in which it was built. If it's older than 35 days, Shlink tries to download it.
    This has presented problems in the past, due to stateful services which continue referencing an old version of the metadata, and database builds released with incorrect metadata, which make Shlink think they are outdated.
  2. There's no historical information of previous downloads, which means Shlink could retry and fail indefinitely in case of an error.

This PR introduces a new table in the database where Shlink tracks download attempts, and their results (success or error).

Using this, Shlink can avoid subsequent attempts if there has been a number of consecutive download errors, making sure there's no more than a small number of attempts per day.

Additionally, it can use this to also know when was the last successful attempt, and download again after a reasonable number of days.

The exact rules of the algorithm introduced here are:

  1. Get the last 15 download attempts.
  2. If the max amount of consecutive errors has been reached (15), skip the download if the last one is less than 2 days old.
  3. If there are no attempts at all or the database file itself does not exist, try to download the database.
  4. If the last attempt is an error but we haven't reached the max amount of errors, try to download the database.
  5. If the last attempt was successful and is older than 30 days, try to download the database.

Todo

  • Test performance as the table grows (EXPLAIN in MySQL shows the index on date_updated may not be used when ordering the result. Maybe id needs to be used instead).
    EDIT: The index is in fact being used and making it very efficient. With 2M rows, sorting by date_updated (which is indexed) takes 0.001s, while sorting by date_created (which is NOT indexed) takes more than 3s.
  • Wrap process in transaction and lock using the database rather than a symfony locker.
    EDIT: I tried to lock via SELECT ... FOR UPDATE and a database transaction, but didn't result in properly locked rows. I need to investigate further but I'll keep the filesystem lock for now.
  • Verify the mechanism to resolve the filesystem ID is working as expected and resolves different values for, eg. docker containers.
  • Log reason for which a database download was triggered:
    • No download attempts exist.
    • The database file does not exist.
    • Last attempt was error but no max consecutive errors were reached.
    • Last attempt was successful but it's older than 30 days.

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 98.86364% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.71%. Comparing base (9e34183) to head (509ef66).
Report is 9 commits behind head on develop.

Files with missing lines Patch % Lines
...ore/src/Geolocation/Entity/GeolocationDbUpdate.php 96.15% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #2308      +/-   ##
=============================================
+ Coverage      93.66%   93.71%   +0.05%     
- Complexity      1660     1680      +20     
=============================================
  Files            275      276       +1     
  Lines           5791     5839      +48     
=============================================
+ Hits            5424     5472      +48     
  Misses           367      367              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@acelaya acelaya marked this pull request as ready for review December 16, 2024 19:15
@acelaya acelaya changed the title Feature/geolocation updates Improve how geolocation DB files are downloaded/updated Dec 16, 2024
@acelaya acelaya merged commit d533adf into shlinkio:develop Dec 16, 2024
23 checks passed
@acelaya acelaya deleted the feature/geolocation-updates branch December 16, 2024 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant