All workers run job when using sqlite3 data store #959

Open
3 tasks done
legout opened this issue Aug 29, 2024 · 16 comments

@legout

legout commented Aug 29, 2024

Things to check first

  • I have checked that my issue does not already have a solution in the FAQ

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

Version

v4.0.0a5

What happened?

With sqlite3 as the data store, redis as the event broker, and several workers running, each job is executed by all workers instead of by just one.

How can we reproduce the bug?

job.py

def job(name, *args, **kwargs):
    print(name, args, kwargs)

scheduler.py

from apscheduler import Scheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.eventbrokers.redis import RedisEventBroker
from job import job

def main():
    data_store = SQLAlchemyDataStore(
        engine_or_url="sqlite+aiosqlite:////tmp/test.db"
    )
    event_broker = RedisEventBroker("redis://localhost:6379")

    with Scheduler(data_store, event_broker) as sched:
        sched.add_job(job, args=("job1", 1, 2, 3), kwargs=dict(a="A"))

if __name__ == "__main__":
    main()

worker.py

from apscheduler import Scheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.eventbrokers.redis import RedisEventBroker

def main():
    data_store = SQLAlchemyDataStore(
        engine_or_url="sqlite+aiosqlite:////tmp/test.db"
    )
    event_broker = RedisEventBroker("redis://localhost:6379")

    with Scheduler(data_store, event_broker) as sched:
        sched.run_until_stopped()

if __name__ == "__main__":
    main()

Here is a screenshot of running two workers and scheduling the job four times.

image

legout added the bug label Aug 29, 2024
@agronholm
Owner

Please try the code from master. There are boatloads of fixes there compared to v4.0.0a5.

@legout
Author

legout commented Aug 29, 2024

Thanks for the quick reply. I'll give it a try!

@legout
Author

legout commented Aug 29, 2024

I've upgraded apscheduler to the current master. The problem still exists.

image

The code works as expected when switching from sqlite3 to postgres.

image

@agronholm
Owner

agronholm commented Aug 29, 2024

For the record, sqlite3 is a pretty bad choice when you need concurrency. But this shouldn't be happening regardless, so I'll try to reproduce the problem locally and investigate.

@legout
Author

legout commented Aug 29, 2024

For sure. I'll definitely use postgres in my production environment, but I came across this bug during my testing. :-)

Thank you very much for developing this fantastic lib.

@legout
Author

legout commented Oct 8, 2024

Hi @agronholm

Are there any updates yet?

Thanks

@agronholm
Owner

Sorry, not yet. But rest assured I will look at this before the next release.

@legout
Author

legout commented Dec 13, 2024

> For the record, sqlite3 is a pretty bad choice when you need concurrency. But this shouldn't be happening regardless, so I'll try to reproduce the problem locally and investigate.

Maybe this might help regarding sqlite and concurrency. :-)

https://github.com/tursodatabase/limbo

@legout
Author

legout commented Dec 14, 2024

@agronholm
Maybe I can have a look at the code and try to fix this issue. Can you point me to the relevant parts of the code? How do the workers communicate with each other? Is there something like a lock in the data store or event broker that takes effect as soon as one worker acquires a job?

@agronholm
Owner

There are acquired_by and acquired_until fields which are filled in by the data store (after acquiring row-level locks on the jobs). Other schedulers then see these and skip these jobs when looking for new ones.

Probably not relevant to this problem, but there are also JobAcquired events being broadcast by a scheduler when it acquires new jobs. Is this enough information?
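
The acquisition scheme described above can be sketched roughly as follows. This is a minimal illustration using only the standard library `sqlite3` module, not APScheduler's actual implementation: the `jobs` table layout, the `acquire_jobs` helper, the float timestamps and the lease length are all assumptions made for the example. Since SQLite has no row-level locks, the sketch takes the database write lock with `BEGIN IMMEDIATE` instead.

```python
import sqlite3
import time


def create_jobs_table(conn):
    # Hypothetical, simplified schema; the real data store has more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, acquired_by TEXT, acquired_until REAL)"
    )


def acquire_jobs(conn, worker_id, lease_seconds=30.0, now=None):
    """Claim every job that is unclaimed or whose lease has expired.

    `conn` must be opened with isolation_level=None so that the
    transaction is controlled explicitly here.
    """
    now = time.time() if now is None else now
    # Take the write lock before reading, so no other connection can
    # claim the same rows between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id FROM jobs "
            "WHERE acquired_by IS NULL OR acquired_until < ? ORDER BY id",
            (now,),
        ).fetchall()
        job_ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE jobs SET acquired_by = ?, acquired_until = ? WHERE id = ?",
            [(worker_id, now + lease_seconds, job_id) for job_id in job_ids],
        )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return job_ids
```

With this pattern, a second worker calling `acquire_jobs` while the lease is still valid gets an empty list, and it only picks the job up again once `acquired_until` has passed.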

@legout
Author

legout commented Dec 14, 2024

Yeah, I think this is enough for a start.

@legout
Author

legout commented Dec 14, 2024

I hope it is OK to document the further debugging of this issue here.

A first test with three workers shows:

Sqlite data store:

  • The acquired_by and acquired_until fields are set by one worker, but the other workers are not aware of it.

image

The same test with the postgres data store looks fine.

image

Does that mean that the sqlite db is "too slow" for this? That is, the write transaction of one worker (setting acquired_by...) isn't finished before another worker reads from the jobs table?

@agronholm
Owner

Sqlite is supposed to lock the database file for a transaction to prevent concurrent use. Is this not happening?
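
One way to check this is a small probe with two connections to the same file, sketched below with the standard library `sqlite3` module. The table and the function name `probe_locking` are made up for the example. It assumes SQLite's default rollback-journal mode and a busy timeout of zero, so that lock conflicts surface immediately instead of being waited out.

```python
import sqlite3


def probe_locking(db_path):
    """Return (seen, blocked) while one connection holds the write lock.

    seen:    what a second connection reads for acquired_by
    blocked: whether the second connection is refused the write lock
    """
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    other = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        writer.execute(
            "CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, acquired_by TEXT)"
        )
        writer.execute("INSERT OR REPLACE INTO jobs VALUES ('job1', NULL)")

        # First connection takes the write lock and claims the job,
        # but does not commit yet.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("UPDATE jobs SET acquired_by = 'worker-a' WHERE id = 'job1'")

        # A plain read from the second connection is still allowed and
        # returns the last committed state.
        seen = other.execute(
            "SELECT acquired_by FROM jobs WHERE id = 'job1'"
        ).fetchall()[0][0]

        # Asking for the write lock should fail with "database is locked".
        try:
            other.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            blocked = True
        else:
            blocked = False
            other.execute("ROLLBACK")

        writer.execute("COMMIT")
        return seen, blocked
    finally:
        writer.close()
        other.close()
```

If locking works, the second connection should be refused the write lock (`blocked` is `True`) while still reading the last committed state (`seen` is `None`). A plain read does not wait for another connection's uncommitted write.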

@legout
Author

legout commented Dec 15, 2024

Do you know how I can check whether this is happening or not?

@agronholm
Owner

You could try to run multiple processes against the same database which increment a value, wait a couple seconds and then decrement it, and then commit the transaction. If the value starts drifting away from 0 and 1, then you know there's a problem.
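
A rough sketch of such a test, using only the standard library `sqlite3` module, is shown below. The `counter` table, the function names, and the choice of `BEGIN IMMEDIATE` to open the transaction are assumptions for the example, not something taken from APScheduler.

```python
import sqlite3
import time


def init_counter(db_path):
    # Create the single-row counter once, before starting any probes.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS counter "
            "(id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
        )
        conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
    finally:
        conn.close()


def probe(db_path, rounds=5, hold_seconds=2.0, busy_timeout=60.0):
    """Increment, wait, decrement, commit; return every value observed."""
    conn = sqlite3.connect(db_path, isolation_level=None, timeout=busy_timeout)
    observed = []
    try:
        for _ in range(rounds):
            conn.execute("BEGIN IMMEDIATE")  # wait for the write lock
            conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
            observed.append(
                conn.execute("SELECT value FROM counter WHERE id = 1").fetchall()[0][0]
            )
            time.sleep(hold_seconds)  # hold the transaction open for a while
            conn.execute("UPDATE counter SET value = value - 1 WHERE id = 1")
            observed.append(
                conn.execute("SELECT value FROM counter WHERE id = 1").fetchall()[0][0]
            )
            conn.execute("COMMIT")
    finally:
        conn.close()
    return observed
```

To use it, call `init_counter` once and then run `probe` against the same file from several processes at the same time, for example from separate terminals. If every process only ever observes 1 and 0, the transactions are being serialized; any other value suggests that the locking is not working as expected.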

@legout
Author

legout commented Jan 13, 2025

Short update: I have tried several sqlite settings, all without success. I'll give some more details later.
