Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve memory management #36

Merged
merged 13 commits into from
Jul 31, 2024
Merged

Improve memory management #36

merged 13 commits into from
Jul 31, 2024

Conversation

b-rowan
Copy link
Contributor

@b-rowan b-rowan commented Jul 29, 2024

Removes the requirement for device managers to be stored in an in-memory dictionary, which should improve memory management. Closes #35.

@b-rowan b-rowan requested a review from tsagadar July 30, 2024 03:57
@tsagadar
Copy link
Collaborator

When it comes to memory management, it should be avoided that the full state of all devices is potentially kept in memory.

Not caching devices at all however could introduce substantial load to the database (as a single DDI call would trigger multiple loads). Ideally they can be cached somewhere between 10s (covers single DDI request) and 10m (covers full device update process). Frameworks that allow to set a TTL are lru_cache or aiocache - but I never used them.

So my view on the current Manager state is

  • device: introduce a device repository to access devices and that implements caching
  • force_update: move into device state as accomplished by this PR
  • update_complete: move to device state as well (and rename it to match its current use, maybe log_complete). Note: currently it is set to True at two different places in the code. Sensible? I'd prefer to only read/write it in the feedback endpoint
  • poll_time: only used to get more frequent feedback during updates. So I'd only create an entry in the dictionary if a non-default polling time should be used and the entries could also expire after 10m (update duration)
  • log_subscribers: move outside of the manageras accomplished by this PR

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

Frameworks that allow to set a TTL are lru_cache or aiocache - but I never used them.

I'll look into aiocache for now and see what I can do.

@UpstreamData UpstreamData force-pushed the dev_improve_memory_management branch from 296ab47 to 696de97 Compare July 30, 2024 14:40
@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

Should be good to go now.

  • device - Added caching, now devices must be accessed and saved via the update manager to use. Also disabled cache for tests.
  • update_complete - Renamed to log_complete, and moved into DB. Remove the updates made to it from the update_log function.
  • poll_time - Using in memory dictionary, but not cached. Entries in the dict are now removed when being set to default value.

@tsagadar
Copy link
Collaborator

I think one final step is missing: we still create one DeviceUpdateManager instance per device. This instance has no additional state - but for that reason it is also rather useless.

It feels like we should be able to get rid of global device_managers. Not sure if aiocache supports it: have a helper function get_update_manager(dev_id), that retrieves the device by a get_device helper function (annotated by aiocache) and create an DeviceUpdateManager instance in the fly.

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

I think one final step is missing: we still create one DeviceUpdateManager instance per device. This instance has no additional state - but for that reason it is also rather useless.

It feels like we should be able to get rid of global device_managers. Not sure if aiocache supports it: have a helper function get_update_manager(dev_id), that retrieves the device by a get_device helper function (annotated by aiocache) and create an DeviceUpdateManager instance in the fly.

The global update_managers is not used for anything except for the unknown variant, it is essentially a way to have custom update managers that aren't just dynamically generated. Nothing ever gets added or removed from it, so it could be a constant if we want.

The update manager itself is just dynamically generated and not cached, which shouldn't be an issue since every update manager uses the same device cache for get_device, and everything else is just functionality.

@tsagadar
Copy link
Collaborator

Sorry - did not look at it properly. Why not drastically simplify this?

async def get_update_manager(dev_id: str) -> UpdateManager:
    if dev_id == "unknown":
        return UnknownUpdateManager("unknown")
    else:
        devices_count.set(await Device.all().count())
        return DeviceUpdateManager(dev_id)

delete_device can be removed, the endpoint can use await Device.filter(id__in=ids).delete() instead
reset_update_manager does no longer make sense

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

Sorry - did not look at it properly. Why not drastically simplify this?

async def get_update_manager(dev_id: str) -> UpdateManager:
    if dev_id == "unknown":
        return UnknownUpdateManager("unknown")
    else:
        devices_count.set(await Device.all().count())
        return DeviceUpdateManager(dev_id)

delete_device can be removed, the endpoint can use await Device.filter(id__in=ids).delete() instead reset_update_manager does no longer make sense

Will do.

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

Done. Had to ignore line length with flake8, black should handle that for the most part. Ill push another commit to fix that.

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 30, 2024

@tsagadar ready for review.

tsagadar

This comment was marked as outdated.

Copy link
Collaborator

@tsagadar tsagadar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working based on this branch I realized a few things

  • *.py files are not yet reformatted with the new settings
  • Should add .prettierignore to match pre-commit config (open question: should prettier cover *.md, *.yml, *.yaml as well?)
  • flake8 complains about two lines (length) that black is formatting
  • I am not that excited about the consistent 4 spaces formatting: usually *.js, *.json, *.yaml uses the more compact 2 spaces. Understand that Python should have it.

Also: tests do not complete successfully on my end.

@tsagadar
Copy link
Collaborator

The test execution issue could be due to multiple instances of the same device being in memory (as aiocache always deserializes objects).

Would it be sufficient to use the built-in WeakValueDictionary? Won't allow to use ttl but at the end of the day will prevent all memory being used up. So something like this:

class DeviceUpdateManager(UpdateManager):
    default_hardware = None
    devices_cache = weakref.WeakValueDictionary()

    async def get_device(self) -> Device:
        device = DeviceUpdateManager.devices_cache.get(self.dev_id)
        if device is None:
            hardware = DeviceUpdateManager.default_hardware
            if hardware is None:
                hardware = (await Hardware.get_or_create(model="default", revision="default"))[
                    0
                ]
                DeviceUpdateManager.default_hardware = hardware

            device = (await Device.get_or_create(uuid=self.dev_id, defaults={"hardware": hardware}))[0]
            DeviceUpdateManager.devices_cache[self.dev_id] = device

        return device

    async def save_device(self, device: Device, update_fields: list[str]):
        await device.save(update_fields=update_fields)

@tsagadar
Copy link
Collaborator

The following two variables could be moved into UpdateManager as static fields

    device_log_subscriptions: dict[str, list[Callable]] = {}
    device_poll_time: dict[str, str] = {}

@b-rowan b-rowan force-pushed the dev_improve_memory_management branch from a79076f to f791585 Compare July 31, 2024 13:34
@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 31, 2024

flake8 complains about two lines (length) that black is formatting

Not much can be done about these, they are logging comments, I just added an ignore inline for now.

I am not that excited about the consistent 4 spaces formatting: usually *.js, *.json, *.yaml uses the more compact 2 spaces. Understand that Python should have it.

I prefer 4 spaces in HTML for sure, and python definitely should also use 4. Not 100% on the other file types, but I think consistency across the codebase increases readability, and not like modern monitors aren't big enough to display all the data. Up to you, its a pretty easy change in the future if we decide to change it.

*.py files are not yet reformatted with the new settings

Fixed in latest, had to rebase onto master though.

The test execution issue could be due to multiple instances of the same device being in memory (as aiocache always deserializes objects).

I tested this both on my work PC (Arch Linux), and my home desktop (Windows 10), and didn't have any issues with tests. Super confused about what the issue could be...

Would it be sufficient to use the built-in WeakValueDictionary? Won't allow to use ttl but at the end of the day will prevent all memory being used up. So something like this:

I feel like this kindof defeats the purpose of the cache in the first place? We only want to hold on to references for ~10m, to keep gooseBit efficient. We will want to move to full caching in the future anyway I think, so no reason to not just get it over with now.

The following two variables could be moved into UpdateManager as static fields

WDYM by this? Class level variables?

@tsagadar
Copy link
Collaborator

The test execution issue could be due to multiple instances of the same device being in memory (as aiocache always deserializes objects).

I tested this both on my work PC (Arch Linux), and my home desktop (Windows 10), and didn't have any issues with tests. Super confused about what the issue could be...

That is strange. Observed the issue under macOS. Will retest (also on Ubuntu)

Would it be sufficient to use the built-in WeakValueDictionary? Won't allow to use ttl but at the end of the day will prevent all memory being used up. So something like this:

I feel like this kind of defeats the purpose of the cache in the first place? We only want to hold on to references for ~10m, to keep gooseBit efficient. We will want to move to full caching in the future anyway I think, so no reason to not just get it over with now.

Agree when reconsidering this. The garbage collector likely runs too frequently which will result in a low hit rate. Caching the "default_hardware" however could be helpful to avoid one additional DB access for each loaded device.

The following two variables could be moved into UpdateManager as static fields

WDYM by this? Class level variables?

Yes, class level variables.

@tsagadar
Copy link
Collaborator

Maybe it makes sense to commit poetry.lock as well to ensure we run the same versions of the dependencies?

@tsagadar
Copy link
Collaborator

My bad. Did not set AIOCACHE_DISABLE=1. Best to create a poetry command that takes care of this. Can look into it.

@b-rowan
Copy link
Contributor Author

b-rowan commented Jul 31, 2024

Maybe it makes sense to commit poetry.lock as well to ensure we run the same versions of the dependencies?

I don't do this cause merging it is a nightmare, but we can start.

@b-rowan b-rowan merged commit ada5846 into master Jul 31, 2024
@b-rowan b-rowan deleted the dev_improve_memory_management branch August 1, 2024 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve memory management of UpdateManager
3 participants