Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto filter test accounts #3492

Merged
merged 41 commits into from
Mar 11, 2021
Merged

Auto filter test accounts #3492

merged 41 commits into from
Mar 11, 2021

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented Feb 25, 2021

Changes

Adds a little checkbox to each query (default to true but won't apply to any historic queries or dashboards) that allows you to easily filter out test accounts and team members.

image

The migration (and whenever you create a new team) will automatically add the following filters:
image

It'll detect non-generic email addresses and filter those out, plus a few localhost ones (I'm a bit more doubtful about that.)

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests

@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi February 25, 2021 20:54 Inactive
@timgl timgl changed the title WIP auto filter test accounts Auto filter test accounts Feb 26, 2021
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi February 26, 2021 11:47 Inactive
@timgl timgl requested review from paolodamico and macobo February 26, 2021 11:48
@timgl timgl marked this pull request as ready for review February 26, 2021 11:53
@timgl timgl marked this pull request as draft February 26, 2021 11:53
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 1, 2021 09:37 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 1, 2021 13:03 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 1, 2021 13:44 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 2, 2021 15:26 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 2, 2021 15:59 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 2, 2021 16:00 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-5qumpi March 2, 2021 16:03 Inactive
Copy link
Contributor

@paolodamico paolodamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look at the general python code & front-end, did some adjustments and otherwise lgtm. It would be great if someone with strong CH expertise take a look at this (maybe @macobo ?)

@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 10, 2021 02:07 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 10, 2021 14:42 Inactive
Copy link
Contributor

@macobo macobo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some notes, mostly focusing on the BE code. I'm confident this works but code needs a bit of :love:

@@ -21,6 +21,7 @@ def parse_prop_clauses(
prepend: str = "global",
table_name: str = "",
allow_denormalized_props: bool = False,
filter_test_accounts=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this introduces some cognitive debt. Basically it's really hard to figure out when this argument should be true vs false.

I think the right level to solve this at is the filter object.

All insight filters should overload def properties to do this same extend as needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm not sure I follow. Are you saying I should just change the name of the property to something else (should_filter_test_accounts maybe?). I don't find it too confusing, if filter_test_accounts=True you'd expect it to filter out test accounts and vice versa.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant it's not obvious when this argument should be passed vs when not. It results in one additional thing to consider in every codepath.

Given we know whether to filter_test_accounts or not in the filter object, a better solution (which hides this toggle from 90% of code) is to do it in the filter object in def properties() as proposed above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah I think you are right, I though there were a few paths were we explicitly didn't want to set filter_test_accounts even if the main filter does have it.

I don't think I understand your solution, do you mind elaborating? What do you mean with overload def properties()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

So a call to parse_prop_clauses looks something like this:

query, params = parse_prop_clauses(filter.properties, ...)

The idea is to make filter.properties dependent on the filter_test_accounts parameter. This can be done something like this in FilterTestAccountsMixin:

    @cached_property
    def properties(self) -> List[Property]:
        properties = super()
        if self.filter_test_accounts: 
            properties.extend(...)
        return properties

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that feels pretty messy, as you won't know what you are passing through. Should we just pass through the filter object separately and then check if filter_test_accounts is true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work as well, though not sure whether all callsites have access to a filter object. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we come full circle that the original solution is the best solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood your proposal to be to pass filter object to parse_prop_clauses.

But we're going very long back-and-forth on this. If you're sold on the current approach, then let's go with that and refactor later as needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sold on it but I also don't think the other proposals are much better. I think fine to leave for now, there's good tests so refactoring should be easy.

frontend/src/scenes/dashboard/DashboardItems.tsx Outdated Show resolved Hide resolved
@@ -14,7 +16,28 @@


class TeamManager(models.Manager):
def set_test_account_filters(self, organization: Optional[Any]) -> List:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any tests for this.

Suggestion: extract this to a module together with GenericEmail instead of putting it on the model + add tests to the module. It keeps utils.py containing utilities + keeps the model file focused, it's becoming a big ball of sphaghetti.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are here. Not sure I agree extracting this makes sense, the only time you use this is when you create a Team so I'd expect to find it in the team file.

posthog/migrations/0132_team_test_account_filters.py Outdated Show resolved Hide resolved
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 12:22 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 12:22 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 14:19 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 15:00 Inactive
@@ -631,8 +631,10 @@ class GenericEmails:

def __init__(self):
with open(get_absolute_path("helpers/generic_emails.txt"), "r") as f:
dd = [x.rstrip() for x in f]
self.emails = dd
self.emails = {x.rstrip(): True for x in f}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nitpicking since performance is basically the same, but considering the intentions set (with set comprehension: {domain.rstrip() for domain in f} would be a bit clearer.

@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 15:48 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 16:00 Inactive
@timgl timgl temporarily deployed to posthog-auto-filter-tes-mwabnw March 11, 2021 16:44 Inactive
@timgl timgl merged commit bf2c442 into master Mar 11, 2021
@timgl timgl deleted the auto-filter-test-accounts branch March 11, 2021 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants