Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload static cohort csv #2932

Merged
merged 18 commits into from
Jan 15, 2021
Merged

Upload static cohort csv #2932

merged 18 commits into from
Jan 15, 2021

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented Jan 13, 2021

Changes

Allows users to upload CSV with distinct ids or email addresses to create a cohort

image

image

@fuziontech Would love a check of what I've done with the Kafka topic/protobuf etc.

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests

@timgl timgl requested review from fuziontech and EDsCODE January 13, 2021 13:27
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 13:30 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 13:37 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 13:44 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 14:07 Inactive
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend comments first! QA'ing frontend now

ee/clickhouse/models/cohort.py Outdated Show resolved Hide resolved
posthog/api/cohort.py Show resolved Hide resolved
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frontend lgtm so just the point about repeats and updating cohorts above

@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 15:46 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 15:47 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 16:25 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 13, 2021 16:49 Inactive
@EDsCODE EDsCODE mentioned this pull request Jan 13, 2021
21 tasks
@Twixes Twixes mentioned this pull request Jan 14, 2021
Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic overall checks out. The only bug that would happen in production is that you are producing Proto encoded data to a table that is expecting JSON. You can checkout what encoding to a json table looks like here /~https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/person.py#L74

This also raises a really good point that we should probably disable inserting to clickhouse for tests and use the kafka path since that is the only active path in any environment.

Comment on lines 76 to 84
for person_uuid in person_uuids:
pb_event = person_static_cohort_pb2.PersonStaticCohort()
pb_event.id = str(uuid.uuid4())
pb_event.person_id = str(person_uuid)
pb_event.cohort_id = cohort_id
pb_event.team_id = team.pk

p = ClickhouseProducer()
p.produce_proto(sql=INSERT_PERSON_STATIC_COHORT, topic=KAFKA_PERSON_STATIC_COHORT, data=pb_event)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perfect for a proto encoded table, but the table you have defined is json encoded.

Check out producing for persons to see what producing to a json table looks like
/~https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/person.py#L74

@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 14, 2021 20:06 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 14, 2021 20:38 Inactive
@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 14, 2021 20:57 Inactive
@fuziontech fuziontech temporarily deployed to posthog-upload-static-c-njujww January 15, 2021 00:03 Inactive
@fuziontech fuziontech temporarily deployed to posthog-upload-static-c-njujww January 15, 2021 03:31 Inactive
@fuziontech fuziontech temporarily deployed to posthog-upload-static-c-njujww January 15, 2021 03:55 Inactive
@fuziontech fuziontech self-requested a review January 15, 2021 04:05
@fuziontech
Copy link
Member

This is ready to go. Table created on cluster in prod. I'll let you land it tomorrow so you can test it out.

@timgl timgl temporarily deployed to posthog-upload-static-c-njujww January 15, 2021 10:09 Inactive
@timgl timgl merged commit 1a6207d into master Jan 15, 2021
@timgl timgl deleted the upload-static-cohort-csv branch January 15, 2021 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants