User Data Retention

Aaron Rudkin edited this page Jan 9, 2017 · 1 revision

We retain the following user data:

  1. Generic NGINX server logs, which include User-Agent and IP address data. These are currently kept under the default retention settings. We still need to define an explicit retention policy here.

  2. When a user adds a vote to a cart, we store a session cookie. We use only the ID in the session cookie to track the votes, and we do nothing to connect carts to users. The default ID is a generated UUID and is not tied to any user information. Users can dump their cart, in which case we retain nothing. If the user saves a cart under a short name, we retain the cart data under that short name.

  3. For the API quota and search logging system, we log a hash of the user's data. The hash is a SHA-256 digest of the user's IP address, User-Agent, a static salt, and the current day of the year. The static salt is stored separately in an auth file that is not synced to git. The hash format is: sha256(ip+"/"+user_agent+"/"+static_hash+"/"+day_of_year), where static_hash is the static salt; this is documented in logQuota.py. We then truncate the hash to its first 16 hex characters, so the resulting data is sparse and unlikely to have collisions, but also meaningless on its own and not reversible. Even if the stored hashes were leaked, it would not be possible to back out the individual components. We need a retention policy here. I am fine with dumping this data after the digest email goes out.
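
The hash format in item 3 can be sketched roughly as follows. This is an illustration only, not the code in logQuota.py: the function name `quota_hash`, the UTF-8 encoding, and rendering the day of the year as a plain decimal string are all assumptions.

```python
import hashlib
from datetime import date


def quota_hash(ip, user_agent, static_salt, day=None):
    """Return a truncated daily hash for quota and search logging.

    Hypothetical sketch of sha256(ip/user_agent/salt/day_of_year);
    the real implementation lives in logQuota.py and may differ.
    """
    if day is None:
        day = date.today()
    # Day of the year as a decimal string, 1..366 (assumed formatting).
    day_of_year = str(day.timetuple().tm_yday)
    # Join the components with "/" as in the documented format.
    raw = "/".join([ip, user_agent, static_salt, day_of_year])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    # Keep only the first 16 hex characters (64 bits) of the digest.
    return digest[:16]
```

Under these assumptions, the same visitor produces the same 16-character value within a single day and a different value on the next day, which is enough to count quota usage per day without storing the IP address or User-Agent.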