This repository stores the public source code for the registration crawler and the data used in the paper "Tripwire: Inferring Internet Site Compromise", presented at IMC 2017.
Please direct questions to Joe DeBlasio.
While we provide complete source for the crawler, I highly discourage you from actually trying to run it, and you do so at your own risk. If, however, you are interested in the heuristics that our crawler uses, or how the system works, the code is all here!
But really, if you've been tasked with getting this crawler running, turn back all ye who enter here. This code is very old, very fragile, and requires a lot of moving parts to get working well.
See public_data.csv for a dump of the login events database. This CSV has headers, but a fuller description of the fields is forthcoming.
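
Until the field descriptions are available, one way to get oriented is to read the header row directly. The following is a minimal sketch, not part of the crawler: the helper name summarize_csv is hypothetical, and it assumes public_data.csv is UTF-8 encoded and sits in the current directory. It makes no assumptions about what the columns are called.

```python
import csv
import os


def summarize_csv(path, max_rows=None):
    """Return (headers, row_count) for a CSV file that has a header row.

    Hypothetical helper for illustration only. If max_rows is given,
    counting stops once that many data rows have been read.
    """
    # Assumption: the file is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        headers = list(reader.fieldnames or [])
        count = 0
        for _ in reader:
            count += 1
            if max_rows is not None and count >= max_rows:
                break
    return headers, count


if __name__ == "__main__":
    # Assumption: the dump is in the current working directory.
    if os.path.exists("public_data.csv"):
        headers, count = summarize_csv("public_data.csv")
        print("columns:", headers)
        print("rows:", count)
```

Passing max_rows keeps the check quick if you only want the column names and a sanity check that rows are present.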