A group of 20 persons spent the night in a manor. In the morning, Daphine, a young lady, went missing. Her dead body was found the following day in the cellar. The autopsy revealed the following facts:
- She died of a head injury caused by a blunt force instrument.
- Her death happened between midnight and 8am.
- She was not killed in the cellar, but her body was moved there after her death.
- Goose down found in her wound proves that she died in bed, in one of the bedrooms of the manor; unfortunately, all beds have the same down pillows, and pillowcases were changed in the morning, so it is impossible to identify which bedroom.
The police interviewed all 19 suspects, and collected their statements about who they saw in which room at which time. They also assessed their reliability as witnesses, based on a psychological evaluation.
Help the police discover who killed Daphine, using the power of provenance management and probabilistic databases! ProvSQL will help you do that.
A step-by-step solution is provided in the solution.sql file. We recommend not checking this file unless you are stuck.
This tutorial assumes you have access to a working installation of ProvSQL. See the documentation for installation instructions.
Load the setup.sql SQL script within a fresh PostgreSQL database to set it up with all the data required; see, for instance, the create.sh script for an example of how to do this.
-
Familiarize yourself with the content of the database. Four tables are provided: information about persons, rooms, sightings of a person in a room by a witness, and reliability scores of witnesses. As a reminder,
\d
can be used in the PostgreSQL client to show the list of currently defined tables, and\d tablename
to check the schema of the tabletablename
. -
In order to be able to call all ProvSQL functions without prefixing them with the
provsql.
prefix, type the following command:SET search_path TO public, provsql;
This should be done at the start of every new session of the PostgreSQL client.
-
Design a query that retrieves for every sighting: the time of the sighting, the name of the person seen, the name of the witness, the name of the room. Store the content in a table
s
(withCREATE TABLE s AS ...
). -
We will now activate the support of provenance for this table
s
; by default, ProvSQL does not do anything unless a table has provenance enabled. To do this, simply type:SELECT add_provenance('s');
Display the content of the table
s
; theprovsql
column has been added and contains a provenance token, that references a gate in a provenance circuit. Feel free to run some basic queries on the tables
: query results will have provenance annotations (for now, they appear as a unique identifier). It is not possible to directly refer to theprovsql
column, which acts in a “magical” way. But it is possible to obtain the provenance token in a query by using theprovenance()
user-defined function.We will also create a provenance mapping: a mapping maps provenance tokens to actual values in a semiring. The first such mapping we will create uses the name of the witness as a value for the provenance token. The provenance mapping is created with:
SELECT create_provenance_mapping('witness_mapping','s','witness');
assuming the column
witness
of the tables
contains the name of the witness. A mapping is stored in the database as a simple table, as you can check. -
Right now, there are contradictions in the sightings, because witnesses cannot be fully trusted: the same person may be sighted at the same time in different rooms, which of course cannot be correct. Run a query that identifies all such contradictions.
-
Add in the
SELECT
clause of the previous query a termformula(provenance(),'witness_mapping')
. This has for effect to evaluate the provenance annotation of each tuple in a semiring that displays a formula for the provenance, usingwitness_mapping
to map initial provenance tokens to actual names. -
We now want to store in a new table
consistent_s
all sightings of a person at a time in a given room except those that are inconsistent (i.e., which have been identified as contradictions by the previous query). Create such a table with a SQL query, and display its content along with the provenance formula for each tuple. Attention: at the moment, ProvSQL does not support theNOT IN
SQL keyword; the only way to express negation is through theEXCEPT
keyword. -
Create a
suspects
table containing all distinct persons that were sighted (in a consistent sighting) during the time the murder was committed in one of the rooms the murder may have been committed. Display the content of thesuspects
table along with the provenance formula for each tuple. -
We now want to know how many sightings confirmed that a person was a suspect. To do that, we will use the counting m-semiring. First, add a column to the table
s
that contains an integer. Set this integer to1
for all tuples, and create a provenance mappingcount_mapping
that references this column: this is just so that each tuple of thes
table counts for one individual sighting when collecting the counts. Now, using thecounting
function instead of theformula
function, andcount_mapping
instead ofwitness_mapping
, display the number of sightings confirming that a person was a suspect. This number may be 0 in case of inconsistent sightings! Who had the most sightings? -
The number of confirmed sightings is nowhere enough to conclude. We will use the reliability score of each witness to help us identify the murderer. Similarly as in the previous question, add to the
s
table areliability
column to store floating-point reliability scores; set the reliability score of each sighting to that of the witness of the sighting.To assign these reliability scores as the probability of the provenance token of each tuple in
s
, simply do:SELECT set_prob(provenance(),reliability) FROM s;
-
Let us now find the murderer! The police needs a confidence of at least 0.99 before arresting the suspect. To compute the probability of a query answer, we can use the
probability_evaluate
function, which takes the following parameters:- the provenance token, obtained in ProvSQL by
provenance()
- the probability computation method: it can be set to
'monte-carlo'
(for Monte Carlo approximation), to'possible-worlds'
(for exhaustive enumeration of possible worlds), to'tree-decomposition'
(for a method based on computing a tree decomposition of the Boolean circuit) or to'compilation'
(for compilation of the provenance circuit to a d-DNNF); if omitted, a default strategy based on a combination of techniques is applied - an optional third argument: for Monte Carlo sampling, it is the
number of samples (as a character string);
for knowledge compilation, it is the name of the
knowledge compilation tool used (one of
'c2d'
,'d4'
, or'dsharp'
)
- the provenance token, obtained in ProvSQL by