Entity Resolution with Markov Logic
[March 13, 2012]: There are a few changes to the assignment below. Changed
parts are marked in blue.
In this assignment, we will solve the Entity Resolution problem using Markov
Logic, as discussed in class.
- Download the Alchemy software.
- Download the following two MLN files for the Entity Resolution problem.
- er-th.mln (Implements the basic threshold based model).
- er-col.mln (Implements the more sophisticated collective inference model).
- Download the following two datasets (already given in the Alchemy format):
- er-train.db (the training database).
- er-test.db (the test database).
- Carefully identify the non-evidence (query) and the evidence predicates.
- You will perform the following steps for each of the two MLN rule files
given above.
- Use the learnwts command in Alchemy (with default settings) to learn the weights of the MLN rules.
Use er-train.db above as the training file. Make sure to include the flag -queryEvidence, which
implements the closed-world assumption (groundings not
specified in the database are assumed to be false evidence).
- Use the infer command (with default settings) to infer the probabilities
over the query atoms in the test database (er-test.db). Once again, make sure
to include the flag -queryEvidence.
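The two commands above can be sketched as follows. The flag names are taken from the Alchemy manual, but the query predicate list used here (SameBib, SameAuthor, SameVenue) is an assumption, as are the output file names; substitute the non-evidence predicates you identified in the MLN files:

```python
# Sketch of the Alchemy command lines for weight learning and inference.
# The query predicate names below are a guess -- use the non-evidence
# predicates you identified when inspecting the .mln and .db files.

query_preds = "SameBib,SameAuthor,SameVenue"  # hypothetical; check the MLN files

learnwts_cmd = [
    "learnwts",
    "-i", "er-col.mln",        # input MLN (use er-th.mln for the threshold model)
    "-o", "er-col-out.mln",    # output MLN with learned weights
    "-t", "er-train.db",       # training database
    "-ne", query_preds,        # non-evidence (query) predicates
    "-queryEvidence",          # closed-world assumption on query predicates
]

infer_cmd = [
    "infer",
    "-i", "er-col-out.mln",    # MLN with learned weights
    "-r", "er-col.result",     # file to receive inferred probabilities
    "-e", "er-test.db",        # evidence (test) database
    "-q", query_preds,         # query predicates
    "-queryEvidence",
]

print(" ".join(learnwts_cmd))
print(" ".join(infer_cmd))
```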
- Run the following script to prepend the actual truth value
to each ground atom in the file of inferred probabilities from the part above.
(e.g. "appendTrueVal.pl prob_file out_file testdb_file").
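For reference, a minimal Python sketch of what appendTrueVal.pl is assumed to do is given below. The file formats are assumptions: each line of the probability file is taken to be "atom probability", and the test database is taken to list the true ground atoms one per line (atoms absent from it are false under the closed-world assumption):

```python
# Hypothetical reimplementation of appendTrueVal.pl: prepend each atom's
# true value (1/0) to the file of inferred probabilities.
# Assumed formats: each prob_file line is "Atom probability"; testdb_file
# lists the true ground atoms one per line (absent atoms are false).

def append_true_values(prob_file, out_file, testdb_file):
    with open(testdb_file) as f:
        true_atoms = {line.strip() for line in f if line.strip()}
    with open(prob_file) as fin, open(out_file, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            atom, prob = parts
            truth = 1 if atom in true_atoms else 0
            fout.write(f"{truth} {atom} {prob}\n")
```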
- Run the following script on the output file obtained in the part
above to find the area under the precision-recall curve (AUC).
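The AUC can also be computed directly from the (truth, probability) pairs in the output file of the previous step. The sketch below uses average precision, a standard estimator of the area under the precision-recall curve; the course's script may use a different interpolation, so treat this only as a sanity check:

```python
# Sketch: area under the precision-recall curve via average precision
# (precision averaged at the rank of each true positive).

def auc_pr(pairs):
    """pairs: iterable of (truth, prob) with truth in {0, 1}."""
    ranked = sorted(pairs, key=lambda x: x[1], reverse=True)
    total_pos = sum(t for t, _ in ranked)
    tp = 0
    precision_sum = 0.0
    for i, (truth, _) in enumerate(ranked, start=1):
        if truth == 1:
            tp += 1
            precision_sum += tp / i  # precision at this recall level
    return precision_sum / total_pos if total_pos else 0.0
```

A perfect ranking (all true atoms ahead of all false ones) scores 1.0; a ranking that interleaves errors scores lower.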
- Consider the AUC obtained for the two different MLN settings.
Which results are better? Is it what you would have expected? Comment.
- For each of the MLN settings above, separate out the results for bibliography
prediction, author prediction and venue prediction. Compare the AUCs for each
of these cases separately. What do you observe? Comment.
- Read the paper
Entity Resolution with Markov Logic. Which MLN models in the paper do the above
two MLN settings correspond to? Do your results in the part above corroborate the qualitative
findings in the paper? Comment.
- Extra Credit:
- Read about
precision and recall. Use a threshold varying from 0 to 1 (at intervals
of 0.01) to output precision at various levels of recall for the inferred probabilities,
as discussed in class. Draw the precision-recall curve (X-axis is recall, Y-axis is precision) using
gnuplot for each of the two MLN settings on the same graph. Confirm the findings about the area under the
precision-recall curve from the previous part.
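The threshold sweep can be sketched as follows, assuming a list of (truth, probability) pairs with 0/1 truth values and inferred probabilities. The returned (recall, precision) points can be written to a file and plotted with gnuplot (e.g. `plot "er-col.pr" using 1:2 with lines`, where the file name is hypothetical):

```python
# Sketch: sweep a decision threshold from 0 to 1 in steps of 0.01 and
# record (recall, precision) points for a precision-recall curve.

def pr_points(pairs, step=0.01):
    """pairs: iterable of (truth, prob) with truth in {0, 1}."""
    points = []
    total_pos = sum(t for t, _ in pairs)
    n = int(round(1 / step))
    for k in range(n + 1):
        thr = k * step
        tp = sum(1 for t, p in pairs if p >= thr and t == 1)
        fp = sum(1 for t, p in pairs if p >= thr and t == 0)
        if tp + fp == 0:
            continue  # precision is undefined when nothing is predicted positive
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((recall, precision))
    return points
```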
- Alchemy comes with a host of other learning algorithms (and a number of parameter
settings) to experiment with. Try a few other learning algorithms (e.g. Voted
Perceptron) and/or non-default parameter settings to see if you can get
any improvement in the results. Comment.
Turn in the following material:
Coming up soon!