Entity Resolution with Markov Logic
[March 13, 2012]: There are a few changes to the assignment below. Changed
parts are marked in blue.
In this assignment, we will solve the Entity Resolution problem using Markov
Logic, as discussed in class.
- Download the Alchemy software.
- Download the following two MLN files for the Entity Resolution problem.
- er-th.mln (Implements the basic threshold based model).
- er-col.mln (Implements the more sophisticated collective inference model).
- Download the following two datasets (already given in the Alchemy format):
- er-train.db (the training database).
- er-test.db (the test database).
- Carefully identify the non-evidence (query) and the evidence predicates.
- You will perform the following steps for each of the two MLN rule files
given above.
- Use the learnwts command in Alchemy (with default settings) to learn the weights of the MLN rules.
Use er-train.db above as the training file. Make sure to include the flag -queryEvidence, which
implements the closed-world assumption (groundings not
specified in the database are assumed to be false evidence).
- Use the infer command (with default settings) to infer the probabilities
over the query atoms in the test database (er-test.db). Once again, make sure
to include the flag -queryEvidence.
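The two commands above can be sketched as follows. The flag names are taken from the Alchemy manual, but the query predicate list used here (SameBib, SameAuthor, SameVenue) is an assumption, as are the output file names; substitute the non-evidence predicates you identified in the MLN files:

```python
# Sketch of the Alchemy command lines for weight learning and inference.
# The query predicate names below are a guess -- use the non-evidence
# predicates you identified when inspecting the .mln and .db files.

query_preds = "SameBib,SameAuthor,SameVenue"  # hypothetical; check the MLN files

learnwts_cmd = [
    "learnwts",
    "-i", "er-col.mln",        # input MLN (use er-th.mln for the threshold model)
    "-o", "er-col-out.mln",    # output MLN with learned weights
    "-t", "er-train.db",       # training database
    "-ne", query_preds,        # non-evidence (query) predicates
    "-queryEvidence",          # closed-world assumption on query predicates
]

infer_cmd = [
    "infer",
    "-i", "er-col-out.mln",    # MLN with learned weights
    "-r", "er-col.result",     # file to receive inferred probabilities
    "-e", "er-test.db",        # evidence (test) database
    "-q", query_preds,         # query predicates
    "-queryEvidence",
]

print(" ".join(learnwts_cmd))
print(" ".join(infer_cmd))
```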
- Run the following script to prepend the actual truth value
to each ground atom in the file of inferred probabilities from the part above.
(e.g. "appendTrueVal.pl prob_file out_file testdb_file").
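For reference, a minimal Python sketch of what appendTrueVal.pl is assumed to do is given below. The file formats are assumptions: each line of the probability file is taken to be "atom probability", and the test database is taken to list the true ground atoms one per line (atoms absent from it are false under the closed-world assumption):

```python
# Hypothetical reimplementation of appendTrueVal.pl: prepend each atom's
# true value (1/0) to the file of inferred probabilities.
# Assumed formats: each prob_file line is "Atom probability"; testdb_file
# lists the true ground atoms one per line (absent atoms are false).

def append_true_values(prob_file, out_file, testdb_file):
    with open(testdb_file) as f:
        true_atoms = {line.strip() for line in f if line.strip()}
    with open(prob_file) as fin, open(out_file, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            atom, prob = parts
            truth = 1 if atom in true_atoms else 0
            fout.write(f"{truth} {atom} {prob}\n")
```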
- Run the following script on the output file obtained in the part
above to find the area under the precision-recall curve (AUC).
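The AUC can also be computed directly from the (truth, probability) pairs in the output file of the previous step. The sketch below uses average precision, a standard estimator of the area under the precision-recall curve; the course's script may use a different interpolation, so treat this only as a sanity check:

```python
# Sketch: area under the precision-recall curve via average precision
# (precision averaged at the rank of each true positive).

def auc_pr(pairs):
    """pairs: iterable of (truth, prob) with truth in {0, 1}."""
    ranked = sorted(pairs, key=lambda x: x[1], reverse=True)
    total_pos = sum(t for t, _ in ranked)
    tp = 0
    precision_sum = 0.0
    for i, (truth, _) in enumerate(ranked, start=1):
        if truth == 1:
            tp += 1
            precision_sum += tp / i  # precision at this recall level
    return precision_sum / total_pos if total_pos else 0.0
```

A perfect ranking (all true atoms ahead of all false ones) scores 1.0; a ranking that interleaves errors scores lower.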
- Consider the AUC obtained for the two different MLN settings.
Which results are better? Is it what you would have expected? Comment.
- For each of the MLN settings above, separate out the results for bibliography
prediction, author prediction and venue prediction. Compare the AUCs for each
of these cases separately. What do you observe? Comment.
- Read the paper
Entity Resolution with Markov Logic. Which MLN models in the paper do the above
two MLN settings correspond to? Do your results in the part above corroborate the qualitative
findings in the paper? Comment.
- Extra Credit:
- Read about
precision and recall. Use a threshold varying from 0 to 1 (at intervals
of 0.01) to output precision at various levels of recall for the inferred probabilities,
as discussed in class. Draw the precision-recall curve (X-axis is recall, Y-axis is precision) using
gnuplot for each of the two MLN settings on the same graph. Confirm the findings about the area under the
precision-recall curve from the previous part.
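The threshold sweep can be sketched as follows, assuming a list of (truth, probability) pairs with 0/1 truth values and inferred probabilities. The returned (recall, precision) points can be written to a file and plotted with gnuplot (e.g. `plot "er-col.pr" using 1:2 with lines`, where the file name is hypothetical):

```python
# Sketch: sweep a decision threshold from 0 to 1 in steps of 0.01 and
# record (recall, precision) points for a precision-recall curve.

def pr_points(pairs, step=0.01):
    """pairs: iterable of (truth, prob) with truth in {0, 1}."""
    points = []
    total_pos = sum(t for t, _ in pairs)
    n = int(round(1 / step))
    for k in range(n + 1):
        thr = k * step
        tp = sum(1 for t, p in pairs if p >= thr and t == 1)
        fp = sum(1 for t, p in pairs if p >= thr and t == 0)
        if tp + fp == 0:
            continue  # precision is undefined when nothing is predicted positive
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((recall, precision))
    return points
```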
- Alchemy comes with a host of other learning algorithms (and a number of parameter
settings) to experiment with. Try a few other learning algorithms (e.g. Voted
Perceptron) and/or non-default parameter settings to see if you can get
any improvement in the results. Comment.
Turn in the following material:
Coming up soon!