-
Notifications
You must be signed in to change notification settings - Fork 32
DataFusion
olehmberg edited this page Apr 26, 2017
·
4 revisions
Load the datasets:
// Load the Data into FusableDataSet
FusableDataSet<Movie, Attribute> ds1 = new FusableDataSet<>();
new MovieXMLReader().loadFromXML(
new File("usecase/movie/input/academy_awards.xml"),
"/movies/movie",
ds1);
ds1.printDataSetDensityReport();
FusableDataSet<Movie, Attribute> ds2 = new FusableDataSet<>();
new MovieXMLReader().loadFromXML(
new File("usecase/movie/input/actors.xml"),
"/movies/movie",
ds2);
ds2.printDataSetDensityReport();
FusableDataSet<Movie, Attribute> ds3 = new FusableDataSet<>();
new MovieXMLReader().loadFromXML(
new File("usecase/movie/input/golden_globes.xml"),
"/movies/movie",
ds3);
ds3.printDataSetDensityReport();
Load the record correspondences between the datasets (obtained from identity resolution):
// load correspondences
CorrespondenceSet<Movie, Attribute> correspondences = new CorrespondenceSet<>();
correspondences.loadCorrespondences(
new File("usecase/movie/correspondences/academy_awards_2_actors_correspondences.csv"),
ds1,
ds2);
correspondences.loadCorrespondences(
new File("usecase/movie/correspondences/actors_2_golden_globes_correspondences.csv"),
ds2,
ds3);
Define the data fusion strategy and add fusers for multiple attributes:
// define the fusion strategy
DataFusionStrategy<Movie, Attribute> strategy = new DataFusionStrategy<>(new MovieXMLReader());
// add attribute fusers
strategy.addAttributeFuser(
Movie.TITLE,
new TitleFuserShortestString(),
new TitleEvaluationRule());
strategy.addAttributeFuser(
Movie.DIRECTOR,
new DirectorFuserLongestString(),
new DirectorEvaluationRule());
strategy.addAttributeFuser(
Movie.DATE,
new DateFuserVoting(),
new DateEvaluationRule());
strategy.addAttributeFuser(
Movie.ACTORS,
new ActorsFuserUnion(),
new ActorsEvaluationRule());
Initialise the data fusion engine with the strategy and run it:
// create the fusion engine
DataFusionEngine<Movie, Attribute> engine = new DataFusionEngine<>(strategy);
// run the fusion
FusableDataSet<Movie, Attribute> fusedDataSet = engine.run(correspondences, null);
Load the gold standard and evaluate the fusion result:
// load the gold standard
DataSet<Movie, Attribute> gs = new FusableDataSet<>();
new MovieXMLReader().loadFromXML(
new File("usecase/movie/goldstandard/fused.xml"),
"/movies/movie",
gs);
// evaluate
DataFusionEvaluator<Movie, Attribute> evaluator = new DataFusionEvaluator<>(
strategy, new RecordGroupFactory<Movie, Attribute>());
evaluator.setVerbose(true);
double accuracy = evaluator.evaluate(fusedDataSet, gs, null);
System.out.println(String.format("Accuracy: %.2f", accuracy));