GSOC2013_Progress_Kasun
Type inference to extend coverage Project Proposal
Sources for type inference. The list is based on the comments by Aleksander Pohl on the project proposal
Warm Up period (27th May - 16th June)
- Set up the public clone of the Extraction Framework
- Set up the extraction-framework code in the IDEA IDE, build the code, etc.
- Work on Issue #33
- Familiarize with Scala, git, and IDEA
Week 1 (17th June- 23rd June)
- Integrate the Airpedia triples classes in 31 languages into the Dbpedia-links.
- Integrate triples obtained from Wikipedia infoboxes, introductory sentences, categories, and the direct mapping between Wikipedia and Cyc into the Dbpedia-links. Aleksander's classification outputs (not completed)
Week 2 (24th June- 30th June)
- Identify Wikipedia leaf categories #Issue16
- Investigate the YAGO approach, read the YAGO paper again
- Mail discussion thread on choosing the source data for leaf category identification: Link to mail thread
- Method of leaf category identification
  1. Get all parent categories
  2. Get all child categories
  3. Subtract the result of step 1 from the result of step 2; what remains is the set of all leaf categories.
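
The leaf identification method above amounts to a set difference: categories that occur as a child but never as a parent. A minimal Python sketch, assuming the category relationships are available as (parent, child) pairs; the function name `find_leaf_categories` and the sample category names are hypothetical and only serve as illustration:

```python
def find_leaf_categories(edges):
    """Return categories that appear as a child but never as a parent."""
    # Step 1: all parent categories
    parents = {parent for parent, _ in edges}
    # Step 2: all child categories
    children = {child for _, child in edges}
    # Step 3: remove the parents from the children; the rest are leaves
    return children - parents


# Hypothetical (parent, child) pairs, not taken from any real dump.
sample_edges = [
    ("Science", "Physics"),
    ("Science", "Chemistry"),
    ("Physics", "Quantum mechanics"),
]

leaves = find_leaf_categories(sample_edges)
print(sorted(leaves))  # ['Chemistry', 'Quantum mechanics']
```

Note that a root category such as "Science" is excluded twice over: it is a parent, and it never appears as a child.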
- Processing Wikipedia categories #issue17: save the parent-child relationships of the categories to a MySQL database in order to address the requirement of #issue17
- Created tables
- Node Table
    CREATE TABLE IF NOT EXISTS node (
      node_id int(10) NOT NULL AUTO_INCREMENT,
      category_name varchar(40) NOT NULL,
      is_leaf tinyint(1) NOT NULL,
      is_prominent tinyint(1) NOT NULL,
      score_interlang double DEFAULT NULL,
      score_edit_histo double NOT NULL,
      PRIMARY KEY (node_id),
      UNIQUE KEY category_name (category_name)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
- Edge Table
    CREATE TABLE IF NOT EXISTS edges (
      parent_id int(10) NOT NULL,
      child_id int(10) NOT NULL,
      PRIMARY KEY (parent_id, child_id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
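
As a rough illustration of how parent-child pairs could be stored in these two tables and the `is_leaf` flag derived from them, here is a hedged Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL, so the column types are simplified and default values are added for the flag and score columns; those defaults, the helper names `node_id` and `store_edges`, and the sample categories are all assumptions, not part of the project code.

```python
import sqlite3

# SQLite approximation of the node and edges tables (assumption: simplified
# types, plus defaults so rows can be inserted with only a category name).
SCHEMA = """
CREATE TABLE IF NOT EXISTS node (
    node_id INTEGER PRIMARY KEY AUTOINCREMENT,
    category_name VARCHAR(40) NOT NULL UNIQUE,
    is_leaf TINYINT NOT NULL DEFAULT 0,
    is_prominent TINYINT NOT NULL DEFAULT 0,
    score_interlang DOUBLE DEFAULT NULL,
    score_edit_histo DOUBLE NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS edges (
    parent_id INTEGER NOT NULL,
    child_id INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);
"""


def node_id(conn, name):
    """Insert the category if it is new and return its node_id."""
    conn.execute("INSERT OR IGNORE INTO node (category_name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT node_id FROM node WHERE category_name = ?", (name,)
    ).fetchone()
    return row[0]


def store_edges(conn, pairs):
    """Store (parent, child) pairs, then flag nodes that are never a parent."""
    for parent, child in pairs:
        conn.execute(
            "INSERT OR IGNORE INTO edges (parent_id, child_id) VALUES (?, ?)",
            (node_id(conn, parent), node_id(conn, child)),
        )
    conn.execute(
        "UPDATE node SET is_leaf = CASE "
        "WHEN node_id IN (SELECT parent_id FROM edges) THEN 0 ELSE 1 END"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample pairs, not taken from any real dump.
store_edges(conn, [
    ("Science", "Physics"),
    ("Science", "Chemistry"),
    ("Physics", "Quantum mechanics"),
])
leaf_names = [
    row[0]
    for row in conn.execute(
        "SELECT category_name FROM node WHERE is_leaf = 1 ORDER BY category_name"
    )
]
print(leaf_names)  # ['Chemistry', 'Quantum mechanics']
```

The composite primary key on `edges` makes repeated inserts of the same pair harmless, which is convenient when the same relationship shows up more than once in the input.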
Week 3 (1st July- 7th July)
The leaf node detection and parent-child relationship approach described in week 2 was abandoned for the following reasons.
- "categories that don't have a broader category are not included in skos_categories dump"; evidence for this claim is discussed here 1 2
- Data freshness issues: the Dbpedia dumps are nearly 1 year old, and synchronized sub-dumps are unavailable for data analysis
A new approach using the Wikipedia Category and Categorylinks SQL dumps is drafted [here](https://docs.google.com/document/d/1kXhaQu4UrEKX-v1DPwC6V2Sk9SNTDIwvgDtOZX5bZgk/edit?usp=sharing)
- The [Wikipedia Data Dump dated 2013/06/04](http://dumps.wikimedia.org/enwiki/20130604/) was used for the above-mentioned work