Home
Xoff edited this page Mar 20, 2013 · 4 revisions
Metafacture-Mediawiki is a plugin for Metafacture.
The modules in Metafacture-Mediawiki can be divided into three groups.
These modules parse MediaWiki XML and wikitext. They create and augment WikiPage objects.
- WikiXmlHandler parses a MediaWiki XML document and emits a WikiPage object for every page found
- WikiTextParser uses Sweble to parse the wikitext in a WikiPage object and attaches the abstract syntax tree (AST) to the object
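To illustrate what the XML parsing step involves, here is a minimal sketch that streams through a MediaWiki XML export with the JDK's StAX API and collects one page object per `<page>` element. It is not the code of WikiXmlHandler: the `WikiXmlSketch` class and its nested `PageSketch` record (holding only the title and the raw wikitext) are hypothetical stand-ins for the plugin's handler and its WikiPage class, and the Sweble parsing step is left out entirely.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WikiXmlSketch {

    /** Hypothetical stand-in for a WikiPage: only the title and the raw wikitext. */
    record PageSketch(String title, String wikitext) {}

    /** Streams through a MediaWiki XML export and returns one PageSketch per page element. */
    static List<PageSketch> parse(String xml) {
        List<PageSketch> pages = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String title = null;
            String text = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Local names ignore the export namespace declared on the root element.
                    String name = reader.getLocalName();
                    if (name.equals("page")) {
                        title = null;
                        text = null;
                    } else if (name.equals("title")) {
                        title = reader.getElementText();
                    } else if (name.equals("text")) {
                        // With several revisions per page, the last one read wins.
                        text = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("page")) {
                    // One object per page, mirroring what WikiXmlHandler is described to emit.
                    pages.add(new PageSketch(title, text));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed MediaWiki XML", e);
        }
        return pages;
    }
}
```

A streaming reader is used here instead of a DOM so that memory use does not grow with the size of the dump, which matters for full Wikipedia exports.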
Please note: Extractors are called analyzers in the code. The code will be updated in the next major revision (see issue #2); until then, the documentation is ahead of the code.
The extractors extract information from the different representations of a wiki page in a WikiPage object and turn this information into a Metafacture event stream.
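As a rough illustration of what turning extracted information into an event stream means, the sketch below emits one record per page and one literal per extracted value. The `Receiver` interface is a hypothetical, self-contained stand-in that only mirrors three method names of Metafacture's `StreamReceiver` (`startRecord`, `literal`, `endRecord`); using the page title as the record identifier and a single field name for all values are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EventStreamSketch {

    /** Hypothetical receiver mirroring three method names of Metafacture's StreamReceiver. */
    interface Receiver {
        void startRecord(String identifier);
        void literal(String name, String value);
        void endRecord();
    }

    /** Receiver that records every event as a string, handy for inspecting a stream. */
    static final class LoggingReceiver implements Receiver {
        final List<String> events = new ArrayList<>();

        public void startRecord(String identifier) {
            events.add("startRecord(" + identifier + ")");
        }

        public void literal(String name, String value) {
            events.add("literal(" + name + "=" + value + ")");
        }

        public void endRecord() {
            events.add("endRecord()");
        }
    }

    /** Emits one record for a page, with one literal per extracted value. */
    static void emit(String pageTitle, String fieldName, List<String> values, Receiver receiver) {
        receiver.startRecord(pageTitle);
        for (String value : values) {
            receiver.literal(fieldName, value);
        }
        receiver.endRecord();
    }
}
```

For a page titled "Berlin" with the extracted links "Germany" and "Spree", `emit` produces the events `startRecord(Berlin)`, `literal(link=Germany)`, `literal(link=Spree)` and `endRecord()` when called with the field name "link".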
- AuthorityLinkExtractor extracts authority file links (GND, LOC, IMDB, VIAF) from Wikipedia articles
- LinkExtractor extracts all internal links in a wiki page from its AST
- SimpleLinkExtractor extracts links from a wiki page using regular expressions
- TemplateExtractor extracts all templates whose name matches a pattern from a wiki page
- MultiExtractor runs a list of extractors and merges their results into a single record. Additionally, it ensures that each extractor receives a WikiPage containing the representations of the wikitext it requires.
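The regular-expression approach of SimpleLinkExtractor and the name filter of TemplateExtractor can be sketched as follows. This is an illustration under simplifying assumptions, not the plugin's implementation: the class, method names and patterns are hypothetical, only link targets and template names are returned, and piped labels and `#section` anchors are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractionSketch {

    // [[Target]], [[Target|label]] or [[Target#Section|label]]: group 1 captures the target.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|#]+)(?:#[^\\]|]*)?(?:\\|[^\\]]*)?\\]\\]");

    // Start of a template call: "{{", the name, then either "|" or "}}".
    private static final Pattern TEMPLATE_START =
            Pattern.compile("\\{\\{\\s*([^|}]+?)\\s*(?:\\||\\}\\})");

    /** Returns the targets of all internal links, in order of appearance. */
    static List<String> extractLinks(String wikitext) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            links.add(m.group(1).trim());
        }
        return links;
    }

    /** Returns the names of all templates whose name fully matches namePattern. */
    static List<String> extractTemplateNames(String wikitext, Pattern namePattern) {
        List<String> names = new ArrayList<>();
        Matcher m = TEMPLATE_START.matcher(wikitext);
        while (m.find()) {
            String name = m.group(1);
            if (namePattern.matcher(name).matches()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

Because regular expressions cannot reliably follow nested markup (for example, links inside image captions), an AST-based extractor such as LinkExtractor is generally the more robust option.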
These modules assist in working with WikiPage objects.
- AstToJson adds a serialised representation of an AST to a WikiPage object
- JsonToAst adds an AST, reconstructed from a serialised representation, to a WikiPage object
Be the first to write a tutorial!