Ferdinand-Wu/dissertation

This repository contains the code used in my dissertation research on sentence compression and fusion. The system implements supervised structured prediction for text transformation, with inference based on integer programming that jointly produces output sentences characterized by:

  • a sequence of n-grams (bigrams or trigrams)
  • an edge-factored dependency tree
  • a SEMAFOR-style frame-semantic parse (compression only)

These models are described and evaluated in Chapter 3, the latter half of Chapter 6, and Chapter 7 of my dissertation, Multi-Structured Models for Transforming and Aligning Text.
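
For readers who want a feel for the joint objective, here is a toy sketch of choosing a compression whose bigram sequence and dependency tree are scored together. It is not the implementation in this repository: brute-force enumeration stands in for the integer program, so it only works for a handful of tokens. It also ignores frame semantics and sentence-boundary bigrams. The function names, the score tables and every score value are hypothetical.

```python
from itertools import combinations, product

ROOT = -1  # hypothetical index for the artificial root of the dependency tree


def is_tree(kept, heads):
    """True if heads (token -> head) is a tree over kept with a single root."""
    if sum(1 for h in heads.values() if h == ROOT) != 1:
        return False
    for tok in kept:
        seen, cur = set(), tok
        while cur != ROOT:  # walk up to the root, failing on any cycle
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True


def best_compression(n, max_len, bigram_score, edge_score):
    """Enumerate token subsets and head assignments; return the best triple.

    The total score is the sum of bigram scores over adjacent kept tokens
    plus the sum of edge scores over the chosen dependency tree.
    """
    best = (float("-inf"), None, None)
    for size in range(1, max_len + 1):
        for kept in combinations(range(n), size):
            ngram = sum(bigram_score(a, b) for a, b in zip(kept, kept[1:]))
            options = [[h for h in (ROOT,) + kept if h != tok] for tok in kept]
            for choice in product(*options):
                heads = dict(zip(kept, choice))
                if not is_tree(kept, heads):
                    continue
                total = ngram + sum(edge_score(h, t) for t, h in heads.items())
                if total > best[0]:
                    best = (total, kept, heads)
    return best


# Made-up scores for: the(0) old(1) dog(2) barked(3) loudly(4)
tokens = ["the", "old", "dog", "barked", "loudly"]
BIGRAMS = {(0, 2): 1.0, (2, 3): 2.0, (3, 4): 0.5}
EDGES = {(ROOT, 3): 2.0, (3, 2): 2.0, (2, 0): 1.0, (3, 4): 0.5, (2, 1): 0.5}

score, kept, heads = best_compression(
    len(tokens), 3,
    lambda a, b: BIGRAMS.get((a, b), -1.0),      # unseen bigrams are penalized
    lambda head, tok: EDGES.get((head, tok), -1.0),  # unseen edges are penalized
)
print(" ".join(tokens[i] for i in kept), score, heads)
```

With these scores the search keeps "the dog barked", with "barked" as the root, "dog" attached to "barked" and "the" attached to "dog". An integer program expresses the same objective with indicator variables and linear constraints instead of enumeration, which is what makes realistic sentence lengths tractable.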

Usage

Honestly, it's unlikely that this code will be directly usable. It was extracted from a larger library without modification, hasn't been tested outside the original development environment, and suffers from all the usual pitfalls of research code written under deadline pressure. Interested users are instead encouraged to use this repository for reference or as a source of piecemeal solutions in reimplementation efforts.

Nevertheless, if you want to try to get this code running, here is a list of the known requirements:

  • Python 2.6 or 2.7
  • Ensure the distributed modules are on the $PYTHONPATH
  • Module dependencies:
  • External software:
  • Data:
  • Update all paths in the code with appropriate paths to your installations.
  • Launch servers:
    • LM servers through interfaces/srilm.py
    • Optionally, PTB servers through interfaces/treebank/depmodel.py
  • Entry points to the code are transduction/compression.py and transduction/pyrfusion.py.
    • Run these with --help for command-line options.
    • Structural configurations are inferred through feature configurations, defined in transduction/featconfigs.py. The default options have simple names like word, ngram, dep and are listed at the top of the file.
  • Contact me if you want the model files or system outputs from my experiments.
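
As a convenience, a few of the requirements above can be checked mechanically before trying to run anything. The sketch below is not part of the repository. It assumes the repository root is the directory that needs to be on $PYTHONPATH, and the names check_setup and help_commands are hypothetical. Only the file paths come from this README.

```python
import os
import sys

# Paths taken from the README, relative to the repository root.
ENTRY_POINTS = ["transduction/compression.py", "transduction/pyrfusion.py"]
SERVER_SCRIPTS = ["interfaces/srilm.py", "interfaces/treebank/depmodel.py"]


def check_setup(repo_root, version=None, pythonpath=None):
    """Return a list of problems found; an empty list means none were detected.

    version and pythonpath default to the running interpreter and the
    PYTHONPATH environment variable; they are parameters only to ease testing.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    problems = []
    if version not in [(2, 6), (2, 7)]:
        problems.append("Python %d.%d found; 2.6 or 2.7 is required" % version)
    root = os.path.abspath(repo_root)
    on_path = [os.path.abspath(p) for p in pythonpath.split(os.pathsep) if p]
    if root not in on_path:  # assumption: the repo root itself must be listed
        problems.append("repository root is not on PYTHONPATH")
    for rel in ENTRY_POINTS + SERVER_SCRIPTS:
        if not os.path.isfile(os.path.join(root, *rel.split("/"))):
            problems.append("missing file: %s" % rel)
    return problems


def help_commands(repo_root, python="python"):
    """Build the argument lists that would print each entry point's options."""
    return [[python, os.path.join(repo_root, *rel.split("/")), "--help"]
            for rel in ENTRY_POINTS]


if __name__ == "__main__":
    for problem in check_setup("."):
        print(problem)
    for cmd in help_commands("."):
        print(" ".join(cmd))
```

The check does not cover the module dependencies, external software, data or running servers, all of which still need to be set up by hand.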

Support

This code is provided as-is and without any implicit or explicit assurance of support. Minor bugs may not be addressed but will be listed in this README.
