GitHub - Ferdinand-Wu/dissertation: Sentence compression and fusion

This repository contains the code used in my dissertation research on sentence compression and fusion. The system implements supervised structured prediction for text transformation, with inference performed by integer programming to jointly produce output sentences characterized by:

a sequence of n-grams (bigrams or trigrams)
an edge-factored dependency tree
a SEMAFOR-style frame-semantic parse (compression only)
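The repository solves for these structures jointly with integer programming. As a point of reference only: if the objective scores nothing but bigrams (no dependency tree or frame-semantic parse), the compression subproblem can be solved exactly with a dynamic program instead of an ILP. The sketch below shows that special case under stated assumptions; `compress_bigrams`, the `<s>`/`</s>` sentinels, the fixed output length `k` and the scoring callback are all hypothetical and are not taken from this repository's code.

```python
def compress_bigrams(tokens, bigram_score, k):
    """Return the k-token subsequence of `tokens` (order preserved) with the
    highest total bigram score, including the <s> -> first and last -> </s>
    transitions, along with that score.

    Hypothetical helper: exact DP for the bigram-only case, O(n^2 * k) time.
    Scores returned by `bigram_score` are assumed to be finite numbers.
    """
    n = len(tokens)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(tokens)")
    START, END = "<s>", "</s>"  # assumed sentence-boundary sentinels
    NEG = float("-inf")
    # best[j][m]: best score of an m-token subsequence whose last token is tokens[j]
    best = [[NEG] * (k + 1) for _ in range(n)]
    # back[j][m]: index of the token preceding tokens[j] in that subsequence
    back = [[None] * (k + 1) for _ in range(n)]
    for j in range(n):
        best[j][1] = bigram_score(START, tokens[j])
        for m in range(2, min(j + 1, k) + 1):
            # tokens[i] must be able to end an (m-1)-token subsequence, so i >= m-2
            for i in range(m - 2, j):
                cand = best[i][m - 1] + bigram_score(tokens[i], tokens[j])
                if cand > best[j][m]:
                    best[j][m] = cand
                    back[j][m] = i
    # Close every complete k-token subsequence with the </s> transition.
    final, last = NEG, None
    for j in range(k - 1, n):
        cand = best[j][k] + bigram_score(tokens[j], END)
        if cand > final:
            final, last = cand, j
    # Follow back-pointers from the last selected token to recover the output.
    kept, j, m = [], last, k
    while j is not None:
        kept.append(j)
        j, m = back[j][m], m - 1
    kept.reverse()
    return [tokens[i] for i in kept], final


# Toy usage with made-up scores: reward "<s> cat sat </s>".
scores = {("<s>", "cat"): 1.0, ("cat", "sat"): 1.0, ("sat", "</s>"): 1.0}
words, total = compress_bigrams(
    "the cat sat on the mat".split(), lambda a, b: scores.get((a, b), 0.0), k=2)
# words == ["cat", "sat"], total == 3.0
```

Adding a dependency tree or frame-semantic parse couples decisions that this left-to-right recurrence cannot capture, which is one reason a joint formulation such as integer programming is used for the full models.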

These models are described and evaluated in Chapter 3, the latter half of Chapter 6, and Chapter 7 of my dissertation: Multi-Structured Models for Transforming and Aligning Text.

Usage

Honestly, it's unlikely that this code will be directly usable. It was extracted from a larger library without modification, hasn't been tested outside the original development environment and ultimately suffers from all the usual pitfalls of research code written under deadline pressure. Instead, interested users are encouraged to use this repository for reference or as a source of piecemeal solutions in reimplementation efforts.

Nevertheless, if you want to try to get this code running, here is a list of the known requirements:

Python 2.6 or 2.7
Ensure the distributed modules are on the $PYTHONPATH
Module dependencies:
- argparse (for Python 2.6)
- nltk 3 (with WordNet and FrameNet corpora)
- psutil
- pyutilib.enum
- simplejson
- swig-srilm
- stemming
External software:
- Gurobi 6.0 (offers academic licensing)
- LPsolve
- SRILM
- Stanford parser 2.0.4 (or a similar older version that produces projective trees)
- SEMAFOR
- RASP 3.x
- TagChunk
Data:
- Dependency-converted Penn treebank for interfaces/treebank/depmodel.py (not necessary for default features)
- Clarke & Lapata datasets for compression (contact me for dataset splits)
- Pyramid evaluation data from DUC 2005-2007 and TAC 2008-2011 for fusion, available from NIST
Update all paths in the code to point to your own installations.
Launch servers:
- LM servers through interfaces/srilm.py
- Optionally, PTB servers through interfaces/treebank/depmodel.py
Entry points to the code are transduction/compression.py and transduction/pyrfusion.py.
- Run these with --help for command-line options.
- Structural configurations are inferred from the feature configurations defined in transduction/featconfigs.py. The default options have simple names such as word, ngram and dep, and are listed at the top of the file.
Contact me if you want the model files or system outputs from my experiments.
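Before trying the entry points, it may help to confirm that the Python dependencies listed above import cleanly and that the NLTK corpora are installed. The script below is a minimal sketch, not part of the repository. The import names `srilm` (for swig-srilm) and `stemming`, and the NLTK resource names `corpora/wordnet` and `corpora/framenet_v15`, are assumptions that may need adjusting for your installation. It does not check the external software or data.

```python
import importlib

# Assumed import names for the Python packages listed above; adjust as needed.
REQUIRED_MODULES = [
    "argparse",
    "nltk",
    "psutil",
    "pyutilib.enum",
    "simplejson",
    "srilm",     # assumption: module name provided by swig-srilm
    "stemming",
]

# Assumed NLTK data paths for WordNet and FrameNet.
NLTK_RESOURCES = ["corpora/wordnet", "corpora/framenet_v15"]


def find_missing_modules(module_names):
    """Return, in order, the module names that fail to import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def find_missing_nltk_resources(resource_paths):
    """Return the NLTK data paths that nltk.data.find cannot locate."""
    import nltk  # imported lazily so the module check works without NLTK
    missing = []
    for path in resource_paths:
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(path)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    for name in missing:
        print("missing module: %s" % name)
    if "nltk" not in missing:
        for path in find_missing_nltk_resources(NLTK_RESOURCES):
            print("missing NLTK resource: %s" % path)
```

Catching only `ImportError` keeps genuine errors raised inside an installed package visible rather than reporting that package as absent.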

Support

This code is provided as-is and without any implicit or explicit assurance of support. Minor bugs may not be addressed but will be listed in this README.
