GitHub - nlp-compromise/penn-treebank: a small, non-commercial, fair-use subset of the Penn-Treebank, in JSON.

a small sample of the Penn Treebank, an english part-of-speech tagged dataset, with tags from the nlp-compromise tagset.

simply a transformation of the fair-use subset of the Penn Treebank distributed with the NLTK library, with cosmetic formatting changes for javascript use.

This data is for non-commercial fair use only, and all users are encouraged to purchase a license for the full dataset for any commercial projects.

the data is (only) 4,000 tagged sentences, mapped to compromise tags, with some opinionated lumping of punctuation, contractions, etc.
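
as a rough illustration of what a tag mapping like this could look like, here is a minimal sketch. The table below is hypothetical: it was chosen only to be consistent with the sample record in this README, and the real mapping lives in `tagset-map.js` and may differ. The names `EXAMPLE_TAG_MAP` and `mapPennTag` are made up for this example.

```javascript
// Hypothetical excerpt of a Penn-to-compromise tag table.
// Assumption: the actual table in tagset-map.js may use different entries.
const EXAMPLE_TAG_MAP = {
  DT: 'Determiner',
  NN: 'Noun',
  NNP: 'Noun',
  VBD: 'Verb',
  VBN: 'Verb',
  IN: 'Preposition',
  JJR: 'Comparative',
};

// Map a Penn Treebank tag to a coarser tag.
// Tags missing from the table are returned unchanged.
function mapPennTag(pennTag) {
  const known = Object.prototype.hasOwnProperty.call(EXAMPLE_TAG_MAP, pennTag);
  return known ? EXAMPLE_TAG_MAP[pennTag] : pennTag;
}

console.log(['DT', 'NNP', 'VBD', 'JJR'].map(mapPennTag));
```

passing unknown tags through unchanged is a design choice for this sketch only; it makes gaps in the table easy to spot in the output.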

972 KB uncompressed.

sample:

{ text: 'Another OTC bank stock involved in a buy-out deal, First Constitution Financial, was higher.',
  tags:
   [ 'Determiner',
     'Noun',
     'Noun',
     'Noun',
     'Verb',
     'Preposition',
     'Determiner',
     'Noun',
     'Noun',
     'Noun',
     'Noun',
     'Noun',
     'Verb',
     'Comparative'
   ]
}
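
a record like the one above can be walked token by token. The sketch below is not part of the package: it inlines the sample record instead of loading `penn-data.json`, and it assumes one tag per whitespace-separated token (punctuation stays attached to its word). That assumption holds for this sample but is not guaranteed for every sentence, so the helper returns `null` when the counts differ. `pairTokens` and `countTags` are hypothetical names.

```javascript
// Sample record copied from this README.
const record = {
  text: 'Another OTC bank stock involved in a buy-out deal, First Constitution Financial, was higher.',
  tags: [
    'Determiner', 'Noun', 'Noun', 'Noun', 'Verb', 'Preposition', 'Determiner',
    'Noun', 'Noun', 'Noun', 'Noun', 'Noun', 'Verb', 'Comparative',
  ],
};

// Pair each whitespace-separated token with its tag.
// Assumption: tokens and tags align one-to-one; return null if they do not.
function pairTokens(rec) {
  const tokens = rec.text.split(/\s+/).filter(Boolean);
  if (tokens.length !== rec.tags.length) {
    return null;
  }
  return tokens.map((token, i) => ({ token, tag: rec.tags[i] }));
}

// Count how often each tag occurs across a list of records.
function countTags(records) {
  const counts = {};
  for (const rec of records) {
    for (const tag of rec.tags) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return counts;
}

const pairs = pairTokens(record);
console.log(pairs.slice(0, 3));
console.log(countTags([record]));
```

the same two helpers should work on the full dataset if it is an array of records in this shape, which is an assumption about the contents of `penn-data.json`.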

Original statement in NLTK:

Copyright (C) 1995 University of Pennsylvania;
This is a 10% fragment of Penn Treebank, (C) LDC 1995, which has been dependency parsed.
It is made available under fair use for the purposes of illustrating NLTK tools for tokenizing, tagging, chunking and parsing.
This data is for non-commercial use only.;

please file an issue if there are any copyright concerns in placing this on npm or github.

Folders and files (6 commits):
test
.eslintrc
LICENSE
README.md
build.js
index.js
package-lock.json
package.json
penn-data.json
tagset-map.js
