GitHub - snooopyman/es-compromise: modest natural language processing

es-compromise

modest natural language processing

npm install es-compromise

work in progress!

see also: Italian • German • French • English

es-compromise is a port of compromise to Spanish.

The goal of this project is to provide a small, basic, rule-based part-of-speech (POS) tagger.

import nlp from 'es-compromise'

let doc = nlp('Tengo que bailar contigo hoy')
doc.match('#Verb').out('array')
// [ 'Tengo', 'bailar' ]

or in the browser:

<script src="https://unpkg.com/es-compromise"></script>
<script>
  let txt = 'Oh, tú, tú eres el imán y yo soy el metal'
  let doc = esCompromise(txt) // window.esCompromise
  console.log(doc.json())
  // { text:'Oh, tú...', terms:[ ... ] }
</script>

API

es-compromise includes all the methods of compromise/one:

click here to see the API

Output

.text() - return the document as text
.json() - return the document as data
.debug() - pretty-print the interpreted document
.out() - a named or custom output
.html({}) - output custom html tags for matches
.wrap({}) - produce custom output for document matches
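
As a rough illustration of the output methods above, the sketch below collects three views of the same document. `summarize` is a hypothetical helper name, and `nlp` is passed in as an argument (for example the default export of `es-compromise`) so the snippet does not depend on how the library was loaded.

```javascript
// Sketch only: `summarize` is a made-up helper, not part of es-compromise.
// `nlp` is whatever you imported, e.g. `import nlp from 'es-compromise'`.
function summarize(nlp, text) {
  const doc = nlp(text)
  return {
    text: doc.text(), // the document as a plain string
    json: doc.json(), // the document as plain data (text plus term objects)
    verbs: doc.match('#Verb').out('array'), // matches as an array of strings
  }
}
```

Called as `summarize(nlp, 'Tengo que bailar contigo hoy')`, the `verbs` field should correspond to the `[ 'Tengo', 'bailar' ]` result shown at the top of this README.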

Utils

.found [getter] - does this document contain any results?
.docs [getter] - get term objects as json
.length [getter] - count the # of results (matches) in the document
.isView [getter] - identify a compromise object
.compute() - run a named analysis on the document
.clone() - deep-copy the document, so that no references remain
.termList() - return a flat list of all Term objects in match
.cache({}) - freeze the current state of the document, for speed-purposes
.uncache() - un-freezes the current state of the document, so it may be transformed
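
A minimal sketch of how `.found` and `.clone()` might be combined, assuming that changes made to a clone leave the original document untouched (which is what "deep-copy" suggests). `upperCaseCopy` is a hypothetical helper, and `nlp` is passed in rather than imported.

```javascript
// Sketch only: `upperCaseCopy` is a made-up helper name.
function upperCaseCopy(nlp, text) {
  const doc = nlp(text)
  // `.found` is a getter, so it is read without parentheses
  if (!doc.found) return null
  const copy = doc.clone() // deep copy, so edits do not reach `doc`
  copy.toUpperCase()
  return { original: doc.text(), copy: copy.text() }
}
```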

Accessors

.all() - return the whole original document ('zoom out')
.terms() - split-up results by each individual term
.first(n) - use only the first result(s)
.last(n) - use only the last result(s)
.slice(n,n) - grab a subset of the results
.eq(n) - use only the nth result
.firstTerms() - get the first word in each match
.lastTerms() - get the end word in each match
.fullSentences() - get the whole sentence for each match
.groups() - grab any named capture-groups from a match
.wordCount() - count the # of terms in the document
.confidence() - an average score for POS tag interpretations
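
The accessors can be chained to narrow a document down. The following sketch (with a hypothetical helper name, and `nlp` passed in as an argument) splits a text into terms and picks out the two ends.

```javascript
// Sketch only: `firstAndLastWord` is a made-up helper name.
function firstAndLastWord(nlp, text) {
  const doc = nlp(text)
  const terms = doc.terms() // one result per individual term
  return {
    first: terms.first().text(), // first result only
    last: terms.last().text(), // last result only
    words: doc.wordCount(), // number of terms in the document
  }
}
```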

Match

(match methods use the match-syntax.)

.match('') - return a new Doc, with this one as a parent
.not('') - return all results except for this
.matchOne('') - return only the first match
.if('') - return each current phrase, only if it contains this match ('only')
.ifNo('') - filter out any current phrases that have this match ('notIf')
.has('') - return true or false, depending on whether this match exists
.before('') - return all terms before a match, in each phrase
.after('') - return all terms after a match, in each phrase
.union() - return combined matches without duplicates
.intersection() - return only duplicate matches
.complement() - get everything not in another match
.settle() - remove overlaps from matches
.growRight('') - add any matching terms immediately after each match
.growLeft('') - add any matching terms immediately before each match
.grow('') - add any matching terms before or after each match
.sweep(net) - apply a series of match objects to the document
.splitOn('') - return a Document with three parts for every match ('splitOn')
.splitBefore('') - partition a phrase before each matching segment
.splitAfter('') - partition a phrase after each matching segment
.lookup([]) - quick find for an array of string matches
.autoFill() - create type-ahead assumptions on the document
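
To make the difference between a few of the match methods concrete, here is a small sketch that runs the same pattern through `.has()`, `.match()`, `.if()` and `.not()`. `describeMatches` is a hypothetical helper, `nlp` is passed in, and the pattern is left to the caller (for instance `'#Verb'`, the tag used in the example at the top).

```javascript
// Sketch only: `describeMatches` is a made-up helper name.
function describeMatches(nlp, text, pattern) {
  const doc = nlp(text)
  return {
    exists: doc.has(pattern), // true or false
    matches: doc.match(pattern).out('array'), // only the matching terms
    phrasesWith: doc.if(pattern).out('array'), // whole phrases that contain a match
    rest: doc.not(pattern).out('array'), // everything except the matches
  }
}
```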

Tag

.tag('') - give all terms the given tag
.tagSafe('') - only apply the tag to terms if it is consistent with their current tags
.unTag('') - remove this tag from the given terms
.canBe('') - return only the terms that can be this tag
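
A short sketch of manual tagging, assuming that a tag applied with `.tag()` can be read back through the `#Tag` match syntax. `tagMatches` is a hypothetical helper, `nlp` is passed in, and both the pattern and the tag name are supplied by the caller.

```javascript
// Sketch only: `tagMatches` is a made-up helper name.
function tagMatches(nlp, text, pattern, tag) {
  const doc = nlp(text)
  doc.match(pattern).tag(tag) // add the tag to every term in the match
  return doc.match('#' + tag).out('array') // read it back with the #Tag syntax
}
```

For example, `tagMatches(nlp, 'Tengo que bailar contigo hoy', 'bailar', 'Verb')` tags the literal word `bailar` and then lists everything that carries the `Verb` tag.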

Case

.toLowerCase() - turn every letter of every term to lower case
.toUpperCase() - turn every letter of every term to upper case
.toTitleCase() - upper-case the first letter of each term
.toCamelCase() - remove whitespace and title-case each term
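
The case methods edit terms in place, so they are usually applied to a match and then read back from the parent document. A minimal sketch, with a hypothetical helper name and `nlp` passed in:

```javascript
// Sketch only: `upperCaseMatches` is a made-up helper name.
function upperCaseMatches(nlp, text, pattern) {
  const doc = nlp(text)
  doc.match(pattern).toUpperCase() // edits the matched terms in place
  return doc.text() // the parent document reflects the change
}
```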

Whitespace

.pre('') - add this punctuation or whitespace before each match
.post('') - add this punctuation or whitespace after each match
.trim() - remove start and end whitespace
.hyphenate() - connect words with hyphen, and remove whitespace
.dehyphenate() - remove hyphens between words, and set whitespace
.toQuotations() - add quotation marks around these matches
.toParentheses() - add brackets around these matches
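
As a sketch of `.pre()` and `.post()`, the helper below wraps every match in caller-supplied punctuation. `wrapMatches` is a hypothetical name and `nlp` is passed in. One assumption to be aware of: `.post()` appears to overwrite whatever followed the term, including its trailing space, so the caller should include a space in `close` if one is still wanted.

```javascript
// Sketch only: `wrapMatches` is a made-up helper name.
function wrapMatches(nlp, text, pattern, open, close) {
  const doc = nlp(text)
  const m = doc.match(pattern)
  m.pre(open) // text placed before the first term of each match
  m.post(close) // text placed after the last term of each match (assumed to replace existing whitespace)
  return doc.text()
}
```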

Loops

.map(fn) - run each phrase through a function, and create a new document
.forEach(fn) - run a function on each phrase, as an individual document
.filter(fn) - return only the phrases that return true
.find(fn) - return a document with only the first phrase that matches
.some(fn) - return true or false if there is one matching phrase
.random(n) - sample a subset of the results
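
The loop methods hand each phrase to the callback as its own small document, so the usual methods are available inside the callback. The two helpers below are hypothetical, and `nlp` is passed in as an argument.

```javascript
// Sketch only: both helper names are made up for illustration.

// Collect the text and term count of every phrase.
function phraseLengths(nlp, text) {
  const rows = []
  nlp(text).forEach((phrase) => {
    rows.push({ text: phrase.text(), words: phrase.wordCount() })
  })
  return rows
}

// Keep only the phrases that contain the given match.
function phrasesWith(nlp, text, pattern) {
  return nlp(text)
    .filter((phrase) => phrase.has(pattern))
    .out('array')
}
```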

Insert

.replace(match, replace) - search and replace match with new content
.replaceWith(replace) - substitute-in new text
.remove() - fully remove these terms from the document
.insertBefore(str) - add these new terms to the front of each match (prepend)
.insertAfter(str) - add these new terms to the end of each match (append)
.concat() - add these new things to the end
.swap(fromLemma, toLemma) - smart replace of root-words, using proper conjugation
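
A minimal sketch of editing a document with `.replace()` and `.remove()`, assuming both change the document in place so that the result can be read from the original `doc`. `replaceAndRemove` is a hypothetical helper and `nlp` is passed in.

```javascript
// Sketch only: `replaceAndRemove` is a made-up helper name.
function replaceAndRemove(nlp, text, from, to, drop) {
  const doc = nlp(text)
  doc.replace(from, to) // search and replace
  doc.match(drop).remove() // delete the matched terms from the document
  return doc.text()
}
```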

Transform

.sort('method') - re-arrange the order of the matches (in place)
.reverse() - reverse the order of the matches, but not the words
.normalize({}) - clean-up the text in various ways
.unique() - remove any duplicate matches
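
The transform methods chain like the others. This sketch de-duplicates the terms of a text and sorts them. `sortedUniqueTerms` is a hypothetical helper, `nlp` is passed in, and the default `'alpha'` sort name is an assumption taken from compromise's documentation, so check it against the version you have installed.

```javascript
// Sketch only: `sortedUniqueTerms` is a made-up helper name.
// The 'alpha' default is assumed from compromise's sort methods.
function sortedUniqueTerms(nlp, text, method = 'alpha') {
  return nlp(text)
    .terms() // one result per term
    .unique() // drop duplicate results
    .sort(method) // re-order the remaining results
    .out('array')
}
```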

Lib

(these methods are on the main nlp object)

nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form
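
The library-level methods live on the `nlp` function itself rather than on a document. The sketch below shows two of them. Both helper names are hypothetical, `nlp` is passed in, and note that `nlp.addWords()` changes the shared lexicon, so it is presumably visible to later calls as well.

```javascript
// Sketch only: both helper names are made up for illustration.

// Split a text into terms without running the POS tagger.
function tokenizeOnly(nlp, text) {
  return nlp.tokenize(text).terms().out('array')
}

// Add words to the lexicon, then parse. `words` maps word -> tag,
// e.g. { bachata: 'Noun' } (the tag name here is an assumption).
function parseWithWords(nlp, words, text) {
  nlp.addWords(words)
  return nlp(text)
}
```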

docs

Numbers

it can parse numbers written as words or as digits

let doc = nlp('tengo cuarenta dolares')
doc.numbers().minus(50)
doc.text()
// tengo menos diez dolares

number docs

Lemmatization

it can reduce conjugated words to their root form

let doc = nlp('tiramos nuestros zapatos')
doc.compute('root')
doc.has('{tirar} nuestros {zapato}')
// true

root docs

Contributing

please join to help!

help with first PR

git clone https://github.com/nlp-compromise/es-compromise.git
cd es-compromise
npm install
npm test
npm run watch

See also

opennlp-spanish - Java tagger with a Spanish model
TreeTagger - Perl tagger with a Spanish model

MIT
