add more stop words
dimus committed Sep 26, 2023
1 parent 8c3e0a1 commit 3156840
Showing 8 changed files with 2,974 additions and 2,749 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -4,8 +4,11 @@

## [v1.7.5] - 2023-09-12 Tue

- Add: CSV and TSV files provide now verbatim authorship instead of normalized
one.
- Add: a few more "termination words"
- Fix [#254]: treat `fa` as forma.
- Fix [#252]: process `dem` as an author word for `Von dem Bush` and like.
- Fix [#253]: process `dem` as an author word for `Von dem Bush` and like.
- Fix [#251]: do not process `y` as `and` for `Rafael Arango y Molina`.
- Fix [#249]: allow `cf` at the end of the strings, cf for infraspecies.
- Fix [#248]: do not escape double quotes for TSV output.
8 changes: 4 additions & 4 deletions ent/internal/preparser/grammar.peg
@@ -15,7 +15,7 @@ TailPhrase <- TailLastWordJunk / TailPhrase4 / TailPhrase3 /
 
 TailLastWordJunk <- (("var" / "ined" / "ssp" / "subsp" / "subgen" ) '.'? /
 "sensu" / "new" / "non" / "nec" / "hybrid" / "von" / 'P.' _? 'P.' /
-"ms") '?'? &SpaceOrEnd
+"ms" / 'CF') '?'? &SpaceOrEnd
 
 TailPhrase4 <- ("pro" _ "parte" / "nomen") &NotLetterOrEnd / 'p.' _? 'p.' /
 "nom." / "comb."
@@ -24,13 +24,13 @@ TailPhrase3 <- '('? 's' ('.' _? / _ ) ('s' '.'? &NotLetterOrEnd / 'l.' / 'str.'
 'lat.')
 
 TailStopWords <- ("environmental" / "enrichment" / "samples" /
-("species" _)? ("group" / "complex") / "clade" /
-"author" / "nec" / "vide" / "fide" / "non" ) &NotLetterOrEnd
+"species" / "group" / "complex" / "clade" /
+"author" / "nec" / "vide" / "species" / "fide" / "non" / "not" ) &NotLetterOrEnd
 
 TailPhrase2 <- ("sero" ("var" / "type") / "sensu" / "auct" / "sec" / "near" /
 "str") '.'? &NotLetterOrEnd
 
-TailPhrase1 <- (('('? ('ht' / 'hort')) / 'spec' /
+TailPhrase1 <- (('('? ('ht' / 'hort')) / "S" 'pec' /
 'nov' '.'? _ 'spec') '.'? &NotLetterOrEnd
 
 SpaceOrEnd <- CommaSpace? END
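To make the effect of the updated TailStopWords rule concrete, here is a minimal Go sketch. It is not gnparser's preparser: `stopWords` and `cutAtStopWord` are hypothetical names, and cutting the string at the first space-preceded stop word is a simplification of how the grammar ends a name. Case-insensitive matching assumes the grammar is compiled with a PEG generator (such as pointlander/peg) in which double-quoted literals like "not" ignore case, while single-quoted ones like 'CF' match only as written.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// stopWords lists the alternatives of the updated TailStopWords rule.
// The second "species" alternative in the grammar is redundant in an
// ordered choice, so it appears only once here.
var stopWords = []string{
	"environmental", "enrichment", "samples",
	"species", "group", "complex", "clade",
	"author", "nec", "vide", "fide", "non", "not",
}

// cutAtStopWord is a hypothetical, simplified stand-in for the preparser.
// It returns the text before the first space-preceded stop word, matched
// case-insensitively (assumption: double-quoted PEG literals ignore case)
// and only when the word is followed by a non-letter or the end of the
// string, mimicking the &NotLetterOrEnd lookahead.
func cutAtStopWord(name string) string {
	for i := 0; i < len(name); i++ {
		if name[i] != ' ' {
			continue
		}
		start := i + 1
		for _, w := range stopWords {
			end := start + len(w)
			if end > len(name) || !strings.EqualFold(name[start:end], w) {
				continue
			}
			// Lookahead: the stop word must not run into another letter.
			r, _ := utf8.DecodeRuneInString(name[end:])
			if end == len(name) || !unicode.IsLetter(r) {
				return strings.TrimRight(name[:i], " ")
			}
		}
	}
	return name
}

func main() {
	for _, s := range []string{
		"Aus bus not Aus cus",
		"Aus bus Group A",
		"Aus bus species",
		"Aus nota",
	} {
		fmt.Printf("%q -> %q\n", s, cutAtStopWord(s))
	}
}
```

With "not" and a standalone "species" now among the alternatives, `Aus bus not Aus cus` and `Aus bus species` are both cut to `Aus bus`, while `Aus nota` is left intact because the lookahead rejects a stop word followed by a letter.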