Commit
Recognize "ingredient list" prefix (#23)
wvengen committed Jan 19, 2024
1 parent 3c4f2a3 commit 0904d14
Showing 2 changed files with 4 additions and 3 deletions.
2 changes: 1 addition & 1 deletion lib/food_ingredient_parser/loose/scanner.rb
@@ -5,7 +5,7 @@ class Scanner

SEP_CHARS = "|;,.".freeze
MARK_CHARS = "¹²³⁴⁵ᵃᵇᶜᵈᵉᶠᵍªº⁽⁾†‡⁺•°▪◊#^˄*~".freeze
-PREFIX_RE = /\A\s*(ingredients|contains|ingred[iï][eë]nt(en)?(declaratie)?|bevat|dit zit er\s?in|samenstelling|zutaten)\b\s*[:;.]?\s*/i.freeze
+PREFIX_RE = /\A\s*(ingredients(\s*list)?|contains|ingred[iï][eë]nt(en)?(declaratie)?|bevat|dit zit er\s?in|samenstelling|zutaten)\b\s*[:;.]?\s*/i.freeze
NOTE_RE = /\A\b(dit product kan\b|deze verpakking kan\b|kan sporen\b.*?\bbevatten\b|voor allergenen\b|allergenen\b|allergie[- ]informatie(\s*:|\b)|E\s*=|gemaakt in\b|geproduceerd in\b|bevat mogelijk\b|kijk voor meer\b|allergie-info|in de fabriek\b|in dit bedrijf\b|voor [0-9,.]+ (g\.?|gr\.?|ram|ml).*\bis [0-9,.]+ (g\.?|gr\.?|ram|ml).*\bgebruikt\b)/i.freeze
# Keep in sync with +abbrev+ in the +Common+ grammar, plus relevant ones from the +Amount+ grammar.
ABBREV_RE = Regexp.union(
5 changes: 3 additions & 2 deletions lib/food_ingredient_parser/strict/grammar/root.treetop
@@ -19,9 +19,10 @@ module FoodIngredientParser::Strict::Grammar

rule root_prefix
(
-'ingredients'i / 'contains'i /
+'ingredients'i ( ws+ 'list'i )? / 'contains'i /
('ingred'i [IÏiï] [EËeë] 'n'i ( 't'i 'en'i? 'declaratie'i? )? ) / 'bevat'i / 'dit zit er in'i / 'samenstelling'i /
-'zutaten'i
+'zutaten'i /
+'ingredienser'i
)
( ws* [:;.] ( ws* newline )? / ws* newline / ws ) ws* # optional colon or other separator
"'"? ws* # stray quote occurs sometimes
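The effect of the `PREFIX_RE` change in the loose scanner can be checked in isolation. The sketch below copies the old and new patterns from this commit and applies them with a hypothetical `strip_prefix` helper. That helper is an assumption for illustration only and is not part of the gem; the way `Scanner` actually consumes `PREFIX_RE` may differ.

```ruby
# Old and new patterns, copied verbatim from the diff in this commit.
OLD_PREFIX_RE = /\A\s*(ingredients|contains|ingred[iï][eë]nt(en)?(declaratie)?|bevat|dit zit er\s?in|samenstelling|zutaten)\b\s*[:;.]?\s*/i.freeze
NEW_PREFIX_RE = /\A\s*(ingredients(\s*list)?|contains|ingred[iï][eë]nt(en)?(declaratie)?|bevat|dit zit er\s?in|samenstelling|zutaten)\b\s*[:;.]?\s*/i.freeze

# Hypothetical helper (not from the gem): drop a leading prefix, if present.
def strip_prefix(text, prefix_re)
  text.sub(prefix_re, "")
end

input = "Ingredients list: water, sugar"

# The old pattern stops after "Ingredients", so "list:" leaks into the remainder.
puts strip_prefix(input, OLD_PREFIX_RE)   # prints "list: water, sugar"

# The new pattern consumes the whole "Ingredients list:" prefix.
puts strip_prefix(input, NEW_PREFIX_RE)   # prints "water, sugar"
```

Inputs with a plain `Ingredients:` prefix behave the same under both patterns, since the `(\s*list)?` group is optional.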
