= English Ruby Linguistics Module - Synopsis
This is an overview of the English-language functions in the Ruby Linguistics
module as of version 0.02:
== Pluralization
  require 'linguistics'
  Linguistics::use( :en ) # extends Array, String, and Numeric

  "box".en.plural
  # => "boxes"

  "mouse".en.plural
  # => "mice"

  "ruby".en.plural
  # => "rubies"
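For readers curious about what a pluralizer has to handle, here is a minimal sketch that assumes just two mechanisms: a lookup table of irregular forms and a few suffix rules. It is not the Linguistics implementation; the names toy_plural and IRREGULAR_PLURALS are hypothetical, and real English pluralization needs many more rules.

```ruby
# Hypothetical irregular table; a real inflector has hundreds of entries.
IRREGULAR_PLURALS = {
  "mouse" => "mice",
  "goose" => "geese",
  "ox"    => "oxen",
}.freeze

# Toy pluralizer (NOT the Linguistics code): irregulars first, then suffix rules.
def toy_plural(word)
  return IRREGULAR_PLURALS[word] if IRREGULAR_PLURALS.key?(word)

  case word
  when /(?:s|x|z|ch|sh)\z/ then word + "es"            # box -> boxes
  when /[^aeiou]y\z/       then word.chomp("y") + "ies" # ruby -> rubies
  else                          word + "s"              # cow -> cows
  end
end

toy_plural("box")   # => "boxes"
toy_plural("mouse") # => "mice"
toy_plural("ruby")  # => "rubies"
toy_plural("day")   # => "days" (vowel before "y", so no "ies")
```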
== Indefinite Articles
  "book".en.a
  # => "a book"

  "article".en.a
  # => "an article"
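The simplest possible a/an rule looks only at the first letter. The sketch below (hypothetical name toy_article, not the library's method) shows that rule and also where it fails: the choice really depends on the first sound, so words like "hour" or "unicorn" need exception handling that this sketch deliberately omits.

```ruby
# Naive a/an chooser (hypothetical sketch): decides on the first LETTER only.
def toy_article(phrase)
  article = phrase.match?(/\A[aeiou]/i) ? "an" : "a"
  "#{article} #{phrase}"
end

toy_article("book")    # => "a book"
toy_article("article") # => "an article"
toy_article("hour")    # => "a hour" (wrong: pronunciation, not spelling, decides)
```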
== Present Participles
  "runs".en.present_participle
  # => "running"

  "eats".en.present_participle
  # => "eating"

  "spies".en.present_participle
  # => "spying"
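The three examples above exercise three different spelling rules: stripping "-ies" back to "-y", stripping a plain "-s", and doubling a final consonant ("run" becomes "running"). A rough sketch of those rules follows, assuming simple regular-expression heuristics; toy_present_participle is a hypothetical name, and the consonant-doubling test is a crude approximation of the real stress-based rule.

```ruby
# Toy present-participle builder (hypothetical; NOT the Linguistics code).
def toy_present_participle(verb)
  # Strip the third-person "-s"/"-ies" ending to get a base form.
  base =
    if verb.end_with?("ies") then verb.delete_suffix("ies") + "y" # spies -> spy
    elsif verb.end_with?("s") then verb.delete_suffix("s")        # runs -> run
    else verb
    end
  # Drop a silent final "e" (make -> mak), but keep "ee" (see -> see).
  base = base.chop if base.match?(/[^e]e\z/)
  # Double a final consonant after a single vowel in a one-syllable stem
  # (run -> runn); a crude stand-in for the stress-based rule.
  base += base[-1] if base.match?(/\A[^aeiou]*[aeiou][bdgmnpt]\z/)
  base + "ing"
end

toy_present_participle("runs")  # => "running"
toy_present_participle("eats")  # => "eating"
toy_present_participle("spies") # => "spying"
```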
== Ordinal Numbers
  5.en.ordinal
  # => "5th"

  2004.en.ordinal
  # => "2004th"
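The usual English ordinal-suffix rule depends on the last two digits: 11, 12 and 13 always take "th", and otherwise the last digit picks "st", "nd", "rd" or "th". A short sketch of that rule, using the hypothetical name toy_ordinal rather than the library's own code:

```ruby
# Standard ordinal-suffix rule (hypothetical helper, not the library's code).
def toy_ordinal(number)
  last_two = number.abs % 100
  suffix =
    if (11..13).cover?(last_two) then "th" # 11th, 12th, 13th, 112th
    else
      case number.abs % 10
      when 1 then "st"
      when 2 then "nd"
      when 3 then "rd"
      else "th"
      end
    end
  "#{number}#{suffix}"
end

toy_ordinal(5)    # => "5th"
toy_ordinal(2004) # => "2004th"
toy_ordinal(22)   # => "22nd"
toy_ordinal(112)  # => "112th"
```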
== Numbers to Words
  5.en.numwords
  # => "five"

  2004.en.numwords
  # => "two thousand and four"

  2385762345876.en.numwords
  # => "two trillion, three hundred and eighty-five billion,
  #     seven hundred and sixty-two million, three hundred and
  #     forty-five thousand, eight hundred and seventy-six"
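The output above spells numbers in groups of three digits, joined by commas, with British-style "and" after the hundreds and before a trailing value under 100. The sketch below reproduces that style for 0 through 999,999 only; toy_numwords and its helpers are hypothetical names, not the library's implementation, and larger scales (million, billion, trillion) would repeat the same three-digit grouping.

```ruby
ONES = %w[zero one two three four five six seven eight nine ten eleven twelve
          thirteen fourteen fifteen sixteen seventeen eighteen nineteen].freeze
TENS = %w[_ _ twenty thirty forty fifty sixty seventy eighty ninety].freeze

# Words for 0..99.
def below_hundred(n)
  return ONES[n] if n < 20
  tens, ones = n.divmod(10)
  ones.zero? ? TENS[tens] : "#{TENS[tens]}-#{ONES[ones]}"
end

# Words for 0..999, with British "and" after the hundreds.
def below_thousand(n)
  hundreds, rest = n.divmod(100)
  return below_hundred(rest) if hundreds.zero?
  words = "#{ONES[hundreds]} hundred"
  rest.zero? ? words : "#{words} and #{below_hundred(rest)}"
end

# Words for 0..999_999 (hypothetical sketch: no million/billion scales).
def toy_numwords(n)
  raise ArgumentError, "out of range" unless (0...1_000_000).cover?(n)
  thousands, rest = n.divmod(1000)
  return below_thousand(rest) if thousands.zero?
  words = "#{below_thousand(thousands)} thousand"
  return words if rest.zero?
  # "and" joins a trailing value under 100; larger groups get a comma.
  rest < 100 ? "#{words} and #{below_hundred(rest)}" : "#{words}, #{below_thousand(rest)}"
end

toy_numwords(5)       # => "five"
toy_numwords(2004)    # => "two thousand and four"
toy_numwords(345_876) # => "three hundred and forty-five thousand, eight hundred and seventy-six"
```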
== Quantification
  "cow".en.quantify( 5 )
  # => "several cows"

  "cow".en.quantify( 1005 )
  # => "thousands of cows"

  "cow".en.quantify( 20_432_123_000_000 )
  # => "tens of trillions of cows"
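Quantification replaces an exact count with an order of magnitude. The sketch below assumes a simple bucketing scheme that happens to reproduce the three results above: find the largest scale (hundreds, thousands, ... trillions) that fits, then prefix "tens of" or "hundreds of" depending on how many of that unit there are. The thresholds, the "dozens" bucket, and the name toy_quantify are hypothetical; the real #quantify may bucket differently.

```ruby
# Hypothetical magnitude buckets, largest first.
SCALES = [
  [10**12, "trillions"],
  [10**9,  "billions"],
  [10**6,  "millions"],
  [10**3,  "thousands"],
  [10**2,  "hundreds"],
].freeze

def toy_quantify(singular, plural, count)
  return "no #{plural}" if count.zero?
  return "one #{singular}" if count == 1
  return "several #{plural}" if count < 10

  SCALES.each do |size, name|
    next if count < size
    multiple = count / size # integer division: how many of this unit
    prefix =
      if multiple >= 100 then "hundreds of "
      elsif multiple >= 10 then "tens of "
      else ""
      end
    return "#{prefix}#{name} of #{plural}"
  end
  "dozens of #{plural}" # 10..99
end

toy_quantify("cow", "cows", 5)                  # => "several cows"
toy_quantify("cow", "cows", 1005)               # => "thousands of cows"
toy_quantify("cow", "cows", 20_432_123_000_000) # => "tens of trillions of cows"
```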
== Conjunctions
  animals = %w{dog cow ox chicken goose goat cow dog rooster llama
               pig goat dog cat cat dog cow goat goose goose ox alpaca}
  puts "The farm has: " + animals.en.conjunction
  # => The farm has: four dogs, three cows, three geese, three goats,
  #    two oxen, two cats, a chicken, a rooster, a llama, a pig,
  #    and an alpaca
Note that 'goose' and 'ox' are both correctly pluralized, and the correct
indefinite article 'an' has been used for 'alpaca'.
You can also use the generalization function of the #quantify method on whole
lists: passing :generalize => true to #conjunction gives general descriptions
of object lists instead of literal counts:
  allobjs = []
  ObjectSpace::each_object {|obj| allobjs << obj.class.name}
  puts "The current Ruby objectspace contains: " +
    allobjs.en.conjunction( :generalize => true )

which will print something like:

  The current Ruby objectspace contains: thousands of Strings,
  thousands of Arrays, hundreds of Hashes, hundreds of
  Classes, many Regexps, a number of Ranges, a number of
  Modules, several Floats, several Procs, several MatchDatas,
  several Objects, several IOS, several Files, a Binding, a
  NoMemoryError, a SystemStackError, a fatal, a ThreadGroup,
  and a Thread
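The literal-count farm example earlier in this section combines three steps: count each item, order the items by descending count (ties keep their first-appearance order), then phrase each entry with a number word or an indefinite article and join the phrases with commas and a final "and". A self-contained sketch of those steps follows; toy_conjunction and its tiny lookup tables are hypothetical, not the library's code, and they only cover what the farm example needs.

```ruby
# Toy lookup tables (hypothetical; far smaller than a real inflector).
TOY_IRREGULARS = { "goose" => "geese", "ox" => "oxen" }.freeze
NUMBER_WORDS = %w[zero one two three four five six seven eight nine].freeze

def toy_conjunction(words)
  # Count occurrences; Ruby hashes keep first-insertion order.
  counts = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }

  # Most frequent first; ties keep first-appearance order.
  ordered = counts.sort_by.with_index { |(_word, count), i| [-count, i] }

  phrases = ordered.map do |word, count|
    if count == 1
      (word.match?(/\A[aeiou]/) ? "an " : "a ") + word
    else
      "#{NUMBER_WORDS.fetch(count, count.to_s)} #{TOY_IRREGULARS.fetch(word, "#{word}s")}"
    end
  end

  case phrases.size
  when 0 then ""
  when 1 then phrases.first
  when 2 then phrases.join(" and ")
  else "#{phrases[0..-2].join(', ')}, and #{phrases.last}"
  end
end

animals = %w{dog cow ox chicken goose goat cow dog rooster llama
             pig goat dog cat cat dog cow goat goose goose ox alpaca}
# Prints the same sentence as the #conjunction farm example.
puts "The farm has: " + toy_conjunction(animals)
```

Sorting with the index as a tie-breaker matters here because Ruby's sort_by is not guaranteed to be stable.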
== Infinitives
New in version 0.02:

  "leaving".en.infinitive
  # => "leave"

  "left".en.infinitive
  # => "leave"

  "leaving".en.infinitive.suffix
  # => "ing"
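Finding an infinitive means either looking up an irregular form ("left") or stripping a suffix and repairing the stem ("leaving" loses "ing", then regains its "e"). The sketch below assumes a tiny irregular table and two stem-repair heuristics, and returns the removed suffix alongside the base form to mirror the #suffix call above. ToyInfinitive, IRREGULAR_INFINITIVES and toy_infinitive are hypothetical names, not the library's classes.

```ruby
# Result: the base form plus the suffix that was removed (hypothetical type).
ToyInfinitive = Struct.new(:word, :suffix)

# Tiny irregular table (hypothetical; a real one needs hundreds of entries).
IRREGULAR_INFINITIVES = { "left" => "leave", "went" => "go", "was" => "be" }.freeze

def toy_infinitive(form)
  if IRREGULAR_INFINITIVES.key?(form)
    return ToyInfinitive.new(IRREGULAR_INFINITIVES[form], "")
  end
  return ToyInfinitive.new(form, "") unless form.end_with?("ing")

  stem = form.delete_suffix("ing")
  # English words rarely end in "v" or consonant + "c", so assume a dropped
  # final "e" there (leav -> leave, danc -> dance).
  stem += "e" if stem.match?(/(?:v|[^aeiou]c)\z/)
  # Undo consonant doubling (runn -> run).
  stem = stem.chop if stem.match?(/([bdgmnpt])\1\z/)
  ToyInfinitive.new(stem, "ing")
end

toy_infinitive("leaving").word   # => "leave"
toy_infinitive("leaving").suffix # => "ing"
toy_infinitive("left").word      # => "leave"
toy_infinitive("running").word   # => "run"
```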
== WordNet® Integration
Also new in version 0.02, if you have the Ruby-WordNet module installed, you can
look up WordNet synsets using the Linguistics interface:
  # Test to be sure the WordNet module loaded okay.
  Linguistics::EN.has_wordnet?
  # => true

  # Fetch the default synset for the word "balance"
  "balance".en.synset
  # => #<WordNet::Synset:0x40376844 balance (noun): "a state of equilibrium"
  #     (derivations: 3, antonyms: 1, hypernyms: 1, hyponyms: 3)>

  # Fetch the synset for the first verb sense of "balance"
  "balance".en.synset( :verb )
  # => #<WordNet::Synset:0x4033f448 balance, equilibrate, equilibrize, equilibrise
  #     (verb): "bring into balance or equilibrium; "She has to balance work and her
  #     domestic duties"; "balance the two weights"" (derivations: 7, antonyms: 1,
  #     verbGroups: 2, hypernyms: 1, hyponyms: 5)>

  # Fetch the second noun sense
  "balance".en.synset( 2, :noun )
  # => #<WordNet::Synset:0x404ebb24 balance (noun): "a scale for weighing; depends
  #     on pull of gravity" (hypernyms: 1, hyponyms: 5)>

  # Fetch the second noun sense's hypernyms (more-general words, like a superclass)
  "balance".en.synset( 2, :noun ).hypernyms
  # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
  #     instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
  #     hyponyms: 2)>]

  # A simpler way of doing the same thing:
  "balance".en.hypernyms( 2, :noun )
  # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
  #     instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
  #     hyponyms: 2)>]

  # Fetch the first hypernym's hypernyms
  "balance".en.synset( 2, :noun ).hypernyms.first.hypernyms
  # => [#<WordNet::Synset:0x404c60b8 measuring instrument, measuring system,
  #     measuring device (noun): "instrument that shows the extent or amount or quantity
  #     or degree of something" (hypernyms: 1, hyponyms: 83)>]

  # Find the most specific synset that both the second noun sense of "balance"
  # and the default sense of "shovel" fall under.
  ("balance".en.synset( 2, :noun ) | "shovel".en.synset)
  # => #<WordNet::Synset:0x40473da4 instrumentality, instrumentation (noun): "an
  #     artifact (or system of artifacts) that is instrumental in accomplishing some
  #     end" (derivations: 1, hypernyms: 1, hyponyms: 13)>

  # Fetch just the words for the other kinds of "instruments"
  "instrument".en.hyponyms.collect {|synset| synset.words}.flatten
  # => ["analyzer", "analyser", "cautery", "cauterant", "drafting instrument",
  #     "extractor", "instrument of execution", "instrument of punishment",
  #     "measuring instrument", "measuring system", "measuring device",
  #     "medical instrument", "navigational instrument", "optical instrument",
  #     "plotter", "scientific instrument", "sonograph", "surveying instrument",
  #     "surveyor's instrument", "tracer", "weapon", "arm", "weapon system", "whip"]
There are many more WordNet methods supported, too many to list here. See the
documentation for the complete list.
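The "balance" | "shovel" example above finds a shared, more general concept by walking up hypernym links. The idea can be sketched as a lowest-common-ancestor search: list each concept's chain of ever-more-general hypernyms, then take the first entry of one chain that also appears in the other. The graph below is a hypothetical, hand-built toy loosely modeled on the chains shown above, not data read from WordNet, and it assumes a single hypernym per concept (real synsets can have several). None of these names belong to Ruby-WordNet.

```ruby
# Toy single-parent hypernym graph (hypothetical, hand-built for illustration).
TOY_HYPERNYMS = {
  "balance (scale sense)" => "scale",
  "scale"                 => "measuring instrument",
  "measuring instrument"  => "instrument",
  "instrument"            => "device",
  "device"                => "instrumentality",
  "shovel"                => "hand tool",
  "hand tool"             => "tool",
  "tool"                  => "implement",
  "implement"             => "instrumentality",
  "instrumentality"       => "artifact",
}.freeze

# The concept itself, followed by each successively more general hypernym.
def hypernym_chain(concept)
  chain = [concept]
  chain << TOY_HYPERNYMS[chain.last] while TOY_HYPERNYMS.key?(chain.last)
  chain
end

# Lowest common hypernym: Array#& keeps the left chain's order, so the
# first shared entry is the most specific one.
def lowest_common_hypernym(a, b)
  (hypernym_chain(a) & hypernym_chain(b)).first
end

lowest_common_hypernym("balance (scale sense)", "shovel") # => "instrumentality"
```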
== LinkParser Integration
Another new feature in version 0.02 is integration with Martin Chase's Ruby
binding for the CMU Link Grammar Parser. If you have the LinkParser module
installed, you can create linkages from English sentences that let you query
for parts of speech:
  # Test to see whether or not the link parser is loaded.
  Linguistics::EN.has_link_parser?
  # => true

  # Diagram the first linkage for a test sentence
  puts "he is a big dog".en.sentence.linkages.first.to_s
      +----O*-----+
      |   +--Ds---+
  +-Ss+   |  +-A--+
  |   |   |  |    |
  he  is  a  big  dog

  # Find the verb in the sentence
  "he is a big dog".en.sentence.verb.to_s
  # => "is"

  # Combined infinitive + LinkParser: find the infinitive form of the verb of
  # the given sentence.
  "he is a big dog".en.sentence.verb.infinitive
  # => "be"

  # Find the direct object of the sentence
  "he is a big dog".en.sentence.object.to_s
  # => "dog"
  # Look at the raw LinkParser::Word for the direct object of the sentence.
  "he is a big dog".en.sentence.object
  # => #<LinkParser::Word:0x403da0a0 @definition=[[{@A-}, Ds-, {@M+}, J-], [{@A-},
  #     Ds-, {@M+}, Os-], [{@A-}, Ds-, {@M+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-, {@M+},
  #     Ss+, R-], [{@A-}, Ds-, {@M+}, SIs-], [{@A-}, Ds-, {R+}, {Bs+}, J-], [{@A-}, Ds-,
  #     {R+}, {Bs+}, Os-], [{@A-}, Ds-, {R+}, {Bs+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-,
  #     {R+}, {Bs+}, Ss+, R-], [{@A-}, Ds-, {R+}, {Bs+}, SIs-]], @right=[], @suffix="",
  #     @left=[#<LinkParser::Connection:0x403da028 @rword=#<LinkParser::Word:0x403da0a0
  #     ...>, @lword=#<LinkParser::Word:0x403da0b4 @definition=[[Ss-, O+, {@MV+}], [Ss-,
  #     B-, {@MV+}], [Ss-, P+], [Ss-, AF-], [RS-, Bs-, O+, {@MV+}], [RS-, Bs-, B-,
  #     {@MV+}], [RS-, Bs-, P+], [RS-, Bs-, AF-], [{Q-}, SIs+, O+, {@MV+}], [{Q-}, SIs+,
  #     B-, {@MV+}], [{Q-}, SIs+, P+], [{Q-}, SIs+, AF-]],
  #     @right=[#<LinkParser::Connection:0x403da028 ...>], @suffix="", @left=[],
  #     @name="is", @position=1>, @subName="*", @name="O", @length=3>], @name="dog",
  #     @position=4>

  # Combine WordNet + LinkParser to find the definition of the direct object of
  # the sentence
  "he is a big dog".en.sentence.object.gloss
  # => "a member of the genus Canis (probably descended from the common wolf) that
  #     has been domesticated by man since prehistoric times; occurs in many breeds;
  #     \"the dog barked all night\""
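In the linkage diagram for "he is a big dog", the S link runs from the subject to the verb and the O link runs from the verb to its object, which suggests one way a verb or object can be read off a parse. The sketch below encodes that diagram as plain data and picks the right-hand word of the first matching link. It assumes this simplified reading and does not use the LinkParser library; ToyLink, WORDS, LINKS and word_after_link are hypothetical names, and the word positions follow the @position values in the dump above.

```ruby
# A toy linkage: each link joins a left word index to a right word index.
ToyLink = Struct.new(:label, :left, :right)

WORDS = %w[he is a big dog].freeze
LINKS = [
  ToyLink.new("Ss", 0, 1), # subject "he" -> verb "is"
  ToyLink.new("O*", 1, 4), # verb "is" -> object "dog"
  ToyLink.new("Ds", 2, 4), # determiner "a" -> "dog"
  ToyLink.new("A",  3, 4), # adjective "big" -> "dog"
].freeze

# Right-hand word of the first link whose label starts with the prefix,
# or nil when no such link exists.
def word_after_link(words, links, prefix)
  link = links.find { |l| l.label.start_with?(prefix) }
  link && words[link.right]
end

word_after_link(WORDS, LINKS, "S") # => "is"  (the verb)
word_after_link(WORDS, LINKS, "O") # => "dog" (the object)
```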