For each result item, Scollex should test whether the child-parent pair is also a (traditional) collocation, because in such a case the syntactic relationship may be incorrect. Since it may or may not actually be incorrect, Scollex should neither remove such items nor mark them with flags like "incorrect". It should just add an informative flag stating that the pair is also a "traditional" collocate.
It would probably be best to build this functionality directly into Scollex rather than moving the responsibility to, e.g., WaG (imagine, e.g., a tile loading data from both Scollex and KonText (or MQuery) and combining them).
How it should work:
- the import function will have an option `-colloc-flags-with-span` (int value)
- if enabled, the vertical file processing will have two passes:
  1. find all "traditional" collocations and store them in memory
  2. run the current import to find syntactic collocations and, for each word pair, add new attributes `coOccurrence bool` and `coOccurrenceScore float64` (we choose the term co-occurrence instead of collocation to further distinguish between the collocations we are interested in here, i.e. the syntactic ones, and the "traditional" ones)
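A minimal sketch of how the option and the second pass could fit together, assuming the standard library `flag` package and a default of 0 meaning "disabled". `SyntacticColl`, `parseImportFlags` and `annotate` are hypothetical names for illustration, not existing Scollex code; the score map is assumed to be filled by the first pass.

```go
package main

import (
	"flag"
	"fmt"
)

// SyntacticColl is a hypothetical result item of the existing import,
// extended with the two attributes proposed in this issue.
type SyntacticColl struct {
	Parent, Child     string
	CoOccurrence      bool    // pair also found as a "traditional" co-occurrence
	CoOccurrenceScore float64 // its score (0 when CoOccurrence is false)
}

// parseImportFlags sketches how -colloc-flags-with-span could be declared.
// Assumption: 0 (the default) disables the extra first pass.
func parseImportFlags(args []string) (int, error) {
	fs := flag.NewFlagSet("import", flag.ContinueOnError)
	span := fs.Int("colloc-flags-with-span", 0,
		"window size for traditional co-occurrence flags (0 = disabled)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *span < 0 {
		return 0, fmt.Errorf("negative span: %d", *span)
	}
	return *span, nil
}

// annotate is the second pass: for each syntactic pair it looks up the
// score computed by the first pass and sets the new attributes.
func annotate(items []SyntacticColl, scores map[[2]string]float64) {
	for i := range items {
		s, ok := scores[[2]string{items[i].Parent, items[i].Child}]
		items[i].CoOccurrence = ok
		items[i].CoOccurrenceScore = s
	}
}
```

Keeping the first-pass result as a plain map keyed by the word pair means the second pass only needs one lookup per syntactic item.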
Implementation notes:
- to store frequency info (Fxy, Fy, Fx), use a map (see `FyTable` and `CounterTable` for inspiration; maybe it will even be possible to reuse them)
- there will be no need to keep `parentSumTable` and `childSumTable`, as the relationship in traditional collocations is simpler (a word either is or is not within a defined span/window of the other word)
- a co-occurrence will be defined for two words iff the "other" word is within the span (`-colloc-flags-with-span`) of the "main" word (e.g. for a span of 3, we look 3 words backwards and 3 words forwards)
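The window-based counting described in the notes above could look roughly like this. `CoocTable` is a hypothetical structure, not Scollex's `FyTable` or `CounterTable`; it assumes tokens are fed per sentence (so the window does not cross sentence boundaries) and that Fy is simply Fx looked up for the "other" word.

```go
package main

// CoocTable holds frequency info for "traditional" (window-based)
// co-occurrences. Hypothetical sketch, not existing Scollex code.
type CoocTable struct {
	Fx  map[string]int    // frequency of each word (also serves as Fy)
	Fxy map[[2]string]int // counts of ("main", "other") within the span
}

func NewCoocTable() *CoocTable {
	return &CoocTable{Fx: make(map[string]int), Fxy: make(map[[2]string]int)}
}

// AddSentence treats every token as a "main" word and counts each token
// up to span positions backwards and forwards as an "other" word.
// Because the window is symmetric, Fxy[{x, y}] == Fxy[{y, x}].
func (t *CoocTable) AddSentence(tokens []string, span int) {
	for i, x := range tokens {
		t.Fx[x]++
		lo, hi := i-span, i+span
		if lo < 0 {
			lo = 0
		}
		if hi > len(tokens)-1 {
			hi = len(tokens) - 1
		}
		for j := lo; j <= hi; j++ {
			if j == i {
				continue // a word does not co-occur with itself
			}
			t.Fxy[[2]string{x, tokens[j]}]++
		}
	}
}
```

No parent/child sums are needed here: each pair either falls inside the window or it does not, so two maps are enough.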
The resulting flag information should be part of the `*_fcolls` table. Or better: instead of storing just binary information (is vs. is not a collocation), we should store a collocation score (logDice).