forked from LanguageMachines/ticcltools
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
86 lines (72 loc) · 2.62 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
ticcltools 0.6 2018-06-05
[Ko vander Sloot]
Intermediate release, with a lot of new code to handle N-grams
Also a lot of refactoring is done, for more clear and maintainable code.
This is work in progress still.
* TICCL-unk:
- more extensive acronym detection
- fixed artifreq problems in 'clean' punctuated words
- added filters for 'unwanted' characters
- added a ligature filter to convert evil ligatures
- normalize all hyphens to a 'normal' one (-)
- use a better definition of punctuation (unicode character class is not
good enough to decide)
* TICCL-lexstat:
- the 'separator' symbol should get freq=0, so it isn't counted
- the clip value is added to the output filename
* TICCL-indexer:
- indexer and indexerNT now produce the same output, using different
strategies when a --foci files is used.
* TICCL-LDcalc:
major overhaul for n-grams
- added a ngram point column to the output (so NOT backward compatible!)
- produce a '.short' list for short word corrections
- produce a '.ambi' file with a list of n-grams related to short words
- prune a lot of ngrams from the output
* TICCL-rank:
- output is sorted now
- honor the ngram-points from the new LDcalc. (so NOT backward compatible!)
* TICCL-chain: new module to chain ranked files
* TICCL-lexclean:
-added a -x option for 'inverse' alphabet
* TICCL-anahash:
- added a --list option to produce a list of words and anagram values
[Maarten van Gompel]
* added metadata file: codemeta.json
ticcltools 0.5 2018-02-19
[Ko van der Sloot]
* updated configuration. also for Mac OSX
* use of more ticcutils stuff: diacritics filter
* added a TICCL-mergelex program
* the OMP_THREAD_LIMIT environment variable was ignored sometimes
* TICCL-unk:
- fixed a problem in artifreq handling
- changed acronym detection (work in progress)
- added -o option
TICCL-lexstat:
- added TTR output
- added -o option
TICCL-indexer
- now also handles --foci file. with some speed-up
- added a -t option
TICCL-LDcalc:
- be less picky on a few wrong lines in the data
* added some tests
* when libroaring is installed we built roaring versions of some modules (experimental)
* updated man pages
ticcltools 0.4 2017-04-04
[Ko van der Sloot]
* first official release.
- added functions to test on Word2Vec datafiles
- refactoring and modernizing stuff all around
ticcltools 0.3 2016-01-21
* upload ready
ticcltools 0.2 2016-01-14
[Ko van der Sloot]
* repository moved to GitHub
* Travis support
* first 'distributable' version
* added TICCL-stats program
* added W2V-near and W2V-dist programs bases on new libticcl lib
[0.1] Ko vd Sloot
started autoconfiscating