v5.1.0
The `fast` backend
To split sentences quickly, call the `split_sentences` function with the `backend='fast'` option, available since Kss 5.0.0. This backend is based on the fast algorithm used in Kss versions prior to 3.0. It is significantly faster than the `mecab` backend but less accurate, so it is useful when you need to split sentences very quickly and do not need high accuracy. The `fast` backend is implemented in both Python and Cython.
- If your environment supports installing Cython, Kss uses the Cython implementation, which is the fastest option (about 600x faster than `mecab`).
- Otherwise, Kss uses the Python implementation, which is slower than the Cython version but still faster than the `mecab` backend (about 4x faster than `mecab`).
Given the substantial speed advantage of the Cython implementation, it is strongly recommended over the Python alternative. Kss automatically detects the availability of Cython in your environment and will install it if feasible, so you don't need to worry about Cython and C++ dependencies.
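As a minimal sketch of the call described above: the `split_sentences` function and the `backend='fast'` option come from this section, while the import path `from kss import split_sentences`, the wrapper name `split_with_fast_backend`, and its injectable `splitter` parameter are assumptions added for illustration (the parameter only exists so the wrapper can be exercised without Kss installed).

```python
from typing import Callable, List, Optional


def split_with_fast_backend(
    text: str,
    splitter: Optional[Callable[..., List[str]]] = None,
) -> List[str]:
    """Split `text` into sentences using the fast backend.

    `splitter` is a hypothetical hook for illustration; when it is omitted,
    the real Kss function is imported and used.
    """
    if splitter is None:
        # Assumed import path; imported lazily so this module still loads
        # in environments where Kss is not installed.
        from kss import split_sentences as splitter
    # `backend="fast"` is the option described in this section.
    return splitter(text, backend="fast")
```

With Kss installed, `split_with_fast_backend("...")` should behave like calling `split_sentences(text, backend="fast")` directly; whether the Cython or the Python implementation runs is decided by Kss, not by this wrapper.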
Accuracy (Normalized F1)

Backend | blogs_ko | blogs_lee | nested | sample | tweets | v_ending | wikipedia |
---|---|---|---|---|---|---|---|
mecab | 0.8860 | 0.8887 | 0.9206 | 0.9682 | 0.8137 | 0.4815 | 1.0000 |
fast (Python) | 0.6281 | 0.7899 | 0.6899 | 0.7482 | 0.5315 | 0.1596 | 0.7358 |
fast (Cython) | 0.6545 | 0.8132 | 0.6372 | 0.8407 | 0.5892 | 0.1596 | 0.9566 |
Speed (msec)

Backend | blogs_ko | blogs_lee | nested | sample | tweets | v_ending | wikipedia |
---|---|---|---|---|---|---|---|
mecab | 538.10 | 293.31 | 225.05 | 56.35 | 184.91 | 20.55 | 899.99 |
fast (Python) | 146.75 | 70.94 | 52.84 | 12.11 | 37.80 | 4.69 | 255.90 |
fast (Cython) | 0.91 | 0.55 | 0.46 | 0.09 | 0.40 | 0.05 | 1.12 |
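
The "x4" and "x600" figures quoted earlier can be checked against the speed table with a little arithmetic. The sketch below only re-uses the timings listed above; the names `SPEED_MSEC`, `DATASETS`, and `speedup_over_mecab` are hypothetical helpers, not part of Kss.

```python
# Timings in milliseconds, copied from the speed table above.
DATASETS = ["blogs_ko", "blogs_lee", "nested", "sample",
            "tweets", "v_ending", "wikipedia"]

SPEED_MSEC = {
    "mecab":         [538.10, 293.31, 225.05, 56.35, 184.91, 20.55, 899.99],
    "fast (Python)": [146.75, 70.94, 52.84, 12.11, 37.80, 4.69, 255.90],
    "fast (Cython)": [0.91, 0.55, 0.46, 0.09, 0.40, 0.05, 1.12],
}


def speedup_over_mecab(backend: str) -> dict:
    """Return, per dataset, how many times faster `backend` is than mecab."""
    baseline = SPEED_MSEC["mecab"]
    timings = SPEED_MSEC[backend]
    return {
        name: base / other
        for name, base, other in zip(DATASETS, baseline, timings)
    }


def report() -> None:
    """Print the per-dataset speedup for both fast implementations."""
    for backend in ("fast (Python)", "fast (Cython)"):
        ratios = speedup_over_mecab(backend)
        line = ", ".join(f"{name}: {ratio:.1f}x" for name, ratio in ratios.items())
        print(f"{backend}: {line}")
```

On these numbers the Python implementation comes out at roughly 3.5x to 4.9x faster than `mecab`, and the Cython implementation at roughly 410x to 800x. The Cython timings are rounded to two decimals and are very small, so its ratios should be read as rough orders of magnitude.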
Please note that while the core algorithm in the `fast` backend mirrors that of Kss C++ 1.3.1, several bugs identified in the original implementation have been fixed in Kss 5.0.0.