v5.1.0

@hyunwoongko hyunwoongko released this 31 Mar 21:23

The fast backend

To split sentences quickly, use the split_sentences function with the backend='fast' option, available from Kss 5.0.0. This backend is based on the fast algorithm used in Kss versions prior to 3.0. It is significantly faster than the mecab backend but less accurate, so it is useful when you need to split sentences very quickly and do not need high accuracy. The fast backend is implemented in both Python and Cython:

  • If your environment supports installing Cython, Kss uses the Cython implementation, which is the fastest option (about 600x faster than mecab).
  • Otherwise, Kss uses the Python implementation, which is slower than the Cython version but still faster than the mecab backend (about 4x faster than mecab).

Given the substantial speed advantage of the Cython implementation, it is strongly recommended over the Python alternative. Kss automatically detects whether Cython is available in your environment and installs it if feasible, so you don't need to manage the Cython or C++ dependencies yourself.
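As a minimal sketch of the call described above, the snippet below wraps split_sentences with backend='fast'. The wrapper name split_fast and the sample text are illustrative assumptions, not part of Kss; the import is done lazily so the module still loads where Kss is not installed.

```python
def split_fast(text):
    """Split Korean text into sentences using the Kss fast backend.

    `split_fast` is a hypothetical helper name used only for this example.
    """
    # Imported lazily so this module can be loaded even without Kss installed.
    from kss import split_sentences

    # backend="fast" selects the fast backend; Kss decides internally
    # whether the Cython or the pure-Python implementation is used.
    return split_sentences(text, backend="fast")


if __name__ == "__main__":
    sample = "오늘 날씨가 정말 좋네요. 산책하러 나갈까요? 저는 공원에 가고 싶어요."
    try:
        for sentence in split_fast(sample):
            print(sentence)
    except ImportError:
        print("Kss is not installed; run `pip install kss` first.")
```

If accuracy matters more than speed for a given input, the same call with backend='mecab' would be the natural alternative, as the tables below suggest.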

Accuracy (Normalized F1)

| Backend | blogs_ko | blogs_lee | nested | sample | tweets | v_ending | wikipedia |
|---|---|---|---|---|---|---|---|
| mecab | 0.8860 | 0.8887 | 0.9206 | 0.9682 | 0.8137 | 0.4815 | 1.0000 |
| fast (Python) | 0.6281 | 0.7899 | 0.6899 | 0.7482 | 0.5315 | 0.1596 | 0.7358 |
| fast (Cython) | 0.6545 | 0.8132 | 0.6372 | 0.8407 | 0.5892 | 0.1596 | 0.9566 |

Speed (msec)

| Backend | blogs_ko | blogs_lee | nested | sample | tweets | v_ending | wikipedia |
|---|---|---|---|---|---|---|---|
| mecab | 538.10 | 293.31 | 225.05 | 56.35 | 184.91 | 20.55 | 899.99 |
| fast (Python) | 146.75 | 70.94 | 52.84 | 12.11 | 37.80 | 4.69 | 255.90 |
| fast (Cython) | 0.91 | 0.55 | 0.46 | 0.09 | 0.40 | 0.05 | 1.12 |
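The quoted speed-ups can be checked against the speed figures above. The sketch below copies the timings into a dictionary and divides the total mecab time by each fast backend's total; using the sum over all seven datasets is an assumption of this example, and the release notes do not say how the 600x and 4x figures were derived.

```python
# Timings in milliseconds, in the order:
# blogs_ko, blogs_lee, nested, sample, tweets, v_ending, wikipedia
speed_ms = {
    "mecab": [538.10, 293.31, 225.05, 56.35, 184.91, 20.55, 899.99],
    "fast (Python)": [146.75, 70.94, 52.84, 12.11, 37.80, 4.69, 255.90],
    "fast (Cython)": [0.91, 0.55, 0.46, 0.09, 0.40, 0.05, 1.12],
}


def speedup(backend, baseline="mecab"):
    """Return how many times faster `backend` is than `baseline` overall."""
    return sum(speed_ms[baseline]) / sum(speed_ms[backend])


if __name__ == "__main__":
    for name in ("fast (Python)", "fast (Cython)"):
        print(f"{name}: {speedup(name):.1f}x faster than mecab")
```

Over the combined datasets this gives a ratio of about 3.8 for the Python implementation and about 620 for the Cython implementation, consistent with the rounded 4x and 600x figures.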

Please note that while the core algorithm in the fast backend mirrors that of Kss C++ 1.3.1, several bugs identified in the original implementation have been rectified in Kss 5.0.0.