@cantonese/segmenter

This library implements basic grapheme and word segmentation for Cantonese. It matches the supplied string against a trie of known words, traversing the trie depth-first. The trie is built from an unmodified words.hk word list.
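The trie approach can be sketched roughly as follows. This sketch assumes greedy longest-match segmentation: from each position it follows one branch of the trie as deep as the input allows, keeps the longest prefix that ends on a complete word, and falls back to a single character when nothing matches. The names `buildTrie` and `segmentWords` and the six-word list are hypothetical. They are not this library's code or the words.hk data.

```javascript
// Hypothetical sketch of trie-based, greedy longest-match segmentation.
// The word list used below is a tiny made-up sample, NOT the words.hk list.

// Each trie node is { children: Map<string, node>, isWord: boolean }.
function buildTrie(words) {
  const root = { children: new Map(), isWord: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) { // for...of iterates by code point
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true; // mark the end of a complete word
  }
  return root;
}

// From each start position, walk down the trie while the input matches and
// remember the longest prefix that is a complete word. If no word starts
// here, emit one character so the loop always makes progress.
function segmentWords(text, trie) {
  const chars = Array.from(text);
  const segments = [];
  let start = 0;
  while (start < chars.length) {
    let node = trie;
    let longest = 0;
    for (let i = start; i < chars.length; i++) {
      node = node.children.get(chars[i]);
      if (!node) break; // no deeper match along this branch
      if (node.isWord) longest = i - start + 1;
    }
    const length = longest > 0 ? longest : 1;
    segments.push(chars.slice(start, start + length).join(''));
    start += length;
  }
  return segments;
}

const trie = buildTrie(['我', '好', '鍾意', '食', '飯', '食飯']);
console.log(segmentWords('我好鍾意食飯', trie)); // [ '我', '好', '鍾意', '食飯' ]
```

Greedy longest match is the simplest dictionary strategy. It can pick the wrong split when a long word overlaps the start of the next word.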

Future versions are planned to use models informed by natural language processing and computational linguistics.

The library implements the API shape of the proposed ECMAScript Intl.Segmenter.
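For comparison, the standardized built-in `Intl.Segmenter` (ES2022, available in Node.js 16+ and current browsers) has the same shape. `segment()` returns an iterable Segments object whose items carry `segment`, `index` and `input` fields (plus `isWordLike` when `granularity: 'word'`), and `containing(index)` looks up the segment at a UTF-16 code unit offset. This sketch uses `granularity: 'grapheme'` so its output does not depend on the engine's built-in word dictionary.

```javascript
// Built-in ECMAScript Intl.Segmenter, shown only to illustrate the API shape.
const segmenter = new Intl.Segmenter('zh-HK', { granularity: 'grapheme' });
const segments = segmenter.segment('我好鍾意食飯');

// A Segments object is iterable; each item is a segment data object.
for (const { segment, index } of segments) {
  console.log(index, segment);
}

// containing(i) returns the segment data object covering code unit i,
// or undefined if i is out of range.
const atIndex = segments.containing(2);
console.log(atIndex.segment, atIndex.index); // 鍾 2
```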

Installation

```shell
npm install --save https://github.com/cantonese/segmenter
```

Usage

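```javascript
import { Segmenter } from '@cantonese/segmenter';

// Reverse the text of one segment. Strings have no .reverse() method,
// so split into code points, reverse the array, and join it back.
function transform(segmentInfo) {
  return [...segmentInfo.segment].reverse().join('');
}

const mySegmenter = new Segmenter('zh-hk', { granularity: 'word' });
const mySegments = mySegmenter.segment('我好鍾意食飯');

// The segments object is iterable, so spread it into an array before mapping.
const transformed = [...mySegments].map(transform);

// Look up the segment covering code unit index 2. (The standardized
// Intl.Segmenter names the equivalent method `containing`.)
const atIndex = mySegments.contains(2);
```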
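The usage above relies on two parts of the Segmenter shape: the object returned by `segment()` is iterable, and it supports an index lookup. A minimal sketch of how any function that splits a string into pieces could be wrapped in that shape is given here. `SketchSegmenter` and its `segmentFn` argument are hypothetical names, not part of @cantonese/segmenter, and the per-character splitter only stands in for a real segmentation model.

```javascript
// Hypothetical sketch: expose a "string -> array of pieces" function through
// an Intl.Segmenter-like shape. Assumes the pieces concatenate back to input.
class SketchSegmenter {
  constructor(segmentFn) {
    this.segmentFn = segmentFn;
  }

  segment(input) {
    // Precompute segment data objects with UTF-16 code unit offsets,
    // which is how Intl.Segmenter reports `index`.
    const data = [];
    let index = 0;
    for (const piece of this.segmentFn(input)) {
      data.push({ segment: piece, index, input });
      index += piece.length; // .length counts UTF-16 code units
    }
    return {
      [Symbol.iterator]() {
        return data[Symbol.iterator]();
      },
      // Return the segment covering code unit i, or undefined if out of range.
      containing(i) {
        return data.find((d) => i >= d.index && i < d.index + d.segment.length);
      },
    };
  }
}

// Usage with a trivial per-code-point splitter.
const seg = new SketchSegmenter((s) => Array.from(s));
const pieces = seg.segment('我好鍾意食飯');
console.log([...pieces].map((d) => d.segment)); // [ '我', '好', '鍾', '意', '食', '飯' ]
console.log(pieces.containing(2).segment); // 鍾
```

Precomputing the segment list keeps `containing` simple; a real implementation might segment lazily instead, as the built-in does.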