
Releases: segment-any-text/wtpsplit

Release 2.1.1

27 Oct 14:19
  • Change the default behaviour for newlines in SaT.split.
    • The model still ignores newlines, but they are now used to split the text as a simple post-processing step.
  • Small bugfixes for LoRA training
  • Update README for advanced usage
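The newline change can be pictured as a pass that runs after the model has produced its segments. The sketch below is a hypothetical illustration in plain Python, not wtpsplit's code: the helper `split_on_newlines` and the sample segments are assumptions. The real implementation may handle details differently, for example whether the newline stays attached to the preceding segment.

```python
from typing import List


def split_on_newlines(segments: List[str]) -> List[str]:
    """Hypothetical post-processing step: break each model-predicted segment
    at newline characters, dropping the newline and any empty pieces."""
    result: List[str] = []
    for segment in segments:
        for piece in segment.split("\n"):
            if piece.strip():  # skip pieces that are empty or whitespace-only
                result.append(piece)
    return result


# Segments as a model might return them when it ignores the newline.
predicted = ["Title line\nThis is the first sentence. ", "This is the second."]
print(split_on_newlines(predicted))
```

Because the model never sees the newline as a signal, the split happens purely on the character itself. This keeps the model's predictions unchanged while still respecting line structure in the input.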

Release 2.1.0

24 Sep 21:37
00d2d6c
  • Adds ONNX support for SaT models.
    • Including export scripts and an updated README.
    • This results in 50% improved inference time on GPU.

Release 2.0.8

09 Sep 10:49
  • Fix splitting of short sequences into individual characters (#127)

Release 2.0.7

02 Sep 13:26
  • Allow numpy>=2.0
  • Fix adaptation code
  • Add some comments

Release 2.0.5

08 Jul 07:41
  • Fixes potential CUDA device error when the input has exactly 511 tokens (#121).

Release 2.0.4

01 Jul 09:32
  • Fix a speed issue with SaT (#118). Now it is (as expected) ~6x faster than WtP.

Release 2.0.3

26 Jun 08:05

Implement SaT (https://arxiv.org/abs/2406.16678) and switch the default models to SaT🚀

The previous WtP models are still available but SaT is strictly better in accuracy and speed. See the updated README for details: https://github.com/segment-any-text/wtpsplit.

SaT was implemented and developed by @markus583 @igorsterner.

Release 1.3.0

22 Jan 15:30

Release 1.2.3

18 Jul 13:47
  • fix an error with texts whose length is not a multiple of 4 and is shorter than 512 characters in canine-s-* models (#98).

Release 1.2.2

14 Jul 15:55
  • add strip_whitespace flag.
  • fix a bug where some zero-length sentences were returned if there was a lot of trailing whitespace.
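The two changes above both concern cleaning up returned segments. The following is a minimal sketch of that kind of cleanup, assuming it works roughly like this. Only the flag name `strip_whitespace` comes from the release note. The function `postprocess` and its behaviour are hypothetical and are not wtpsplit's actual code.

```python
from typing import List


def postprocess(sentences: List[str], strip_whitespace: bool = False) -> List[str]:
    """Hypothetical sketch: optionally strip surrounding whitespace and
    drop segments that contain only whitespace (the zero-length case)."""
    cleaned: List[str] = []
    for sentence in sentences:
        if not sentence.strip():
            # Whitespace-only segment, e.g. produced by trailing whitespace.
            continue
        cleaned.append(sentence.strip() if strip_whitespace else sentence)
    return cleaned


segments = ["Hello there. ", "How are you?", "   \n  "]
print(postprocess(segments))                         # keeps original spacing
print(postprocess(segments, strip_whitespace=True))  # trims each segment
```

Dropping whitespace-only segments regardless of the flag means trailing whitespace can never surface as an empty "sentence", while `strip_whitespace` only controls whether the surviving segments are trimmed.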