Releases: megagonlabs/ginza

Release v5.2.0

30 Mar 22:48
3f994b0

What's Changed

  • Require python>=3.8
  • Migrate to spaCy v3.7
  • New functionality
    • Add a Japanese clause recognition API (experimental)

Full Changelog: v5.1.3...v5.2.0

How to Use ja_ginza_bert_large β1

  • Prepare
    Create a virtual environment to isolate ja_ginza_bert_large from other GiNZA model environments.
    (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)
$ python -m venv venv_bert_large
$ source venv_bert_large/bin/activate
  • Install
$ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.2.0/ja_ginza_bert_large-5.2.0b1-py3-none-any.whl"

For CUDA environments, upgrade spaCy with the extra that matches your CUDA version, for example:

$ pip install -U spacy[cuda117]
  • Analyze
$ ginza -g 0 -b ja_ginza_bert_large
銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1       銀座    銀座    PROPN   名詞-固有名詞-地名-一般 _       6       obl     _       SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2       で      で      ADP     助詞-格助詞     _       1       case    _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ|ClauseHead=6
3       ランチ  ランチ  NOUN    名詞-普通名詞-一般      _       6       obj     _       SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ|ClauseHead=6
4       を      を      ADP     助詞-格助詞     _       3       case    _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ|ClauseHead=6
5       ご      ご      NOUN    接頭辞  _       6       compound        _       SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ|ClauseHead=6
6       一緒    一緒    VERB    名詞-普通名詞-サ変可能  _       0       root    _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ|ClauseHead=6
7       し      する    AUX     動詞-非自立可能 _       6       aux     _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ|ClauseHead=6
8       ましょう        ます    AUX     助動詞  _       6       aux     _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ|ClauseHead=6
9       。      。      PUNCT   補助記号-句点   _       6       punct   _       SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。|ClauseHead=6
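
The ClauseHead field in the output above can be read back with a few lines of Python. The following is a minimal sketch, not part of GiNZA: the helper names `parse_misc` and `group_by_clause_head` are hypothetical, and it assumes that ClauseHead holds the ID of the token heading the clause, which is inferred from the sample output rather than from the documentation of the experimental API. It splits columns on any whitespace, so it assumes no field contains a space.

```python
from collections import defaultdict

# Two token lines copied from the sample output above (columns separated by spaces).
SAMPLE = """\
# text = 銀座でランチをご一緒しましょう。
1 銀座 銀座 PROPN 名詞-固有名詞-地名-一般 _ 6 obl _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2 で で ADP 助詞-格助詞 _ 1 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ|ClauseHead=6
"""


def parse_misc(misc):
    """Split the last CoNLL-U column into a dict; bare flags such as NP_B map to True."""
    fields = {}
    if misc == "_":
        return fields
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def group_by_clause_head(conllu_text):
    """Map each ClauseHead value (assumed to be a token ID) to the token forms under it."""
    clauses = defaultdict(list)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines such as "# text = ..."
        cols = line.split()  # assumption: no column contains whitespace
        if len(cols) != 10:
            continue  # not a well-formed token line
        head = parse_misc(cols[9]).get("ClauseHead")
        if head is not None:
            clauses[int(head)].append(cols[1])
    return dict(clauses)


if __name__ == "__main__":
    print(group_by_clause_head(SAMPLE))
```

Saving the output of the ginza command to a file and passing its contents to `group_by_clause_head` should give one entry per clause head in the analyzed text.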

Release v5.1.3

25 Sep 04:47
b1c25c3

What's Changed

  • Migrate to spaCy v3.6
  • Beta release of ja_ginza_bert_large

Full Changelog: v5.1.2...v5.1.3

How to Use ja_ginza_bert_large β1

  • Prepare
    Create a virtual environment to isolate ja_ginza_bert_large from other GiNZA model environments.
    (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)
$ python -m venv venv_bert_large
$ source venv_bert_large/bin/activate
  • Install
$ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.1.3/ja_ginza_bert_large-5.1.3b1-py3-none-any.whl"

For CUDA environments, upgrade spaCy with the extra that matches your CUDA version, for example:

$ pip install -U spacy[cuda117]
  • Analyze
$ ginza -g 0 -b ja_ginza_bert_large
銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1	銀座	銀座	PROPN	名詞-固有名詞-地名-一般	_	6	obl	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City
2	で	で	ADP	助詞-格助詞	_	1	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ
3	ランチ	ランチ	NOUN	名詞-普通名詞-一般	_	6	obj	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ
4	を	を	ADP	助詞-格助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ
5	ご	ご	NOUN	接頭辞	_	6	compound	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ
6	一緒	一緒	VERB	名詞-普通名詞-サ変可能	_	0	root	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ
7	し	する	AUX	動詞-非自立可能	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ
8	ましょう	ます	AUX	助動詞	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ
9	。	。	PUNCT	補助記号-句点	_	6	punct	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。

Release v5.1.2

09 Aug 10:16
49c93ad

What's Changed

Full Changelog: v5.1.1...v5.1.2

Release v5.1.1

12 Mar 02:49
24dee81

What's Changed

  • auto deploy for pypi by @nimiusrd in #184
  • modify github actions: trigger by tagging, stop uploading test pypi by @r-terada in #233

Full Changelog: v5.1.0...v5.1.1

Release v5.1.0

09 Dec 15:02
82ec7c2

ginza-5.1.0

  • 2021-12-10, Euclase
  • Important changes
    • Upgrade: spaCy v3.2 and Sudachi.rs (SudachiPy v0.6.2)
    • Change token information fields #208 #209
      • doc.user_data["reading_forms"][token.i] -> token.morph.get("Reading")
      • doc.user_data["inflections"][token.i] -> token.morph.get("Inflection")
      • force_using_normalized_form_as_lemma(True) -> token.norm_
    • All spaCy models, including non-Japanese, are now available with the ginza command #217
      • Download a model and run the analysis in one step by specifying the model name as follows #219
      • ginza -m en_core_web_md
    • With ginza -f json, lines starting with # are always analyzed, regardless of the value of the -c option. #215
  • Improvements
    • Batch analysis is 50-60% faster in GPU environments and 10-40% faster in CPU environments
    • Improved processing efficiency of the parallel execution options (ginza -p {n_process} and ginzame) of the ginza command #204
    • Add tests #198 #210 #214
    • Add benchmark #207 #220
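
The token field changes listed under "Important changes" can be illustrated as follows. This is a minimal sketch, not code from the GiNZA repository: the helper names `first_or_empty` and `token_fields` are hypothetical, and it relies only on the accessors named in the list above (token.morph.get("Reading"), token.morph.get("Inflection"), token.norm_) plus the fact that spaCy's morph.get returns a list of strings. Loading a model is left as a comment because it assumes ginza and ja_ginza are installed.

```python
def first_or_empty(values):
    """morph.get() returns a list of strings; take the first value or ""."""
    return values[0] if values else ""


def token_fields(token):
    """Collect the fields that moved out of doc.user_data in v5.1.0."""
    return {
        "text": token.text,
        # was: doc.user_data["reading_forms"][token.i]
        "reading": first_or_empty(token.morph.get("Reading")),
        # was: doc.user_data["inflections"][token.i]
        "inflection": first_or_empty(token.morph.get("Inflection")),
        # replaces force_using_normalized_form_as_lemma(True)
        "norm": token.norm_,
    }


# Usage, assuming ginza and the ja_ginza model are installed:
#
#   import spacy
#   nlp = spacy.load("ja_ginza")
#   for token in nlp("銀座でランチをご一緒しましょう。"):
#       print(token_fields(token))
```

Because the helper only touches attributes of the token it is given, code written against the old doc.user_data layout can be migrated one call site at a time.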

Release v5.0.3

15 Oct 09:20
277b29d

ginza-5.0.3

  • 2021-10-15
  • Bug fix
    • Bunsetu span should not cross the sentence boundary #195

Release v5.0.2

15 Oct 09:20
1753eac

ginza-5.0.2

  • 2021-09-06
  • Bug fix
    • Command Line -s option and set_split_mode() not working in v5.0.x #185

Release v5.0.1

26 Aug 04:13
8b10169

ginza-5.0.1

  • 2021-08-26
  • Bug fix
    • ginzame not working in ginza ver. 5 #179
    • Command Line -d option not working in v5.0.0 #178
  • Improvement
    • Accept ja-ginza and ja-ginza-electra for the -m option of the ginza command

Release v5.0.0

25 Aug 03:24
79f27f8

ginza-5.0.0

  • 2021-08-26, Demantoid
  • Important changes
    • Upgrade spaCy to v3
      • Release transformer-based ja-ginza-electra model
      • Improve UPOS accuracy of the standard ja-ginza model by adding a morphologizer to the tail of the spaCy pipeline
    • An analysis model must now be installed along with the ginza package
      • High accuracy model (>=16GB memory needed)
        • pip install -U ginza ja-ginza-electra
      • Speed oriented model
        • pip install -U ginza ja-ginza
    • Change component names of CompoundSplitter and BunsetuRecognizer to compound_splitter and bunsetu_recognizer respectively
    • Also see spaCy v3 Backwards Incompatibilities
  • Improvements
    • Add command line options
      • -n
        • Force using SudachiPy's normalized_form as Token.lemma_
      • -m (ja_ginza|ja_ginza_electra)
        • Select model package
    • Revise ENE category name
      • Degital_Game to Digital_Game

v4.0.6

01 Jun 08:51
2174288

ginza-4.0.6

  • 2021-06-01
  • Bug fix
    • Issue #160: IndexError: list assignment index out of range for empty string