Releases: megagonlabs/ginza
Release v5.2.0
What's Changed
- Require python>=3.8
- Migrate to spaCy v3.7
- New functionality
  - Add Japanese clause recognition API (experimental)
Full Changelog: v5.1.3...v5.2.0
How to Use ja_ginza_bert_large β1
- Prepare
Create a virtual-env to separate ja_ginza_bert_large from other GiNZA model environments. (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)
$ python -m venv venv_bert_large
$ source venv_bert_large/bin/activate
- Install
$ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.2.0/ja_ginza_bert_large-5.2.0b1-py3-none-any.whl"
For CUDA environments, also upgrade spaCy with the extra that matches your CUDA version, for example:
$ pip install -U spacy[cuda117]
- Analyze
$ ginza -g 0 -b ja_ginza_bert_large
銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1 銀座 銀座 PROPN 名詞-固有名詞-地名-一般 _ 6 obl _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2 で で ADP 助詞-格助詞 _ 1 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ|ClauseHead=6
3 ランチ ランチ NOUN 名詞-普通名詞-一般 _ 6 obj _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ|ClauseHead=6
4 を を ADP 助詞-格助詞 _ 3 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ|ClauseHead=6
5 ご ご NOUN 接頭辞 _ 6 compound _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ|ClauseHead=6
6 一緒 一緒 VERB 名詞-普通名詞-サ変可能 _ 0 root _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ|ClauseHead=6
7 し する AUX 動詞-非自立可能 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ|ClauseHead=6
8 ましょう ます AUX 助動詞 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ|ClauseHead=6
9 。 。 PUNCT 補助記号-句点 _ 6 punct _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。|ClauseHead=6
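The MISC column (the last field of each row) carries the bunsetu labels and, as of this release, a ClauseHead value. As a rough sketch of how that column could be consumed, the following standalone Python reads the sample rows above, rebuilds bunsetu from BunsetuBILabel, and collects ClauseHead per token. The helper names (parse_misc, token_rows, group_bunsetu, clause_heads) are illustrative and not part of GiNZA; the rows are split on whitespace because that is how they appear on this page, whereas real CoNLL-U output is tab-separated.

```python
from typing import Dict, List

# The nine token rows from the sample output above, copied as shown on this page.
SAMPLE = """\
# text = 銀座でランチをご一緒しましょう。
1 銀座 銀座 PROPN 名詞-固有名詞-地名-一般 _ 6 obl _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2 で で ADP 助詞-格助詞 _ 1 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ|ClauseHead=6
3 ランチ ランチ NOUN 名詞-普通名詞-一般 _ 6 obj _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ|ClauseHead=6
4 を を ADP 助詞-格助詞 _ 3 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ|ClauseHead=6
5 ご ご NOUN 接頭辞 _ 6 compound _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ|ClauseHead=6
6 一緒 一緒 VERB 名詞-普通名詞-サ変可能 _ 0 root _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ|ClauseHead=6
7 し する AUX 動詞-非自立可能 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ|ClauseHead=6
8 ましょう ます AUX 助動詞 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ|ClauseHead=6
9 。 。 PUNCT 補助記号-句点 _ 6 punct _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。|ClauseHead=6
"""


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a MISC column into a dict; bare flags such as NP_B map to ''."""
    fields: Dict[str, str] = {}
    for item in misc.split("|"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def token_rows(conllu: str) -> List[List[str]]:
    """Return the columns of each token row, skipping comments and blank lines."""
    rows = []
    for line in conllu.splitlines():
        if line.strip() and not line.startswith("#"):
            # Prefer tabs (real CoNLL-U); fall back to whitespace for text copied from this page.
            rows.append(line.split("\t") if "\t" in line else line.split())
    return rows


def group_bunsetu(conllu: str) -> List[str]:
    """Concatenate token forms into bunsetu; a B label starts a new bunsetu."""
    phrases: List[str] = []
    for cols in token_rows(conllu):
        form, misc = cols[1], parse_misc(cols[9])
        if misc.get("BunsetuBILabel") == "B" or not phrases:
            phrases.append(form)
        else:
            phrases[-1] += form
    return phrases


def clause_heads(conllu: str) -> Dict[str, str]:
    """Map each token ID to its ClauseHead value ('' when the field is absent)."""
    return {cols[0]: parse_misc(cols[9]).get("ClauseHead", "") for cols in token_rows(conllu)}


print(group_bunsetu(SAMPLE))
print(set(clause_heads(SAMPLE).values()))
```

For the sample sentence this yields the three bunsetu 銀座で, ランチを, and ご一緒しましょう。, and every token reports ClauseHead 6, i.e. the whole sentence is a single clause headed by token 6 (一緒).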
Release v5.1.3
What's Changed
- Migrate to spaCy v3.6
- Beta release of ja_ginza_bert_large
Full Changelog: v5.1.2...v5.1.3
How to Use ja_ginza_bert_large β1
- Prepare
Create a virtual-env to separate ja_ginza_bert_large from other GiNZA model environments. (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)
$ python -m venv venv_bert_large
$ source venv_bert_large/bin/activate
- Install
$ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.1.3/ja_ginza_bert_large-5.1.3b1-py3-none-any.whl"
For CUDA environments, also upgrade spaCy with the extra that matches your CUDA version, for example:
$ pip install -U spacy[cuda117]
- Analyze
$ ginza -g 0 -b ja_ginza_bert_large
銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1 銀座 銀座 PROPN 名詞-固有名詞-地名-一般 _ 6 obl _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City
2 で で ADP 助詞-格助詞 _ 1 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ
3 ランチ ランチ NOUN 名詞-普通名詞-一般 _ 6 obj _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ
4 を を ADP 助詞-格助詞 _ 3 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ
5 ご ご NOUN 接頭辞 _ 6 compound _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ
6 一緒 一緒 VERB 名詞-普通名詞-サ変可能 _ 0 root _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ
7 し する AUX 動詞-非自立可能 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ
8 ましょう ます AUX 助動詞 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ
9 。 。 PUNCT 補助記号-句点 _ 6 punct _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。
Release v5.1.2
What's Changed
- add pytest github actions workflow by @r-terada in #241
- Migrate to spaCy v3.4 by @hiroshi-matsuda-rit in #250
New Contributors
- @ftnext made their first contribution in #239
- @wafuwafu13 made their first contribution in #244
Full Changelog: v5.1.1...v5.1.2
Release v5.1.1
What's Changed
- auto deploy for pypi by @nimiusrd in #184
- modify github actions: trigger by tagging, stop uploading test pypi by @r-terada in #233
New Contributors
- @sinozu made their first contribution in #230
- @wataruhashimoto52 made their first contribution in #236
Full Changelog: v5.1.0...v5.1.1
Release v5.1.0
ginza-5.1.0
- 2021-12-10, Euclase
- Important changes
  - Upgrade: spaCy v3.2 and Sudachi.rs (SudachiPy v0.6.2)
  - Change token information fields #208 #209
    - doc.user_data["reading_forms"][token.i] -> token.morph.get("Reading")
    - doc.user_data["inflections"][token.i] -> token.morph.get("Inflection")
    - force_using_normalized_form_as_lemma(True) -> token.norm_
  - All spaCy models, including non-Japanese ones, are now available with the ginza command #217
  - Download a model and analyze with it in one step by specifying the model name in the following form #219
    - ginza -m en_core_web_md
  - The ginza -f json option always analyzes lines that start with #, regardless of the value of the -c option. #215
- Improvements
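The token field change listed under Important changes (#208, #209) affects any code that read readings from doc.user_data. One way to bridge both layouts during a migration is a small accessor that tries the new location first and falls back to the old one. This is only a sketch: get_reading is a hypothetical helper, not a GiNZA function, and the stub classes exist solely so the snippet runs without a model installed. It relies on token.morph.get() returning a list of strings, as in spaCy v3.

```python
from types import SimpleNamespace
from typing import Optional


def get_reading(token) -> Optional[str]:
    """Return a token's reading under either the v5.1 or the pre-v5.1 field layout."""
    readings = token.morph.get("Reading")  # v5.1 and later: a list of strings
    if readings:
        return readings[0]
    # Pre-v5.1 layout: one reading per token, stored on the Doc.
    legacy = token.doc.user_data.get("reading_forms")
    if legacy is not None:
        return legacy[token.i]
    return None


class FakeMorph:
    """Hypothetical stand-in for spaCy's MorphAnalysis, for illustration only."""

    def __init__(self, fields):
        self._fields = fields

    def get(self, name):
        return self._fields.get(name, [])


# Stub tokens imitating the two layouts (not real spaCy objects).
new_style = SimpleNamespace(
    i=0, morph=FakeMorph({"Reading": ["ギンザ"]}), doc=SimpleNamespace(user_data={})
)
old_style = SimpleNamespace(
    i=0, morph=FakeMorph({}), doc=SimpleNamespace(user_data={"reading_forms": ["ギンザ"]})
)

print(get_reading(new_style), get_reading(old_style))
```

With a real pipeline the same function would be called on tokens of a Doc produced by the loaded model; once all environments are on v5.1 or later, the fallback branch can be dropped.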
Release v5.0.3
ginza-5.0.3
- 2021-10-15
- Bug fix
  - Bunsetu span should not cross the sentence boundary #195
Release v5.0.2
ginza-5.0.2
- 2021-09-06
- Bug fix
  - Command line -s option and set_split_mode() not working in v5.0.x #185
Release v5.0.1
Release v5.0.0
ginza-5.0.0
- 2021-08-26, Demantoid
- Important changes
  - Upgrade spaCy to v3
    - Release transformer-based ja-ginza-electra model
    - Improve UPOS accuracy of the standard ja-ginza model by adding morphologizer to the tail of the spaCy pipeline
  - Need to install an analysis model along with the ginza package
    - High accuracy model (>=16GB memory needed)
      - pip install -U ginza ja-ginza-electra
    - Speed oriented model
      - pip install -U ginza ja-ginza
  - Change component names of CompoundSplitter and BunsetuRecognizer to compound_splitter and bunsetu_recognizer respectively
  - Also see spaCy v3 Backwards Incompatibilities
- Improvements
  - Add command line options
    - -n
      - Force using SudachiPy's normalized_form as Token.lemma_
    - -m (ja_ginza|ja_ginza_electra)
      - Select model package
  - Revise ENE category name
    - Degital_Game to Digital_Game
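Code or configuration written for earlier versions may still refer to the two GiNZA components by their old names. A minimal sketch of translating such references to the v5 names follows; migrate_pipe_names and the example component list are hypothetical, and only the two renames stated in the v5.0.0 notes above are covered.

```python
from typing import Dict, List

# Old -> new component names, as stated in the v5.0.0 release notes.
RENAMED_COMPONENTS: Dict[str, str] = {
    "CompoundSplitter": "compound_splitter",
    "BunsetuRecognizer": "bunsetu_recognizer",
}


def migrate_pipe_names(names: List[str]) -> List[str]:
    """Replace pre-v5 GiNZA component names; leave every other name untouched."""
    return [RENAMED_COMPONENTS.get(name, name) for name in names]


# Illustrative list of component names, not an actual GiNZA pipeline definition.
old_names = ["parser", "CompoundSplitter", "BunsetuRecognizer"]
print(migrate_pipe_names(old_names))
```

The same mapping could be applied to component names found in saved configs or in calls that look components up by name.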