Merge branch 'master' of https://github.com/huseinzol05/malaya
huseinzol05 committed Mar 26, 2024
2 parents 2fcbfcf + c35ced9 commit d087a70
Showing 6 changed files with 60 additions and 35 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/run_pytest.yml
@@ -0,0 +1,25 @@
+name: Run Unit Tests via Pytest
+
+on: [push]
+
+jobs:
+  build:
+    runs-on: self-hosted
+    strategy:
+      matrix:
+        python-version: ["3.7"]
+
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python3 -m pip install --upgrade pip
+          pip3 install malaya tensorflow tensorflow-text torch
+          pip3 install pytest pytest-cov pytest-codecov gitpython
+      - name: Test with pytest
+        run: |
+          pytest tests/tests --cov --cov-report term --cov-report html
2 changes: 1 addition & 1 deletion README.rst
@@ -12,7 +12,7 @@
<a href="https://malaya.readthedocs.io/"><img alt="Documentation" src="https://readthedocs.org/projects/malaya/badge/?version=latest"></a>
<a href="https://pepy.tech/project/malaya"><img alt="total stats" src="https://static.pepy.tech/badge/malaya"></a>
<a href="https://pepy.tech/project/malaya"><img alt="download stats / month" src="https://static.pepy.tech/badge/malaya/month"></a>
-<a href="https://discord.gg/aNzbnRqt3A"><img alt="discord" src="https://img.shields.io/badge/discord%20server-malaya-rgb(118,138,212).svg"></a>
+<a href="https://discord.gg/J3aSWyMy9A"><img alt="discord" src="https://img.shields.io/badge/discord%20-Malaysia_AI-rgb(118,138,212).svg"></a>
</p>

=========
40 changes: 20 additions & 20 deletions docs/Contributing.rst
@@ -7,17 +7,17 @@ helps, and credit will always be given.
Code Formatting
----------------

-We use `AutoPEP8`_ for code formatting and standard. Checkout `pyproject.toml`_ at root directory.
+We use `AutoPEP8`_ for code formatting and standardization. Check out the `pyproject.toml`_ file at root directory.

Report Bugs
-----------

Report bugs through `Github issue`_.

-Please report relevant information and preferably code that exhibits the
+Please report all relevant information and preferably code that exhibits the
problem.

-Do not try to email us about the issues, we will not respond to the emails, submit a proper Github issue.
+Do not try to email us about the issues, we will not respond to any emails, submit a proper Github issue instead.

Fix Bugs
--------
@@ -28,22 +28,22 @@ wants to implement it.
Implement Features
------------------

-Look through the `Github issue`_ or `Malaya-project`_ for features. Any
+Look through the `Github issue`_ or `Malaya-project`_ for feature requests. Any
unassigned ``improvement`` issue is open to whoever wants to implement
it.

We use Pytorch and heavily use HuggingFace Transformers as backend.

-Dataset
--------
+Dataset Contributions
+---------------------

-Create a new issue in `Github issue`_ related to your data including the
-data link or attached it there. If you want to improve current dataset
+Create a new issue in `Github issue`_ that's related to your data including the
+data link or just attach it there. If you want to improve the current dataset
we have, you can check at `Malaya-Dataset`_.

-Or, you can simply email your data if you do not want to expose the data
-to public. Malaya will not exposed your data, but we will exposed our
-trained models based on your data.
+Alternatively, you can simply email your data if you do not want to expose the data
+to the public. Malaya will not expose your data, but our
+trained models that's based on your data will be exposed to the public.

Thanks to,

@@ -53,23 +53,23 @@ Thanks to,
4. `Singlish text dump`_, contributed by `brytjy`_
5. `Singapore news`_, contributed by `brytjy`_

-Improve Documentation
----------------------
+Documentation Improvements
+--------------------------

-Malaya could always use better documentation, might have some typos or
-uncorrect object names.
+Malaya could always use better documentation, there might be some typos or
+incorrect object names.

Submit Feedback
---------------

-The best way to send feedback is to open an issue on `Github issue`_.
+The best way to send feedback is to open an issue through `Github issue`_.

-Unit test
+Unit Test
---------

-Test every possible program flow! You can check `unit tests here`_.
+Help test every step of the program flow! You can check the current available `unit tests here`_.

-Feel free to help Malaya to write unit-tests, fork it!
+Feel free to help Malaya write unit-tests, fork the repository!

.. _Types of Contributions: #types-of-contributions
.. _Report Bugs: #report-bugs
@@ -95,4 +95,4 @@ Feel free to help Malaya to write unit-tests, fork it!
.. _Singapore news: https://github.com/mesolitica/malaya-dataset#singapore-news
.. _unit tests here: https://github.com/mesolitica/Malaya/tree/master/tests
.. _AutoPEP8: https://github.com/hhatto/autopep8
-.. _pyproject.toml: https://github.com/mesolitica/malaya/blob/master/pyproject.toml
+.. _pyproject.toml: https://github.com/mesolitica/malaya/blob/master/pyproject.toml
19 changes: 9 additions & 10 deletions docs/Dataset.rst
@@ -9,23 +9,22 @@ Dataset
</a>
</p>

-We want to make sure not just the code we open-sourced, but also goes to
-dataset, so everyone can validate.
+We want to make sure that not just the code is open-sourced, but the
+dataset as well, so everyone can help validate it.

-You can check in
+You can visit our repository at
`Malay-Dataset <https://github.com/huseinzol05/Malay-Dataset>`__ for
-our open dataset.
+the datasets that we used.

Citation
--------

-1. Please citate the repository if use these corpus.
-2. Please at least email us first before distributing these data.
-Remember all these hard workings we want to give it for free.
-3. What do you see just the data, but nobody can see how much we spent
-our cost to make it public.
+1. Please cite the repository if you use any of our corpus.
+2. We kindly ask that you at least email us first before distributing any of our datasets.
+Remember that all of these are our hard work and we gave it out for free.
+3. What you only see is just the publicized data, but nobody can see how much we spent to make it public.

Contribution
-------------

-Contact us at [email protected] or [email protected] if want to contribute to bahasa dataset.
+Contact us at [email protected] or [email protected] if you want to contribute to the bahasa dataset.
5 changes: 3 additions & 2 deletions malaya/spelling_correction/symspell.py
@@ -182,8 +182,9 @@ def correct(self, word: str, **kwargs):

 if len(hujung_result) and not word.endswith(hujung_result) and combined:
     word = word + hujung_result
-if len(permulaan_result) and not word.startswith(permulaan_result) and combined:
-    if permulaan_result[-1] == word[0]:
+if len(permulaan_result) and not word.startswith(
+        permulaan_result) and combined:
+    if len(word) and permulaan_result[-1] == word[0]:
         word = permulaan_result + word[1:]
     else:
         word = permulaan_result + word
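
The hunk above adds a `len(word)` check before `word[0]` is read. A minimal sketch of why that guard matters, assuming the branch logic is lifted into a standalone helper: the name `reattach_affixes` and its signature are hypothetical and not part of malaya, and only the conditions are taken from the diff.

```python
def reattach_affixes(word, permulaan_result, hujung_result, combined=True):
    """Hypothetical helper mirroring the branch logic shown in the hunk."""
    # Append the suffix unless the word already ends with it.
    if len(hujung_result) and not word.endswith(hujung_result) and combined:
        word = word + hujung_result
    # Prepend the prefix unless the word already starts with it.
    if len(permulaan_result) and not word.startswith(permulaan_result) and combined:
        # Without the len(word) guard, word[0] raises IndexError on an empty word.
        if len(word) and permulaan_result[-1] == word[0]:
            # Drop the duplicated boundary character.
            word = permulaan_result + word[1:]
        else:
            word = permulaan_result + word
    return word


print(reattach_affixes("", "me", ""))
print(reattach_affixes("ambil", "menga", ""))
```

With an empty `word`, `"".startswith("me")` is `False`, so the outer condition is entered and the unguarded `word[0]` would fail; the guarded version falls through to the `else` branch instead.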
4 changes: 2 additions & 2 deletions requirements.txt
@@ -5,9 +5,9 @@ unidecode
numpy
scipy
ftfy
-networkx>=2.8.8
+networkx
sentencepiece
tqdm
malaya-boilerplate>=0.0.25rc2
regex
-transformers
+transformers
