dev/v1.1.1 #13

Open
wants to merge 3 commits into
base: main
2 changes: 1 addition & 1 deletion .coveragerc
@@ -12,4 +12,4 @@ omit =
[report]

[html]
directory = htmlcov
directory = htmlcov
7 changes: 6 additions & 1 deletion .github/workflows/lexpy_build.yaml
@@ -9,7 +9,12 @@ jobs:
fail-fast: false
matrix:
os: [macos-latest, windows-latest, ubuntu-latest]
- python-version: ['3.7', '3.8', '3.9', '3.10', '3.11', '3.12', 'pypy-3.7', 'pypy-3.8', 'pypy-3.9', 'pypy-3.10']
+ python-version: ['3.7', '3.8', '3.9', '3.10', '3.11', '3.12', '3.13', 'pypy-3.7', 'pypy-3.8', 'pypy-3.9', 'pypy-3.10']
+ exclude:
+ - os: macos-latest
+ python-version: '3.7'
+ - os: macos-latest
+ python-version: 'pypy-3.7'

steps:
- name: Checkout
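
For reviewers who want to check how many jobs the updated matrix produces, here is a minimal sketch in Python. It only mimics the documented behaviour of `matrix` plus `exclude` (a Cartesian product with the listed combinations removed); it is not how GitHub Actions is implemented, and the variable names are chosen for illustration.

```python
from itertools import product

# Values copied from the updated workflow matrix.
oses = ["macos-latest", "windows-latest", "ubuntu-latest"]
python_versions = [
    "3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13",
    "pypy-3.7", "pypy-3.8", "pypy-3.9", "pypy-3.10",
]

# Combinations listed under `exclude` in the workflow.
excluded = {("macos-latest", "3.7"), ("macos-latest", "pypy-3.7")}

# Full Cartesian product minus the excluded pairs.
jobs = [
    (os_name, version)
    for os_name, version in product(oses, python_versions)
    if (os_name, version) not in excluded
]

print(len(jobs))  # 3 * 11 - 2 = 31
```

Under these assumptions the change grows the matrix from 30 jobs (3 x 10) to 31.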
2 changes: 1 addition & 1 deletion .gitignore
@@ -13,4 +13,4 @@ build
dawg_sample.py
compare_trie_dawg_size.py
compare_trie_dawg_time.py
venv
venv
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,16 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
- repo: https://github.com/PyCQA/flake8
rev: 7.0.0
hooks:
- id: flake8
35 changes: 16 additions & 19 deletions README.md
@@ -19,13 +19,13 @@



- A lexicon is a data-structure which stores a set of words. The difference between
a dictionary and a lexicon is that in a lexicon there are **no values** associated with the words.
- A lexicon is a data-structure which stores a set of words. The difference between
a dictionary and a lexicon is that in a lexicon there are **no values** associated with the words.

- A lexicon is similar to a list or a set of words, but the internal representation is different and optimized
for faster searches of words, prefixes and wildcard patterns.
for faster searches of words, prefixes and wildcard patterns.

- Given a word, precisely, the search time is O(W) where W is the length of the word.
- Given a word, the search time is O(W), where W is the length of the word.

- Two important lexicon data structures are the **_Trie_** and the **_Directed Acyclic Word Graph (DAWG)_**.
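
To make the O(W) claim concrete, here is a minimal trie sketch. `TinyTrie` and `TinyTrieNode` are hypothetical names used only for illustration; this is not lexpy's implementation, which also supports wildcards, counts and more.

```python
class TinyTrieNode:
    """One node per character position; children are keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class TinyTrie:
    """Illustrative lexicon: stores words only, with no associated values."""

    def __init__(self):
        self.root = TinyTrieNode()

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TinyTrieNode())
        node.is_word = True

    def _walk(self, text):
        # Follows one edge per character, so the cost is O(len(text)).
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


lexicon = TinyTrie()
for word in ["arrow", "agony", "acorn"]:
    lexicon.add(word)

print("arrow" in lexicon)         # True
print("arr" in lexicon)           # False: a prefix, not a stored word
print(lexicon.has_prefix("arr"))  # True
```

The lookup cost depends only on the length of the query, not on how many words the lexicon holds.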

@@ -61,10 +61,10 @@ from lexpy import Trie

trie = Trie()

input_words = ['ampyx', 'abuzz', 'athie', 'athie', 'athie', 'amato', 'amato', 'aneto', 'aneto', 'aruba',
'arrow', 'agony', 'altai', 'alisa', 'acorn', 'abhor', 'aurum', 'albay', 'arbil', 'albin',
'almug', 'artha', 'algin', 'auric', 'sore', 'quilt', 'psychotic', 'eyes', 'cap', 'suit',
'tank', 'common', 'lonely', 'likeable' 'language', 'shock', 'look', 'pet', 'dime', 'small'
input_words = ['ampyx', 'abuzz', 'athie', 'athie', 'athie', 'amato', 'amato', 'aneto', 'aneto', 'aruba',
'arrow', 'agony', 'altai', 'alisa', 'acorn', 'abhor', 'aurum', 'albay', 'arbil', 'albin',
'almug', 'artha', 'algin', 'auric', 'sore', 'quilt', 'psychotic', 'eyes', 'cap', 'suit',
'tank', 'common', 'lonely', 'likeable', 'language', 'shock', 'look', 'pet', 'dime', 'small',
'dusty', 'accept', 'nasty', 'thrill', 'foot', 'steel', 'steel', 'steel', 'steel', 'abuzz']

trie.add_all(input_words) # You can pass any sequence types or a file-like object here
@@ -170,7 +170,7 @@ print(trie.search_within_distance('arie', dist=2, with_count=True))
# Directed Acyclic Word Graph (DAWG)

- DAWG supports the same set of operations as a Trie. The difference is the number of nodes in a DAWG is always
less than or equal to the number of nodes in Trie.
less than or equal to the number of nodes in Trie.

- Both are deterministic finite-state automata (DFAs). However, a DAWG is a minimized version of the Trie DFA.
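
The node-count relationship can be illustrated with a small sketch. It builds a plain trie from nested dictionaries and then counts how many distinct states remain once identical subtrees are merged, which is what minimizing a tree-shaped DFA amounts to. The function names are hypothetical and this is not how lexpy builds its DAWG.

```python
def build_trie(words):
    """Build a trie as nested dicts: {'children': {...}, 'final': bool}."""
    root = {"children": {}, "final": False}
    for word in words:
        node = root
        for ch in word:
            node = node["children"].setdefault(
                ch, {"children": {}, "final": False}
            )
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(c) for c in node["children"].values())


def count_minimized_nodes(root):
    """Count distinct states after merging nodes with identical subtrees."""
    registry = {}  # signature -> state id

    def state_id(node):
        # Children are registered first, so equal subtrees share one id.
        signature = (
            node["final"],
            tuple(sorted(
                (ch, state_id(child))
                for ch, child in node["children"].items()
            )),
        )
        return registry.setdefault(signature, len(registry))

    state_id(root)
    return len(registry)


words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)
print(count_trie_nodes(trie))       # 8
print(count_minimized_nodes(trie))  # 5
```

Here the `a` and `o` branches accept the same suffixes (`p`, `ps`), so they collapse into shared states: 8 trie nodes become 5.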

@@ -210,10 +210,10 @@ The APIs are exactly same as the Trie APIs
from lexpy import DAWG
dawg = DAWG()

input_words = ['ampyx', 'abuzz', 'athie', 'athie', 'athie', 'amato', 'amato', 'aneto', 'aneto', 'aruba',
'arrow', 'agony', 'altai', 'alisa', 'acorn', 'abhor', 'aurum', 'albay', 'arbil', 'albin',
'almug', 'artha', 'algin', 'auric', 'sore', 'quilt', 'psychotic', 'eyes', 'cap', 'suit',
'tank', 'common', 'lonely', 'likeable' 'language', 'shock', 'look', 'pet', 'dime', 'small'
input_words = ['ampyx', 'abuzz', 'athie', 'athie', 'athie', 'amato', 'amato', 'aneto', 'aneto', 'aruba',
'arrow', 'agony', 'altai', 'alisa', 'acorn', 'abhor', 'aurum', 'albay', 'arbil', 'albin',
'almug', 'artha', 'algin', 'auric', 'sore', 'quilt', 'psychotic', 'eyes', 'cap', 'suit',
'tank', 'common', 'lonely', 'likeable', 'language', 'shock', 'look', 'pet', 'dime', 'small',
'dusty', 'accept', 'nasty', 'thrill', 'foot', 'steel', 'steel', 'steel', 'steel', 'abuzz']


@@ -317,7 +317,7 @@ print(dawg.search('thrill', with_count=True))

## Special Characters

Special characters, except `?` and `*`, are matched literally.
Special characters, except `?` and `*`, are matched literally.

```python
from lexpy import Trie
@@ -357,15 +357,12 @@ These are some ideas which I would love to work on next in that order. Pull requ
- Merge trie and DAWG features in one data structure
- Support all functionalities and still be as compressed as possible.
- Serialization / Deserialization
- Pickle is definitely an option.
- Pickle is definitely an option.
- Server (TCP or HTTP) to serve queries over the network.


# Fun Facts
1. The 45-letter word pneumonoultramicroscopicsilicovolcanoconiosis is the longest English word that appears in a major dictionary.
So for all english words, the search time is bounded by O(45).
So for all English words, the search time is bounded by O(45).
2. The longest technical word (not in the dictionary) is the name of the protein [titin](https://en.wikipedia.org/wiki/Titin). It has 189,819
letters, and whether it counts as a word is disputed.



2 changes: 1 addition & 1 deletion lexpy/__init__.py
@@ -2,4 +2,4 @@
from lexpy.trie import Trie
from lexpy.dawg import DAWG

__all__ = ['Trie', 'DAWG']
__all__ = ["Trie", "DAWG"]