Commit

Updated README for v0.9.8
Abhishek Singh committed Jan 27, 2021
1 parent e2105e9 commit c81354d
Showing 1 changed file with 20 additions and 9 deletions.
29 changes: 20 additions & 9 deletions README.md
@@ -11,6 +11,8 @@
[![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![PyPy3](https://img.shields.io/badge/python-PyPy3-blue.svg)](https://www.pypy.org/index.html)



- A lexicon is a data-structure which stores a set of words. The difference between
@@ -24,9 +26,9 @@ for faster searches of words, prefixes and wildcard patterns.
- 2 important lexicon data-structures are:

- Trie.
-  - Directed Acyclic Word Graph(DAWG).
+  - Directed Acyclic Word Graph (DAWG).

-Both Trie and DAWG are Finite State Automaton(FSA)
+Both Trie and DAWG are Finite State Automaton (FSA)
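
To make the automaton view concrete, here is a minimal sketch of a trie treated as a deterministic automaton: each node is a state, each letter is a transition, and a word is accepted only if the walk ends in a final state. The names `TrieNode`, `insert` and `accepts` are illustrative and are not part of lexpy's API.

```python
class TrieNode:
    """One automaton state: outgoing transitions plus an accepting flag."""

    def __init__(self):
        self.children = {}   # letter -> next state
        self.final = False   # True if a stored word ends at this state


def insert(root, word):
    """Add a word by following existing transitions and creating missing ones."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.final = True


def accepts(root, word):
    """Run the automaton on `word`; accept only if it stops in a final state."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.final


root = TrieNode()
for w in ["ash", "ashley", "ashes"]:
    insert(root, w)

print(accepts(root, "ash"))    # True
print(accepts(root, "ashl"))   # False: a prefix of a word, but not a word itself
```

A DAWG accepts the same set of words; the difference is that states with identical continuations are shared instead of duplicated.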


# Install
@@ -80,6 +82,8 @@ print(trie.get_word_count())

### Build from a file or file path.

In the file, words should be newline separated.
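
As a minimal sketch of that file layout, the snippet below writes one word per line and reads the words back. It is plain Python and independent of lexpy; how lexpy itself parses the file is not shown in this hunk, so the stripping and blank-line handling here are assumptions.

```python
import os
import tempfile

words = ["ash", "ashley", "ashes"]

# Write one word per line, which is the layout described above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("\n".join(words) + "\n")
    path = fh.name

# Read it back the way a loader might: strip line endings, skip blank lines.
with open(path) as fh:
    loaded = [line.strip() for line in fh if line.strip()]

os.remove(path)
print(loaded)  # ['ash', 'ashley', 'ashes']
```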

```python

from lexpy.trie import Trie
@@ -293,7 +297,7 @@ print(dawg.search_within_distance('arie', dist=2, with_count=True))

### Alphabetical order insertion

-If you insert a word which is out-of-order, ``ValueError`` will be raised.
+If you insert a word which is lexicographically out-of-order, ``ValueError`` will be raised.
```python
dawg.add('athie', count=1000)
```
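
One way to avoid the error is to sort the words before inserting them. The sketch below mimics the ordering rule with a hypothetical helper, `check_insertion_order`; it is not lexpy code, and whether lexpy also rejects a repeated word is not stated here, so the helper only rejects words that sort strictly before the previous one. Python compares strings by code point, which matches lexicographic order for lowercase ASCII words.

```python
def check_insertion_order(previous_word, new_word):
    """Raise ValueError if new_word sorts before previous_word (hypothetical helper)."""
    if previous_word is not None and new_word < previous_word:
        raise ValueError(
            f"'{new_word}' is out of order: it sorts before '{previous_word}'"
        )
    return new_word


words = ["thrill", "athie", "ashley"]

# Sorting first guarantees every insertion passes the order check.
previous = None
for w in sorted(words):
    previous = check_insertion_order(previous, w)

# Inserting 'athie' after 'thrill' violates the rule.
try:
    check_insertion_order("thrill", "athie")
except ValueError as exc:
    print(exc)
```
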
@@ -321,10 +325,21 @@ print(dawg.search('thrill', with_count=True))
## Trie vs DAWG


-![Number of nodes comparison](/lexpy_trie_dawg_nodes.png)
+![Number of nodes comparison](https://github.com/aosingh/lexpy/blob/master/lexpy_trie_dawg_nodes.png)

+![Build time comparison](https://github.com/aosingh/lexpy/blob/master/lexpy_trie_dawg_time.png)



-![Build time comparison](/lexpy_trie_dawg_time.png)
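
The node-count gap in the first chart comes from suffix sharing. As a rough sketch (not lexpy's implementation, and not the data behind the charts), the snippet below builds a small trie from nested dictionaries and then counts how many structurally distinct subtrees it contains, which equals the number of states a minimal DAWG would need for the same words. Both counts include the root; whether lexpy counts the root is not stated here.

```python
def build_trie(words):
    """Build a trie where each node is {'final': bool, 'children': {letter: node}}."""
    root = {"final": False, "children": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["children"].setdefault(
                ch, {"final": False, "children": {}}
            )
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(c) for c in node["children"].values())


def count_dawg_nodes(root):
    """Count distinct subtrees; identical subtrees collapse to one DAWG state."""
    seen = set()

    def signature(node):
        sig = (
            node["final"],
            tuple(sorted((ch, signature(c)) for ch, c in node["children"].items())),
        )
        seen.add(sig)
        return sig

    signature(root)
    return len(seen)


trie = build_trie(["tap", "taps", "top", "tops"])
print(count_trie_nodes(trie))  # 8
print(count_dawg_nodes(trie))  # 5
```

Here the `ta` and `to` branches have identical continuations (`p`, `ps`), so a DAWG stores them once.
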
# Future Work

These are some ideas which I would love to work on next in that order. Pull requests or discussions are invited.

- Merge trie and DAWG features in one data structure
- Support all functionalities and still be as compressed as possible.
- Serialization / Deserialization
- Pickle is definitely an option.
- Server (TCP or HTTP) to serve queries over the network.
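
To illustrate the pickle idea from the serialization bullet, here is a minimal round-trip sketch using a plain nested-dictionary trie. It is not lexpy code: `build_trie`, `contains` and the `"$"` end-of-word marker are assumptions made for the example.

```python
import pickle

END = "$"  # hypothetical end-of-word marker, chosen for this example


def build_trie(words):
    """Build a trie as nested dicts; END marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Return True if `word` was stored in the trie."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


trie = build_trie(["ash", "ashley", "ashes"])

blob = pickle.dumps(trie)       # serialize to bytes
restored = pickle.loads(blob)   # deserialize into a new, equal structure

print(contains(restored, "ashes"))  # True
print(contains(restored, "ashe"))   # False
```

Unpickling data from an untrusted source can execute arbitrary code, which matters if serialized structures are ever exchanged over the network.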


*Fun Facts* :
@@ -335,7 +350,3 @@ letters and it is disputed whether it is a word.






