Skip to content

Commit

Permalink
update docs
Browse files Browse the repository at this point in the history
  • Loading branch information
capjamesg committed Aug 20, 2024
1 parent 75ead9c commit 671e67c
Show file tree
Hide file tree
Showing 3 changed files with 116 additions and 0 deletions.
116 changes: 116 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,10 @@ This project has support for:

This database does not enforce a schema, so you can insert records with different fields.

Here is an example of a query run in the JameSQL web interface:

![JameSQL web interface](assets/screenshot.png)

## Installation

To install this project, run:
Expand Down Expand Up @@ -66,6 +70,62 @@ A query has the following format:

Within the `query` key you can query for documents that match one or more conditions.

An empty query returns no documents.

You can retrieve all documents by using a catch-all query, which uses the following syntax:

```python
{
"query": "*",
"limit": 2,
"sort_by": "song",
"skip": 1
}
```

This is useful if you want to page through documents. You should supply a `sort_by` field to ensure the order of documents is consistent.

### Document ranking

By default, documents are ranked in no order. If you provide a `sort_by` field, documents are sorted by that field.

For more advanced ranking, you can use the `boost` feature. This feature lets you boost the value of a field in a document to calculate a final score.

The default score for each field is `1`.

To use this feature, you must use `boost` on fields that have an index.

Here is an example of a query that uses the `boost` feature:

```python
{
"query": {
"or": {
"post": {
"contains": "taylor swift",
"strict": False,
"boost": 1
},
"title": {
"contains": "desk",
"strict": True,
"boost": 25
}
}
},
"limit": 4,
"sort_by": "_score",
}
```

This query would search for documents whose `post` field contains `taylor swift` or whose `title` field contains `desk`. The `title` field is boosted by 25, so documents that match the `title` field are ranked higher.

The score for each document before boosting is equal to the number of times the query condition is satisfied. For example, if a post contains `taylor swift` twice, the score for that document is `2`; if a title contains `desk` once, the score for that document is `1`.

Documents are then ranked in decreasing order of score.

### Condition matching

There are three operators you can use for condition matching:

- `equals`
Expand Down Expand Up @@ -131,6 +191,48 @@ result = index.search(query)

This will return a list of documents that match the query.

### Strict matching

By default, a search query on a text field will find any document where the field contains any word in the query string. For example, a query for `tolerate it` on a `title` field will match any document whose `title` that contains `tolerate` or `it`. This is called a non-strict match.

Non-strict matches are the default because they are faster to compute than strict matches.

If you want to find documents where terms appear next to each other in a field, you can do so with a strict match. Here is an example of a strict match:

```python
query = {
"query": {
"title": {
"contains": "tolerate it",
"strict": True
}
}
}
```

This will return documents whose title contains `tolerate it` as a single phrase.

### Fuzzy matching

By default, search queries look for the exact string provided. This means that if a query contains a typo (i.e. searching for `tolerate ip` instead of `tolerate it`), no documents will be returned.

JameSQL implements a limited form of fuzzy matching. This means that if a query contains a typo, JameSQL will still return documents that match the query.

The fuzzy matching feature matches documents that contain one typo. If a document contains more than one typo, it will not be returned. A typo is an incorrectly typed character. JameSQL does not support fuzzy matching that accounts for missing or additional characters (i.e. `tolerate itt` will not match `tolerate it`).

You can enable fuzzy matching by setting the `fuzzy` key to `True` in the query. Here is an example of a query that uses fuzzy matching:

```python
query = {
"query": {
"title": {
"contains": "tolerate ip",
"fuzzy": True
}
}
}
```

### Update documents

You need a document UUID to update a document. You can retrieve a UUID by searching for a document.
Expand Down Expand Up @@ -187,6 +289,20 @@ response = index.search(
assert len(response["documents"]) == 0
```

## Web Interface

JameSQL comes with a limited web interface designed for use in testing queries.

_Note: You should not use the web interface if you are extending the query engine. Full error messages are only available in the console when you run the query engine._

To start the web interface, run:

```
python3 web.py
```

The web interface will run on `localhost:5000`.

## Testing

This project comes with two test suites:
Expand Down
Binary file added assets/.DS_Store
Binary file not shown.
Binary file added assets/screenshot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 671e67c

Please sign in to comment.