Skip to content

Commit

Permalink
rewrite testing documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
capjamesg committed Aug 22, 2024
1 parent 29f33a0 commit 51f9786
Showing 1 changed file with 144 additions and 8 deletions.
152 changes: 144 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,38 @@ You can retrieve all documents by using a catch-all query, which uses the follow

This is useful if you want to page through documents. You should supply a `sort_by` field to ensure the order of documents is consistent.

#### Response

All valid queries return responses in the following form:

```json
{
"documents": [
{"uuid": "1", "title": "test", "artist": "..."},
{"uuid": "2", "title": "test", "artist": "..."},
...
],
"query_time": 0.0001,
"total_results": 200
}
```

`documents` is a list of documents that match the query. `query_time` is the amount of time it took to execute the query. `total_results` is the total number of documents that match the query before applying any `limit`.

`total_results` is useful for implementing pagination.

If an error was encountered, the response will be in the following form:

```json
{
"documents": [],
"query_time": 0.0001,
"error": "Invalid query"
}
```

The `error` key contains a message describing the exact error encountered.

### Document ranking

By default, documents are ranked in no order. If you provide a `sort_by` field, documents are sorted by that field.
Expand Down Expand Up @@ -124,6 +156,61 @@ The score for each document before boosting is equal to the number of times the

Documents are then ranked in decreasing order of score.

#### Document ranking with script scores

The script score feature lets you write custom scripts to calculate the score for each document. This is useful if you want to calculate a score based on multiple fields, including numeric fields.

Script scores are applied after all documents are retrieved.

The script score feature supports the following mathematical operations:

- `+` (addition)
- `-` (subtraction)
- `*` (multiplication)
- `/` (division)
- `log` (logarithm)

You can apply a script score at the top level of your query:

```python
{
"query": {
"or": {
"post": {
"contains": "taylor swift",
"strict": False,
"boost": 1
},
"title": {
"contains": "desk",
"strict": True,
"boost": 25
}
}
},
"limit": 4,
"sort_by": "_score",
"script_score": "((post + title) * 2)"
}
```

The above example will calculate the score of documents by adding the score of the `post` field and the `title` field, then multiplying the result by `2`.

A script score is made up of terms. A term is a field name or number (float or int), followed by an operator, followed by another term or number. Terms can be nested.

All terms must be enclosed within parentheses.

To compute a score that adds the `post` score to `title` and multiplies the result by `2`, use the following code:

```text
((post + title) * 2)
```

Invalid forms of this query include:

- `post + title * 2` (missing parentheses)
- `(post + title * 2)` (terms can only include one operator)

### Condition matching

There are three operators you can use for condition matching:
Expand Down Expand Up @@ -233,6 +320,41 @@ query = {
}
```

### Wildcard matching

You can match documents using a single wildcard character. This character is represented by an asterisk `*`.

```python
query = {
"query": {
"title": {
"contains": "tolerat* it",
"fuzzy": True
}
}
}
```

This query will look for all words that match the pattern `tolerat* it`, where the `*` character can be any single character.

### Look for terms close to each other

You can find terms that appear close to each other with a `close_to` query. Here is an example of a query that looks for documents where `made` and `temple` appear within `7` words of each other and `my` appears within `7` words of `temple`:

```python
query = {
"query": {
"close_to": [
{"lyric": "made"},
{"lyric": "temple,"},
{"lyric": "my"},
],
"distance": 7
},
"limit": 10
}
```

### Update documents

You need a document UUID to update a document. You can retrieve a UUID by searching for a document.
Expand Down Expand Up @@ -305,24 +427,38 @@ The web interface will run on `localhost:5000`.

## Testing

This project comes with two test suites:
You can run the project unit tests with the following command:

```
pytest tests/*.py
```

- Unit tests
- A benchmarking test
The tests have three modes:

You can run the unit tests using:
1. Run all unit tests.
2. Run all unit tests with an index of 30,000 small documents and ensure the query engine is fast.
3. Run all unit tests with an index of 30,000 documents with a few dozen words and ensure the query engine is fast.

To run the 30,000 small documents benchmark tests, run:

```
pytest tests/test.py
pytest tests/*.py --benchmark
```

You can run the benchmark test using:
To run the 30,000 documents with a few dozen words benchmark tests, run:

```
pytest tests/stress.py
pytest tests/*.py --long-benchmark
```

The benchmark test evaluates how long it takes to run a query in a database with 300,000 records.
In development, the goal should be making the query engine as fast as possible. The performance tests are designed to monitor for performance regressions, not set a ceiling for acceptable performance.

## Development notes

The following are notes that describe limitations of which I am aware, and may fix in the future:

- `boost` does not work with and/or queries.
- The query engine relies on `uuid`s to uniquely identify items. But these are treated as the partition key, which is not appropriate. Two documents should be able to have the same partition key, as long as they have their own `uuid`.

## License

Expand Down

0 comments on commit 51f9786

Please sign in to comment.