From 7d986282f18eeab22686cb50bcfd92a8f40bedd1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dawid=20Niezg=C3=B3dka?= <22837704+DawidNiezgodka@users.noreply.github.com>
Date: Wed, 12 Oct 2022 15:41:29 +0200
Subject: [PATCH] RQ-Docs adjustment (#99)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR resolves #97.

Co-authored-by: Christoph Böhm
---
 docs/docs/developer/range-query-details.md    |  89 +++---
 .../working-with-quick/range-query.md         | 266 +++++++++++-------
 2 files changed, 221 insertions(+), 134 deletions(-)

diff --git a/docs/docs/developer/range-query-details.md b/docs/docs/developer/range-query-details.md
index e3d6c4fb..f56c60af 100644
--- a/docs/docs/developer/range-query-details.md
+++ b/docs/docs/developer/range-query-details.md
@@ -22,69 +22,80 @@ for arriving at a better understanding of the range queries.

 ### Mirrors

-A corresponding mirror is deployed each time you create a new topic in Quick.
-A mirror is a Kafka Streams application that reads the content of a topic
+A corresponding mirror is deployed each time
+you create a new topic in Quick.
+A mirror is a Kafka Streams application
+that reads the content of a topic
 and exposes it through a key-value REST API.
 The API is linked with a specific state store.
-A state store in Kafka can either be a persistent state store (by default RocksDB)
-or in-memory state store.
-Regardless of the chosen state store type, their functionality is the same.
-In any case, it's a key-value store, meaning all keys are unique.
+A state store in Kafka can either be a persistent state store
+(by default RocksDB) or in-memory state store.
+Regardless of the chosen state store type,
+their functionality is the same.
+In any case, it's a key-value store,
+meaning all keys are unique.
 Storing different values for the same key is impossible.
 Consider the following entries that are saved in a topic:

-| key (UserId) | value                                       |
-|:------------:|---------------------------------------------|
-| `1`          | `{userId: 1, purchaseId: "abc", rating: 2}` |
-| `1`          | `{userId: 1, purchaseId: "def", rating: 4}` |
-| `2`          | `{userId: 2, purchaseId: "ghi", rating: 4}` |
+| key (productId) | value                                             |
+|:---------------:|---------------------------------------------------|
+| `123`           | `{productId: 123, name: "T-Shirt", timestamp: 2}` |
+| `123`           | `{productId: 123, name: "T-Shirt", timestamp: 3}` |
+| `234`           | `{productId: 234, name: "Hoodie", timestamp: 4}`  |

-The table indicates that there are two entries for the `userId=1`.
+The table indicates that there are two entries for the `productId=123`.
 The second entry is newer, meaning its value is the current one in the store.
-Suppose you query the store with `userId=1`.
-In that case, you get `{userId: 1, purchaseId: "def", rating: 4}`,
+Suppose you query the store with `productId=123`.
+In that case, you get `{productId: 123, name: "T-Shirt", timestamp: 3}`,
 and there is no possibility of accessing the earlier value.
 In subsequent parts of this section,
-we refer to queries that can only retrieve the latest record as point queries.
-Similarly, suppose a specific mirror is only capable of supporting such queries. In that case,
-we say this is a mirror with a point index.
+we refer to queries that can only retrieve
+the latest record as point queries.
+Similarly, suppose a specific mirror is only
+capable of supporting such queries.
+In that case, we say this is a mirror with a point index.
 Because of the intrinsic nature of state stores,
-providing a possibility to access previous values (making a range query that encompasses more than one value
-associated with `userId=1`) demands a change in the key representation.
+providing a possibility to access previous values
+(making a range query that encompasses more than one value
+associated with `productId=123`) demands a change in the key representation.

 ### Introducing the possibility of carrying out range queries

-To circumvent the limitation of a key-value store and be able to perform range queries,
+To circumvent the limitation of a key-value store
+and be able to perform range queries,
 Quick uses an alternative approach to deal with keys.
-Each key is a flattened string with a combination of the topic key and the value
+Each key is a flattened string with a combination of the topic key
+and the zero-padded value
 for which the range queries are requested.
-The keys are padded (depending on the type `Int` 10 digits or `Long` 19 digits)
+The values of the range field are padded
+(depending on the type `Int` 10 digits or `Long` 19 digits)
 with zeros to keep the lexicographic order.
 The general format of the key in the state store is:
-`_`.
-Following the example from the table: If we have a topic with `userId` as its key
-and want to create a range over the `rating`,
+`_`.
+Following the example from the table:
+If we have a topic with `productId` as its key
+and want to create a range over the `timestamp`,
 the key in the state store for the first entry looks like this:
 ```
-00000000001_00000000002
+123_00000000002
 ```
 And for the second:
 ```
-00000000001_00000000004
+123_00000000003
 ```
-Regarding negative values, the minus sign is appended at the beginning of the padded string.
-For example, consider a user with the negative (for whatever reason) id number `userId=-10`
-and `rating=10`.
+Regarding negative values, the minus sign is appended
+at the beginning of the padded string.
+For example, consider a product with the negative (for whatever reason)
+id number `productId=-10` and `timestamp=10`.
 Then, the index looks as follows:
 ```
--00000000010_00000000010
+-10_00000000010
 ```
-The flatten-key approach creates unique keys for each user with a given rating.
+The flatten-key approach creates unique keys for each product with a given timestamp.
 Consequently, all the values will be accessible when running a range query.
 In later parts of this section,
 a mirror that can support range queries is called a mirror with a range index.

-
 ## Modify your GraphQL schema and define a range in the query

 The modification of the schema has no impact
@@ -95,7 +106,7 @@ until it is applied.

 When you apply a schema that contains the topic directive
 with the additional fields (`rangeFrom` and `rangeTo`),
-a [RangeQueryFetcher](https://github.com/bakdata/quick/blob/c8778ce527575c545a864ccbc3d98e3502fbb2a2/gateway/src/main/java/com/bakdata/quick/gateway/fetcher/RangeQueryFetcher.java)
+a [`RangeQueryFetcher`](https://github.com/bakdata/quick/blob/c8778ce527575c545a864ccbc3d98e3502fbb2a2/gateway/src/main/java/com/bakdata/quick/gateway/fetcher/RangeQueryFetcher.java)
 is created.
 This class will be later used to deliver a result of a range query to the user.

@@ -107,10 +118,12 @@ a request is sent to the Manager.
 The Manager prepares the deployment of a mirror,
 which contains both [Point Index Processor](https://github.com/bakdata/quick/blob/6fed9f20f237663cc00e3359de92efaf40307f28/mirror/src/main/java/com/bakdata/quick/mirror/point/MirrorProcessor.java)
 and [Range Index Processor](https://github.com/bakdata/quick/blob/6fed9f20f237663cc00e3359de92efaf40307f28/mirror/src/main/java/com/bakdata/quick/mirror/range/MirrorRangeProcessor.java).
-Each time a new value is sent to the topic, both processors are called.
+Each time a new value is sent to the topic,
+both processors are called.
 The first one creates a new key-value pair
 if the specified key does not exist.
-If it does, the value for the given key is overwritten (precisely as described above).
+If it does, the value for the given key is overwritten
+(precisely as described above).
 If the key exists, but you specify `null` as the value,
 the key and the corresponding (previous) value will be deleted from the state store.
 The second processor creates the range index in the way that was
@@ -122,8 +135,8 @@ When you prepare a range query,
 you provide two additional parameters in the entry point.
 These attributes define your range.
 After you have executed the query, it hits the gateway.
-There, it is processed by the [RangeQueryFetcher](https://github.com/bakdata/quick/blob/c8778ce527575c545a864ccbc3d98e3502fbb2a2/gateway/src/main/java/com/bakdata/quick/gateway/fetcher/RangeQueryFetcher.java).
-_RangeQueryFetcher_ is responsible for extracting the information
+There, it is processed by the [`RangeQueryFetcher`](https://github.com/bakdata/quick/blob/c8778ce527575c545a864ccbc3d98e3502fbb2a2/gateway/src/main/java/com/bakdata/quick/gateway/fetcher/RangeQueryFetcher.java).
+`RangeQueryFetcher` is responsible for extracting the information
 about the range from the query you passed.
 Having collected the necessary data
 (information about the key, the start of the range,
diff --git a/docs/docs/user/getting-started/working-with-quick/range-query.md b/docs/docs/user/getting-started/working-with-quick/range-query.md
index 2626c01f..61acb4f4 100644
--- a/docs/docs/user/getting-started/working-with-quick/range-query.md
+++ b/docs/docs/user/getting-started/working-with-quick/range-query.md
@@ -1,14 +1,21 @@
 # Range queries

-We now extend the [e-commerce example](query-data.md) with user ratings.
-Thus, users can then rank their purchases.
-This allows the company to find purchases that did not satisfy customers.
-It could then provide promo codes to the unhappy ones.
+We now extend the `Product` type
+from the [e-commerce example](query-data.md)
+with time information.
+This allows a company to analyse
+the development of the price over time.
+Using this information,
+a company could investigate
+which factors might have influenced
+the price of a specific product.

-The company could fetch all purchases and filter them accordingly to find disappointing purchases.
-However, range queries allow you to specify a specific range of bad ratings
-(say, from 1 to 4 on a 10-point grading scale)
-and receive the corresponding records immediately.
+The company could fetch all products
+and filter them accordingly
+to find the desired product's prices in a given period.
+However, range queries allow
+specifying a product id and a time-range
+to retrieve the corresponding records immediately.

 To integrate range queries into your application, you must take the following steps:

@@ -22,39 +29,46 @@ To integrate range queries into your application, you must take the following st
 To introduce range queries, we will extend the previous schema as follows:
 ```graphql title="schema.gql"
 type Query {
-    userRatings(
-        userId: Int
-        ratingFrom: Int
-        ratingTo: Int
-    ): [UserRating] @topic(name: "user-rating-range",
-                           keyArgument: "userId",
-                           rangeFrom: "ratingFrom",
-                           rangeTo: "ratingTo")
+    productPriceInTime(
+        productId: Int
+        timestampFrom: Int
+        timestampTo: Int
+    ): [Product] @topic(name: "product-price-range",
+                        keyArgument: "productId",
+                        rangeFrom: "timestampFrom",
+                        rangeTo: "timestampTo")
 }
-type UserRating {
-    userId: Int!
-    purchaseId: String!
-    purchase: Purchase @topic(name: "purchase", keyField: "purchaseId")
-    rating: Int
+
+type Product {
+    productId: Int!
+    name: String
+    description: String
+    price: Price
+    timestamp: Int
+}
+
+type Price {
+    total: Float
+    currency: String
 }
 ```

-Let's start with the new type called `UserRating`.
-It describes a numerical rating a given user assigns
-to a specific purchase previously made (identified by `purchaseId`).
+As you can see, the `Product` type has been extended.
+It contains a timestamp that can describe
+the price of the product at a given time.
 However, the most notable changes are in the `Query` type.
-First, (`userRatings`) has new fields: `ratingFrom` and `ratingTo`.
+First, the query (`productPriceInTime`) has new fields: `timestampFrom` and `timestampTo`.
 Second, the `@topic` directive has changed:
-In the query `userRatings`, you declare the two fields that describe your desired range
-(here, the rating range).
+In the query `productPriceInTime`, you declare the two fields that describe your desired range
+(here, the timestamp range).
 These field values are later assigned to two new parameters of the `@topic` directive,
 `rangeFrom` and `rangeTo` respectively.

-In our example, `ratingFrom` and `ratingTo` follow the naming scheme _field**From**_ and _field**To**_
+In our example, `timestampFrom` and `timestampTo` follow the naming scheme _field**From**_ and _field**To**_
 where _field_ is the field declared in the topic creation command (see later step 3).
 Following this convention is not mandatory.
 You can name the parameters that define your range as you wish.
-However, we think that following this pattern increases readability.
+However, we suggest to follow this pattern to increase readability.

 When you execute a range query, you receive a list of entries.
 Therefore, the return type of the query is a list of _UserRating_.
@@ -70,14 +84,22 @@ quick gateway apply example -f schema.gql

 To use range queries, you must set the `--range-field` parameter when creating the topic.
 Under the hood, Quick creates additional data structures that enable the execution of range queries.
-Use the Quick CLI as follows:
+!!! Note
+    Because of the change in the `Product` type,
+    you must delete the `product` topic
+    (if you created it before)
+    and create it again.
+    To delete the topic,
+    use the following command:
+    `quick topic delete product`
+To create a topic with the new parameter, use the Quick CLI as follows:
 ```
-quick topic create user-rating-range --key int --value schema --schema example.UserRating --range-field rating
+quick topic create product-price-range --key int --value schema --schema example.Product --range-field timestamp
 ```

 Note that `--range-field` links a particular field you can later use for range queries.
-In our example, the `rating` field of the `UserRating` is linked with a range.
-Tha changes in the `Query` described above refer to this field you define here with `--range-field`.
+In our example, the `timestamp` field of the `Product` is linked with a range.
+The changes in the `Query` described above refer to this field you define here with `--range-field`.

 `--range-field` is an optional flag.
 If you do not specify it, Quick can solely return values for a given key.
@@ -94,106 +116,158 @@ visit the developer [section on ranges](../../../developer/range-query-details.m

 ## Execute the query

-Before executing our range query, we need some data ;)
-You can send purchases and ratings into Quick using [the ingest service](ingest-data.md).
-If you followed the previous parts of this guide,
-you should already have data in the `purchase` topic.
-If you didn't, please complete the [section about ingesting data](ingest-data.md)
-and add some purchases:
+Before executing our range query, you need some data ;)
+You can send products into Quick using [the ingest service](ingest-data.md).

-The command below sends ratings to the `user-rating-range` topic.
+The command below sends products to the `product-price-range` topic.
 ```shell
- curl --request POST --url "$QUICK_URL/ingest/user-rating-range" \
+ curl --request POST --url "$QUICK_URL/ingest/product-price-range" \
   --header "content-type:application/json" \
   --header "X-API-Key:$QUICK_API_KEY"\
-  --data "@./ratings.json"
+  --data "@./products.json"
 ```
-Here is an example of the `ratings.json` file:
-??? "Example `ratings.json`"
-    ```
+Here is an example of the `products.json` file:
+??? "Example `products.json`"
+    ```
     [
         {
-            "key": 1,
+            "key": 111,
             "value": {
-                "userId": 1,
-                "purchaseId": "abc",
-                "rating": 7
+                "productId": 111,
+                "name": "T-Shirt",
+                "description": "black",
+                "price": {
+                    "total": 14.99,
+                    "currency": "DOLLAR"
+                },
+                "timestamp": 1
             }
         },
         {
-            "key": 2,
+            "key": 111,
             "value": {
-                "userId": 2,
-                "purchaseId": "def",
-                "rating": 2
+                "productId": 111,
+                "name": "T-Shirt",
+                "description": "black",
+                "price": {
+                    "total": 19.99,
+                    "currency": "DOLLAR"
+                },
+                "timestamp": 2
             }
         },
         {
-            "key": 2,
+            "key": 222,
             "value": {
-                "userId": 2,
-                "purchaseId": "ghi",
-                "rating": 6
+                "productId": 222,
+                "name": "Jeans",
+                "description": "Non-stretch denim",
+                "price": {
+                    "total": 79.99,
+                    "currency": "EURO"
+                },
+                "timestamp": 1
             }
         },
         {
-            "key": 2,
+            "key": 333,
             "value": {
-                "userId": 2,
-                "purchaseId": "jkl",
-                "rating": 1
+                "productId": 333,
+                "name": "Shoes",
+                "description": "Sneaker",
+                "price": {
+                    "total": 99.99,
+                    "currency": "DOLLAR"
+                },
+                "timestamp": 1
             }
+        },
+        {
+            "key": 111,
+            "value": {
+                "productId": 111,
+                "name": "T-Shirt",
+                "description": "black",
+                "price": {
+                    "total": 24.99,
+                    "currency": "DOLLAR"
+                },
+                "timestamp": 3
+            }
+        },
+        {
+            "key": 222,
+            "value": {
+                "productId": 222,
+                "name": "Jeans",
+                "description": "Non-stretch denim",
+                "price": {
+                    "total": 99.99,
+                    "currency": "EURO"
+                },
+                "timestamp": 2
+            }
+        },
+        {
+            "key": 111,
+            "value": {
+                "productId": 111,
+                "name": "T-Shirt",
+                "description": "black",
+                "price": {
+                    "total": 29.99,
+                    "currency": "DOLLAR"
+                },
+                "timestamp": 4
+            }
         }
     ]
     ```
-Let's now find purchases the client with `userId=2` was unsatisfied with.
-Assuming that a disappointing purchase has a rating lower than 5,
-you can execute the following query to obtain the results.
+
+Let's now find the prices for product `111` in the time-window `1` to `3`.
+!!! Note
+    The upper bound of a range is exclusive.
+    Therefore, we use `timestampTo:4`.
 ```graphql
 query {
-    userRatings(userId: 2, ratingFrom:1, ratingTo:4) {
-        userId
-        rating
-        purchase {
-            purchaseId
-            productId
-            price {
-                total
-                currency
-            }
-        }
+  productPriceInTime(productId:111, timestampFrom:1, timestampTo:4) {
+    productId,
+    price
+    {
+      total
     }
+    timestamp
+  }
 }
 ```
-Here you go - this is the list of poorly rated purchases.
+
+Here you go - this is the list of the desired products.
 ```json
 [
   {
-    "userId": 2,
-    "rating": 2,
-    "purchase": {
-      "purchaseId": "def",
-      "productId": 123,
-      "price": {
-        "total": 30,
-        "currency": "DOLLAR"
-      }
-    }
+    "productId": 111,
+    "price": {
+      "total": 14.99
+    },
+    "timestamp": 1
   },
   {
-    "userId": 2,
-    "rating": 4,
-    "purchase": {
-      "purchaseId": "jkl",
-      "productId": 456,
-      "price": {
-        "total": 99.99,
-        "currency": "DOLLAR"
-      }
-    }
+    "productId": 111,
+    "price": {
+      "total": 19.99
+    },
+    "timestamp": 2
+  },
+  {
+    "productId": 111,
+    "price": {
+      "total": 24.99
+    },
+    "timestamp": 3
   }
 ]
 ```
+
 ## Limitations

 The following listing describes the limitations of the current range queries implementation: