tutorial-02: updates response examples
- tutorial 02 typos and wording
- tutorial 07 typos and wording
- child of inveniosoftware#50
topless authored and Pablo Panero committed May 13, 2020
1 parent 433c537 commit 735e372
Showing 6 changed files with 68 additions and 46 deletions.
57 changes: 39 additions & 18 deletions 02-invenio-tour/README.md
@@ -4,7 +4,7 @@ In this tutorial, we will explore Invenio from a user's perspective. We will
see the different parts of the user interface, explore the REST API and create
and search for records.

### Table of Contents
## Table of Contents

- [Step 1: Prerequisites](#step-1-prerequisites)
- [Step 2: Register a user](#step-2-register-a-user)
@@ -88,16 +88,23 @@ $ curl -k --header "Content-Type: application/json" \
https://localhost:5000/api/records/?prettyprint=1

{
"created": "2019-03-15T12:22:19.497592+00:00",
"created": "2020-05-12T00:28:31.140277+00:00",
"id": "1",
"links": {"self": "https://localhost:5000/api/records/1"},
"links": {
"files": "https://localhost:5000/api/records/1/files",
"self": "https://localhost:5000/api/records/1"
},
"metadata": {
"contributors": [{"name": "Doe, John"}],
"contributors": [
{
"name": "Doe, John"
}
],
"id": "1",
"title": "Some title"
},
"revision": 0,
"updated": "2019-03-15T12:22:19.497596+00:00"
"updated": "2020-05-12T00:28:31.140284+00:00"
}
```

@@ -108,17 +115,24 @@ request:
$ curl -k https://localhost:5000/api/records/1?prettyprint=1

{
"created": "2019-03-15T12:22:19.497592+00:00",
"created": "2020-05-12T00:28:31.140277+00:00",
"id": "1",
"links": {"self": "https://localhost:5000/api/records/1"},
"links": {
"files": "https://localhost:5000/api/records/1/files",
"self": "https://localhost:5000/api/records/1"
},
"metadata": {
"contributors": [{"name": "Doe, John"}],
"contributors": [
{
"name": "Doe, John"
}
],
"id": "1",
"title": "Some title"
},
"revision": 0,
"updated": "2019-03-15T12:22:19.497596+00:00"
}
"updated": "2020-05-12T00:28:31.140284+00:00"
}
```

We can search through all records by making a `GET /api/records/` request:
@@ -142,22 +156,29 @@ $ curl -k https://localhost:5000/api/records/?prettyprint=1
"hits": {
"hits": [
{
"created": "2019-03-15T12:22:19.497592+00:00",
"created": "2020-05-12T00:28:31.140277+00:00",
"id": "1",
"links": {"self": "https://localhost:5000/api/records/1"},
"links": {
"files": "https://localhost:5000/api/records/1/files",
"self": "https://localhost:5000/api/records/1"
},
"metadata": {
"contributors": [{"name": "Doe, John"}],
"contributors": [
{
"name": "Doe, John"
}
],
"id": "1",
"title": "Some title"
},
"revision": 0,
"updated": "2019-03-15T12:22:19.497596+00:00"
"updated": "2020-05-12T00:28:31.140284+00:00"
}
],
"total": 1
},
"links": {
"self": "https://localhost:5000/api/records/?page=1&sort=mostrecent&size=10"
"self": "https://localhost:5000/api/records/?sort=mostrecent&size=10&page=1"
}
}
```
@@ -167,8 +188,8 @@ address this in later sessions.

## Step 5: Search and Record UI

Of the REST API is not the only way to display information on records. If you
navigate to the frontpage and click the search button you will go the
The REST API is not the only way to display information on records. If you
navigate to the front page and click the search button, you will go to the
records search page:

![](./images/frontpage-search.png)
Expand All @@ -183,7 +204,7 @@ Let's create some more records, to demonstrate the querying capabilities:
![](./images/search-more-records.png)

Let's say we want to get all of the records written by "Smith". We could
naively type `Smith` in the searchbox, but that would give us all records that
naively type `Smith` in the search box, but that would give us all records that
contain the text "Smith" in any of their fields (even the title):

![](./images/search-query.png)
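For readers scripting the API, a fielded query can be built in plain Python. This is an illustrative sketch only; the `contributors.name` field path is an assumption based on the record metadata shown earlier:

```python
from urllib.parse import urlencode

# Hypothetical fielded search: restrict the query to the contributor name
# instead of the free-text "Smith" that matches every field.
# "contributors.name" is an assumed field path, not confirmed by the tutorial.
params = urlencode({"q": "contributors.name:Smith", "prettyprint": 1})
url = "https://localhost:5000/api/records/?" + params
print(url)
# → https://localhost:5000/api/records/?q=contributors.name%3ASmith&prettyprint=1
```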
39 changes: 20 additions & 19 deletions 07-data-models-new-field/README.md
@@ -1,8 +1,8 @@
# Tutorial 07 - Data models: Adding a new field

The goal of this tutorial is to learn how to update your datamodel. We will show how you going to
The goal of this tutorial is to learn how to update your data model. We will show how to
update your [`JSONSchema`](https://json-schema.org/) to store a new field in the DB and your ES mapping so you can search for it.
Moreover we will learn how [`Marshmallow`](https://marshmallow.readthedocs.io) schema can be used to validate your data.
Moreover, we will learn how [`Marshmallow`](https://marshmallow.readthedocs.io) schema can be used to validate your data.

### Table of Contents

@@ -51,9 +51,9 @@ We edit the `my_site/records/jsonschemas/records/record-v1.0.0.json` file:

## Step 3: Update the Elasticsearch mapping

Now our system can validate and store our new field correctly in the DB. Now we want to enable search of a record by this new field. For this purpose we need to update the mapping of our ES index in order to add our new field. By doing that we let ES know how to handle our new field(field type, searchable, analyzable, etc.).
Our system can now validate and store our new field correctly in the DB. Next, we want to enable search of a record by this new field. For this purpose, we need to update the mapping of our ES index to add the new field. By doing that we let ES know how to handle it (field type, searchable, analyzable, etc.).

So, in order to update the mapping we edit the `my_site/records/mappings/v6/records/record-v1.0.0.json` file:
So, in order to update the mapping we edit the `my_site/records/mappings/v7/records/record-v1.0.0.json` file:

```diff
"properties": {
@@ -100,7 +100,7 @@ Now in order to **reflect our changes** in our system we will have to run the fo
$ ./scripts/setup
```

With that we start fresh our DB and ES along with the updated information about schemas and mappings.
We have created and started a new DB and ES along with the updated schemas and mappings.

**Checkpoint 1**: At this point we are able to create a new record in our system that includes our new field. Let's do this!

@@ -125,10 +125,10 @@ After executing the command you should see in your console the following output:
{"status": 400, "message": "Validation error.", "errors": [{"field": "owner", "message": "Not a valid integer."}]}
```

It seems that our request wasn't successfull. By checking again the error message we can see that in our request
It seems that our request wasn't successful. Checking the error message again, we can see that in our request
the `owner` field has a `string` value rather than an `integer`. But who validated that?

If you remember we talked earlier about `loaders` and specifically we updated our `marshmallow` schema. But how are these related? To answer that let's talk about what is the responsibility of the `loaders`. It's purpose is to load the data which is passed when doing a request to create a new record, validate them by using our `marshmallow` schema and transform them in our internal representation.
If you remember, we talked earlier about `loaders` and specifically we updated our `marshmallow` schema. But how are the two related? To answer that, let's talk about the responsibility of the `loaders`. Their purpose is to load the data received in a record-creation request, validate it using our `marshmallow` schema, and transform it into our internal representation.

With that in mind: when we made our request, the loader used the marshmallow `MetadataSchemaV1` schema to validate the incoming data, noticed that the `owner` field wasn't the integer it was declared to be, and threw an error.

@@ -145,23 +145,25 @@ Now you should see an output similar to the below:

```json
{
"created": "2019-03-13T10:39:57.345889+00:00",
"id": "2",
"_bucket": "60f2b083-8f7b-4aba-a00e-09e3bb3e12af",
"created": "2019-11-25T07:41:44.620275+00:00",
"id": "1",
"links": {
"self": "https://localhost:5000/api/records/2"
"files": "https://localhost:5000/api/records/1/files",
"self": "https://localhost:5000/api/records/1"
},
"metadata": {
"contributors": [
{
"name": "Doe, John"
}
],
"id": "2",
"id": "1",
"owner": 1,
"title": "Some title"
},
"revision": 0,
"updated": "2019-03-13T10:39:57.345895+00:00"
"updated": "2019-11-25T07:41:44.620282+00:00"
}
```
**Tip**: Save the `id` value of this response somewhere!
@@ -178,9 +180,8 @@ Our new record was successfully created!
./scripts/server
```

Let's search now for our newly created record. Replace the `<id>` with the `id` of the
record we had created in the previous step. Run the following command:

Let's search now for our newly created record. Replace the `<id>` with the actual `id` of the
record we created in the previous step. Run the following command:

```bash
$ curl -k "https://localhost:5000/api/records/?q=owner:<id>"
@@ -219,13 +220,13 @@ $ curl -k "https://localhost:5000/api/records/?q=owner:<id>"
## Step 7: Manipulate response using serializers

Here you can see the data returned from the search regarding our record. The output of each result is controlled
by our `serializers`. These entities are responsible for getting the internal representation of our data and transform
it in an output that the users of our system will see when they will use our api.
by our `serializers`. These entities are responsible for getting the internal representation of our data and transforming
it into an output that the users of our system will see when they use our API.

The `serializers` are using also a `Marshmallow` schema to validate the internal data and define which information should be returned. This means that if we want to hide some data and don't return every available information all we need to do is just
The `serializers` also use a `Marshmallow` schema to validate the internal data and define which information should be returned. This means that if we want to hide some data rather than returning everything available, all we need to do is
not define it in the `Marshmallow` schema.

For example, from the above output we want to display only when the record was `updated` and not when was `created`. In order to
For example, from the above output, we want to display only when the record was `updated` and not when it was `created`. In order to
do that we need to update our `RecordSchemaV1` schema as below:

```diff
18 changes: 9 additions & 9 deletions 08-data-models-from-scratch/README.md
@@ -1,7 +1,7 @@
# Tutorial 08 - Data models: Build from scratch

In this session we will learn how to build a new data model from scratch. During that
process we will see how to create a new REST **module** for our model and provide functonalities
process we will see how to create a new REST **module** for our model and provide functionalities
such as storing and searching.

### Table of contents
@@ -54,7 +54,7 @@ First thing we need to do is to create an extension called `Authors` and registe

## Step 3: Internal representation: JSONSchema and Elasticsearch mappings

Now that we have our extension registered, we need to tell Invenio how the internal representation of our data model is. To do so, we use [a JSONSchema](author_module/my_site/authors/jsonschemas/authors/author-v1.0.0.json) and [an Elasticsearch mapping](author_module/my_site/authors/mappings/v6/authors/author-v1.0.0.json): the former to validate the internal JSON format and the latter to tell Elasticsearch what shape our data model has so it can handle correctly its values.
Now that we have our extension registered, we need to tell Invenio what the internal representation of our data model is. To do so, we use [a JSONSchema](author_module/my_site/authors/jsonschemas/authors/author-v1.0.0.json) and [an Elasticsearch mapping](author_module/my_site/authors/mappings/v7/authors/author-v1.0.0.json): the former to validate the internal JSON format and the latter to tell Elasticsearch what shape our data model has so it can handle its values correctly.


### Actions
@@ -81,12 +81,12 @@ Now that we have our extension registered, we need to tell Invenio how the inter
+ search_index='authors',
```

By doing this we told Invenio to register our new schema and mapping. We are also defining the name of the Elasticsearch index which will be created to enable author search.
By doing this we told Invenio to register our new schema and mapping. We are also defining the name of the Elasticsearch index which will be created to enable author search.


## Step 4: External representation: loaders and serializers

So far we have a new extension which defines how our data model is **stored** and **searchable**, but have not yet provided means to transform this data when its received or served by Invenio. To do so, we will introduce two new concepts: **loaders** whose responsibility is to transform incoming data to the internal format, and **serializers** which will be in charge of transforming the internal data to a different format, based on our needs.
So far we have a new extension which defines how our data model is **stored** and **searchable**, but have not yet provided means to transform this data when it's received or served by Invenio. To do so, we will introduce two new concepts: **loaders** whose responsibility is to transform incoming data to the internal format, and **serializers** which will be in charge of transforming the internal data to a different format, based on our needs.
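Conceptually (a plain-Python sketch, not Invenio's actual interfaces), a loader and a serializer are just two directed transformations around the internal format:

```python
# Hypothetical illustration only: the real Invenio loaders/serializers are
# registered via configuration and do much more (validation, content
# negotiation), but the direction of the data flow is the same.

def author_loader(incoming):
    """Transform a POSTed payload into the internal representation."""
    return {"metadata": {"name": incoming["name"]}}


def author_serializer(internal):
    """Transform the internal representation into the public output."""
    return {"name": internal["metadata"]["name"]}


payload = {"name": "Doe, John"}
assert author_serializer(author_loader(payload)) == payload
```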

### Actions

@@ -138,9 +138,9 @@ For creating and registering the **search serializers** we should:
+ },
```

During the first step we registered our **loader** in the configuration of our new `authors` endpoint. Now every time we try to create a new author the loader is going to transform the incoming data to match the internal representation of an author document in our system.
During the first step, we registered our **loader** in the configuration of our new `authors` endpoint. Now every time we try to create a new author the loader is going to transform the incoming data to match the internal representation of an author document in our system.

In the upcoming steps we created and registered our serializers. We splitted them in two categories: **Record serializers** and **Search serializers**. The first are used to **serialize** the internal representation of one specific record(e.g author) while the latter are transforming each record result of a search. They are capable of doing that by using again a `Marshmallow` schema which we will explain in detail in the next section.
In the subsequent steps, we created and registered our serializers. We split them into two categories: **Record serializers** and **Search serializers**. The former serialize the internal representation of one specific record (e.g. author), while the latter transform each record in a search result. They do that by again using a `Marshmallow` schema, which we will explain in detail in the next section.


## Step 5: Data validation: Marshmallow
@@ -151,14 +151,14 @@ In the previous section we have configured loaders and serializers but we also s

- Uncomment the code in the `my_site/authors/marshmallow/json.py`

Here we have added two classes which we made reference in the previous step, `AuthorMetadataSchemaV1` and `AuthorSchemaV1`. The first will take care of validating in coming author metadata and the second will take care of validating the author output format. Marshmallow is not mandatory, but highly recommended since it can do from simple validations to complex ones, for more information visit [Marshmallow documentation](https://marshmallow.readthedocs.io/en/2.x-line/).
Here we have added the two classes referenced in the previous step, `AuthorMetadataSchemaV1` and `AuthorSchemaV1`. The first takes care of validating incoming author metadata and the second of validating the author output format. Marshmallow is not mandatory, but highly recommended since it can handle anything from simple validations to complex ones; for more information visit the [Marshmallow documentation](https://marshmallow.readthedocs.io/en/2.x-line/).


## Step 6: Persistent identifiers

So far we have only cared about our content and its format, but we need to provide a way to retrieve our records. We do this by using PIDs; the difference from normal IDs is that they do not change over time, which avoids broken references.

Having identifiers which do not change over time adds certain complexity to the system. We need to have a way of generating new PIDs, which what we will reference as **minters** and we will also need a way of indentifying the PID inside the record metadata, this is what **fetchers** do.
Having identifiers which do not change over time adds certain complexity to the system. We need a way of generating new PIDs, which is what we will refer to as **minters**, and we also need a way of identifying the PID inside the record metadata, which is what **fetchers** do.
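As a conceptual sketch (plain Python, not Invenio's real minter/fetcher API): a minter assigns a fresh, never-reused identifier and stamps it into the record metadata, and a fetcher reads it back out.

```python
import itertools

# Hypothetical illustration only -- Invenio's minters/fetchers work against a
# PID store, not a counter, but the contract is the same.
_counter = itertools.count(1)


def author_pid_minter(data):
    """Mint a new PID and store it inside the record metadata."""
    pid = str(next(_counter))
    data["id"] = pid
    return pid


def author_pid_fetcher(data):
    """Recover the PID previously minted into the record metadata."""
    return data["id"]


record = {"name": "Doe, John"}
pid = author_pid_minter(record)
assert author_pid_fetcher(record) == pid
```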

### Actions

@@ -193,7 +193,7 @@ default_endpoint_prefix=True,

This is how we are registering our new minter and fetcher making them available.

**Important**: the value of the `pid_minter` and the `pid_fetcher` defined in `config.py` should match exactly with the entrypoint names defined in `setup.py`. Also we should make sure that the `pid_type` value and the `RECORDS_REST_ENDPOINTS` endpoint key match exactly.
**Important**: the value of the `pid_minter` and the `pid_fetcher` defined in `config.py` should match exactly with the entrypoint names defined in `setup.py`. Also, we should make sure that the `pid_type` value and the `RECORDS_REST_ENDPOINTS` endpoint key match exactly.


## Step 7: Create an author
