This repository has been archived by the owner on May 10, 2023. It is now read-only.

data models scratch: review and update for Invenio v3.2 and ES 7
topless authored and Pablo Panero committed May 14, 2020
1 parent 49e90d9 commit 677b424
Showing 10 changed files with 123 additions and 141 deletions.
16 changes: 6 additions & 10 deletions 08-data-models-from-scratch/README.md
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ In this session we will learn how to build a new data model from scratch. During
process we will see how to create a new REST **module** for our model and provide functionalities
such as storing and searching.

### Table of contents
## Table of contents

- [Step 1: Bootstrap exercise](#step-1-bootstrap-exercise)
- [Step 2: Create an Authors flask extension](#step-2-create-an-Authors-flask-extension)
@@ -42,6 +42,7 @@ First thing we need to do is to create an extension called `Authors` and registe

- Uncomment the code in `my_site/authors/ext.py`
- Uncomment the following section in `setup.py`:

```diff
'invenio_base.api_apps': [
'my_site = my_site.records:Mysite',
@@ -56,7 +57,6 @@ First thing we need to do is to create an extension called `Authors` and registe

Now that we have our extension registered, we need to tell Invenio what the internal representation of our data model looks like. To do so, we use [a JSONSchema](author_module/my_site/authors/jsonschemas/authors/author-v1.0.0.json) and [an Elasticsearch mapping](author_module/my_site/authors/mappings/v7/authors/author-v1.0.0.json): the former validates the internal JSON format, and the latter tells Elasticsearch what shape our data model has so that it can handle its values correctly.


### Actions

- Uncomment the entrypoints in `setup.py`:
@@ -83,7 +83,6 @@ Now that we have our extension registered, we need to tell Invenio how the inter

By doing this we told Invenio to register our new schema and mapping. We are also defining the name of the Elasticsearch index which will be created to enable author search.
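
The mappings touched by this commit target Elasticsearch 7, which no longer accepts a named document type between `mappings` and the field definitions. The sketch below shows that reshaping on a trimmed copy of the author mapping. It assumes a mapping body with exactly one type level, and `strip_mapping_type` is a hypothetical helper written for illustration, not part of Invenio or the Elasticsearch client.

```python
import json


def strip_mapping_type(body):
    """Return a copy of an ES 5/6 style mapping body without its type level.

    Hypothetical helper for illustration; assumes at most one document type.
    """
    mappings = body.get("mappings", {})
    # A typeless (ES 7 style) mapping has "properties" directly under "mappings".
    if "properties" in mappings:
        return dict(body)
    if len(mappings) != 1:
        raise ValueError("expected exactly one document type under 'mappings'")
    # Unpack the single type body, e.g. the value stored under "author-v1.0.0".
    (type_body,) = mappings.values()
    result = dict(body)
    result["mappings"] = type_body
    return result


# Trimmed version of the old author mapping, with its document type level.
old_style = {
    "mappings": {
        "author-v1.0.0": {
            "date_detection": False,
            "numeric_detection": False,
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "text"},
            },
        }
    }
}

new_style = strip_mapping_type(old_style)
print(json.dumps(new_style, indent=2))
```

Applying the helper to an already typeless mapping returns it unchanged, so it can be run over a mixed set of mapping files.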


## Step 4: External representation: loaders and serializers

So far we have a new extension which defines how our data model is **stored** and **searchable**, but we have not yet provided a means to transform this data when it is received or served by Invenio. To do so, we will introduce two new concepts: **loaders**, whose responsibility is to transform incoming data to the internal format, and **serializers**, which are in charge of transforming the internal data to a different format, based on our needs.
@@ -142,7 +141,6 @@ During the first step, we registered our **loader** in the configuration of our

In the upcoming steps, we created and registered our serializers. We split them into two categories: **record serializers** and **search serializers**. The first **serializes** the internal representation of one specific record (e.g. an author), while the latter transforms each record in a search result. Both do so by again using a `Marshmallow` schema, which we will explain in detail in the next section.
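
To make the division of labour concrete, here is a plain-Python sketch of a loader, a record serializer and a search serializer. It does not use the Invenio or Marshmallow APIs: the function names, the `"1"` identifier and the search envelope (`hits.hits` and `hits.total`) are assumptions for illustration, and only the record output shape (`id`, `metadata`, `created`, `updated`) is modelled on the responses shown in Step 7.

```python
import json


def author_loader(payload):
    """Turn incoming JSON text into the internal format (hypothetical)."""
    data = json.loads(payload)
    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'name' is required and must be a non-empty string")
    internal = {"name": name.strip()}
    # "organization" is optional in this sketch.
    if "organization" in data:
        internal["organization"] = data["organization"]
    return internal


def serialize_record(pid_value, record, created, updated):
    """Record serializer: wrap one internal record for the REST response."""
    metadata = dict(record, id=pid_value)
    return {
        "id": pid_value,
        "metadata": metadata,
        "created": created,
        "updated": updated,
    }


def serialize_search(results):
    """Search serializer: apply the record serializer to every hit."""
    hits = [serialize_record(*result) for result in results]
    return {"hits": {"hits": hits, "total": len(hits)}}


internal = author_loader('{"name": "  John Doe "}')
single = serialize_record(
    "1", internal, "2019-03-17T16:01:07+00:00", "2019-03-17T16:01:07+00:00"
)
listing = serialize_search(
    [("1", internal, "2019-03-17T16:01:07+00:00", "2019-03-17T16:01:07+00:00")]
)
print(json.dumps(single, indent=2))
```

Keeping the loader and the serializers as separate functions means the stored format can change without touching what clients send or receive.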


## Step 5: Data validation: Marshmallow

In the previous section we configured loaders and serializers, and we also set up our first validation check by referencing two Marshmallow schemas. These schemas make sure that the data has the correct format both when it arrives in the system and when it is returned to the user.
@@ -153,7 +151,6 @@ In the previous section we have configured loaders and serializers but we also s

Here we have added the two classes referenced in the previous step, `AuthorMetadataSchemaV1` and `AuthorSchemaV1`. The first validates incoming author metadata and the second validates the author output format. Marshmallow is not mandatory, but it is highly recommended since it handles everything from simple validations to complex ones. For more information, see the [Marshmallow documentation](https://marshmallow.readthedocs.io/en/2.x-line/).


## Step 6: Persistent identifiers

So far we have only cared about our content and its format, but we also need to provide a way to retrieve our records. We do this with persistent identifiers (PIDs): unlike normal IDs, they do not change over time, which avoids broken references.
@@ -195,7 +192,6 @@ This is how we are registering our new minter and fetcher making them available.

**Important**: the values of `pid_minter` and `pid_fetcher` defined in `config.py` must exactly match the entrypoint names defined in `setup.py`. Also, make sure that the `pid_type` value and the `RECORDS_REST_ENDPOINTS` endpoint key match exactly.


## Step 7: Create an author

To reflect our changes in the database and Elasticsearch, and to register our new entrypoints in Invenio, we need to run the following commands:
@@ -211,15 +207,15 @@ We can now create new authors:
```bash
$ curl -k --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Zacharias"}' \
--data '{"name":"John Doe"}' \
https://127.0.0.1:5000/api/authors/\?prettyprint\=1

{
"created": "2019-03-17T16:01:07.148176+00:00",
"id": "1",
"metadata": {
"id": "1",
"name": "Zacharias"
"name": "John Doe"
},
"updated": "2019-03-17T16:01:07.148181+00:00"
}
@@ -235,7 +231,7 @@ $ curl -k "https://127.0.0.1:5000/api/authors/?prettyprint=1"
"buckets": [
{
"doc_count": 1,
"key": "Zacharias"
"key": "John Doe"
}
],
"doc_count_error_upper_bound": 0,
@@ -249,7 +245,7 @@ $ curl -k "https://127.0.0.1:5000/api/authors/?prettyprint=1"
"id": "1",
"metadata": {
"id": "1",
"name": "Zacharias"
"name": "John Doe"
},
"updated": "2019-03-17T15:55:53.927761+00:00"
}
Original file line number Diff line number Diff line change
@@ -12,5 +12,3 @@
in Elasticsearch. You need to provide one mapping per major version of
Elasticsearch you want to support.
"""

from __future__ import absolute_import, print_function

This file was deleted.

Original file line number Diff line number Diff line change
@@ -5,6 +5,4 @@
# My site is free software; you can redistribute it and/or modify it under
# the terms of the MIT License; see LICENSE file for more details.

"""Mappings for Elasticsearch 5.x."""

from __future__ import absolute_import, print_function
"""Mappings for Elasticsearch 7.x."""
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
{
"mappings": {
"date_detection": false,
"numeric_detection": false,
"properties": {
"$schema": {
"type": "text",
"index": false
},
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"organization": {
"type": "keyword"
},
"_created": {
"type": "date"
},
"_updated": {
"type": "date"
}
}
}
}
Original file line number Diff line number Diff line change
@@ -12,5 +12,3 @@
in Elasticsearch. You need to provide one mapping per major version of
Elasticsearch you want to support.
"""

from __future__ import absolute_import, print_function
Original file line number Diff line number Diff line change
@@ -5,6 +5,4 @@
# My site is free software; you can redistribute it and/or modify it under
# the terms of the MIT License; see LICENSE file for more details.

"""Mappings for Elasticsearch 5.x."""

from __future__ import absolute_import, print_function
"""Mappings for Elasticsearch 7.x."""
Original file line number Diff line number Diff line change
@@ -1,29 +1,27 @@
{
"mappings": {
"author-v1.0.0": {
"date_detection": false,
"numeric_detection": false,
"properties": {
"$schema": {
"type": "text",
"index": false
},
"id": {
"type": "keyword"
},
"name": {
"type": "text"
},
"organization": {
"type": "text"
},
"_created": {
"type": "date"
},
"_updated": {
"type": "date"
}
"date_detection": false,
"numeric_detection": false,
"properties": {
"$schema": {
"type": "text",
"index": false
},
"id": {
"type": "keyword"
},
"name": {
"type": "text"
},
"organization": {
"type": "text"
},
"_created": {
"type": "date"
},
"_updated": {
"type": "date"
}
}
}
}
}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
{
"mappings": {
"date_detection": false,
"numeric_detection": false,
"properties": {
"$schema": {
"type": "text",
"index": false
},
"title": {
"type": "text",
"copy_to": "suggest_title"
},
"suggest_title": {
"type": "completion"
},
"id": {
"type": "keyword"
},
"owner": {
"type": "integer"
},
"keywords": {
"type": "keyword"
},
"publication_date": {
"type": "date",
"format": "date"
},
"contributors": {
"type": "object",
"properties": {
"ids": {
"type": "object",
"properties": {
"source": {
"type": "text"
},
"value": {
"type": "keyword"
}
}
},
"affiliations": {
"type": "text"
},
"role": {
"type": "keyword"
},
"email": {
"type": "text"
},
"name": {
"type": "text"
}
}
},
"_created": {
"type": "date"
},
"_updated": {
"type": "date"
}
}
}
}
