ADD section to moleling tips about right id modeling #53

Open: wants to merge 3 commits into master
106 changes: 106 additions & 0 deletions docs/topics/modelling_considerations.md
There are some considerations to make when designing how to model a project in the platform. This document gives you
some hints to avoid the most common mistakes (use them as hints to guide your modelling, not as strict rules).

## Modeling your IDs in the right way

Entity IDs and attribute names should look like *real* IDs. In other words, using whitespace, accents or any other
unusual character in ID strings is a really bad idea. In fact, although such characters are allowed in the NGSIv1
API (for legacy reasons), they are forbidden in the NGSIv2 version of the API (check the "Field syntax restrictions"
section in the [NGSIv2 specification document](http://telefonicaid.github.io/fiware-orion/api/v2/stable) for details).
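As a rough illustration, the Python sketch below checks a candidate ID against the rules as we read them in the "Field syntax restrictions" section: plain ASCII only, no control characters, no whitespace, none of `&`, `?`, `/` or `#`, and a length between 1 and 256 characters. We also reject the characters that, as far as we understand, the specification forbids in any request (`<`, `>`, `"`, `'`, `=`, `;`, `(`, `)`). The function name `is_valid_ngsiv2_id` is hypothetical; treat the specification as the authoritative source for the exact character set.

```python
# Assumption: rules as we read the NGSIv2 "Field syntax restrictions" section;
# the specification is the authoritative source.
MAX_ID_LENGTH = 256
ID_FORBIDDEN = set("&?/#")           # explicitly excluded from ID fields
GENERAL_FORBIDDEN = set("<>\"'=;()")  # assumed to be forbidden in any request


def is_valid_ngsiv2_id(candidate: str) -> bool:
    """Return True if candidate looks like an acceptable NGSIv2 identifier."""
    if not 1 <= len(candidate) <= MAX_ID_LENGTH:
        return False
    for ch in candidate:
        code = ord(ch)
        if code > 127:                  # plain ASCII only: rejects accents, etc.
            return False
        if code < 32 or code == 127:    # control characters
            return False
        if ch.isspace():                # whitespace (including plain spaces)
            return False
        if ch in ID_FORBIDDEN or ch in GENERAL_FORBIDDEN:
            return False
    return True


if __name__ == "__main__":
    for candidate in ["Row12SeatB", "Row 12/Seat B", "Occupation status", "Señal", "status"]:
        print(f"{candidate!r}: {is_valid_ngsiv2_id(candidate)}")
```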

Why is this a bad idea? There are several reasons:

* The IoT Platform may use these identifiers (or strings derived from them) in places where such characters
are not allowed. For example, some persistence backends are based on databases that don't accept whitespace
or non-ASCII characters in table names.
* IDs may appear as part of URLs (e.g. the URL identifying an entity in the [Context Broker](../context_broker.md)),
and using whitespace or non-ASCII characters there makes those URLs more complex, since such characters have to be
percent-encoded.
Member Author


Probably this is the weakest part of the PR. More arguments or reinforcement of the existing ones are welcome.
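To make the URL problem concrete, the following sketch uses Python's standard `urllib.parse.quote` to build an entity URL from an ID such as "Row 12/Seat B". The `/v2/entities/{id}` path and the `localhost:1026` base URL are assumptions for illustration (1026 is the Context Broker's usual default port), and `entity_url` / `naive_entity_url` are hypothetical helpers. Spaces and non-ASCII characters must be percent-encoded, and a `/` inside the ID either has to be encoded as well or it silently adds an extra path segment.

```python
from urllib.parse import quote

# Assumption: hypothetical broker base URL with an NGSIv2-style entity path.
BASE_URL = "http://localhost:1026/v2/entities/"


def entity_url(entity_id: str) -> str:
    """Build an entity URL, percent-encoding every reserved or non-ASCII character."""
    # safe="" also encodes "/", which would otherwise be read as a path separator.
    return BASE_URL + quote(entity_id, safe="")


def naive_entity_url(entity_id: str) -> str:
    """Concatenate without encoding: spaces and '/' leak into the URL unchanged."""
    return BASE_URL + entity_id


if __name__ == "__main__":
    print(entity_url("Row12SeatB"))             # nothing to encode
    print(entity_url("Row 12/Seat B"))          # spaces and '/' become escapes
    print(entity_url("Señal"))                  # non-ASCII is UTF-8 percent-encoded
    print(naive_entity_url("Row 12/Seat B"))    # broken: raw spaces, extra path segment
```

With a plain ID like `Row12SeatB`, the encoded and naive URLs are identical, which is exactly why sticking to "real" IDs keeps things simple.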


Sometimes you may think you need IDs with whitespace and non-ASCII characters to render information
correctly, e.g. in a graphical user interface. For instance, suppose you have an entity that you want to display as
"Row 12/Seat B", with an attribute shown as "Occupation status" in your application, and you think that
modeling it in the following way is a good idea:


```json
{
  "type": "Seat",
  "isPattern": "false",
  "id": "Row 12/Seat B",
  "attributes": [
    {
      "name": "Occupation status",
      "type": "String",
      "value": "occupied"
    },
    ...
  ]
}
```

However, this is not a good idea. If you need descriptive text for entities or attributes, use dedicated
attributes (for entities) and metadata (for attributes) instead: their values are not IDs, so they don't have any
of the problems described above. Taking that into account, you could model the example above as follows:

```json
{
  "type": "Seat",
  "isPattern": "false",
  "id": "Row12SeatB",
  "attributes": [
    {
      "name": "description",
      "type": "String",
      "value": "Row 12/Seat B"
    },
    {
      "name": "status",
      "type": "String",
      "value": "occupied",
      "metadata": [
        {
          "name": "description",
          "type": "String",
          "value": "Occupation status"
        }
      ]
    },
    ...
  ]
}
```
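On the application side, the GUI can then read the human-readable texts from the `description` attribute and metadata instead of from the IDs. The Python sketch below works on the NGSIv1-style payload shown above (with the elided `...` attributes left out); the helper names `entity_label` and `attribute_label` are hypothetical, and falling back to the raw ID or name when no description exists is just one possible design choice.

```python
from typing import Any, Dict

Entity = Dict[str, Any]

# The recommended model from the example above, minus the elided attributes.
SEAT: Entity = {
    "type": "Seat",
    "isPattern": "false",
    "id": "Row12SeatB",
    "attributes": [
        {"name": "description", "type": "String", "value": "Row 12/Seat B"},
        {
            "name": "status",
            "type": "String",
            "value": "occupied",
            "metadata": [
                {"name": "description", "type": "String", "value": "Occupation status"}
            ],
        },
    ],
}


def entity_label(entity: Entity) -> str:
    """Return the entity's display text from its 'description' attribute, else its ID."""
    for attr in entity.get("attributes", []):
        if attr["name"] == "description":
            return attr["value"]
    return entity["id"]


def attribute_label(entity: Entity, attr_name: str) -> str:
    """Return an attribute's display text from its 'description' metadata, else its name."""
    for attr in entity.get("attributes", []):
        if attr["name"] == attr_name:
            for md in attr.get("metadata", []):
                if md["name"] == "description":
                    return md["value"]
            return attr_name
    raise KeyError(attr_name)


if __name__ == "__main__":
    status = next(a for a in SEAT["attributes"] if a["name"] == "status")
    # Display text comes from descriptions; the IDs stay clean.
    print(f"{entity_label(SEAT)} - {attribute_label(SEAT, 'status')}: {status['value']}")
```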

As a general guideline, you should use identifiers with the following properties:

* They **must** be unique: globally unique IDs are best if that's possible but, where they aren't, they
should at least be unique at the service level. It's also important to design the ID assignment process
so that the probability of an ID collision is as low as possible (e.g. a 128-bit (16-byte) UUID, usually
written as 32 hexadecimal digits, is far better than an 8-bit integer).

* They **should** never change (or only under extraordinary circumstances): the ID uniquely identifies
your entities, and not only the Context Broker but potentially many other systems may use it
to identify objects associated with them (this especially affects the persistence backends). That
turns any ID change into a potential data migration across multiple systems, with its associated
(usually very large) costs.

* They **should not** be tied to the data, as that coupling would make it easier to break either of the
first two rules. Even if you are completely sure that identifying your users by their driver's license numbers
is unique and immutable, chances are that the government will decide to change them; use a UUID instead.
That ensures uniqueness and, since the UUID belongs only to your system, you are the one
who decides when and how it may change (if it is allowed to change at all). Note that this documentation
does not use UUIDs, for didactic reasons, but in real use cases you should follow this recommendation.

* (*) They **should not** use the underscore (`_`) character: although the Context Broker accepts it, using it
in your IDs is a bad idea because the persistence backends also use the underscore for special
purposes. On the one hand, it is used as a concatenation character. On the other hand, it is used as a
replacement character when a character within the ID is not accepted by the persistence backend.

* (*) They **should** avoid uppercase letters when using PostgreSQL-based persistence backends, e.g.
CKAN (or Carto in the future): these usually convert uppercase letters to lowercase. This means that IDs
such as `Car` and `car` are different at the Context Broker level but the same at the persistence backend
level. Of course, if you are not planning to use PostgreSQL-based persistence backends, you can ignore this
advice.
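The sketch below illustrates these guidelines in Python. `uuid.uuid4().hex` yields a 128-bit UUID as 32 lowercase hexadecimal digits, so the result is independent of the data, free of underscores and already lowercase. `collision_probability` uses the standard birthday-bound approximation to compare an 8-bit ID space with the 122 random bits of a version 4 UUID. `backend_table_name` is a hypothetical, simplified stand-in for an underscore-based persistence naming scheme (it is not the real backend encoding); it only shows how `_` and case folding can make distinct IDs collide.

```python
import math
import uuid


def new_entity_id() -> str:
    """Generate an ID that is globally unique, not tied to the data, lowercase and '_'-free."""
    # uuid4().hex: 32 lowercase hex digits, i.e. a 128-bit UUID with 122 random bits.
    return uuid.uuid4().hex


def collision_probability(n_ids: int, random_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision among n_ids random IDs)."""
    x = n_ids * (n_ids - 1) / (2.0 * 2 ** random_bits)
    return -math.expm1(-x)  # 1 - exp(-x), accurate even when x is tiny


def backend_table_name(*parts: str) -> str:
    """Hypothetical, simplified persistence naming: join parts with '_' and lowercase."""
    return "_".join(parts).lower()


if __name__ == "__main__":
    print(new_entity_id())  # random on every run
    # 20 IDs drawn from an 8-bit space already collide more often than not.
    print(collision_probability(20, 8))
    # A billion UUIDv4s: the probability is vanishingly small.
    print(collision_probability(10**9, 122))

    # '_' as concatenator: two different (id, type) pairs get the same name.
    print(backend_table_name("room_1", "sensor"), backend_table_name("room", "1_sensor"))
    # Lowercasing: 'Car' and 'car' are distinct IDs but share one backend name.
    print(backend_table_name("Car", "vehicle") == backend_table_name("car", "vehicle"))
```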

Member Author


Maybe a fourth point about the inconvenience of using "_" in IDs should be included. Since the part that has "suffered" most from this in the past (and present :) is the persistence done by Cygnus, @frbattid is probably the best one to contribute.

The above considerations apply not only to entity IDs and attribute names but also to other pieces of context
information that play the role of an ID, in particular entity types, attribute types, metadata names and
metadata types.
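Because these rules also cover types and metadata names and types, a check has to walk every ID-like field of an entity, not just its `id`. The Python sketch below collects those fields from an NGSIv1-style payload like the ones above and reports the ones that break the guidelines in this section (non-ASCII characters, whitespace, `&`, `?`, `/`, `#`, and the underscore). `id_like_fields`, `problems` and `check_entity` are hypothetical helpers, and attribute and metadata *values* are deliberately not checked, since they are not IDs.

```python
from typing import Any, Dict, Iterator, List, Tuple

Entity = Dict[str, Any]


def id_like_fields(entity: Entity) -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every field that plays the role of an ID."""
    yield "entity id", entity["id"]
    yield "entity type", entity["type"]
    for attr in entity.get("attributes", []):
        yield "attribute name", attr["name"]
        yield f"type of attribute {attr['name']!r}", attr["type"]
        for md in attr.get("metadata", []):
            yield f"metadata name in {attr['name']!r}", md["name"]
            yield f"metadata type in {attr['name']!r}", md["type"]


def problems(value: str) -> List[str]:
    """List the guideline violations found in one ID-like value."""
    found = []
    if not value.isascii():
        found.append("non-ASCII character")
    if any(ch.isspace() for ch in value):
        found.append("whitespace")
    if any(ch in "&?/#" for ch in value):
        found.append("reserved character")
    if "_" in value:
        found.append("underscore")
    return found


def check_entity(entity: Entity) -> List[str]:
    """Return one human-readable report per offending ID-like field."""
    reports = []
    for location, value in id_like_fields(entity):
        issues = problems(value)
        if issues:
            reports.append(f"{location} {value!r}: {', '.join(issues)}")
    return reports


if __name__ == "__main__":
    bad_seat = {
        "type": "Seat",
        "id": "Row 12/Seat B",
        "attributes": [{"name": "Occupation status", "type": "String", "value": "occupied"}],
    }
    for report in check_entity(bad_seat):
        print(report)
```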

(*) These guidelines will no longer apply once the new encoding is enabled in the IoT Platform. The new
encoding uses a concatenator other than the underscore, and characters that are not accepted (including uppercase letters) are encoded following a Unicode-based format.

## The IoT Platform is centered in context

The central piece of the IoT Platform is the Context Broker: a component that lets you store and query information