diff --git a/docs/topics/modelling_considerations.md b/docs/topics/modelling_considerations.md
index cfa9702..e327550 100644
--- a/docs/topics/modelling_considerations.md
+++ b/docs/topics/modelling_considerations.md
@@ -3,6 +3,112 @@
 There are some considerations to make when designing how to model a project in the platform. This document will give
 you some hints to avoid most common mistakes (use them as hints to guide your modelling, not as strict rules).
 
+## Modeling your IDs in the right way
+
+Entity ids and attribute names should look like *real* IDs. In other words, using whitespace, accents or
+any other unusual character in ID strings is a really bad idea. In fact, although that is allowed in the NGSIv1
+API (for legacy reasons), it is forbidden in the NGSIv2 version of the API (check the "Field syntax restrictions"
+section in the [NGSIv2 specification document](http://telefonicaid.github.io/fiware-orion/api/v2/stable) for details).
+
+Why is this a bad idea? There are several reasons:
+
+* The IoT Platform may use those identifiers (or strings derived from them) in
+  places where such characters are not allowed. For example, some persistence backends are based on databases which
+  don't accept whitespace or non-ASCII characters in table or database names.
+* IDs may appear as part of URLs (e.g. the URL identifying an entity at [Context Broker](../context_broker.md)) and
+  using non-ASCII characters in those places makes these URLs more complex.
+
+Sometimes you may think you need to use ids with whitespace and non-ASCII characters to render that information
+correctly, e.g. in a graphical user interface. For instance, you have an entity that you want to show as
+"Row 12/Seat B" with an attribute "Occupation status" in your application and you may think that
+modeling it in the following way is a good idea:
+
+
+    {
+      "type": "Seat",
+      "isPattern": "false",
+      "id": "Row 12/Seat B",
+      "attributes": [
+        {
+          "name": "Occupation status",
+          "type": "String",
+          "value": "occupied"
+        },
+        ...
+        }
+      ]
+    }
+
+However, it is not a good idea. If you need descriptive texts for entities or attributes, then use specific
+attributes and metadata for them respectively, whose values are not ids and don't have any of the problems
+described above. Taking that into account, you could model the example above in the following way:
+
+    {
+      "type": "Seat",
+      "isPattern": "false",
+      "id": "Row12SeatB",
+      "attributes": [
+        {
+          "name": "description",
+          "type": "String",
+          "value": "Row 12/Seat B"
+        },
+        {
+          "name": "status",
+          "type": "String",
+          "value": "occupied",
+          "metadata": [
+            {
+              "name": "description",
+              "type": "String",
+              "value": "Occupation status"
+            }
+          ]
+        },
+        ...
+        }
+      ]
+    }
+
+As a general guideline, you should use identifiers with the following properties:
+
+* They **must** be unique: It's better to have globally unique IDs if that's possible, but, for the cases
+  where they aren't, they should be at least unique at the service level. It's also important to design
+  the process of ID assignment so that the probability of generating an ID collision is as low as
+  possible (e.g. it's better to have a 16-byte hexadecimal UUID than an 8-bit integer).
+
+* They **should** never change (or do so only under extraordinary circumstances): the ID uniquely identifies
+  your entities, and not only the Context Broker, but potentially multiple other systems may use it
+  to identify objects associated with it (e.g. this especially affects the persistence backends). That
+  turns any change in the ID into a potential migration of data in multiple systems, with its associated
+  (usually very large) costs.
+
+* They **should not** be tied to the data: such a bond would make it easier to break either of the
+  first two rules. Even if you are completely sure that your users' driver's license numbers are
+  unique and immutable, chances are that the Government will choose to change them; use a UUID instead.
+  That will ensure uniqueness and, since the UUID only belongs to the system, you will be the one
+  who decides when and how it may change (if it is allowed to change at all). However, note that we
+  are not using UUIDs in this documentation for didactic reasons, but in real use cases
+  you should consider this recommendation.
+
+* (*) They **should not** use the underscore (`_`) character: although accepted by the Context Broker, it is a
+  bad idea to use it as part of your IDs since the persistence backends also use the underscore for special
+  purposes. On the one hand, it is used as the concatenator character. On the other hand, it is used as the
+  replacement character when a character within the ID is not accepted by the persistence backends.
+
+* (*) They **should** avoid using uppercase letters when using PostgreSQL-based persistence backends, e.g.
+  CKAN (or Carto in the future): they usually convert uppercase letters into lowercase. This means IDs
+  such as `Car` and `car` will be different at the Context Broker level, but the same at the persistence backend
+  level. Of course, if you are not considering using PostgreSQL-based persistence backends, ignore this
+  advice.
+
+The above considerations apply to entity ids and attribute names but also to other pieces of context
+information which take the role of an ID, in particular to entity types, attribute types, metadata names and
+metadata types.
+
+(*) This guideline will no longer apply once the new encoding is enabled in the IoT Platform. The new
+encoding uses a concatenator other than underscore, and unaccepted characters (including uppercase letters) are encoded following Unicode format.
+
 ## The IoT Platform is centered in context
 
 The central piece of the IoT Platform is the Context Broker: a component that lets you store and query information