From 128db8c65a8c8a55e99d6fef0946380209133b1f Mon Sep 17 00:00:00 2001
From: Fermin Galan Marquez <fermin.galanmarquez@telefonica.com>
Date: Tue, 29 Nov 2016 13:39:53 +0100
Subject: [PATCH 1/3] ADD section to modeling tips about right id modeling

---
 docs/topics/modelling_considerations.md | 72 +++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
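A minimal sketch of the check this section motivates, not part of the patch. It assumes the NGSIv2 "Field syntax restrictions" allow 1 to 256 plain ASCII characters and exclude control characters, whitespace, `&`, `?`, `/` and `#`; confirm those limits against the specification before relying on them. The function name `id_problems` is hypothetical.

```python
# Assumed limits, taken from a reading of the NGSIv2 "Field syntax restrictions" section.
MAX_LENGTH = 256
FORBIDDEN_PUNCTUATION = set("&?/#")


def id_problems(value):
    """Return a list of reasons why `value` is a poor ID (empty list if none)."""
    problems = []
    if not 1 <= len(value) <= MAX_LENGTH:
        problems.append(f"length {len(value)} is outside 1..{MAX_LENGTH}")
    for ch in sorted(set(value)):
        code = ord(ch)
        if code > 0x7E:
            # Covers DEL (0x7F) and everything beyond plain ASCII, e.g. accents.
            problems.append(f"control or non-ASCII character {ch!r}")
        elif code <= 0x20:
            # Covers the space character and the ASCII control range.
            problems.append(f"whitespace or control character {ch!r}")
        elif ch in FORBIDDEN_PUNCTUATION:
            problems.append(f"forbidden character {ch!r}")
    return problems


if __name__ == "__main__":
    for candidate in ("Row 12/Seat B", "Occupation status", "Row12SeatB", "status"):
        print(candidate, "->", id_problems(candidate) or "ok")
```

With these assumptions, the two IDs from the first example in the added section are reported as problematic, while the ones from the second example pass.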

diff --git a/docs/topics/modelling_considerations.md b/docs/topics/modelling_considerations.md
index cfa9702..addbecc 100644
--- a/docs/topics/modelling_considerations.md
+++ b/docs/topics/modelling_considerations.md
@@ -3,6 +3,78 @@
 There are some considerations to make when designing how to model a project in the platform. This document will give you
 some hints to avoid most common mistakes (use them as hints to guide your modelling, not as strict rules).
 
+## Modeling your IDs in the right way
+
+Entity ids and attribute names should behave like *real* IDs. In other words, using whitespace, accents or
+any other unusual character in ID strings is a really bad idea. In fact, although the NGSIv1 API allows it
+(for legacy reasons), the NGSIv2 version of the API forbids it (check the "Field syntax restrictions"
+section in the [NGSIv2 specification document](http://telefonicaid.github.io/fiware-orion/api/v2/stable) for details).
+
+Why is this a bad idea? There are several reasons:
+
+* Take into account that the IoT Platform would use those identifiers (or strings derived from them) in
+  places where such characters are not allowed. For example, some persistence backends are based on databases which
+  don't accept whitespace or non-ASCII characters in table or database names.
+* IDs may appear as part of URLs (e.g. the URL identifying an entity at the [Context Broker](../context_broker.md)), and
+  using non-ASCII characters in those places makes these URLs more complex.
+
+Sometimes you may think you need to use ids with whitespace and non-ASCII characters to render that information
+correctly, e.g. in a graphical user interface. For instance, you have an entity that you want to show as
+"Row 12/Seat B" with an attribute "Occupation status" in your application and you may think that
+modeling it in the following way is a good idea:
+
+
+      {
+         "type": "Seat",
+         "isPattern": "false",
+         "id": "Row 12/Seat B",
+         "attributes": [
+           {
+             "name": "Occupation status",
+             "type": "String",
+             "value": "occupied"
+           },
+           ...
+           }
+         ]
+      }
+ 
+However, it is not a good idea. If you need descriptive texts for entities or attributes, then use specific
+attributes and metadata for them respectively, whose values are not ids and don't have any of the problems
+described above. Taking that into account, you could model in the following for the example above:
+
+      {
+         "type": "Seat",
+         "isPattern": "false",
+         "id": "Row12SeatB",
+         "attributes": [
+           {
+             "name": "description",
+             "type": "String",
+             "value": "Row 12/Seat B"
+           },
+           {
+             "name": "status",
+             "type": "String",
+             "value": "occupied",
+             "metadata": [
+               {
+                 "name": "description",
+                 "type": "String",
+                 "value": "Occupation status"
+               }
+             ]
+           },
+           ...
+           }
+         ]
+      }
+
+The above consideration applies to entity ids and attribute names but also to other pieces of context 
+information which take the role of an ID, in particular to entity types, attribute types, metadata names and 
+metadata types. 
+
+ 
 ## The IoT Platform is centered in context
 
 The central piece of the IoT Platform is the Context Broker: a component that lets you store and query information

From 9fe3e5dbe29590183cf12d7d4d543716130df083 Mon Sep 17 00:00:00 2001
From: Fermin Galan Marquez <fermin.galanmarquez@telefonica.com>
Date: Tue, 29 Nov 2016 16:56:35 +0100
Subject: [PATCH 2/3] ADD section to modeling tips about right id modeling

---
 docs/topics/modelling_considerations.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
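A minimal sketch of the uniqueness argument made in this patch, not part of it. It assumes IDs are generated with Python's standard `uuid` module and compares collision odds using the usual birthday approximation; the helper names `new_entity_id` and `collision_probability` are hypothetical.

```python
import math
import uuid


def new_entity_id():
    """Return a random UUID rendered as 32 lowercase hexadecimal characters."""
    return uuid.uuid4().hex


def collision_probability(count, random_bits):
    """Approximate the chance that `count` random IDs of `random_bits` bits collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)); expm1 keeps very
    small probabilities from rounding to zero.
    """
    space = 2 ** random_bits
    return -math.expm1(-count * (count - 1) / (2 * space))


if __name__ == "__main__":
    print(new_entity_id())
    # 20 entities drawn from an 8-bit integer space.
    print(collision_probability(20, 8))
    # One billion entities drawn from the 122 random bits of a version 4 UUID.
    print(collision_probability(10 ** 9, 122))
```

An ID produced this way carries no information about the entity, so nothing outside the system can force it to change.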

diff --git a/docs/topics/modelling_considerations.md b/docs/topics/modelling_considerations.md
index addbecc..dbebccb 100644
--- a/docs/topics/modelling_considerations.md
+++ b/docs/topics/modelling_considerations.md
@@ -70,11 +70,30 @@ described above. Taking that into account, you could model in the following for
          ]
       }
 
+As a general guideline, you should use identifiers with the following properties:
+
+* They must be unique: It's better to have globally unique IDs if that's possible, but, for the cases 
+  where they aren't, they should be at least unique at the service level. It's also important to design 
+  the process of ID assignment so that the probability of generating an ID collision is as lower as 
+  possible (i.e.: it's better to have a 16 bytes hexadecimal UUID than an 8bit integer).
+
+* They should never change (or do it under extraordinary circumstances): the ID uniquely identifies 
+  your entities, and not only the Context Broker, but potentially multiple other systems may use it 
+  to identify objects associated to it. That turns any change in the ID into a potential migration 
+  of data in multiple systems, with it associated (usually very large) costs.
+
+* Shouldn't be tied to the data: as that bound would make it easier to brake any of the two first 
+  rules. Even if you are completely sure that identifying your users with their Driver Licenses 
+  is unique and immutable, chances are that the Government choose to change it; use a UUID instead. 
+  That will ensure uniqueness and, since the UUID only belongs to the system, you will be the one 
+  who decides when and how it may change (if it is allowed to do it at all). However, note that we 
+  are not using UUID in this documentation for didactic reasons but in real usage use case 
+  you should consider this recommendation.
+
 The above consideration applies to entity ids and attribute names but also to other pieces of context 
 information which take the role of an ID, in particular to entity types, attribute types, metadata names and 
 metadata types. 
 
- 
 ## The IoT Platform is centered in context
 
 The central piece of the IoT Platform is the Context Broker: a component that lets you store and query information

From d0389f7800304e005f855007500fad757f9ad313 Mon Sep 17 00:00:00 2001
From: frbattid <francisco.romerobueno@telefonica.com>
Date: Wed, 30 Nov 2016 11:18:17 +0100
Subject: [PATCH 3/3] [modelling_considerations.md] Add more guidelines for
 entity IDs definition

---
 docs/topics/modelling_considerations.md | 29 +++++++++++++++++++------
 1 file changed, 22 insertions(+), 7 deletions(-)
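A minimal sketch of how the two starred guidelines added here could be checked over a set of IDs, not part of the patch. It assumes the backend's case handling can be approximated with `str.lower()`, which may not match what a given backend really does; the function names are hypothetical.

```python
from collections import defaultdict


def case_insensitive_clashes(ids):
    """Group IDs that become identical once uppercase letters are lowered.

    Returns a dict mapping each lowered form to the sorted original IDs that
    share it; forms produced by a single ID are left out.
    """
    groups = defaultdict(set)
    for identifier in ids:
        groups[identifier.lower()].add(identifier)
    return {
        lowered: sorted(originals)
        for lowered, originals in groups.items()
        if len(originals) > 1
    }


def ids_with_underscore(ids):
    """Return the sorted IDs that contain the underscore character."""
    return sorted({identifier for identifier in ids if "_" in identifier})


if __name__ == "__main__":
    sample = ["Car", "car", "Room1", "room_1"]
    print(case_insensitive_clashes(sample))
    print(ids_with_underscore(sample))
```

Running such a check before creating entities gives an early warning that two distinct Context Broker IDs may end up sharing one table or column in the persistence backend.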

diff --git a/docs/topics/modelling_considerations.md b/docs/topics/modelling_considerations.md
index dbebccb..e327550 100644
--- a/docs/topics/modelling_considerations.md
+++ b/docs/topics/modelling_considerations.md
@@ -72,27 +72,42 @@ described above. Taking that into account, you could model in the following for
 
 As a general guideline, you should use identifiers with the following properties:
 
-* They must be unique: It's better to have globally unique IDs if that's possible, but, for the cases 
+* They **must** be unique: It's better to have globally unique IDs if that's possible, but, for the cases 
   where they aren't, they should be at least unique at the service level. It's also important to design 
   the process of ID assignment so that the probability of generating an ID collision is as lower as 
   possible (i.e.: it's better to have a 16 bytes hexadecimal UUID than an 8bit integer).
 
-* They should never change (or do it under extraordinary circumstances): the ID uniquely identifies 
+* They **should** never change (or do so only under extraordinary circumstances): the ID uniquely identifies
   your entities, and not only the Context Broker, but potentially multiple other systems may use it 
-  to identify objects associated to it. That turns any change in the ID into a potential migration 
-  of data in multiple systems, with it associated (usually very large) costs.
+  to identify objects associated with it (this especially affects the persistence backends). That
+  turns any change in the ID into a potential migration of data in multiple systems, with its associated
+  (usually very large) costs.
 
-* Shouldn't be tied to the data: as that bound would make it easier to brake any of the two first 
-  rules. Even if you are completely sure that identifying your users with their Driver Licenses 
+* They **should not** be tied to the data: such a binding would make it easier to break either of the
+  first two rules. Even if you are completely sure that identifying your users with their Driver Licenses
   is unique and immutable, chances are that the Government choose to change it; use a UUID instead. 
   That will ensure uniqueness and, since the UUID only belongs to the system, you will be the one 
   who decides when and how it may change (if it is allowed to do it at all). However, note that we 
   are not using UUID in this documentation for didactic reasons but in real usage use case 
   you should consider this recommendation.
+  
+* (*) They **should not** use the underscore (`_`) character: although the Context Broker accepts it, using
+  it as part of your IDs is a bad idea, since the persistence backends also use the underscore for special
+  purposes. On the one hand, it is used as a concatenator character. On the other hand, it is used as a
+  replacement character when a character within the ID is not accepted by the persistence backends.
+  
+* (*) They **should** avoid uppercase letters when PostgreSQL-based persistence backends are in use, e.g.
+  CKAN (or Carto in the future): these usually convert uppercase letters into lowercase. This means IDs
+  such as `Car` and `car` will be different at Context Broker level, but the same at persistence backend
+  level. Of course, if you are not planning to use PostgreSQL-based persistence backends, ignore this
+  advice.
 
 The above consideration applies to entity ids and attribute names but also to other pieces of context 
 information which take the role of an ID, in particular to entity types, attribute types, metadata names and 
-metadata types. 
+metadata types.
+
+(*) This guideline will no longer apply once the new encoding is enabled in the IoT Platform. The new encoding
+uses a concatenator other than the underscore, and unaccepted characters (including uppercase letters) are encoded following Unicode format.
 
 ## The IoT Platform is centered in context