Follow the principle of data minimization for events based on explicit subscriptions #295

Open
shilpa-padgaonkar opened this issue Sep 5, 2024 · 12 comments
Assignees
Labels
correction correction in documentation Spring25 subscriptions

Comments

@shilpa-padgaonkar
Collaborator

Problem description
Currently, the basic event data included in explicit subscriptions (e.g. in device-roaming-status-subscriptions.yaml) includes the device info. Do we need to send back the device info? Could we not restrict the event information to what it is meant to deliver, e.g. in this case the roaming status? This would be an approach compliant with the principle of data minimization/privacy-by-design.

Expected behavior
The event includes only the data that is "really" needed and is stripped of the default/basic info.

Alternative solution

Additional context
The device-roaming-status-subscriptions is just used in this issue as an example, but this is meant for all APIs that offer events based on explicit subscriptions.

@shilpa-padgaonkar shilpa-padgaonkar added the correction correction in documentation label Sep 5, 2024
@hdamker
Collaborator

hdamker commented Sep 5, 2024

+1 ... there is also the question of which device parameters are being sent back here ... all properties provided within the subscription request (potentially not all of them were used or validated) or only the one used to identify the device?

This issue also goes beyond subscriptions: we have the same topic in regular API responses (e.g. getSession in quality-on-demand). As an (interim?) step, we have added the following note in the sessionInfo definition for v0.11.0 (last version within the release PR):

Note that the device object is defined as optional and will only to be returned if provided in createSession. If more than one type of device identifier was provided, only one identifier will be returned (at implementation choice and with the original value provided in createSession)

@shilpa-padgaonkar
Collaborator Author

These 2 issues #150 and #151 might become obsolete if we proceed with the above approach.

@eric-murray
Collaborator

I think including device info in the response should always be an option, but we need clearer guidelines as to when such information should be included and when it should be omitted.

For example:

  • If a 3-legged access token is used, no device info should be included in the response
  • If a 2-legged access token is used, a single device identifier should always be included in the response
    • If the API caller provided multiple device identifiers, only one of these identifiers should be in the response (an implementation choice)
    • The device info returned should match a value provided by the API caller (perhaps reformatted to the implementation's preferred format)

The reason for the 2-legged rule would be because the API caller can provide multiple device identifiers (which may or may not identify the same device). To avoid information leaks (e.g. confirming whether or not a given IP address is currently allocated to a given MSISDN), the implementation should just "pick one" and return that, as the API caller needs to know which one is being used by the implementation.

Possibly we could say that, if only one device identifier is provided, then that does not need to be included in the response, but I think it cleaner to always include the device identifier that is being used, even if the implementation had no choice.
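
For illustration only, the rules above could be sketched in Python as follows. This is a minimal sketch, not an agreed implementation: the function name, the `token_type` strings and the preference order of identifier fields are assumptions made for the example.

```python
from typing import Optional

# Assumed preference order; each implementation would choose its own.
PREFERRED_ORDER = ("phoneNumber", "networkAccessIdentifier", "ipv4Address", "ipv6Address")


def device_for_response(token_type: str, request_device: Optional[dict]) -> Optional[dict]:
    """Return the device object to put in the response, or None to omit it."""
    if token_type == "3-legged":
        # The device is implied by the token, so nothing is echoed back.
        return None
    if not request_device:
        raise ValueError("a 2-legged request needs an explicit device identifier")
    # "Pick one": return exactly one identifier, with a value the caller provided,
    # without confirming whether the other identifiers match it.
    for key in PREFERRED_ORDER:
        if key in request_device:
            return {key: request_device[key]}
    raise ValueError("no supported device identifier provided")
```

With both `phoneNumber` and `ipv4Address` in the request, this sketch returns only `phoneNumber`, so the response says which identifier is in use but nothing about whether the two belong to the same device.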

The 3-legged rule above is related to the (unresolved) discussion #259, where I currently think a 3-legged token and explicit device identifiers should always result in an error (to avoid the API caller verifying whether or not an explicit device identifier matches the token). If that rule was adopted, then the API caller will not have been able to provide explicit device info in addition to the token, and hence there would be no device info to return.

@bigludo7
Collaborator

bigludo7 commented Oct 1, 2024

In order to move forward on this one and #300, I pushed Eric's proposal one step further with the behavior for each operation:

For POST Subscription

  • if 3-legged access token is used,
    • presence of device in the request should be discussed in API misuse #259
    • no device info must be included in the response
  • if 2-legged access token is used,
    • presence of a device identifier is mandatory. If several are provided, as of now, we have the 422 error here. Reading @eric-murray's comment, I understood we could take a different approach in future (sending back an error if several identifiers are sent, whether or not they target the same device).
    • presence of a device identifier in the response is mandatory, and it should be the same identifier type as the one provided in the request.

For GET/subscriptions/{id} & DELETE/subscriptions/{id}

  • if a 3-legged access token is used, the server checks whether the subscription is associated with the device identifier from the token - if not --> 404. In the response for the GET, the device info must not be filled.
  • if a 2-legged access token is used, we check that the subscription was created with the same client id as the GET/DELETE request. In the response for the GET, the device info must be present (we use the one provided in the POST body or token)

For GET/subscriptions/

  • if a 3-legged access token is used, the server sends back the list of subscriptions for the device identifier associated with the token. In the response the device identifier must not be filled.
  • if a 2-legged access token is used, we send back the list of subscriptions created with the same client id as the GET request. In the response the device identifier must be valued.
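
A rough Python sketch of the GET /subscriptions/{id} behavior described above, for illustration. The `Subscription` and `Caller` types are hypothetical, and returning 404 on a client id mismatch in the 2-legged case is an assumption (the status for that case is not stated above).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subscription:
    id: str
    client_id: str
    device: dict  # identifier the subscription was created for


@dataclass
class Caller:
    client_id: str
    token_device: Optional[dict] = None  # set only when a 3-legged token is used


def get_subscription(sub: Optional[Subscription], caller: Caller) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, response body) for GET /subscriptions/{id}."""
    if sub is None:
        return 404, None
    if caller.token_device is not None:
        # 3-legged: the subscription must belong to the device bound to the token.
        if sub.device != caller.token_device:
            return 404, None
        return 200, {"subscriptionId": sub.id}  # device info not filled
    # 2-legged: the subscription must have been created by the same client.
    if sub.client_id != caller.client_id:
        return 404, None  # assumed status, not specified in this thread
    return 200, {"subscriptionId": sub.id, "device": sub.device}
```

The same ownership checks would apply to DELETE; only the GET response body differs between the two token types.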

To be completed
In the notification itself (sent to the sink), the presence of device info is discussed at API group level depending on the event type (as far as I know, when the event is for a device, its identifier must be provided in the event).

@eric-murray
Collaborator

eric-murray commented Oct 2, 2024

@bigludo7
When a 2-legged access token is used, and the API consumer provides multiple device identifiers in the device object, we want to avoid the situation where the API provider confirms whether or not these device identifiers all match. This is to avoid that API consumers use the API to confirm whether or not, for example, a given IP / port is currently allocated to a given MSISDN.

But multiple device identifiers are allowed in the device object for a good reason - the API consumer may not know which device identifiers are supported by the API provider. Much as we would like all device identifiers to be supported by all API providers, I doubt this will ever be the case. So by providing multiple device identifiers, there is more chance that the API provider can fulfil the API request.

There are three scenarios to consider when multiple device identifiers are provided in the device object:

  • all identifiers identify the same end user
  • the identifiers identify two or more different end users
  • the API provider does not know if the identifiers match or not (e.g. if they cannot support identifying the end user from ip / port)

My preferred "universal" solution for all these scenarios is simply - where multiple device identifiers are provided in the device object, the API provider can just pick one and return that device identifier in the response. Hence the response does not confirm that all device identifiers match, but does confirm which is being used.

The alternative is to only allow a single device identifier to be included in the request device object. The API provider would simply return an error if it could not support that identifier. I think API consumers might find this frustrating, but this would also work. And, in this case, there is no need to include the device identifier in the API response.
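
A minimal Python sketch of this alternative, assuming the error is reported as a 422 with a code, as mentioned elsewhere in this thread. The function name and the two code strings are hypothetical placeholders, not agreed values.

```python
from typing import Optional, Tuple


def check_device(device: dict, supported: set) -> Optional[Tuple[int, str]]:
    """Return (status, error code) if the device object is rejected, else None."""
    if len(device) != 1:
        # Exactly one identifier is allowed, so no matching is ever confirmed.
        return 422, "MULTIPLE_IDENTIFIERS"  # hypothetical code
    (name,) = device  # the single identifier field name
    if name not in supported:
        return 422, "UNSUPPORTED_IDENTIFIER"  # hypothetical code
    return None
```

Because the provider either uses the one identifier it was given or rejects the request, the response carries no new information about the device and can omit it.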

@bigludo7
Collaborator

bigludo7 commented Oct 7, 2024

Thanks a lot @eric-murray for the explanation

Having discussed this with @patrice-conil, from the Orange side we have a slight preference for the second option, because it seems to us more straightforward for the consumer to get back an error (422 with code). The first option works, but for us it requires some additional understanding from the consumer.

As I plan to update the Commonalities part: is option 2 a blocker for you? I am also looking for the view of the 'usual suspects' on this topic: @shilpa-padgaonkar and @PedroDiez

@shilpa-padgaonkar
Collaborator Author

shilpa-padgaonkar commented Oct 16, 2024

@bigludo7 and @eric-murray I would prefer the second option as well.

I see that this discussion has mainly focused on responses to POST/GET/DELETE Subscription.
I would like to consider the event detail structure as a part of this discussion too.
E.g. is the inclusion of the device object required in the RoamingStatus or RoamingChangeCountry schemas when we already include the subscriptionId?
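
To make the question concrete, here is a small Python sketch of a roaming status event detail built with and without the device object. The `roaming` field name and the `include_device` switch are assumptions for the example, not the actual schema definition.

```python
from typing import Optional


def build_event_data(subscription_id: str, roaming: bool,
                     device: Optional[dict] = None,
                     include_device: bool = False) -> dict:
    """Build the event detail, adding the device only when explicitly requested."""
    data = {"subscriptionId": subscription_id, "roaming": roaming}
    if include_device and device is not None:
        data["device"] = device
    return data


# Minimized form: the sink correlates the event via subscriptionId only.
minimal = build_event_data("sub-1", True, device={"phoneNumber": "+123456789"})
# Current form: the device identifier travels with every notification.
full = build_event_data("sub-1", True, device={"phoneNumber": "+123456789"},
                        include_device=True)
```

In the minimized form the consumer has to keep its own mapping from subscriptionId to device, or look it up through the subscription resource.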

@bigludo7
Collaborator

@shilpa-padgaonkar

I would like to consider the event detail structure as a part of this discussion too.
E.g. is the inclusion of the device object required in the RoamingStatus or RoamingChangeCountry schemas when we already include the subscriptionId?

Yes this is a fair point to discuss.

About specifically providing the device identifier in the event:

  • Avoids too many calls (agreed, the notification carries the subscription id and the device identifier could be retrieved via GET /subscriptions/{id}, but this is one additional call),
  • Provides flexibility, because we can imagine managing multi-device subscriptions in future

The pending question is which identifier to provide when the subscription has been created with a 3-legged access token and no identifier is present in the body of the request. I guess in this case we could recommend using the identifier used in the login_hint?

Of course this point is relevant only for device-based notifications. I'm wondering whether we could provide specific guidelines at Commonalities level for the event detail structure, but adding a line such as "The principle of data minimization must be considered by the WG during event structure design" is a must.

@bigludo7
Collaborator

@shilpa-padgaonkar I've raised a discussion for Geofencing event here: camaraproject/DeviceLocation#270

@PedroDiez
Collaborator

PedroDiez commented Oct 24, 2024

Sorry for being late here:

Regarding Subscriptions:

3-legged. Aligned with you all

2-legged. One doubt about the model. Regardless of the two options described by Eric, I assume that the device object is always defined/designed with the different identifier options (IpAddress, phoneNumber,...) and the key point is whether to allow only one identifier or more than one in the subscription request.

Option 1: "My preferred "universal" solution for all these scenarios is simply - where multiple device identifiers are provided in the device object, the API provider can just pick one and return that device identifier in the response. Hence the response does not confirm that all device identifiers match, but does confirm which is being used."

Has the "PROS" of providing flexibility to API Client, and the "CONS" of checking consistency between the identifiers provided

Option 2: "The alternative is to only allow a single device identifier to be included in the request device object. The API provider would simply return an error if it could not support that identifier. I think API consumers might find this frustrating, but this would also work. And, in this case, there is no need to include the device identifier in the API response."

Has the "PROS" of simplicity and the "CONS" that interoperability issues may arise among Telco Operators

Not an easy question. Let me think a bit about it and provide feedback

Regarding event-notification:

Our initial view is to always return device information, regardless of whether the subscription is created 2-legged or 3-legged, in order to make things easy for API consumers, so they can directly process the semantic information contained in the event notification.

@eric-murray
Collaborator

@camaraproject/commonalities_maintainers @PedroDiez @hdamker @bigludo7
I'd like to try and progress this issue.

As I see it, we have two options when a 2-legged access token and explicit device object are used:

  • Option 1: Allow the API consumer to specify multiple device identifiers in the device object, and all APIs that accept device as a device identifier should include in their response which one of the possible identifiers was used; or
  • Option 2: Restrict the device object to a single identifier at a time from the available options. If the API consumer knows multiple identifiers, they would need to specify them one by one until they find one that works. This will increase latency.

My preference is Option 1, as this is more flexible for the API consumer. The risk with Option 2 is that an application happily uses one of the device identifiers without issue only to find one day that the application fails because the end user is a customer of an API provider that does not support that identifier and the developer did not implement retrying with an alternative identifier.

However, I accept that this is also a risk with Option 1, as the application may still choose to only provide a single device identifier, even if they know more. Either way, better guidance for developers on the use of the device object is required. If latency is an issue, likely the phoneNumber only option can be used as the device identifier, when there is then only one option.
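
To illustrate the latency concern with Option 2, here is a sketch of the retry loop a developer would need on the client side. The `send` callable stands in for the real API call and is hypothetical, as is the assumption that an unsupported identifier is signalled with a 422.

```python
from typing import Callable, List, Optional, Tuple


def call_with_fallback(identifiers: List[Tuple[str, object]],
                       send: Callable[[dict], int]) -> Tuple[int, Optional[str], int]:
    """Try one identifier per request; return (status, identifier used, attempts)."""
    attempts = 0
    for name, value in identifiers:
        attempts += 1
        status = send({name: value})  # one full round trip per identifier
        if status != 422:  # assumed: 422 means "identifier not supported"
            return status, name, attempts
    return 422, None, attempts


def fake_provider(device: dict) -> int:
    """Stand-in provider that only supports phoneNumber."""
    return 201 if "phoneNumber" in device else 422


result = call_with_fallback(
    [("ipv4Address", "203.0.113.7"), ("phoneNumber", "+123456789")],
    fake_provider,
)
```

Here the request only succeeds on the second round trip. With Option 1 the same two identifiers would go in a single request and the provider would report which one it used.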

So I can live with either option. I'd propose to resolve this with a simple "vote" in these comments.

@bigludo7
Collaborator

bigludo7 commented Dec 6, 2024

Thanks @eric-murray for restarting the conversation.

I still have a preference for option 2, but can live with either without any problem.

6 participants