SensorWeb cloud

Fernando Jiménez Moreno edited this page Oct 11, 2016 · 23 revisions

Task breakdown and effort estimation.

SensorWeb API

Some of the tasks I can think of for this are:

  • Design and documentation of the API. So far we have only designed and documented the API for clients. We have the SensorThings API as a good base for our own public API, but we still need to define an API for all the other stuff surrounding it. This involves some deep reading and understanding of the user stories and the SensorThings API.
  • API endpoints implementation. Just implementing the barebones of the API. No handling logic.
  • SensorUp API wrapper implementation. Make our public API handling logic use SensorUp's API sandbox.

Estimation: 3 weeks.

SensorThings API test suite implementation.

There's a specification for a suite of conformance tests. It would be great to have them in place before starting work on our own SensorThings API implementation. It also makes sense to write them after building the SensorUp sandbox wrapper, so that we can validate the tests themselves against an existing implementation.

Estimation: 1 week.

SensorThings API implementation.

This task needs a deeper analysis of the API specification that I haven't done yet, so I can't really break it down into smaller tasks.

Estimation: 12 weeks.

Users Manager.

Even if we don't use an external IdP for the MVP, we need to implement the concept of a SensorWeb user. Without an IdP, these will be anonymous users. For each user we will store things like push endpoints, session tokens, the list of watched sensors, etc.
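As a rough illustration of the per-user data mentioned above, here is a minimal sketch of what such a record could look like. The class name, field names and the `idp_subject` field are assumptions made for the example, not a decided schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SensorWebUser:
    """Hypothetical user record; every field name here is an assumption."""

    # Random identifier, so anonymous users can exist without an IdP.
    user_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Identifier handed out by an external IdP; None for anonymous users.
    idp_subject: Optional[str] = None
    # Push endpoints registered by this user's devices.
    push_endpoints: List[str] = field(default_factory=list)
    # Currently valid session tokens.
    session_tokens: Set[str] = field(default_factory=set)
    # Identifiers of the sensors this user is watching.
    watched_sensors: Set[str] = field(default_factory=set)

    @property
    def is_anonymous(self) -> bool:
        return self.idp_subject is None

    def watch(self, sensor_id: str) -> None:
        self.watched_sensors.add(sensor_id)


user = SensorWebUser()
user.watch("sensor-1")
```

Keeping `idp_subject` optional would let the same record cover both the anonymous MVP case and a later IdP integration.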

Estimation: 2 weeks without IdP, plus 3 weeks for IdP integration.

Notifications manager.

This requires dealing with two external APIs (Apple's and Google's) and implementing the logic to handle and use the different push endpoints for each user.

Estimation: 3 weeks.

Sensors heartbeat monitor.

Estimation: 2 weeks.

Security.

This one is hard to estimate, and even though it is listed after all the other tasks, its priority should also be high. This should be a continuous effort, just like testing. Just to give a number, I would say that we can spend 5 weeks polishing all the security-related stuff (TLS, client credentials, tokens, scopes, security reviews, etc.).

Estimation: 5 weeks.

Admin console.

This one depends on how much functionality we want to add here. Since this is an internal tool and there are no user stories specifically related to it, we have more flexibility. IMHO, at the very least we need a way to generate and revoke API credentials (the basics of this are already done) and to manage users (if we end up having an IdP) and sensors. Basic authentication is also done, but we need to move to something more solid (likely FxA OAuth, with session management).

Estimation: 6 weeks.

Others

There are some other basic things that we need to do for any cloud service.

  • Backoff protocol
  • Load tests
  • Proper logging
  • Cache
  • Telemetry
  • Load balancing strategy. We will be periodically receiving requests from sensor stations every X seconds. We should implement a load balancing strategy that spreads sensor requests across different time windows.

Unknowns

This still concerns me. If we want to have data privacy, I think we need something like what I suggest here. We will need to check with rfkelly how much effort would be required to implement something like that or if he can think of an alternative and simpler approach.

Service cost estimate

Requests per second (RPS)

Sensor stations requests

We will know exactly how many active sensor stations we have at any given time, and we will also know the desired refresh rate for sensor observations. With these two values, and assuming that we can spread the sensor station requests evenly across the ideal refresh interval as suggested below, we will get an almost constant number of requests per second, which can be calculated by dividing the number of sensors by the desired refresh interval. So, for example, for 10K sensors with a refresh interval of 20 seconds, we will get ~500 requests per second.
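The calculation above can be written as a small helper. This is only a sketch: the function and parameter names are made up for the example, and it assumes requests are spread perfectly evenly over the refresh interval.

```python
def requests_per_second(active_stations: int, refresh_interval_s: float) -> float:
    """Steady-state request rate when stations are spread evenly over the interval."""
    if refresh_interval_s <= 0:
        raise ValueError("refresh_interval_s must be positive")
    return active_stations / refresh_interval_s


# The example from the text: 10K sensors pushing every 20 seconds.
rps = requests_per_second(10_000, 20)
print(rps)  # 500.0
```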

Sensor stations will be using the SensorThings API to push their observations to the cloud. A request to SensorUp's playground looks like this:

POST /st-playground/proxy/v1.0/Datastreams(275878)/Observations HTTP/1.1
Host: pg-api.sensorup.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://pg.sensorup.com/datastream.html?datastreamId=275878
Content-Type: application/json
St-P-Session-Token: XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX
Content-Length: 59
Cache-Control: no-cache

{"phenomenonTime":"2016-10-11T14:04:28.685Z","result":"25"}

The above request has 0.556KB of headers and a 0.333KB body. Apart from the PM2.5 air quality readings, we also want to get humidity and temperature, so let's say that each observation request takes ~1KB. Each of these requests generates a response of ~2.62KB (0.333KB for the body and 2.29KB for the headers), which we can round up to ~3KB. If we take the request rate described above (~500 RPS), we get:

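  • Data transfer in: ~1.3TB/month (500 requests/s × 1KB × ~2.6 million seconds in a 30-day month)
  • Data transfer out: ~3.9TB/month (500 responses/s × 3KB × ~2.6 million seconds in a 30-day month)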

These numbers only cover sensor stations pushing their observations. They don't include sensor management requests, although those should not increase the numbers significantly.
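The monthly transfer follows directly from the request rate and the message sizes. Here is a minimal sketch of that arithmetic, assuming a 30-day month and decimal units (1GB = 1,000,000KB); the function name is made up for the example.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def monthly_transfer_gb(rps: float, kb_per_message: float) -> float:
    """Monthly data volume in GB for a constant request rate."""
    total_kb = rps * kb_per_message * SECONDS_PER_MONTH
    return total_kb / 1_000_000  # KB -> GB (decimal units)


# ~500 RPS with ~1KB requests and ~3KB responses, as estimated in the text.
transfer_in_gb = monthly_transfer_gb(500, 1)
transfer_out_gb = monthly_transfer_gb(500, 3)
print(transfer_in_gb, transfer_out_gb)  # 1296.0 3888.0
```

Under these assumptions, 500 RPS works out to roughly 1.3TB in and 3.9TB out per month.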

Users requests

TBD

Storage cost per user

TBD

AWS cost simulation

TBD

Strategies to reduce the number of requests per second

Given that we control part of the clients that will be making requests to our cloud services, we can implement some strategies to reduce the number of these requests.

Sensor stations

  • Sensor stations may implement different strategies to limit the number of requests to the SW services. For example, a sensor station may choose to batch observations at night, or when an observation does not differ from the previous one.

  • The cloud service may also implement a protocol to balance the load by spreading the requests coming from sensor stations over time. Because we know the number of active sensor stations at any given time and we know the ideal refresh interval, we can create and hand out different time windows in which different sensor stations make their requests. This way we avoid traffic spikes caused by sensor stations, including the worst-case scenario where all the sensor stations push their observation data at exactly the same time. For example, let's say that we expect to have 1K active sensor stations and we want to get observations every 20 seconds. We would create 1K different windows of 20 milliseconds each for sensor observations. Each sensor station would need to ask the cloud service for one of these time windows in which to push its observations. With these numbers, if we follow this protocol, we will constantly be receiving observations from ~50 sensor stations every second. Note that for short refresh intervals (less than 2 or 3 seconds) and/or a low number of active sensors, it may not make sense to implement something like this.

  • While sensor stations may batch requests or use a long refresh interval, we still want to provide close to real-time data. To achieve that, the cloud service could ask a sensor station to send its latest queued observations by sending it a push notification (via MQTT?).
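The time window protocol described above could be sketched as follows. This is an illustration only: `PushWindow` and `build_windows` are hypothetical names, and the sketch assumes one window per active station, with windows expressed as millisecond offsets from the start of each refresh cycle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PushWindow:
    offset_ms: int  # start of the window, relative to the start of each refresh cycle
    length_ms: int  # duration of the window


def build_windows(active_stations: int, refresh_interval_s: float) -> List[PushWindow]:
    """Split one refresh cycle into equally sized windows, one per station."""
    if active_stations <= 0 or refresh_interval_s <= 0:
        raise ValueError("both arguments must be positive")
    interval_ms = int(refresh_interval_s * 1000)
    length_ms = interval_ms // active_stations
    if length_ms == 0:
        raise ValueError("too many stations for millisecond-sized windows")
    return [PushWindow(i * length_ms, length_ms) for i in range(active_stations)]


# The example from the text: 1K stations, observations every 20 seconds.
windows = build_windows(1_000, 20)
# Number of windows that start within the first second of the cycle.
stations_in_first_second = sum(1 for w in windows if w.offset_ms < 1000)
print(len(windows), windows[0].length_ms, stations_in_first_second)  # 1000 20 50
```

A station asking for a window would receive one entry of this list and push its observations at that offset within every cycle. How windows get reassigned when stations join or leave is left open here.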

Web app.

  • The SW web app can implement offline support (via the Service Worker and Cache APIs) to reduce the number of requests it needs to make to the cloud service.