
Cache Mongo-DB calls (in-memory and/or Redis) #926

Draft: jason-fox wants to merge 29 commits into master

Conversation

@jason-fox (Contributor) commented Oct 26, 2020

Mongo-DB access is slow. This PR minimizes the need to make database calls, by caching the last 1000 device calls and 100 group calls in a refreshable cache. This in turn reduces network traffic and increases maximum throughput.

Multiple caching strategies are supported - memCache, Redis or both. All parameters can be set via config or Docker ENV variables.

Adding dontCache=true as part of any Device or Group configuration will ensure that the data is never cached and therefore always retrieved from the MongoDB database.
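As a rough sketch of the wiring this description implies (not the PR's actual code - the cache-manager store package, the key and the shape of the dontCache check are assumptions), the two stores can be combined through cache-manager's multiCaching, with dontCache bypassing the cache entirely:

const cacheManager = require('cache-manager');
const redisStore = require('cache-manager-redis-store'); // assumed Redis store package

// In-memory cache: small, per-process, very fast
const memoryCache = cacheManager.caching({ store: 'memory', max: 1000, ttl: 10 });
// Redis cache: shared between IoT Agent instances
const redisCache = cacheManager.caching({ store: redisStore, host: 'localhost', port: 6379, ttl: 60 });
// "Both": consult memory first, then Redis, before falling through to MongoDB
const multiCache = cacheManager.multiCaching([memoryCache, redisCache]);

function getCachedDevice(device, fetchFromMongo, callback) {
    if (device.dontCache) {
        // dontCache=true provisioned on the Device/Group: always go to MongoDB
        return fetchFromMongo(callback);
    }
    // wrap() serves a cache hit if present, otherwise runs fetchFromMongo and stores the result
    multiCache.wrap(device.id, fetchFromMongo, callback);
}

With multiCaching the stores are checked in order, so the in-memory tier answers most reads and Redis acts as the shared second tier.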

Note that because the results of cacheable queries can potentially be served from multiple sources, they must be retrieved as plain JSON objects, not Mongoose Documents. Therefore the lean option has been enabled on the relevant read queries.

As the mongoose docs state:

Lean is great for high-performance, read-only cases,

so there should be a significant improvement even when a cache is not used.
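For illustration only, a lean read looks like the following (the query fields and values are invented for the example, not taken from the library):

const Device = require('../../model/Device');

// .lean() returns a plain JavaScript object instead of a full Mongoose Document,
// which is also what makes the result safe to store in (and restore from) a cache.
Device.findOne({ id: 'sensor001', service: 'smartcity' })
    .lean()
    .exec(function (error, device) {
        // device is a plain object here, not a hydrated Mongoose Document
    });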

@AlvaroVega (Member)

Did you test it with IoT Agents in HA environments? Is there really an improvement there?

@jason-fox (Contributor, Author) commented Oct 27, 2020

Last time I checked, stress-testing throughput for memory was noticeably higher than for MongoDB - this just piggybacks off that result.

The degree of improvement will depend on how you set up MongoDB - I guess a sharded environment with many read-only replicas would alleviate the situation somewhat, but why rely on end-users being knowledgeable enough to architect their way out of a problem (Hint: plenty of people seem to use my tutorial docker-compose files as the basis of their production architecture 😱 ). Of course you could always pay for scaling out more IoT Agents as well.

I agree this needs proper benchmarking. Maybe you could stress test it yourself or share a common HA benchmark configuration?

@mrutid (Member) commented Oct 27, 2020

Cache policies do have an impact on HA scenarios. We always deploy IoT Agents in HA (Active-Active) clusters. Any provided solution must be fully tested and studied in HA environments.

It's not a matter of benchmarking, it's a matter of accessibility and consistency (ACID). Take into account that full accessibility and consistency is expected among the Agent-Cluster (so if you deploy a device, the device must be available and updated throughout the whole cluster).

const Device = require('../../model/Device');
const async = require('async');
const cacheManager = require('cache-manager');
const memoryCache = cacheManager.caching({store: 'memory', max: 1000, ttl: 10/*seconds*/});
(Member)

Magic numbers. MUST be config.

(Member)

In addition, I think the possibility of not using the cache at all (e.g. something like cache: true|false in the configuration) should be provided.

(Contributor, Author)

Fixed in 9471f42 and subsequent pushes.

@jason-fox (Contributor, Author) commented Oct 27, 2020

If you deploy a device, the device must be available and updated throughout the whole cluster

IoT Agent A

  • When you first deploy a device in IoT Agent A - it will read from the DB.
  • If you send a measure for a device to IoT Agent A - it will retrieve it from the local cache.
  • If no cache hit occurs or the cache times out - it will retrieve from DB once again.

IoT Agent B

  • If you send a measure for a device to IoT Agent B - it will initially read from the DB.
  • If you send a second measure to IoT Agent B - it will retrieve it from the local cache.
  • If no cache hit occurs or the cache times out - it will retrieve from DB once again.
  1. If you use a single IoT Agent, this will act like a single database since there is only one source of truth.
  2. If you use multiple IoT Agent instances and provision once, it will act like a single database since there is only one source of truth.
  3. If you use multiple IoT Agent instances and provision multiple times, there is a short latency when updating/deleting. If I modify or delete a group or device then only the local cache of IoT Agent A will be busted. IoT Agent B will continue to retrieve the data from its cache until the record times out (a minimal read-path sketch follows this list).
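A minimal sketch of the read path described above, assuming the cache-manager memory store and invented key/field names (it is not the PR's actual code). Each IoT Agent instance consults its own cache and only falls back to MongoDB on a miss or once the TTL has expired, which is what produces the short settling latency in case 3:

const cacheManager = require('cache-manager');
const Device = require('../../model/Device');

// Each IoT Agent instance holds its own in-memory cache
const memoryCache = cacheManager.caching({ store: 'memory', max: 1000, ttl: 10 });

function getDevice(id, service, subservice, callback) {
    const key = service + ':' + subservice + ':' + id;
    memoryCache.wrap(
        key,
        function (cacheCallback) {
            // Cache miss (or expired entry): read from MongoDB as before
            Device.findOne({ id: id, service: service, subservice: subservice })
                .lean()
                .exec(cacheCallback);
        },
        callback
    );
}

function removeDevice(id, service, subservice, callback) {
    // Updates/deletes only bust the local cache; other instances keep serving
    // their cached copy until the TTL expires.
    memoryCache.del(service + ':' + subservice + ':' + id, function () {
        Device.deleteOne({ id: id, service: service, subservice: subservice }, callback);
    });
}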

@jason-fox (Contributor, Author)

So it depends on how you want the system to fall over. Do I want it to fail because it can't handle the throughput or do I want it to fail because it takes around 10 seconds for the data to settle?

- Ensure memoryCache is reset/busted on each test config
- Amend test config to use memoryCache to test additional path.
@jason-fox requested review from @mrutid and @fgalan on November 3, 2020
@jason-fox (Contributor, Author)

@mrutid @fgalan - PR updated, no magic numbers and an off-switch added.

  • Added config elements and Docker ENV variable equivalents (see the sketch after this list).
  • Added documentation.
  • Moved set-up to a separate function called from fiware-iotagent-lib.js - this means the existing tests don't need extra clean-up.
  • Amended a single test to use the cache to ensure test coverage.
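For orientation only, the added configuration might look roughly like the sketch below; the key names and the Docker variable names here are assumptions, so the config.js and documentation in the PR remain the source of truth:

// Hypothetical shape of the new cache configuration (names are illustrative)
config.cache = {
    memCache: {
        enabled: true,   // the off-switch mentioned above
        maxDevices: 1000,
        maxGroups: 100,
        ttl: 10          // seconds
    },
    redisCache: {
        enabled: false,
        host: 'localhost',
        port: 6379,
        ttl: 60
    }
};
// Docker ENV equivalents would presumably follow the usual IOTA_* pattern,
// e.g. IOTA_CACHE_MEMCACHE_ENABLED, IOTA_CACHE_MEMCACHE_TTL, ...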

@@ -22,7 +22,7 @@
*/
'use strict';

var iotAgentLib = require('../../../lib/fiware-iotagent-lib'),
var iotAgentLib = require('../../../lib/fiware-iotagent-lib'),
(Member)

Suggested change
var iotAgentLib = require('../../../lib/fiware-iotagent-lib'),
var iotAgentLib = require('../../../lib/fiware-iotagent-lib'),

(Contributor, Author)

Whitespace fixed by prettier - 3c1190c, 60d7dca

@fgalan (Member) commented Nov 5, 2020

A MongoDB cache for IOTAs can be an interesting feature. However, let me clarify that for us the only scenario that makes sense in real-world utilization is number 3 (a cluster of several IoT Agent nodes sharing the same DB, doing provisions over time, not only an initial one). Numbers 1 and 2 are not realistic.

Taking this into account, the following requirements should be covered by a cache implementation as the one suggested in this PR:

  1. It MUST be allowed to enable/disable the cache per configuration group at provision time. That is, a new parameter in the configuration group API would allow specifying whether that configuration group (i.e. the configuration group information itself and the devices associated to the configuration group) has to be cached or not. The rationale for this is that in productive deployments we can have clients giving more importance to speed than to consistency and clients the other way around, but all of them using the same IOTAs cluster. Thus, a way of setting this in the clients’ configuration groups is a must.
  2. It MUST NOT use any hardwired setting. These settings should be part of the configuration (with reasonable defaults), taking backward compatibility into account. Looking at the latest version of the PR (at the 5d08c0e point in time) it seems this is the idea in the end. Nice!
  3. It MUST be properly documented, i.e. information about the cache policies, which one is recommended for each deployment type, tradeoffs of using the cache (speed vs. consistency), etc. At 5d08c0e we see some basic documentation about the configuration parameters, but we think more topics should be covered (maybe in a separate .md file).
  4. It SHOULD allow “segmentation”. That is, instead of one cache “box” for everything (so a high load in one client configuration group may starve the cache for other clients), several cache “boxes”, one per configuration group. Each configuration group with cache enabled (according to item 1) would use its isolated cache slice. Some cache parameters would be configurable via the configuration group API (e.g. cache slice size) although others could be common (e.g. cache policy).

Side note: maybe we can provide one shared cache for groups and several distinct ones for devices, since in Node.js RAM is a hard-limited resource. The size of the devices cache belonging to a group should be part of the configuration (maybe in terms of how many devices will be stored). A sketch of this segmentation idea follows below.
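A possible shape for that segmentation, sketched with invented field names (cache, apikey, cacheSize, cacheTTL) purely for illustration:

const cacheManager = require('cache-manager');

// One shared cache for configuration groups themselves
const groupCache = cacheManager.caching({ store: 'memory', max: 100, ttl: 10 });

// One isolated device cache "box" per configuration group, keyed by apikey,
// so a busy group cannot starve the cache of the others.
const deviceCaches = {};

function deviceCacheForGroup(group) {
    if (!group.cache) {
        return null; // caching disabled for this configuration group at provision time
    }
    if (!deviceCaches[group.apikey]) {
        deviceCaches[group.apikey] = cacheManager.caching({
            store: 'memory',
            max: group.cacheSize || 100, // how many of this group's devices to keep (hypothetical field)
            ttl: group.cacheTTL || 10
        });
    }
    return deviceCaches[group.apikey];
}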

@jason-fox mentioned this pull request on Nov 18, 2020
@chicco785 (Contributor)

Probably you need something like Redis - in-memory cache and HA don't work well together.

@jason-fox (Contributor, Author) commented Nov 25, 2020

Probably you need something like Redis - in-memory cache and HA don't work well together.

The cache mechanism has an option to connect to Redis. I'll look at this when I have time. I guess the final architecture will look a bit like this:

  • Configurable Caching Policy use:
    • None ✅
    • MemCache ✅
    • RedisCache ✅
    • Both. ✅
  • Size and Retention Limits set for MemCache ✅
  • Retention Limits plus connection config set for RedisCache ✅ (see the sketch after this list)
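A minimal sketch of how the configurable policy (None / MemCache / RedisCache / Both) could be resolved at start-up; the config field names are hypothetical and the Redis store package is an assumption:

const cacheManager = require('cache-manager');
const redisStore = require('cache-manager-redis-store'); // assumed Redis store package

function buildCache(config) {
    const stores = [];
    if (config.memCache && config.memCache.enabled) {
        stores.push(cacheManager.caching({ store: 'memory', max: config.memCache.max, ttl: config.memCache.ttl }));
    }
    if (config.redisCache && config.redisCache.enabled) {
        stores.push(
            cacheManager.caching({
                store: redisStore,
                host: config.redisCache.host,
                port: config.redisCache.port,
                ttl: config.redisCache.ttl
            })
        );
    }
    if (stores.length === 0) {
        return null; // policy "None": fall through to MongoDB directly
    }
    // policy "Both" becomes a multi-tier cache: memory first, then Redis
    return stores.length === 1 ? stores[0] : cacheManager.multiCaching(stores);
}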

It MUST be allowed to enable/disable cache per configuration group at provision time. That is, a new parameter in the configuration group API would allow to specify if that configuration group

  • There is a dontCache option in the API - this could be based around a regex or maybe the payload ✅

It MUST NOT use any hardwired setting.

Updated and magic numbers removed ✅ but more config will be needed for the Redis switches ✅

It MUST be properly documented

Easier to do this once the basic architecture is agreed. ✅

It SHOULD allow “segmentation”.

  • The config could point to separate Redis instances. ✅

@jason-fox (Contributor, Author)

@mrutid @fgalan - This PR is now ready for review.

@jason-fox requested a review from @fgalan on December 9, 2020
@jason-fox (Contributor, Author)

@mapedraza @mrutid @fgalan - Getting back after the Christmas break, is there any progress on this? Is there anything else that needs to be done from my side?

@mapedraza (Collaborator)

Hello Jason, thank you for your contribution! The pull request needs to be checked, but we still need some time to review it in depth. We will let you know when we have reviewed it.

@mapedraza (Collaborator)

I was finally able to check this PR with the team.

As you know, in a production environment, adding another component means adding more complexity to the platform operation, so the performance improvement has to be really clear. Mongo-DB is very fast for key accesses and with in-memory datasets, and all three cases (Mongo-DB, Redis, and MemCache) are penalized by similar network accesses (with Redis and MemCache we are also adding extra writes that do not exist in the base scenario). It is more complex, and it is not clear to me, without tests, that it significantly improves the current scenario (with a well-configured Mongo). Could you provide any figures or a comparison between both scenarios?

Moreover, having Redis and MongoDB at the same time, as they cover the same architectural place (external shared state), adds architectural redundancy. Also, Mongo-DB is still needed, as some other GEs need it for use cases that cannot be covered with Redis (i.e. geo-queries done by the CB are addressable in MongoDB but not in Redis).

When @mrutid was talking about consistency, it would be enough to make it configurable for each config group. Each scenario may or may not require the cache, with a specific cache policy (for instance, cache timeout), and also, as @fgalan mentioned, segmentation, allowing each provision group to have a different, isolated cache slice to prevent a single user monopolizing the whole system cache.

A good approach to keep in mind when using an in-memory cache, which is a real difference compared to Redis, Mongo-DB, and MemCache, is the Orion subscription cache. The difference with regard to the Orion cache is that, instead of having the same policy for everything (as Orion does), in IOTAs we should have specific policies for each config group.

@jason-fox (Contributor, Author) commented Feb 19, 2021

Two points -

  1. Adding the cache elements (MemCache, Redis or both) to an architecture is entirely optional - you can continue to use a "cacheless" system as before.
  2. Even without a cache, there is a performance change in the PR as the MongoDB code has been altered to use lean() on each request - this should have a significant benefit since it avoids expanding every single MongoDB document when all you need is the JSON.

The Redis cache is just doing the same job it already does when used with QuantumLeap, for example (see #926 (comment)). A cache is optional there too. The reason for picking Redis is that cache-manager supports it - I guess someone could look at adding custom Orion cache support if they wanted.

@jason-fox (Contributor, Author) commented Feb 19, 2021

Relevant StackOverflow discussion of the architectural differences between Redis and MongoDB: https://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb . The summary is that Redis should be faster on reads provided that the data fits on a single machine, and the data held in a cache shouldn't be too large for that role.

@jason-fox (Contributor, Author)

After discussions with @mapedraza I'm going to split this into smaller chunks for easier review and consumption.

@jason-fox changed the title from "Cache Mongo-DB calls." to "Cache Mongo-DB calls (in-memory and/or Redis)" on Mar 2, 2021
Assume that caching is not used by default and then additively use caching. This is necessary for backwards compatibility and tenant slicing.
@jason-fox marked this pull request as a draft on April 14, 2021