Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧹 use 1 global resource schema instead of per runtime #3326

Merged
merged 16 commits into from
Feb 19, 2024

Conversation

imilchev
Copy link
Member

Use 1 schema for the coordinator instead of creating copies for every runtime we start. That should improve our memory footprint significantly for large scans.

Scanning a cluster with 366 assets with cnspec 10.3.0:
telegram-cloud-photo-size-4-5940308416457195425-y

Scanning the same cluster with the changes from this branch:
telegram-cloud-photo-size-4-5940308416457195424-y

this builds on top of #3308

Copy link
Contributor

github-actions bot commented Feb 15, 2024

Test Results

2 751 tests  ±0   2 750 ✅ ±0   49s ⏱️ -1s
  186 suites ±0       1 💤 ±0 
    5 files   ±0       0 ❌ ±0 

Results for commit f2fcff1. ± Comparison against base commit cf4fad5.

This pull request removes 6 and adds 6 tests. Note that renamed tests count towards both.
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/0001-01-01_00:00:00_+0000_UTC
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/292277026596-12-04_15:30:07_+0000_UTC
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/292277026596-12-04_15:30:09_+0000_UTC
go.mondoo.com/cnquery/v10/llx ‑ TestSuccess/2024-02-14_11:56:07.030723245_+0000_UTC_m=+0.011946430
go.mondoo.com/cnquery/v10/llx ‑ TestTruthy/2024-02-14_11:56:07.030723245_+0000_UTC_m=+0.011946430
go.mondoo.com/cnquery/v10/llx ‑ TestTruthy/2024-02-14_11:56:07.030723245_+0000_UTC_m=+0.011946430#01
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/0001-01-01_00:53:28_+0053_LMT
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/292277026596-12-04_16:23:37_+0053_LMT
go.mondoo.com/cnquery/v10/llx ‑ TestRawData_JSON/292277026596-12-04_16:30:07_+0100_CET
go.mondoo.com/cnquery/v10/llx ‑ TestSuccess/2024-02-16_18:44:25.556046445_+0100_CET_m=+0.007813243
go.mondoo.com/cnquery/v10/llx ‑ TestTruthy/2024-02-16_18:44:25.556046445_+0100_CET_m=+0.007813243
go.mondoo.com/cnquery/v10/llx ‑ TestTruthy/2024-02-16_18:44:25.556046445_+0100_CET_m=+0.007813243#01

♻️ This comment has been updated with latest results.

@@ -243,7 +229,7 @@ func (r *Runtime) Connect(req *plugin.ConnectReq) error {
}

r.Recording().EnsureAsset(r.Provider.Connection.Asset, r.Provider.Instance.ID, r.Provider.Connection.Id, asset.Connections[0])
r.schema.prioritizeIDs(BuiltinCoreID, r.Provider.Instance.ID)
// r.schema.prioritizeIDs(BuiltinCoreID, r.Provider.Instance.ID)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot really do this with a global schema. It seems like before we were prioritizing the IDs with the specific provider for the asset coming in first. I am not sure if that would actually break anything. @arlimus not sure if you recall why we needed that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this one is going to break things if we don't keep it around...

The reason is that some keywords are equally implemented in two different providers (like vulnmgmt in os and vsphere) to provide the appropriate level of functionality in either, without having to learn new keywords. It is functionally correct, but it forces us to detect the right provider depending on the current prioritization...

(Long-term this is not a problem, because all keywords will be anchored to a parent, which entirely removes the guessword, but we can't force this into v10 right now...)

There are a few ways to handle this. First we need the list of resource definitions that have the same identifier. Once we have those, we can prioritize when we look up the resource/field based on the current connection.

(a) We can introduce a new field into ResourceInfo that we can later easily remove (once everything is anchored to parents). We can't modify any part of ResourceInfo because it may be different between providers (which is ok). We could however add another field to it to chain things for now: ResourceInfo.Other which is either nil if no other provider offers it or is the next available info.

(b) The alternative is to change the schema lookup from a simple map[string]ResourceInfo into a map[string][]ResourceInfo. It requires a bit more work and rewrite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 cases around this:

  • 1 resource that can be defined in multiple providers (not extending, just having the same resource in multiple places)
  • resource extensions where the root is in 1 place and the extensions are potentially in multiple providers

To cover both cases I picked approach (a) and added Others field that stores all other definitions of the ResourceInfo. However, I had to add the same Others field to the Field struct, because fields can also exist in multiple providers. Just like the vulnmgmt field that exists in the vsphere and os providers.

I think both cases work now. I still need to clean up the code a bit and will definitely add tests for both cases. Took me quite a bit of time to understand how this should work...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw that you added others to cover fields as well, that's awesome!

@imilchev imilchev requested a review from arlimus February 15, 2024 12:09
@imilchev imilchev force-pushed the ivan/coordinator-schema branch from 5ced290 to 0e4462f Compare February 16, 2024 14:59
@@ -40,6 +40,7 @@ message ResourceInfo {
string min_mondoo_version = 25;
string defaults = 26;
string provider = 27;
repeated ResourceInfo others = 29;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would you mind adding a comment above this and the others field below:

// This field contains references to other providers with the same resource/field.
// Note: Please do not use this field, it is only temporary and will be removed
// in the future once binding resources are mandatory for all executions.

(if this merges first no worries, we can add it later)

}

for name := range providers {
schema, err := Coordinator.LoadSchema(name)
Copy link
Member

@arlimus arlimus Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

This code will read all schemas that are installed on the system by default on startup. For regular use of scan/run/shell I had originally avoided this, so that we only load schemas we need and dynamically load additional schemas (making startup more lightweight).

Is it necessary to load all schemas for this change?
If that is the case and we require them here, would you mind adding a comment so that we change this back to dynamic loading later on (I'm sure I'll forget otherwise 🙈 ) (something like FIXME: dynamically load providers on startup).

@arlimus
Copy link
Member

arlimus commented Feb 19, 2024

Yoho @imilchev , great updates!! 😁

This is super close now, enough for me to +1 from my side.
I left two comment and marked them as nits so that this PR isn't blocked :) Feel free to get another approval to finish this in case I'm not around (I'm back Tuesday morning PST time, which is a bit late)

@imilchev imilchev force-pushed the ivan/shutdown-provider branch from cf4fad5 to 9b54c9d Compare February 19, 2024 07:48
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
Signed-off-by: Ivan Milchev <[email protected]>
@imilchev imilchev force-pushed the ivan/coordinator-schema branch from f2fcff1 to fadfcb2 Compare February 19, 2024 09:40
Signed-off-by: Ivan Milchev <[email protected]>
@imilchev imilchev marked this pull request as ready for review February 19, 2024 09:44
@imilchev
Copy link
Member Author

Fixed the comments. Will merge this into #3308 now

@imilchev imilchev merged commit e4b3be2 into ivan/shutdown-provider Feb 19, 2024
7 checks passed
@imilchev imilchev deleted the ivan/coordinator-schema branch February 19, 2024 10:02
@github-actions github-actions bot locked and limited conversation to collaborators Feb 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants