-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PM-14360] Import Batching #703
base: main
Are you sure you want to change the base?
Conversation
New Issues (1)Checkmarx found the following issues in this Pull Request
Fixed Issues (2)Great job! The following issues were fixed in this Pull Request
|
@eliykat Whenever you're back and able to give this first pass a look over, I just had a few questions:
I also still have a few todo's:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, here are a few comments but I'll take a closer look once you've finished your draft.
To answer your questions:
it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.
Looking at the jest documentation, it doesn't have great support for ECMAScript. I think the easiest solution is to decouple the GsuiteDirectoryService
from SyncService
, thus avoiding importing the google apis into jest at all. e.g. move getDirectoryService
to its own class with its own abstraction, and have SyncService
only inject the abstraction.
This will become a problem again if/when we get a Gsuite testbed set up, but I think that solves it for now, and is a better design.
stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?
We have not run into this before, but spending $1 million USD in a single provisioning sync seems unlikely. I think we can leave it unhandled.
in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?
The goal here is to avoid timeouts at the network edge, so there isn't any hard and fast rule as to how we should split our requests. What do you think?
I can see an argument for splitting based on the number of users in each group - that's the property that scales as you add more users. But my assumption was that these timeouts are actually fairly generous (this request data is very small and this is the first time I've heard of requests timing out), so the simplest approach of batching by n users and n groups would be enough to resolve the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both strategies should implement this interface. For example, a RequestBuilder
interface which has BatchRequestBuilder
and SingleRequestBuilder
implementations for the new and existing behaviour respectively.
src/services/sync.service.ts
Outdated
return [ | ||
new OrganizationImportRequest({ | ||
groups: (groups ?? []).map((g) => { | ||
return { | ||
name: g.name, | ||
externalId: g.externalId, | ||
memberExternalIds: Array.from(g.userMemberExternalIds), | ||
}; | ||
}), | ||
users: (users ?? []).map((u) => { | ||
return { | ||
email: u.email, | ||
externalId: u.externalId, | ||
deleted: u.deleted || (removeDisabled && u.disabled), | ||
}; | ||
}), | ||
overwriteExisting: overwriteExisting, | ||
largeImport: largeImport, | ||
}), | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following on from my comment above - this would be moved into the SingleRequestBuilder
implementation.
src/services/sync.service.ts
Outdated
@@ -133,6 +116,36 @@ export class SyncService { | |||
} | |||
} | |||
|
|||
async generateNewHash(reqJson: string): Promise<string> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is doing too much - it's called generateNewHash
, but it's both generating a new hash and comparing it to the existing hash from state. And its return value is not very clear - a new hash if it's different, or an empty string if it's not.
I suggest that generateNewHash
only generates a hash (generateHash
really), and the comparison is done elsewhere (either a separate method or inline).
openldap/ldifs/directory-11000.ldif
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This directory is used for integration tests. Because your new service is more or less functional - i.e. it's just processing a set of inputs to a set of outputs - I'm not sure we need dedicated integration tests on it.
If you disagree and want to integration test it, I don't think we should maintain a set of 11k users and 11k groups. Ideally you'd specify a lower batchSize and use a more maintanable set of test data (e.g. 250 users with a batchSize of 45). I recall we discussed trying to replicate real world conditions as much as possible, but this is a lot of fixture data to maintain forever. And would be comparatively slow to run in the pipeline.
expect(requests.length).toEqual( | ||
Math.ceil(mockGroups.length / batchingService.batchSize) + | ||
Math.ceil(mockUsers.length / batchingService.batchSize), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost always prefer constants in assertions, otherwise you risk rewriting the logic of the SUT.
For example - you know that you've generated 11k users and you have batch sizes of 2k, so you know you're going to have 6 user requests. Assert that your request has a length of 6. (example only, obviously you have groups as well.)
I think it would help with this to specify the batchSize
in the constructor of the service. That way, the test can specify a fixed batch size and base its assertions on that, without affecting the batch size used by the production code.
userSimulator = (userCount: number) => { | ||
const simulatedArray: UserEntry[] = []; | ||
for (let i = 0; i < userCount; i += batchingService.batchSize) { | ||
for (let j = 0; j <= batchingService.batchSize; j++) { | ||
simulatedArray.push(new UserEntry()); | ||
} | ||
} | ||
return simulatedArray; | ||
}; | ||
|
||
groupSimulator = (groupCount: number) => { | ||
const simulatedArray: GroupEntry[] = []; | ||
for (let i = 0; i < groupCount; i += batchingService.batchSize) { | ||
for (let j = 0; j <= batchingService.batchSize; j++) { | ||
simulatedArray.push(new GroupEntry()); | ||
} | ||
} | ||
return simulatedArray; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this needs to know about batchSize - it should just generate the number of mock objects requested. Particularly because this is generating mock input data, which is usually supplied by a directory service which knows nothing about batching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me! Comments are all fairly minor.
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #703 +/- ##
========================================
+ Coverage 2.23% 6.44% +4.20%
========================================
Files 60 65 +5
Lines 2634 2621 -13
Branches 467 472 +5
========================================
+ Hits 59 169 +110
+ Misses 2572 2436 -136
- Partials 3 16 +13 ☔ View full report in Codecov by Sentry. |
@@ -26,6 +26,12 @@ import { StateService } from "@/jslib/common/src/services/state.service"; | |||
import { StateMigrationService } from "@/jslib/common/src/services/stateMigration.service"; | |||
import { TokenService } from "@/jslib/common/src/services/token.service"; | |||
|
|||
import { DirectoryFactoryService } from "@/src/abstractions/directory-factory.service"; | |||
import { StateService as StateServiceExtended } from "@/src/abstractions/state.service"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this naming... open to suggestions. Alternatively we could rename the existing state service since we have 2 (one in jslib and one in src)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The src StateService extends the jslib StateService, so you can just use the extended class everywhere and it'll satisfy the type constraints.
However the better answer is that you don't need to add to jslib-services.module.ts
. Like the rest of the jslib folder, it's essentially deprecated (but I acknowledge this isn't documented anywhere). Register your new dependencies in src/app/services/services.module.ts
. That file only references the src StateService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a test failure here that I can't explain. Is it actually trying to use apiService somewhere? https://github.com/bitwarden/directory-connector/runs/36020428086#r0s26
@@ -26,6 +26,12 @@ import { StateService } from "@/jslib/common/src/services/state.service"; | |||
import { StateMigrationService } from "@/jslib/common/src/services/stateMigration.service"; | |||
import { TokenService } from "@/jslib/common/src/services/token.service"; | |||
|
|||
import { DirectoryFactoryService } from "@/src/abstractions/directory-factory.service"; | |||
import { StateService as StateServiceExtended } from "@/src/abstractions/state.service"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The src StateService extends the jslib StateService, so you can just use the extended class everywhere and it'll satisfy the type constraints.
However the better answer is that you don't need to add to jslib-services.module.ts
. Like the rest of the jslib folder, it's essentially deprecated (but I acknowledge this isn't documented anywhere). Register your new dependencies in src/app/services/services.module.ts
. That file only references the src StateService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be more explicitly marked as test data. e.g. src/utils/test-fixtures.ts
.
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith({ | ||
members: [], | ||
groups: [], | ||
overwriteExisting: true, | ||
largeImport: true, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should pass - we shouldn't be sending empty request objects to the server.
I expected this would be something like:
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[0]);
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[1]);
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[2]);
// etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a big deal, but I could've been more explicit re: naming.
If you had only 1 request builder class, you might do something like:
DefaultRequestBuilder extends RequestBuilder
This is a most classes where you only have a single implementation.
However, here we do have multiple implementations and we already have a way to distinguish them:
SingleRequestBuilder extends RequestBuilder
BatchRequestBuilder extends RequestBuilder
// there is no "default" implementation, you have to choose one or the other
import { BatchRequestBuilder } from "./batch-requests.service"; | ||
import { SingleRequestBuilder } from "./single-request.service"; | ||
import { DefaultBatchRequestBuilder } from "./default-batch-request-builder"; | ||
import { DefaultSingleRequestBuilder } from "./default-single-request-builder"; | ||
|
||
describe("BatchingService", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
describe
block is still using the old name
🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-14360
📔 Objective
This PR implements services that allow the directory connector to batch large import requests into multiple small requests to avoid potential timeouts due to packet size exceeding default size limits for some servers. This implementation splits the requests into batches of 2000 users/groups and POSTs them sequentially.
This change also refactors some of the sync service to be more testable and removes some directory related code that does not belong. Also included are some new unit tests to the new functionality and the sync service.
📸 Screenshots
⏰ Reminders before review
🦮 Reviewer guidelines
:+1:
) or similar for great changes:memo:
) or ℹ️ (:information_source:
) for notes or general info:question:
) for questions:thinking:
) or 💭 (:thought_balloon:
) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:
) for suggestions / improvements:x:
) or:warning:
) for more significant problems or concerns needing attention:seedling:
) or ♻️ (:recycle:
) for future improvements or indications of technical debt:pick:
) for minor or nitpick changes