Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PM-14360] Import Batching #703

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

BTreston
Copy link
Contributor

@BTreston BTreston commented Dec 27, 2024

🎟️ Tracking

https://bitwarden.atlassian.net/browse/PM-14360

📔 Objective

This PR implements services that allow the directory connector to batch large import requests into multiple small requests to avoid potential timeouts due to packet size exceeding default size limits for some servers. This implementation splits the requests into batches of 2000 users/groups and POSTs them sequentially.

This change also refactors some of the sync service to be more testable and removes some directory related code that does not belong. Also included are some new unit tests to the new functionality and the sync service.

📸 Screenshots

⏰ Reminders before review

  • Contributor guidelines followed
  • All formatters and local linters executed and passed
  • Written new unit and / or integration tests where applicable
  • Used internationalization (i18n) for all UI strings
  • CI builds passed
  • Communicated to DevOps any deployment requirements
  • Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

  • 👍 (:+1:) or similar for great changes
  • 📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
  • ❓ (:question:) for questions
  • 🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
  • 🎨 (:art:) for suggestions / improvements
  • ❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
  • 🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
  • ⛏ (:pick:) for minor or nitpick changes

@BTreston BTreston requested a review from eliykat December 27, 2024 15:19
Copy link
Contributor

github-actions bot commented Dec 27, 2024

Logo
Checkmarx One – Scan Summary & Details44b808ae-abee-4fb6-9dfb-062af22860d2

New Issues (1)

Checkmarx found the following issues in this Pull Request

Severity Issue Source File / Package Checkmarx Insight
LOW Use_Of_Hardcoded_Password /src/services/sync-config-helpers.ts: 21
detailsThe application uses the hard-coded password "admin" for authentication purposes, either using it to verify users' identities, or to access another...
Attack Vector
Fixed Issues (2)

Great job! The following issues were fixed in this Pull Request

Severity Issue Source File / Package
MEDIUM Cx14b19a02-387a Npm-body-parser-1.20.3
LOW Use_Of_Hardcoded_Password /src/services/ldap-directory.service.integration.spec.ts: 175

@BTreston
Copy link
Contributor Author

BTreston commented Dec 27, 2024

@eliykat Whenever you're back and able to give this first pass a look over, I just had a few questions:

  • it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.
  • stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?
  • in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

I also still have a few todo's:

  • More unit tests for the sync service (ie. for generateNewHash, buildRequests)
  • Refactor the sync call into more testable pieces, potentially move some code that might not belong in this service (looking at some of the directory stuff in this service)

Copy link
Member

@eliykat eliykat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, here are a few comments but I'll take a closer look once you've finished your draft.

To answer your questions:

it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.

Looking at the jest documentation, it doesn't have great support for ECMAScript. I think the easiest solution is to decouple the GsuiteDirectoryService from SyncService, thus avoiding importing the google apis into jest at all. e.g. move getDirectoryService to its own class with its own abstraction, and have SyncService only inject the abstraction.

This will become a problem again if/when we get a Gsuite testbed set up, but I think that solves it for now, and is a better design.

stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?

We have not run into this before, but spending $1 million USD in a single provisioning sync seems unlikely. I think we can leave it unhandled.

in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

The goal here is to avoid timeouts at the network edge, so there isn't any hard and fast rule as to how we should split our requests. What do you think?

I can see an argument for splitting based on the number of users in each group - that's the property that scales as you add more users. But my assumption was that these timeouts are actually fairly generous (this request data is very small and this is the first time I've heard of requests timing out), so the simplest approach of batching by n users and n groups would be enough to resolve the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both strategies should implement this interface. For example, a RequestBuilder interface which has BatchRequestBuilder and SingleRequestBuilder implementations for the new and existing behaviour respectively.

Comment on lines 241 to 260
return [
new OrganizationImportRequest({
groups: (groups ?? []).map((g) => {
return {
name: g.name,
externalId: g.externalId,
memberExternalIds: Array.from(g.userMemberExternalIds),
};
}),
users: (users ?? []).map((u) => {
return {
email: u.email,
externalId: u.externalId,
deleted: u.deleted || (removeDisabled && u.disabled),
};
}),
overwriteExisting: overwriteExisting,
largeImport: largeImport,
}),
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following on from my comment above - this would be moved into the SingleRequestBuilder implementation.

@@ -133,6 +116,36 @@ export class SyncService {
}
}

async generateNewHash(reqJson: string): Promise<string> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is doing too much - it's called generateNewHash, but it's both generating a new hash and comparing it to the existing hash from state. And its return value is not very clear - a new hash if it's different, or an empty string if it's not.

I suggest that generateNewHash only generates a hash (generateHash really), and the comparison is done elsewhere (either a separate method or inline).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory is used for integration tests. Because your new service is more or less functional - i.e. it's just processing a set of inputs to a set of outputs - I'm not sure we need dedicated integration tests on it.

If you disagree and want to integration test it, I don't think we should maintain a set of 11k users and 11k groups. Ideally you'd specify a lower batchSize and use a more maintanable set of test data (e.g. 250 users with a batchSize of 45). I recall we discussed trying to replicate real world conditions as much as possible, but this is a lot of fixture data to maintain forever. And would be comparatively slow to run in the pipeline.

Comment on lines 41 to 44
expect(requests.length).toEqual(
Math.ceil(mockGroups.length / batchingService.batchSize) +
Math.ceil(mockUsers.length / batchingService.batchSize),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost always prefer constants in assertions, otherwise you risk rewriting the logic of the SUT.

For example - you know that you've generated 11k users and you have batch sizes of 2k, so you know you're going to have 6 user requests. Assert that your request has a length of 6. (example only, obviously you have groups as well.)

I think it would help with this to specify the batchSize in the constructor of the service. That way, the test can specify a fixed batch size and base its assertions on that, without affecting the batch size used by the production code.

Comment on lines 14 to 31
userSimulator = (userCount: number) => {
const simulatedArray: UserEntry[] = [];
for (let i = 0; i < userCount; i += batchingService.batchSize) {
for (let j = 0; j <= batchingService.batchSize; j++) {
simulatedArray.push(new UserEntry());
}
}
return simulatedArray;
};

groupSimulator = (groupCount: number) => {
const simulatedArray: GroupEntry[] = [];
for (let i = 0; i < groupCount; i += batchingService.batchSize) {
for (let j = 0; j <= batchingService.batchSize; j++) {
simulatedArray.push(new GroupEntry());
}
}
return simulatedArray;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this needs to know about batchSize - it should just generate the number of mock objects requested. Particularly because this is generating mock input data, which is usually supplied by a directory service which knows nothing about batching.

@BTreston BTreston marked this pull request as ready for review January 7, 2025 15:30
@BTreston BTreston requested a review from a team as a code owner January 7, 2025 15:30
@BTreston BTreston requested review from JimmyVo16 and eliykat January 7, 2025 15:30
@BTreston BTreston changed the title initial implementation [PM-14360] Import Batching Jan 17, 2025
Copy link
Member

@eliykat eliykat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me! Comments are all fairly minor.

jslib/angular/src/services/jslib-services.module.ts Outdated Show resolved Hide resolved
jslib/angular/src/services/jslib-services.module.ts Outdated Show resolved Hide resolved
jslib/common/src/abstractions/directory-factory.service.ts Outdated Show resolved Hide resolved
src/app/services/services.module.ts Outdated Show resolved Hide resolved
src/bwdc.ts Outdated Show resolved Hide resolved
src/services/sync.service.spec.ts Outdated Show resolved Hide resolved
src/services/sync.service.spec.ts Outdated Show resolved Hide resolved
src/services/sync.service.ts Outdated Show resolved Hide resolved
jslib/common/src/services/batch-requests.service.ts Outdated Show resolved Hide resolved
jslib/common/src/services/batch-requests.service.spec.ts Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 4.39560% with 87 lines in your changes missing coverage. Please review.

Project coverage is 6.44%. Comparing base (23713d9) to head (47f564b).

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/services/sync.service.ts 0.00% 29 Missing ⚠️
src/services/default-batch-request-builder.ts 0.00% 20 Missing ⚠️
src/services/directory-factory.service.ts 0.00% 17 Missing ⚠️
src/services/default-single-request-builder.ts 0.00% 8 Missing ⚠️
src/bwdc.ts 0.00% 7 Missing ⚠️
src/app/services/services.module.ts 0.00% 3 Missing ⚠️
src/abstractions/directory-factory.service.ts 0.00% 1 Missing ⚠️
src/abstractions/request-builder.service.ts 0.00% 1 Missing ⚠️
src/services/sync-config-helpers.ts 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #703      +/-   ##
========================================
+ Coverage   2.23%   6.44%   +4.20%     
========================================
  Files         60      65       +5     
  Lines       2634    2621      -13     
  Branches     467     472       +5     
========================================
+ Hits          59     169     +110     
+ Misses      2572    2436     -136     
- Partials       3      16      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -26,6 +26,12 @@ import { StateService } from "@/jslib/common/src/services/state.service";
import { StateMigrationService } from "@/jslib/common/src/services/stateMigration.service";
import { TokenService } from "@/jslib/common/src/services/token.service";

import { DirectoryFactoryService } from "@/src/abstractions/directory-factory.service";
import { StateService as StateServiceExtended } from "@/src/abstractions/state.service";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this naming... open to suggestions. Alternatively we could rename the existing state service since we have 2 (one in jslib and one in src)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The src StateService extends the jslib StateService, so you can just use the extended class everywhere and it'll satisfy the type constraints.

However the better answer is that you don't need to add to jslib-services.module.ts. Like the rest of the jslib folder, it's essentially deprecated (but I acknowledge this isn't documented anywhere). Register your new dependencies in src/app/services/services.module.ts. That file only references the src StateService.

@BTreston BTreston requested a review from eliykat January 22, 2025 20:20
Copy link
Member

@eliykat eliykat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a test failure here that I can't explain. Is it actually trying to use apiService somewhere? https://github.com/bitwarden/directory-connector/runs/36020428086#r0s26

@@ -26,6 +26,12 @@ import { StateService } from "@/jslib/common/src/services/state.service";
import { StateMigrationService } from "@/jslib/common/src/services/stateMigration.service";
import { TokenService } from "@/jslib/common/src/services/token.service";

import { DirectoryFactoryService } from "@/src/abstractions/directory-factory.service";
import { StateService as StateServiceExtended } from "@/src/abstractions/state.service";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The src StateService extends the jslib StateService, so you can just use the extended class everywhere and it'll satisfy the type constraints.

However the better answer is that you don't need to add to jslib-services.module.ts. Like the rest of the jslib folder, it's essentially deprecated (but I acknowledge this isn't documented anywhere). Register your new dependencies in src/app/services/services.module.ts. That file only references the src StateService.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be more explicitly marked as test data. e.g. src/utils/test-fixtures.ts.

Comment on lines +118 to +123
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith({
members: [],
groups: [],
overwriteExisting: true,
largeImport: true,
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should pass - we shouldn't be sending empty request objects to the server.
I expected this would be something like:

expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[0]);
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[1]);
expect(apiService.postPublicImportDirectory).toHaveBeenCalledWith(mockRequests[2]);
// etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a big deal, but I could've been more explicit re: naming.

If you had only 1 request builder class, you might do something like:

DefaultRequestBuilder extends RequestBuilder

This is a most classes where you only have a single implementation.

However, here we do have multiple implementations and we already have a way to distinguish them:

SingleRequestBuilder extends RequestBuilder
BatchRequestBuilder extends RequestBuilder
// there is no "default" implementation, you have to choose one or the other

import { BatchRequestBuilder } from "./batch-requests.service";
import { SingleRequestBuilder } from "./single-request.service";
import { DefaultBatchRequestBuilder } from "./default-batch-request-builder";
import { DefaultSingleRequestBuilder } from "./default-single-request-builder";

describe("BatchingService", () => {
Copy link
Member

@eliykat eliykat Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

describe block is still using the old name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants