Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOP-4020: repos_branches to docsets #909

Merged
merged 27 commits into from
Sep 22, 2023
Merged

DOP-4020: repos_branches to docsets #909

merged 27 commits into from
Sep 22, 2023

Conversation

mmeigs
Copy link
Contributor

@mmeigs mmeigs commented Sep 18, 2023

Ticket

DOP-4020

Staging

https://mongodbcom-cdn.website.staging.corp.mongodb.com/docs-qa/drivers/node/master/
https://mongodbcom-cdn.website.staging.corp.mongodb.com/docs-qa/atlas/master/

Notes

As part of the monorepo project, our queries to repos_branches needs to migrate to query the docsets collection instead.

The Autobuilder's RepoBranchesRepository class and Persistence module were what needed to be modified for this migration.

Other quality of life improvements were found and altered throughout this PR. A little far-reaching. Will add comments to try to explain to help.

Parameter store in AWS now holds values for the docsets collection.

A unique index was added to the docsets collections for the project field.

Comment on lines -8 to -54

function isUserEntitled(entitlementsObject: any): boolean {
return (entitlementsObject?.repos?.length ?? 0) > 0;
}

function isRestrictedToDeploy(userId: string): boolean {
const { restrictedProdDeploy, entitledSlackUsers } = c.get<any>('prodDeploy');
return restrictedProdDeploy && !entitledSlackUsers.includes(userId);
}

function prepResponse(statusCode, contentType, body) {
return {
statusCode: statusCode,
headers: { 'Content-Type': contentType },
body: body,
};
}

async function buildEntitledBranchList(entitlement: any, repoBranchesRepository: RepoBranchesRepository) {
const entitledBranches: string[] = [];
for (const repo of entitlement.repos) {
const [repoOwner, repoName] = repo.split('/');
const branches = await repoBranchesRepository.getRepoBranches(repoName);
for (const branch of branches) {
let buildWithSnooty = true;
if ('buildsWithSnooty' in branch) {
buildWithSnooty = branch['buildsWithSnooty'];
}
if (buildWithSnooty) {
entitledBranches.push(`${repoOwner}/${repoName}/${branch['gitBranchName']}`);
}
}
}
return entitledBranches.sort();
}

function getQSString(qs: string) {
const key_val = {};
const arr = qs.split('&');
if (arr) {
arr.forEach((keyval) => {
const kvpair = keyval.split('=');
key_val[kvpair[0]] = kvpair[1];
});
}
return key_val;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized these functions were identical and duplicated between v1 and v2. Rolled them out to a handlers file same as others we've done in the past month.

@@ -14,7 +14,6 @@
"githubBotPW": "GITHUB_BOT_PASSWORD",
"fastlyDochubMap": "FASTLY_DOCHUB_MAP",
"entitlementCollection": "USER_ENTITLEMENT_COL_NAME",
"reposBranchesCollection": "REPOS_BRANCHES_COL_NAME",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused almost-duplicate...? There were three of these. Removed here and below.
reposBranches vs repoBranches is how to spot the duplicate.

@@ -47,7 +47,9 @@ Resources:
- Name: JOB_QUEUE_COL_NAME
Value: ${self:custom.jobCollection}
- Name: REPO_BRANCHES_COL_NAME
value: ${self:custom.repoBranchesCollection}
Value: ${self:custom.repoBranchesCollection}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed capitalizaton to match others.

@@ -114,7 +115,8 @@ webhook-env-core: &webhook-env-core
GITHUB_SECRET: ${self:custom.githubSecret}
GITHUB_DELETION_SECRET: ${self:custom.githubDeletionSecret}
GITHUB_BOT_PASSWORD: ${self:custom.githubBotPW}
REPO_BRANCHES_COL_NAME: ${self:custom.repoBranchesCollection}
REPO_BRANCHES_COL_NAME: ${self:custom.repoBranchesCollection}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space

@@ -22,13 +23,13 @@ let githubCommandExecutor: GithubCommandExecutor;
let jobRepository: JobRepository;
let hybridJobLogger: HybridJobLogger;
let repoEntitlementRepository: RepoEntitlementsRepository;
let repoBranchesRepository: RepoBranchesRepository;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost-duplicate again. Unused. Real variable is below on line 31.

Comment on lines 86 to 117
/**
* Compares the project path from a monorepo push event, and compares it with
* what is configured in the docset entry in Atlas.
* @param path The project path where the snooty.toml file exists from the monorepo.
* This path will reflect the current project path from a given commit.
* @param projectName The project name for the docset entry.
* @returns A boolean representing whether or not the configured docset entry snooty_toml path
* matches the path found in GitHub.
*/
async checkSnootyTomlPath(path: string, projectName: string) {
const query = { project: projectName };
try {
const docsetObject = await this.findOne(
query,
`Mongo Timeout Error: Timedout while retrieving repos entry for ${path}`
);

if (!docsetObject) {
console.warn(`WARNING: The docset does not exist for the following project: ${projectName} \n path: ${path}`);

return false;
}

return docsetObject.directories.snooty_toml === path;
} catch (error) {
console.warn(
`WARNING: Error occurred when retrieving project path for ${projectName}. The following path was provided: ${path}`,
error
);
return false;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a previous PR that was in flow at the same time as this one. This merges the two classes together.

Comment on lines -11 to -26
async getConfiguredBranchesByGithubRepoName(repoName: string): Promise<any> {
const query = { repoName: repoName };
const reposObject = await this.findOne(
query,
`Mongo Timeout Error: Timedout while retrieving repos entry for ${repoName}`
);
if (reposObject?.branches) {
return {
branches: reposObject.branches,
repoName: reposObject.repoName,
status: 'success',
};
} else {
return { status: 'failure' };
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused function. Deleted.

repoBranchesRepo
repoBranchesRepo,
docsetsRepo,
repoEntitlementsRepo
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added repoEntitlementsRepo because it was missing. Obviously not necessary because tests were not failing. But consistent.

@mmeigs mmeigs marked this pull request as ready for review September 19, 2023 17:38
Copy link
Contributor

@rayangler rayangler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just questions from me as I try to understand things!

api/controllers/v1/github.ts Outdated Show resolved Hide resolved
// if not already set, set cache value for repo_branches
if (!internals[project]) {
internals[project] = res;
internals[project] = res[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not relevant for the current state of our data, but I'm curious: since the matchCondition only expects a project and a branch, is it ever possible for 2 docs repos in the same docset with the same project and branch to both be in res? i.e. is the data we want always guaranteed to be at res[0]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I could see that. I could always add checking for the repo that is "prodDeployable" however I believe there are instances where we actually want to use docs-mongodb-internal. So, that kind of throws a wrench in it.
This is the structure of the query that was findOne before, so I am now wondering if this needs to be a larger discussion about how we specify in these queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, yeah, I think it might be a bit ambiguous since the previous queries didn't really take into account different repos with the same project either.

I'm okay with things as-is, or with the extra check for prodDeployable as long as we're getting back the intended repo info (since associated products seem to mostly work with prod deploy data). Seems like there's pretty minimal risk with keeping things like this for now anyways, so I don't think this should be a blocker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. And @seungpark does as well. I should have tagged this here.
For posterity, ticket created to address this.

Copy link
Contributor

@seungpark seungpark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmeigs what is the direction for the repo queries vs docsets queries. im assuming anything that is limited to usage of the properties you listed previously (project, bucket, url, prefix, directories) should be pointed to docset and existing repo branches queries should be fine? can we also reflect that in the A/C if this is the case

src/repositories/docsetsRepository.ts Outdated Show resolved Hide resolved
];
}

async getProjectByRepoName(repoName: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer return types in these functions. Promise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on that! These were straight from repoBranchesRepository.

const res = await cursor.toArray();
if (!res.length) {
const msg = `DocsetsRepository.getProjectByRepoName - Could not find project by repoName: ${repoName}`;
this._logger.info(this._repoName, msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return or error here. or safeguard the res[0]?.project

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right, thanks!

}

async getRepo(repoName: string): Promise<any> {
const aggregationPipeline = this.getAggregationPipeline('repoName', repoName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you mentioned in this comment, are we still going to query docsets for getting repo info?

getProjectByRepoName seems like it should be in docsets as project field should only be there in the future (CMIIW)

but getRepo and getRepoBranchesByRepoName seem like they can refer to the REPOS_BRANCHES collection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These queries are being used to access properties that are only available in the docsets collection as well as properties that are in the repos_branches collection! They need properties such as bucket and url

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha! so these are returning (essentially what is currently) a repo_branches entry, but it will be constructed from docsets since we are aiming to deprecate them from repos branches 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

return res[0];
}
}
return { status: 'failure' };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to throw here instead? i'm not sure if it'll ever be the case that cursor will be undefined (aggregate will throw)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right again. I'll fix!

const jobPrefix = repoInfo?.prefix ? repoInfo['prefix'][env] : '';
// TODO: Make job be of type Job
const job = await prepGithubPushPayload(body, repoBranchesRepository, jobPrefix);
const job = await prepGithubPushPayload(body, repoBranchesRepository, docsetsRepository, jobPrefix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are querying for getRepo twice (here and in getRepoBranchesByRepoName) can pass the repoInfo as a param

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done!

src/repositories/docsetsRepository.ts Outdated Show resolved Hide resolved
];
}

async getProjectByRepoName(repoName: string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on that! These were straight from repoBranchesRepository.

const res = await cursor.toArray();
if (!res.length) {
const msg = `DocsetsRepository.getProjectByRepoName - Could not find project by repoName: ${repoName}`;
this._logger.info(this._repoName, msg);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right, thanks!

return res[0];
}
}
return { status: 'failure' };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right again. I'll fix!

Copy link
Contributor

@rayangler rayangler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this lgtm

// if not already set, set cache value for repo_branches
if (!internals[project]) {
internals[project] = res;
internals[project] = res[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, yeah, I think it might be a bit ambiguous since the previous queries didn't really take into account different repos with the same project either.

I'm okay with things as-is, or with the extra check for prodDeployable as long as we're getting back the intended repo info (since associated products seem to mostly work with prod deploy data). Seems like there's pretty minimal risk with keeping things like this for now anyways, so I don't think this should be a blocker

Copy link
Contributor

@seungpark seungpark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment on v2 github controller but otherwise LGTM

@@ -98,10 +101,10 @@ export const TriggerBuild = async (event: APIGatewayEvent): Promise<APIGatewayPr
}

const env = c.get<string>('env');
const repoInfo = await repoBranchesRepository.getRepo(body.repository.name);
const repoInfo = await docsetsRepository.getRepo(body.repository.name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more time here for double fetching getRepo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed! tysm

}

async getRepo(repoName: string): Promise<any> {
const aggregationPipeline = this.getAggregationPipeline('repoName', repoName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha! so these are returning (essentially what is currently) a repo_branches entry, but it will be constructed from docsets since we are aiming to deprecate them from repos branches 👍

@seungpark
Copy link
Contributor

seems like test feature branch deploy is failing: Error: The stack named auto-builder-stack-enhancedApp-stg-DOP-4020-docsets-webhooks failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE

@mmeigs
Copy link
Contributor Author

mmeigs commented Sep 21, 2023

seems like test feature branch deploy is failing: Error: The stack named auto-builder-stack-enhancedApp-stg-DOP-4020-docsets-webhooks failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE

Asked @branberry about this and was told it's expected atm.
Posterity response: "feel free to ignore it. This is an issue with the initial deploy that I need to address. Basically, if there are errors when you first open the PR, the update feature branch job will fail since the queues will not have been created."

@mmeigs mmeigs merged commit 01be34b into master Sep 22, 2023
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants