Starting unknown flow for the packages missing latest version in daily ingestion report #160
base: master
Conversation
@jparsai Your image is available in the registry.
f8a_report/report_helper.py (Outdated)
@@ -724,6 +725,16 @@ def check_latest_node(self, latest_epvs, template):
                "version": ver
            }
        missing_latest[eco].append(tmp)

        # Starting ingestion flow
        if os.environ.get("DISABLE_UNKNOWN_PACKAGE_FLOW", "") == "0" \
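The review below suggests reading the environment variable once rather than inside the per-package loop. A minimal sketch of that approach, assuming the `missing_latest` structure built in `check_latest_node` is a dict of ecosystem to package/version entries (function name and exact shape are hypothetical):

```python
import os


def collect_packages_for_ingestion(missing_latest):
    """Return a flat list of (ecosystem, package, version) tuples to ingest.

    `missing_latest` mirrors the report structure assumed above:
    {ecosystem: [{"package": ..., "version": ...}, ...]}.
    The DISABLE_UNKNOWN_PACKAGE_FLOW variable is read once, up front,
    instead of being checked for every package in the loop.
    """
    if os.environ.get("DISABLE_UNKNOWN_PACKAGE_FLOW", "") != "0":
        return []  # flow disabled; nothing to trigger
    return [
        (eco, node["package"], node["version"])
        for eco, nodes in missing_latest.items()
        for node in nodes
    ]
```

The ingestion step can then be run once, as a separate stage after report generation, over the returned list.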
We should move this functionality towards the end, after all the reports etc. are done, as a separate step. Also, I don't see any point in checking the env variable for each package in the list.
Yes, I wanted to do it at the end, after the main function is finished, but that function returns only True/False and I did not want to change anything in the existing functionality.
I will add one more object to the returned tuple, holding the JSON of missing nodes, and use it at the end after the report is generated.
@yzainee I have now done it after all reports are generated.
I tried to add an extra object to the tuple returned from the function, but that required changes in multiple nested functions, some of which are used in other places as well, so changing them is very risky and was causing test cases to fail.
In my dev cluster I am not able to generate all the reports, since they collect data from SA, CA, ingestion etc., so testing all report generation after changing a common function is very difficult and time consuming.
To avoid any modification and risk of failure in the existing functionality, I am reading the generated report from the S3 bucket and extracting the data required for ingestion.
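Reading the generated report back from S3 and pulling out the ingestion data could be sketched as follows. The report key `missing_latest_versions` and its shape are assumptions based on this discussion, and the client is injected so the S3 call (boto3's `get_object`) can be stubbed in tests:

```python
import json


def read_report_from_s3(s3_client, bucket, key):
    """Fetch a generated daily report and return its missing-latest section.

    Assumed report shape (hypothetical, modelled on the discussion above):
    {"missing_latest_versions": {eco: [{"package": ..., "version": ...}]}}.
    `s3_client` is expected to expose boto3's get_object(Bucket=..., Key=...)
    interface, returning a dict whose "Body" supports .read().
    """
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    report = json.loads(body)
    return report.get("missing_latest_versions", {})
```

This keeps the existing report-generation path untouched: the ingestion step only consumes the artifact that is already written to the bucket.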
I think it is a good time to refactor this out. Over time it has simply accumulated more complexity. We can easily make this much simpler and reduce the number of query calls as well.
Suggestion:
1. Get the ingestion API up and running.
2. Refactor the reporting.
3. Use the ingestion API in the report. This can also act as a beta mode for us to fully see how the ingestion API behaves, and then we can take it further to more important services like backbone, the API server, and finally jobs and release monitor.
Please share your thoughts: @prashbnair
Yes @yzainee, on points 1 and 3 I am in line with you; that is nice to have.
About point 2, as you already know it is a bit complex to understand and refactor. I will need some guidance on how to generate all the reports the same way as in production (SA, CA, ingestion, CVE, Snyk etc.), so that I can debug the code and test any changes.
Run a few ingestions on your dev cluster; that will create entries for that date in your RDS. Now, for testing, hard-code the date so that you get the data from RDS.
Yes @yzainee, I have done this and I am getting the ingestion accuracy etc. I am not sure how to get missing_latest_nodes, the Sentry report, SA, and V2.
Just to be clear, are you talking about refactoring only the ingestion report, or all the reports like Sentry, SA, Snyk ingestion etc.?
Description
The daily ingestion report provides a list of packages whose latest versions are not available in the graph DB. To ingest those packages into the DB, an ingestion flow is triggered from here using the worker and Selinon.
Please note that, as this is a separate and urgent feature, the changes planned for dependency unification and the vulnerability fix are not done yet; the old versions of worker/utils are used, the same as in GitHub refresh jobs, the API server etc. In upcoming sprints all these changes will be done with priority.
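A minimal sketch of how each missing package might be handed to the Selinon-based worker. The argument keys and the flow name used here are assumptions modelled on typical fabric8-analytics worker flows, not taken from this PR; the real names live in the worker's flow definitions:

```python
def build_ingestion_args(ecosystem, package, version=None):
    """Build node arguments for the unknown-package ingestion flow.

    Keys ("ecosystem", "name", "force", "version") are hypothetical
    placeholders for whatever the actual flow definition expects.
    """
    node_args = {"ecosystem": ecosystem, "name": package, "force": True}
    if version:
        node_args["version"] = version
    return node_args


# In the report job, each missing package would then be dispatched via
# Selinon's dispatcher, e.g. (flow name is an assumption):
#   from selinon import run_flow
#   run_flow("bayesianPackageFlow", build_ingestion_args("npm", "lodash"))
```

Keeping the argument construction separate from the dispatch call makes it easy to unit-test without a running worker.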