
Starting unknown flow for the packages missing latest version in daily ingestion report #160

Open · wants to merge 1 commit into master
Conversation

@jparsai (Contributor) commented Jul 27, 2020

Description

The daily ingestion report lists packages that do not have their latest versions available in the graph DB. To ingest those packages into the DB, an ingestion flow is triggered from here using the worker and Selinon.

Please note that as this is a separate and urgent feature, the changes planned for dependency unification and the vulnerability fix are not done yet, and the old versions of worker/utils are used, the same as in github refresh jobs, API server, etc. In upcoming sprints all these changes will be done with priority.
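The hand-off described above can be sketched as a small helper that turns the report's missing-latest entries into per-package payloads for the ingestion flow. This is a hedged illustration: the dictionary keys (`package`, `version`) follow the snippet shown later in this thread, but the payload shape the worker/Selinon flow actually expects is an assumption, not the repo's real schema.

```python
def build_ingestion_payload(missing_latest):
    """Flatten {ecosystem: [{"package": ..., "version": ...}, ...]} into
    one payload dict per package, ready to hand to an ingestion flow.

    The output shape is hypothetical; the real flow may expect different keys.
    """
    payload = []
    for eco, pkgs in missing_latest.items():
        for pkg in pkgs:
            payload.append({
                "ecosystem": eco,
                "name": pkg["package"],
                "version": pkg["version"],
            })
    return payload


if __name__ == "__main__":
    sample = {"maven": [{"package": "io.vertx:vertx-core", "version": "3.9.1"}]}
    print(build_ingestion_payload(sample))
```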

@centos-ci (Collaborator)

@jparsai Your image is available in the registry: docker pull quay.io/openshiftio/rhel-fabric8-analytics-f8a-stacks-report:SNAPSHOT-PR-160


@@ -724,6 +725,16 @@ def check_latest_node(self, latest_epvs, template):
"version": ver
}
missing_latest[eco].append(tmp)

# Starting ingestion flow
if os.environ.get("DISABLE_UNKNOWN_PACKAGE_FLOW", "") == "0" \
@yzainee (Member):

We should move this functionality towards the end, after all the reports etc. are done, as a separate step. Also, I don't see any point in checking the env variable for each pkg in the list.
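The reviewer's second point can be sketched as hoisting the environment check out of the per-package loop: read `DISABLE_UNKNOWN_PACKAGE_FLOW` once (the flag name comes from the diff above) and only then iterate. The `trigger` callable standing in for the actual Selinon flow start is a placeholder, not the repo's real API.

```python
import os


def ingest_missing_packages(missing_latest, trigger):
    """Trigger ingestion for each missing package, gated by one env check.

    `trigger` is a hypothetical stand-in for the real flow-start call;
    in the actual code this would dispatch a Selinon flow via the worker.
    """
    # Read the flag once, outside the loop, per the review comment.
    flow_enabled = os.environ.get("DISABLE_UNKNOWN_PACKAGE_FLOW", "") == "0"
    if not flow_enabled:
        return 0
    triggered = 0
    for eco, pkgs in missing_latest.items():
        for pkg in pkgs:
            trigger({"ecosystem": eco, **pkg})
            triggered += 1
    return triggered
```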

@jparsai (Contributor, Author):

Yes, I wanted to do it at the end, after the main function is finished, but the function returns only True/False and I did not want to change anything in the existing functionality.
I will add one more object to the tuple being returned, which will hold the JSON of missing nodes, and use it at the end, after the report is generated.

@jparsai (Contributor, Author) commented Jul 28, 2020:

@yzainee I have now done it after all reports are generated.
I tried to add an extra object to the tuple returned from the function, but that required changes in multiple nested functions, some of which are used in other places as well, so changing them is very risky and was causing test cases to fail.
In my dev cluster I am not able to generate all the reports, as they collect data from SA, CA, ingestion, etc., so testing all report generation after changing a common function is very difficult and time-consuming.
To avoid any modification and risk of failure in the existing functionality, I am reading the generated report from the S3 bucket and getting the data required for ingestion.
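The workaround described above, parsing the already-generated report instead of threading a new return value through nested functions, can be sketched roughly as below. In production the report text would be fetched from the S3 bucket; the report layout and the `missing_latest_node` key are assumptions for illustration, not the actual report schema.

```python
import json


def missing_latest_from_report(report_text):
    """Pull the missing-latest-version packages out of a generated report.

    `report_text` is the raw JSON of the daily ingestion report (in
    production, read from S3). The "missing_latest_node" key is a
    hypothetical name for where the report stores these packages.
    """
    report = json.loads(report_text)
    return report.get("missing_latest_node", {})


if __name__ == "__main__":
    sample_report = json.dumps({
        "missing_latest_node": {
            "pypi": [{"package": "flask", "version": "1.1.2"}]
        }
    })
    print(missing_latest_from_report(sample_report))
```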

@yzainee (Member):

I think it is a good time to refactor this out. Over time it has simply accumulated more complexity. We can easily make this much simpler and reduce the number of query calls as well.

@yzainee (Member):

Suggestion:
1. Get the ingestion API up and running.
2. Refactor the reporting.
3. Use the ingestion API in the report. This can also act as a beta mode for us to fully see how the ingestion API behaves, and then we can take it further to more important services like backbone, API server, and finally jobs and release monitor.

Please share your thoughts : @prashbnair

@jparsai (Contributor, Author):

Yes @yzainee, on points 1 and 3 I am in line with you; those are nice to have.
About point 2, as you already know it's a bit complex to understand and refactor, so I will need some guidance on how to generate all the reports the same as in production (SA, CA, ingestion, CVE, Snyk, etc.); that will help me debug the code and test any changes.

@yzainee (Member):

Run a few ingestions on your dev cluster; that will create entries for that date in your RDS. Then, for testing, hard-code the date so that you get the data from RDS.

@jparsai (Contributor, Author):

Yes @yzainee, I have done this and am getting ingestion accuracy etc. I am not sure how to get missing_latest_nodes, the Sentry report, SA, V2.
Just to be clear, are you talking about refactoring only the ingestion report, or all the reports like Sentry, SA, Snyk ingestion, etc.?


@jparsai jparsai requested a review from yzainee-zz July 28, 2020 11:42
3 participants