Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annual Reports: migration for API class #39

Merged
merged 1 commit into from
Aug 13, 2024
Merged

Annual Reports: migration for API class #39

merged 1 commit into from
Aug 13, 2024

Conversation

ErnestaP
Copy link
Contributor

@ErnestaP ErnestaP commented Aug 6, 2024

No description provided.

@ErnestaP ErnestaP changed the title WIP: Annual Reports: migration for API class Annual Reports: migration for API class Aug 7, 2024
Comment on lines +58 to +70
previous_task = None
for year in years:
fetch_task = fetch_categories_report_count.override(
task_id=f"fetch_report_{year}"
)(year=year)
populate_task = populate_categories_report_count.override(
task_id=f"populate_report_{year}"
)(entry=fetch_task)
if previous_task:
previous_task >> fetch_task
fetch_task >> populate_task
previous_task = populate_task

Copy link
Contributor

@drjova drjova Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to chain the tasks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to run requests to CDS one by one because otherwise, we will break CDS.
My first attempt at this dag implementation was without chaining the tasks, and CDS could not handle it

count = Column(Integer, nullable=False)
year = Column(Integer, nullable=False)
created_at = Column(DateTime, default=func.now())
updated_at = Column(DateTime, default=func.now(), onupdate=func.now())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have this it will automatically update the timestamp we don't have to do this

drjova
drjova previously approved these changes Aug 12, 2024
Copy link
Contributor

@drjova drjova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the comment



def get_endpoint(key, year):
cds_token = os.environ.get("CDS_TOKEN", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use the airflow variables for this kind of things.


def get_endpoint(key, year):
cds_token = os.environ.get("CDS_TOKEN", "")
if cds_token:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the token should be always be there, we can raise and exception if it's missing

@drjova drjova dismissed their stale review August 13, 2024 10:00

More changes to get use of airflow variables feature

@ErnestaP ErnestaP merged commit 84ffa5b into main Aug 13, 2024
1 check passed
@drjova drjova deleted the annual-reports branch November 5, 2024 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants