-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAGs: migration fro requests lib to http hook #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 few comments
@@ -1,14 +1,14 @@ | |||
def get_url(key, year): | |||
url = { | |||
"publications_total_count": r"http://cdsweb.cern.ch/search?wl=0&ln=en&cc=Published+" | |||
"publications_total_count": r"search?wl=0&ln=en&cc=Published+" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not urls any more, these are endpoints :)
endpoint=parameters["endpoint"], | ||
_retry_args={ | ||
"stop": stop_after_attempt(3), | ||
"retry": retry_if_exception_type(Exception), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would retry on request related exceptions, to avoid hiding underlying issues
68f5d55
to
513e819
Compare
"type_of_query": key, | ||
} | ||
|
||
@task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to add executor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we do! Adding it
dags/open_access/open_access.py
Outdated
url = utils.get_url(query=f"{base_query}") | ||
data = request_again_if_failed(url=url) | ||
total = get_total_results_count(data.text) | ||
query_p = rf"{base_query}+{query[type_of_query]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what query_p
means? we can just have query
I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. query_p - p means that it's a query that goes in p
parameter of the endpoint:
return {
"endpoint": rf"search?ln=en&cc={current_collection}&p={query_p}"
+ r"&action_search=Search&op1=a&m1=a&p1=&f1=&c="
+ r"Published+Articles&c=&sf=&so=d&rm=&rg=100&sc=0&of=xm",
"type_of_query": type_of_query,
}
d5d4829
to
e0dfdd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is empty
@@ -8,10 +6,16 @@ | |||
LibraryCernPublicationRecords, | |||
) | |||
from common.operators.sqlalchemy_operator import sqlalchemy_task | |||
from common.utils import get_total_results_count, request_again_if_failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can completely remove request_again_if_failed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is removed
No description provided.