Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cambridge CLI (test version) #2

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Cambridge CLI (test version) #2

wants to merge 2 commits into from

Conversation

jdm010
Copy link

@jdm010 jdm010 commented Aug 16, 2023

CLI for Cambridge University Press.
get_test_data.py must be run first to fetch and store files locally. These files are then randomly modified, so we can check for new releases.
cambridge_cli.py can be used to fetch new releases for the subject specified by the user.

Future versions of the code will be higher-level (where the user can specify the publisher in the CLI), and also be able to fetch releases for all subjects.

jdm010 added 2 commits August 16, 2023 10:36
File containing functions that will be called in other files.
Copy link
Contributor

@drjova drjova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdm010 thank you for the PR, I have few comments. I'm happy to discuss if you have any questions. Also, could you please include tests for this provider?


subjectfiles = []
for subject in subjects:
subjectfiles.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), subject) + '.tsv')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make the download location a env var, so we can easily configure it

@@ -0,0 +1,53 @@
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please rename it to utils.py?

@@ -0,0 +1,39 @@
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between this and src/providers/cambridge/cambridge_cli.py? Would you like to just test it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just for testing.

Comment on lines +6 to +12
def get_page_content(url):
response = requests.get(url)
if response.status_code == 200:
return response.text
else:
print(f"Error: Failed to fetch the webpage ({response.status_code})")
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_page_content(url):
response = requests.get(url)
if response.status_code == 200:
return response.text
else:
print(f"Error: Failed to fetch the webpage ({response.status_code})")
return None
def get_page_content(url):
response = requests.get(url)
try:
r = requests.get('http://www.google.com/nothere')
r.raise_for_status()
return r.text
except requests.exceptions.HTTPError as err:
print(f"Error: Failed to fetch the webpage ({response.status_code})")
return None

Also, since it's a cli it's better to use click's echo function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this code, I get "Error: Failed to fetch the webpage (200)" and the rest of the code does not run.

Comment on lines +42 to +53
with open(input_file, 'r') as f:
lines = f.readlines()

first_line = lines[0] # Preserve the first line

lines_to_remove = random.randint(0, min(5, len(lines) - 1)) # Ensure at least one line is kept
remaining_lines = random.sample(lines[1:], max(len(lines) - 1 - lines_to_remove, 0))
lines_to_keep = [first_line] + remaining_lines

with open(output_file, 'w') as f:
f.writelines(lines_to_keep)
os.remove(input_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "random" lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used for testing purposes. I'm removing a random assortment of lines from the files that I initially fetch from the publisher. I'm using this as the "old" data, so that when we fetch the new files using the cli, we have something to compare against. So essentially just for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants