Cambridge CLI (test version) #2

jdm010 · 2023-08-16T08:47:14Z

CLI for Cambridge University Press.
get_test_data.py must be run first to fetch and store the files locally. A random selection of lines is then removed from these files, so that they act as an older snapshot to compare new releases against.
cambridge_cli.py can be used to fetch new releases for the subject specified by the user.

Future versions of the code will be more general: the user will be able to specify the publisher in the CLI and fetch releases for all subjects.

File containing functions that will be called in other files.

drjova

@jdm010 thank you for the PR, I have a few comments. I'm happy to discuss if you have any questions. Also, could you please include tests for this provider?

drjova · 2023-08-16T13:05:02Z

src/providers/cambridge/get_test_data.py

+
+        subjectfiles = []
+        for subject in subjects:
+            subjectfiles.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), subject) + '.tsv')


Let's make the download location an env var, so we can easily configure it.
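A minimal sketch of what this could look like, assuming a hypothetical variable name `CAMBRIDGE_DOWNLOAD_DIR` (the PR does not define one) and hypothetical helper names. When the variable is unset it falls back to the module's own directory, which is what the current code does.

```python
import os
from pathlib import Path

# Hypothetical variable name, chosen for illustration only.
DOWNLOAD_DIR_ENV = "CAMBRIDGE_DOWNLOAD_DIR"


def get_download_dir() -> Path:
    """Return the configured download directory, or this module's folder."""
    configured = os.environ.get(DOWNLOAD_DIR_ENV)
    if configured:
        return Path(configured)
    # Fallback mirrors the current behaviour of get_test_data.py.
    return Path(__file__).resolve().parent


def subject_file_paths(subjects, download_dir=None):
    """Build one .tsv path per subject inside the download directory."""
    base = Path(download_dir) if download_dir is not None else get_download_dir()
    return [base / f"{subject}.tsv" for subject in subjects]
```

Passing `download_dir` explicitly also makes the path logic easy to test without touching the environment.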

drjova · 2023-08-16T13:05:20Z

src/providers/cambridge/functions.py

@@ -0,0 +1,53 @@
+import os


could you please rename it to utils.py?

drjova · 2023-08-16T13:06:44Z

src/providers/cambridge/get_test_data.py

@@ -0,0 +1,39 @@
+import os


What's the difference between this and src/providers/cambridge/cambridge_cli.py? Is this file just for testing?

Yes, just for testing.

drjova · 2023-08-16T13:10:19Z

src/providers/cambridge/functions.py

+def get_page_content(url):
+    response = requests.get(url)
+    if response.status_code == 200:
+        return response.text
+    else:
+        print(f"Error: Failed to fetch the webpage ({response.status_code})")
+        return None


Suggested change

-def get_page_content(url):
-    response = requests.get(url)
-    if response.status_code == 200:
-        return response.text
-    else:
-        print(f"Error: Failed to fetch the webpage ({response.status_code})")
-        return None
+def get_page_content(url):
+    response = requests.get(url)
+    try:
+        r = requests.get('http://www.google.com/nothere')
+        r.raise_for_status()
+        return r.text
+    except requests.exceptions.HTTPError as err:
+        print(f"Error: Failed to fetch the webpage ({response.status_code})")
+        return None

Also, since it's a CLI, it's better to use click's echo function.

With this code, I get "Error: Failed to fetch the webpage (200)" and the rest of the code does not run.
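The "(200)" in that message most likely comes from the suggestion as written. It fetches a hard-coded placeholder URL inside the `try` block, and when that request fails it prints the status code of the first, successful response. A sketch of what the suggestion presumably intended is below: one request to `url`, `raise_for_status()` to detect HTTP errors, and `click.echo` for the message. The `timeout` parameter and the wider `RequestException` catch are additions for illustration, not part of the review.

```python
import click
import requests


def get_page_content(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    except requests.exceptions.RequestException as err:
        # RequestException also covers connection errors and timeouts.
        click.echo(f"Error: Failed to fetch the webpage ({err})", err=True)
        return None
    return response.text
```

Callers still need to handle the `None` return value, as with the original function.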

drjova · 2023-08-16T13:10:50Z

src/providers/cambridge/functions.py

+    with open(input_file, 'r') as f:
+        lines = f.readlines()
+
+    first_line = lines[0]  # Preserve the first line
+
+    lines_to_remove = random.randint(0, min(5, len(lines) - 1))  # Ensure at least one line is kept
+    remaining_lines = random.sample(lines[1:], max(len(lines) - 1 - lines_to_remove, 0))
+    lines_to_keep = [first_line] + remaining_lines
+
+    with open(output_file, 'w') as f:
+        f.writelines(lines_to_keep)
+    os.remove(input_file)


What do you mean by "random" lines?

This function is used for testing purposes. I'm removing a random assortment of lines from the files that I initially fetch from the publisher. I'm using this as the "old" data, so that when we fetch the new files using the cli, we have something to compare against. So essentially just for testing.
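Since tests were requested for this provider, the line-dropping step could be separated from the file I/O so it can be checked deterministically. The sketch below is one way to do that; the function name and the `rng` parameter are hypothetical. Unlike the `random.sample` call in the PR, it keeps the surviving lines in their original order, and it returns an empty list for an empty input instead of failing on `lines[0]`.

```python
import random


def drop_random_lines(lines, max_removed=5, rng=None):
    """Keep the header line and drop up to max_removed of the other lines."""
    if not lines:
        return []
    if rng is None:
        rng = random.Random()
    header, rows = lines[0], lines[1:]
    n_removed = rng.randint(0, min(max_removed, len(rows)))
    removed = set(rng.sample(range(len(rows)), n_removed))
    # Filtering by index keeps the surviving rows in their original order.
    return [header] + [row for i, row in enumerate(rows) if i not in removed]
```

Passing a seeded `random.Random` instance makes the result reproducible, so a test can assert on the header and on the number of lines kept.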

jdm010 added 2 commits August 16, 2023 10:36

Create functions.py

164e27a

File containing functions that will be called in other files.

Add files via upload

a609f04

drjova reviewed Aug 16, 2023
