update test

yale-swe · Apr 15, 2024 · ee81438
2 parents 96f4eaa + d23c769

Showing 10,973 changed files with 18,466,709 additions and 54 deletions.
diff --git a/.github/workflows/python-app.yml b/.github/workflows/python-app.yml
@@ -25,7 +25,7 @@ jobs:
         run: |
           python -m pip install --upgrade pip
           pip install -r requirements.txt
-          pip install pytest
+          pip install pytest flask_testing requests_mock
       - name: Run Tests
         run: |
           cd backend

diff --git a/README.md b/README.md
@@ -13,26 +13,27 @@ In this project, we aim to enhance students’ course selection experience by au
 0. If you don't already have Node.js installed, you can download it [here](https://nodejs.org/en/download/).
 1. Enter the `frontend` directory and install the dependencies:
 
-    ```bash
-    cd frontend
-    npm install
-    ```
+   ```bash
+   cd frontend
+   npm install
+   ```
+
 2. Start the Next.js app:
 
-    ```bash
-    npm run dev
-    ```
+   ```bash
+   npm run dev
+   ```
 
 ### Backend
 
 1. Enter the `backend` directory, create a virtual environment, activate it, and install the dependencies. Make sure you have Python 3.10+ installed.
 
-    ```bash
-    cd backend
-    python -m venv bluebook_env
-    source bluebook_env/bin/activate
-    pip install -r requirements.txt
-    ```
+   ```bash
+   cd backend
+   python -m venv bluebook_env
+   source bluebook_env/bin/activate
+   pip install -r requirements.txt
+   ```
 
 2. You will also need to create a `.env` file in the `backend` directory containing your OpenAI API key and the MongoDB URI. The `.env` file should look like this:
 
@@ -41,13 +42,30 @@ MONGO_URI="mongodb+srv://xxx"
 OPENAI_API_KEY="sk-xxx"
 ```
 
+Never commit your `.env` file or API keys to this repository.
+
 You can get an OpenAI API key [here](https://platform.openai.com/api-keys). The MongoDB URI is shared by the team. You will need to have your IP address allowlisted by MongoDB to query the database. Contact the team for access.
 
+To run sentiment classification, first create a conda environment for Python 3 using the `backend/sentiment_classif_requirements.txt` file:
+
+```bash
+cd backend
+conda create --name <env_name> --file sentiment_classif_requirements.txt
+```
+
+Activate the conda environment by running:
+
+```bash
+conda activate <env_name>
+```
+
+Replace `<env_name>` with a name of your choice for the conda environment.
+
 3. Start the Flask server:
 
-    ```bash
-    python app.py
-    ```
+   ```bash
+   python app.py
+   ```
 
 ## Usage
 
@@ -69,25 +87,25 @@ python app.py
 
 4. You can also use your favorite API client (e.g., Postman) to send a POST request to `http://localhost:8000/api/chat` with the following JSON payload:
 
-    ```json
-    {
-        "role": "user",
-        "content": "Tell me some courses about personal finance"
-    }
-    ```
-
-    You should receive a response with the recommended courses like this:
-
-    ```json
-    {
-        "courses": [
-            {
-                "course_code": "ECON 436",
-                "description": "How much should I be saving at age 35? How much of my portfolio should be invested in stocks at age 50? Which mortgage should I choose, and when should I refinance it? How much can I afford to spend per year in retirement? This course covers prescriptive models of personal saving, asset allocation, borrowing, and spending. The course is designed to answer questions facing anybody who manages their own money or is a manager in an organization that is trying to help clients manage their money.",
-                "title": "Personal Finance"
-            },
-            ...
-        ],
-        "response": "To learn more about personal finance, you can start by taking courses or workshops that focus on financial management, budgeting, investing, and retirement planning. Some universities and educational platforms offer online courses on personal finance, such as ECON 436: Personal Finance and ECON 361: Corporate Finance. Additionally, you can explore resources like books, podcasts, and websites dedicated to personal finance advice and tips. It may also be helpful to consult with a financial advisor or planner for personalized guidance on managing your finances effectively."
-    }
-    ```
+   ```json
+   {
+     "role": "user",
+     "content": "Tell me some courses about personal finance"
+   }
+   ```
+
+   You should receive a response with the recommended courses like this:
+
+   ```json
+   {
+       "courses": [
+           {
+               "course_code": "ECON 436",
+               "description": "How much should I be saving at age 35? How much of my portfolio should be invested in stocks at age 50? Which mortgage should I choose, and when should I refinance it? How much can I afford to spend per year in retirement? This course covers prescriptive models of personal saving, asset allocation, borrowing, and spending. The course is designed to answer questions facing anybody who manages their own money or is a manager in an organization that is trying to help clients manage their money.",
+               "title": "Personal Finance"
+           },
+           ...
+       ],
+       "response": "To learn more about personal finance, you can start by taking courses or workshops that focus on financial management, budgeting, investing, and retirement planning. Some universities and educational platforms offer online courses on personal finance, such as ECON 436: Personal Finance and ECON 361: Corporate Finance. Additionally, you can explore resources like books, podcasts, and websites dedicated to personal finance advice and tips. It may also be helpful to consult with a financial advisor or planner for personalized guidance on managing your finances effectively."
+   }
+   ```
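For scripted checks without Postman, the same request can be built with Python's standard library. This is a minimal sketch, assuming the Flask server from the Backend section is running on `http://localhost:8000` and accepts the JSON payload shown above; `build_chat_request` and `send_chat` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request

# Endpoint taken from the README; assumes the Flask server is running locally.
CHAT_URL = "http://localhost:8000/api/chat"


def build_chat_request(content: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the README's payload shape."""
    payload = {"role": "user", "content": content}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(content: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (expected keys: "courses", "response")."""
    request = build_chat_request(content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat("Tell me some courses about personal finance")["courses"]` should return a list shaped like the example response. Keeping request construction separate from sending makes the payload easy to inspect in tests without a live server.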
diff --git a/backend/.gitignore b/backend/.gitignore
@@ -1,4 +1,5 @@
 .env
+data/
 __pycache__/
 bluebook_env/
 bluebook_env_1/

diff --git a/backend/add_rating_info.py b/backend/add_rating_info.py
@@ -0,0 +1,104 @@
+"""
+This script adds CourseTable rating data directly to the .json files in a directory of choice.
+
+Args
+- target_data_path: Path to the folder containing course .json files that must be updated with rating info
+- sentiment_data_path: Path to the folder containing .json files that already include rating info
+- years_to_port: List of integers specifying which years of rating info to include in the parsed course .json files
+
+Processing
+- Consider each .json file in 'target_data_path' whose name starts with one of the years in 'years_to_port'
+- For each item (representing a single course) in the .json:
+    - Retrieve the matching course evaluation file for one of the course's CRNs
+    - Store its ratings as a new "ratings" field in the course object
+
+Result
+- Updated .json files with a "ratings" field for each course object, written in place.
+"""
+
+import os
+import json
+import argparse
+import subprocess
+
+
+# Main function: loop over all course objects for the selected years
+def main(args):
+
+    # Look at all parsed course files
+    for filename in os.listdir(args.target_data_path):
+        # Consider each .json file whose name starts with a selected year, then load it
+        if filename.endswith(".json") and filename[:4].isdigit() and int(filename[:4]) in args.years_to_port:
+            print("On year", filename)
+
+            season_file_path = os.path.join(args.target_data_path, filename)
+            with open(season_file_path, 'r') as f:
+                season_course = json.load(f)
+
+            # Consider each course in the relevant year/season
+            count = 0
+            for course_object in season_course:
+                reviews_missing = True
+
+                season_code = course_object.get("season_code", "")
+                crns = course_object.get("crns", [])  # default to [] so courses without CRNs don't crash the loop
+                count += 1
+
+                # Identify the CRN of the course
+                for crn in crns:
+                    print(f"On: {season_code}-{crn} / index {count}")
+
+                    # Check whether an evaluation file exists for this course in the given season
+                    grep_cmd = f"ls {args.sentiment_data_path} | grep {season_code}-{crn}"
+                    try:  # if so, copy its rating data into the course object
+                        grep_output = subprocess.check_output(grep_cmd, shell=True).decode().strip().split("\n")
+                        season_filename = grep_output[0]
+                        sentiment_file_path = os.path.join(args.sentiment_data_path, season_filename)
+                        with open(sentiment_file_path, 'r') as sentiment_file:
+                            sentiment_json = json.load(sentiment_file)
+                            course_object["ratings"] = sentiment_json["ratings"]
+                        print(f"Finished {season_code}-{crn}")
+                        reviews_missing = False
+                        break
+                    except subprocess.CalledProcessError:  # grep exits non-zero when no file matches
+                        continue
+                if reviews_missing:
+                    course_object["ratings"] = []
+
+            with open(season_file_path, 'w') as f:
+                json.dump(season_course, f, indent=4)
+
+
+############################################################
+################## PORT RATING INFO HERE ###################
+############################################################
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser()
+
+    # Specify the folder path where JSON files are located
+    parser.add_argument("--target_data_path", 
+                        type=str, 
+                        default="data/parsed_courses",
+                        help="Folder where the .json files that need rating info copied over are located.")
+
+    parser.add_argument("--sentiment_data_path",
+                        type=str,
+                        default="data/course_evals",
+                        help="Folder where the .json files with rating info are located.")
+
+    parser.add_argument("--years_to_port",
+                        nargs="*",  # 0 or more values expected => creates a list
+                        type=int,
+                        default=[2023],  # e.g. --years_to_port 2022 2023
+                        help="Specify what years of rating info to include in the parsed course data .json files.")
+
+    args = parser.parse_args()
+
+    main(args)
+
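The `ls | grep` pipeline in `main` starts a shell for every CRN and re-lists the whole evaluation folder each time. A pure-Python version of the same lookup is sketched below, assuming (as the grep pattern implies) that evaluation filenames contain `<season_code>-<crn>`; `find_eval_file` and `find_eval_path` are hypothetical helpers, not part of the repository.

```python
import os
from typing import Iterable, Optional


def find_eval_file(filenames: Iterable[str], season_code: str, crns: Iterable[object]) -> Optional[str]:
    """Return the first filename containing "<season_code>-<crn>" for any CRN, else None.

    Approximates `ls <dir> | grep <season_code>-<crn>`: CRNs are tried in order and,
    for a given CRN, the alphabetically first matching filename wins (`ls` sorting
    can differ slightly from Python's `sorted` depending on locale).
    """
    sorted_names = sorted(filenames)
    for crn in crns:
        key = f"{season_code}-{crn}"  # works for int or str CRNs
        for name in sorted_names:
            if key in name:
                return name
    return None


def find_eval_path(eval_dir: str, season_code: str, crns: Iterable[object]) -> Optional[str]:
    """Same lookup against a real directory, returning the full path or None."""
    match = find_eval_file(os.listdir(eval_dir), season_code, crns)
    return os.path.join(eval_dir, match) if match else None
```

In `main`, `os.listdir(args.sentiment_data_path)` could be read once before the season loop and passed to `find_eval_file`, so the folder is scanned once per run and a `None` result replaces the `try`/`except` control flow.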
diff --git a/backend/app.py b/backend/app.py
@@ -28,6 +28,19 @@ def load_config(app, test_config=None):
     if test_config:
         # Load test configuration
         app.config.update(test_config)
+        if "COURSE_QUERY_LIMIT" in app.config:
+            global COURSE_QUERY_LIMIT
+            COURSE_QUERY_LIMIT = app.config["COURSE_QUERY_LIMIT"]
+        if "SAFETY_CHECK_ENABLED" in app.config:
+            global SAFETY_CHECK_ENABLED
+            SAFETY_CHECK_ENABLED = app.config["SAFETY_CHECK_ENABLED"]
+        if "DATABASE_RELEVANCY_CHECK_ENABLED" in app.config:
+            global DATABASE_RELEVANCY_CHECK_ENABLED
+            DATABASE_RELEVANCY_CHECK_ENABLED = app.config[
+                "DATABASE_RELEVANCY_CHECK_ENABLED"
+            ]
+        if "FLASK_SECRET_KEY" in app.config:
+            app.secret_key = app.config["FLASK_SECRET_KEY"]
     else:
         # Load configuration from environment variables
         app.config["MONGO_URI"] = os.getenv("MONGO_URI")
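The three `if ... in app.config` / `global` blocks above follow one pattern: copy a test-config value over a module-level default. A table-driven version is sketched below, assuming the defaults stay module globals as in `app.py`; `OVERRIDABLE_SETTINGS` and `apply_overrides` are hypothetical names.

```python
from typing import Any, Mapping, MutableMapping

# Hypothetical allowlist of module-level settings a test config may override.
OVERRIDABLE_SETTINGS = (
    "COURSE_QUERY_LIMIT",
    "SAFETY_CHECK_ENABLED",
    "DATABASE_RELEVANCY_CHECK_ENABLED",
)


def apply_overrides(config: Mapping[str, Any], namespace: MutableMapping[str, Any]) -> None:
    """Copy every allowlisted key present in `config` into `namespace`.

    Passing globals() as `namespace` rebinds the module-level variables,
    which is what the repeated `global` statements in load_config achieve.
    Keys not in the allowlist are ignored.
    """
    for name in OVERRIDABLE_SETTINGS:
        if name in config:
            namespace[name] = config[name]
```

Inside `load_config`, a call like `apply_overrides(app.config, globals())` could replace the three blocks. `FLASK_SECRET_KEY` would still need its own branch because it sets `app.secret_key` rather than a module global.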

diff --git a/backend/port_sentiment_info_to_parsed_courses.py b/backend/port_sentiment_info_to_parsed_courses.py
@@ -0,0 +1,134 @@
+"""
+This script copies sentiment classification results into the parsed CourseTable course data as follows:
+
+Args
+- target_data_path: Path to the folder containing course .json files that must be updated with sentiment analysis results from the specified years
+- sentiment_data_path: Path to the folder containing .json files that already include sentiment analysis results
+- years_to_port: List of integers specifying which years of sentiment info to include in the parsed course .json files
+
+Processing
+- Consider each .json file in 'target_data_path' whose name starts with one of the years in 'years_to_port'
+- For each item (representing a single course) in the .json:
+    - Retrieve the matching course evaluation file for one of the course's CRNs
+    - Store its sentiment results as a new "sentiment_info" field in the course object
+
+Result
+- Updated .json files with a "sentiment_info" field for each course object, written in place.
+
+Notes
+- This script assumes sentiment_classification.py has already been run on the course evaluation data.
+"""
+
+import os
+import json
+import argparse
+import subprocess
+
+
+# Main function: loop over all course objects for the selected years
+def main(args):
+
+    # Look at all parsed course files
+    for filename in os.listdir(args.target_data_path):
+        # Consider each .json file whose name starts with a selected year, then load it
+        if filename.endswith(".json") and filename[:4].isdigit() and int(filename[:4]) in args.years_to_port:
+            print("On year", filename)
+
+            season_file_path = os.path.join(args.target_data_path, filename)
+            with open(season_file_path, 'r') as f:
+                season_course = json.load(f)
+
+            # Consider each course in the relevant year/season
+            count = 0
+            for course_object in season_course:
+                reviews_missing = True
+
+                season_code = course_object.get("season_code", "")
+                crns = course_object.get("crns", [])  # default to [] so courses without CRNs don't crash the loop
+                count += 1
+
+                # Identify the CRN of the course
+                for crn in crns:
+                    print(f"On: {season_code}-{crn} / index {count}")
+
+                    # Check whether an evaluation file exists for this course in the given season
+                    grep_cmd = f"ls {args.sentiment_data_path} | grep {season_code}-{crn}"
+                    try:  # if so, copy its sentiment data into the course object
+                        grep_output = subprocess.check_output(grep_cmd, shell=True).decode().strip().split("\n")
+                        season_filename = grep_output[0]
+                        sentiment_file_path = os.path.join(args.sentiment_data_path, season_filename)
+                        with open(sentiment_file_path, 'r') as sentiment_file:
+                            sentiment_json = json.load(sentiment_file)
+                            course_object["sentiment_info"] = sentiment_json["sentiment_info"]
+                        print(f"Finished {season_code}-{crn}")
+                        reviews_missing = False
+                        break
+                    except subprocess.CalledProcessError:  # grep exits non-zero when no file matches
+                        continue
+                if reviews_missing:
+                    course_object["sentiment_info"] = {
+                        "YC401": {
+                            "sentiment_labels": [],
+                            "sentiment_scores": [],
+                            "sentiment_counts": {},
+                            "sentiment_distribution": {},
+                            "sentiment_overall": ""
+                        },
+                        "YC403": {
+                            "sentiment_labels": [],
+                            "sentiment_scores": [],
+                            "sentiment_counts": {},
+                            "sentiment_distribution": {},
+                            "sentiment_overall": ""
+                        },
+                        "YC409": {
+                            "sentiment_labels": [],
+                            "sentiment_scores": [],
+                            "sentiment_counts": {},
+                            "sentiment_distribution": {},
+                            "sentiment_overall": ""
+                        },
+                        "final_label": "",
+                        "final_count": 0,
+                        "final_proportion": 0.0,
+                        "final_counts": {},
+                        "final_distribution": {}
+                    }
+
+            with open(season_file_path, 'w') as f:
+                json.dump(season_course, f, indent=4)
+
+
+############################################################
+################# PORT SENTIMENT INFO HERE #################
+############################################################
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser()
+
+    # Specify the folder path where JSON files are located
+    parser.add_argument("--target_data_path", 
+                        type=str, 
+                        default="data/parsed_courses",
+                        help="Folder where the .json files that need sentiment info copied over are located.") 
+
+    parser.add_argument("--sentiment_data_path", 
+                        type=str, 
+                        default="data/course_evals",
+                        help="Folder where the .json files with sentiment info are located.") 
+
+    parser.add_argument("--years_to_port", 
+                        nargs="*",  # 0 or more values expected => creates a list
+                        type=int,
+                        default=[2023],  # e.g. --years_to_port 2022 2023
+                        help="Specify what years of sentiment info to include in the parsed course data .json files.") 
+
+    args = parser.parse_args()
+
+    main(args)
+
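The empty `sentiment_info` placeholder in `main` repeats the same per-question block for `YC401`, `YC403`, and `YC409`. The sketch below builds the identical structure from a list of question codes, so a new code would only need to be added in one place. `QUESTION_CODES` and `empty_sentiment_info` are hypothetical names; the field names are copied from the script.

```python
from typing import Any, Dict, Sequence

# Evaluation question codes that appear in the script's placeholder.
QUESTION_CODES = ("YC401", "YC403", "YC409")


def empty_sentiment_info(question_codes: Sequence[str] = QUESTION_CODES) -> Dict[str, Any]:
    """Return a fresh placeholder for courses without sentiment results.

    New lists and dicts are created on every call, so mutating one course's
    placeholder never affects another course.
    """
    info: Dict[str, Any] = {
        code: {
            "sentiment_labels": [],
            "sentiment_scores": [],
            "sentiment_counts": {},
            "sentiment_distribution": {},
            "sentiment_overall": "",
        }
        for code in question_codes
    }
    # Aggregate fields that summarize all question codes.
    info.update({
        "final_label": "",
        "final_count": 0,
        "final_proportion": 0.0,
        "final_counts": {},
        "final_distribution": {},
    })
    return info
```

The `if reviews_missing:` branch could then assign `course_object["sentiment_info"] = empty_sentiment_info()` instead of the inline literal.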