Commit

update test
Paul Lin authored and Paul Lin committed Apr 15, 2024
2 parents 96f4eaa + d23c769 commit ee81438
Showing 10,973 changed files with 18,466,709 additions and 54 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -25,7 +25,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
-pip install pytest
+pip install pytest flask_testing requests_mock
- name: Run Tests
run: |
cd backend
94 changes: 56 additions & 38 deletions README.md
@@ -13,26 +13,27 @@ In this project, we aim to enhance students’ course selection experience by au
0. If you don't already have Node.js installed, you can download it [here](https://nodejs.org/en/download/).
1. Enter the `frontend` directory and install the dependencies:

```bash
cd frontend
npm install
```
```bash
cd frontend
npm install
```

2. Start the Next.js app:

```bash
npm run dev
```
```bash
npm run dev
```

### Backend

1. Enter the `backend` directory, create a virtual environment, activate it, and install the dependencies. Make sure you have Python 3.10+ installed.

```bash
cd backend
python -m venv bluebook_env
source bluebook_env/bin/activate
pip install -r requirements.txt
```
```bash
cd backend
python -m venv bluebook_env
source bluebook_env/bin/activate
pip install -r requirements.txt
```

2. You will also need to create a `.env` file in the `backend` directory that contains your OpenAI API key and the MongoDB URI. The `.env` file should look like this:

@@ -41,13 +42,30 @@ MONGO_URI="mongodb+srv://xxx"
OPENAI_API_KEY="sk-xxx"
```

Don't push your API key to this repo!

You can get an OpenAI API key [here](https://platform.openai.com/api-keys). The MongoDB URI is shared by the team. You will need to have your IP address allowlisted by MongoDB to query the database. Contact the team for access.

To run sentiment classification, first create a conda environment for Python 3 using the `backend/sentiment_classif_requirements.txt` file:

```bash
cd backend
conda create --name <env_name> --file sentiment_classif_requirements.txt
```

Activate the conda environment by running:

```bash
conda activate <env_name>
```

where `<env_name>` is your name of choice for the conda environment.

3. Start the Flask server:

```bash
python app.py
```
```bash
python app.py
```

## Usage

@@ -69,25 +87,25 @@ python app.py

4. You can also use your favorite API client (e.g., Postman) to send a POST request to `http://localhost:8000/api/chat` with the following JSON payload:

```json
{
"role": "user",
"content": "Tell me some courses about personal finance"
}
```

You should receive a response with the recommended courses like this:

```json
{
"courses": [
{
"course_code": "ECON 436",
"description": "How much should I be saving at age 35? How much of my portfolio should be invested in stocks at age 50? Which mortgage should I choose, and when should I refinance it? How much can I afford to spend per year in retirement? This course covers prescriptive models of personal saving, asset allocation, borrowing, and spending. The course is designed to answer questions facing anybody who manages their own money or is a manager in an organization that is trying to help clients manage their money.",
"title": "Personal Finance"
},
...
],
"response": "To learn more about personal finance, you can start by taking courses or workshops that focus on financial management, budgeting, investing, and retirement planning. Some universities and educational platforms offer online courses on personal finance, such as ECON 436: Personal Finance and ECON 361: Corporate Finance. Additionally, you can explore resources like books, podcasts, and websites dedicated to personal finance advice and tips. It may also be helpful to consult with a financial advisor or planner for personalized guidance on managing your finances effectively."
}
```
```json
{
"role": "user",
"content": "Tell me some courses about personal finance"
}
```

You should receive a response with the recommended courses like this:

```json
{
"courses": [
{
"course_code": "ECON 436",
"description": "How much should I be saving at age 35? How much of my portfolio should be invested in stocks at age 50? Which mortgage should I choose, and when should I refinance it? How much can I afford to spend per year in retirement? This course covers prescriptive models of personal saving, asset allocation, borrowing, and spending. The course is designed to answer questions facing anybody who manages their own money or is a manager in an organization that is trying to help clients manage their money.",
"title": "Personal Finance"
},
...
],
"response": "To learn more about personal finance, you can start by taking courses or workshops that focus on financial management, budgeting, investing, and retirement planning. Some universities and educational platforms offer online courses on personal finance, such as ECON 436: Personal Finance and ECON 361: Corporate Finance. Additionally, you can explore resources like books, podcasts, and websites dedicated to personal finance advice and tips. It may also be helpful to consult with a financial advisor or planner for personalized guidance on managing your finances effectively."
}
```
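
The same request can also be sent from a short script instead of an API client. The following is a minimal sketch using only Python's standard library. It assumes the Flask server from the Backend section is running locally on port 8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of the project.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:8000/api/chat"  # endpoint named in the README


def build_chat_request(content: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build a POST request carrying the JSON payload described in the README."""
    payload = {"role": "user", "content": content}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(content: str) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(content)) as resp:
        return json.load(resp)


# Usage, once the backend is up:
#   reply = ask("Tell me some courses about personal finance")
#   print(reply["response"])
#   for course in reply["courses"]:
#       print(course["course_code"], course["title"])
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.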
1 change: 1 addition & 0 deletions backend/.gitignore
@@ -1,4 +1,5 @@
.env
data/
__pycache__/
bluebook_env/
bluebook_env_1/
104 changes: 104 additions & 0 deletions backend/add_rating_info.py
@@ -0,0 +1,104 @@
"""
This file adds CourseTable rating data directly to the .json files in a directory of choice:
Args
    - target_data_path: Path to the folder containing the .json course files that must be updated with rating info
    - sentiment_data_path: Path to the folder containing the .json files that already hold rating info
    - years_to_port: List of integers giving the years of rating info to include in the parsed course .json files
Processing
    - Consider each .json file in 'target_data_path'
    - For each item (representing a single course) in the .json:
        - Retrieve the relevant course evaluation file for the year(s) specified
        - Store the result as a new field in the JSON course object
Result
    - Updated .json files with a new "ratings" field for each course object, written in-place.
"""

import argparse
import json
import os
import subprocess

# Main function to loop over all JSON course objects for the given year(s)
def main(args):

    # Look at all parsed course files
    for filename in os.listdir(args.target_data_path):

        # Consider each file for the relevant years & load
        if filename.endswith(".json") and int(filename[:4]) in args.years_to_port:
            print("On year", filename)

            season_file_path = os.path.join(args.target_data_path, filename)
            with open(season_file_path, 'r') as f:
                season_course = json.load(f)

            # Consider each course in the relevant year/season
            count = 0
            for course_object in season_course:
                reviews_missing = True

                season_code = course_object.get("season_code", "")
                crns = course_object.get("crns") or []  # tolerate a missing "crns" field
                count += 1

                # Try each CRN of the course until one has evaluation data
                for crn in crns:
                    print(f"On: {season_code}-{crn} / index {count}")

                    # Check whether there is a JSON entry for that course in the specified season
                    grep_cmd = f"ls {args.sentiment_data_path} | grep {season_code}-{crn}"
                    try:  # if so, copy the rating data into the course object
                        grep_output = subprocess.check_output(grep_cmd, shell=True).decode().strip().split("\n")
                        season_filename = grep_output[0]
                        sentiment_file_path = os.path.join(args.sentiment_data_path, season_filename)
                        with open(sentiment_file_path, 'r') as sentiment_file:
                            sentiment_json = json.load(sentiment_file)
                        course_object["ratings"] = sentiment_json["ratings"]
                        print(f"Finished {season_code}-{crn}")
                        reviews_missing = False
                        break
                    except Exception:  # no match (grep exits non-zero) or unreadable file: try the next CRN
                        continue
                if reviews_missing:
                    course_object["ratings"] = []

            # Write the updated courses back in place
            with open(season_file_path, 'w') as f:
                json.dump(season_course, f, indent=4)


############################################################
############ RUN SENTIMENT CLASSIFICATION HERE #############
############################################################

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # Specify the folder path where JSON files are located
    parser.add_argument("--target_data_path",
                        type=str,
                        default="data/parsed_courses",
                        help="Folder where the .json files that need sentiment info copied over are located.")

    parser.add_argument("--sentiment_data_path",
                        type=str,
                        default="data/course_evals",
                        help="Folder where the .json files with sentiment info are located.")

    parser.add_argument("--years_to_port",
                        nargs="*",  # 0 or more values expected => creates a list
                        type=int,
                        default=[2023],  # other options: YC401, YC403
                        help="Specify what years of sentiment info to include in the parsed course data .json files.")

    args = parser.parse_args()

    main(args)
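
The `ls | grep` lookup in `main` could also be done without spawning a shell. The following is a sketch under assumptions, not the author's method: `find_eval_file` is a hypothetical helper, and it keeps the same "file name contains `<season_code>-<crn>`" rule that `grep` applies, because the diff does not show the actual file naming scheme.

```python
import os
from typing import Optional


def find_eval_file(directory: str, season_code: str, crn) -> Optional[str]:
    """Return the path of the first file whose name contains '<season_code>-<crn>', or None."""
    key = f"{season_code}-{crn}"
    # sorted() gives a deterministic order, similar to the listing `ls` prints
    for name in sorted(os.listdir(directory)):
        if key in name:
            return os.path.join(directory, name)
    return None
```

Like the `grep` version, a substring match means a key such as `202301-1234` would also match a file named `202301-12345.json`. A stricter comparison would need the real naming scheme.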

13 changes: 13 additions & 0 deletions backend/app.py
@@ -28,6 +28,19 @@ def load_config(app, test_config=None):
    if test_config:
        # Load test configuration
        app.config.update(test_config)
        if "COURSE_QUERY_LIMIT" in app.config:
            global COURSE_QUERY_LIMIT
            COURSE_QUERY_LIMIT = app.config["COURSE_QUERY_LIMIT"]
        if "SAFETY_CHECK_ENABLED" in app.config:
            global SAFETY_CHECK_ENABLED
            SAFETY_CHECK_ENABLED = app.config["SAFETY_CHECK_ENABLED"]
        if "DATABASE_RELEVANCY_CHECK_ENABLED" in app.config:
            global DATABASE_RELEVANCY_CHECK_ENABLED
            DATABASE_RELEVANCY_CHECK_ENABLED = app.config[
                "DATABASE_RELEVANCY_CHECK_ENABLED"
            ]
        if "FLASK_SECRET_KEY" in app.config:
            app.secret_key = app.config["FLASK_SECRET_KEY"]
    else:
        # Load configuration from environment variables
        app.config["MONGO_URI"] = os.getenv("MONGO_URI")
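
The override logic in this hunk repeats one pattern per setting: if the test configuration supplies a key, it replaces the module-level value. As a sketch only, the same idea can be pictured as a standalone function that falls back to defaults. The name `resolve_settings` and the default values below are assumptions for illustration; the diff does not show the real defaults in `app.py`.

```python
# Hypothetical defaults; the real values are defined elsewhere in app.py
# and do not appear in this diff.
DEFAULTS = {
    "COURSE_QUERY_LIMIT": 5,
    "SAFETY_CHECK_ENABLED": True,
    "DATABASE_RELEVANCY_CHECK_ENABLED": True,
}


def resolve_settings(test_config=None, defaults=DEFAULTS):
    """Return a copy of the defaults, overridden by matching keys in test_config."""
    settings = dict(defaults)
    for key in defaults:
        if test_config and key in test_config:
            settings[key] = test_config[key]
    return settings
```

A test could then call `resolve_settings({"SAFETY_CHECK_ENABLED": False})` to switch off one check while leaving the other settings at their defaults.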
134 changes: 134 additions & 0 deletions backend/port_sentiment_info_to_parsed_courses.py
@@ -0,0 +1,134 @@
"""
This file copies sentiment classification results for CourseTable data into the parsed course files as follows:
Args
    - target_data_path: Path to the folder containing the .json course files that must be updated with sentiment analysis results from the specified years
    - sentiment_data_path: Path to the folder containing the .json files equipped with sentiment analysis results
    - years_to_port: List of integers giving the years of sentiment info to include in the parsed course .json files
Processing
    - Consider each .json file in 'target_data_path'
    - For each item (representing a single course) in the .json:
        - Retrieve the relevant course evaluation file for the year(s) specified
        - Store the result as a new field in the JSON course object
Result
    - Updated .json files with a new "sentiment_info" field for each course object, written in-place.
Notes
    - This file assumes sentiment_classification.py has already been run on the course evaluation data.
"""

import argparse
import json
import os
import subprocess

# Main function to loop over all JSON course objects for the given year(s)
def main(args):

    # Look at all parsed course files
    for filename in os.listdir(args.target_data_path):

        # Consider each file for the relevant years & load
        if filename.endswith(".json") and int(filename[:4]) in args.years_to_port:
            print("On year", filename)

            season_file_path = os.path.join(args.target_data_path, filename)
            with open(season_file_path, 'r') as f:
                season_course = json.load(f)

            # Consider each course in the relevant year/season
            count = 0
            for course_object in season_course:
                reviews_missing = True

                season_code = course_object.get("season_code", "")
                crns = course_object.get("crns") or []  # tolerate a missing "crns" field
                count += 1

                # Try each CRN of the course until one has evaluation data
                for crn in crns:
                    print(f"On: {season_code}-{crn} / index {count}")

                    # Check whether there is a JSON entry for that course in the specified season
                    grep_cmd = f"ls {args.sentiment_data_path} | grep {season_code}-{crn}"
                    try:  # if so, copy the sentiment data into the course object
                        grep_output = subprocess.check_output(grep_cmd, shell=True).decode().strip().split("\n")
                        season_filename = grep_output[0]
                        sentiment_file_path = os.path.join(args.sentiment_data_path, season_filename)
                        with open(sentiment_file_path, 'r') as sentiment_file:
                            sentiment_json = json.load(sentiment_file)
                        course_object["sentiment_info"] = sentiment_json["sentiment_info"]
                        print(f"Finished {season_code}-{crn}")
                        reviews_missing = False
                        break
                    except Exception:  # no match (grep exits non-zero) or unreadable file: try the next CRN
                        continue
                if reviews_missing:
                    # No evaluation data found: store an empty placeholder structure
                    course_object["sentiment_info"] = {
                        "YC401": {
                            "sentiment_labels": [],
                            "sentiment_scores": [],
                            "sentiment_counts": {},
                            "sentiment_distribution": {},
                            "sentiment_overall": ""
                        },
                        "YC403": {
                            "sentiment_labels": [],
                            "sentiment_scores": [],
                            "sentiment_counts": {},
                            "sentiment_distribution": {},
                            "sentiment_overall": ""
                        },
                        "YC409": {
                            "sentiment_labels": [],
                            "sentiment_scores": [],
                            "sentiment_counts": {},
                            "sentiment_distribution": {},
                            "sentiment_overall": ""
                        },
                        "final_label": "",
                        "final_count": 0,
                        "final_proportion": 0.0,
                        "final_counts": {},
                        "final_distribution": {}
                    }

            # Write the updated courses back in place
            with open(season_file_path, 'w') as f:
                json.dump(season_course, f, indent=4)


############################################################
############ RUN SENTIMENT CLASSIFICATION HERE #############
############################################################

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # Specify the folder path where JSON files are located
    parser.add_argument("--target_data_path",
                        type=str,
                        default="data/parsed_courses",
                        help="Folder where the .json files that need sentiment info copied over are located.")

    parser.add_argument("--sentiment_data_path",
                        type=str,
                        default="data/course_evals",
                        help="Folder where the .json files with sentiment info are located.")

    parser.add_argument("--years_to_port",
                        nargs="*",  # 0 or more values expected => creates a list
                        type=int,
                        default=[2023],  # other options: YC401, YC403
                        help="Specify what years of sentiment info to include in the parsed course data .json files.")

    args = parser.parse_args()

    main(args)
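
The empty `sentiment_info` placeholder in `main` spells out the same five fields for each of the three question codes. A possible refactor, sketched here with hypothetical helper names (`empty_question_sentiment`, `empty_sentiment_info`) that are not part of the commit, builds the structure from the list of codes instead:

```python
QUESTION_CODES = ("YC401", "YC403", "YC409")  # codes used in the placeholder in main()


def empty_question_sentiment():
    """Placeholder sentiment fields for a single evaluation question."""
    return {
        "sentiment_labels": [],
        "sentiment_scores": [],
        "sentiment_counts": {},
        "sentiment_distribution": {},
        "sentiment_overall": "",
    }


def empty_sentiment_info(question_codes=QUESTION_CODES):
    """Build the empty structure assigned when no reviews are found for a course."""
    # A fresh dict per code, so later edits to one question do not leak into another
    info = {code: empty_question_sentiment() for code in question_codes}
    info.update({
        "final_label": "",
        "final_count": 0,
        "final_proportion": 0.0,
        "final_counts": {},
        "final_distribution": {},
    })
    return info
```

With this in place, the `if reviews_missing:` branch would reduce to `course_object["sentiment_info"] = empty_sentiment_info()`.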
