UI new #25

Open
wants to merge 61 commits into
base: master
d4b6076
added p02 updates
KevinKCui Mar 19, 2024
2dd01d6
copying over cosine sim
KevinKCui Mar 20, 2024
0c7f8c4
basic update working
KevinKCui Mar 22, 2024
2cdff46
added gitignore and requirements
KevinKCui Mar 22, 2024
8ec5151
change logo to polipredictor
CollinWoo Mar 22, 2024
20ae6fe
Update app.py
KevinKCui Mar 22, 2024
f94760a
remove duplicate names
CollinWoo Mar 22, 2024
d65bd4a
edit file path to init.json
CollinWoo Mar 22, 2024
86be3a0
changing paths again
KevinKCui Mar 22, 2024
31d43f7
added newest data
KevinKCui Mar 22, 2024
765aba1
ui changes
Mar 22, 2024
b7fd3c1
Merge branch 'master' into ui-changes
KevinKCui Mar 22, 2024
704f87b
Merge pull request #1 from rah379/ui-changes
KevinKCui Mar 22, 2024
036e0be
improved results to show scores (and show 2 results)
KevinKCui Mar 24, 2024
2d83851
Update init.json
rah379 Mar 25, 2024
17d8a3d
some slight refactoring for code reorg
KevinKCui Mar 30, 2024
12bc59c
clean dataset uploaded (with over 21,000 tweets
KevinKCui Apr 13, 2024
7c2b577
removing duplicates in clean
KevinKCui Apr 13, 2024
e5bcb04
refined cleaning process
KevinKCui Apr 13, 2024
279c2ff
updated file names
KevinKCui Apr 13, 2024
d3e6af9
modify gitignore
CollinWoo Apr 14, 2024
747d66f
added twitter handles
KevinKCui Apr 14, 2024
41a6c3d
removed "greg"
KevinKCui Apr 14, 2024
71ec426
add border around results
CollinWoo Apr 14, 2024
f77ea73
adjust page spacing
CollinWoo Apr 14, 2024
faecb6a
change title
CollinWoo Apr 14, 2024
c596664
implemented boolean search
KevinKCui Apr 14, 2024
bd17f47
changed names name
KevinKCui Apr 14, 2024
fd38d56
Merge branch 'master' of https://github.com/rah379/Team_1_4300_Proj
KevinKCui Apr 14, 2024
0f36c64
oops name
KevinKCui Apr 14, 2024
ed1df1a
add twitter links to results
CollinWoo Apr 14, 2024
f8500a8
cleaned up base to show 5 results
KevinKCui Apr 14, 2024
7529ccf
show top 10
KevinKCui Apr 14, 2024
1b44cd8
remove zeros still buggy
KevinKCui Apr 14, 2024
e3ef72e
most up to date files
KevinKCui Apr 14, 2024
4f0fd24
include images
KevinKCui Apr 14, 2024
3858c96
cleaning up code
KevinKCui Apr 14, 2024
3e0fcb3
implemented basic boolean search
KevinKCui Apr 14, 2024
512d29a
fallback for svd sim = 0
KevinKCui Apr 14, 2024
ff567ec
no matches display code
KevinKCui Apr 14, 2024
c70d41b
refactored util functions
KevinKCui Apr 15, 2024
7a266ac
added top tweets mechanism
KevinKCui Apr 15, 2024
5e09ffb
updated to remove duplicate tweets
KevinKCui Apr 15, 2024
fc9ac44
added tweet trail
KevinKCui Apr 15, 2024
56bfd0b
display top tweet
KevinKCui Apr 15, 2024
5602856
svd correction
KevinKCui Apr 15, 2024
6e82ed7
added trump
KevinKCui Apr 15, 2024
de379d8
normalize card size
CollinWoo Apr 15, 2024
e2c49b5
updated files with trump
KevinKCui Apr 15, 2024
aea67a3
fixed bugs
KevinKCui Apr 15, 2024
5c55b13
Merge branch 'master' of https://github.com/rah379/Team_1_4300_Proj
KevinKCui Apr 15, 2024
4f39a92
changed svd size
KevinKCui Apr 15, 2024
9e0e4a6
Added Popularity Score and changed Similarity output
Apr 15, 2024
ab63a66
add loading icon when searching
CollinWoo Apr 15, 2024
2745c67
updated data
KevinKCui Apr 15, 2024
09c4a7e
Merge branch 'master' of https://github.com/rah379/Team_1_4300_Proj
KevinKCui Apr 15, 2024
ef5594c
slightly cleaning up
KevinKCui Apr 15, 2024
200cea4
accounted for million likes
KevinKCui Apr 15, 2024
d34351b
changed descriptions
KevinKCui Apr 15, 2024
910c43e
ui-new
Apr 15, 2024
11c4868
Merge branch 'master' into ui-new
KevinKCui Apr 19, 2024
9 changes: 8 additions & 1 deletion .gitignore
@@ -17,4 +17,11 @@ dist/
build/
*.egg-info/
helpers/*
json_template/
json_template/

lib/
bin/
pyvenv.cfg

.idea
4300_project
114 changes: 91 additions & 23 deletions backend/app.py
@@ -1,46 +1,114 @@
import json
import os
import sys
from flask import Flask, render_template, request
from flask_cors import CORS
from helpers.MySQLDatabaseHandler import MySQLDatabaseHandler
import pandas as pd

# ROOT_PATH for linking with all your files.
import numpy as np
import csv
import re
from helpers.similarity import svd_cos, boolean_search
# ROOT_PATH for linking with all your files.
# Feel free to use a config.py or settings.py with a global export variable
os.environ['ROOT_PATH'] = os.path.abspath(os.path.join("..",os.curdir))
os.environ['ROOT_PATH'] = os.path.abspath(os.path.join("..", os.curdir))

# Get the directory of the current script
current_directory = os.path.dirname(os.path.abspath(__file__))
# print(current_directory)

# loading data
# Specify the path to the JSON file relative to the current script
json_file_path = os.path.join(current_directory, 'init.json')
json_file_path = os.path.join(current_directory, 'data/json/docs.json')

# Assuming your JSON data is stored in a file named 'init.json'
with open(json_file_path, 'r') as file:
data = json.load(file)
episodes_df = pd.DataFrame(data['episodes'])
reviews_df = pd.DataFrame(data['reviews'])
docs = json.load(file)

wcnt = np.load(
    os.path.join(current_directory, 'data/numpy/wcn_transpose.npy'))
dcn = np.load(os.path.join(current_directory, 'data/numpy/dcn.npy'))
with open(os.path.join(current_directory, 'data/json/index_politicians.json'), 'r') as f:
    itp = json.load(f)
col = ['name', 'chamber', 'party', 'region', 'country']
l_data = []

csv.field_size_limit(sys.maxsize)
with open(os.path.join(current_directory, 'data/people.csv'), mode='r', newline='') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        filtered_row = {key: row[key] for key in col}
        l_data.append(filtered_row)
people_csv = json.dumps(l_data, indent=4)
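
Review note: the loading step above keeps only five columns from `people.csv` and then serializes the rows to a JSON string. A minimal sketch of the same column filter, assuming the rows could stay in memory as a list of dicts instead of being re-parsed later; `load_people` and the sample row are hypothetical, not part of this PR.

```python
import csv
import io

# Same column list as `col` in the diff.
COLUMNS = ["name", "chamber", "party", "region", "country"]


def load_people(handle, columns=COLUMNS):
    """Read CSV rows from an open file handle, keeping only `columns`."""
    reader = csv.DictReader(handle)
    return [{key: row[key] for key in columns} for row in reader]


# Hypothetical in-memory CSV standing in for data/people.csv.
sample = io.StringIO(
    "name,chamber,party,region,country,notes\n"
    "Jane Doe,Senate,Independent,NY,US,hypothetical row\n"
)
people = load_people(sample)
print(people[0]["party"])
```

Keeping the list itself would let a later lookup skip the `json.dumps` / `json.loads` round trip, at the cost of holding Python objects rather than a string.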

with open(os.path.join(current_directory, 'data/tweets/clean.json'), 'r') as f:
    tweets = json.load(f)


# names = np.load(os.path.join(current_directory, 'data/numpy/curr_names.npy'))

####################

app = Flask(__name__)
CORS(app)

# Sample search using json with pandas
def json_search(query):
matches = []
merged_df = pd.merge(episodes_df, reviews_df, left_on='id', right_on='id', how='inner')
matches = merged_df[merged_df['title'].str.lower().str.contains(query.lower())]
matches_filtered = matches[['title', 'descr', 'imdb_rating']]
matches_filtered_json = matches_filtered.to_json(orient='records')
return matches_filtered_json
# maybe we can build out the cosine similarity matrix before? and just load that information in

# we should also print out tweets/popularity

@app.route("/")
def normalize_name(name):
    pattern = re.compile(r'^(Rep\.|Senator|Speaker|Archive:)\s+', re.IGNORECASE)

    while True:
        new_name = pattern.sub('', name)
        if new_name == name:
            break
        name = new_name

    name = re.sub(r'\b[A-Z]\.\s*', '', name)
    name = re.sub(r'\s+(Jr\.|Sr\.)', '', name)

    return name.strip()
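
Review note: a minimal sketch of how `normalize_name` behaves. The function is repeated from this diff so the example runs on its own; the sample names are hypothetical and not taken from `people.csv`.

```python
import re


def normalize_name(name):
    # Strip leading titles repeatedly, so stacked prefixes are all removed.
    pattern = re.compile(r'^(Rep\.|Senator|Speaker|Archive:)\s+', re.IGNORECASE)
    while True:
        new_name = pattern.sub('', name)
        if new_name == name:
            break
        name = new_name
    # Drop single-letter initials such as "A." and the suffixes "Jr." / "Sr.".
    name = re.sub(r'\b[A-Z]\.\s*', '', name)
    name = re.sub(r'\s+(Jr\.|Sr\.)', '', name)
    return name.strip()


# Hypothetical inputs.
print(normalize_name("Rep. John A. Smith Jr."))     # title, initial, suffix
print(normalize_name("Archive: Senator Jane Doe"))  # two stacked prefixes
print(normalize_name("A.B. Example"))               # name made of initials
```

In the last case every initial is removed, so only the surname survives. Distinct people could then share a key in `people_dict`; whether that happens with the real data is not visible from this diff.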


def update_json(input_json):
    people_data = json.loads(people_csv)
    people_dict = {normalize_name(person['name']): person for person in people_data}

    for _, match in enumerate(input_json['matches']):
        normalized_match = normalize_name(match)
        if normalized_match in people_dict:
            person = people_dict[normalized_match]
            input_json['country'] = input_json.get('country', []) + [person['country']]
            input_json['chamber'] = input_json.get('chamber', []) + [person['chamber']]
            input_json['party'] = input_json.get('party', []) + [person['party']]
            input_json['region'] = input_json.get('region', []) + [person['region']]
        else:
            input_json['country'] = input_json.get('country', []) + ["Not Found"]
            input_json['chamber'] = input_json.get('chamber', []) + ["Not Found"]
            input_json['party'] = input_json.get('party', []) + ["Not Found"]
            input_json['region'] = input_json.get('region', []) + ["Not Found"]

    return input_json
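
Review note: `update_json` appends one value per match to four lists that stay parallel to `input_json['matches']`, with `"Not Found"` as the placeholder. A more compact sketch of the same idea, offered as one possible shape rather than a required change; `enrich_matches`, its `normalize` parameter, and the sample data are hypothetical.

```python
FIELDS = ("country", "chamber", "party", "region")


def enrich_matches(result, people, normalize=str.strip):
    """Append one metadata value per match, parallel to result['matches']."""
    lookup = {normalize(person["name"]): person for person in people}
    for match in result["matches"]:
        person = lookup.get(normalize(match))
        for field in FIELDS:
            value = person[field] if person else "Not Found"
            # setdefault creates the list on first use, as .get(..., []) does in the diff.
            result.setdefault(field, []).append(value)
    return result


# Hypothetical data in the shape the diff implies.
people = [{"name": "Jane Doe", "chamber": "Senate", "party": "Independent",
           "region": "NY", "country": "US"}]
result = enrich_matches({"matches": ["Jane Doe", "Unknown Person"]}, people)
print(result["party"])
```

One difference from the code in the diff: this version appends to an existing list in place instead of building a new list each time, which only matters if the caller keeps a reference to the old list.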


@ app.route("/")
def home():
return render_template('base.html',title="sample html")
return render_template('base.html', title="sample html")

@app.route("/episodes")

@ app.route("/episodes")
def episodes_search():
text = request.args.get("title")
return json_search(text)
# print(svd_cos(text, docs, wcnt, dcn, itp))
record = boolean_search(text, itp, tweets)
if record is None:
record = svd_cos(text, docs, tweets, wcnt, dcn, itp)
if record is None:
record = boolean_search(text, itp, tweets, thresh=0)

if record is not None:
record = update_json(record)
# print(record)
return json.dumps(record)
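
Review note: `episodes_search` tries a boolean search, then `svd_cos`, then a boolean search with `thresh=0`, and uses the first result that is not `None`. A minimal sketch of that cascade as a generic helper; the three stand-in functions are hypothetical stubs, since the bodies of `boolean_search` and `svd_cos` are not part of this diff.

```python
import json


def first_result(query, strategies):
    """Return the first non-None result from an ordered list of search callables."""
    for strategy in strategies:
        record = strategy(query)
        if record is not None:
            return record
    return None


# Hypothetical stubs standing in for the real search helpers.
def strict_boolean(query):
    return None  # pretend the strict search found nothing


def svd_similarity(query):
    return {"matches": ["Jane Doe"]} if "tax" in query else None


def loose_boolean(query):
    return None if query == "" else {"matches": []}


chain = [strict_boolean, svd_similarity, loose_boolean]
print(first_result("tax policy", chain))
print(first_result("weather", chain))
# When every strategy returns None, json.dumps(None) produces the string "null".
print(json.dumps(first_result("", chain)))
```

In the real route the extra arguments (`itp`, `tweets`, and so on) could be bound with `functools.partial` before building the list. The final line illustrates that a response of `null` is possible when nothing matches, so the front end needs to handle that case.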


if 'DB_NAME' not in os.environ:
app.run(debug=True,host="0.0.0.0",port=5000)
app.run(debug=True, host="0.0.0.0", port=5000)
1 change: 1 addition & 0 deletions backend/data/json/docs.json

Large diffs are not rendered by default.
