Roman #35

Open
wants to merge 261 commits into
base: master
261 commits
eda5524
a
sabrinali2002 Mar 27, 2023
f57c9d8
a
sabrinali2002 Mar 27, 2023
a274a14
a
sabrinali2002 Mar 27, 2023
c84fc99
a
sabrinali2002 Mar 27, 2023
1fae3b7
a
sabrinali2002 Mar 27, 2023
ae4f472
a
sabrinali2002 Mar 27, 2023
63f01b8
a
sabrinali2002 Mar 27, 2023
a4d1f2f
a
sabrinali2002 Mar 27, 2023
e0ee8db
a
sabrinali2002 Mar 27, 2023
4855390
a
sabrinali2002 Mar 27, 2023
e5a35ef
a
sabrinali2002 Mar 27, 2023
b3e0efd
a
sabrinali2002 Mar 27, 2023
09683b1
a
sabrinali2002 Mar 27, 2023
55ceb94
a
sabrinali2002 Mar 27, 2023
7794184
blah
sabrinali2002 Mar 28, 2023
67a3c43
blah
sabrinali2002 Mar 28, 2023
a05feb5
blah
sabrinali2002 Mar 28, 2023
c28a799
blah
sabrinali2002 Mar 28, 2023
3895e83
blah
sabrinali2002 Mar 28, 2023
3a2ad45
blah
sabrinali2002 Mar 28, 2023
8b5621c
blah
sabrinali2002 Mar 28, 2023
841c4a4
blah
sabrinali2002 Mar 28, 2023
01ae962
blah
sabrinali2002 Mar 28, 2023
0202d8f
blah
sabrinali2002 Mar 28, 2023
bab65d2
blah
sabrinali2002 Mar 28, 2023
3625c44
blah
sabrinali2002 Mar 28, 2023
2e1be3e
blah
sabrinali2002 Mar 28, 2023
4b7550c
blah
sabrinali2002 Mar 28, 2023
166a070
blah
sabrinali2002 Mar 28, 2023
a495114
blah
sabrinali2002 Mar 28, 2023
0d82d50
blah
sabrinali2002 Mar 28, 2023
1fa345d
blah
sabrinali2002 Mar 28, 2023
ea5cf95
blah
sabrinali2002 Mar 28, 2023
9bf39e9
blah
sabrinali2002 Mar 28, 2023
b0edd48
blah
sabrinali2002 Mar 28, 2023
76d0cb2
blah
sabrinali2002 Mar 28, 2023
be4486e
blah
sabrinali2002 Mar 28, 2023
959996d
blah
sabrinali2002 Mar 28, 2023
7435770
blah
sabrinali2002 Mar 28, 2023
6d4aa86
blah
sabrinali2002 Mar 28, 2023
e5a4e11
blah
sabrinali2002 Mar 28, 2023
988c714
blah
sabrinali2002 Mar 28, 2023
09820d1
blah
sabrinali2002 Mar 28, 2023
68b75c9
blah
sabrinali2002 Mar 28, 2023
b301f02
blah
sabrinali2002 Mar 28, 2023
6c9421f
blah
sabrinali2002 Mar 28, 2023
b729925
blah
sabrinali2002 Mar 28, 2023
1e7c212
blah
sabrinali2002 Mar 28, 2023
b9b1c77
blah
sabrinali2002 Mar 28, 2023
fe7a9f6
blah
sabrinali2002 Mar 28, 2023
da308d1
blah
sabrinali2002 Mar 28, 2023
fdd632a
blah
sabrinali2002 Mar 28, 2023
5c74c7b
blah
sabrinali2002 Mar 28, 2023
b92114a
blah
sabrinali2002 Mar 28, 2023
76514c7
blah
sabrinali2002 Mar 28, 2023
3943a39
blah
sabrinali2002 Mar 28, 2023
339b075
blah
sabrinali2002 Mar 28, 2023
912a70a
blah
sabrinali2002 Mar 28, 2023
55c279f
blah
sabrinali2002 Mar 28, 2023
8d8d663
blah
sabrinali2002 Mar 28, 2023
0f23477
blah
sabrinali2002 Mar 28, 2023
234cc43
blah
sabrinali2002 Mar 28, 2023
1df1725
blah
sabrinali2002 Mar 28, 2023
da0f6d0
blah
sabrinali2002 Mar 28, 2023
928db12
blah
sabrinali2002 Mar 28, 2023
fb05549
blah
sabrinali2002 Mar 28, 2023
0ca41e5
blah
sabrinali2002 Mar 28, 2023
dd2b47a
blah
sabrinali2002 Mar 28, 2023
44d0fd5
blah
sabrinali2002 Mar 28, 2023
48eb25b
blah
sabrinali2002 Mar 28, 2023
b423832
blah
sabrinali2002 Mar 28, 2023
7b4b918
blah
sabrinali2002 Mar 28, 2023
4d53b43
blah
sabrinali2002 Mar 28, 2023
935842f
blah
sabrinali2002 Mar 28, 2023
e3880be
blah
sabrinali2002 Mar 28, 2023
0e7c69f
blah
sabrinali2002 Mar 28, 2023
ee8fdf8
blah
sabrinali2002 Mar 28, 2023
8c06906
blah
sabrinali2002 Mar 28, 2023
4535a52
blah
sabrinali2002 Mar 28, 2023
0ce0017
blah
sabrinali2002 Mar 28, 2023
7667254
blah
sabrinali2002 Mar 28, 2023
98c300a
blah
sabrinali2002 Mar 28, 2023
857702c
blah
sabrinali2002 Mar 28, 2023
f8edc15
blah
sabrinali2002 Mar 29, 2023
954ca76
blah
sabrinali2002 Mar 29, 2023
19213f4
blah
sabrinali2002 Mar 29, 2023
78bd765
blah
sabrinali2002 Mar 29, 2023
cd57140
blah
sabrinali2002 Mar 29, 2023
55c8c8e
blah
sabrinali2002 Mar 29, 2023
9447ad3
blah
sabrinali2002 Mar 29, 2023
7783ff4
blah
sabrinali2002 Mar 29, 2023
4690b78
blah
sabrinali2002 Mar 29, 2023
6b1845b
blah
sabrinali2002 Mar 29, 2023
925287e
blah
sabrinali2002 Mar 29, 2023
67fdd22
blah
sabrinali2002 Mar 29, 2023
74ac671
blah
sabrinali2002 Mar 29, 2023
bfb7526
blah
sabrinali2002 Mar 29, 2023
7e81f25
blah
sabrinali2002 Mar 29, 2023
485ec5c
blah
sabrinali2002 Mar 29, 2023
45b310d
blah
sabrinali2002 Mar 29, 2023
f6a90ca
blah
sabrinali2002 Mar 29, 2023
243f311
blah
sabrinali2002 Mar 29, 2023
5730082
blah
sabrinali2002 Mar 29, 2023
c310b9c
blah
sabrinali2002 Mar 29, 2023
0f51e1a
blah
sabrinali2002 Mar 29, 2023
8bce0b9
blah
sabrinali2002 Mar 29, 2023
b915bf4
blah
sabrinali2002 Mar 29, 2023
70f190f
blah
sabrinali2002 Mar 29, 2023
0dac4e2
blah
sabrinali2002 Mar 29, 2023
540bf95
blah
sabrinali2002 Mar 29, 2023
dc7475b
blah
sabrinali2002 Mar 29, 2023
6d3bcc6
blah
sabrinali2002 Mar 29, 2023
16517fc
blah
sabrinali2002 Mar 29, 2023
87ef4ce
blah
sabrinali2002 Mar 29, 2023
4dc7b33
blah
sabrinali2002 Mar 29, 2023
2627c14
blah
sabrinali2002 Mar 29, 2023
46f314f
blah
sabrinali2002 Mar 29, 2023
066feb4
blah
sabrinali2002 Mar 29, 2023
f278e42
blah
sabrinali2002 Mar 29, 2023
008de55
blah
sabrinali2002 Mar 29, 2023
0292f98
blah
sabrinali2002 Mar 29, 2023
e0bb7f4
blah
sabrinali2002 Mar 29, 2023
8e76f6e
blah
sabrinali2002 Mar 29, 2023
c9a2c05
blah
sabrinali2002 Mar 29, 2023
74dfac2
blah
sabrinali2002 Mar 29, 2023
1bccd49
blah
sabrinali2002 Mar 29, 2023
63939e6
blah
sabrinali2002 Mar 29, 2023
48c615b
blah
sabrinali2002 Mar 29, 2023
bd2de37
blah
sabrinali2002 Mar 29, 2023
193c901
blah
sabrinali2002 Mar 29, 2023
f88f802
blah
sabrinali2002 Mar 29, 2023
aeead9a
blah
sabrinali2002 Mar 29, 2023
5c60674
blah
sabrinali2002 Mar 29, 2023
a854b34
blah
sabrinali2002 Mar 29, 2023
b52f72e
blah
sabrinali2002 Mar 29, 2023
f071f8c
blah
sabrinali2002 Mar 29, 2023
ac9c593
blah
sabrinali2002 Mar 29, 2023
336cb64
blah
sabrinali2002 Mar 29, 2023
ea1ecdb
blah
sabrinali2002 Mar 29, 2023
faf32d3
blah
sabrinali2002 Mar 29, 2023
45f7c3f
blah
sabrinali2002 Mar 29, 2023
cbe86dd
blah
sabrinali2002 Mar 29, 2023
bbcabf7
blah
sabrinali2002 Mar 29, 2023
c24f789
blah
sabrinali2002 Mar 29, 2023
0b92251
blah
sabrinali2002 Mar 29, 2023
54c403d
blah
sabrinali2002 Mar 29, 2023
638448e
blah
sabrinali2002 Mar 29, 2023
8ad762d
blah
sabrinali2002 Mar 29, 2023
4d968e5
blah
sabrinali2002 Mar 29, 2023
ad9c83f
blah
sabrinali2002 Mar 29, 2023
dd43d65
blah
sabrinali2002 Mar 29, 2023
76625cc
blah
sabrinali2002 Mar 29, 2023
b062f62
blah
sabrinali2002 Mar 29, 2023
976796a
blah
sabrinali2002 Mar 29, 2023
28c948a
blah
sabrinali2002 Mar 29, 2023
c3f3a60
bruh
vn46 Apr 1, 2023
869fd30
test
vn46 Apr 9, 2023
e0b56db
test template
vn46 Apr 9, 2023
b3a312d
test template
vn46 Apr 9, 2023
6e80c84
frontend
vn46 Apr 9, 2023
a2b4b51
add files
vn46 Apr 13, 2023
02fbbb1
test
vn46 Apr 16, 2023
3a41864
test
vn46 Apr 16, 2023
0382fa0
test
vn46 Apr 16, 2023
6f675c8
test
vn46 Apr 16, 2023
a8a78ea
test
vn46 Apr 16, 2023
9abfaa4
database
vn46 Apr 16, 2023
ec22795
test
sabrinali2002 Apr 16, 2023
f3a309a
test
sabrinali2002 Apr 16, 2023
199621f
fix db
vn46 Apr 16, 2023
0736b68
added dump.sql as a backup
jayfeng20 Apr 16, 2023
cbc7016
updated app.py
jayfeng20 Apr 16, 2023
3795e94
Update app.py
jayfeng20 Apr 16, 2023
60cfe18
update init.sql
jayfeng20 Apr 16, 2023
3591263
working query
vn46 Apr 17, 2023
5d18eb9
statedb
vn46 Apr 17, 2023
e7c8394
to sql
vn46 Apr 17, 2023
9fb42a8
region func
vn46 Apr 17, 2023
19c77e4
region
vn46 Apr 17, 2023
64fcbf4
updated
vn46 Apr 17, 2023
470dd86
upate
vn46 Apr 17, 2023
aaf4f8f
updated
vn46 Apr 17, 2023
c130bbe
merge
sabrinali2002 Apr 17, 2023
ccf9a17
merge
sabrinali2002 Apr 17, 2023
f658513
update query
vn46 Apr 17, 2023
f8ca452
updated query
vn46 Apr 17, 2023
3c88cc2
fix bugs
vn46 Apr 17, 2023
89188be
db credentials
vn46 Apr 17, 2023
52cd972
db cred
vn46 Apr 18, 2023
f532e80
corrected password
jayfeng20 Apr 20, 2023
330c49a
corrected ports
jayfeng20 Apr 20, 2023
0d7c4dd
Update app.py
jayfeng20 Apr 20, 2023
cb0cf19
updated app.py
jayfeng20 Apr 20, 2023
9f4ceaf
add link
vn46 Apr 20, 2023
f63807d
Merge branch 'vy'
vn46 Apr 20, 2023
917b08a
test
vn46 Apr 21, 2023
73220b6
test
vn46 Apr 21, 2023
ab51edb
test
vn46 Apr 21, 2023
9dcfe37
test
vn46 Apr 21, 2023
05e4236
test
vn46 Apr 21, 2023
9d80097
test
vn46 Apr 21, 2023
b053757
error mess
vn46 Apr 21, 2023
4b581a2
test
vn46 Apr 21, 2023
9840a67
test
vn46 Apr 21, 2023
4c67551
test
vn46 Apr 21, 2023
31a484c
test
vn46 Apr 21, 2023
b469b1a
test
vn46 Apr 21, 2023
562f8cd
test
vn46 Apr 21, 2023
2ccea20
test
vn46 Apr 21, 2023
d090065
test
vn46 Apr 21, 2023
b199fa5
test
vn46 Apr 21, 2023
1469e2f
test
vn46 Apr 21, 2023
368e2c5
test
vn46 Apr 21, 2023
f7f527d
test
vn46 Apr 21, 2023
345d315
First Commit:
11090 Apr 23, 2023
393b2de
Merge pull request #1 from 11090/univ_subreddit_tfidf
11090 Apr 23, 2023
157e214
Added sr_listed data
11090 Apr 23, 2023
2ce503a
Second Commit:
11090 Apr 23, 2023
0490b9f
added ML algorithm
jayfeng20 Apr 24, 2023
2095683
added ML
jayfeng20 Apr 24, 2023
d51ff01
min edit distance
vn46 Apr 24, 2023
97cd353
min edit distance
vn46 Apr 25, 2023
f3c1291
just aesthetics
jayfeng20 Apr 25, 2023
8ba8b55
Merge branch 'master' of https://github.com/sabrinali2002/CollegeCrush
jayfeng20 Apr 25, 2023
d919e6f
test
sabrinali2002 Apr 25, 2023
5e9ec27
test
sabrinali2002 Apr 25, 2023
9746923
push
sabrinali2002 Apr 25, 2023
dfb40ee
ignores
sabrinali2002 Apr 25, 2023
eebf86c
blah
sabrinali2002 Apr 25, 2023
49ce8b9
edit
sabrinali2002 Apr 25, 2023
d98ac98
edit
sabrinali2002 Apr 25, 2023
08e2834
stuff
sabrinali2002 Apr 26, 2023
4f86820
stuff
sabrinali2002 Apr 26, 2023
23d5005
stuff
sabrinali2002 Apr 26, 2023
b86e554
blah
sabrinali2002 Apr 26, 2023
d369b17
blah
sabrinali2002 Apr 26, 2023
5719cc6
blah
sabrinali2002 Apr 26, 2023
ea43672
test
vn46 Apr 26, 2023
f164054
test
vn46 Apr 26, 2023
edd2fc6
blah
sabrinali2002 Apr 26, 2023
4b691e8
blah
sabrinali2002 Apr 26, 2023
b991679
blah
sabrinali2002 Apr 26, 2023
4c0cc7c
balh
sabrinali2002 Apr 26, 2023
6d06ce1
balh
sabrinali2002 Apr 26, 2023
f8895c3
blah
sabrinali2002 Apr 28, 2023
0ec57fe
Added sr_listed data
11090 Apr 23, 2023
81874db
Roman Synonym Function:
11090 Apr 28, 2023
b4e392b
Synonym Second Commit:
11090 Apr 28, 2023
95a128f
test commit
11090 Apr 28, 2023
71de295
Synonym search
11090 May 1, 2023
3 changes: 2 additions & 1 deletion .gitignore
@@ -16,4 +16,5 @@ htmlcov/
 dist/
 build/
 *.egg-info/
-helpers/*
+helpers/*
+cs4300-env/
6 changes: 6 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,6 @@
{
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter"
    },
    "python.formatting.provider": "none"
}
778 changes: 778 additions & 0 deletions backend/College_Data.csv

Large diffs are not rendered by default.

132 changes: 132 additions & 0 deletions backend/ML.py
@@ -0,0 +1,132 @@
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_similarity

# Project module (backend/test.py) that provides get_words(); it must be found
# on sys.path before Python's standard-library "test" package.
import test


def getDataframe(college_data_path, personality_terms):
    """
    Return a new dataframe that contains only the personality-related
    tf-idf columns for every college.

    As in the rest of this module, column 1 of the CSV holds the college
    name and every column from index 2 onward holds a tf-idf score.
    """
    # Dataframe of the whole dataset
    df = pd.read_csv(college_data_path)
    # Keep only the personality-related tf-idf columns; rows stay in the same
    # order as df, so row i of the result is still college i
    new_df = df.iloc[:, 2:][list(personality_terms)].reset_index(drop=True)
    return new_df


def cluster(personality_terms, college_data_path, new_df):
    """
    Input:
        personality_terms: a list of personality-related words (unused, kept
            for compatibility with existing callers)
        college_data_path: path to the full college CSV
        new_df: a personality-word-only dataframe (see getDataframe)

    Output:
        labels: an array with each college's cluster, e.g. [1, 2, 3] means the
            college at index 0 belongs to cluster 1, the college at index 1
            belongs to cluster 2, and so on.
        sorted_clusters: a dictionary whose keys are the integers 0 to 39
            enumerating the clusters and whose values are lists of the
            colleges that belong to each cluster.
    """
    df = pd.read_csv(college_data_path)
    # Number of clusters
    num_clusters = 40

    # KMeans model with the desired number of clusters; n_init is set
    # explicitly and random_state is fixed so the clusters are reproducible
    kmeans_model = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)

    # Cluster the colleges and read off each college's cluster label
    kmeans_model.fit(new_df)
    labels = kmeans_model.labels_

    college_names = df.iloc[:, 1]

    # Group college names by cluster label
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(college_names.iloc[i])

    sorted_clusters = {k: clusters[k] for k in sorted(clusters)}
    return labels, sorted_clusters


def s_score(df, labels):
    """
    Return the silhouette score of the clustering of df, rescaled from
    [-1, 1] to [0, 100], e.g. for a display such as "Your similarity with
    these colleges is [score]%". Note that the silhouette score measures how
    well separated the clusters are, not how similar a user is to any college.
    """
    # Score the dataframe that was passed in (previously this read the global new_df)
    score = silhouette_score(df, labels)
    return (score + 1.0) / 2 * 100


def find_cluster(df, new_df, clusters, user_input):
    """
    Input:
        df: the original, unprocessed dataframe
        new_df: dataframe that has only personality-related words
        clusters: the dictionary output by cluster()
        user_input: a list of personality-related words entered by the user

    Output:
        cluster_number: an int indicating which cluster has the highest
            similarity with the user input
        cluster_max_sim_score: a float, the average cosine similarity between
            the user input and the colleges in that cluster
    """
    college_names = df.iloc[:, 1]

    # Build the user vector over exactly the same columns, in the same order,
    # as new_df, counting each personality word in the input. A separately
    # fitted TfidfVectorizer is not guaranteed to match new_df's columns in
    # size or order, and transform() gave one row per word, of which only the
    # first was scored.
    user_vec = pd.Series(0.0, index=new_df.columns)
    for word in user_input:
        if word in user_vec.index:
            user_vec[word] += 1.0

    # Cosine similarity between the user vector and every college
    similarity_scores = cosine_similarity(
        user_vec.to_numpy().reshape(1, -1), new_df.to_numpy()
    )[0]

    # Map each college name to its first row position in df / new_df
    positions = {}
    for i, name in enumerate(college_names):
        positions.setdefault(name, i)

    cluster_number = -1
    cluster_max_sim_score = -1.0
    for cluster_i, colleges in clusters.items():
        # Average similarity between the user and the colleges in this cluster
        avg_score = sum(similarity_scores[positions[c]] for c in colleges) / len(colleges)
        if avg_score > cluster_max_sim_score:
            cluster_max_sim_score = avg_score
            cluster_number = cluster_i

    return cluster_number, cluster_max_sim_score


def get_result(user_input):
    """Return the colleges in the cluster most similar to user_input, and that cluster's score."""
    most_sim_cluster, sim_score = find_cluster(pd.read_csv(path), new_df, clusters, user_input)
    return clusters[most_sim_cluster], sim_score


# Module-level setup: build the clusters once at import time so that
# get_result() can use path, new_df and clusters.
personality_terms = test.get_words()
# user_input = ['sad']
path = 'X1_with_labels.csv'
new_df = getDataframe(path, personality_terms)
labels, clusters = cluster(personality_terms, path, new_df)
# Store the score under its own name so the s_score function is not overwritten
cluster_quality_score = s_score(new_df, labels)
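ML.py clusters colleges with KMeans, rescales the silhouette score to a percentage, and picks the cluster whose colleges are, on average, most similar to the user's words. A minimal, self-contained sketch of that pipeline on a hypothetical matrix of 4 colleges by 3 personality terms (the college names, terms, and scores are all made up for illustration, and the user vector is built directly over the dataframe's columns, which is an assumption of this sketch):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tf-idf matrix: one row per college, one column per personality term
new_df = pd.DataFrame(
    [[0.9, 0.1, 0.0],
     [0.8, 0.2, 0.0],
     [0.0, 0.1, 0.9],
     [0.1, 0.0, 0.8]],
    columns=["quirky", "social", "studious"],
)
names = ["College A", "College B", "College C", "College D"]

# Cluster the colleges (2 clusters here instead of 40)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(new_df).labels_

# Group college names by cluster label
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)

# Silhouette score rescaled from [-1, 1] to [0, 100]
score_pct = (silhouette_score(new_df, labels) + 1.0) / 2 * 100


def best_cluster(user_words):
    """Return (members, avg cosine similarity) of the cluster closest to user_words."""
    # Count the user's words over the same columns as new_df
    user_vec = pd.Series(0.0, index=new_df.columns)
    for w in user_words:
        if w in user_vec.index:
            user_vec[w] += 1.0
    sims = cosine_similarity(user_vec.to_numpy().reshape(1, -1), new_df.to_numpy())[0]
    pos = {n: i for i, n in enumerate(names)}
    best, best_score = -1, -1.0
    for c, members in clusters.items():
        avg = float(np.mean([sims[pos[m]] for m in members]))
        if avg > best_score:
            best, best_score = c, avg
    return clusters[best], best_score


members, sim = best_cluster(["studious"])
print(members, round(sim, 3), round(score_pct, 1))
```

Because cosine similarity ignores vector length, raw word counts over the dataframe's own columns give the same ranking as any uniform rescaling of them; the important part is that the user vector and the college rows share one feature order.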
378 changes: 378 additions & 0 deletions backend/X1_with_features.csv

Large diffs are not rendered by default.

378 changes: 378 additions & 0 deletions backend/X1_with_labels.csv

Large diffs are not rendered by default.
