Challenge Week 12 - Daniel Nolan #23

# Challenge Week 12 Submission Template

Daniel Nolan

Score: 95/100

# Reddit Data Challenges

## Challenge 1

[Insert Screenshot]
![Challenge 1](http://i.imgur.com/bso7fBY.png)

## Challenge 2

[Explain what's interesting]

Running a query with the subreddit key set to "mongodb" returns no results. Also, any searchable Reddit topic has many related subreddits.
![Challenge 2](http://i.imgur.com/hTm8QbZ.png)
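The empty result can be sketched in plain JavaScript, assuming the Reddit dump is loaded as an array of comment objects (the collection and field names here are illustrative, not from the assignment):

```javascript
// Toy stand-in for the Reddit comments collection
const comments = [
  { subreddit: "funny", body: "lol" },
  { subreddit: "AskReddit", body: "serious answers only" },
  { subreddit: "programming", body: "mongo is fun" },
];

// Equivalent of db.comments.find({subreddit: "mongodb"}): nothing matches,
// because "mongodb" never appears as a subreddit value in this data
const hits = comments.filter(c => c.subreddit === "mongodb");
console.log(hits.length); // 0
```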

## Challenge 3

[Explain possible Insights]
From this dataset, I could track the most-used profanity words and the topics they appear under. With that information, I could compute how many comments contain profanity as a percentage of the whole Reddit dataset we were given. Or, just for fun, I could find what was trending on Reddit at the time: the subreddits, ordered from most viewed to least, are the main place to track trending topics.
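The profanity-percentage idea can be sketched with a placeholder word list (the real list would be much longer, and the field access would go through the dump's comment bodies):

```javascript
// Hypothetical profanity list; stand-in comment bodies
const profanity = new Set(["damn", "hell"]);
const comments = ["well damn", "great post", "what the hell", "thanks"];

// A comment counts if any of its words is in the list
const hasProfanity = text =>
  text.toLowerCase().split(/\W+/).some(word => profanity.has(word));

const percent = 100 * comments.filter(hasProfanity).length / comments.length;
console.log(percent); // 50
```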

What do you think this dataset would tell you about the Reddit Community as a whole?

## Challenge 4

[What it would tell you about the Reddit Community]
As a whole, it would tell me which topics are most popular among the millions of internet users on Reddit, and which commonly used words make up most Reddit comments. The words in the picture below were quite prevalent in the Reddit data, which was interesting.
![Challenge 4](http://i.imgur.com/f34io6A.png)

## Challenge 5

[Link to Code or pasted code]
[Answer]
Couldn't get this file to work at all.


## Challenge 6

[What does this change about our analysis?]
We are only considering comments with ten or more upvotes, and we are not identifying comments that have a large number of downvotes. Also, a comment that users of one subreddit find offensive may be very popular on another. This could change how frequently a given user shows up, since each of their comments has to gain at least ten upvotes to be counted.
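The effect of the cutoff can be sketched on toy data (field names are illustrative): the filter keeps or drops comments purely on upvotes, so heavy downvoting is invisible to it.

```javascript
// Toy comments; the >= 10 upvote cutoff ignores downvotes entirely
const comments = [
  { author: "a", ups: 3,  downs: 0 },
  { author: "b", ups: 15, downs: 40 }, // heavily downvoted but still kept
  { author: "c", ups: 10, downs: 1 },
  { author: "d", ups: 9,  downs: 0 },  // one upvote short, excluded
];
const kept = comments.filter(c => c.ups >= 10);
console.log(kept.length); // 2
```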

## Challenge 7

[How would you change your conclusions?]
Since posts with fewer than ten upvotes are excluded, yes, the count of frequently commenting users in a given subreddit would change. Many people visit specific subreddits, and the mood of upvoting/downvoting varies with what is happening that day.

## Challenge 8

[Bias in answer]
It only incorporates comments with ten or more upvotes and ignores negative comments/posts entirely.

## Challenge 9

[Other Biases]

We are only viewing the top 50 subreddits, when there are thousands of subreddits we could have looked at instead. Spam comments also skew the data, as do users who downvote everything for no reason.

## Challenge 10

[How may you try and prove the bias]
You would first have to write code that identifies top comments actually related to that subreddit (to remove spam comments). Then the code could track downvote patterns and flag downvotes cast by trolls, for example through a downvotes-per-minute threshold. Finally, the content of a post/comment could be judged legitimate by checking for keywords that pertain to that subreddit (like the word-count example above).
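The downvotes-per-minute heuristic can be sketched as follows; the threshold and the vote records are assumptions for illustration, since the dump does not include per-vote timestamps:

```javascript
// Flag users whose downvote count in any single minute hits the threshold
const THRESHOLD = 3; // assumed cutoff
const downvotes = [
  { user: "troll42", minute: 1 }, { user: "troll42", minute: 1 },
  { user: "troll42", minute: 1 }, { user: "casual",  minute: 1 },
  { user: "casual",  minute: 7 },
];

// Count downvotes per (user, minute) bucket
const perUserMinute = {};
for (const d of downvotes) {
  const key = d.user + "@" + d.minute;
  perUserMinute[key] = (perUserMinute[key] || 0) + 1;
}

// Any user with a bucket at or over the threshold gets flagged
const flagged = [...new Set(
  Object.entries(perUserMinute)
    .filter(([, n]) => n >= THRESHOLD)
    .map(([key]) => key.split("@")[0])
)];
console.log(flagged); // [ 'troll42' ]
```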

# Yelp and Weather

## Challenge 1

[Screenshot your query and a result]
![Photo](http://i.imgur.com/ZLatVpp.png)

## Challenge 2

[Query snippet]
```javascript
db.normal.aggregate([
  { $match: { "DATE": { $regex: /^20100425/ }, "STATION_NAME": { $regex: /^LAS VEGAS/ } } },
  { $group: { _id: "$STATION_NAME", avgAmount: { $avg: "$HLY-WND-AVGSPD" } } }
])
```
[Answer]
The total HPCP value came out to be 62.

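The $match + $group/$avg pipeline can be sketched in plain JavaScript against made-up hourly documents (the station name and values here are illustrative):

```javascript
// Toy stand-ins for documents in the `normal` collection
const docs = [
  { DATE: "20100425 07:00", STATION_NAME: "LAS VEGAS MCCARRAN", "HLY-WND-AVGSPD": 80 },
  { DATE: "20100425 08:00", STATION_NAME: "LAS VEGAS MCCARRAN", "HLY-WND-AVGSPD": 100 },
  { DATE: "20100426 07:00", STATION_NAME: "LAS VEGAS MCCARRAN", "HLY-WND-AVGSPD": 999 }, // wrong day
];

// $match stage: both regexes are anchored prefix matches
const matched = docs.filter(d =>
  d.DATE.startsWith("20100425") && d.STATION_NAME.startsWith("LAS VEGAS"));

// $group stage: average the wind speed over the matched documents
const avgAmount =
  matched.reduce((sum, d) => sum + d["HLY-WND-AVGSPD"], 0) / matched.length;
console.log(avgAmount); // 90
```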
## Challenge 3
![Challenge3](http://i.imgur.com/EUAdiu2.png)

[Query snippet]
```javascript
var query = db.business.find({"city": "Madison", "state": "WI"}),
    array = [],
    num = db.business.count({"city": "Madison", "state": "WI"});
// num is 1630 businesses
for (var i = 0; i < num; i++) array.push(query[i]["business_id"]);
db.reviews.count({"business_id": {$in: array}})
// 31305 reviews
```
[Answer]
1630 businesses in Madison, WI, with 31305 reviews between them.
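The two-step pattern here, collecting matching business ids and then counting reviews whose business_id is in that set, can be sketched on toy data:

```javascript
// Toy stand-ins for the business and reviews collections
const business = [
  { business_id: "b1", city: "Madison", state: "WI" },
  { business_id: "b2", city: "Phoenix", state: "AZ" },
  { business_id: "b3", city: "Madison", state: "WI" },
];
const reviews = [
  { business_id: "b1" }, { business_id: "b1" },
  { business_id: "b3" }, { business_id: "b2" },
];

// Step 1: ids of businesses matching the city/state filter
const ids = new Set(
  business
    .filter(b => b.city === "Madison" && b.state === "WI")
    .map(b => b.business_id));

// Step 2: equivalent of db.reviews.count({business_id: {$in: array}})
const count = reviews.filter(r => ids.has(r.business_id)).length;
console.log(count); // 3
```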

## Challenge 4
![Challenge4](http://i.imgur.com/GuMTH5I.png)

[Query snippet]
```javascript
var query = db.business.find({"city": "Las Vegas", "state": "NV"}),
    array = [],
    num = db.business.count({"city": "Las Vegas", "state": "NV"});
// num is 12021 businesses
for (var i = 0; i < num; i++) array.push(query[i]["business_id"]);
db.reviews.count({"business_id": {$in: array}})
// 522104 reviews
```
[Answer]
12021 businesses in Las Vegas, NV, with 522104 reviews between them.

## Challenge 5
![Challenge5](http://i.imgur.com/CCilNDB.png)

[Query snippet]
```javascript
var query = db.business.find({"city": "Phoenix", "state": "AZ"}),
    array = [],
    num = db.business.count({"city": "Phoenix", "state": "AZ"});
// num is 7499 businesses
for (var i = 0; i < num; i++) array.push(query[i]["business_id"]);
db.reviews.count({"business_id": {$in: array}})
// 185907 reviews
```
[Answer]
7499 businesses in Phoenix, AZ, with 185907 reviews between them.

## Challenge 6 [BONUS]

[Code]
[Answer]