Irfan Nadiadi Learning Challenge 12 #22

Open: wants to merge 2 commits into base: master
43 changes: 25 additions & 18 deletions README.md
@@ -4,19 +4,20 @@

## Challenge 1

[Insert Screenshot]
![screenshot](reddit/challenge_1.png)

## Challenge 2

[Explain what's interesting]
First, it's incredible how large this dataset really is, which is a testament to the popularity of Reddit. The data is very granular and includes more fields than I expected, such as 'banned_by' and 'num_reports'.
Also, I may have done something incorrectly, but the most-upvoted comment I can find is by osufan765, with only 6 upvotes, which doesn't seem right.

## Challenge 3

[Explain possible Insights]
Aggregating the data to show the summed 'upvote' scores by subreddit could help determine which subreddits are trending. It could also be interesting to see which comments are reported the most, and whether common words in their bodies point to topics that the community finds unfavorable.
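
The per-subreddit aggregation could be sketched roughly as follows. This is a minimal in-memory illustration in plain JavaScript, not the query used for this challenge; the field names `subreddit` and `score` and the sample rows are assumptions made up for the example.

```javascript
// Hypothetical sample rows; the field names `subreddit` and `score` are assumed.
const comments = [
  { subreddit: "nfl", score: 6 },
  { subreddit: "nfl", score: 2 },
  { subreddit: "pics", score: 5 },
  { subreddit: "pics", score: -1 },
  { subreddit: "askreddit", score: 3 },
];

// Sum the scores per subreddit, then sort from highest to lowest total.
function totalScoreBySubreddit(rows) {
  const totals = new Map();
  for (const { subreddit, score } of rows) {
    totals.set(subreddit, (totals.get(subreddit) || 0) + score);
  }
  return [...totals.entries()]
    .map(([subreddit, total]) => ({ subreddit, total }))
    .sort((a, b) => b.total - a.total);
}

const ranked = totalScoreBySubreddit(comments);
console.log(ranked);
```

On the full dataset the same grouping would more likely be done in the database itself; the sketch only shows the shape of the result, one total per subreddit.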

## Challenge 4

[What it would tell you about the Reddit Community]
I think seeing the kinds of posts that the Reddit community finds most favorable and unfavorable would give insight into the topics that users are most focused on. I use Reddit frequently, and I find the overall behavior of users on the website very predictable. An analysis like this could help quantify just how predictable (and possibly unoriginal) content on the website can be.

## Challenge 5

@@ -25,49 +26,55 @@

## Challenge 6

[What does this change about our analysis?]
Finding commenters with more than 10 upvotes on their post does not necessarily reflect who is commenting the most. Popular posts on Reddit have hundreds, sometimes thousands of comments, and it is very easy for a post to get buried, never upvoted past 10. As a result, limiting the data to only those comments with greater than 10 upvotes would cut out a significant portion of the data.
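
To illustrate the effect, here is a small sketch that counts comments per author with and without the 10-upvote cutoff. The rows, the author names, and the field names `author` and `score` are hypothetical and chosen only to make the point visible.

```javascript
// Hypothetical sample rows; `author` and `score` are assumed field names.
const sampleComments = [
  { author: "alice", score: 1 },
  { author: "alice", score: 2 },
  { author: "alice", score: 0 },
  { author: "bob", score: 15 },
  { author: "bob", score: 3 },
  { author: "carol", score: 40 },
];

// Count how many comments each author has in the given rows.
function countByAuthor(rows) {
  const counts = {};
  for (const { author } of rows) {
    counts[author] = (counts[author] || 0) + 1;
  }
  return counts;
}

const allCounts = countByAuthor(sampleComments);
const highScoreOnly = sampleComments.filter((c) => c.score > 10);
const filteredCounts = countByAuthor(highScoreOnly);

console.log("all comments:", allCounts);
console.log("score > 10 only:", filteredCounts);
console.log("rows kept:", highScoreOnly.length, "of", sampleComments.length);
```

In this toy sample the most frequent commenter disappears entirely once the cutoff is applied, which is the distortion described above.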

## Challenge 7

[How would you change your conclusions?]
Yes, they would. The top 50 subreddits are likely to have many commenters, each commenting frequently, but the chances of those users consistently having more than 10 upvotes are slim. The limited dataset would not accurately represent which users are commenting most in the top subreddits.

## Challenge 8

[Bias in answer]
The top 50 subreddits are the most popular, so inherently there will be more people commenting there than other subreddits.

## Challenge 9

[Other Biases]
I don't think this dataset takes into account comments that are edited or removed, which is another overall limiting factor for the dataset.

## Challenge 10

[How may you try and prove the bias]
It would be helpful to see an unadulterated dataset of comments, which also shows comments that have been deleted and their revision history, if available.

# Yelp and Weather

## Challenge 1

[Screenshot your query and a result]
![screenshot](weather/challenge_1.png)

## Challenge 2

[Query snippet]
[Answer]
![screenshot](weather/challenge_2.png)

db.normals.aggregate([{$match:{'DATE':/20100425.+/}},{$group:{_id:'$STATION_NAME', total:{$avg:'$HLY-WIND-AVGSPD'}}}])

110.083333
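
For readers less familiar with the aggregation pipeline, the two stages of the query above can be mimicked in plain JavaScript. This is only a sketch on made-up rows: the station names, the `DATE` string format, and the wind values are assumptions, and it assumes every wind value is numeric.

```javascript
// Hypothetical rows; only the field names come from the query above.
const normals = [
  { STATION_NAME: "STATION A", DATE: "20100425 00:00", "HLY-WIND-AVGSPD": 100 },
  { STATION_NAME: "STATION A", DATE: "20100425 01:00", "HLY-WIND-AVGSPD": 120 },
  { STATION_NAME: "STATION B", DATE: "20100425 00:00", "HLY-WIND-AVGSPD": 90 },
  { STATION_NAME: "STATION A", DATE: "20100426 00:00", "HLY-WIND-AVGSPD": 500 },
];

function averageWindByStation(rows, datePattern) {
  const sums = new Map();
  for (const row of rows) {
    // Equivalent of the $match stage: keep rows whose DATE matches the pattern.
    if (!datePattern.test(row.DATE)) continue;
    // Equivalent of the $group stage: accumulate a sum and a count per station.
    const entry = sums.get(row.STATION_NAME) || { sum: 0, n: 0 };
    entry.sum += row["HLY-WIND-AVGSPD"];
    entry.n += 1;
    sums.set(row.STATION_NAME, entry);
  }
  // Emit one document per station, using the same keys as the query (_id, total).
  return [...sums].map(([station, { sum, n }]) => ({ _id: station, total: sum / n }));
}

const windAverages = averageWindByStation(normals, /20100425.+/);
console.log(windAverages);
```

Note that the pattern `/20100425.+/` requires at least one character after the date, so a `DATE` value of exactly `20100425` would not match.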

## Challenge 3

[Query snippet]
[Answer]
db.businesses.aggregate([{$match:{'city': 'Madison'}},{$group:{_id:0,total:{$sum:'$review_count'}}}])

34410

## Challenge 4

[Query snippet]
[Answer]
db.businesses.aggregate([{$match:{'city': 'Las Vegas'}},{$group:{_id:0,total:{$sum:'$review_count'}}}])

577550

## Challenge 5

[Query snippet]
[Answer]
db.businesses.aggregate([{$match:{'city': 'Phoenix'}},{$group:{_id:0,total:{$sum:'$review_count'}}}])

200089
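
Challenges 3 through 5 run the same query three times with a different city each time. As a hedged sketch, the totals for all three cities could be computed in one pass by grouping on `city`. The rows below are invented for illustration; only the field names `city` and `review_count` come from the queries above.

```javascript
// Hypothetical rows; the review counts are made up for the example.
const businesses = [
  { city: "Madison", review_count: 10 },
  { city: "Las Vegas", review_count: 25 },
  { city: "Madison", review_count: 5 },
  { city: "Phoenix", review_count: 7 },
  { city: "Tempe", review_count: 99 },
];

// Sum review_count per city, restricted to the cities of interest.
function reviewTotalsByCity(rows, cities) {
  const wanted = new Set(cities);
  const totals = {};
  for (const { city, review_count } of rows) {
    if (!wanted.has(city)) continue;
    totals[city] = (totals[city] || 0) + review_count;
  }
  return totals;
}

const reviewTotals = reviewTotalsByCity(businesses, ["Madison", "Las Vegas", "Phoenix"]);
console.log(reviewTotals);
```

In the mongo shell, the analogous change would be to match with `{'city': {$in: [...]}}` and group with `_id: '$city'` instead of `_id: 0`, which should return one total per city in a single call.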

## Challenge 6 [BONUS]

Binary file added reddit/challenge_1.png
Binary file added weather/challenge_1.png
Binary file added weather/challenge_2.png