Analyzing 10GB of Yelp Data on AWS EMR

Leveraging PySpark, Python, Spark, SQL, SparkR, R, and Bash

Pulled 10GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce (EMR) cluster from a Jupyter Notebook using PySpark.
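
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the exact commands used in this project: the dataset slug, bucket name, key prefix, local directory, and business file name are all assumptions, and the helper functions are hypothetical. The commands are only built and printed, not executed.

```python
import shlex

# All of these values are assumptions for illustration; none come from the README.
DATASET = "yelp-dataset/yelp-dataset"   # assumed Kaggle dataset slug
BUCKET = "my-yelp-bucket"               # hypothetical S3 bucket name
PREFIX = "yelp"                         # hypothetical key prefix inside the bucket
LOCAL_DIR = "yelp_data"                 # hypothetical local download directory
BUSINESS_FILE = "yelp_academic_dataset_business.json"  # assumed file name


def s3_uri(bucket, prefix, filename=""):
    """Join bucket, prefix and an optional file name into an s3:// URI."""
    parts = [bucket, prefix.strip("/"), filename]
    return "s3://" + "/".join(p for p in parts if p)


def kaggle_download_cmd(dataset, dest):
    """Kaggle CLI command that downloads and unzips a dataset into dest."""
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", dest, "--unzip"]


def s3_upload_cmd(src, bucket, prefix):
    """AWS CLI command that copies a local directory to the bucket."""
    return ["aws", "s3", "cp", src, s3_uri(bucket, prefix), "--recursive"]


def load_business(spark, uri):
    """Read the JSON business data from S3 into a Spark DataFrame.

    `spark` is the SparkSession that an EMR notebook normally provides.
    """
    return spark.read.json(uri)


# Print the shell commands so they can be reviewed before running them.
print(shlex.join(kaggle_download_cmd(DATASET, LOCAL_DIR)))
print(shlex.join(s3_upload_cmd(LOCAL_DIR, BUCKET, PREFIX)))
```

In a notebook attached to the cluster, the data could then be loaded with something like `load_business(spark, s3_uri(BUCKET, PREFIX, BUSINESS_FILE))` and inspected with `printSchema()`.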


AWS Cluster Configuration

[Screenshot: cluster configuration]

AWS Notebook Configuration

[Screenshot: notebook configuration]