Misc. Spark Common Crawl

Miscellaneous examples of using Spark to analyze Common Crawl data.

These scripts were originally written for some simple evaluations. Use them at your own risk, and treat them as an example of how to work with the data.

I copied the Common Crawl datasets from S3 to a local HDFS cluster.
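
How the copy was done is not recorded here. One common way to move data from S3 into HDFS is `hadoop distcp`, and the sketch below only builds that command line so it can be inspected before running. It is an assumption, not the method actually used: the helper name `build_distcp_command`, the `crawl-data/EXAMPLE-CRAWL` prefix, and the `/data/common-crawl` target directory are all hypothetical placeholders, and the `s3a://` scheme assumes the cluster has the S3A connector configured.

```python
import shlex


def build_distcp_command(s3_prefix, hdfs_dir, bucket="commoncrawl"):
    """Return the argv list for a `hadoop distcp` copy from S3 to HDFS.

    s3_prefix: key prefix inside the bucket (hypothetical placeholder here).
    hdfs_dir:  absolute directory on the cluster's default HDFS namenode.
    """
    # Normalize slashes so the source always ends with exactly one "/".
    source = "s3a://{}/{}/".format(bucket, s3_prefix.strip("/"))
    # "hdfs:///path" means "path on the default namenode".
    target = "hdfs:///{}".format(hdfs_dir.strip("/"))
    return ["hadoop", "distcp", source, target]


if __name__ == "__main__":
    # Placeholder prefix and target; substitute the crawl you actually need.
    cmd = build_distcp_command("crawl-data/EXAMPLE-CRAWL", "/data/common-crawl")
    # Print a shell-safe command line instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Hadoop installed; passing the returned list to `subprocess.run` would perform the copy on a configured cluster.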