The purpose of this folder is to present multiple solutions (using DataFrames and RDDs) to the classic word count problem.
"... This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark. ..." Dr. Matei Zaharia Original Creator of Apache Spark FOREWORD by Dr. Matei Zaharia |
word_count_by_dataframe.log
word_count_by_dataframe.py
word_count_by_dataframe_shorthand.log
word_count_by_dataframe_shorthand.py
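The DataFrame scripts above presumably tokenize each input line and then aggregate with `groupBy("word").count()` (via `split()` and `explode()` from `pyspark.sql.functions`). The underlying logic of that aggregation can be sketched in plain Python, without a Spark runtime, as follows; `word_count()` and the sample lines are illustrative names, not taken from the scripts:

```python
from collections import Counter

def word_count(lines):
    """Tokenize lines on whitespace and count each word,
    mirroring a DataFrame groupBy("word").count() aggregation."""
    words = (word for line in lines for word in line.split())
    return Counter(words)

counts = word_count(["fox jumped", "fox is red"])
# counts["fox"] == 2
```
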
Solutions are provided using the reduceByKey(), groupByKey(), and combineByKey() reducers. In general, the reduceByKey() and combineByKey() solutions scale better than the groupByKey() solution: they combine values for each key within every partition before any data is shuffled, whereas groupByKey() ships every (word, 1) pair across the network.
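The scalability point above comes from the map-side combine step. A minimal plain-Python sketch of reduceByKey() semantics (the function name, the partition layout, and the sample data are illustrative, not from the scripts):

```python
def reduce_by_key(partitions, reduce_fn):
    """Mimic Spark's reduceByKey(): first combine values per key
    inside each partition (the map-side combine), then merge the
    small per-partition results, as a shuffle would."""
    # Map-side combine: at most one value per key per partition.
    combined = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = reduce_fn(local[key], value) if key in local else value
        combined.append(local)
    # Merge step: only the pre-combined values cross partitions.
    merged = {}
    for local in combined:
        for key, value in local.items():
            merged[key] = reduce_fn(merged[key], value) if key in merged else value
    return merged

parts = [[("fox", 1), ("fox", 1), ("red", 1)], [("fox", 1)]]
totals = reduce_by_key(parts, lambda a, b: a + b)
# totals == {"fox": 3, "red": 1}
```

With groupByKey(), by contrast, all three ("fox", 1) pairs would be shuffled before any addition happens, which is why it moves more data for the same result.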
wordcount_by_groupbykey.py
wordcount_by_groupbykey.sh
wordcount_by_groupbykey_shorthand.py
wordcount_by_groupbykey_shorthand.sh
wordcount_by_combinebykey.py
wordcount_by_combinebykey.sh
wordcount_by_reducebykey.py
wordcount_by_reducebykey.sh
wordcount_by_reducebykey_shorthand.py
wordcount_by_reducebykey_shorthand.sh
wordcount_by_reducebykey_with_filter.py
wordcount_by_reducebykey_with_filter.sh
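The combineByKey() variant listed above takes three functions: one to create a combiner from the first value seen for a key, one to merge a new value into an existing combiner, and one to merge combiners across partitions. A plain-Python sketch of those semantics, using word count as the example (the function and variable names are illustrative):

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Mimic Spark's combineByKey(): build a combiner per key within
    each partition, then merge combiners across partitions."""
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)
    result = {}
    for combiners in per_partition:
        for key, comb in combiners.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result

parts = [[("fox", 1), ("fox", 1)], [("fox", 1), ("red", 1)]]
counts = combine_by_key(
    parts,
    lambda v: v,            # create_combiner: first count stands alone
    lambda c, v: c + v,     # merge_value: add a count into the combiner
    lambda c1, c2: c1 + c2, # merge_combiners: add partition subtotals
)
# counts == {"fox": 3, "red": 1}
```

For plain counting, reduceByKey() is the simpler choice; combineByKey() earns its extra functions when the combiner type differs from the value type (for example, accumulating (sum, count) pairs to compute averages).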
Best regards,
Mahmoud Parsian