mapreduce-python

Here are 91 public repositories matching this topic...

mahmoudparsian / big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Updated Dec 3, 2024
HTML

CLDXiang / Mining-Frequent-Pattern-from-Search-History

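Course project for "Big Data Mining Technology" at Fudan University. It attempts to find frequent 2-itemsets of keywords with high support in the search records of the Sogou Labs user query log data (2008). For the implementation, I set up a mini Hadoop cluster of five servers and implemented the three MapReduce stages of the Parallel FP-Growth algorithm in Python.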

hadoop mapreduce fp-growth mapreduce-python

Updated Mar 29, 2021
Python

SinghHarshita / Clustering-Algorithms-Spark

KMeans, CURE, and Canopy algorithms are demonstrated using PySpark.

machine-learning big-data spark apache-spark clustering mapreduce kmeans clustering-algorithm cure canopy big-data-analytics spark-cluster mapreduce-python

Updated May 19, 2021
Jupyter Notebook

NbnbZero / Recommendation-System

A user-to-item recommendation system built with item-based CF and XGBRegressor

collaborative-filtering pyspark mapreduce-python

Updated Jul 6, 2021
Python

Andy-Pham-72 / hadoop-mini-project

Uses Hadoop to process data from an automobile tracking platform that tracks the history of important incidents after the initial sale of a new vehicle.

python hadoop virtualbox hortonworks-hdp mapreduce-python

Updated Feb 19, 2022
Python

abhibalani / emr_lambda

An AWS Lambda function that starts EMR and runs a MapReduce job

aws aws-lambda aws-emr hadoop-mapreduce aws-emr-clusters mapreduce-python

Updated Aug 16, 2019
Python

sreetamparida / Hiraishin

A REST-based service that translates SQL queries into MapReduce and Spark jobs, runs these jobs, and returns the result as a JSON object.

sql spark python3 pyspark mapreduce hadoop-mapreduce hadoop-streaming mapreduce-python sqltomapreduce sqltospark

Updated Sep 30, 2020
Python

anshsarkar / Big-Data-Assignments-UE18CS322

A repository containing the source code for the assignments done as part of the Big Data course (UE18CS322) at PES University.

big-data spark hadoop analysis spark-streaming mapreduce-python

Updated Jan 14, 2021
Python

yoongoing / bigdata_pyspark

⚡️Let's do data mining using Spark, an open MapReduce platform⚡️

spark bigdata jupyter-notebook pyspark mapreduce mapreduce-python dataminig

Updated Dec 11, 2020
Jupyter Notebook

kkoless / MapReduce

Hadoop MapReduce Python

hadoop python3 hadoop-mapreduce mapreduce-python

Updated Dec 21, 2022
Python

nikhitmago / frequent-itemset-association

Market basket analysis: finding frequent itemsets using the SON algorithm in Spark

python data-mining spark apriori-son mapreduce-python

Updated Oct 6, 2018
Python

krishnadey30 / NewsHeadlines

This repository contains code that extracts meaningful information from a news headline dataset.

python hadoop hadoop-mapreduce news-dataset mapreduce-python

Updated Apr 28, 2019
Python

PrudhviVajja / DistributedMapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks…

gcp inverted-index wordcount memcached-server mapreduce-python

Updated Apr 24, 2024
Python
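
The map and reduce functions described in the DistributedMapReduce entry above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model using word count, not code from that repository; the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, and a real framework would distribute the map, shuffle, and reduce phases across machines.

```python
from collections import defaultdict
from typing import Iterator


def map_word_count(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: turn one (doc_id, text) pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: list[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Made-up input documents, used only for illustration.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
```

Swapping in a different pair of map and reduce functions (for example, emitting `(word, doc_id)` pairs and collecting them into a set) would turn the same driver into an inverted-index builder.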

BenitaDiop / FullStackBigData-with-SPARK

Pulled 10 GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster in a Jupyter Notebook using PySpark.

spark cluster aws-s3 jupyter-notebook bucket pyspark elasticsearch-cluster mapreduce-python

Updated Aug 21, 2023
Jupyter Notebook

Roon311 / WDC-PageRank-Hadoop-MapReduce

Uses MapReduce to compute PageRank on the WDC data.

hadoop mapreduce-python

Updated Nov 24, 2023
Python

python-supply / map-reduce-and-multiprocessing

Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.

python multiprocessing map-reduce parallel-python python-multiprocessing mapreduce-python python-articles python-introduction

Updated Oct 22, 2020
Jupyter Notebook

MaimoonaKhilji / MapReduce-Presentation

MapReduce presentation

big-data mapreduce mapreduce-algorithm mapreduce-python

Updated Sep 26, 2022

skotak2 / Pasrsing-Text-with-MapReduce-programming-Paradigm-with-multithreading

Shows how MapReduce works for parsing text data, with subtasks processed in parallel using multithreading

big-data multithreading textdata mapreduce-python

Updated Jan 15, 2021
Python

arminZolfaghari / docker-hadoop

Apache Hadoop Docker image | Running Python MapReduce

hadoop hadoop-mapreduce docker-hadoop hadoop-hdfs mapreduce-python

Updated May 28, 2023
Shell

besunny95 / HADOOP-BIGDATA

These are the various programs that I used for my Hadoop projects.

python hive hdfs pig-latin spark-sql hume mapreduce-java mapreduce-python

Updated Sep 1, 2022
Jupyter Notebook
