Simulate data partitioning approaches on top of an open-source relational database management system (PostgreSQL) and build a simplified query processor that accesses data from the generated partitions. Functions implemented -
- rangePartition(), roundRobinPartition()
- roundRobinInsert(), rangeInsert()
- rangeQuery(), pointQuery()
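
The two partitioning schemes can be illustrated with a minimal in-memory sketch. This is not the PostgreSQL implementation: the helper names (`round_robin_index`, `range_index`, `round_robin_partition`, `range_partition`) are hypothetical, and the interval convention (equal-width, half-open ranges with the upper bound folded into the last partition) is an assumption.

```python
from collections import defaultdict


def round_robin_index(row_number: int, num_partitions: int) -> int:
    """Partition for the row_number-th row (0-based); rows cycle through partitions."""
    return row_number % num_partitions


def range_index(value: float, lo: float, hi: float, num_partitions: int) -> int:
    """Partition whose equal-width interval holds value (requires lo < hi).

    Assumption: intervals are half-open [a, b), except the last one,
    which also includes hi.
    """
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    width = (hi - lo) / num_partitions
    return min(int((value - lo) / width), num_partitions - 1)


def round_robin_partition(values, num_partitions):
    """Distribute values across partitions in arrival order."""
    parts = defaultdict(list)
    for i, v in enumerate(values):
        parts[round_robin_index(i, num_partitions)].append(v)
    return dict(parts)


def range_partition(values, lo, hi, num_partitions):
    """Distribute values by the interval each one falls into."""
    parts = defaultdict(list)
    for v in values:
        parts[range_index(v, lo, hi, num_partitions)].append(v)
    return dict(parts)
```

Under this sketch, a point query on a range-partitioned table only needs to look at partition `range_index(value, ...)`, while a round-robin layout gives no such hint and every partition has to be scanned.
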
Implement generic parallel sort and parallel join algorithms for an RDBMS. Functions implemented -
- parallelSort(), parallelJoin()
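
One common way to structure a parallel sort is to range-partition the input, sort each partition on its own worker, and concatenate the results in partition order. The sketch below shows that idea on a plain Python list rather than on database tables; the function name `parallel_sort`, the use of a thread pool, and the default of four workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, num_workers=4):
    """Range-partition values, sort each partition concurrently, then concatenate."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    # Fall back to width 1 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / num_workers or 1
    buckets = [[] for _ in range(num_workers)]
    for v in values:
        idx = min(int((v - lo) / width), num_workers - 1)
        buckets[idx].append(v)
    # Every value in bucket i is <= every value in bucket i + 1, so sorting
    # each bucket independently and joining them in order yields a full sort.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    return [v for bucket in sorted_buckets for v in bucket]
```

A parallel join can follow the same pattern: partition both inputs on the join key with the same boundaries, join matching partitions independently, and union the partial results.
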
Write a map-reduce program using the Hadoop MapReduce framework that performs an equijoin on input from a file.
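
The map and reduce phases of an equijoin can be simulated in plain Python without a Hadoop cluster. The sketch below assumes a hypothetical input format in which each line is a comma-separated record whose first field is the table name and whose second field is the join key; the function names are illustrative and do not come from the Hadoop API.

```python
from collections import defaultdict
from itertools import combinations


def map_line(line):
    """Map phase: emit (join_key, record) for one input line."""
    fields = [f.strip() for f in line.split(",")]
    return fields[1], fields


def reduce_key(records):
    """Reduce phase: pair up records that share a key but come from different tables."""
    joined = []
    for left, right in combinations(records, 2):
        if left[0] != right[0]:
            joined.append(", ".join(left + right))
    return joined


def equijoin(lines):
    """Group mapped records by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            key, record = map_line(line)
            groups[key].append(record)
    result = []
    for records in groups.values():
        result.extend(reduce_key(records))
    return result
```

For example, `equijoin(["R, 1, a", "S, 1, x", "R, 2, b"])` joins the two records with key `1` and drops the unmatched record with key `2`.
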
- Analyzed NYC taxi trip data on the Apache Spark distributed system running on HDFS.
- Performed geospatial data analysis using Spark SQL: point queries, range queries, hot zone analysis, and hot cell analysis.
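
The core of a hot zone analysis is a point-in-rectangle test followed by a count per rectangle. The sketch below shows that logic in plain Python rather than Spark SQL. The string formats (`"x1,y1,x2,y2"` for rectangles, `"x,y"` for points), the choice to count boundary points as inside, and the function names are assumptions made for illustration.

```python
def contains(rect: str, point: str) -> bool:
    """True if the point lies inside the rectangle or on its boundary."""
    x1, y1, x2, y2 = (float(v) for v in rect.split(","))
    px, py = (float(v) for v in point.split(","))
    return min(x1, x2) <= px <= max(x1, x2) and min(y1, y2) <= py <= max(y1, y2)


def hot_zones(rects, points):
    """Count the points in each rectangle; return non-empty zones sorted by rectangle."""
    counts = {r: sum(contains(r, p) for p in points) for r in rects}
    return sorted((r, c) for r, c in counts.items() if c > 0)
```

The more points a rectangle contains, the "hotter" the zone. In Spark SQL the same test would typically be registered as a user-defined function and applied in a join between the rectangle and point datasets.
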
Perform textual and spatial searches on MongoDB. Functions implemented -
- FindBusinessBasedOnCity(cityToSearch, saveLocation1, collection)
- FindBusinessBasedOnLocation(categoriesToSearch, myLocation, maxDistance, saveLocation2, collection)
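
A location-based search of this kind comes down to a category filter plus a distance check. The sketch below runs against an in-memory list of dictionaries instead of a MongoDB collection and returns names instead of writing to a file. The document fields (`name`, `categories`, `latitude`, `longitude`), the use of the haversine formula, and distances in miles are assumptions; `haversine_miles` and `find_by_location` are hypothetical helpers, not the functions listed above.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def find_by_location(categories, my_location, max_distance, businesses):
    """Names of businesses that share a category and lie within max_distance miles."""
    lat, lon = my_location
    wanted = set(categories)
    return [
        b["name"]
        for b in businesses
        if wanted & set(b["categories"])
        and haversine_miles(lat, lon, b["latitude"], b["longitude"]) <= max_distance
    ]
```

A city-based search is simpler still: it filters on a city field, ideally case-insensitively, and needs no distance computation.
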