Papers on Data Warehouses, Lakes, Lakehouses

Data Warehouses

  1. The Snowflake Elastic Data Warehouse (Snowflake)
  2. Yellowbrick: An Elastic Data Warehouse on Kubernetes (Yellowbrick Data)
  3. Amazon Redshift Re-invented (Amazon Web Services)
  4. Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google (Google)
  5. WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses (Sigma Computing)
  6. Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses (Amazon Web Services)
  7. ByteCard: Enhancing ByteDance’s Data Warehouse with Learned Cardinality Estimation (ByteDance)
  8. Amazon Redshift and the Case for Simpler Data Warehouses (Amazon Web Services)
  9. Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing (Hortonworks)
  10. Presto: SQL on Everything (Meta)

Data Lakes

  1. Data lake: a new ideology in big data era (USTB)
  2. Discovering Related Data At Scale (Microsoft)
  3. Data Wrangling: The Challenging Journey from the Wild to the Lake (IBM)
  4. Amalur: Next-generation Data Integration in Data Lakes (TU Delft)
  5. Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture (UChicago)
  6. BtrBlocks: Efficient Columnar Compression for Data Lakes (FAU, TUM)
  7. JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes (UofT)

Data Lakehouses

  1. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics (Databricks)
  2. Shared Foundations: Modernizing Meta’s Data Lakehouse (Meta)
  3. Photon: A Fast Query Engine for Lakehouse Systems (Databricks)
  4. Analyzing and Comparing Lakehouse Storage Systems (Databricks)
  5. Deep Lake: a Lakehouse for Deep Learning (Activeloop)
  6. BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse (Google)
  7. Adaptive and Robust Query Execution for Lakehouses at Scale (Databricks)
  8. Petabyte-Scale Row-Level Operations in Data Lakehouses (Apple)