Texera - Collaborative Data Science and AI/ML Using Workflows

Texera supports scalable data computation and enables advanced AI/ML techniques.
"Collaboration" is a key focus, and we enable an experience similar to Google Docs, but for data science.

Motivation

  • Data science is labor-intensive and particularly challenging for non-IT users applying AI/ML.
  • Many workflow-based data science platforms lack parallelism, limiting their ability to handle big datasets.
  • Cloud services and technologies have advanced significantly over the past decade, enabling powerful browser-based interfaces supported by high-speed networks.
  • Existing data science platforms offer limited interaction during long-running jobs, making them difficult to manage after execution begins.

Goals

  • Provide data science as cloud services;
  • Provide a browser-based GUI to form a workflow without writing code;
  • Allow non-IT people to access data science;
  • Support collaborative data science;
  • Allow users to interact with the execution of a job;
  • Support huge volumes of data efficiently.

Workflow GUI

The Texera interface supports real-time collaboration on data science projects, allowing seamless sharing of data and workflows with easy access to AI/ML techniques and efficient management of public and private resources. The workflow in the use case shown below includes data cleaning, ML model training, and validation.

[Screenshot: Texera workflow interface]

Publications (Computer Science)

  • (11/2024) IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems
    Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li
    To appear in VLDB 2025
  • (8/2024) Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs
    Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li
    To appear in SIGMOD 2025
  • (7/2024) Texera: A System for Collaborative and Interactive Data Analytics Using Workflows
    Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li
    In VLDB 2024, Scalable Data Science track | PDF | Slides
  • (3/2024) Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows
    Yicong Huang, Zuozhi Wang, and Chen Li
    In SIGMOD 2024 Best Demo Runner-Up Award🏆 | PDF
  • (2/2024) Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly
    Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li
    In DataPlat Workshop at ICDE 2024 | PDF | Slides
  • (8/2023) Building a Collaborative Data Analytics System: Opportunities and Challenges
    Zuozhi Wang and Chen Li
    In Tutorial at VLDB 2023 | PDF | Slides
  • (8/2023) Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control
    Yicong Huang, Zuozhi Wang, and Chen Li
    In SIGMOD 2024 | PDF | Slides
  • (8/2023) Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse
    Sadeem Alsudais
    Ph.D. Thesis | PDF
  • (7/2023) Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires
    Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li
    In Data Science Day at KDD 2023
  • (7/2023) Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions
    Sadeem Alsudais, Avinash Kumar, and Chen Li
    In HILDA Workshop at SIGMOD 2023 | PDF
  • (6/2023) Texera: A System for Collaborative and Interactive Data Analytics Using Workflows
    Zuozhi Wang
    Ph.D. Thesis | PDF
  • (12/2022) Towards Interactive, Adaptive and Result-aware Big Data Analytics
    Avinash Kumar
    Ph.D. Thesis | PDF
  • (9/2022) Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees
    Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li
    In VLDB 2023 | PDF | Slides
  • (7/2022) Drove: Tracking Execution Results of Workflows on Large Datasets
    Sadeem Alsudais
    In the Ph.D. Workshop at VLDB 2022 | PDF
  • (6/2022) Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models
    Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang
    In VLDB 2022 | PDF
  • (6/2022) Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera
    Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li
    In VLDB 2022 | PDF | Demo Video
  • (4/2022) Optimizing Machine Learning Inference Queries with Correlative Proxy Models
    Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang
    In VLDB 2022 | PDF
  • (7/2020) Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera
    Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li
    In VLDB 2020 | PDF | Video | Slides
  • (1/2020) Amber: A Debuggable Dataflow system based on the Actor Model
    Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li
    In VLDB 2020 | PDF | Video | Slides
  • (4/2017) A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets
    Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li
    In ICDE 2017 Best Demo award | PDF | Video

Publications (Interdisciplinary)

  • (2/2025) DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service
    Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan Ding, Wei Wang, and Chen Li
    To appear in Data Science Education K-12: Research to Practice Annual Conference 2025
  • (7/2024) Brain Image Data Processing Using Collaborative Data Workflows on Texera
    Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, M. Gopi, Xiangmin Xu, and Chen Li
    In Frontiers in Neural Circuits | PDF
  • (1/2024) Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets
    Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark
    In TOCHI 2024 | PDF
  • (1/2024) How the Experience of California Wildfires Shape Twitter Climate Change Framings
    Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li, and Suellen Hopfer
    In Climatic Change 2024 | PDF
  • (11/2023) The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter
    Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake
    In Substance Use & Misuse 2023 | PDF
  • (3/2023) Understanding Underlying Moral Values and Language Use of COVID-19 Vaccine Attitudes on Twitter
    Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and Gloria Mark
    In PNAS Nexus 2023 | PDF
  • (10/2022) Public Opinions Toward COVID-19 Vaccine Mandates: A Machine Learning-Based Analysis of U.S. Tweets
    Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, Chen Li, and Kai Zheng
    In AMIA 2022 | PDF
  • (9/2021) The Social Amplification and Attenuation of COVID-19 Risk Perception Shaping Mask-Wearing Behavior: A Longitudinal Twitter Analysis
    Suellen Hopfer, Emilia J. Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover, Qiushi Bai, Yicong Huang, Chen Li, and Gloria Mark
    In PLOS ONE 2021 | PDF
  • (4/2021) Why Do People Oppose Mask Wearing? A Comprehensive Analysis of U.S. Tweets During the COVID-19 Pandemic
    Lu He, Changyang He, Tera Leigh Reynolds, Qiushi Bai, Yicong Huang, Chen Li, Kai Zheng, and Yunan Chen
    In JAMIA 2021 | PDF

Education

Data Science for All

An NSF-funded summer program to teach high-school students data science and AI/ML

ICS 80: Data Science and AI/ML Using Workflows

A Spring 2024 course at UCI that taught data science and AI/ML to 42 undergraduates, most of whom were not computer science majors

Workshop of Data Science for Everyone at Cerritos College

A two-day workshop designed for non-CS students to learn data science and ML without writing a single line of code

Videos

  • dkNET Webinar 04/26/2024
  • Texera Demo @ VLDB'20
  • Amber Presentation @ VLDB'20

Getting Started

Texera was formerly known as "TextDB" before August 28, 2017.

Acknowledgements

This project is supported by the National Science Foundation under awards IIS-1745673 and IIS-2107150, as well as by AWS Research Credits and Google Cloud Platform Education Programs.

  • This project is supported by an NIH NIDDK award.

  • YourKit has given an open source license to use their profiler in this project.

Citation

Please cite Texera as:

@article{DBLP:journals/pvldb/WangHNKALLDL24,
  author       = {Zuozhi Wang and
                  Yicong Huang and
                  Shengquan Ni and
                  Avinash Kumar and
                  Sadeem Alsudais and
                  Xiaozhen Liu and
                  Xinyuan Lin and
                  Yunyan Ding and
                  Chen Li},
  title        = {Texera: {A} System for Collaborative and Interactive Data Analytics
                  Using Workflows},
  journal      = {Proc. {VLDB} Endow.},
  volume       = {17},
  number       = {11},
  pages        = {3580--3588},
  year         = {2024},
  url          = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf},
  timestamp    = {Thu, 19 Sep 2024 13:09:37 +0200},
  biburl       = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}