Module : Extract, Load, Transform
Nuwan Waidyanatha edited this page Sep 2, 2023 · 1 revision
The wrangler app is instrumental to ETL tasks.
- Extracts or streams data in various formats from any source, mainly using the Apache Spark workloads in utils/etl/loads:
  - Spark file workloads (e.g., CSV, TXT, PDF, JSON)
  - Spark RDBMS workloads (e.g., PostgreSQL, MySQL)
  - Spark NoSQL workloads for unstructured data (e.g., MongoDB, CouchDB)
- Transforms the data, using utils/etl/transform, into a format that makes domain and functional sense.
- The data is extracted and stored in raw and cleansed form.
- Raw data is further cleaned, transformed, cataloged, and historically archived.
- Historic data is available for further curation and for use in data mining (AI/ML), visual analytics, and datamart services.
- The ETL processes are usually automated with Airflow using DAG files.
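
The stages listed above can be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library. It is not the wrangler app's own code and does not use the utils/etl packages or Spark. The function names (`extract`, `transform`, `load`, `run_pipeline`), the sample columns (`id`, `amount`), and the in-memory `archive` list are all hypothetical stand-ins chosen for the example.

```python
import csv
import io
import json
from datetime import datetime, timezone


def extract(csv_text):
    """Extract: parse CSV text into a list of row dictionaries (the raw form)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(raw_rows):
    """Transform: trim whitespace, drop rows without an id, cast amount to float."""
    cleaned = []
    for row in raw_rows:
        # Build a new dict so the raw rows are left untouched.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if not row.get("id"):
            continue  # skip records that cannot be identified
        row["amount"] = float(row["amount"])
        cleaned.append(row)
    return cleaned


def load(rows, archive, label):
    """Load: append a timestamped, JSON-serialised snapshot to the archive."""
    archive.append({
        "label": label,
        "stored_at": datetime.now(timezone.utc).isoformat(),
        "payload": json.dumps(rows),
    })
    return archive


def run_pipeline(csv_text):
    """Run extract, then store both the raw and the cleansed snapshots."""
    archive = []  # hypothetical stand-in for a historic data store
    raw = extract(csv_text)
    load(raw, archive, "raw")
    load(transform(raw), archive, "cleansed")
    return archive


# Hypothetical sample input: the second record has no id and is dropped on transform.
SAMPLE = "id, amount\n1, 10.5\n, 3.0\n2, 7\n"

if __name__ == "__main__":
    for snapshot in run_pipeline(SAMPLE):
        print(snapshot["label"], snapshot["payload"])
```

In a scheduled setup, a function like `run_pipeline` would be the unit that an Airflow DAG task calls, with the in-memory list replaced by a real data store.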
Rezaware abstract BI augmented AI/ML entity framework © 2022 by Nuwan Waidyanatha is licensed under Creative Commons Attribution 4.0 International