This is the code repository for Data Augmentation with Python, published by Packt.
Enhance deep learning accuracy with data augmentation methods for image, text, audio, and tabular data
Data is paramount in AI projects, especially for deep learning and generative AI, as forecasting accuracy relies on input datasets being robust. Acquiring additional data through traditional methods can be challenging, expensive, and impractical, and data augmentation offers an economical option to extend the dataset.
This book covers the following exciting features:
- Write OOP Python code for image, text, audio, and tabular data
- Access over 150,000 real-world datasets from the Kaggle website
- Analyze biases and safe parameters for each augmentation method
- Visualize data using standard and exotic plots in color
- Discover 32 advanced open source augmentation libraries
- Explore machine learning models, such as BERT and Transformer
- Meet Pluto, an imaginary digital coding companion
- Extend your learning with fun facts and fun challenges
If you feel this book is for you, get your copy today!
All of the code is organized into folders.
The code will look like the following:
pluto.remember_kaggle_access_keys("your_username_here", "your_key_here")
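For readers wondering what a call like this might do behind the scenes, here is a minimal sketch. It is not the book's implementation: the `Pluto` class below is a hypothetical stand-in, and the method body is an assumption. The only detail taken from outside this README is that the official Kaggle API client reads credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.

```python
import os


class Pluto:
    """Hypothetical stand-in for the book's coding companion object."""

    def remember_kaggle_access_keys(self, username, key):
        # Assumption: store the credentials as environment variables so that
        # the Kaggle API client can find them later in the same session.
        os.environ["KAGGLE_USERNAME"] = username
        os.environ["KAGGLE_KEY"] = key


pluto = Pluto()
pluto.remember_kaggle_access_keys("your_username_here", "your_key_here")
```

Replace the placeholder strings with your own Kaggle username and API key, and avoid committing real keys to a public notebook.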
Following is what you need for this book: This book is for data scientists and students interested in the AI discipline. Advanced AI or deep learning skills are not required; however, knowledge of Python programming and familiarity with Jupyter Notebooks are essential to understanding the topics covered in this book.
With the following software and hardware list, you can run all the code files in the book (Chapters 1-9).
This book is designed to be a hands-on journey. It is most effective to read a chapter, run the code in the Python Notebook, re-read the parts of the chapter that confused you, and go back to hacking the code until the concept or technique is firmly understood.
Software required | OS required |
---|---|
Python | Chrome, Edge, Safari, or Firefox browser on Windows, macOS, or Linux |
Jupyter Notebook (Python Notebook) | |
Python standard libraries, pandas, Matplotlib, and NumPy | |
Python image, text, audio, and tabular data augmentation libraries | |
The default online Jupyter Notebook environment is Google Colab, which requires a Google account. Other online Jupyter Notebook services, such as Kaggle or Microsoft, require you to sign up for or already have an account with them.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.
Duc Haba is a lifelong technologist and researcher. He has been a programmer, Enterprise Mobility Solution Architect, AI Solution Architect, Principal, VP, CTO, and CEO. The companies range from startups and IPOs to enterprise companies.
Duc’s career started at Xerox PARC, researching and building rule-based expert systems for copier diagnostics and doing skunkworks projects for the US DOD. Afterward, he joined Oracle, followed by Viant Consulting as a founding member. He dove deep into the entrepreneurial culture in Silicon Valley. There were slightly more failures than successes, but the highlights are Viant and RRKidz. Currently, he is happy working at YML.co as the AI Solution Architect.
The book is only possible with the support of my family, fellow researchers, and a gang of professionals at Packt Publishing. Above all else, I hope you enjoy reading the book and hacking the Python Notebook as much as I enjoyed writing it.
- https://www.amazon.com/dp/1803246456
- Author: Duc Haba
- Published: 2023
- Practical data augmentation techniques for images, texts, audio, and tabular data using real-world datasets
- Beautiful, customized charts and infographics in full color for image, text, audio, and tabular data
- Fully functional object-oriented code using open-source libraries on the Python Notebook for each chapter
- Data Augmentation Made Easy
- Biases in Data Augmentation
- Image Augmentation for Classification
- Image Augmentation for Segmentation
- Text Augmentation
- Text Augmentation with Machine Learning
- Audio Data Augmentation
- Audio Data Augmentation with Spectrogram
- Tabular Data Augmentation
You are welcome to copy, fork, correct, or enhance these Python Notebooks. See GitHub's contributors page for details.
If you have questions or notice any bugs in the Python Notebooks, kindly inform me by creating an issue on GitHub. I will respond and correct any errors promptly. Your participation is greatly appreciated, and if you are in a happy space, give this repo a ⭐ star.