Update T2.md
rancidghoul authored Aug 28, 2024
1 parent a068c93 commit 437d6ca
Showing 1 changed file with 10 additions and 7 deletions.
17 changes: 10 additions & 7 deletions source/_posts/T2.md
@@ -260,16 +260,19 @@ Each challenge has a pre-determined score. A participant’s score depends on ho
<hr>

**<span style="color: #90EE90; font-size: 1.5rem;">AI</span>**
**<span style="color: #ADD8E6; font-size: 1rem;">Authors - R. Pranav and Jagaadhep U K</span>**
**<span style="color: #ADD8E6; font-size: 1rem;">Authors - P. Manasa, D. Shalini and Jeba Rachael Nessica</span>**

**<span style="color: #ADD8E6; font-size: 1rem;">Google Colab</span>**
**<span style="color: #ADD8E6; font-size: 1rem;">Data Preprocessing with Iris Dataset</span>**

**Google Colab is a free online tool that lets you write and run Python code right in your web browser. It's like a notebook where you can type your code and see the results immediately. Colab is great for learning and working on data science projects because it supports popular Python libraries like Pandas and TensorFlow. You can also use it with others, making it easy to collaborate on projects. Plus, it provides access to powerful computers (GPUs and TPUs) for faster processing, which is really helpful for running complex tasks.**
**Objective: Understand and apply basic data preprocessing techniques using the Iris dataset, including handling missing data, feature scaling, encoding categorical variables, and performing exploratory data analysis.**

**Data preprocessing is a crucial step in the data analysis and machine learning pipeline, in which raw data is transformed into a clean and usable format. The process involves several steps:**

- **Data Cleaning:** Handling missing values, correcting errors, and removing duplicates. Techniques include imputation, where missing data is filled in, and outlier removal, which eliminates anomalies.
- **Data Transformation:** Scaling features so they are on the same scale (e.g., normalization or standardization), encoding categorical variables as numerical values (e.g., one-hot encoding), and transforming data to meet a model's assumptions (e.g., log transformation).
- **Data Reduction:** Reducing the dimensionality of the data through techniques like Principal Component Analysis (PCA) or feature selection, which can improve model performance and reduce computation time.
- **Data Splitting:** Dividing the dataset into training, validation, and test sets to evaluate model performance and detect overfitting.

**The Iris dataset is a classic dataset in machine learning, consisting of 150 samples of iris flowers, with four features (sepal length, sepal width, petal length, petal width) and a target variable (species of iris).**
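
The preprocessing steps described above can be sketched in a single Colab cell. This is a minimal illustration, not the required solution for the task: it assumes scikit-learn and NumPy are available (both ship with Colab), and the 80/20 split, the seed `42`, mean imputation, and the two PCA components are arbitrary choices made for the example. The split is done before fitting the imputer and scaler so that test-set statistics do not leak into the training data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the 150 samples: X holds the four measurements, y the species codes 0-2.
iris = load_iris()
X, y = iris.data, iris.target

# Data cleaning: count missing values per feature (scikit-learn's copy has none).
missing_per_feature = np.isnan(X).sum(axis=0)

# Data splitting: hold out 20% as a test set, keeping species proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Imputation: fill any missing value with the training-set mean of its feature.
# With no missing values this leaves the data unchanged; it is shown for completeness.
imputer = SimpleImputer(strategy="mean")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Data transformation: standardize each feature to zero mean and unit variance,
# using statistics learned from the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Encoding: turn the integer species codes into one-hot rows.
n_species = len(iris.target_names)
y_train_onehot = np.eye(n_species)[y_train]

# Data reduction: project the four scaled features onto two principal components.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train_scaled)
X_test_2d = pca.transform(X_test_scaled)

print("Missing values per feature:", missing_per_feature)
print("Train/test shapes:", X_train_scaled.shape, X_test_scaled.shape)
print("One-hot label shape:", y_train_onehot.shape)
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```

For the exploratory data analysis part of the objective, the same arrays can be loaded into a Pandas DataFrame and summarized or plotted.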

**Note: Use Google Colab to complete the tasks provided. After completing the tasks, upload your Google Colab Notebook to your GitHub repository.**

<span style="color: #ADD8E6;">_References:_</span>
- [<span style="color: #55AAFF;">Google Colab tutorial</span>](https://www.youtube.com/watch?v=rsBiVxzmhG0)

<hr>
