AI Authors - R. Pranav and Jagaadhep U K
Google Colab
Google Colab is a free online tool that lets you write and run Python code in your web browser. It works like a notebook: you type code into cells and see the results immediately below each cell. Colab is well suited to learning and to data science projects because popular Python libraries such as Pandas and TensorFlow come preinstalled. Notebooks can also be shared with others, which makes collaboration easy. Colab also provides access to hardware accelerators (GPUs and TPUs) that speed up computationally heavy tasks.
Note: Use Google Colab to complete the tasks provided. After completing the tasks, upload your Google Colab Notebook to your GitHub repository.
Question 1
NumPy
NumPy is a popular Python library for working with numbers and arrays. It provides an array type and a large set of mathematical functions that make it easy to perform calculations on lists of numbers and on matrices. NumPy is widely used for data analysis and scientific computing because its operations work on whole arrays at once in optimized compiled code. This is much faster than looping over Python lists.
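The sketch below shows what "calculations on lists of numbers and matrices" looks like in practice. NumPy applies arithmetic to every element of an array at once, without an explicit Python loop. The values are arbitrary illustrations and are not part of the student dataset.

```python
import numpy as np

# Arbitrary example values (not from the student dataset)
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic is applied element-wise to the whole array at once
doubled = values * 2        # -> 2, 4, 6, 8
shifted = values + 10       # -> 11, 12, 13, 14

# A 2-D array (matrix) multiplied by a vector with the @ operator
matrix = np.array([[1, 2],
                   [3, 4]])
vector = np.array([10, 20])
product = matrix @ vector   # -> [1*10 + 2*20, 3*10 + 4*20] = [50, 110]

print(doubled, shifted, product)
```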
Dataset
We use a dataset describing 15 students. Each student has five attributes, stored as columns in this order: Height, Weight, Age, Average Grade and Courses (the number of courses taken). The Python code below creates a NumPy array of this dataset and prints it as a table.
Python code to create the NumPy array for the task:
import numpy as np
# Creating a dataset with 15 students and 5 attributes
data = np.array([
[170, 65, 19, 85, 5],
[180, 75, 20, 90, 6],
[160, 55, 18, 80, 4],
[175, 70, 21, 88, 7],
[155, 50, 19, 82, 5],
[165, 62, 22, 89, 6],
[178, 80, 23, 91, 7],
[162, 58, 20, 78, 3],
[172, 68, 19, 86, 5],
[169, 66, 20, 84, 4],
[171, 64, 22, 87, 6],
[177, 72, 21, 90, 9],
[174, 76, 24, 88, 8],
[158, 52, 18, 75, 3],
[164, 63, 19, 81, 4]
])
# Printing the dataset with student labels
print("Student\tHeight\tWeight\tAge\tAvg Grade\tCourses")
for index, student in enumerate(data):
    # index is 0-based, so add 1 to label students from 1 to 15
    print(f"Student {index + 1}\t{student[0]}\t{student[1]}\t{student[2]}\t{student[3]}\t\t{student[4]}")
Objective:
- Question 1.1: Find the Average Height of the Students
Explanation: Use NumPy's mean() function to compute the average of the height column (column index 0) of the dataset.
- Question 1.2: Find the Age of the Oldest Student
Explanation: Use NumPy's max() function to find the largest value in the age column (column index 2). This value is the age of the oldest student.
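- Question 1.3: Find the Index of the Student Who Took the Most Courses
Explanation: Use NumPy's argmax() function to locate the index of the largest value in the courses column (column index 4). argmax() returns a 0-based index, so add 1 to match the student labels printed above (Student 1 to Student 15). If several students tie, argmax() returns the first of them.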
- Question 1.4: Find the Number of Students with an Average Grade Above 85
Explanation: Compare the average grade column (column index 3) with 85 to get a boolean mask. Then use sum() on the mask to count the students, since each True counts as 1 and each False as 0.
- Question 1.5: Calculate the Ratio of a Student's Age to Their Average Grade for Each Student
Explanation: Divide the age column (column index 2) by the average grade column (column index 3). NumPy performs the division element-wise, giving one ratio per student.
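The NumPy calls named in Questions 1.1 to 1.5 can be sketched as follows. This is a minimal illustration on a small hypothetical array with the same column layout (Height, Weight, Age, Avg Grade, Courses). The three rows and the column-index constants (HEIGHT, AGE and so on) are made up for readability and are not part of the task. Applying the same calls to data is left to you.

```python
import numpy as np

# Hypothetical 3-student array; columns: Height, Weight, Age, Avg Grade, Courses
sample = np.array([
    [170, 65, 19, 85, 5],
    [160, 55, 22, 90, 7],
    [175, 70, 20, 80, 4],
])

# Hypothetical column-index constants, matching the dataset's column order
HEIGHT, WEIGHT, AGE, GRADE, COURSES = 0, 1, 2, 3, 4

# 1.1: mean() averages every value in one column
avg_height = np.mean(sample[:, HEIGHT])            # (170 + 160 + 175) / 3 = 168.33...

# 1.2: max() returns the largest value in one column
oldest_age = np.max(sample[:, AGE])                # 22

# 1.3: argmax() returns the 0-based row index of the largest value
most_courses_idx = np.argmax(sample[:, COURSES])   # 1, i.e. the second student

# 1.4: a comparison gives a boolean mask; sum() counts the True entries
above_85 = np.sum(sample[:, GRADE] > 85)           # only 90 is above 85 -> 1

# 1.5: dividing two columns works element-wise, one ratio per student
age_to_grade = sample[:, AGE] / sample[:, GRADE]   # [19/85, 22/90, 20/80]

print(avg_height, oldest_age, most_courses_idx, above_85, age_to_grade)
```

Indexing with sample[:, i] selects column i for every row. The named constants are only a readability convenience.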
Question 2
Pandas
Pandas is a powerful open-source library for data analysis and manipulation in Python. It provides data structures such as DataFrame (a table of labelled columns) and Series (a single labelled column). These are built on top of NumPy arrays and are designed to handle a wide range of data types and operations efficiently. Pandas is used extensively in data science and machine learning for tasks such as data cleaning, transformation and analysis.
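The sketch below shows the two data structures side by side and how a DataFrame relates to the NumPy array underneath it. The small tables are arbitrary example values, not the student dataset.

```python
import numpy as np
import pandas as pd

# A Series is a labelled 1-D array (example values, not from the dataset)
heights = pd.Series([170, 180, 160], name="Height")

# A DataFrame is a table of columns that share one row index
df = pd.DataFrame({"Height": [170, 180, 160], "Age": [19, 20, 18]})

# Selecting a single column of a DataFrame returns a Series
age_col = df["Age"]

# The underlying values can be retrieved as a NumPy array
as_array = df.to_numpy()

print(type(age_col).__name__)                            # Series
print(isinstance(as_array, np.ndarray), as_array.shape)  # True (3, 2)
```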
Dataset
We use the same dataset of 15 students with 5 attributes, this time given as a Python list of lists. In Question 2.1 it is converted into a Pandas DataFrame.
data = [
[170, 65, 19, 85, 5],
[180, 75, 20, 90, 6],
[160, 55, 18, 80, 4],
[175, 70, 21, 88, 7],
[155, 50, 19, 82, 5],
[165, 62, 22, 89, 6],
[178, 80, 23, 91, 7],
[162, 58, 20, 78, 3],
[172, 68, 19, 86, 5],
[169, 66, 20, 84, 4],
[171, 64, 22, 87, 6],
[177, 72, 21, 90, 9],
[174, 76, 24, 88, 8],
[158, 52, 18, 75, 3],
[164, 63, 19, 81, 4]
]
# Column names, in order: 'Height', 'Weight', 'Age', 'Avg_Grade', 'Courses'
Objective:
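- Question 2.1: Create a Pandas DataFrame
Explanation: Convert the list above into a DataFrame with pd.DataFrame(), passing the column names through its columns parameter. The same constructor also accepts a NumPy array, such as the one from Question 1.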
- Question 2.2: Describe the DataFrame
Explanation: The describe() function returns summary statistics for each numeric column of the DataFrame. These are the count, mean, standard deviation, minimum, the 25th, 50th and 75th percentiles, and the maximum.
- Question 2.3: Count the Number of Students in Each Age Group
Explanation: Apply the value_counts() function to the Age column. It counts how many times each distinct value occurs, which here is the number of students of each age.
- Question 2.4: Filter the DataFrame
Explanation: Filtering (boolean indexing) extracts only the rows of the DataFrame that satisfy a condition, such as a comparison on one of its columns.
- Question 2.5: Calculate the Average Grade for Each Age Group
Explanation: The groupby() function in Pandas is used to group data based on one or more columns. After grouping, you can apply aggregation functions like mean() to these groups. In this task, you will group students by their age and then calculate the average grade for each age group.
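The Pandas operations named in Questions 2.1 to 2.5 can be sketched as follows. This is a minimal illustration on a hypothetical four-student list, not the task's dataset. The height condition in step 2.4 is an arbitrary example, since the task does not specify a condition.

```python
import pandas as pd

# Hypothetical 4-student list with the same column order as the dataset
rows = [
    [170, 65, 19, 85, 5],
    [160, 55, 20, 90, 7],
    [175, 70, 19, 80, 4],
    [168, 60, 20, 88, 6],
]

# 2.1: build a DataFrame from a list of lists (a NumPy array works the same way)
df = pd.DataFrame(rows, columns=["Height", "Weight", "Age", "Avg_Grade", "Courses"])

# 2.2: count, mean, std, min, quartiles and max for each numeric column
summary = df.describe()

# 2.3: number of students for each distinct age
age_counts = df["Age"].value_counts()

# 2.4: boolean indexing keeps only the rows where the condition is True
# (the Height > 165 condition is an arbitrary example)
tall = df[df["Height"] > 165]

# 2.5: group rows by age, then average the Avg_Grade column within each group
grade_by_age = df.groupby("Age")["Avg_Grade"].mean()

print(summary)
print(age_counts)
print(tall)
print(grade_by_age)
```

Selecting ["Avg_Grade"] before calling mean() makes the result a Series indexed by age, rather than a DataFrame holding the mean of every column.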