Complete the following steps and submit the final notebook in a GitHub repo.
- Download the dataset named “IPL 2022 Batters.csv” from here
- Read the data from the CSV file into a pandas DataFrame.
- Check whether the data contains any null values; if it does, handle them.
- Plot graphs using matplotlib to explore correlations in the data, for example how a player's total runs correlate with the number of 4s hit, or how strike rate correlates with the number of 4s hit.
- Next, perform linear regression to predict the number of 4s.
- Create two data frames: one for the features (you may choose whichever columns you find suitable) and one for the target (here, the number of 4s).
- Split the dataset into training and testing sets using the train_test_split utility from sklearn, with the number of 4s as the target.
- Train a linear regression model using sklearn on the training data and predict the values for the test data.
- Compute the mean squared error on the training-data predictions and on the test-data predictions.
- Document your code with markdown blocks, one for each of the steps given above.
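
The steps above could be sketched roughly as follows. This is only a minimal outline, not a reference solution. The column names "Runs", "SR" and "4s" used in the usage note are assumptions about the CSV header and should be checked against the actual file. Dropping rows with missing values is just one way to handle nulls (imputation is another). The helper names load_batters, plot_against_fours and fit_fours_regression are made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def load_batters(path):
    """Read the CSV into a DataFrame and print the null count per column."""
    df = pd.read_csv(path)
    print(df.isnull().sum())
    return df


def plot_against_fours(df, x_col, target_col="4s"):
    """Scatter one column against the number of 4s (column names are assumed)."""
    import matplotlib.pyplot as plt  # imported here so it is only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(df[x_col], df[target_col])
    ax.set_xlabel(x_col)
    ax.set_ylabel(target_col)
    ax.set_title(f"{x_col} vs {target_col}")
    return ax


def fit_fours_regression(df, feature_cols, target_col="4s", test_size=0.2, seed=42):
    """Fit a linear regression for the target and return (model, train_mse, test_mse)."""
    feature_cols = list(feature_cols)

    # Handle nulls by dropping incomplete rows in the columns actually used.
    clean = df.dropna(subset=feature_cols + [target_col])

    # Separate data frames for features and target.
    X = clean[feature_cols]
    y = clean[target_col]

    # Hold out part of the data for testing; a fixed seed keeps the split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Train on the training split only.
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Mean squared error on both splits.
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    return model, train_mse, test_mse
```

In a notebook, this might be used as df = load_batters("IPL 2022 Batters.csv"), then plot_against_fours(df, "Runs"), then fit_fours_regression(df, ["Runs", "SR"]), with a markdown cell before each call. A test error much larger than the training error would suggest the chosen features do not generalize well.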