This semester's challenge is especially open-ended. Here is a dataset on Kaggle called "CarsForSale". It contains data scraped from the online car marketplace Cars.com. Each row contains 25 pieces of information about a car's listing, such as its price, year, model, and color.
The challenge is to do something interesting with the data. Can you find a pattern, answer a question, or create a visualization?
Display the data types and columns
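A minimal sketch of this inspection step. The small DataFrame below is a hypothetical stand-in for the real CSV, and its column names are assumptions rather than the dataset's confirmed schema.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped listings;
# the real file has 25 columns, and these names are assumed.
df = pd.DataFrame({
    "Year": [2019, 2021, 2018],
    "Model": ["Camry", "Civic", "F-150"],
    "Price": ["$21,500", "$24,999", "$33,000"],
    "ConsumerRating": [4.7, 4.8, 4.6],
})

# One dtype per column; text columns are not numeric yet.
print(df.dtypes)

# Column names as a plain list.
print(list(df.columns))
```

With the real file, the same two calls would follow a `pd.read_csv(...)` on the downloaded CSV.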
Convert the Price string to a number for future processing
RESULT: Price is now float64
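One way this conversion might look, assuming prices are stored as strings such as "$21,500" and that some listings carry a non-numeric placeholder (the "Not Priced" value here is hypothetical). Unparseable entries become NaN, which is why the column ends up as a float rather than an int.

```python
import pandas as pd

# Hypothetical price strings; the exact formatting in the dataset is assumed.
df = pd.DataFrame({"Price": ["$21,500", "$24,999", "Not Priced"]})

# Keep only digits and the decimal point, dropping "$", "," and any text.
cleaned = df["Price"].str.replace(r"[^0-9.]", "", regex=True)

# Coerce to numbers; strings that cannot be parsed become NaN.
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

print(df["Price"])
```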
Collapse all certification variants into a single Certified value
RESULT: Used and Certified are now the only unique values
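A sketch of the normalization, under the assumption that certified listings are labeled with text containing the word "Certified". The column name `Status` and the sample labels are hypothetical.

```python
import pandas as pd

# Hypothetical condition labels; the real column name and values are assumed.
df = pd.DataFrame({
    "Status": ["Used", "Honda Certified", "Toyota Certified", "Used"],
})

# Flag every label that mentions a certification.
is_certified = df["Status"].str.contains("Certified", case=False)

# Keep non-certified labels as they are, replace the rest with one value.
df["Status"] = df["Status"].where(~is_certified, "Certified")

print(df["Status"].unique())
```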
Find the variables most strongly correlated with ConsumerRating
RESULT: ConsumerRating mean is 4.702762961382547
RESULT: These 4 columns have a high correlation with ConsumerRating:
ValueForMoneyRating 0.917873
ReliabilityRating 0.914597
ComfortRating 0.860040
PerformanceRating 0.805849
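The ranking above could be produced along these lines. The toy values are invented for illustration and do not reproduce the reported coefficients; only the column names follow the notes.

```python
import pandas as pd

# Invented toy values, for illustration only.
df = pd.DataFrame({
    "ConsumerRating": [4.9, 4.7, 4.5, 4.8, 4.2],
    "ReliabilityRating": [4.9, 4.8, 4.4, 4.8, 4.1],
    "ComfortRating": [4.8, 4.7, 4.6, 4.9, 4.3],
    "Mileage": [12000, 45000, 80000, 30000, 95000],
    "Model": ["A", "B", "C", "D", "E"],
})

mean_rating = df["ConsumerRating"].mean()

# Pearson correlation over the numeric columns only.
corr_matrix = df.select_dtypes("number").corr()

# Take the ConsumerRating column, drop its self-correlation, rank the rest.
top = (
    corr_matrix["ConsumerRating"]
    .drop("ConsumerRating")
    .sort_values(ascending=False)
)

print(mean_rating)
print(top.head(4))
```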
Display all correlations
Display all price correlations
RESULT: Price is somewhat correlated with Year and anti-correlated with Mileage
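A small sketch of the price correlation check, using invented numbers in which newer cars cost more and high-mileage cars cost less. It assumes Price has already been converted to a numeric column.

```python
import pandas as pd

# Invented toy listings, for illustration only.
df = pd.DataFrame({
    "Price": [35000.0, 28000.0, 15000.0, 22000.0, 9000.0],
    "Year": [2022, 2021, 2016, 2019, 2013],
    "Mileage": [8000, 20000, 90000, 45000, 140000],
})

# Correlation of every other column with Price, most negative first.
price_corr = df.corr()["Price"].drop("Price").sort_values()

print(price_corr)
```

A positive coefficient for Year and a negative one for Mileage would match the pattern described in the result.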
Find most common car sold
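One possible reading of this step is to count listings per make and model and take the largest group. The `Make` and `Model` column names and the sample rows are assumptions, and the count reflects listings rather than completed sales.

```python
import pandas as pd

# Hypothetical listings; column names are assumed.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "Ford", "Toyota", "Honda"],
    "Model": ["Camry", "Civic", "Camry", "F-150", "Camry", "Civic"],
})

# Number of listings per (Make, Model) pair, largest first.
counts = df.groupby(["Make", "Model"]).size().sort_values(ascending=False)

most_common = counts.index[0]
print(most_common, counts.iloc[0])
```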
Display all available columns as histograms
Display final data types
Apply pandas one-hot encoding for future predictive modeling
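A minimal sketch of one-hot encoding with `pd.get_dummies`. The `Status` column is a hypothetical categorical field; in practice any remaining text column would be listed in `columns`.

```python
import pandas as pd

# Hypothetical cleaned data with one categorical column.
df = pd.DataFrame({
    "Price": [21500.0, 24999.0, 33000.0],
    "Status": ["Used", "Certified", "Used"],
})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["Status"])

print(encoded)
```

Each category becomes its own indicator column, so a model that expects numeric input can use the encoded frame directly.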