The goal of this exercise is to test your ability to tease out non-obvious insights from data. We are looking for candidates with sound statistical knowledge and strong analytical capabilities.
The dataset provided to you contains car sales listings and descriptions from Craigslist. Your task is to dig into this dataset and come up with the following aggregations:
- Insightful descriptive statistics that indicate a trend in the data. You may combine columns or use a subset of columns; this is entirely up to you. Be as creative as possible and extract as many non-obvious metrics as you can.
- Groups such that every row in your data can be easily assigned to a group of listings with similar features. Make sure to list the features that define each group.
- Optional: Develop a model that predicts the listing price of a car from its features.
- HINT: Make sure you identify and remove outliers that may indicate data corruption or skew the results of your analysis.
- HINT: Feel free to choose a subset or combination of columns for grouping the data and for your prediction model.
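As an illustration of the outlier hint above, here is a minimal sketch of one common approach, Tukey's interquartile-range (IQR) fences, applied to the `price` column. It is only one possible technique, not a required method. The helper names `iqr_bounds` and `drop_outliers`, the multiplier `k=1.5`, and the sample rows are all assumptions made for the example; it uses only the Python standard library.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=1.5):
    """Keep rows whose `column` value is present and inside the fences."""
    present = [r for r in rows if r.get(column) is not None]
    low, high = iqr_bounds([r[column] for r in present], k)
    return [r for r in present if low <= r[column] <= high]


# Made-up sample rows, not taken from the real dataset.
listings = [
    {"id": 1, "price": 4500},
    {"id": 2, "price": 5200},
    {"id": 3, "price": 6100},
    {"id": 4, "price": 7000},
    {"id": 5, "price": 7900},
    {"id": 6, "price": 8800},
    {"id": 7, "price": 9500},
    {"id": 8, "price": 123456789},  # implausible price, likely a data error
    {"id": 9, "price": None},       # missing price
]

clean = drop_outliers(listings, "price")
print([row["id"] for row in clean])
```

The same filter could be applied to `odometer` or `year`. Whether 1.5 is the right multiplier, and whether missing values should be dropped or imputed, are judgment calls worth explaining in your write-up.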
You can download the data from here
- id - entry ID
- url - listing URL
- region - Craigslist region
- region_url - region URL
- price - entry price
- year - entry year
- manufacturer - manufacturer of vehicle
- model - model of vehicle
- condition - condition of vehicle
- cylinders - number of cylinders
- fuel - fuel type
- odometer - miles traveled by vehicle
- title_status - title status of vehicle
- transmission - transmission of vehicle
- vin - vehicle identification number
- drive - type of drive
- size - size of vehicle
- type - generic type of vehicle
- paint_color - color of vehicle
- image_url - image URL
- description - listed description of vehicle
- county - county
- state - state of listing
- lat - latitude of listing
- long - longitude of listing
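The column list above can be turned into a small typed loader. The sketch below assumes the data arrives as a CSV file with a header row and that `id` and `year` are integers while `price`, `odometer`, `lat`, and `long` are decimals; these types are inferred from the descriptions, not confirmed by the dataset. The function names and the sample rows are hypothetical, and an in-memory string stands in for the real file.

```python
import csv
import io

# Assumed numeric columns, inferred from the column descriptions.
INT_COLUMNS = ("id", "year")
FLOAT_COLUMNS = ("price", "odometer", "lat", "long")


def parse_row(raw):
    """Convert one CSV row of strings into typed values; blanks become None."""
    row = {key: ((value or "").strip() or None) for key, value in raw.items()}
    for name in INT_COLUMNS + FLOAT_COLUMNS:
        text = row.get(name)
        if text is None:
            continue
        try:
            number = float(text)
            row[name] = int(number) if name in INT_COLUMNS else number
        except (ValueError, OverflowError):
            row[name] = None  # unparseable value, treated as missing
    return row


def load_listings(handle):
    """Read every row from an open CSV file handle into a list of dicts."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


# Made-up sample rows using a subset of the columns.
sample = io.StringIO(
    "id,price,year,manufacturer,odometer,state,lat,long\n"
    "101,8500,2012,toyota,98000,ga,33.75,-84.39\n"
    "102,,2015.0,ford,not listed,tx,,\n"
)
listings = load_listings(sample)
print(listings[1])
```

Converting blanks and unparseable values to `None` up front keeps missing data explicit, so later steps can decide whether to drop or impute it.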
- Implement your solution in the tool of your choice (Python or R preferred; you may use a visualization tool to supplement)
- Email your completed solution to the point of contact who sent you this exercise.
- Include all relevant statistical graphs
- Make sure to label all graphs
- Explain your thought process and add comments where needed
- Summarize your findings/takeaways from the assignment
- Be prepared to walk through your solution during the interview with the hiring manager.
Your submission will be evaluated by the reviewer against the following criteria:
- Completeness - Does your submission meet the Deliverables specified above?
- Business Acumen - How easy was it for a business stakeholder (with limited technical knowledge) to understand your solution?
- Storylining - Evaluate the flow of your solution
- Creativity - Evaluate your ability to tease out non-obvious insights from the data
- Technical Capability - How statistically reliable are your insights?
- Critical Thinking - Please be ready to speak to the thought process behind your solution
- Attention to Detail - How did you design your solution?
- Please feel free to ask clarifying questions via email!
- Thank you for the time you are spending as a candidate with STORD!
- Please return your completed assignment within 5 days of receipt