Submission:
- Please submit your project via GitHub and send a private message on Slack to both Dan and Ivan with a link to it.
Project outlines are a valuable resource for data projects because they keep your work organized. A well-constructed outline can clarify your goals and serve as a checklist while you conduct research and analysis.
For this project, you will need to complete a problem statement and research design outline for one of the three lightning talks you designed during Part 1. This will serve as the starting point for your analysis. Make sure to include a specific aim and hypothesis, well-defined risks and assumptions, and clearly articulated goals and success metrics.
Remember, completing this task earlier will give you more chances to iterate and improve!
Objective: Create an outline of your research design approach, including hypothesis, assumptions, goals, and success metrics.
Requirements:
- Well-articulated problem statement with "specific aim" and hypothesis, based on your lightning talk.
- An outline of any potential methods and models.
- Detailed explanation of the available data (e.g., build a data dictionary or link to pre-built data dictionaries).
- Describe any outstanding questions, assumptions, risks, and caveats.
- Demonstrate domain knowledge, including specific features or relevant benchmarks from similar projects.
- Define your goals and success criteria so that it is clear what success looks like.
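One way to approach the data dictionary requirement is to generate a first draft from the data itself and then fill in the descriptions by hand. The following is a minimal sketch using only the Python standard library. The function name `build_data_dictionary`, the sample rows, and the column names are all hypothetical and stand in for your own dataset.

```python
def build_data_dictionary(rows, descriptions=None):
    """Summarize each column of a dataset given as a list of dicts."""
    descriptions = descriptions or {}

    # Collect column names in the order they first appear.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    dictionary = []
    for name in columns:
        values = [row.get(name) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v is not None and v != ""]
        types = sorted({type(v).__name__ for v in present})
        dictionary.append({
            "column": name,
            "type": "/".join(types) if types else "unknown",
            "missing": len(values) - len(present),
            "example": present[0] if present else None,
            # Descriptions are written by hand; "TODO" flags the gaps.
            "description": descriptions.get(name, "TODO"),
        })
    return dictionary


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"customer_id": 1, "monthly_spend": 42.5, "churned": False},
    {"customer_id": 2, "monthly_spend": None, "churned": True},
    {"customer_id": 3, "monthly_spend": 17.0, "churned": False},
]

data_dictionary = build_data_dictionary(
    sample_rows,
    descriptions={"churned": "Whether the customer cancelled (target)"},
)

for entry in data_dictionary:
    print(entry)
```

Any column still marked `TODO` is a prompt to go back to the data source and confirm what the field actually means before relying on it.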
Bonus:
- Consider alternative hypotheses: if your project is a regression problem, is it possible to rewrite it as a classification problem?
- "Convert" your goal metric from a statistical one (like Mean Squared Error) into something a non-technical audience can understand, such as a cost/benefit analysis.
- Please use the presentation template provided.
- The more time you spend researching, the less time you'll likely spend writing; this is a positive sign!
- While researching, keep track of all of your resources. Make sure they're trustworthy.
- If you've seen similar work online, see if you can find the code that implemented the data munging. It might come in handy.
- If your project requires using an API, make sure you can get access to it. Not everyone gives away API keys immediately, and you don't want to be caught with no data with one week left to work!
- Provide a sense of depth and scale to the project, which can be used to guide where the majority of your time should be spent working on the project.
- Show a clear connection between the datasets and the problem presented. Avoid independent variables (or features) that would not ordinarily be available when predicting your target.
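The two Bonus ideas (recasting a regression problem as classification, and translating a statistical metric into a cost) can be illustrated with a short sketch. Everything below is an assumption made for illustration: the sales figures, the threshold of 150, and the dollar costs are invented, and the helper names are hypothetical. In a real project these values would come from your domain research.

```python
import math


def to_classes(values, threshold):
    """Recast a continuous target as binary: 1 if at or above the threshold."""
    return [1 if v >= threshold else 0 for v in values]


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def absolute_error_cost(actual, predicted, cost_per_unit):
    """Total cost if every unit of error costs a fixed amount (an assumption)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) * cost_per_unit


def misclassification_cost(actual, predicted, cost_fp, cost_fn):
    """Total cost of false positives and false negatives."""
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return false_pos * cost_fp + false_neg * cost_fn


# Hypothetical weekly sales (units) and a model's predictions.
actual_sales = [100.0, 250.0, 80.0, 400.0]
predicted_sales = [120.0, 230.0, 150.0, 390.0]

# Regression view: a statistical metric, then a cost in dollars.
error = rmse(actual_sales, predicted_sales)
regression_cost = absolute_error_cost(actual_sales, predicted_sales, cost_per_unit=2.0)

# Classification view: "is this a high-sales week?" (threshold is assumed).
actual_classes = to_classes(actual_sales, threshold=150)
predicted_classes = to_classes(predicted_sales, threshold=150)
total_cost = misclassification_cost(
    actual_classes, predicted_classes, cost_fp=50.0, cost_fn=200.0
)

print(f"RMSE: {error:.2f} units")
print(f"Cost of regression errors: ${regression_cost:.2f}")
print(f"Cost of misclassified weeks: ${total_cost:.2f}")
```

Stating the result as a dollar figure rather than an RMSE makes the success criterion easier to discuss with stakeholders, and assigning different costs to false positives and false negatives makes explicit which kind of mistake matters more.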
The rubric is available here.