- [name-of-a-team-member](URL to this member's GitHub account)
- [name-of-a-team-member](URL to this member's GitHub account)
- [name-of-a-team-member](URL to this member's GitHub account)
- [name-of-a-team-member](URL to this member's GitHub account)
- [name-of-a-team-member](URL to this member's GitHub account)
Q1: There are generally two situations you'll start from when approaching a data question: (a) you designed and collected the data yourself, OR (b) you have to work with a data set you've been given access to. What do you think makes these two starting points different? How might that change the analysis you do?
[provide response here]
(For the following set of questions, assume you're in situation (a): you are going to design your own data collection.)
Q2: What factors go into deciding which data format to use? Under what circumstances might you use different formats (e.g., JSON, CSV, a key-value store, or .txt documents)?
[provide response here]
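To make the formats in Q2 concrete, here is a minimal Python sketch. It serializes one hypothetical record three ways: as JSON, as CSV, and as a value in a key-value store, with a plain dict standing in for a real store. The field names and values are invented for illustration. The sketch only shows how each format handles nesting and types. It is not a recommendation of any one format.

```python
import csv
import io
import json

# A hypothetical record; field names and values are invented for illustration.
record = {"user_id": 42, "name": "Ada", "tags": ["math", "chess"]}

# JSON keeps nesting (the list of tags) and basic types (int vs. string).
as_json = json.dumps(record)

# CSV is flat: the nested list has to be squeezed into a single string column.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "name", "tags"])
writer.writeheader()
writer.writerow({**record, "tags": ";".join(record["tags"])})
as_csv = buf.getvalue()

# Reading the CSV back shows that types are lost: every value is a string.
row = next(csv.DictReader(io.StringIO(as_csv)))

# A key-value store maps one key to one opaque value. A plain dict stands in
# for a real store here, with the JSON string as the stored value.
kv_store = {f"user:{record['user_id']}": as_json}

print(as_json)
print(row)
print(kv_store)
```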
Q3: Once you've chosen a format, you'll need to determine which fields to capture and store. A common approach is to determine what QUESTIONS you want to ask of your dataset. For each of the following examples, list the field(s) you may need to answer the question:
- You're working for a company that tracks data on public transportation, and you know you'll want to be able to ask, "What percentage of the time is a bus/train late?"
- You're working for a school district, and you need to help the principal answer the question, "Which teachers are most successful at getting students interested in extracurricular educational activities (e.g., Math Team, Quiz Bowl, Science Olympiad, Robot Building)?"
- You're starting a social networking website that helps friends choose what to do on a Friday night, and you need to be able to answer the question, "Who made the suggestion that led to the final decision?"
- [Field(s) here]
- [Field(s) here]
- [Field(s) here]
Q4: Now you need to decide how you'll query your data. What are the costs and benefits of each of the following options?
- Store the data raw and load it into a Python or JavaScript shell for analysis.
- Periodically dump the data into a database (such as MongoDB) and query it.
- Build a web server and write an API that loads the data into your database and queries it.
- [Costs/Benefits]
- [Costs/Benefits]
- [Costs/Benefits]
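Here is a minimal sketch of the first Q4 option in Python. It assumes the raw data is stored as JSON Lines (one JSON object per line), which is an assumption because the assignment does not fix a format. The `events.jsonl` file name and the `user`/`action` fields are hypothetical. All records are loaded into memory, and a "query" is just ordinary Python over the resulting list.

```python
import json
import os
import tempfile

# Hypothetical raw log in JSON Lines form; the fields and values are invented.
raw_lines = [
    '{"user": "ana", "action": "login"}',
    '{"user": "ben", "action": "search"}',
    '{"user": "ana", "action": "search"}',
]

# Write the raw data to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(raw_lines) + "\n")


def load_events(file_path):
    """Load every non-blank line into memory; simple, but limited by RAM."""
    with open(file_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


events = load_events(path)

# An ad hoc "query": which users performed a search?
searches = [e for e in events if e["action"] == "search"]
users_who_searched = sorted({e["user"] for e in searches})
print(len(events), users_who_searched)
```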
Q5: You've now set up your database and have a website with 10,000 users, but you realize you forgot a much-needed field (say, an ID number for each user). What do you do, and how might different database designs have helped in this situation?
[Respond here]
(For this section, you may need to do some online research to answer the questions.)
[Response]
[Response]
[Response]
Answer the following questions using this scenario: you just got a HUGE dataset from Spotify in which each entry contains these fields: [username, song, # of times played, user rating, genre]
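Here is a minimal Python sketch of how entries with these five fields might be represented and aggregated, using total plays per genre as the example aggregation. The sample entries are invented. The numeric rating type is an assumption, because the scenario does not give a rating scale. With a truly huge dataset, the entries would be streamed from storage rather than held in a list. The aggregation loop itself only keeps one counter per genre.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry with the five fields from the scenario."""
    username: str
    song: str
    times_played: int   # "# of times played"
    user_rating: float  # assumed numeric; the real rating scale is unknown
    genre: str


# A few invented sample entries standing in for the real dataset.
entries = [
    Entry("ana", "Song A", 12, 4.5, "jazz"),
    Entry("ben", "Song A", 3, 2.0, "jazz"),
    Entry("ana", "Song B", 7, 5.0, "rock"),
]

# Total plays per genre in a single pass; the counter's size grows with the
# number of genres, not with the number of entries.
plays_by_genre = Counter()
for e in entries:
    plays_by_genre[e.genre] += e.times_played

print(plays_by_genre.most_common())
```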
[Response]
[Response]
[Response]
[Response]
Answer these last questions in general terms.
[Response]
[Response]
Q14: Let's think about data science as a way to tell a story about some data. Why would I want to bring a second data set into my story?
[Response]
Q15: This one's just for fun. What percentage of the time do you expect to actually get the result you wanted?
[Response]
There are many tools for this analysis, but we will use the JavaScript library Gauss, since everyone should have used it by now.
[cut and paste command used and the output it produced]
[cut and paste command used and the output it produced]
[cut and paste command used and the related snippet from the output]
[Describe]
[Value]
[Command and output]
[Answer and any images/snippets to justify]
- [device1](URL to this device)
- [device2](URL to this device)
[Response]
[Location]
[Response]
[Response]
[Describe or link to example]
[Necessary? Why? and How?]