# How to use this book {#c02}
**Abstract**
This chapter describes different ways the reader can use this book to learn about using R and data science tools in their education job. Job descriptions, lifestyles, and programming experience differ for everyone. Learning to program in R on the job or at home will also look different to each reader. Applying R and data science tools in an education job requires learning these skills in a practical and meaningful context. The chapter describes three ways to learn from the book. It also introduces the reader to ways they can support and contribute to the book’s content.
We've heard it from fellow data scientists and experienced it ourselves---learning a programming language is hard. Like learning a foreign language, it is not just about mastering vocabulary. It's also about understanding the language's norms, its underlying structure, and the metaphors that hold the whole thing together.
The beginning of the learning journey is particularly challenging because it feels slow. You already have efficient solutions in your work environment to accomplish your day-to-day work. Introducing code to your workflow will slow you down at first because you won't be as fast as you are with your favorite spreadsheet software. However, you're probably reading this book because you realize that learning how to analyze data using R is like investing in your own personal infrastructure---it takes time while you're building the initial skills. Still, the investment pays off when you start solving complex problems faster and at scale. One person shared this story about their learning journey:
> The first six months were hard. I knew how quickly I could do a pivot table
> in Excel. It took longer in R because I had to go through the syntax and take
> the book out. I forced myself to do it, though. In the long term, I'd be a
> better data scientist. I'm so glad I thought that way, but it was hard the
> first few months.
Learning R for your role in education is doable, challenging, and rewarding all at once. We wrote this book for you because we do this work every day. We're not writing as people who have mastered data science in education. We're writing as people who learned R and data science *after* we chose education. And like you, improving the lives of students is our daily practice. Learning to use R and data science helped us do what matters most to us. Join us in enjoying all that comes with R and data science---both the challenges of learning and the joys of solving problems in creative and efficient ways.
## Different strokes for different data scientists in education
Data science varies across different roles because education spans diverse contexts and age groups. Education organizations require different roles to make them work, which creates different kinds of uses for data science skills and tools. The approach to adopting data science will differ depending on the job role, while some principles will generalize across contexts. For example, a teacher's approach to data analysis is different from an administrator's.
Learning data science and R is not in the typical job description. Many readers of this book are educators working with data and looking to expand their tools. You might even be an educator who *doesn't* work with data, but you're interested in learning about the lives of students through data.
Like most professionals in education, you've got a full work schedule and challenging professional demands. Your busy workday doesn't include regular professional development time or dedicated time for self-driven learning. You also have a life outside work, including family, hobbies, and relaxation. We struggle with this ourselves, so this book is designed for use in lots of different ways.
One approach to learning this material is to establish a routine that allows you to engage with and practice the content every day, even for just a few minutes at a time. That will make the content ever-present in your mind and help you shift your mindset so you start seeing more opportunities for practice at work. If daily practice is out of reach for you right now, that's okay! We want the book to fit into your life, however that may look.
Here are other ways to use this book:
### Read the book cover to cover (and how to keep going)
We wrote this book assuming you're beginning your journey of learning R and applying data science in your educational role. The book takes you from installing R to practicing more advanced data science skills like text analysis.
If you've never written a line of R code, we welcome you to the community. Consider reading the book cover to cover and doing all the analysis walkthroughs. The walkthroughs are designed to help give a high-level view of an analysis from start to finish.
Starting to learn R can be challenging, especially when trying to figure out where to begin. It is a challenge to formulate a goal or path to get there without first understanding what data analysis can do. Use the walkthroughs to learn with a concrete example rather than to try to learn the concepts and principles in the abstract.
Remember that you'll get more benefit from a few minutes of practice every day than from long hours of practice every once in a while. Typing code every day, even if it doesn't always run, is a daily practice that invites learning and "a-ha" moments.
It's easy to avoid coding when it doesn't feel successful (we've been there), so this book is designed to deliver frequent small wins to keep the momentum going. But even so, we all eventually hit a wall in our learning. When that happens, take a break and then come back and keep coding. When daily coding becomes a habit, so does the learning.
If you get stuck in an advanced chapter and you need a break, try reviewing an earlier chapter. You'll be surprised at how much you learn from reviewing old material with the benefit of new experiences. Sometimes, that kind of back-to-basics attitude is what we need to get a fresh perspective on new challenges.
### Pick a chapter of interest and start there
When we interviewed practitioners for this book, we chose people with different levels of experience in R, in the education field, and in statistics. We asked each interviewee to rate their level of experience on a scale from 1 to 5, with 1 being "no experience" and 5 being "very experienced". These interviews---and others one of us conducted (@rosenberg2021data)---helped us to understand that educational data scientists have a wide range of experience and expertise. We have planned and written this book accordingly. We have different recommendations based on your level of experience.
You can try the same exercise the interviewees did---take a moment to rate your level of experience in:
- Using R
- Education as a field
- Statistics
If you rated yourself as a 1 in Using R, read the book from beginning to end as part of a daily practice. If you rated yourself higher than a 1, consider reviewing the table of contents and skimming all the chapters first. If a particular chapter calls to you, feel free to start your daily practice there. Eventually, we do hope you choose to experience the whole book.
For example, you might be working through a specific use case in your education job---perhaps you are analyzing student quiz scores, evaluating a school program, introducing a data science technique to your teammates, or designing data dashboards. If this describes your situation, feel free to find a section in the book that inspires you or shows you techniques that apply to your project.
This book is primarily about learning to use R as a tool for data science in education. Your experience level with R should be the main factor when you decide how to enjoy the book. But do consider how you rated your level of experience with education and statistics. If these are areas you want to focus on, take your time understanding the education scenarios and techniques described in the book. All three disciplines are important parts of being a data scientist in education.
### Read through the walkthroughs and run the code
If you're experienced in data science using R, you may be interested in starting with the walkthroughs. Each walkthrough is designed to demonstrate basic analytic routines using datasets that look familiar to people working in the education field.
In this approach, be intentional about what you want to learn from the walkthroughs. For example, you may seek out examples of aggregated datasets, exploratory data analysis, the {ggplot2} package, or the `pivot_longer()` function. Read the walkthrough and run the code as you go. After you successfully run the code, experiment with the functions and techniques you learned by changing the code and seeing new results. After running the code in the walkthroughs, reflect on how the lessons can be applied to the datasets, problems, and analytic routines in your education work.
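As a small illustration of the "run it, then change it" habit described above, here is a minimal sketch using `pivot_longer()` from the {tidyr} package. The `quiz_wide` data frame and its column names are made up for this example and are not one of the book's datasets.

```r
library(tidyr)

# Hypothetical quiz scores in "wide" format:
# one row per student, one column per quiz
quiz_wide <- data.frame(
  student_id = c("s01", "s02", "s03"),
  quiz_1 = c(8, 6, 9),
  quiz_2 = c(7, 9, 10)
)

# Reshape to "long" format: one row per student-quiz pair
quiz_long <- pivot_longer(
  quiz_wide,
  cols = c(quiz_1, quiz_2), # the columns to stack
  names_to = "quiz",        # new column holding the old column names
  values_to = "score"       # new column holding the scores
)

quiz_long
```

Once this runs, try changing it. For instance, replace `cols = c(quiz_1, quiz_2)` with `cols = starts_with("quiz")`, or add a `quiz_3` column to `quiz_wide`, and see how the result changes.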
Doing data science in education using R is, at its heart, an endeavor aimed at improving the student experience. The skills taught in the walkthroughs are only one part of doing data science in education using R.
As an experienced R user, you know that this endeavor involves complex problems and collaboration. Since part of your task may be to convince colleagues of the merits of your analytic tools and approaches, this book is written with that context in mind. [Chapter 15](#c15), in particular, explores ways to introduce these skills to your education job and invite others into analytic activities. You'll learn useful perspectives from chapters on concepts you're already familiar with, too.
## A note on statistics
Data science is the intersection of content expertise, programming, and statistics. You'll want to grow all three of these as you learn more about using data science in your education job. Your education knowledge will lead you to the right problems, your statistics skills will bring rigor to your analysis, and your programming skills will scale your analysis to reach more people.
What happens when you remove one of these pieces? Consider a data scientist working in education who is an expert programmer and statistician but has not learned about the real-life conditions that generate education data. She might make analysis decisions that overlook the nuances in the data or make ill-advised recommendations because of those decisions.
As another example, consider a data scientist who is an expert statistician and an education veteran but who has not learned to code. He will find it difficult to scale his analysis, thereby forgoing the chance to make the largest possible improvement to the student experience.
Finally, consider a data scientist who is an expert programmer and an education veteran. Because she is still learning statistics, she can only scale surface-level analysis and might miss chances to understand causal relationships or predict student outcomes.
In this book, you'll spend a lot of time learning R through recognizable education data examples. However, doing a deep dive into statistics and how to use statistical techniques responsibly is better covered by books dedicated solely to the topic.
It's hard to overstate how important this part of learning is in the lives of students and educators. One education data scientist we spoke to said this about the difference between building a model for an online retailer and building a model in education:
> It’s not a big deal if an online shopper gets mistakenly shown 1000 brooms but if I got my model wrong and we close a school, that will change a child's entire life.
In this book, you'll learn statistics techniques like hypothesis testing and model building and how to run these operations in R. However, the explanations in the chapters will not provide a complete background about the statistical techniques.
We encourage you to explore other excellent books like [*Learning Statistics With R*](https://learningstatisticswithr.com/) (https://learningstatisticswithr.com/) [@learningstatswithr] as you learn the required nuances of applying statistical techniques to scenarios outside our walkthroughs.
## What this book doesn't cover
While *Data Science in Education Using R* is a wide-ranging introduction to the topic, there are other topics that this book does not cover. We chose not to include these topics because excellent resources for those topics already exist. Consider exploring the following:
- Git/GitHub: Git is version control software, and GitHub is a web service for hosting Git repositories. Together, they help keep track of different versions of coding files and their changes. Git and GitHub are parts of many data scientists' workflows for solo or collaborative work. However, there is a steep learning curve, and these tools are not necessary to get started with coding in R. An outstanding introduction to Git and GitHub is @bryan2020's freely available book [*Happy Git with R*](https://happygitwithr.com/) (https://happygitwithr.com/).
- Building R packages: If you are carrying out the same analyses many times, it may be helpful to create your own package. Packages are collections of code and sometimes data, such as the {roomba} (for tidying complex, nested lists) [@roomba-pkg] and {tidylpa} [@rosenberg2019tidylpa] (for carrying out Latent Profile Analysis) packages that authors of this book created. However, building an R package is not the focus of this book. Hadley Wickham and Jenny Bryan wrote a very helpful---and freely available---book on the topic called [*R Packages*](https://r-pkgs.org/) (https://r-pkgs.org/) [@rpackages].
- Advanced statistical methodologies: While the book does discuss basic and advanced statistical methods, it is not a specialized methods book. One excellent statistics book is @james2013's *An Introduction to Statistical Learning with Applications in R*.
- Creating a website or book: R is versatile and can be used for more than just performing data analyses. R can be used to write books like this one using the {bookdown} package. Or it can be used to create websites using the {blogdown} package. There are excellent, freely available books on these topics (see @xie2019blogdown's *blogdown: Creating Websites with R Markdown* (https://bookdown.org/yihui/blogdown/) and @xie2019bookdown's *bookdown: Authoring Books and Technical Documents with R Markdown* (https://bookdown.org/yihui/bookdown/)). Quarto [@quarto-cli] is another powerful tool for creating books, websites, presentations, and more. Like R Markdown, it combines code and markdown text to produce beautiful, polished outputs. Quarto offers a consistent authoring experience across formats and is multilingual by design, meaning you can use it even if R isn’t installed.
## Supporting the book
If you find this book useful, please support it by:
* Communicating about the book on social media
* Citing or linking to it
* Starring the GitHub repository for the book (https://github.com/data-edu/data-science-in-education)
* Starring the GitHub repository for the {dataedu} package (https://github.com/data-edu/dataedu)
* Reviewing it on Amazon or Goodreads
* Buying a copy
* Letting others in education and data science know about it
## Contributing to the book
This is the second edition of *Data Science in Education Using R*. We designed this book to be useful and practical for our readers in education. We wrote it as a guide to getting up and running in R, but we know this book does not comprehensively cover every topic related to R. We did this to create a reference that is not intimidating to new users and that creates frequent, small wins while learning to use R.
How do we expand this book as data science in education expands as a field? We wrote this book in the open on GitHub so that community members can help us evolve the work, even after it is formally published. Indeed, this second edition incorporates feedback we received from readers and is much better because of it. In this edition, we've updated our examples, refreshed the packages, and followed the latest best practices so you have access to the latest tools while preserving the theme of continuous learning and growth.
We want this to be the book new data scientists in education have with them as they grow their craft. To achieve that goal, it's important that the stories and examples in the book are based on **your** stories and examples.
If you have some experience with Git and want to contribute that way, here's how:
- Submit an "issue" to our [GitHub repository](https://github.com/data-edu/data-science-in-education) (https://github.com/data-edu/data-science-in-education/issues) that describes a data science problem unique to the education setting
- Submit a pull request to share a solution for the problems discussed in the book
- Share an anonymized dataset for use in the book or a future version of it
And finally, if you are new to data science in education, welcome! We would love to have your feedback by [email](mailto:[email protected]) ([email protected]).