Web scraping with rvest (UseR 2024)

In this tutorial, you'll learn the basics of web scraping with R, using the rvest package. We'll discuss the basic structure of an HTML page and how to find the elements you're interested in with SelectorGadget or the browser's developer tools. You'll then learn how to extract those elements programmatically with rvest, turning web pages into tidy data frames.
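
As a rough illustration of that last step, here is a minimal sketch of turning HTML into a data frame. The page is made up and built inline with `minimal_html()` so the code runs offline; the `person` and `height` class names are hypothetical, not taken from any site used in the tutorial.

```r
library(rvest)

# Hypothetical page, built inline so the example needs no network access.
html <- minimal_html('
  <ul>
    <li class="person"><a href="/luke">Luke Skywalker</a> <span class="height">172</span></li>
    <li class="person"><a href="/leia">Leia Organa</a> <span class="height">150</span></li>
    <li class="person"><a href="/yoda">Yoda</a></li>
  </ul>
')

# Step 1: select one element per observation (one row each).
people <- html_elements(html, "li.person")

# Step 2: pull one value per row for each variable.
# html_element() (singular) returns exactly one result per input element,
# giving NA where nothing matches, so the columns stay the same length.
characters <- tibble::tibble(
  name   = people |> html_element("a") |> html_text2(),
  url    = people |> html_element("a") |> html_attr("href"),
  height = people |> html_element(".height") |> html_text2() |> as.integer()
)

characters
```

Selecting rows first with `html_elements()` and then columns with `html_element()` keeps missing values aligned with their rows; the third entry has no height, so it comes through as `NA` rather than shifting the column.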

Bonus content includes scraping multiple pages (with rvest and httr2), scraping dynamic sites where content is generated with JavaScript, extracting data from unofficial APIs, and some hints on using LLMs.

Slides

Requirements

To run the code at home, install the following packages:

# install.packages("pak")
pak::pak(c("tidyverse", "chromote"))

To run the live web-scraping code you'll also need a copy of Chrome installed on your computer.

hadley/web-scraping