This repository contains the R scripts used to scrape U2 lyrics, the scraped HTML files, and the assembled output CSV file.
The source of the U2 lyrics data is AZLyrics:
https://www.azlyrics.com/u/u2band.html
The final output of the R scripts is the CSV file u2-lyrics.csv.
This file contains a data table with four columns:

- album: name of the album
- year: year in which the album was released
- song: name of the song
- lyrics: text of the song lyrics
This data set can be used for text mining purposes.
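As a starting point for text mining, the sketch below counts word frequencies per album. It is written in Python for illustration, although the repository's own scripts are in R. It assumes that u2-lyrics.csv has a header row with the column names listed above. The tokenizer is a deliberately simple assumption (lowercase ASCII words, one internal apostrophe, no stop-word removal), and tokenize and word_counts_by_album are hypothetical helper names, not part of this repository.

```python
import csv
import re
from collections import Counter

# Lowercase words, optionally with one internal apostrophe ("don't", "i'm").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and split into words."""
    text = text.lower().replace("\u2019", "'")
    return WORD_RE.findall(text)


def word_counts_by_album(lines):
    """Count word frequencies in the lyrics column, grouped by album.

    `lines` is any iterable of CSV lines (e.g. an open file) whose header
    row contains at least the columns album and lyrics.
    """
    counts = {}
    for row in csv.DictReader(lines):
        lyrics = row["lyrics"] or ""  # guard against short/malformed rows
        counts.setdefault(row["album"], Counter()).update(tokenize(lyrics))
    return counts
```

To use it on the real data, pass an open file handle such as open("data/u2-lyrics.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption, and newline="" lets the csv module handle lyrics that span several lines inside a quoted field. Then call most_common(10) on any album's Counter to list its most frequent words.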
Repository structure:

README.md
Scraping-U2-Lyrics-with-R.pdf
code/
  script1-scrape-album-year-and-song.R
  script2-download-song-html-files.R
  script3-scrape-u2-lyrics-text.R
data/
  u2band.html
  u2-songs-info.csv
  u2-lyrics.csv
html_files/
  Boy-1980-adaywithoutme.html
  Boy-1980-ancatdubh.html
  ...
  Songs-Of-Experience-2017-yourethebestthingaboutme.html
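The downloaded HTML files appear to follow the naming pattern Album-Name-Year-songslug.html. This pattern is inferred only from the example names listed above and is not documented by the scripts. The Python sketch below splits such a name back into its parts, which can help when matching HTML files to rows of the CSV. The function name parse_html_filename is hypothetical. Turning hyphens back into spaces is an assumption that will not restore punctuation that was dropped from album names.

```python
import re
from pathlib import Path

# Assumed naming convention, inferred from the listed examples:
#   <Album-Name-With-Hyphens>-<4-digit year>-<songslug>.html
# The greedy album group backtracks to the last "-YYYY-" before a
# hyphen-free song slug, so multi-word album names are kept intact.
FILENAME_RE = re.compile(r"^(?P<album>.+)-(?P<year>\d{4})-(?P<slug>[^-]+)\.html$")


def parse_html_filename(name):
    """Split e.g. 'Boy-1980-ancatdubh.html' into (album, year, slug).

    Hyphens in the album part are turned back into spaces, which may not
    restore the original punctuation. Returns None for non-matching names.
    """
    match = FILENAME_RE.match(Path(name).name)
    if match is None:
        return None
    album = match.group("album").replace("-", " ")
    return album, int(match.group("year")), match.group("slug")
```

For example, looping over Path("html_files").glob("*.html") and calling parse_html_filename on each path yields one (album, year, slug) tuple per song file. Adjust the directory if html_files lives elsewhere in your checkout.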
As a Data Science and Statistics educator, I love to share the work I do. Each month I spend dozens of hours curating learning materials like this resource. If you find it valuable and useful, please consider making a one-time donation via PayPal in any amount (e.g. the amount you would spend buying me a cup of coffee or any other drink). Your support really matters.