forked from jennybc/googlesheets
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
292 lines (199 loc) · 13.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
---
output:
md_document:
variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/0.1.0/active.svg)](http://www.repostatus.org/#active)
[![Build Status](https://travis-ci.org/jennybc/googlesheets.svg?branch=master)](https://travis-ci.org/jennybc/googlesheets) [![Coverage Status](https://coveralls.io/repos/jennybc/googlesheets/badge.svg)](https://coveralls.io/r/jennybc/googlesheets)
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
---
Google Sheets R API
---
Access and manage Google spreadsheets from R with `googlesheets`.
Features:
- Access a spreadsheet by its title, key or URL.
- Extract data or edit data.
- Create | delete | rename | copy | upload | download spreadsheets and worksheets.
`googlesheets` is inspired by [gspread](https://github.com/burnash/gspread), a Google Spreadsheets Python API
The exuberant prose in this README is inspired by [Tabletop.js](https://github.com/jsoma/tabletop): If you've ever wanted to get data in or out of a Google Spreadsheet from R without jumping through a thousand hoops, welcome home!
#### What the hell do I do with this?
Think of `googlesheets` as a read/write CMS that you (or your less R-obsessed friends) can edit through Google Docs, as well via R. It's like Christmas up in here.
Use a [Google Form](http://www.google.com/forms/about/) to conduct a survey, which populates a Google Sheet.
Gather data while you're in the field in a Google Sheet, maybe [with an iPhone](https://itunes.apple.com/us/app/google-sheets/id842849113?mt=8) or [an Android device](https://play.google.com/store/apps/details?id=com.google.android.apps.docs.editors.sheets&hl=en). Take advantage of [data validation](https://support.google.com/docs/answer/139705?hl=en) to limit the crazy on the way in.
There are various ways to harvest web data directly into a Google Sheet. For example:
* [This blog post](http://blog.aylien.com/post/114757623598/sentiment-analysis-of-restaurant-reviews) from Aylien.com has a simple example that uses the `=IMPORTXML()` formula to populate a Sheet with restaurant reviews and ratings from TripAdvisor.
* Martin Hawksey offers [TAGS](https://tags.hawksey.info), a free Google Sheet template to setup and run automated collection of search results from Twitter.
Use `googlesheets` to get all that data into R.
Use it in a Shiny app! *this will be the next demo/vignette I write*
What other ideas do you have?
### Install googlesheets
```{r eval = FALSE}
devtools::install_github("jennybc/googlesheets")
```
### Take a look at the vignette
This README is arguably as or more useful as the vignette and both are still under development. But feel free to [check out the current state of the vignette](http://htmlpreview.github.io/?https://raw.githubusercontent.com/jennybc/googlesheets/master/vignettes/basic-usage.html).
### Load googlesheets
`googlesheets` is designed for use with the `%>%` pipe operator and, to a lesser extent, the data-wrangling mentality of `dplyr`. The examples here use both, but we'll soon develop a vignette that shows usage with plain vanilla R. `googlesheets` uses `dplyr` internally but does not require the user to do so.
```{r load-package}
library("googlesheets")
suppressMessages(library("dplyr"))
```
### See some spreadsheets you can access
The `list_sheets()` function returns the sheets you would see in your Google Sheets home screen: <https://docs.google.com/spreadsheets/>. This should include sheets that you own and may also show sheets owned by others but that you are permitted to access, especially if you have clicked on a link shared by the owner. Expect a prompt to authenticate yourself in the browser at this point (more below re: authentication).
```{r list-sheets}
(my_sheets <- list_sheets())
# (expect a prompt to authenticate with Google interactively HERE)
my_sheets %>% glimpse()
```
### Register a spreadsheet
If you plan to consume data from a sheet or edit it, you must first register it. Basically this is where `googlesheets` makes a note of important info about the sheet that's needed to access via the Sheets API. Once registered, you can print the result to get some basic info about the sheet.
```{r register-sheet}
# Hey let's look at the Gapminder data
gap <- register_ss("Gapminder")
gap
# Need to access a sheet you do not own?
# Access it by key if you know it!
gap_key <- "1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA"
gap <- gap_key %>% register_ss
# googlesheets may be able to determine the key from the browser URL
# may not work (yet) for old sheets ... open an issue if have problem
gap_url <- "https://docs.google.com/spreadsheets/d/1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/"
gap <- gap_url %>% register_ss
```
### Get a Google spreadsheet to practice with
If you don't have any suitable Google Sheets lying around, or if you just want to follow along verbatim with this vignette, this bit of code will copy a sheet from the `googlesheets` Google user into your Drive. The sheet holds some of the [Gapminder data](https://github.com/jennybc/gapminder).
```{r copy-gapminder, eval = FALSE}
gap_key <- "1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA"
copy_ss(key = gap_key, to = "Gapminder")
```
If that seems to have worked, go check that you see a sheet named Gapminder listed in your Google Sheets home screen: <https://docs.google.com/spreadsheets/>. You could also try `list_sheets()` again and make sure the Gapminder sheet is listed.
Now register your copy of the Gapminder sheet and you can follow along:
```{r register-gapminder, eval = FALSE}
gap <- register_ss("Gapminder")
gap
```
### Consume data
There are three ways to consume data from a worksheet within a Google spreadsheet. The order goes from fastest-but-more-limited to slowest-but-most-flexible:
* `get_via_csv()`: Don't let the name scare you! Nothing is written to file during this process. The name just reflects that, under the hood, we request the data via the "exportcsv" link. For cases where `get_via_csv()` and `get_via_lf()` both work, we see that `get_via_csv()` is around __50 times faster__. Use this when your data occupies a nice rectangle in the sheet and you're willing to consume all of it. You will get a `tbl_df` back, which is basically just a `data.frame`.
* `get_via_lf()`: Gets data via the ["list feed"](https://developers.google.com/google-apps/spreadsheets/#working_with_list-based_feeds), which consumes data row-by-row. Like `get_via_csv()`, this is appropriate when your data occupies a nice rectangle. You will again get a `tbl_df` back, but your variable names may have been mangled (by Google, not us!). Specifically, variable names will be forcefully lowercased and all non-alpha-numeric characters will be removed. Why do we even have this function? The list feed supports some query parameters for sorting and filtering the data, which we plan to support in the near future (#17).
* `get_via_cf()`: Get data via the ["cell feed"](https://developers.google.com/google-apps/spreadsheets/#working_with_cell-based_feeds), which consumes data cell-by-cell. This is appropriate when you want to consume arbitrary cells, rows, columns, and regions of the sheet. It works great for small amounts of data but can be rather slow otherwise. `get_via_cf()` returns a `tbl_df` with __one row per cell__. You can specify cell limits directly in `get_via_cf()` or use convenience wrappers `get_row()`, `get_col()` or `get_cells()` for some common special cases. See below for demos of `reshape_cf()` and `simplify_cf()` which help with post-processing.
```{r csv-list-and-cell-feed}
# Get the data for worksheet "Oceania": the super-fast csv way
oceania_csv <- gap %>% get_via_csv(ws = "Oceania")
str(oceania_csv)
oceania_csv
# Get the data for worksheet "Oceania": the fast tabular way ("list feed")
oceania_list_feed <- gap %>% get_via_lf(ws = "Oceania")
str(oceania_list_feed)
oceania_list_feed
# Get the data for worksheet "Oceania": the slower cell-by-cell way ("cell feed")
oceania_cell_feed <- gap %>% get_via_cf(ws = "Oceania")
str(oceania_cell_feed)
head(oceania_cell_feed, 10)
```
#### Convenience wrappers and post-processing the data
There are a few ways to limit the data you're consuming. You can put direct limits into `get_via_cf()`, but there are also convenience functions to get a row (`get_row()`), a column (`get_col()`), or a range (`get_cells()`). Also, when you consume data via the cell feed (which these wrappers are doing under the hood), you will often want to reshape it or simplify it (`reshape_cf()` and `simplify_cf()`).
```{r wrappers-and-post-processing}
# Reshape: instead of one row per cell, make a nice rectangular data.frame
oceania_reshaped <- oceania_cell_feed %>% reshape_cf()
str(oceania_reshaped)
head(oceania_reshaped, 10)
# Limit data retrieval to certain cells
# Example: first 3 rows
gap_3rows <- gap %>% get_row("Europe", row = 1:3)
gap_3rows %>% head()
# convert to a data.frame (first row treated as header by default)
gap_3rows %>% reshape_cf()
# Example: first row only
gap_1row <- gap %>% get_row("Europe", row = 1)
gap_1row
# convert to a named character vector
gap_1row %>% simplify_cf()
# just 2 columns, converted to data.frame
gap %>%
get_col("Oceania", col = 3:4) %>%
reshape_cf()
# arbitrary cell range
gap %>%
get_cells("Oceania", range = "D12:F15") %>%
reshape_cf(header = FALSE)
# arbitrary cell range, alternative specification
gap %>%
get_via_cf("Oceania", max_row = 5, min_col = 1, max_col = 3) %>%
reshape_cf()
```
### Create sheets
You can use `googlesheets` to create new spreadsheets.
```{r new-sheet}
foo <- new_ss("foo")
foo
```
By default, there will be an empty worksheet called "Sheet1". You can also add, rename, and delete worksheets within an existing sheet via `add_ws()`, `rename_ws()`, and `delete_ws()`. Copy an entire spreadsheet with `copy_ss()`.
### Edit cells
You can modify the data in sheet cells via `edit_cells()`. We'll work on the completely empty sheet created above, `foo`. If your edit populates the sheet with everything it should have, set `trim = TRUE` and we will resize the sheet to match the data. Then the nominal worksheet extent is much more informative (vs. the default of 1000 rows and 26 columns).
```{r edit-cells}
foo <- foo %>% edit_cells(input = head(iris), header = TRUE, trim = TRUE)
```
Go to [your spreadsheets home page](https://docs.google.com/spreadsheets/u/0/), find the new sheet `foo` and look at it. You should see some iris data in the first (and only) worksheet. We'll also take a look at it here, by consuming `foo` via the list feed.
Note that we always store the returned value from `edit_cells()` (and all other sheet editing functions). That's because the registration info changes whenever we edit the sheet and we re-register it inside these functions, so this idiom will help you make sequential edits and queries to the same sheet.
```{r consume-edited-cells}
foo %>% get_via_lf()
```
Read the function documentation for `edit_cells()` for how to specify where the data goes, via an anchor cell, and in which direction, via the shape of the input or the `byrow =` argument.
### Delete sheets
Let's clean up by deleting the `foo` spreadsheet we've been playing with.
```{r delete-sheet}
delete_ss("foo")
```
### Upload delimited files or Excel workbooks
Here's how we can create a new spreadsheet from a suitable local file. First, we'll write then upload a comma-delimited excerpt from the iris data.
```{r new-sheet-from-file}
iris %>% head(5) %>% write.csv("iris.csv", row.names = FALSE)
iris_ss <- upload_ss("iris.csv")
iris_ss
iris_ss %>% get_via_lf()
file.remove("iris.csv")
```
Now we'll upload a multi-sheet Excel workbook. Slowly.
```{r new-sheet-from-xlsx}
gap_xlsx <- upload_ss("tests/testthat/gap-data.xlsx")
gap_xlsx
gap_xlsx %>% get_via_lf(ws = "Oceania")
```
And we clean up after ourselves on Google Drive.
```{r delete-moar-sheets}
delete_ss("iris")
delete_ss("gap-data")
```
### Download sheets as csv, pdf, or xlsx file
You can download a Google Sheet as a csv, pdf, or xlsx file. Downloading the spreadsheet as a csv file will export the first worksheet (default) unless another worksheet is specified.
```{r export-sheet-as-csv}
download_ss("Gapminder", ws = "Africa", to = "~/tmp/gapminder-africa.csv")
## is it there? yes!
read.csv("~/tmp/gapminder-africa.csv") %>% head()
```
Download the entire spreadsheet as an Excel workbook.
```{r export-sheet-as-xlsx}
download_ss("Gapminder", to = "~/tmp/gapminder.xlsx")
```
Go check it out in Excel, if you wish!
And now we clean up the downloaded files.
```{r clean-exported-files}
file.remove(file.path("~/tmp", c("gapminder.xlsx", "gapminder-africa.csv")))
```
### Authorization using OAuth2
If you use a function that requires authentication, it will be auto-triggered. But you can also initiate the process explicitly if you wish, like so:
```{r authorize, eval = FALSE}
# Give googlesheets permission to access your spreadsheets and google drive
authorize()
```
Use `authorize(new_user = TRUE)`, to force the process to begin anew. Otherwise, the credentials left behind will be used to refresh your access token as needed.
##### Stuff we are in the process of bringing back online after the Great Refactor of February 2015
* visual overview of which cells are populated