[![Build Status](https://travis-ci.org/mannau/h5.svg?branch=master)](https://travis-ci.org/mannau/h5)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/mannau/h5?branch=master&svg=true)](https://ci.appveyor.com/project/mannau/h5)
[![codecov.io](http://codecov.io/github/mannau/h5/coverage.svg?branch=master)](http://codecov.io/github/mannau/h5?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/h5)](http://cran.r-project.org/package=h5)
**[h5](http://cran.r-project.org/web/packages/h5/index.html)** is an R
interface to the [HDF5](https://www.hdfgroup.org/HDF5) library. The package is under active development; it is available on [GitHub](https://github.com/mannau/h5) and released on [CRAN](https://cran.r-project.org/web/packages/h5/index.html) for all major platforms (Windows, OS X, Linux).
Online documentation for the package is available at http://h5.predictingdaemon.com.
[HDF5](https://www.hdfgroup.org/HDF5/) is an excellent library and data model for
storing huge amounts of data in a binary file format. Because it supports most major
platforms and programming languages, it can be used to exchange data files in a
language-independent format. Unlike R's built-in *save()* and *load()*
functions, it also supports access to only parts of a binary data file and can
therefore be used to process data that does not fit into memory.
**[h5](http://cran.r-project.org/web/packages/h5/index.html)** utilizes the
[HDF5 C++ API](https://www.hdfgroup.org/HDF5/doc/cpplus_RM/) through
**[Rcpp](http://cran.r-project.org/web/packages/Rcpp/index.html)** and S4 classes.
The package is covered by 200+ test cases with a [coverage](https://codecov.io/github/mannau/h5?branch=master) greater than 80%.
# Install
**h5** has already been released on [CRAN](https://cran.r-project.org/web/packages/h5/index.html) and can therefore be installed using:
```r
install.packages("h5")
```
The most recent development version can be installed from [GitHub](https://github.com/mannau/h5) using [**devtools**](https://cran.r-project.org/web/packages/devtools/index.html):
```r
library(devtools)
install_github("mannau/h5")
```
Please note that this version has been tested with HDF5 library version 1.8.14 (1.8.13 on OS X). You should therefore install the most current HDF5 library, including its C++ API, for your platform.
## Requirements
### Windows
This package already ships the HDF5 library for Windows through [h5-libwin](https://github.com/mannau/h5-libwin). No additional requirements need to be installed.
### OS X
On OS X with [Homebrew](http://brew.sh), you can use the following command to install the HDF5 library and its headers:
```shell
brew install homebrew/science/hdf5 --enable-cxx
```
### Linux (e.g. Debian, Ubuntu)
On Debian-based Linux systems, you can use the following command to install the dependencies:
```shell
sudo apt-get install libhdf5-dev
```
Older releases (Debian Squeeze, Ubuntu Precise) require **libhdf5-serial-dev** instead:
```shell
sudo apt-get install libhdf5-serial-dev
```
**h5** requires the 'new' v18 API version, which does not seem to be installed on e.g. Precise.
On Ubuntu it might therefore be necessary to install the dependency libhdf5-serial-dev through the
[ppa:marutter/rrutter](https://launchpad.net/~marutter/+archive/ubuntu/rrutter)
repository. On Debian, it should soon be possible to install the **h5** package directly via
[cran2deb](http://debian-r.debian.net).
## Custom Install Parameters
If the HDF5 library is not located in a standard directory recognized by the configure script, the parameters CPPFLAGS and LIBS may need to be set manually.
This can be done on the command line using the --configure-vars option of R CMD INSTALL, e.g.
```shell
R CMD INSTALL h5_<version>.tar.gz --configure-vars='LIBS=<LIBS> CPPFLAGS=<CPPFLAGS>'
```
The most recent version can also be installed directly from GitHub with the required parameters using **devtools** in R:
```r
require(devtools)
install_github("mannau/h5", args = "--configure-vars='LIBS=<LIBS> CPPFLAGS=<CPPFLAGS>'")
```
A concrete OS X example setting could look like this:
```shell
R CMD INSTALL h5_0.9.2.tar.gz --configure-vars='LIBS=-L/usr/local/Cellar/hdf5/1.8.13/lib -L/usr/local/opt/szip/lib -L. -lhdf5_cpp -lhdf5 -lz -lm CPPFLAGS=-I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include'
```
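The same variables can probably also be passed from within R for a source install from CRAN. The following is a minimal sketch, not a tested recipe: it assumes a hypothetical HDF5 install prefix `/opt/hdf5` and relies on the `configure.vars` argument of base R's `install.packages()`, which is forwarded to the --configure-vars flag. Depending on your system, further linker flags like those in the OS X example above may be needed.

```r
# Hypothetical install prefix of a custom HDF5 build; adjust to your system.
hdf5_prefix <- "/opt/hdf5"

# Build the LIBS/CPPFLAGS pair in the form expected by --configure-vars.
libs <- paste0("-L", file.path(hdf5_prefix, "lib"))
cppflags <- paste0("-I", file.path(hdf5_prefix, "include"))
configure_vars <- paste0("LIBS=", libs, " CPPFLAGS=", cppflags)
print(configure_vars)

# Uncomment to install from source with these settings:
# install.packages("h5", type = "source", configure.vars = configure_vars)
```

Building the string in R first makes it easy to inspect the exact value before it is handed to the configure script.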
# Quick Start
```{r init, include=FALSE}
if(file.exists("test.h5")) file.remove("test.h5")
```
We start by creating an HDF5 file holding a numeric vector, an integer matrix and a character array.
```{r h5q-1, echo=TRUE, eval=TRUE}
library(h5)
testvec <- rnorm(10)
testmat <- matrix(1:9, nrow = 3)
row.names(testmat) <- 1:3
colnames(testmat) <- c("A", "BE", "BU")
# Draw 45 random capital letters, twice, to build two-letter strings
letters1 <- sample(LETTERS, 45, replace = TRUE)
letters2 <- sample(LETTERS, 45, replace = TRUE)
testarray <- array(paste0(letters1, letters2), c(3, 3, 5))
file <- h5file("test.h5")
# Save testvec in group 'test' as DataSet 'testvec'
file["test/testvec"] <- testvec
file["test/testmat"] <- testmat
file["test/testarray"] <- testarray
h5close(file)
```
We can now retrieve the data from the file:
```{r h5q-2, echo=TRUE, eval=TRUE}
file <- h5file("test.h5")
dataset_testmat <- file["test/testmat"]
# We can now retrieve all data from the DataSet object using e.g. the subsetting operator
dataset_testmat[]
```
We can also subset the data directly, e.g. rows 1 and 3:
```{r h5q-3, echo=TRUE, eval=TRUE}
dataset_testmat[c(1, 3), ]
```
Note that the retrieved matrix has lost the row and column names associated with
the *testmat* object. HDF5 supports metadata through attributes, which we need to
add to (and retrieve from) the DataSet manually.
```{r h5q-4, echo=TRUE, eval=TRUE}
h5attr(dataset_testmat, "rownames") <- row.names(testmat)
h5attr(dataset_testmat, "colnames") <- colnames(testmat)
```
We can now retrieve our matrix including meta-data as follows:
```{r h5q-5, echo=TRUE, eval=TRUE}
outmat <- dataset_testmat[]
row.names(outmat) <- h5attr(dataset_testmat, "rownames")
colnames(outmat) <- h5attr(dataset_testmat, "colnames")
identical(outmat, testmat)
```
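The attribute round trip shown above could be wrapped in a pair of small helpers. This is only a sketch: `write_named_matrix()` and `read_named_matrix()` are hypothetical names that are not part of **h5**, the attribute names "rownames" and "colnames" are just the convention used in this README, and the code assumes that the matrix has both row and column names and that assigning into the file object inside a function writes through to the underlying file.

```r
library(h5)

# Hypothetical helper: store a matrix and keep its dimnames as attributes.
write_named_matrix <- function(h5f, name, mat) {
  h5f[name] <- mat
  dset <- h5f[name]
  h5attr(dset, "rownames") <- rownames(mat)
  h5attr(dset, "colnames") <- colnames(mat)
  invisible(dset)
}

# Hypothetical helper: read the matrix back and restore its dimnames.
read_named_matrix <- function(h5f, name) {
  dset <- h5f[name]
  out <- dset[]
  rownames(out) <- h5attr(dset, "rownames")
  colnames(out) <- h5attr(dset, "colnames")
  out
}

# Usage on a throwaway file, independent of test.h5
demo_path <- tempfile(fileext = ".h5")
demo_file <- h5file(demo_path)
mat <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
write_named_matrix(demo_file, "demo/mat", mat)
restored <- read_named_matrix(demo_file, "demo/mat")
h5close(demo_file)
```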
Do not forget to close the HDF5 file at the end:
```{r h5q-6, echo=TRUE, eval=TRUE}
h5close(file)
```
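Because a DataSet can be subset without loading it completely, data can also be processed block by block. The sketch below computes column sums over a matrix in blocks of rows, using only the subsetting operator shown above. The block size, the dataset name "big/mat" and the small example matrix are arbitrary choices for illustration; a real use case would involve data too large to read at once.

```r
library(h5)

# Write an example matrix to a throwaway file
big_path <- tempfile(fileext = ".h5")
big_file <- h5file(big_path)
big <- matrix(1:30, nrow = 10)
big_file["big/mat"] <- big

dset <- big_file["big/mat"]
n_rows <- nrow(big)   # row count known from the data we just wrote
block_size <- 4       # arbitrary number of rows read per step

# Accumulate column sums while reading only block_size rows at a time
col_sums <- numeric(ncol(big))
for (start in seq(1, n_rows, by = block_size)) {
  rows <- start:min(start + block_size - 1, n_rows)
  block <- dset[rows, ]
  col_sums <- col_sums + colSums(block)
}

h5close(big_file)
```

Only one block is held in memory at any time, so the peak memory use is governed by `block_size` rather than by the size of the stored matrix.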
# License [![License](https://img.shields.io/badge/license-BSD%202%20clause-blue.svg?style=flat)](http://opensource.org/licenses/BSD-2-Clause)
This package is shipped with a [BSD-2-Clause License](http://opensource.org/licenses/BSD-2-Clause).