Commit e6898c3

Merge branch 'develop' for v0.1.1

JCapul committed Mar 19, 2019
2 parents f8d8e09 + 9537252
Showing 40 changed files with 1,414 additions and 999 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## v0.1.1

- Fix errno management from Go to C layer [054dd09]
- Add example with IOR benchmark [6e23b41]
- Update README.md [a740d70]
- Add example workflow with HydroC code and ParaView [88b72c1]
- Add a development docker environment [12af2a6]

## v0.1.0

- Hello github!
4 changes: 3 additions & 1 deletion Makefile
@@ -17,7 +17,8 @@ pdwfsgo:
.PHONY: scripts
scripts: scripts/pdwfs
mkdir -p $(BUILDDIR)/bin
install scripts/pdwfs $(BUILDDIR)/bin/
install scripts/pdwfs-redis $(BUILDDIR)/bin/

test: scripts pdwfslibc
make -C src/go test
@@ -32,6 +33,7 @@ install: pdwfslibc
install $(BUILDDIR)/lib/libpdwfs_go.so $(PREFIX)/lib
install $(BUILDDIR)/lib/pdwfs.so $(PREFIX)/lib
install scripts/pdwfs $(PREFIX)/bin
install scripts/pdwfs-redis $(PREFIX)/bin
chmod +x $(PREFIX)/bin/*

tag:
155 changes: 148 additions & 7 deletions README.md
@@ -2,23 +2,164 @@

[![Build Status](https://travis-ci.org/cea-hpc/pdwfs.png?branch=master)](https://travis-ci.org/cea-hpc/pdwfs)

pdwfs (pronounced "*padawan-f-s*", see below) is a preload library implementing a minimalist filesystem in user space suitable for intercepting *bulk* I/Os typical of HPC simulations and storing data in memory in a [Redis](https://redis.io) database.

The objective of pdwfs is to provide a very lightweight infrastructure to execute HPC simulation workflows without writing/reading any intermediate data to/from a (parallel) filesystem. This type of approach is known as *in transit* or *loosely-coupled in situ*; see the next two sections for further details.

pdwfs is written in [Go](https://golang.org) and C and runs on Linux systems only (we provide a Dockerfile for testing and development on other systems).

Though it's a work in progress and still at an early stage of development, it can already be tested with Parallel HDF5, MPI-IO and a POSIX-based ParaView workflow. See the Examples section below.


## PaDaWAn project

pdwfs is a component of the PaDaWAn project (for Parallel Data Workflow for Analysis), a [CEA](http://www.cea.fr) project that aims at providing the building blocks of a lightweight and *least*-intrusive software infrastructure to facilitate *in transit* execution of HPC simulation workflows.

The foundational work for this project was an initial version of pdwfs entirely written in Python and presented in the paper below:

- *PaDaWAn: a Python Infrastructure for Loosely-Coupled In Situ Workflows*, J. Capul, S. Morais, J-B. Lekien, ISAV@SC (2018).

## In situ / in transit HPC workflows
Within the HPC community, in situ data processing has been attracting considerable interest as a potential enabler for future exascale-era simulations.

The original in situ approach, also called tightly-coupled in situ, consists of executing data processing routines within the same address space as the simulation, sharing its resources. It requires the simulation to use a dedicated API and to link against a library embedding a processing runtime. Notable in situ frameworks are ParaView [Catalyst](https://www.paraview.org/in-situ/) and VisIt [LibSim](https://wci.llnl.gov/simulation/computer-codes/visit); [SENSEI](http://sensei-insitu.org) provides a common API that can map to various in situ processing backends.

The loosely-coupled flavor of in situ, or in transit, relies on resources separate from the simulation's to stage and/or process data. It requires a dedicated distributed infrastructure to extract data from the simulation and send it to a staging area or directly to consumers. Compared to the tightly-coupled approach, it offers greater flexibility to adjust the resources needed by each application in the workflow (they are not bound to use the same resources as the simulation). It can also accommodate a larger variety of workflows, in particular those requiring memory space for data windowing (e.g. statistics, time integration).

This latter approach, loosely-coupled in situ or in transit, is at the core of pdwfs.

## Dependencies

pdwfs' only dependencies are:
- Go version ≥ 1.11 to build pdwfs
- Redis version ≥ 5.0.3


## Installation

To build pdwfs from source (assuming Go is installed):
```
$ git clone https://github.com/cea-hpc/pdwfs
$ cd pdwfs
$ make
```
Binary distributions for Linux on the x86_64 architecture are also available on the [releases](http://github.com/cea-hpc/pdwfs/releases) page.

To run the test suite, you will need a running Redis instance on the default host and port. Just type the following command to have an instance running in the background:
```
$ redis-server --daemonize yes
```
Then:
```
$ make test
```
To install pdwfs:
```
$ make PREFIX=/your/installation/path install
```
The default prefix is /usr/local.


We also provide a development Dockerfile based on an official Go image from DockerHub. To build and run the container:
```
$ make -C docker run
```
The working directory in the container is a volume mounted on the pdwfs repository on your host, so to build pdwfs, just use the Makefile as described above.

NOTE: if you encounter a permission-denied error when building pdwfs in the container, it is probably because the non-root user and group IDs set in the Dockerfile do not match yours. Change the UID and GID values in the Dockerfile to yours and re-run the above command.

## Quick start

First, start a default Redis instance in the background.
```
$ redis-server --daemonize yes
```
Then, assuming your simulation will write its data into the output/ directory, simply wrap your simulation command with the pdwfs command-line script like this:
```
$ pdwfs -p output/ -- your_simulation_command
```
That's it! pdwfs will transparently intercept low-level I/O calls (open, write, read, ...) on any file or directory within output/ and send the data to Redis; no data will be written to disk.

To process the simulation data, just run your processing tool the same way:
```
$ pdwfs -p output/ -- your_processing_command
```
To see the data staged within Redis (keys only) and check the memory used (and to give you a hint of how sweet Redis is):
```
$ redis-cli keys '*'
...
$ redis-cli info memory
...
```

Finally, to stop Redis (and discard all data staged in memory!):
```
$ redis-cli shutdown
```

## How does it work?

pdwfs uses the so-called "LD_PRELOAD trick" to intercept a set of I/O-related function calls provided by the C standard library (libc).
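To give a concrete picture, here is a minimal, generic sketch of the trick in C; this is illustrative only, not pdwfs code (pdwfs implements its interposers in Go and C). A shared library shadows libc's write, looks up the real implementation with dlsym and forwards the call:
```
// shim.c -- illustrative only: intercept libc's write() and forward it.
// Build: gcc -shared -fPIC -o shim.so shim.c -ldl
// Use:   LD_PRELOAD=./shim.so some_command
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

typedef ssize_t (*write_fn)(int, const void *, size_t);

ssize_t write(int fd, const void *buf, size_t count) {
    static write_fn real_write = NULL;
    if (!real_write)
        real_write = (write_fn)dlsym(RTLD_NEXT, "write");  // the real libc write
    if (fd > 2) {  // skip stdio so our own logging below cannot recurse
        char msg[64];
        int n = snprintf(msg, sizeof msg, "intercepted write: fd=%d len=%zu\n", fd, count);
        real_write(2, msg, (size_t)n);  // log through the real write
    }
    // A real interposer would check whether fd belongs to an intercepted
    // path and, if so, send the data to Redis instead of the filesystem.
    return real_write(fd, buf, count);
}
```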

The pdwfs command-line script executes the user command passed as argument with the LD_PRELOAD environment variable set to the installation path of the pdwfs.so shared library.
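Conceptually, the launcher does little more than the following sketch (the library path is an assumption, and the real script also has to forward options such as -p to the preloaded library, presumably through environment variables):
```
// launcher.c -- minimal sketch of a preload launcher, not the actual pdwfs script
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    // Preload the interposition library so its I/O symbols shadow libc's
    // (the path is an assumption; adjust to your installation).
    setenv("LD_PRELOAD", "/usr/local/lib/pdwfs.so", 1);
    execvp(argv[1], &argv[1]);  // replace this process with the user command
    perror("execvp");
    return 1;
}
```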

About 90 libc I/O calls are intercepted, but only 50 of them are currently implemented by pdwfs. If your program uses an intercepted call that is not implemented, pdwfs will raise an error. In that case, please file an [issue](https://github.com/cea-hpc/pdwfs/issues) (or even better, send a pull request!).

The categories of calls currently implemented are:
- basic POSIX and C standard I/O calls (open, close, read, write, lseek, access, unlink, stat, fopen, fread, fprintf, ...)
- additional I/O calls used by MPI-IO and Parallel HDF5 libraries (writev, pwrite, pwritev, readv, pread, preadv, statfs, ...)

## Performance

To address the challenge of being competitive with parallel filesystems, an initial set of design choices and trade-offs has been made:
- selecting the widely used database Redis to benefit from its mix of performance, simplicity and flexibility (and performance is an important part of the mix),
- files are sharded (or striped) across multiple Redis instances with a predefined, configurable layout (see the sketch below),
- file shards are sent/fetched in parallel using Go's concurrency mechanism (goroutines),
- no central metadata server: metadata are distributed across Redis instances,
- (planned) drastically limit the amount of metadata and metadata manipulations by opting not to implement typical filesystem features such as linking, renaming and timestamping,
- (planned) implement write buffers and leverage Redis' pipelining feature.

With this set of choices, we expect our infrastructure to be horizontally scalable (adding more Redis instances to accommodate higher loads) and to handle certain I/O loads that are known to be detrimental to parallel filesystems (e.g. many files).
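To illustrate the striping layout, the sketch below maps a write spanning several stripes onto per-stripe Redis keys distributed round-robin over instances; the stripe size, key format and instance addresses are made-up assumptions, not pdwfs' actual layout:
```
// striping.c -- hypothetical sketch of mapping a file write onto Redis stripes
#include <stdio.h>
#include <stddef.h>

#define STRIPE_SIZE 4096  // hypothetical stripe size

static const char *instances[] = {"localhost:6379", "localhost:6380", "localhost:6381"};
#define N_INSTANCES (sizeof instances / sizeof instances[0])

// For each stripe touched by a write at [offset, offset+len), print the
// per-stripe key and the instance owning it; a real implementation would
// issue the corresponding Redis commands from concurrent goroutines.
static void map_write(const char *path, size_t offset, size_t len) {
    size_t end = offset + len;
    for (size_t off = offset; off < end; ) {
        size_t stripe = off / STRIPE_SIZE;    // which stripe this byte falls in
        size_t within = off % STRIPE_SIZE;    // offset inside that stripe
        size_t chunk = STRIPE_SIZE - within;  // bytes left in this stripe
        if (chunk > end - off)
            chunk = end - off;
        printf("key %s:%zu -> %s (%zu bytes at stripe offset %zu)\n",
               path, stripe, instances[stripe % N_INSTANCES], chunk, within);
        off += chunk;
    }
}

int main(void) {
    map_write("output/data.vtk", 3000, 10000);  // spans stripes 0 through 3
    return 0;
}
```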

On the other hand, a few factors are known to hurt performance compared to parallel filesystems:
- Redis uses TCP communications while parallel filesystems rely on RDMA,
- intercepting I/O calls and the use of cgo (the mechanism for calling Go from C) add some overhead to each call.

Obviously, proper benchmarking at scale will have to be performed to assess pdwfs performance. Yet, considering these design choices and our past experience designing in transit infrastructure with PaDaWAn, we are hopeful that performance will be decent.

Note also that, unlike parallel filesystems, pdwfs is a simple user-space infrastructure that can easily be adjusted on a per-simulation or per-workflow basis for efficiency.

## Validation

Test cases have been successfully run so far with the following codes and tools:
- [IOR](https://github.com/hpc/ior) benchmark with POSIX, parallel HDF5 and MPI-IO methods (OpenMPI v2),
- [HydroC](https://github.com/HydroBench/Hydro) a 2D structured hydrodynamic mini-app using POSIX calls to produce VTK files,
- [ParaView](https://www.paraview.org/in-situ/) VTK file reader.


## Examples
We provide a set of Dockerfiles to test the codes and tools described in the Validation section on a laptop.

- **Example 1**: HydroC + ParaView + FFmpeg workflow

Check the README in the examples/HydroC_ParaView directory or just go ahead and type:
```
$ make -C examples/HydroC_ParaView run
```
You can go grab some coffee; building the container takes a while...

- **Example 2**: IOR benchmark

Again, check the README in the corresponding directory or go ahead and type:
```
$ make -C examples/IOR_benchmark run
```
Yep, you can go grab a second coffee...

## Known limitations

pdwfs is a work in progress and still fairly experimental:
- it works only with dynamically linked executables,
- most core and shell utilities for file manipulation (e.g. ls, rm, redirections) require particular libc calls that are not yet implemented.

## License

30 changes: 30 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,30 @@
FROM golang:1.11.5

ENV GO111MODULE=on

#fetch and build Redis
RUN mkdir -p /tmp/src/redis && \
wget -O redis.tar.gz http://download.redis.io/releases/redis-5.0.3.tar.gz && \
tar xf redis.tar.gz --strip-components=1 -C /tmp/src/redis && \
rm redis.tar.gz && \
cd /tmp/src/redis && make -j "$(nproc)" install

RUN rm -rf /tmp/src

# Switch to non-root user
# replace UID and GID with yours to access your files through a mounted volume
RUN groupadd --gid 1010 dev && \
useradd --uid 1010 --gid dev dev

ENV HOME /home/dev
RUN mkdir -p ${HOME} && chown dev ${HOME}
USER dev
WORKDIR ${HOME}

COPY banner.sh /tmp/
RUN cat /tmp/banner.sh >> .bashrc

CMD bash



16 changes: 16 additions & 0 deletions docker/Makefile
@@ -0,0 +1,16 @@

# uncomment to allow stracing in docker (for debug)
#DOCKER_RUN_OPT = --security-opt seccomp:unconfined

build:
docker build -t pdwfs .

run: build
docker run $(DOCKER_RUN_OPT) -it --rm -v $(shell pwd)/..:/home/dev/pdwfs -w /home/dev/pdwfs --name pdwfs-dev pdwfs

clean:
docker rm $(shell docker ps -qa --no-trunc --filter "status=exited"); \
docker rmi $(shell docker images --filter "dangling=true" -q --no-trunc)



10 changes: 10 additions & 0 deletions docker/banner.sh
@@ -0,0 +1,10 @@
echo "*************************************************************"
echo "* Welcome to pdwfs development container *"
echo "* *"
echo "* To build pdwfs: *"
echo "* $ make *"
echo "* *"
echo "* To run the test suite you need a running Redis instance: *"
echo "* $ scripts/pdwfs-redis start *"
echo "* $ make test *"
echo "*************************************************************"
59 changes: 59 additions & 0 deletions examples/base_dockerfile/Dockerfile
@@ -0,0 +1,59 @@
FROM centos:latest

RUN yum -y update && yum -y install \
wget \
gcc \
gcc-c++ \
automake \
make \
strace \
zlib-devel \
git \
python-devel; \
yum clean all

# Go language
RUN wget -O go.tar.gz 'https://dl.google.com/go/go1.11.5.linux-amd64.tar.gz' && \
tar xf go.tar.gz -C /usr/local && \
rm go.tar.gz

ENV PATH "/usr/local/go/bin:$PATH"

# OpenMPI
RUN mkdir -p /tmp/src/openmpi && \
wget -O openmpi.tar.gz 'https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.6.tar.gz' && \
tar xf openmpi.tar.gz --strip-components=1 -C /tmp/src/openmpi && \
rm openmpi.tar.gz && \
cd /tmp/src/openmpi && \
./configure --prefix=/usr/local && make -j "$(nproc)" install

# Redis
RUN mkdir -p /tmp/src/redis && \
wget -O redis.tar.gz http://download.redis.io/releases/redis-5.0.3.tar.gz && \
tar xf redis.tar.gz --strip-components=1 -C /tmp/src/redis && \
rm redis.tar.gz && \
cd /tmp/src/redis && make PREFIX=/usr/local -j "$(nproc)" install

RUN rm -rf /tmp/src

RUN wget -O get-pip.py 'https://bootstrap.pypa.io/get-pip.py' && \
python get-pip.py && \
python -m pip install jupyter matplotlib pandas

EXPOSE 8888

# Switch to non-root user
# replace UID and GID with yours to access your files through a mounted volume
RUN groupadd --gid 1010 rebels && \
useradd --uid 1010 --gid rebels luke

USER luke
ENV HOME /home/luke
WORKDIR ${HOME}

COPY --chown=luke:rebels jupyter_notebook_config.py ${HOME}/.jupyter/

CMD bash



5 changes: 5 additions & 0 deletions examples/base_dockerfile/jupyter_notebook_config.py
@@ -0,0 +1,5 @@
c = get_config()
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.password = ''
c.NotebookApp.token = ''
57 changes: 57 additions & 0 deletions examples/hydroC_ParaView/Dockerfile
@@ -0,0 +1,57 @@
FROM pdwfs-base

USER root

RUN yum -y update; yum -y install numactl-devel; yum clean all

# Download and install ParaView and FFmpeg in /usr/local

RUN wget -O ParaView.tar.xz 'https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.6&type=binary&os=Linux&downloadFile=ParaView-5.6.0-osmesa-MPI-Linux-64bit.tar.xz' && \
mkdir -p /usr/local/ParaView && \
tar xf ParaView.tar.xz --strip-components=1 -C /usr/local/ParaView && \
rm ParaView.tar.xz

ENV PATH "/usr/local/ParaView/bin:$PATH"

RUN wget -O ffmpeg.tar.xz 'https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz' && \
tar xf ffmpeg.tar.xz --strip-components=1 -C /usr/local/bin && \
rm ffmpeg.tar.xz


# Clone and build Hydro simulation code and pdwfs in user space

USER luke

RUN mkdir -p ${HOME}/src && cd ${HOME}/src && \
git clone 'https://github.com/JCapul/Hydro' && \
make -C Hydro/HydroC/HydroC99_2DMpi/Src && \
install Hydro/HydroC/HydroC99_2DMpi/Src/hydro -D ${HOME}/opt/hydro/bin/hydro

ENV PATH "${HOME}/opt/hydro/bin:$PATH"

RUN cd ${HOME}/src && \
git clone 'https://github.com/cea-hpc/pdwfs' && \
make -C pdwfs PREFIX=${HOME}/opt/pdwfs install

ENV PATH "${HOME}/opt/pdwfs/bin:$PATH"

# uncomment to use pdwfs from a volume mounted on the pdwfs source directory on the host (for debug/development)
# (a modification is needed in the Makefile as well)
#ENV PATH "/pdwfs/build/bin:$PATH"

COPY banner.sh /tmp/
RUN cat /tmp/banner.sh >> ${HOME}/.bashrc

RUN mkdir -p ${HOME}/run
WORKDIR ${HOME}/run

COPY --chown=luke:rebels paraview_run.py .
COPY --chown=luke:rebels process_all.py .
COPY --chown=luke:rebels hydro_input.nml .
COPY --chown=luke:rebels run_on_disk.sh .
COPY --chown=luke:rebels run_on_pdwfs.sh .

CMD bash


