Hadoop installation
JulienPeloton committed Nov 22, 2024
1 parent cd73c33 commit 758c82c
Showing 2 changed files with 16 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -2,12 +2,16 @@

This API is used internally by Fink web components to retrieve cutouts from the data lake on HDFS. We take advantage of the pyarrow connector to read parquet files to efficiently extract required cutouts from an HDFS block.

## Requirements

You will need Python (>= 3.11) with the dependencies listed in `requirements.txt`. You will also need Hadoop installed on the machine (see `install/`).

## Usage

To deploy the API, you need access to the Fink HDFS cluster. Once `config.yml` is filled in, deploy with:

```diff
-python cutout_app.py
+python app.py
```

## Accessing 2D cutout
11 changes: 11 additions & 0 deletions install/README.md
@@ -5,3 +5,14 @@ Execute the script to install it under `/opt`:
```bash
./install_hadoop.sh
```

and then update your `.bashrc` as follows (adjust the version number to match your installation):

```bash
# Hadoop
export HADOOP_HDFS_HOME=/opt/hadoop-3.3.6
export HADOOP_HOME=$HADOOP_HDFS_HOME
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
export ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
```
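Before launching the API, it can be useful to sanity-check that these variables are actually set in the environment. A minimal sketch (the helper name is ours, not part of the project):

```python
import os

# The variables exported in the .bashrc snippet above.
REQUIRED_VARS = ["HADOOP_HOME", "HADOOP_HDFS_HOME", "CLASSPATH", "ARROW_LIBHDFS_DIR"]

def missing_hadoop_vars(env=os.environ):
    """Return the Hadoop-related variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with an incomplete environment:
print(missing_hadoop_vars({"HADOOP_HOME": "/opt/hadoop-3.3.6"}))
# → ['HADOOP_HDFS_HOME', 'CLASSPATH', 'ARROW_LIBHDFS_DIR']
```

In particular, `ARROW_LIBHDFS_DIR` must point at the native `libhdfs` library for pyarrow to reach HDFS.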
