diff --git a/README.md b/README.md
index b2be8d2..1ca0e76 100644
--- a/README.md
+++ b/README.md
@@ -2,12 +2,16 @@
 
 This API is used internally by Fink web components to retrieve cutouts from the data lake on HDFS. We take advantage of the pyarrow connector to read parquet files to efficiently extract required cutouts from an HDFS block.
 
+## Requirements
+
+You will need Python (>=3.11) with the packages listed in `requirements.txt`. You will also need Hadoop installed on the machine (see `install/`).
+
 ## Usage
 
 To deploy the API, you need access to the Fink HDFS cluster. Once `config.yml` is filled, just deploy using:
 
 ```bash
-python cutout_app.py
+python app.py
 ```
 
 ## Accessing 2D cutout
diff --git a/install/README.md b/install/README.md
index 1547c43..8dffb7c 100644
--- a/install/README.md
+++ b/install/README.md
@@ -5,3 +5,14 @@ Execute the script to install it under `/opt`:
 ```bash
 ./install_hadoop.sh
 ```
+
+Then update your `.bashrc` as follows (make sure the version number matches the Hadoop release you installed):
+
+```bash
+# Hadoop
+export HADOOP_HDFS_HOME=/opt/hadoop-3.3.6
+export HADOOP_HOME=$HADOOP_HDFS_HOME
+export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
+export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
+export ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
+```
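The `.bashrc` variables exist so that pyarrow can locate `libhdfs` and the Hadoop jars at runtime. A minimal preflight sketch in Python (the project's language) follows; the function names, the default port, and the exact list of checked variables are illustrative assumptions, not part of this repository. Besides the three variables the snippet exports, it also checks `JAVA_HOME`, which the pyarrow HDFS documentation lists as required but which the snippet does not set.

```python
import os
import sys
from collections.abc import Mapping
from typing import List, Optional, Tuple

# Variables the pyarrow HDFS connector reads at runtime. HADOOP_HOME, CLASSPATH
# and ARROW_LIBHDFS_DIR are exported by the .bashrc snippet; JAVA_HOME is listed
# in pyarrow's HDFS docs but is NOT set by that snippet (assumed to come from
# your Java installation).
REQUIRED_VARS = ("HADOOP_HOME", "CLASSPATH", "ARROW_LIBHDFS_DIR", "JAVA_HOME")
MIN_PYTHON = (3, 11)  # from the Requirements section


def missing_hadoop_vars(env: Mapping) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def python_is_supported(version: Optional[Tuple[int, int]] = None) -> bool:
    """True if `version` (default: the running interpreter) is >= 3.11."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return tuple(version) >= MIN_PYTHON


def preflight(env: Optional[Mapping] = None,
              version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Collect human-readable problems; an empty list means the setup looks ready."""
    if env is None:
        env = os.environ
    problems = []
    if not python_is_supported(version):
        problems.append("Python >= 3.11 is required")
    problems += [f"{name} is not set" for name in missing_hadoop_vars(env)]
    return problems


def connect_hdfs(host: str, port: int = 8020):
    """Hypothetical helper: open HDFS through pyarrow (needs a reachable namenode)."""
    import pyarrow.fs  # lazy import so the checks above run without pyarrow

    return pyarrow.fs.HadoopFileSystem(host, port=port)
```

For example, `preflight()` inspects the current shell environment and returns an empty list once the `.bashrc` changes are sourced; only then is it worth calling `connect_hdfs(...)` with your namenode host (8020 is pyarrow's default port, and the host is a placeholder). Keeping the checks as pure functions over a mapping makes them testable without a Hadoop installation.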