Hadoop installation
JulienPeloton committed Nov 22, 2024
1 parent cd73c33 commit 758c82c
Showing 2 changed files with 16 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -2,12 +2,16 @@

This API is used internally by Fink web components to retrieve cutouts from the data lake on HDFS. We take advantage of the pyarrow connector to read parquet files to efficiently extract required cutouts from an HDFS block.

## Requirements

You will need Python (>= 3.11) with the dependencies listed in `requirements.txt`. You will also need Hadoop installed on the machine (see `install/`).

## Usage

To deploy the API, you need access to the Fink HDFS cluster. Once `config.yml` is filled in, deploy with:

```diff
-python cutout_app.py
+python app.py
```

## Accessing 2D cutout
11 changes: 11 additions & 0 deletions install/README.md
@@ -5,3 +5,14 @@ Execute the script to install it under `/opt`:
```bash
./install_hadoop.sh
```

and then update your `.bashrc` as follows (adjust the version number to match your installation):

```bash
# Hadoop
export HADOOP_HDFS_HOME=/opt/hadoop-3.3.6
export HADOOP_HOME=$HADOOP_HDFS_HOME
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
export ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
```
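Before launching the API, it can be useful to sanity-check that these variables are actually set in the environment. A minimal sketch (the helper name is ours, not part of the project):

```python
import os

# The variables exported in the .bashrc snippet above.
REQUIRED_VARS = ["HADOOP_HOME", "HADOOP_HDFS_HOME", "CLASSPATH", "ARROW_LIBHDFS_DIR"]

def missing_hadoop_vars(env=os.environ):
    """Return the Hadoop-related variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with an incomplete environment:
print(missing_hadoop_vars({"HADOOP_HOME": "/opt/hadoop-3.3.6"}))
# → ['HADOOP_HDFS_HOME', 'CLASSPATH', 'ARROW_LIBHDFS_DIR']
```

In particular, `ARROW_LIBHDFS_DIR` must point at the native `libhdfs` library for pyarrow to reach HDFS.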
