[GSProcessing] Remove poetry as a requirement for building images.
thvasilo committed Oct 28, 2024
1 parent 993a71f commit f6c2a7b
Showing 6 changed files with 980 additions and 40 deletions.
@@ -13,7 +13,6 @@ The steps required are:
 
 - Clone the GraphStorm repository.
 - Install Docker.
-- Install Poetry.
 - Set up AWS access.
 - Build the GraphStorm Processing image using Docker.
 - Push the image to the Amazon Elastic Container Registry (ECR).
@@ -45,22 +44,6 @@ you'll need to have the Docker engine installed.
 To install Docker follow the instructions at the
 `official site <https://docs.docker.com/engine/install/>`_.
 
-Install Poetry
---------------
-
-We use `Poetry <https://python-poetry.org/docs/>`_ as our build
-tool and for dependency management,
-so we need to install it to facilitate building the library.
-
-You can install Poetry using:
-
-.. code-block:: bash
-
-    curl -sSL https://install.python-poetry.org | python3 -
-
-For detailed installation instructions the
-`Poetry docs <https://python-poetry.org/docs/>`_.
-
 
 Set up AWS access
 -----------------
@@ -85,7 +68,7 @@ create an ECR repository if it doesn't exist, and push the GSProcessing image to
 Building the GraphStorm Processing image using Docker
 -----------------------------------------------------
 
-Once Docker and Poetry are installed, and your AWS credentials are set up,
+Once Docker and the aws CLI are installed, and your AWS credentials are set up,
 we can use the provided scripts
 in the ``graphstorm-processing/docker`` directory to build the image.
 
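After this change, the build prerequisites are Docker, the AWS CLI, and working AWS credentials. Poetry is no longer required. The following is a minimal shell sketch for checking those prerequisites before running the provided build scripts. It assumes a POSIX shell. `check_prereqs` is a hypothetical helper and is not part of the GraphStorm repository. `aws sts get-caller-identity` is a standard AWS CLI command that succeeds only when valid credentials are configured.

```shell
#!/bin/sh
# Hypothetical helper (not part of GraphStorm): return 0 only if every
# named command is available on PATH; report each missing one on stderr.
check_prereqs() {
  _gsp_missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      _gsp_missing=1
    fi
  done
  return "$_gsp_missing"
}

# Usage sketch: Docker and the AWS CLI are the host tools the build
# section expects. The STS call then confirms credentials are set up.
#   check_prereqs docker aws && aws sts get-caller-identity
```

Because the helper only inspects `PATH`, it does not verify Docker daemon access or the permissions that ECR pushes require. The build scripts will surface those failures themselves.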
graphstorm-processing/docker/0.4.0/emr-serverless/Dockerfile.cpu (12 changes: 6 additions, 6 deletions)
@@ -1,5 +1,5 @@
 ARG ARCH=x86_64
-FROM public.ecr.aws/emr-serverless/spark/emr-7.1.0:20240528-${ARCH} as base
+FROM public.ecr.aws/emr-serverless/spark/emr-7.3.0:20241008-${ARCH} as base
 
 USER root
 ENV PYTHON_VERSION=3.9.18
@@ -46,12 +46,12 @@ RUN touch /usr/lib/spark/code/EMR_SERVERLESS_EXECUTION
 # GSProcessing codebase
 COPY code/ /usr/lib/spark/code/
 
-FROM runtime AS prod
-RUN python -m pip install --no-deps /usr/lib/spark/code/graphstorm_processing-*.whl && \
-    rm /usr/lib/spark/code/graphstorm_processing-*.whl && rm -rf /root/.cache
+RUN python3 -m pip install --no-deps /usr/lib/spark/code/graphstorm-processing/
+FROM base AS prod
 
-FROM runtime AS test
-RUN python -m pip install --no-deps /usr/lib/spark/code/graphstorm-processing/ && rm -rf /root/.cache
+FROM base AS test
+RUN python3 -m pip install mock pytest && \
+    rm -rf /root/.cache
 
 USER hadoop:hadoop
 WORKDIR /home/hadoop
graphstorm-processing/docker/0.4.0/emr/Dockerfile.cpu (10 changes: 5 additions, 5 deletions)
@@ -70,9 +70,9 @@ RUN touch /usr/lib/spark/code/EMR_EXECUTION
 # GSProcessing codebase
 COPY code/ /usr/lib/spark/code/
 
-FROM runtime AS prod
-RUN python3 -m pip install --no-deps /usr/lib/spark/code/graphstorm_processing-*.whl && \
-    rm /usr/lib/spark/code/graphstorm_processing-*.whl && rm -rf /root/.cache
+RUN python3 -m pip install --no-deps /usr/lib/spark/code/graphstorm-processing/
+FROM base AS prod
 
-FROM runtime AS test
-RUN python3 -m pip install --no-deps /usr/lib/spark/code/graphstorm-processing/ && rm -rf /root/.cache
+FROM base AS test
+RUN python3 -m pip install mock pytest && \
+    rm -rf /root/.cache
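Both Dockerfiles in this commit define separate `prod` and `test` stages. Docker's `--target` flag selects which stage to build. The following is a hedged sketch, not the project's build procedure. It assumes the build context already contains `code/graphstorm-processing/`, which is the path the `COPY` and `pip install` lines expect. The provided scripts in `graphstorm-processing/docker` remain the supported way to assemble that context. The function name `gsp_build_cmd`, the default context `.`, and the image tag `graphstorm-processing:<stage>` are hypothetical. The function only prints the command and never runs it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a docker build command for one
# stage of the EMR Serverless Dockerfile changed in this commit.
#   $1: stage name, "prod" (default) or "test"
#   $2: value for the Dockerfile's ARCH build argument (default x86_64)
#   $3: build context directory, assumed to contain code/graphstorm-processing/
gsp_build_cmd() {
  target="${1:-prod}"
  arch="${2:-x86_64}"
  context="${3:-.}"
  dockerfile="graphstorm-processing/docker/0.4.0/emr-serverless/Dockerfile.cpu"
  # Only the two stages defined in the Dockerfile are accepted.
  case "$target" in
    prod|test) ;;
    *) echo "unknown stage: $target" >&2; return 1 ;;
  esac
  printf '%s\n' "docker build --target $target --build-arg ARCH=$arch -f $dockerfile -t graphstorm-processing:$target $context"
}
```

For example, `gsp_build_cmd test` prints `docker build --target test --build-arg ARCH=x86_64 -f graphstorm-processing/docker/0.4.0/emr-serverless/Dockerfile.cpu -t graphstorm-processing:test .`. Printing rather than executing keeps the sketch safe to try before the build context is prepared.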