diff --git a/docs/source/gs-processing/usage/amazon-sagemaker.rst b/docs/source/gs-processing/usage/amazon-sagemaker.rst
index 78621c4909..624025914f 100644
--- a/docs/source/gs-processing/usage/amazon-sagemaker.rst
+++ b/docs/source/gs-processing/usage/amazon-sagemaker.rst
@@ -45,7 +45,7 @@ job, followed by the re-partitioning job, both on SageMaker:
     INSTANCE_TYPE="ml.t3.xlarge"
     NUM_FILES="4"
 
-    IMAGE_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/graphstorm-processing-sagemaker:0.2.1"
+    IMAGE_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/graphstorm-processing-sagemaker:0.2.1-x86_64"
     ROLE="arn:aws:iam::${ACCOUNT}:role/service-role/${SAGEMAKER_ROLE_NAME}"
 
     OUTPUT_PREFIX="s3://${OUTPUT_BUCKET}/gsprocessing/sagemaker/${GRAPH_NAME}/${INSTANCE_COUNT}x-${INSTANCE_TYPE}-${NUM_FILES}files/"
diff --git a/docs/source/gs-processing/usage/distributed-processing-setup.rst b/docs/source/gs-processing/usage/distributed-processing-setup.rst
index d003b93579..261c0ce9a9 100644
--- a/docs/source/gs-processing/usage/distributed-processing-setup.rst
+++ b/docs/source/gs-processing/usage/distributed-processing-setup.rst
@@ -104,13 +104,70 @@ the following to build the SageMaker image:
     bash docker/build_gsprocessing_image.sh --environment sagemaker
 
 The above will use the SageMaker-specific Dockerfile of the latest available GSProcessing version,
-build an image and tag it as ``graphstorm-processing-sagemaker:${VERSION}`` where
-``${VERSION}`` will take be the latest available GSProcessing version (e.g. ``0.2.1``).
+build an image and tag it as ``graphstorm-processing-sagemaker:${VERSION}-x86_64``, where
+``${VERSION}`` will be the latest available GSProcessing version (e.g. ``0.2.1``).
 
 The script also supports other arguments to customize the image name,
 tag and other aspects of the build. See ``bash docker/build_gsprocessing_image.sh --help``
 for more information.
 
+Support for arm64 architecture
+------------------------------
+
+For EMR Serverless images, you can also build images that run on ``arm64`` instances,
+which can improve runtime and reduce cost compared to ``x86_64``. To build an ``arm64`` image
+natively, install Docker on an ARM-based instance, such as ``m6g`` or ``m7g``, and follow the
+process above. See the `AWS Graviton documentation <https://aws.amazon.com/ec2/graviton/>`_
+for the instance types powered by AWS Graviton processors.
+
+To build ``arm64`` images on an ``x86_64`` host, you need to enable multi-platform builds
+for Docker. The simplest way to do so is to use QEMU emulation. First, install the
+QEMU-related packages for your distribution.
+
+On Ubuntu:
+
+.. code-block:: bash
+
+    sudo apt install -y qemu binfmt-support qemu-user-static
+
+On Amazon Linux/CentOS:
+
+.. code-block:: bash
+
+    sudo yum install -y qemu-system-arm qemu qemu-user qemu-kvm qemu-kvm-tools \
+        libvirt virt-install libvirt-python libguestfs-tools-c
+
+Finally, ensure ``binfmt_misc`` is configured to run binaries for other platforms by running:
+
+.. code-block:: bash
+
+    docker run --privileged --rm tonistiigi/binfmt --install all
+
+To verify that your Docker installation is ready for multi-platform builds, run the following and check that ``linux/arm64`` is listed under ``PLATFORMS``:
+
+.. code-block:: bash
+
+    docker buildx ls
+
+    NAME/NODE   DRIVER/ENDPOINT STATUS  BUILDKIT     PLATFORMS
+    default *   docker
+    default     default         running v0.8+unknown linux/amd64, linux/arm64
+
+To build an EMR Serverless GSProcessing image for the ``arm64`` architecture, run:
+
+.. code-block:: bash
+
+    bash docker/build_gsprocessing_image.sh --environment emr-serverless --architecture arm64
+
+.. note::
+
+    Building images under QEMU emulation can be significantly slower than native builds
+    (building the GSProcessing ``arm64`` image this way takes more than 20 minutes).
+    To speed up the build process, you can build natively on an ARM instance,
+    use ``buildx`` with multiple native nodes, or use cross-compilation.
+    See `the official Docker documentation <https://docs.docker.com/build/building/multi-platform/>`_
+    for details.
+
 Push the image to the Amazon Elastic Container Registry (ECR)
 -------------------------------------------------------------
 
@@ -136,6 +193,13 @@ Example:
 
     bash docker/push_gsprocessing_image.sh -e sagemaker -i "graphstorm-processing" -v "0.2.1" -r "us-west-2" -a "1234567890"
 
+Similarly, to push an EMR Serverless ``arm64`` image, run:
+
+.. code-block:: bash
+
+    bash docker/push_gsprocessing_image.sh -e emr-serverless --architecture arm64 \
+        -i "graphstorm-processing" -v "0.2.1" -r "us-west-2" -a "1234567890"
+
 .. _gsp-upload-data-ref:
 
 Upload data to S3
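A note for native ``arm64`` builds: on 64-bit ARM Linux hosts such as Graviton instances, ``uname -m`` reports ``aarch64``, while ``build_gsprocessing_image.sh`` only accepts ``x86_64`` or ``arm64`` for ``--architecture``. The minimal sketch below assumes a hypothetical helper, ``gsp_arch``, that is not part of GSProcessing. It maps the host's machine name to the value the script expects.

```bash
# Hypothetical helper (not part of GSProcessing): map a machine hardware name,
# as printed by `uname -m`, to the value accepted by the --architecture option
# of build_gsprocessing_image.sh. Defaults to the current host if no argument.
gsp_arch() {
    local machine="${1:-$(uname -m)}"
    case "${machine}" in
        x86_64 | amd64)
            echo "x86_64"
            ;;
        aarch64 | arm64)
            # Linux on ARM reports aarch64; macOS on Apple silicon reports arm64.
            echo "arm64"
            ;;
        *)
            echo "Unsupported architecture: ${machine}" >&2
            return 1
            ;;
    esac
}

# Example: build natively for whatever architecture the current host uses.
#   bash docker/build_gsprocessing_image.sh --environment emr-serverless \
#       --architecture "$(gsp_arch)"
```

Keep in mind that SageMaker images remain ``x86_64``-only, so this helper is only useful for ``--environment emr-serverless``.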
diff --git a/docs/source/gs-processing/usage/emr-serverless.rst b/docs/source/gs-processing/usage/emr-serverless.rst
index adef4a4a05..35b54e9f1d 100644
--- a/docs/source/gs-processing/usage/emr-serverless.rst
+++ b/docs/source/gs-processing/usage/emr-serverless.rst
@@ -88,14 +88,14 @@ Here we will just show the custom image application creation using the AWS CLI:
 
     aws emr-serverless create-application \
         --name gsprocessing-0.2.1 \
-        --release-label emr-6.11.0 \
+        --release-label emr-6.13.0 \
         --type SPARK \
         --image-configuration '{
-            "imageUri": "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/graphstorm-processing-emr-serverless:0.2.1"
+            "imageUri": "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/graphstorm-processing-emr-serverless:0.2.1-<arch>"
         }'
 
-Here you will need to replace ``<aws-account-id>`` and ``<region>`` with the correct values
-from the image you just created. GSProcessing version ``0.2.1`` uses ``emr-6.11.0`` as its
+Here you will need to replace ``<aws-account-id>``, ``<arch>`` (``x86_64`` or ``arm64``), and ``<region>`` with the correct values
+from the image you just created. GSProcessing version ``0.2.1`` uses ``emr-6.13.0`` as its
 base image, so we need to ensure our application uses the same release.
 
 
@@ -234,7 +234,7 @@ and building the GSProcessing SageMaker ECR image:
     bash docker/push_gsprocessing_image.sh --environment sagemaker --region ${REGION}
 
     SAGEMAKER_ROLE_NAME="enter-your-sagemaker-execution-role-name-here"
-    IMAGE_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/graphstorm-processing-sagemaker:0.2.1"
+    IMAGE_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/graphstorm-processing-sagemaker:0.2.1-x86_64"
     ROLE="arn:aws:iam::${ACCOUNT}:role/service-role/${SAGEMAKER_ROLE_NAME}"
     INSTANCE_TYPE="ml.t3.xlarge"
 
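If you use an ``arm64`` image, the EMR Serverless application itself must also run on ARM. The AWS CLI ``create-application`` command accepts ``--architecture`` with the values ``X86_64`` (the default) or ``ARM64``, and the application architecture needs to match the custom image. The sketch below uses two hypothetical helpers, ``emr_app_arch`` and ``gsp_emr_image_uri`` (neither is part of GSProcessing), to derive both values from the image architecture. The account ID and region are placeholders.

```bash
# Hypothetical helpers for illustration; they are not part of GSProcessing.

# Map a GSProcessing image architecture (as used in the image tag) to the value
# expected by `aws emr-serverless create-application --architecture`.
emr_app_arch() {
    case "$1" in
        x86_64) echo "X86_64" ;;
        arm64) echo "ARM64" ;;
        *)
            echo "Unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Compose the ECR URI of an EMR Serverless image tagged as <version>-<arch>,
# the naming scheme produced by the build and push scripts.
gsp_emr_image_uri() {
    local account="$1" region="$2" version="$3" arch="$4"
    echo "${account}.dkr.ecr.${region}.amazonaws.com/graphstorm-processing-emr-serverless:${version}-${arch}"
}

# Example values; replace the account ID and region with your own.
ARCH="arm64"
IMAGE_URI="$(gsp_emr_image_uri 1234567890 us-west-2 0.2.1 "${ARCH}")"
APP_ARCH="$(emr_app_arch "${ARCH}")"

# These values would then go into the application creation call, e.g.:
#   aws emr-serverless create-application --name "gsprocessing-0.2.1-${ARCH}" \
#       --release-label emr-6.13.0 --type SPARK --architecture "${APP_ARCH}" \
#       --image-configuration "{\"imageUri\": \"${IMAGE_URI}\"}"
echo "${APP_ARCH} ${IMAGE_URI}"
```

Deriving both values from a single ``ARCH`` variable avoids creating an ``X86_64`` application that points at an ``arm64`` image.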
diff --git a/graphstorm-processing/docker/0.2.1/emr-serverless/Dockerfile.cpu b/graphstorm-processing/docker/0.2.1/emr-serverless/Dockerfile.cpu
index 267f986358..8ef9d7bca6 100644
--- a/graphstorm-processing/docker/0.2.1/emr-serverless/Dockerfile.cpu
+++ b/graphstorm-processing/docker/0.2.1/emr-serverless/Dockerfile.cpu
@@ -1,4 +1,7 @@
-FROM public.ecr.aws/emr-serverless/spark/emr-6.11.0:20230629-x86_64 as runtime
+ARG ARCH=x86_64
+FROM public.ecr.aws/emr-serverless/spark/emr-6.13.0:20230906-${ARCH} as base
+FROM base as runtime
+
 USER root
 ENV PYTHON_VERSION=3.9.18
 
diff --git a/graphstorm-processing/docker/build_gsprocessing_image.sh b/graphstorm-processing/docker/build_gsprocessing_image.sh
index 7ecf1e3094..4c53f74416 100644
--- a/graphstorm-processing/docker/build_gsprocessing_image.sh
+++ b/graphstorm-processing/docker/build_gsprocessing_image.sh
@@ -16,6 +16,8 @@ Available options:
 -h, --help          Print this help and exit
 -x, --verbose       Print script debug info (set -x)
 -e, --environment   Image execution environment. Must be one of 'emr-serverless' or 'sagemaker'. Required.
+-a, --architecture  Image architecture. Must be one of 'x86_64' or 'arm64'. Default is 'x86_64'.
+                    Note that only x86_64 architecture is supported for SageMaker.
 -t, --target        Docker image target, must be one of 'prod' or 'test'. Default is 'test'.
 -p, --path          Path to graphstorm-processing directory, default is the current directory.
 -i, --image         Docker image name, default is 'graphstorm-processing'.
@@ -43,6 +45,7 @@ parse_params() {
   VERSION=`poetry version --short`
   BUILD_DIR='/tmp'
   TARGET='test'
+  ARCH='x86_64'
 
   while :; do
     case "${1-}" in
@@ -57,6 +60,10 @@ parse_params() {
       EXEC_ENV="${2-}"
       shift
       ;;
+    -a | --architecture)
+      ARCH="${2-}"
+      shift
+      ;;
     -p | --path)
       GSP_HOME="${2-}"
       shift
@@ -103,15 +110,20 @@ else
     die "--target parameter needs to be one of 'prod' or 'test', got ${TARGET}"
 fi
 
-if [[ ${EXEC_ENV} == "sagemaker" || ${EXEC_ENV} == "emr-serverless" ]]; then
-    :  # Do nothing
-else
-    die "--environment parameter needs to be one of 'emr-serverless' or 'sagemaker', got ${EXEC_ENV}"
-fi
+if [[ ${EXEC_ENV} != "sagemaker" && ${EXEC_ENV} != "emr-serverless" ]]; then
+    die "--environment parameter needs to be one of 'emr-serverless' or 'sagemaker', got ${EXEC_ENV}"
+fi
+if [[ ${ARCH} != "x86_64" && ${ARCH} != "arm64" ]]; then
+    die "--architecture parameter needs to be one of 'arm64' or 'x86_64', got ${ARCH}"
+fi
+if [[ ${EXEC_ENV} == "sagemaker" && ${ARCH} == "arm64" ]]; then
+    die "arm64 architecture is not supported for SageMaker"
+fi
 
 # script logic here
 msg "Execution parameters:"
 msg "- ENVIRONMENT: ${EXEC_ENV}"
+msg "- ARCHITECTURE: ${ARCH}"
 msg "- TARGET: ${TARGET}"
 msg "- GSP_HOME: ${GSP_HOME}"
 msg "- IMAGE_NAME: ${IMAGE_NAME}"
@@ -139,7 +151,7 @@ cp ${GSP_HOME}/docker-entry.sh "${BUILD_DIR}/docker/code/"
 poetry export -f requirements.txt --output "${BUILD_DIR}/docker/requirements.txt"
 
 # Set image name
-DOCKER_FULLNAME="${IMAGE_NAME}-${EXEC_ENV}:${VERSION}"
+DOCKER_FULLNAME="${IMAGE_NAME}-${EXEC_ENV}:${VERSION}-${ARCH}"
 
 # Login to ECR to be able to pull source SageMaker image
 if [[ ${EXEC_ENV} == "sagemaker" ]]; then
@@ -147,10 +159,8 @@ if [[ ${EXEC_ENV} == "sagemaker" ]]; then
         | docker login --username AWS --password-stdin 153931337802.dkr.ecr.us-west-2.amazonaws.com
 else
     aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
-    # aws ecr get-login-password --region us-west-2 \
-    #     | docker login --username AWS --password-stdin 895885662937.dkr.ecr.us-west-2.amazonaws.com
 fi
 
 echo "Build a Docker image ${DOCKER_FULLNAME}"
-DOCKER_BUILDKIT=1 docker build -f "${GSP_HOME}/docker/${VERSION}/${EXEC_ENV}/Dockerfile.cpu" \
-    "${BUILD_DIR}/docker/" -t $DOCKER_FULLNAME --target ${TARGET}
+DOCKER_BUILDKIT=1 docker build --platform "linux/${ARCH}" -f "${GSP_HOME}/docker/${VERSION}/${EXEC_ENV}/Dockerfile.cpu" \
+    "${BUILD_DIR}/docker/" -t "${DOCKER_FULLNAME}" --target "${TARGET}" --build-arg ARCH="${ARCH}"
diff --git a/graphstorm-processing/docker/push_gsprocessing_image.sh b/graphstorm-processing/docker/push_gsprocessing_image.sh
index 5d6753d083..eaab38876a 100644
--- a/graphstorm-processing/docker/push_gsprocessing_image.sh
+++ b/graphstorm-processing/docker/push_gsprocessing_image.sh
@@ -16,6 +16,7 @@ Available options:
 -h, --help          Print this help and exit
 -x, --verbose       Print script debug info
 -e, --environment   Image execution environment. Must be one of 'emr-serverless' or 'sagemaker'. Required.
+-c, --architecture  Image architecture. Must be one of 'x86_64' or 'arm64'. Default is 'x86_64'.
 -i, --image         Docker image name, default is 'graphstorm-processing'.
 -v, --version       Docker version tag, default is the library's current version (`poetry version --short`)
 -r, --region        AWS Region to which we'll push the image. By default will get from aws-cli configuration.
@@ -43,6 +44,7 @@ parse_params() {
   REGION=$(aws configure get region)
   REGION=${REGION:-us-west-2}
   ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
+  ARCH='x86_64'
 
 
   while :; do
@@ -54,6 +56,10 @@ parse_params() {
       EXEC_ENV="${2-}"
       shift
       ;;
+    -c | --architecture)
+      ARCH="${2-}"
+      shift
+      ;;
     -i | --image)
       IMAGE="${2-}"
       shift
@@ -98,13 +104,14 @@ fi
 # script logic here
 msg "Execution parameters: "
 msg "- ENVIRONMENT: ${EXEC_ENV}"
+msg "- ARCHITECTURE: ${ARCH}"
 msg "- IMAGE: ${IMAGE}"
 msg "- VERSION: ${VERSION}"
 msg "- REGION: ${REGION}"
 msg "- ACCOUNT: ${ACCOUNT}"
 
-SUFFIX="${VERSION}"
-LATEST_SUFFIX="latest"
+SUFFIX="${VERSION}-${ARCH}"
+LATEST_SUFFIX="latest-${ARCH}"
 IMAGE_WITH_ENV="${IMAGE}-${EXEC_ENV}"
 
 
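With the ``SUFFIX`` and ``LATEST_SUFFIX`` changes above, every pushed tag carries the architecture. As a result, ``x86_64`` and ``arm64`` images of the same version can share one ECR repository, and anything that referenced the un-suffixed ``0.2.1`` or ``latest`` tags needs to switch to the suffixed form. The hypothetical helper below, ``gsp_push_tags``, is not part of the script. It only mirrors the naming scheme to show which tags a push produces.

```bash
# Hypothetical helper (not part of GSProcessing): print the tags that
# push_gsprocessing_image.sh assigns for a given version and architecture,
# mirroring its SUFFIX and LATEST_SUFFIX variables. Architecture defaults to x86_64.
gsp_push_tags() {
    local version="$1" arch="${2:-x86_64}"
    echo "${version}-${arch}"
    echo "latest-${arch}"
}

# Tags for both architectures of the same version; because the suffixes
# differ, neither push overwrites the other's tags.
for arch in x86_64 arm64; do
    gsp_push_tags 0.2.1 "${arch}"
done
```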
diff --git a/graphstorm-processing/pyproject.toml b/graphstorm-processing/pyproject.toml
index 7bb87f2752..16a5749533 100644
--- a/graphstorm-processing/pyproject.toml
+++ b/graphstorm-processing/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "graphstorm_processing"
-version = "0.1.0"
+version = "0.2.1"
 description = "Distributed graph pre-processing for GraphStorm"
 readme = "README.md"
 packages = [{include = "graphstorm_processing"}]
@@ -10,7 +10,7 @@ authors = [
 
 [tool.poetry.dependencies]
 python = "~3.9.12"
-pyspark = "~3.3.0"
+pyspark = ">=3.3.0, < 3.5.0"
 pyarrow = "~13.0.0"
 spacy = "3.6.0"
 boto3 = "~1.28.1"
diff --git a/graphstorm-processing/tests/resources/small_heterogeneous_graph/gsprocessing-config.json b/graphstorm-processing/tests/resources/small_heterogeneous_graph/gsprocessing-config.json
index 48b3b2deb8..1ea789b69b 100644
--- a/graphstorm-processing/tests/resources/small_heterogeneous_graph/gsprocessing-config.json
+++ b/graphstorm-processing/tests/resources/small_heterogeneous_graph/gsprocessing-config.json
@@ -21,7 +21,7 @@
                     ],
                     "separator": ","
                 },
-                "type": "movies",
+                "type": "movie",
                 "column": "~id"
             },
             {