This repository has been archived by the owner on May 25, 2023. It is now read-only.

"Failed to load NIF library" when running Docker container #8

Open
jy-tan opened this issue Mar 4, 2023 · 4 comments

jy-tan commented Mar 4, 2023

I'm working on an app that uses ExFaiss and am running into issues with deployment. I can build the Docker image, but when I try to run the container it dies immediately with:

"Failed to load NIF library /app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"

Dockerfile:

FROM elixir:1.14.3-otp-24-slim as builder

RUN apt-get update \
    && apt-get install -y \
     build-essential \
     git curl cmake \
     libblas-dev liblapack-dev \
    && apt-get clean

ENV MIX_ENV=prod

RUN mix local.rebar --force && \
    mix local.hex --force

COPY mix.exs .
COPY mix.lock .

RUN mix deps.get --only $MIX_ENV && \
    mix deps.compile

COPY config ./config
COPY priv ./priv
COPY lib ./lib

RUN mix release

FROM debian:bullseye-slim

RUN apt-get update \
    && apt-get install -y libblas-dev liblapack-dev libgomp1 cmake

WORKDIR /usr/app

COPY --from=builder _build/prod/rel/octomind/ .

CMD ["bin/octomind", "start"]

Full error report:

=SUPERVISOR REPORT==== 4-Mar-2023::04:23:05.482765 ===
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {on_load_function_failed,'Elixir.ExFaiss.NIF',
                {error,
                    {load_failed,
                        "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}
    offender: [{pid,undefined},
               {id,kernel_safe_sup},
               {mfargs,{supervisor,start_link,
                                   [{local,kernel_safe_sup},kernel,safe]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,infinity},
               {child_type,supervisor}]

=CRASH REPORT==== 4-Mar-2023::04:23:05.482810 ===
  crasher:
    initial call: supervisor:kernel/1
    pid: <0.2128.0>
    registered_name: []
    exception exit: {on_load_function_failed,'Elixir.ExFaiss.NIF',
                        {error,
                            {load_failed,
                                "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}
      in function  init:run_on_load_handlers/0 
      in call from kernel:init/1 (kernel.erl, line 189)
      in call from supervisor:init/1 (supervisor.erl, line 330)
      in call from gen_server:init_it/2 (gen_server.erl, line 423)
      in call from gen_server:init_it/6 (gen_server.erl, line 390)
    ancestors: [kernel_sup,<0.2102.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2104.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 28
    reductions: 270
  neighbours:

=CRASH REPORT==== 4-Mar-2023::04:23:06.492776 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.2101.0>
    registered_name: []
    exception exit: {{shutdown,
                      {failed_to_start_child,kernel_safe_sup,
                       {on_load_function_failed,'Elixir.ExFaiss.NIF',
                        {error,
                         {load_failed,
                          "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},
                     {kernel,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
    ancestors: [<0.2100.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.2102.0>,normal}]
    links: [<0.2100.0>,<0.2099.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 28
    reductions: 235
  neighbours:

=INFO REPORT==== 4-Mar-2023::04:23:06.495438 ===
    application: kernel
    exited: {{shutdown,
                 {failed_to_start_child,kernel_safe_sup,
                     {on_load_function_failed,'Elixir.ExFaiss.NIF',
                         {error,
                             {load_failed,
                                 "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},
             {kernel,start,[normal,[]]}}
    type: permanent

{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExFaiss.NIF',{error,{load_failed,\"Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'\"}}}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExFaiss.NIF',{error,{load_failed,"Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},{kernel,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done
=SUPERVISOR REPORT==== 4-Mar-2023::04:23:40.861177 ===
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {on_load_function_failed,'Elixir.ExFaiss.NIF',
                {error,
                    {load_failed,
                        "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}
    offender: [{pid,undefined},
               {id,kernel_safe_sup},
               {mfargs,{supervisor,start_link,
                                   [{local,kernel_safe_sup},kernel,safe]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,infinity},
               {child_type,supervisor}]

=CRASH REPORT==== 4-Mar-2023::04:23:40.861952 ===
  crasher:
    initial call: supervisor:kernel/1
    pid: <0.2128.0>
    registered_name: []
    exception exit: {on_load_function_failed,'Elixir.ExFaiss.NIF',
                        {error,
                            {load_failed,
                                "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}
      in function  init:run_on_load_handlers/0 
      in call from kernel:init/1 (kernel.erl, line 189)
      in call from supervisor:init/1 (supervisor.erl, line 330)
      in call from gen_server:init_it/2 (gen_server.erl, line 423)
      in call from gen_server:init_it/6 (gen_server.erl, line 390)
    ancestors: [kernel_sup,<0.2102.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2104.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 28
    reductions: 270
  neighbours:

=CRASH REPORT==== 4-Mar-2023::04:23:41.866610 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.2101.0>
    registered_name: []
    exception exit: {{shutdown,
                      {failed_to_start_child,kernel_safe_sup,
                       {on_load_function_failed,'Elixir.ExFaiss.NIF',
                        {error,
                         {load_failed,
                          "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},
                     {kernel,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
    ancestors: [<0.2100.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.2102.0>,normal}]
    links: [<0.2100.0>,<0.2099.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 28
    reductions: 235
  neighbours:

=INFO REPORT==== 4-Mar-2023::04:23:41.868296 ===
    application: kernel
    exited: {{shutdown,
                 {failed_to_start_child,kernel_safe_sup,
                     {on_load_function_failed,'Elixir.ExFaiss.NIF',
                         {error,
                             {load_failed,
                                 "Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},
             {kernel,start,[normal,[]]}}
    type: permanent

Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExFaiss.NIF',{error,{load_failed,"Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'"}}}}},{kernel,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExFaiss.NIF',{error,{load_failed,\"Failed to load NIF library /usr/app/lib/ex_faiss-0.1.0/priv/libex_faiss: '/usr/app/lib/ex_faiss-0.1.0/priv/lib/libfaiss.so: undefined symbol: _Z9vaddq_u1612__Uint16x8_tS_'\"}}}}},{kernel,start,[normal,[]]}}}"}

Greatly appreciate any help on this!

Contributor

seanmor5 commented Mar 4, 2023

Hey @jy-tan, I did some troubleshooting and was able to get this to work in Docker on my Mac. The issue seems to be related to building on ARM (do you happen to be using an M1 or other ARM machine? See also: facebookresearch/faiss#2335).
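
For context, the missing symbol `_Z9vaddq_u1612__Uint16x8_tS_` looks like the mangled C++ name of `vaddq_u16`, an ARM NEON intrinsic, which fits the ARM theory. Below is a minimal sketch for checking the build host. `is_arm_arch` is a hypothetical helper name, not something provided by ex_faiss or Faiss, and the demangling step only runs if `c++filt` (from binutils) happens to be installed.

```shell
#!/bin/sh
# Sketch: report whether a machine architecture string is ARM.
# is_arm_arch is a hypothetical helper, not part of ex_faiss or Faiss.
is_arm_arch() {
  case "$1" in
    aarch64|arm64|armv*) return 0 ;;
    *) return 1 ;;
  esac
}

# uname -m prints the hardware name of the machine (or container) it runs in.
arch="$(uname -m)"
if is_arm_arch "$arch"; then
  echo "ARM build host: $arch"
else
  echo "non-ARM build host: $arch"
fi

# Demangle the missing symbol, but only if c++filt is available.
if command -v c++filt >/dev/null 2>&1; then
  c++filt _Z9vaddq_u1612__Uint16x8_tS_
fi
```

Running this inside the builder stage (rather than on the host) shows the architecture the NIF is actually compiled for.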

Here's the Dockerfile I got to work (it can probably be cleaned up a bit):

FROM hexpm/elixir:1.14.3-erlang-24.3.4.9-ubuntu-xenial-20210804 as builder

RUN apt-get update \
    && apt-get install -y \
     build-essential \
     git curl cmake \
     libblas-dev liblapack-dev \
     libatlas-base-dev libatlas3-base \
     wget libclang-8-dev libssl-dev \
    && apt-get clean

RUN wget https://github.com/Kitware/CMake/releases/download/v3.19.3/cmake-3.19.3.tar.gz

RUN tar xvzf cmake-3.19.3.tar.gz
WORKDIR cmake-3.19.3
RUN ./configure --prefix=/cmake-3.19.3/cmake &&  make -j
RUN ln -sf /cmake-3.19.3/bin/cmake $(which cmake)

WORKDIR ../
ENV FAISS_BUILD_FLAGS=-DCMAKE_CXX_COMPILER=clang++-8
ENV MIX_ENV=prod

RUN mix local.rebar --force && \
    mix local.hex --force

COPY mix.exs .
COPY mix.lock .

RUN mix deps.get --only $MIX_ENV && \
    mix deps.compile

COPY config ./config
COPY priv ./priv
COPY lib ./lib

RUN mix release

FROM debian:bullseye-slim

RUN apt-get update \
    && apt-get install -y libblas-dev liblapack-dev libgomp1 cmake libatlas-base-dev libatlas3-base libncurses5

WORKDIR /usr/app

COPY --from=builder _build/prod/rel/exfaiss_deployment_test/ .

CMD ["bin/exfaiss_deployment_test", "start_iex"]

Note the key differences here are:

  • Switched to Ubuntu for the build stage. I don't think this is strictly necessary; I just didn't feel like building clang-8 from scratch.
  • Built CMake from source, since the CMake version packaged with this Ubuntu release is too old to build Faiss.
  • Added libatlas.
  • Set ENV FAISS_BUILD_FLAGS to use clang++-8 (make sure you update ex_faiss; I just added this flag).
  • I also had to add libncurses5 to debian:bullseye-slim.

This is based on instructions from here: https://github.com/facebookresearch/faiss/wiki/Installing-Faiss#compiling-faiss-on-arm

After building I can go into iex and run:

iex(exfaiss_deployment_test@00e6eb890d6b)1> ExFaiss.Index.new(384, "Flat")
%ExFaiss.Index{
  dim: 384,
  ref: #Reference<0.30079746.2162032642.130368>,
  device: :host
}

If this fixes it, we can update the docs and README with deployment instructions. I can also check on an x86 machine whether these changes are still necessary there.

Author

jy-tan commented Mar 5, 2023

Thanks @seanmor5 for looking into this! I'm unable to build the image this time, though. Here are the last few lines of the build logs:

#15 582.4 g++: internal compiler error: Killed (program cc1plus)
#15 582.4 Please submit a full bug report,
#15 582.4 with preprocessed source if appropriate.
#15 582.4 See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
#15 583.1 make[2]: *** [Source/CMakeFiles/CMakeLib.dir/cmUtilitySourceCommand.cxx.o] Error 4
#15 583.2 Source/CMakeFiles/CMakeLib.dir/build.make:3799: recipe for target 'Source/CMakeFiles/CMakeLib.dir/cmUtilitySourceCommand.cxx.o' failed
#15 584.6 g++: internal compiler error: Killed (program cc1plus)
#15 584.6 Please submit a full bug report,
#15 584.6 with preprocessed source if appropriate.
#15 584.6 See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
#15 584.9 Source/CMakeFiles/CMakeLib.dir/build.make:3110: recipe for target 'Source/CMakeFiles/CMakeLib.dir/cmInstallProgramsCommand.cxx.o' failed
#15 584.9 make[2]: *** [Source/CMakeFiles/CMakeLib.dir/cmInstallProgramsCommand.cxx.o] Error 4
#15 588.5 g++: internal compiler error: Killed (program cc1plus)
#15 588.5 Please submit a full bug report,
#15 588.5 with preprocessed source if appropriate.
#15 588.5 See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
#15 588.5 make[2]: *** [Source/CMakeFiles/CMakeLib.dir/cmStringCommand.cxx.o] Error 4
#15 588.5 Source/CMakeFiles/CMakeLib.dir/build.make:3552: recipe for target 'Source/CMakeFiles/CMakeLib.dir/cmStringCommand.cxx.o' failed
#15 593.0 g++: internal compiler error: Killed (program cc1plus)
#15 593.0 Please submit a full bug report,
#15 593.0 with preprocessed source if appropriate.
#15 593.0 See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
#15 593.6 Source/CMakeFiles/CMakeLib.dir/build.make:1420: recipe for target 'Source/CMakeFiles/CMakeLib.dir/cmInstallExportGenerator.cxx.o' failed
#15 593.7 make[2]: *** [Source/CMakeFiles/CMakeLib.dir/cmInstallExportGenerator.cxx.o] Error 4
#15 710.5 CMakeFiles/Makefile2:2379: recipe for target 'Source/CMakeFiles/CMakeLib.dir/all' failed
#15 710.5 make[1]: *** [Source/CMakeFiles/CMakeLib.dir/all] Error 2
#15 710.5 make: *** [all] Error 2
#15 710.5 Makefile:181: recipe for target 'all' failed
#15 ERROR: executor failed running [/bin/sh -c ./configure --prefix=/cmake-3.19.3/cmake &&  make -j]: exit code: 2
------
 > [builder  6/16] RUN ./configure --prefix=/cmake-3.19.3/cmake &&  make -j:
------
executor failed running [/bin/sh -c ./configure --prefix=/cmake-3.19.3/cmake &&  make -j]: exit code: 2

(let me know if you would like the full log)

I'm using Docker on an M1 Mac (with the Virtualization framework).
Some other approaches I tried:

  • Increased Docker's memory limit to my machine's full capacity (just 8 GB), to no avail.
  • Building in a GitHub workflow (using ubuntu-latest) fails too (I don't have the error messages yet; I'll add them when I do).

Contributor

seanmor5 commented Mar 5, 2023

My only guess is that it's running out of memory during compilation. I think my Docker setup on the M1 is configured to use the maximum memory available. Maybe try raising the limit in increments until you're at the max and see if the error goes away?
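
One rough way to check the out-of-memory guess is to look for the "internal compiler error: Killed" line in the build output, which gcc prints when the compiler process is killed by a signal (often, though not always, the kernel's out-of-memory killer). The sketch below assumes the build output has been saved to a file; `log_shows_killed_compiler` is a hypothetical helper name and the sample log line is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a build log contains the
# "internal compiler error: Killed" line that gcc prints when the
# compiler process is killed by a signal.
log_shows_killed_compiler() {
  grep -q 'internal compiler error: Killed' "$1"
}

# Example with a throwaway log file holding one line from the build output.
log="$(mktemp)"
printf '%s\n' '#15 582.4 g++: internal compiler error: Killed (program cc1plus)' > "$log"
if log_shows_killed_compiler "$log"; then
  echo "compiler was killed; try fewer parallel jobs or more memory"
fi
rm -f "$log"
```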

Author

jy-tan commented Mar 6, 2023

Overcame the memory issue by setting a low number of jobs (make -j 2).
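
For anyone else hitting this: `make -j` with no number places no limit on parallel jobs, so many compiler processes can run at once and exhaust memory. A minimal sketch of picking a bounded job count instead is below. `pick_jobs` is a hypothetical helper, and the budget of roughly 2 GB per compile job is an assumed rule of thumb, not a measured figure for CMake or Faiss.

```shell
#!/bin/sh
# Hypothetical helper: choose a make job count from the CPU count and the
# memory available to the build (in whole GB).
# Assumption: budget about 2 GB of memory per parallel compile job.
pick_jobs() {
  cpus="$1"
  mem_gb="$2"
  jobs=$((mem_gb / 2))
  # Always run at least one job.
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  # Never run more jobs than there are CPUs.
  if [ "$jobs" -gt "$cpus" ]; then
    jobs="$cpus"
  fi
  echo "$jobs"
}

# Example: 8 CPUs and 8 GB of memory.
pick_jobs 8 8
```

In the Dockerfile this could be used as `make -j "$(pick_jobs "$(nproc)" 8)"`, with the memory figure adjusted to whatever Docker is allowed to use.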

Also wondering if you're missing the usage of FAISS_BUILD_FLAGS in the Makefile? Your latest commit shows only a change in the comments.

I'm currently having issues compiling EXLA (everything else is fine), so I'll open a separate issue in Nx!
