Bug description

I am not able to run inference with fp16 on CPU.

How to reproduce

Command to reproduce the behavior:
echo "▁Hello" | marian-decoder -m model.bin -v model.spv model.spv --cpu-threads 1 --precision float16
Marian was built from source with CPU and FP16 support enabled:

cmake .. -DCOMPILE_CPU=on -DCOMPILE_FP16=on

Build info, as reported by ./marian-decoder --build-info all:
AVX2_FOUND=true AVX512_FOUND=true AVX_FOUND=true BLAS_flexiblas_LIBRARY=BLAS_flexiblas_LIBRARY-NOTFOUND BLAS_goto2_LIBRARY=BLAS_goto2_LIBRARY-NOTFOUND BLAS_mkl_LIBRARY=BLAS_mkl_LIBRARY-NOTFOUND BLAS_mkl_em64t_LIBRARY=BLAS_mkl_em64t_LIBRARY-NOTFOUND BLAS_mkl_ia32_LIBRARY=BLAS_mkl_ia32_LIBRARY-NOTFOUND BLAS_mkl_intel_LIBRARY=BLAS_mkl_intel_LIBRARY-NOTFOUND BLAS_mkl_intel_lp64_LIBRARY=BLAS_mkl_intel_lp64_LIBRARY-NOTFOUND BLAS_mkl_rt_LIBRARY=BLAS_mkl_rt_LIBRARY-NOTFOUND BLAS_openblas_LIBRARY=/usr/lib/x86_64-linux-gnu/libopenblas.so BUILD_ARCH=native CMAKE_ADDR2LINE=/usr/bin/addr2line CMAKE_AR=/usr/bin/ar CMAKE_BUILD_TYPE=Release CMAKE_COLOR_MAKEFILE=ON CMAKE_CXX_COMPILER=/usr/bin/c++ CMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-9 CMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-9 CMAKE_CXX_FLAGS=-std=c++11 -pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DUSE_SENTENCEPIECE -DCUDA_FOUND -DUSE_NCCL -DMKL_ILP64 -m64 CMAKE_CXX_FLAGS_DEBUG=-O0 -g -rdynamic CMAKE_CXX_FLAGS_MINSIZEREL=-Os -DNDEBUG CMAKE_CXX_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic CMAKE_CXX_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic CMAKE_C_COMPILER=/usr/bin/cc CMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-9 CMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-9 CMAKE_C_FLAGS=-pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DMKL_ILP64 -m64 CMAKE_C_FLAGS_DEBUG=-O0 -g -rdynamic CMAKE_C_FLAGS_MINSIZEREL=-Os -DNDEBUG CMAKE_C_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic CMAKE_C_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic CMAKE_DLLTOOL=CMAKE_DLLTOOL-NOTFOUND CMAKE_INSTALL_BINDIR=bin CMAKE_INSTALL_DATAROOTDIR=share CMAKE_INSTALL_INCLUDEDIR=include CMAKE_INSTALL_LIBDIR=lib CMAKE_INSTALL_LIBEXECDIR=libexec CMAKE_INSTALL_LOCALSTATEDIR=var CMAKE_INSTALL_OLDINCLUDEDIR=/usr/include CMAKE_INSTALL_PREFIX=/usr/local CMAKE_INSTALL_SBINDIR=sbin CMAKE_INSTALL_SHAREDSTATEDIR=com CMAKE_INSTALL_SYSCONFDIR=etc CMAKE_LINKER=/usr/bin/ld CMAKE_MAKE_PROGRAM=/usr/bin/make CMAKE_NM=/usr/bin/nm CMAKE_OBJCOPY=/usr/bin/objcopy CMAKE_OBJDUMP=/usr/bin/objdump CMAKE_RANLIB=/usr/bin/ranlib CMAKE_READELF=/usr/bin/readelf CMAKE_SKIP_INSTALL_RPATH=NO CMAKE_SKIP_RPATH=NO CMAKE_STRIP=/usr/bin/strip CMAKE_TAPI=CMAKE_TAPI-NOTFOUND CMAKE_VERBOSE_MAKEFILE=FALSE COMPILE-FP16=on COMPILE_AMPERE=ON COMPILE_AMPERE_RTX=ON COMPILE_AVX=ON COMPILE_AVX2=ON COMPILE_AVX512=ON COMPILE_CPU=on COMPILE_CUDA=ON COMPILE_EXAMPLES=OFF COMPILE_KEPLER=OFF COMPILE_LIBRARY_ONLY=OFF COMPILE_MAXWELL=OFF COMPILE_PASCAL=ON COMPILE_SERVER=OFF COMPILE_SSE2=ON COMPILE_SSE3=ON COMPILE_SSE4_1=ON COMPILE_SSE4_2=ON COMPILE_TESTS=OFF COMPILE_TURING=ON COMPILE_VOLTA=ON CUDA_64_BIT_DEVICE_CODE=ON CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE=ON CUDA_BUILD_CUBIN=OFF CUDA_BUILD_EMULATION=OFF CUDA_CUDART_LIBRARY=/usr/local/cuda-11.8/lib64/libcudart.so CUDA_CUDA_LIBRARY=/usr/lib/x86_64-linux-gnu/libcuda.so CUDA_HOST_COMPILATION_CPP=ON CUDA_HOST_COMPILER=/usr/bin/cc CUDA_NVCC_EXECUTABLE=/usr/local/cuda-11.8/bin/nvcc CUDA_NVCC_FLAGS=-DUSE_SENTENCEPIECE-DCUDA_FOUND-DUSE_NCCL--default-streamper-thread-O3-g--use_fast_math-Wno-deprecated-gpu-targets-gencode=arch=compute_60,code=sm_60-gencode=arch=compute_61,code=sm_61-arch=sm_70-gencode=arch=compute_70,code=sm_70-gencode=arch=compute_70,code=compute_70-gencode=arch=compute_75,code=sm_75-gencode=arch=compute_75,code=compute_75-gencode=arch=compute_80,code=sm_80-gencode=arch=compute_80,code=compute_80-gencode=arch=compute_86,code=sm_86-gencode=arch=compute_86,code=compute_86-ccbin/usr/bin/cc-std=c++11-Xcompiler -fPIC-Xcompiler 
-Wno-unused-result-Xcompiler -Wno-deprecated-Xcompiler -Wno-pragmas-Xcompiler -Wno-unused-value-Xcompiler -Werror-DDETERMINISTIC=0 CUDA_OpenCL_LIBRARY=/usr/local/cuda-11.8/lib64/libOpenCL.so CUDA_PROPAGATE_HOST_FLAGS=OFF CUDA_SDK_ROOT_DIR=CUDA_SDK_ROOT_DIR-NOTFOUND CUDA_SEPARABLE_COMPILATION=OFF CUDA_TOOLKIT_INCLUDE=/usr/local/cuda-11.8/include CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8 CUDA_USE_STATIC_CUDA_RUNTIME=ON CUDA_VERBOSE_BUILD=OFF CUDA_VERSION=11.8 CUDA_cublasLt_LIBRARY=/usr/local/cuda-11.8/lib64/libcublasLt.so CUDA_cublas_LIBRARY=/usr/local/cuda-11.8/lib64/libcublas.so CUDA_cudadevrt_LIBRARY=/usr/local/cuda-11.8/lib64/libcudadevrt.a CUDA_cudart_static_LIBRARY=/usr/local/cuda-11.8/lib64/libcudart_static.a CUDA_cufft_LIBRARY=/usr/local/cuda-11.8/lib64/libcufft.so CUDA_cupti_LIBRARY=/usr/local/cuda-11.8/extras/CUPTI/lib64/libcupti.so CUDA_curand_LIBRARY=/usr/local/cuda-11.8/lib64/libcurand.so CUDA_cusolver_LIBRARY=/usr/local/cuda-11.8/lib64/libcusolver.so CUDA_cusparse_LIBRARY=/usr/local/cuda-11.8/lib64/libcusparse.so CUDA_nppc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppc.so CUDA_nppial_LIBRARY=/usr/local/cuda-11.8/lib64/libnppial.so CUDA_nppicc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppicc.so CUDA_nppidei_LIBRARY=/usr/local/cuda-11.8/lib64/libnppidei.so CUDA_nppif_LIBRARY=/usr/local/cuda-11.8/lib64/libnppif.so CUDA_nppig_LIBRARY=/usr/local/cuda-11.8/lib64/libnppig.so CUDA_nppim_LIBRARY=/usr/local/cuda-11.8/lib64/libnppim.so CUDA_nppist_LIBRARY=/usr/local/cuda-11.8/lib64/libnppist.so CUDA_nppisu_LIBRARY=/usr/local/cuda-11.8/lib64/libnppisu.so CUDA_nppitc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppitc.so CUDA_npps_LIBRARY=/usr/local/cuda-11.8/lib64/libnpps.so CUDA_nvToolsExt_LIBRARY=/usr/local/cuda-11.8/lib64/libnvToolsExt.so CUDA_rt_LIBRARY=/usr/lib/x86_64-linux-gnu/librt.so DETERMINISTIC=OFF DOXYGEN_DOT_EXECUTABLE=/usr/bin/dot DOXYGEN_EXECUTABLE=DOXYGEN_EXECUTABLE-NOTFOUND GENERATE_MARIAN_INSTALL_TARGETS=OFF GIT_EXECUTABLE=/usr/bin/git INTEL_ROOT=/opt/intel INTGEMM_CPUID_ENVIRONMENT=ON INTGEMM_DONT_BUILD_TESTS=ON MKL_CORE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_core.a MKL_INCLUDE_DIR=/opt/intel/mkl/include MKL_INCLUDE_DIRS=/opt/intel/mkl/include MKL_INTERFACE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a MKL_LIBRARIES=-Wl,--start-group/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a/opt/intel/mkl/lib/intel64/libmkl_sequential.a/opt/intel/mkl/lib/intel64/libmkl_core.a-Wl,--end-group MKL_ROOT=/opt/intel/mkl MKL_SEQUENTIAL_LAYER_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_sequential.a SPM_ARTIFACT_NAME=sentencepiece SPM_BUILD_TEST=OFF SPM_COVERAGE=OFF SPM_ENABLE_NFKC_COMPILE=OFF SPM_ENABLE_SHARED=OFF SPM_ENABLE_TCMALLOC=ON SPM_ENABLE_TENSORFLOW_SHARED=OFF SPM_NO_THREADLOCAL=OFF SPM_TCMALLOC_STATIC=OFF SPM_USE_BUILTIN_PROTOBUF=ON SQLITE_ENABLE_ASSERT_HANDLER=OFF SQLITE_ENABLE_COLUMN_METADATA=ON SQLITE_USE_LEGACY_STRUCT=OFF SSE2_FOUND=true SSE3_FOUND=true SSE4_1_FOUND=true SSE4_2_FOUND=true SSSE3_FOUND=true TCMALLOC_LIB=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so USE_APPLE_ACCELERATE=OFF USE_CCACHE=OFF USE_CUDNN=OFF USE_DOXYGEN=ON USE_FBGEMM=OFF USE_MKL=ON USE_MPI=OFF USE_NCCL=ON USE_OPENMP=OFF USE_SENTENCEPIECE=ON USE_STATIC_LIBS=OFF WORMHOLE=OFF cblas_openblas_INCLUDE=/usr/include/x86_64-linux-gnu cblas_openblas_LIBRARY=/usr/lib/x86_64-linux-gnu/libopenblas.so
Context

Here is the log of the marian-decoder:
[2023-12-19 18:14:58] [config] allow-unk: false [2023-12-19 18:14:58] [config] authors: false [2023-12-19 18:14:58] [config] beam-size: 12 [2023-12-19 18:14:58] [config] bert-class-symbol: "[CLS]" [2023-12-19 18:14:58] [config] bert-mask-symbol: "[MASK]" [2023-12-19 18:14:58] [config] bert-masking-fraction: 0.15 [2023-12-19 18:14:58] [config] bert-sep-symbol: "[SEP]" [2023-12-19 18:14:58] [config] bert-train-type-embeddings: true [2023-12-19 18:14:58] [config] bert-type-vocab-size: 2 [2023-12-19 18:14:58] [config] best-deep: false [2023-12-19 18:14:58] [config] build-info: "" [2023-12-19 18:14:58] [config] check-nan: false [2023-12-19 18:14:58] [config] cite: false [2023-12-19 18:14:58] [config] cpu-threads: 1 [2023-12-19 18:14:58] [config] data-threads: 4 [2023-12-19 18:14:58] [config] dec-cell: ssru [2023-12-19 18:14:58] [config] dec-cell-base-depth: 2 [2023-12-19 18:14:58] [config] dec-cell-high-depth: 1 [2023-12-19 18:14:58] [config] dec-depth: 2 [2023-12-19 18:14:58] [config] devices: [2023-12-19 18:14:58] [config] - 0 [2023-12-19 18:14:58] [config] dim-emb: 256 [2023-12-19 18:14:58] [config] dim-rnn: 512 [2023-12-19 18:14:58] [config] dim-vocabs: [2023-12-19 18:14:58] [config] - 16000 [2023-12-19 18:14:58] [config] - 16000 [2023-12-19 18:14:58] [config] dump-config: "" [2023-12-19 18:14:58] [config] enc-cell: gru [2023-12-19 18:14:58] [config] enc-cell-depth: 1 [2023-12-19 18:14:58] [config] enc-depth: 4 [2023-12-19 18:14:58] [config] enc-type: bidirectional [2023-12-19 18:14:58] [config] factors-combine: sum [2023-12-19 18:14:58] [config] factors-dim-emb: 0 [2023-12-19 18:14:58] [config] force-decode: false [2023-12-19 18:14:58] [config] gemm-type: float32 [2023-12-19 18:14:58] [config] ignore-model-config: false [2023-12-19 18:14:58] [config] input: [2023-12-19 18:14:58] [config] - stdin [2023-12-19 18:14:58] [config] input-types: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] interpolate-env-vars: false [2023-12-19 18:14:58] [config] layer-normalization: true [2023-12-19 18:14:58] [config] lemma-dependency: "" [2023-12-19 18:14:58] [config] lemma-dim-emb: 0 [2023-12-19 18:14:58] [config] log: "" [2023-12-19 18:14:58] [config] log-level: info [2023-12-19 18:14:58] [config] log-time-zone: "" [2023-12-19 18:14:58] [config] max-length: 1000 [2023-12-19 18:14:58] [config] max-length-crop: false [2023-12-19 18:14:58] [config] max-length-factor: 3 [2023-12-19 18:14:58] [config] maxi-batch: 1 [2023-12-19 18:14:58] [config] maxi-batch-sort: none [2023-12-19 18:14:58] [config] mini-batch: 1 [2023-12-19 18:14:58] [config] mini-batch-words: 0 [2023-12-19 18:14:58] [config] model-mmap: false [2023-12-19 18:14:58] [config] models: [2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.bin [2023-12-19 18:14:58] [config] n-best: false [2023-12-19 18:14:58] [config] no-spm-decode: false [2023-12-19 18:14:58] [config] normalize: 0 [2023-12-19 18:14:58] [config] num-devices: 0 [2023-12-19 18:14:58] [config] optimize: false [2023-12-19 18:14:58] [config] output: stdout [2023-12-19 18:14:58] [config] output-approx-knn: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] output-omit-bias: false [2023-12-19 18:14:58] [config] output-sampling: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] precision: [2023-12-19 18:14:58] [config] - float16 [2023-12-19 18:14:58] [config] quantize-range: 0 [2023-12-19 18:14:58] [config] quiet: false [2023-12-19 18:14:58] [config] quiet-translation: false [2023-12-19 18:14:58] [config] 
relative-paths: false [2023-12-19 18:14:58] [config] right-left: false [2023-12-19 18:14:58] [config] seed: 0 [2023-12-19 18:14:58] [config] shortlist: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] skip: false [2023-12-19 18:14:58] [config] skip-cost: false [2023-12-19 18:14:58] [config] stat-freq: 0 [2023-12-19 18:14:58] [config] tied-embeddings: false [2023-12-19 18:14:58] [config] tied-embeddings-all: true [2023-12-19 18:14:58] [config] tied-embeddings-src: false [2023-12-19 18:14:58] [config] transformer-aan-activation: swish [2023-12-19 18:14:58] [config] transformer-aan-depth: 2 [2023-12-19 18:14:58] [config] transformer-aan-nogate: false [2023-12-19 18:14:58] [config] transformer-decoder-autoreg: rnn [2023-12-19 18:14:58] [config] transformer-decoder-dim-ffn: 0 [2023-12-19 18:14:58] [config] transformer-decoder-ffn-depth: 0 [2023-12-19 18:14:58] [config] transformer-depth-scaling: false [2023-12-19 18:14:58] [config] transformer-dim-aan: 1024 [2023-12-19 18:14:58] [config] transformer-dim-ffn: 1024 [2023-12-19 18:14:58] [config] transformer-ffn-activation: relu [2023-12-19 18:14:58] [config] transformer-ffn-depth: 2 [2023-12-19 18:14:58] [config] transformer-guided-alignment-layer: last [2023-12-19 18:14:58] [config] transformer-heads: 4 [2023-12-19 18:14:58] [config] transformer-no-projection: false [2023-12-19 18:14:58] [config] transformer-pool: false [2023-12-19 18:14:58] [config] transformer-postprocess: dan [2023-12-19 18:14:58] [config] transformer-postprocess-emb: d [2023-12-19 18:14:58] [config] transformer-postprocess-top: "" [2023-12-19 18:14:58] [config] transformer-preprocess: "" [2023-12-19 18:14:58] [config] transformer-rnn-projection: false [2023-12-19 18:14:58] [config] transformer-tied-layers: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] transformer-train-position-embeddings: false [2023-12-19 18:14:58] [config] tsv: false [2023-12-19 18:14:58] [config] tsv-fields: 0 [2023-12-19 18:14:58] [config] type: transformer [2023-12-19 18:14:58] [config] ulr: false [2023-12-19 18:14:58] [config] ulr-dim-emb: 0 [2023-12-19 18:14:58] [config] ulr-trainable-transformation: false [2023-12-19 18:14:58] [config] version: v1.10.14; 1cc16bc4 2022-10-10 12:00:00 +0200 [2023-12-19 18:14:58] [config] vocabs: [2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.spv [2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.spv [2023-12-19 18:14:58] [config] weights: [2023-12-19 18:14:58] [config] [] [2023-12-19 18:14:58] [config] word-penalty: 0 [2023-12-19 18:14:58] [config] word-scores: false [2023-12-19 18:14:58] [config] workspace: 512 [2023-12-19 18:14:58] [config] Loaded model has been created with Marian v1.10.14; 1cc16bc4 2022-10-10 12:00:00 +0200 [2023-12-19 18:14:58] [data] Loading vocabulary from text file /data/workspace/experiments/marian/models/default/model.spv [2023-12-19 18:14:58] [data] Loading vocabulary from text file /data/workspace/experiments/marian/models/default/model.spv [2023-12-19 18:14:58] Loading model from /data/workspace/experiments/marian/models/default/model.bin [2023-12-19 18:14:58] [memory] Extending reserved space to 512 MB (device cpu0) [2023-12-19 18:14:58] Loaded model config [2023-12-19 18:14:58] Loading scorer of type transformer as feature F0 [2023-12-19 18:14:59] [memory] Reserving 81 MB, device cpu0 [2023-12-19 18:14:59] Error: Unsupported type for element-wise operation: float16 [2023-12-19 18:14:59] Error: Aborted from void 
marian::cpu::Element(const Functor&, marian::Tensor, Tensors ...) [with Functor = marian::functional::Assign<marian::functional::Var<1>, marian::functional::BinaryFunctor<marian::functional::elem::Mult, marian::functional::Capture, marian::functional::Assignee<2> > >; Tensors = {IntrusivePtr<marian::TensorBase>}; marian::Tensor = IntrusivePtr<marian::TensorBase>] in /data/workspace/code/marian/src/tensors/cpu/element.h:122 [CALL STACK] [0x55c3c328d12e] void marian::cpu:: Element <marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>,IntrusivePtr<marian::TensorBase>>(marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>> const&, IntrusivePtr<marian::TensorBase>, IntrusivePtr<marian::TensorBase>) + 0x2ce [0x55c3c328d76c] void marian:: Element <marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>,IntrusivePtr<marian::TensorBase>>(marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>, IntrusivePtr<marian::TensorBase>, IntrusivePtr<marian::TensorBase>) + 0x1bc [0x55c3c328d8a1] std::_Function_handler<void (),marian::ScalarMultNodeOp::forwardOps()::{lambda()#1}>:: _M_invoke (std::_Any_data const&) + 0xa1 [0x55c3c32a653f] marian::Node:: forward () + 0x21f [0x55c3c311f8f5] marian::ExpressionGraph:: forward (std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&, bool) + 0x205 [0x55c3c3121019] marian::ExpressionGraph:: forwardNext () + 0x2d9 [0x55c3c32f5459] marian::BeamSearch:: search (std::shared_ptr<marian::ExpressionGraph>, std::shared_ptr<marian::data::CorpusBatch>) + 0x42f9 [0x55c3c2fb2e8a] marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}:: operator() (unsigned long) const + 0x6ba [0x55c3c2fb4de4] marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1}:: operator() () const + 0x34 [0x55c3c2fb59c4] std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>:: _M_invoke (std::_Any_data const&) + 0x34 [0x55c3c2f5865d] std::__future_base::_State_baseV2:: _M_do_set (std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 0x2d [0x7fdeb06c34df] + 0x114df [0x55c3c2f59b7c] std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned 
long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>:: _M_run () + 0xfc [0x55c3c2f5900e] std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>:: _M_run () + 0x16e [0x7fdeb05a6df4] + 0xd6df4 [0x7fdeb06ba609] + 0x8609 [0x7fdeb0291353] clone + 0x43 Aborted (core dumped)
Other information:

--fp16
dec-cell: ssru

Actually, looking at this, it seems that float16 is not enabled at all, whereas the documentation (https://marian-nmt.github.io/docs/cmd/marian-decoder/) says it is.
Hi, fp16 isn't a CPU type, that's GPU-only. The error message could be a bit clearer, but that's not supposed to work.
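For reference, a sketch of the GPU invocation where float16 is expected to apply, assuming a CUDA-enabled build and the model/vocabulary files from the report (the device id 0 is an assumption):

# GPU decoding: float16 inference is supported on the GPU back end
echo "▁Hello" | marian-decoder -m model.bin -v model.spv model.spv --devices 0 --precision float16

The --fp16 flag mentioned above should act as a shortcut for the same precision setting.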
Thank you very much for your reply. I would suggest explaining this better in the documentation as well.