Running into Cublas Error: 7 for target factors for marian 1.12 #1023
Comments
Same problem here: non-factored models work, but factored models (with both source and target factors) fail with the same error. Our configuration is the newest marian-dev and
I have the same issue. Given that you have been waiting for 3 weeks with no response from the developers, I think it is fair to assume that Marian is no longer supported.
I don't have commit access. If @mjpost wants to claim Marian is still maintained (https://x.com/mjpost/status/1799130562344656901), he should address this issue.
@hieuhoang is still fixing bugs in Moses!
Bug description
Marian 1.12 (65bf82ffce52f4854295d8b98482534f176d494e) runs into this error for target factored data:
How to reproduce
Run marian 1.12 compiled against CUDA 11+ with target factors.
I am trying to train Marian models from scratch using factored data. Training succeeds with source factors only, but training with target factors, or with both source and target factors, fails the cuBLAS check.
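For readers trying to interpret the number in the title: assuming the "7" is a raw `cublasStatus_t` value (the issue does not confirm how Marian formats it), it would correspond to `CUBLAS_STATUS_INVALID_VALUE` in the cuBLAS headers. The helper below is only an illustrative lookup; `decode_cublas_status` is a hypothetical name and not part of Marian or cuBLAS.

```python
# Numeric values of the cublasStatus_t enum as listed in the cuBLAS headers.
# Assumption: the number in "Cublas Error: 7" is one of these raw values.
CUBLAS_STATUS_NAMES = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_status(code: int) -> str:
    """Return the symbolic name for a numeric cuBLAS status (hypothetical helper)."""
    return CUBLAS_STATUS_NAMES.get(code, f"unknown cuBLAS status {code}")


print(decode_cublas_status(7))
```

An invalid-value status generally means an unsupported argument was passed to a cuBLAS call, which would be consistent with the failure appearing only on the target-factor code path, though that is a guess rather than a diagnosis.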
I compile 65bf82ffce52f4854295d8b98482534f176d494e in a Docker container and have tried this with a set of CUDA, NVIDIA and Marian versions on Ubuntu 22.04 and 18.04.
Variants that were tried:
Context
Marian output
marian version (in the docker environment)
nvidia-smi output
host system 1
host system 2
failing marian 1.12 cuda 12.3 docker container on host 1
working marian 1.11 cuda 10.2 docker container on host 1
failing marian 1.12 cuda 12.3 docker container on host 2
working marian 1.11 cuda 10.2 docker container on host 2
I notice that the CUDA version reported by nvidia-smi seems to be whichever is higher, the host system's or the Docker container's, but all containers have been built to run the CUDA version packaged in them.