
A task never ends when the task name is too long and the attempt cannot be killed. #1592

Open
utani-co opened this issue Jun 2, 2021 · 1 comment


utani-co commented Jun 2, 2021

Hi all,

In our environment, a task never ended when a FileSystemException occurred because the task name was too long, and the attempt could not be killed.

Here is a simple example that raises this error:
(In our actual workflow, the task names become long because the tasks are deeply nested.)

Workflow name: task_name_is_too_long

timezone: Asia/Tokyo

+LoooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooogName:
  echo>: "never ending task"

When we ran this dig file, the attempt status remained "Pending" and the attempt never ended.
Even when we killed the attempt, its status remained "Canceling" and never became "Canceled".

The exception log is as below:

2021-06-02 18:21:58.492 +0900 [INFO] (XNIO-1 task-12): Starting a new session project id=1 workflow name=task_name_is_too_long session_time=2021-06-02T18:21:59+09:00
2021-06-02 18:21:59.643 +0900 [ERROR] (task-thread-1): Uncaught exception. Task queue will detect this failure and this task will be retried later.
java.lang.RuntimeException: java.nio.file.FileSystemException: /home/digdag/logs/tasklog/2021-06-02/0.1task_name_is_too_long@20210602T182159+0900/+task_name_is_too_long+LoooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooogName@[email protected]: File name too long
        at com.google.common.base.Throwables.propagate(Throwables.java:241)
        at io.digdag.core.log.LocalFileLogServerFactory$LocalFileLogServer.newDirectTaskLogger(LocalFileLogServerFactory.java:149)
        at io.digdag.core.log.LogServerManager.newInProcessTaskLogger(LogServerManager.java:85)
        at io.digdag.core.agent.InProcessTaskCallbackApi.newTaskLogger(InProcessTaskCallbackApi.java:99)
        at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:127)
        at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invokeMain(DigdagTimedMethodInterceptor.java:58)
        at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invoke(DigdagTimedMethodInterceptor.java:31)
        at io.digdag.core.agent.MultiThreadAgent.lambda$null$0(MultiThreadAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /home/digdag/logs/tasklog/2021-06-02/0.1task_name_is_too_long@20210602T182159+0900/+task_name_is_too_long+LoooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooogName@[email protected]: File name too long
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
        at java.nio.file.Files.newOutputStream(Files.java:216)
        at io.digdag.core.log.CountingLogOutputStream.<init>(CountingLogOutputStream.java:19)
        at io.digdag.core.log.LocalFileLogServerFactory$LocalFileLogServer$LocalFileDirectTaskLogger.openNewFile(LocalFileLogServerFactory.java:181)
        at io.digdag.core.log.LocalFileLogServerFactory$LocalFileLogServer$LocalFileDirectTaskLogger.<init>(LocalFileLogServerFactory.java:172)
        at io.digdag.core.log.LocalFileLogServerFactory$LocalFileLogServer.newDirectTaskLogger(LocalFileLogServerFactory.java:146)
        ... 11 common frames omitted
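
For context, most Linux filesystems limit a single file name (one path component) to 255 bytes, and the task-name component of the log file path exceeds that here. The following is a minimal standalone sketch (not Digdag code; the /tmp path and the name length are arbitrary assumptions) that reproduces the same java.nio.file.FileSystemException: File name too long:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class LongFileNameRepro {
        public static void main(String[] args) throws Exception {
            // Build a file name longer than the typical 255-byte component limit.
            StringBuilder name = new StringBuilder("Lo");
            for (int i = 0; i < 300; i++) {
                name.append('o');
            }
            name.append("ogName.log");

            // Opening an output stream on this path throws
            // java.nio.file.FileSystemException: ... File name too long
            Path path = Paths.get("/tmp", name.toString());
            Files.newOutputStream(path).close();
        }
    }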

Our environment is as below:

  • digdag: v0.10.0 (server mode)
  • OS: CentOS7.9
  • DB: PostgreSQL11
  • Task Log Storage: local

Expected Results

The task should end with the "Error" status and the attempt should end with the "Failure" status when this error happens.
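
One possible mitigation, purely as a hypothetical sketch and not Digdag's actual or planned implementation (the helper name, class name, and byte cap are assumptions), would be to shorten over-long log file-name components by truncating them and appending a short digest, so the log file can still be created and the task no longer hangs:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class FileNameShortener {
        // Hypothetical helper: keep a file-name component under maxBytes by
        // truncating it and appending 8 hex chars of its SHA-256 digest,
        // so different long task names still map to distinct file names.
        static String shorten(String name, int maxBytes) throws Exception {
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            if (utf8.length <= maxBytes) {
                return name;
            }
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(utf8);
            StringBuilder suffix = new StringBuilder("-");
            for (int i = 0; i < 4; i++) {
                suffix.append(String.format("%02x", digest[i]));
            }
            // Suffix is ASCII (1 byte per char); truncation may cut a multi-byte
            // character, which is acceptable for a log file name.
            int keep = maxBytes - suffix.length();
            String truncated = new String(utf8, 0, keep, StandardCharsets.UTF_8);
            return truncated + suffix;
        }
    }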

@hiroyuki-sato
Contributor

Hello, @utani-co
Thank you for your report. I reproduced this issue in my environment.

This issue is probably related to #729.
