[BUG] Panic on pyflyte run --remote #4421

Closed
2 tasks done
AmadiL opened this issue Nov 14, 2023 · 6 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue waiting for reporter Used for when we need input from the bug reporter

Comments

@AmadiL

AmadiL commented Nov 14, 2023

Describe the bug

Every time I run pyflyte run --remote, flyte-binary crashes with:

{
    "json": {
        "src": "base.go:55"
    },
    "level": "fatal",
    "msg": "panic-ed for request: [id:\u003cresource_type:TASK project:\"default\" domain:\"main\" name:\"flyte_example.workflows.slope\" version:\"80GZB-Kw1bLBbjYRj6fnmg==\" \u003e spec:\u003ctemplate:\u003cid:\u003cresource_type:TASK project:\"default\" domain:\"main\" name:\"flyte_example.workflows.slope\" version:\"80GZB-Kw1bLBbjYRj6fnmg==\" \u003e type:\"python-task\" metadata:\u003cruntime:\u003ctype:FLYTE_SDK version:\"1.10.0\" flavor:\"python\" \u003e retries:\u003c\u003e \u003e interface:\u003cinputs:\u003cvariables:\u003ckey:\"x\" value:\u003ctype:\u003ccollection_type:\u003csimple:INTEGER \u003e \u003e description:\"x\" \u003e \u003e variables:\u003ckey:\"y\" value:\u003ctype:\u003ccollection_type:\u003csimple:INTEGER \u003e \u003e description:\"y\" \u003e \u003e \u003e outputs:\u003cvariables:\u003ckey:\"o0\" value:\u003ctype:\u003csimple:FLOAT \u003e description:\"o0\" \u003e \u003e \u003e \u003e container:\u003cimage:\"cr.flyte.org/flyteorg/flytekit:py3.11-1.10.0\" args:\"pyflyte-fast-execute\" args:\"--additional-distribution\" args:\"s3://flyte-******/default/main/ZY2AUNVDSAU77PEEEXAYEOMJMM======/script_mode.tar.gz\" args:\"--dest-dir\" args:\"/root\" args:\"--\" args:\"pyflyte-execute\" args:\"--inputs\" args:\"{{.input}}\" args:\"--output-prefix\" args:\"{{.outputPrefix}}\" args:\"--raw-output-data-prefix\" args:\"{{.rawOutputDataPrefix}}\" args:\"--checkpoint-path\" args:\"{{.checkpointOutputPrefix}}\" args:\"--prev-checkpoint\" args:\"{{.prevCheckpointPrefix}}\" args:\"--resolver\" args:\"flytekit.core.python_auto_container.default_task_resolver\" args:\"--\" args:\"task-module\" args:\"flyte_example.workflows\" args:\"task-name\" args:\"slope\" resources:\u003c\u003e \u003e \u003e description:\u003clong_description:\u003cformat:DESCRIPTION_FORMAT_RST \u003e \u003e \u003e ] with err: runtime error: invalid memory address or nil pointer dereference with Stack: goroutine 4321 
[running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).interceptPanic(0xc010600000, {0x42bac40, 0xc018a267e0}, {0x42a79e0?, 0xc018a26810})\n\t/flyteorg/build/flyteadmin/pkg/rpc/adminservice/base.go:55 +0x85\npanic({0x2c21a60, 0x5b41910})\n\t/usr/local/go/src/runtime/panic.go:884 +0x212\ngithub.com/flyteorg/flyte/flyteadmin/pkg/manager/impl/util.fromAdminProtoTaskResourceSpec({_, _}, _)\n\t/flyteorg/build/flyteadmin/pkg/manager/impl/util/resources.go:60 +0x9b\ngithub.com/flyteorg/flyte/flyteadmin/pkg/manager/impl/util.GetTaskResources({_, _}, _, {_, _}, {_, _})\n\t/flyteorg/build/flyteadmin/pkg/manager/impl/util/resources.go:111 +0x37e\ngithub.com/flyteorg/flyte/flyteadmin/pkg/manager/impl.(*TaskManager).CreateTask(0xc00fd8db90, {0x42bac40, 0xc018a267e0}, {0xc018a5e1c0, 0xc018a68180, {}, {0x0, 0x0, 0x0}, 0x0})\n\t/flyteorg/build/flyteadmin/pkg/manager/impl/task_manager.go:64 +0xbb\ngithub.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).CreateTask.func1()\n\t/flyteorg/build/flyteadmin/pkg/rpc/adminservice/task.go:25 +0x9b\ngithub.com/flyteorg/flyte/flytestdlib/promutils.StopWatch.Time({{0x7fa5240a5140?, 0xc0101b5800?}, 0xca4e00?}, 0xc0186f3850)\n\t/flyteorg/build/flytestdlib/promutils/scope.go:58 +0xc2\ngithub.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice/util.(*RequestMetrics).Time(...)\n\t/flyteorg/build/flyteadmin/pkg/rpc/adminservice/util/metrics.go:33\ngithub.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).CreateTask(0xc010600000, {0x42bac40?, 0xc018a267e0?}, 0xc018a798c0?)\n\t/flyteorg/build/flyteadmin/pkg/rpc/adminservice/task.go:24 +0x169\ngithub.com/flyteorg/flyte/flyteidl/gen/pb-go/flyteidl/service._AdminService_CreateTask_Handler.func1({0x42bac40, 0xc018a267e0}, {0x2f82c00?, 0xc018a26810})\n\t/flyteorg/build/flyteidl/gen/pb-go/flyteidl/service/admin.pb.go:1077 
+0x78\ngithub.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1({0x42bac40, 0xc018a267e0}, {0x2f82c00, 0xc018a26810}, 0x7fa4fffe24d8?, 0xc0105d2f30)\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0x87\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x42bac40?, 0xc018a267e0?}, {0x2f82c00?, 0xc018a26810?})\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x3a\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x42bac40, 0xc018a267e0}, {0x2f82c00, 0xc018a26810}, 0xc011a4ca20?, 0x2c1fa80?)\n\t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xbe\ngithub.com/flyteorg/flyte/flyteidl/gen/pb-go/flyteidl/service._AdminService_CreateTask_Handler({0x30eadc0?, 0xc010600000}, {0x42bac40, 0xc018a267e0}, 0xc0184d3dc0, 0xc0005dc8a0)\n\t/flyteorg/build/flyteidl/gen/pb-go/flyteidl/service/admin.pb.go:1079 +0x138\ngoogle.golang.org/grpc.(*Server).processUnaryRPC(0xc0005f2000, {0x42c9be0, 0xc011689a00}, 0xc0184e0360, 0xc00f0c1c20, 0x5b64020, 0x0)\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1337 +0xdf0\ngoogle.golang.org/grpc.(*Server).handleStream(0xc0005f2000, {0x42c9be0, 0xc011689a00}, 0xc0184e0360, 0x0)\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1714 +0xa2f\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1()\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:959 +0x98\ncreated by google.golang.org/grpc.(*Server).serveStreams.func1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:957 +0x18c\n",
    "ts": "2023-11-14T15:12:17Z"
}
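
The fatal entry above is hard to read because the request dump and the Go stack trace are packed into one escaped "msg" string. A minimal sketch for pulling out the error text and the Flyte frames follows, assuming the entry is available as a single line of valid JSON and that the message keeps the "... with err: ... with Stack: ..." shape seen here. The helper name summarize_panic and the abbreviated SAMPLE entry are illustrative only and are not part of Flyte.

```python
import json

# Abbreviated stand-in for the log entry above (most of the request dump and
# most stack frames are omitted). Raw strings keep the JSON escapes intact.
SAMPLE = (
    r'{"json": {"src": "base.go:55"}, "level": "fatal", '
    r'"msg": "panic-ed for request: [id:\u003cresource_type:TASK \u003e ] '
    r'with err: runtime error: invalid memory address or nil pointer dereference '
    r'with Stack: goroutine 4321 [running]:\n'
    r'runtime/debug.Stack()\n'
    r'\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\n'
    r'github.com/flyteorg/flyte/flyteadmin/pkg/manager/impl/util.fromAdminProtoTaskResourceSpec({_, _}, _)\n'
    r'\t/flyteorg/build/flyteadmin/pkg/manager/impl/util/resources.go:60 +0x9b\n", '
    r'"ts": "2023-11-14T15:12:17Z"}'
)


def summarize_panic(log_line: str) -> dict:
    """Decode one JSON log entry; return its level, error text, and Flyte frames."""
    # json.loads turns \u003c and \u003e back into < and >, and \n into newlines.
    entry = json.loads(log_line)
    # Split "<request dump> with err: <error> with Stack: <stack trace>".
    head, _, stack = entry["msg"].partition(" with Stack: ")
    _, _, err = head.rpartition(" with err: ")
    # Function lines in a Go trace are unindented; file lines start with a tab.
    # Drop the trailing argument list so only the qualified function name remains.
    frames = [
        line.rsplit("(", 1)[0]
        for line in stack.splitlines()
        if line.startswith("github.com/flyteorg/")
    ]
    return {"level": entry["level"], "error": err, "flyte_frames": frames}


if __name__ == "__main__":
    summary = summarize_panic(SAMPLE)
    print(summary["level"], "-", summary["error"])
    for frame in summary["flyte_frames"]:
        print("  ", frame)
```

Applied to the full entry, the first Flyte frame listed below interceptPanic is the function that actually panicked, which in this trace is fromAdminProtoTaskResourceSpec at resources.go:60.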

Expected behavior

Remote execution of the workflow.

Additional context to reproduce

python 3.11.4
flytekit 1.10.0
flyte-binary 1.10.0

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@AmadiL AmadiL added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Nov 14, 2023
@AmadiL
Author

AmadiL commented Nov 14, 2023

I've just noticed it happens after I update task-resource-attribute:

$ flytectl update task-resource-attribute --attrFile tra.yaml
The following changes are to be applied.
--- before
+++ after
@@ -1,2 +1,6 @@
-null
+TaskResourceAttributes:
+    limits:
+        cpu: "2"
+        gpu: "1"
+        memory: 4Gi
 

Continue? [y/n]: y
Updated attributes from default project and domain mai

@AmadiL
Author

AmadiL commented Nov 14, 2023

Update: the errors disappeared after I added defaults to the TaskResourceAttributes.
I guess this is still unexpected behaviour.
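
For anyone hitting the same panic, the workaround above amounts to making sure the attributes carry both a defaults and a limits section before they are applied. A minimal pre-flight check is sketched below, assuming the attributes have already been loaded into a plain dict. The helper missing_sections is hypothetical, the key names follow the diff and the comment above, and the defaults values are placeholders rather than values from this thread. It does not confirm the root cause; it only mirrors the reported workaround.

```python
# Sections the workaround in this thread suggests should both be present.
# The key names are taken from the diff ("limits") and the follow-up ("defaults").
REQUIRED_SECTIONS = ("defaults", "limits")


def missing_sections(attrs: dict) -> list[str]:
    """Return the names of required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not attrs.get(name)]


# The limits-only attributes from the flytectl diff above.
limits_only = {"limits": {"cpu": "2", "gpu": "1", "memory": "4Gi"}}

# Same limits plus a defaults section; the defaults values are placeholders.
with_defaults = {
    "defaults": {"cpu": "1", "memory": "1Gi"},
    "limits": {"cpu": "2", "gpu": "1", "memory": "4Gi"},
}

if __name__ == "__main__":
    for label, attrs in (("limits only", limits_only), ("with defaults", with_defaults)):
        missing = missing_sections(attrs)
        print(label, "->", "ok" if not missing else f"missing: {', '.join(missing)}")
```

Running a check like this before flytectl update task-resource-attribute would flag the limits-only file that preceded the panic in this report.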

@eapolinario
Contributor

@AmadiL, what architecture are you running this on? How are you deploying flyte-binary?

@eapolinario eapolinario added waiting for reporter Used for when we need input from the bug reporter and removed untriaged This issues has not yet been looked at by the Maintainers labels Nov 30, 2023
@pingsutw pingsutw added the flytekit FlyteKit Python related issue label Feb 15, 2024
@kumare3
Contributor

kumare3 commented Apr 17, 2024

This is really odd: flyteadmin is logging this panic at fatal level. @AmadiL, can you provide more backend logs? It seems like some bad state in the backend.
cc @katrogan fyi

@amadeusz-ds

Sorry, I no longer have access to that project.

I deployed it using the flyte-binary Helm chart, and by default TaskResourceAttributes were null, which caused this error.
@eapolinario @kumare3

@kumare3
Contributor

kumare3 commented Apr 17, 2024

Aah good to know - bad config. Let's close this

@kumare3 kumare3 closed this as completed Apr 17, 2024