AWS jobstore buckets should inherit owner tags from cluster #4029

Closed
glennhickey opened this issue Feb 1, 2022 · 3 comments

@glennhickey
Contributor

glennhickey commented Feb 1, 2022

I've been running jobs with AWS autoscale for the first time in a bit and it seems when Cactus fails, the jobstore is left in an unusable state. For example, I just aborted a workflow with aws:us-west-2:glennhickey-jobstore-pa3 as the jobstore.

But if I try to run it with --restart I get

cactus-graphmap aws:us-west-2:glennhickey-jobstore-pa3 ./apes.pan.txt s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.minigraph.gfa.gz s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.pan.paf --realTimeLogging --reference hg38  --base --nodeTypes r4.8xlarge:1.25 --maxNodes 25 --nodeStorage 1000 --batchSystem mesos --provisioner aws --defaultPreemptable  --realTimeLogging --mapCores 32 --outputFasta  s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.gfa.fa --delFilter 5000000 --logFile paf.log --restart
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:55+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:56+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
^CTraceback (most recent call last):

and the message loops forever. Same deal if I try

toil clean aws:us-west-2:glennhickey-jobstore-pa3 
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:47+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:48+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
[2022-02-01T14:28:52+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 16s.

This has happened every time my workflow has aborted, for whatever reason. Not sure if it's related to changes in Toil or our AWS environment...

┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-1140

@glennhickey
Contributor Author

Oh boy, I bet this is about me forgetting to set TOIL_OWNER_TAG. If it is, I would like to change this issue to a feature request:

When I create a cluster with toil launch-cluster --owner MYEMAIL, would it be possible to have TOIL_OWNER_TAG set to MYEMAIL by default whenever I open a shell on the cluster?

@glennhickey glennhickey changed the title from "AWS jobstore buckets can't be cleaned or resumed" to "AWS jobstore buckets should inherit owner tags from cluster" Feb 1, 2022
@adamnovak
Member

Sounds like there are two problems here:

  1. Without the bucket, we can't clean the job store and destroy the SimpleDB domain.
  2. It would be nice if the cluster's owner tag became the default TOIL_OWNER_TAG value in the default environment on the cluster (maybe in something mounted as /etc/profile in the appliance container?).
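
A minimal sketch of what the second point could look like, not how Toil actually does it. It assumes the cluster's owner value is already available as a string when the appliance is set up, and that the generated snippet would be placed somewhere login shells source it (the /etc/profile idea above). The helper name `render_owner_profile` is hypothetical. The snippet only sets TOIL_OWNER_TAG when the user has not already exported one, so an explicit value still wins.

```python
import shlex


def render_owner_profile(owner: str) -> str:
    """Return a POSIX shell snippet that defaults TOIL_OWNER_TAG to `owner`.

    Hypothetical helper: nothing here is part of Toil's API.
    """
    # shlex.quote makes the value safe to embed in a shell assignment,
    # even if it contains spaces or other shell metacharacters.
    quoted = shlex.quote(owner)
    return (
        # Only fill in the default when the variable is unset or empty,
        # so a value the user exported themselves is left alone.
        'if [ -z "${TOIL_OWNER_TAG:-}" ]; then\n'
        f"    export TOIL_OWNER_TAG={quoted}\n"
        "fi\n"
    )


if __name__ == "__main__":
    # MYEMAIL stands in for whatever was passed to `toil launch-cluster --owner`.
    print(render_owner_profile("MYEMAIL"), end="")
```

With something like this sourced at login, commands run from a shell on the cluster would pick up the owner tag without anyone having to remember to export it.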

@adamnovak
Member

The first problem is tracked by #3924 I think.
