AWS jobstore buckets should inherit owner tags from cluster #4029

glennhickey · 2022-02-01T14:30:31Z

I've been running jobs with AWS autoscaling for the first time in a while, and it seems that when Cactus fails, the jobstore is left in an unusable state. For example, I just aborted a workflow with aws:us-west-2:glennhickey-jobstore-pa3 as the jobstore.

But if I try to run it with --restart, I get:

cactus-graphmap aws:us-west-2:glennhickey-jobstore-pa3 ./apes.pan.txt s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.minigraph.gfa.gz s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.pan.paf --realTimeLogging --reference hg38  --base --nodeTypes r4.8xlarge:1.25 --maxNodes 25 --nodeStorage 1000 --batchSystem mesos --provisioner aws --defaultPreemptable  --realTimeLogging --mapCores 32 --outputFasta  s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.gfa.fa --delFilter 5000000 --logFile paf.log --restart
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:55+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:56+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
^CTraceback (most recent call last):

and the message loops forever. Same deal if I try

toil clean aws:us-west-2:glennhickey-jobstore-pa3 
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:47+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:48+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
[2022-02-01T14:28:52+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 16s.

This has happened every time my workflow has aborted, for whatever reason. Not sure if it's related to changes in Toil or in our AWS environment...
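The log above shows `toil.lib.retry` retrying `HeadBucket` after a 404 with growing delays and never giving up. Below is a minimal sketch of how a retry helper can avoid that, assuming a 404 for a missing bucket should be treated as permanent: it retries only the status codes it is told are transient and caps the number of attempts. This is not Toil's actual `toil.lib.retry` code; `HTTPStatusError`, `retry_with_backoff`, and the default transient statuses (500, 503) are hypothetical choices for illustration.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


def retry_with_backoff(
    operation: Callable[[], T],
    retryable_statuses: Iterable[int] = (500, 503),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient errors with exponential backoff.

    Statuses not in `retryable_statuses` (e.g. 404 Not Found for a deleted
    bucket) are re-raised immediately instead of being retried forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = set(retryable_statuses)
    for attempt in range(max_attempts):
        try:
            return operation()
        except HTTPStatusError as err:
            # A permanent error, or the last attempt: give up and surface it.
            if err.status not in retryable or attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

With this policy, the missing-bucket case from the log would fail on the first 404 instead of sleeping and retrying, while a transient 503 would still be retried with delays of 1, 2, 4, ... seconds until `max_attempts` is reached.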

┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-1140


glennhickey · 2022-02-01T14:52:37Z

Oh boy, I bet this is about me forgetting to set TOIL_OWNER_TAG. If it is, I would like to change this issue to a feature request:

When I create a cluster with toil launch-cluster --owner MYEMAIL, would it be possible to have TOIL_OWNER_TAG set to MYEMAIL by default whenever I open a shell on the cluster?

adamnovak · 2022-02-01T15:36:07Z

Sounds like there are two problems here:

1. Without the bucket, we can't clean the job store or destroy the SimpleDB domain.
2. It would be nice if the cluster's owner tag became the default TOIL_OWNER_TAG value in the default environment on the cluster (maybe in something mounted as /etc/profile in the appliance container?).
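One way to approach the second point, sketched under assumptions: render a small POSIX shell snippet that sets TOIL_OWNER_TAG to the cluster owner only when the user has not already exported a value, and have the appliance container source it at login (for example from a file under /etc/profile.d/). The helper name `render_owner_profile`, the /etc/profile.d/ location, and how the owner value is read from the cluster's tags are all hypothetical; this is not taken from Toil's actual fix.

```python
import shlex


def render_owner_profile(owner: str) -> str:
    """Return a POSIX shell snippet that defaults TOIL_OWNER_TAG to `owner`.

    Hypothetical helper: the snippet only sets the variable when it is unset
    or empty, so a value the user exports explicitly still wins.
    """
    if not owner:
        raise ValueError("owner must be a non-empty string")
    # shlex.quote() protects against spaces or shell metacharacters in the owner.
    quoted = shlex.quote(owner)
    return (
        'if [ -z "${TOIL_OWNER_TAG:-}" ]; then\n'
        f"    export TOIL_OWNER_TAG={quoted}\n"
        "fi\n"
    )


if __name__ == "__main__":
    # Example value standing in for the owner given to `toil launch-cluster --owner`.
    print(render_owner_profile("someone@example.com"), end="")
```

Keeping the default conditional means a user who sets TOIL_OWNER_TAG by hand (as in the original report) is not overridden, while a forgotten variable falls back to the cluster owner.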

adamnovak · 2022-02-01T15:36:52Z

The first problem is tracked by #3924 I think.

glennhickey changed the title ~~AWS jobstore buckets can't be cleaned or resumed~~ AWS jobstore buckets should inherit owner tags from cluster Feb 1, 2022

unito-bot added the intern label May 31, 2022

Hexotical mentioned this issue Sep 29, 2022 in #4230, "Create a unit test to launch an AWS cluster with tags and check tags are passed through to container" (Closed).

Hexotical added a commit that referenced this issue Sep 29, 2022: "Fix #4029" (21820af)

DailyDreaming closed this as completed in 8f38862 Oct 13, 2022
