I've been running jobs with AWS autoscale for the first time in a while, and it seems that when Cactus fails, the jobstore is left in an unusable state. For example, I just aborted a workflow with aws:us-west-2:glennhickey-jobstore-pa3 as the jobstore.
But if I try to run it with --restart I get
cactus-graphmap aws:us-west-2:glennhickey-jobstore-pa3 ./apes.pan.txt s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.minigraph.gfa.gz s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.pan.paf --realTimeLogging --reference hg38 --base --nodeTypes r4.8xlarge:1.25 --maxNodes 25 --nodeStorage 1000 --batchSystem mesos --provisioner aws --defaultPreemptable --realTimeLogging --mapCores 32 --outputFasta s3://vg-k8s/vgamb/users/hickey/apes-pangenome/apes.gfa.fa --delFilter 5000000 --logFile paf.log --restart
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:27:54+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:55+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:27:56+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
^CTraceback (most recent call last):
and the message loops forever. Same deal if I try
toil clean aws:us-west-2:glennhickey-jobstore-pa3
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (2.3.0)/charset_normalizer (2.0.10) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 0s.
[2022-02-01T14:28:46+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:47+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 1s.
[2022-02-01T14:28:48+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 4s.
[2022-02-01T14:28:52+0000] [MainThread] [I] [toil.lib.retry] Got An error occurred (404) when calling the HeadBucket operation: Not Found, trying again in 16s.
This has happened every time my workflow has aborted, for whatever reason. Not sure if it's related to changes in Toil or our AWS environment...
┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-1140
Oh boy, I bet this is about me forgetting to set TOIL_OWNER_TAG. If it is, I would like to change this issue to a feature request:
When I create a cluster with toil launch-cluster --owner MYEMAIL, would it be possible to have TOIL_OWNER_TAG set to MYEMAIL by default whenever I open a shell on the cluster?
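In the meantime I can work around it by exporting the tag by hand in the cluster shell before launching or restarting anything (a sketch, assuming the existing TOIL_OWNER_TAG environment variable is all that's needed):

export TOIL_OWNER_TAG=MYEMAIL    # then launch/restart the workflow from this same shell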
glennhickey changed the title from "AWS jobstore buckets can't be cleaned or resumed" to "AWS jobstore buckets should inherit owner tags from cluster" on Feb 1, 2022.
Without the bucket, we can't clean the job store and destroy the SimpleDB domain.
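If someone needs to clean up by hand in the meantime, the leftover SimpleDB domains can in principle be listed and deleted directly (a sketch using the AWS CLI v1 sdb commands; the domain names shown are assumptions based on the jobstore name, so check the list output first):

aws sdb list-domains --region us-west-2
# delete whichever domains belong to the dead jobstore (names below are hypothetical)
aws sdb delete-domain --region us-west-2 --domain-name glennhickey-jobstore-pa3--files
aws sdb delete-domain --region us-west-2 --domain-name glennhickey-jobstore-pa3--jobs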
It would be nice if the cluster's owner tag became the default TOIL_OWNER_TAG value in the default environment on the cluster (maybe in something mounted as /etc/profile in the appliance container?).
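Something along these lines at cluster-launch time would probably be enough (a sketch only; the file path and the idea of reusing the --owner value are assumptions, not current behavior):

# hypothetical: written into a file sourced by /etc/profile inside the appliance at launch
export TOIL_OWNER_TAG=MYEMAIL    # MYEMAIL taken from toil launch-cluster --owner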