♻ that container (#184)
* ♻ that container

* Add logging (#185)

* adds logging of watcher-level errors, worker receives, and completion status

* prefixed logs from child processes

* fixes logger factory to accept a message

* --> false for legibility

* move binary split to dependency

* package lock changes

* Scale down threshold (#187)

* change scale-down MetricIntervalLowerBound to MetricIntervalUpperBound

* exit main loop after workers finish

* resolve() after all workers return instead of exiting

* fix tests and mocks

* cleanup

* logs

* another log

* use logger and process.stdout for logs

* more logs

* edit logs

* Add alarms to "♻️ that container" PR  (#198)

* Add alarms and alarm docs

* Add failedPlacementAlarmPeriods

* Add CloudWatch Alarms snapshots

* Update template jest snapshots

* Add CloudWatch Alarms snapshots

* Add failedworker and failedworkerplacement metric

* Typo r/LogGroup/Logs

* Change metric name

* Metric Filter of worker errors to "[failure]"

* Have current published version instead of undefined

* Jake's Review

* uh update-jest

* Update alarms.md

* Add template validation tests (#215)

* Add travis user

* Ensure this fails

* Add validation for notificationEmail or notificationTopic

* Add minSize and maxSize of service scaleup and scaledown, deadletter queue threshold, info to doc (#211)

* Closes #208, #207, #206, #182, #149, #72, #15

(cherry picked from commit 8de328df79ccf52b8d612c625891555808c2fa0e)

* Add minSize as option

* update jest tests

* Change MinSize to 0

* update jest

* indentation and minSize to 0

* Add deadletterThreshold info in Worker-retry-cycle

* Update tests with maxSize property

* remove superfluous logging

* add fresh mode as a watchbot option

* if else

* freshMode

* console log

* typeof

* true

* concise

* add fresh

* fix tests

* fix binary test

* update snapshots

* Allow users to write to any volume (#200)

* Restrict writes to volumes and clean them after every job

* Try out the `ReadOnlyRootFilesystem` option

* Capitalization

* Add watchbot-log

* use strict

* No need to chmod now
Ryan Clark authored and Jake Pruitt committed Jun 16, 2018
1 parent c950b97 commit 7fc31c2
Showing 77 changed files with 9,159 additions and 13,498 deletions.
59 changes: 40 additions & 19 deletions .eslintrc
@@ -1,24 +1,45 @@
{
"rules": {
"indent": [2, 2],
"quotes": [2, "single"],
"quote-props": [2, "as-needed"],
"no-console": [1],
"semi": [2, "always"],
"space-before-function-paren": [2, "never"],
"object-curly-spacing": [2, "always"],
"array-bracket-spacing": [2, "never"],
"comma-spacing": [2, { "before": false, "after": true }],
"key-spacing": [2, { "beforeColon": false, "afterColon": true }]
},
"extends": "eslint:recommended",
"env": {
"node": true,
"es6": true
"node": true,
"es6": true,
"jest": true
},
"globals": {
"process": true,
"module": true,
"require": true
"parserOptions": {
"ecmaVersion": "2017"
},
"extends": "eslint:recommended"
"plugins": [
"node"
],
"rules": {
"arrow-parens": ["error", "always"],
"no-var": "error",
"prefer-const": "error",
"array-bracket-spacing": ["error", "never"],
"comma-dangle": ["error", "never"],
"computed-property-spacing": ["error", "never"],
"eol-last": "error",
"eqeqeq": ["error", "smart"],
"indent": ["error", 2, { "SwitchCase": 1 }],
"no-confusing-arrow": ["error", {"allowParens": false}],
"no-extend-native": "error",
"no-mixed-spaces-and-tabs": "error",
"no-spaced-func": "error",
"no-trailing-spaces": "error",
"no-unused-vars": "error",
"no-use-before-define": ["error", "nofunc"],
"object-curly-spacing": ["error", "always"],
"prefer-arrow-callback": "error",
"quotes": ["error", "single", "avoid-escape"],
"semi": ["error", "always"],
"space-infix-ops": "error",
"spaced-comment": ["error", "always"],
"keyword-spacing": ["error", { before: true, after: true }],
"template-curly-spacing": ["error", "never"],
"semi-spacing": "error",
"strict": "error",
"no-console": "off",
"node/no-unsupported-features": ["error", {"version": 8}],
"node/no-missing-require": "error"
}
}
9 changes: 3 additions & 6 deletions .travis.yml
@@ -1,8 +1,5 @@
cache: apt
language: node_js
node_js:
- '4'
env:
global:
- secure: Pn+WVEuDcuNlUia2ehSAgH5wdWlBdSDVXYSjzAoBJO9oPotrfLSN26Lu+gOTXyJ92h99vwN00osQyw7cqGYrX2dO2nu0JfVhIZ99iXJmjRnWQvkf0DcE6RsxcL1bIRYtlJlQMr2sS2zJ7owUAzX7EFpOvurT0EeuB3IQg5bwhoy3nTJuwxg7XWJ9wWHyFSXAieTfyODw3LGemmtZjTe0K5Vu07VQkph0MMohWF64dXptEE99Df6LumMQ4u6RTczayih7EXWwXRvDzw4r1AUQKBkAa767vpg3UelWb6mHKwhl6GpjXiaW8qlfIszyFyHU8Dc454wagd3VhPG4UxdZUdnV8hjd79Ak1bKubjlo9ZgxZAYqT11+g6UNZ2bmO8hLsnvvXbxWmZ0I81lf5iCkzPhqOLoIVyfQs8aINrmfqoqCMgwKlzoT+cda6v/5kxzG9UO9Hmhm2XhHMAleo36rRw+a0x9yNtCU8qVrmEVIyxVMvzsdDm8/pAUCyzUd042rCH1YAzRvk2Lb39nbNW4gGaJKghvjOF9iAB4QisyqfvnmTN7JvIL1M5EdAJaa86XROFTfUlPoTaLyGeztGWNQpzry4IqNBQFBrzvPrw0gnP9eUWRPxtttZFK/2GOS7sZT7MaFOpZ4w6YcaR9VrviqryepYM8eluA7EDqkrSUpCrs=
- secure: Pk6fpHDyWPwh8Zek++MA5Q4aVZzQGOjh7Tz5LD4UN+g31CY7EGQ1nivIhVQLQKlS/4H18VXYew3wyxKDmec7Iwz2GqCj2ZDTVCXQG2ITGA6BB3NJ5Ei0C0Ah12Vv5sHlPTCo8UhkpTBNt6ytEV1CWhotY+WajIIEm4HN3PgmpX+tam4cPEUezK+ScJH8xhQCuQqow42T47hSXvkPUu17tMaZfhGLCGX3fKzcEoFYiqdHIbKxSaQ//uRVbXWcSHgsNsA1nFHUgemJC1lH2o02uAdtZuMQ+JyIsQY1nMamsgCv3JpGlgSaIrt6wiJX1ipJOmtRH8L5fRmWgFKsCXDSo6oMfE8arMF1vED6AhTWlv3Kb5v1boehNwgheSxLLvHESP7T0wsc77nNybpnUSV8bOJ3bfPknES3YoGcyffPyu4Xi/MG7ibGvd9B/OnmysZyHinBkb9JI7X1paAVW2RQgEXm0sx0rkqO74N7vEFAfxlU+GgN12AAq79pKUCRMIOXKjB9lZFQnYXmTa1H/CyNE//gdHBLdZGSjHMpJ8N0+tww0SIQo1FHhJmJH0/p5heZJRzTARfVtRGsbEH7+W/5IUTtW10YtGWYtKAS+8S3cD6WIfrm20jZaFbk9clmW/n4WRB0y2s5LxwPF4sNStZRJDhn0ozrcM1IRVJz4r/rdik=
- '8'
services:
- docker
26 changes: 0 additions & 26 deletions Dockerfile

This file was deleted.

23 changes: 0 additions & 23 deletions bin/bootstrap.sh

This file was deleted.

44 changes: 0 additions & 44 deletions bin/cli.js

This file was deleted.

15 changes: 7 additions & 8 deletions bin/watchbot-log.js
100755 → 100644
@@ -1,20 +1,19 @@
#!/usr/bin/env node
'use strict';

/**
* watchbot-log "something that you want logged"
* - or -
* echo "something that you want logged" | watchbot-log
*/

var watchbot = require('..');
var args = process.argv.slice(2);
const Logger = require('..').Logger;
const args = process.argv.slice(2);

const logger = new Logger('worker');

if (args[0]) {
return watchbot.log(args[0]);
return logger.log(args[0]);
}

process.stdin.on('data', function(d) {
d.toString().trim().split('\n').forEach(function(line) {
watchbot.log(line);
});
});
process.stdin.pipe(logger.stream());
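
The change above replaces the hand-written `data` handler with `process.stdin.pipe(logger.stream())`. The commit page does not show the `Logger` class itself, so the following is only a minimal sketch of how such a class could behave, assuming a line-oriented writable stream. The `[type]` prefix format and the second `write` constructor argument are hypothetical, added for illustration and testability; they are not taken from the library.

```javascript
'use strict';

const { Writable } = require('stream');

// Hypothetical stand-in for the Logger required by bin/watchbot-log.js.
class Logger {
  // `write` is an assumed injection point; it defaults to process.stdout.
  constructor(type, write) {
    this.type = type;
    this.write = write || ((text) => process.stdout.write(text));
  }

  // Emit a single line with an assumed "[type]" prefix.
  log(line) {
    this.write(`[${this.type}] ${line}\n`);
  }

  // Return a writable stream that logs every complete line piped into it.
  stream() {
    let buffered = '';
    return new Writable({
      write: (chunk, encoding, callback) => {
        const lines = (buffered + chunk.toString()).split('\n');
        buffered = lines.pop(); // keep the trailing partial line for the next chunk
        lines
          .filter((line) => line.length > 0)
          .forEach((line) => this.log(line));
        callback();
      },
      final: (callback) => {
        // Log whatever remains when the input ends without a newline.
        if (buffered.length > 0) this.log(buffered);
        callback();
      }
    });
  }
}
```

Buffering the trailing partial line is the main design point: a chunk boundary can fall in the middle of a line, which the removed `split('\n')` handler would have logged as two separate entries.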
8 changes: 0 additions & 8 deletions bin/watchbot-progress.sh

This file was deleted.

63 changes: 31 additions & 32 deletions bin/watchbot.js
@@ -1,34 +1,33 @@
#!/usr/bin/env node

var fastlog = require('fastlog')('watchbot');
var _ = require('underscore');
var sendNotification = require('../lib/notifications')(process.env.NotificationTopic).send;
var watchbot = require('..');

var required = [
'Cluster',
'TaskDefinition',
'Concurrency',
'QueueUrl',
'TaskEventQueueUrl',
'NotificationTopic',
'StackName',
'LogGroupArn',
'AlarmOnEachFailure'
];

var missing = _.difference(required, Object.keys(process.env));
if (missing.length) {
var err = new Error('Missing from environment: ' + missing.join(', '));
fastlog.error(err);
sendNotification('[watchbot] config error', err.message);
process.exit(1);
}

/**
* The main Watchbot loop. This function runs continuously on one or more containers,
* each of which is responsible for polling SQS and spawning tasks to process
* messages, while maintaining a predefined task concurrency and reporting any failed
* processing tasks.
*/
watchbot.main(process.env);
'use strict';

const Watcher = require('../lib/watcher');
const Logger = require('../lib/logger');

const main = async () => {
if (process.argv[2] !== 'listen')
throw new Error(`Invalid arguments: ${process.argv.slice(2).join(' ')}`);

const logger = Logger.create('watcher');
const command = process.argv.slice(3).join(' ');
const volumes = process.env.Volumes.split(',');

const options = {
queueUrl: process.env.QueueUrl,
fresh: process.env.fresh === 'true' ? true : false,
workerOptions: { command, volumes }
};

const watcher = Watcher.create(options);

try {
await watcher.listen();
} catch (err) {
logger.log(`[error] ${err.stack}`);
}
};

module.exports = main;

if (require.main === module) main();
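
The new entry point expects to be invoked as `watchbot listen <command ...>` and builds its watcher options from `process.argv` and the environment. As a rough illustration of that mapping, here is a sketch that isolates it in a pure function. `parseInvocation` is a hypothetical helper name, not part of the commit, and the empty-array fallback for a missing `Volumes` variable is an assumption (the code in the diff would throw in that case).

```javascript
'use strict';

// Hypothetical helper mirroring how bin/watchbot.js reads argv and env.
const parseInvocation = (argv, env) => {
  // argv[0] is the node binary, argv[1] the script, argv[2] the subcommand.
  if (argv[2] !== 'listen')
    throw new Error(`Invalid arguments: ${argv.slice(2).join(' ')}`);

  return {
    queueUrl: env.QueueUrl,
    // Only the literal string 'true' enables fresh mode.
    fresh: env.fresh === 'true',
    workerOptions: {
      // Everything after "listen" is joined into the worker command.
      command: argv.slice(3).join(' '),
      // Assumption: treat a missing Volumes variable as "no volumes".
      volumes: env.Volumes ? env.Volumes.split(',') : []
    }
  };
};
```

Under these assumptions, `watchbot listen echo hello` with `Volumes=/tmp,/mnt` would yield a worker command of `echo hello` and two volumes.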
@@ -1,7 +1,8 @@
var cloudfriend = require('@mapbox/cloudfriend');
const cf = require('@mapbox/cloudfriend');

module.exports = {
AWSTemplateFormatVersion: '2010-09-09',
Description: 'ecs-clusters ci resources for validating the template',
Resources: {
User: {
Type: 'AWS::IAM::User',
@@ -12,7 +13,7 @@ module.exports = {
PolicyDocument: {
Statement: [
{
Action: 'cloudformation:ValidateTemplate',
Action: ['cloudformation:ValidateTemplate'],
Effect: 'Allow',
Resource: '*'
}
@@ -25,12 +26,13 @@ module.exports = {
AccessKey: {
Type: 'AWS::IAM::AccessKey',
Properties: {
UserName: cloudfriend.ref('User')
UserName: cf.ref('User')
}
}
},
Outputs: {
AccessKeyId: { Value: cloudfriend.ref('AccessKey') },
SecretAccessKey: { Value: cloudfriend.getAtt('AccessKey', 'SecretAccessKey') }
AccessKeyId: { Value: cf.ref('AccessKey') },
SecretAccessKey: { Value: cf.getAtt('AccessKey', 'SecretAccessKey') }
}
};

15 changes: 0 additions & 15 deletions docs/alarms.md
@@ -4,21 +4,6 @@ This document describes CloudWatch alarms that Watchbot configures. If one of th

**In all cases**, SQS messages that failed and led to these alarms are put back into SQS to be retried. [See the worker retry documentation](./worker-retry-cycle.md) for more info.

## FailedWorkerPlacement

### Why?

There were more than 5 attempts to place a worker container in 60 seconds that could not be placed. The failed placement could be due to:

- insufficient CPU or memory was available on the cluster
- failure to start a docker container on a host EC2 because of disk I/O exhaustion
- any other scenario that may have caused the worker not to be placed

### What to do

Most of the time, this is due to a lack of available cluster resources. Use [the provided CLI command](./command-line-utilities.md#assessing-worker-capacity-in-your-service's cluster) to get a sense of how much space is free in your cluster. If necessary, increase available resources on your cluster by removing other tasks or launching new EC2s.

If resource availability on the cluster does not appear to be the problem, then you'll need to dig into logs in order to understand what the problem is. Check watcher logs for any indication of the reason for job placement failure by searching for `failedPlacement`.

## WorkerErrors
