diff --git a/doc/workflow_steps.md b/doc/workflow_steps.md
index c5d5489..fdad115 100644
--- a/doc/workflow_steps.md
+++ b/doc/workflow_steps.md
@@ -44,12 +44,76 @@ The grouping has two goals:
 - `under_keys`: notifies DumpCleaner that the cleanup data is not a list but actually a hash of multiple lists and that the grouping should be done only in lists under the specified keys of the data hash. This is useful in cases when the cleanup data needs to hold multiple unrelated lists of values.
+#### Examples:
+
+<table>
+<tr>
+<th>configuration</th>
+<th>input data</th>
+<th>output data</th>
+</tr>
+<tr>
+<td>
+
+```yaml
+- step: GroupByBytesize
+```
+
+</td>
+<td>
+
+```
+["newspaper", "show", "rest", "résumé"]
+```
+
+</td>
+<td>
+
+```
+{
+  "9-9" => ["newspaper"],
+  "4-4" => ["show", "rest"],
+  "6-8" => ["résumé"]
+}
+```
+
+</td>
+</tr>
+<tr>
+<td>
+
+```yaml
+- step: GroupByBytesize
+  params:
+    under_keys:
+      - words
+```
+
+</td>
+<td>
+
+```
+{
+  "words" => ["newspaper", "show", "rest", "résumé"],
+  "domains" => ["gmail.com", "example.com"]
+}
+```
+
+</td>
+<td>
+
+```
+{
+  "words" => {
+    "9-9" => ["newspaper"],
+    "4-4" => ["show", "rest"],
+    "6-8" => ["résumé"]
+  },
+  "domains" => ["gmail.com", "example.com"]
+}
+```
+
+</td>
+</tr>
+</table>
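The output keys in the GroupByBytesize examples pair each value's character length with its byte size: `"résumé"` has 6 characters but occupies 8 bytes in UTF-8, so it lands in group `"6-8"`. Below is a minimal Ruby sketch of that grouping, assuming the `"<length>-<bytesize>"` key format read off the examples; `group_by_bytesize` is a hypothetical helper for illustration, not DumpCleaner's internal API.

```ruby
# Hypothetical helper sketching the GroupByBytesize step; not DumpCleaner's
# actual implementation. Each value is keyed by "<length>-<bytesize>", the
# format seen in the examples, so a group holds values that share both their
# character count and their byte size.
def group_by_bytesize(data, under_keys: [])
  group = ->(list) { list.group_by { |value| "#{value.length}-#{value.bytesize}" } }

  # Without under_keys, the cleanup data itself is the list to group.
  return group.call(data) if under_keys.empty?

  # With under_keys, only the lists under those keys are grouped;
  # all other keys are passed through unchanged.
  data.to_h { |key, list| [key, under_keys.include?(key) ? group.call(list) : list] }
end

words = ["newspaper", "show", "rest", "résumé"]
p group_by_bytesize(words)
# keys "9-9", "4-4" and "6-8", as in the first example

p group_by_bytesize({ "words" => words, "domains" => ["gmail.com", "example.com"] },
                    under_keys: ["words"])
```

`Array#group_by` keeps groups in order of first appearance, which is why `"9-9"` comes before `"4-4"` here, matching the tables.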
+<table>
+<tr>
+<th>configuration</th>
+<th>input data</th>
+<th>output data</th>
+</tr>
+<tr>
+<td>
+
+```yaml
+- step: LoadYamlFile
+  params:
+    file: some_file.yml
+```
+
+```yaml
+# some_file.yml:
+- words
+- to
+- load
+```
+
+</td>
+<td>
+
+`nil` (or just anything)
+
+</td>
+<td>
+
+```
+["words", "to", "load"]
+```
+
+</td>
+</tr>
+<tr>
+<td>
+
+```yaml
+- step: LoadYamlFile
+  params:
+    file: dictionary.yml
+    under_key: words
+```
+
+```yaml
+# dictionary.yml:
+- words
+- to
+- load
+```
+
+</td>
+<td>
+
+`nil`
+
+</td>
+<td>
+
+```
+{
+  "words" => ["words", "to", "load"]
+}
+```
+
+</td>
+</tr>
+<tr>
+<td>
+
+```yaml
+- step: LoadYamlFile
+  params:
+    file: dictionary.yml
+    under_key: words
+```
+
+```yaml
+# dictionary.yml:
+- words
+- to
+- load
+```
+
+</td>
+<td>
+
+```
+{
+  "existing_key" => ["some", "other", "words"]
+}
+```
+
+</td>
+<td>
+
+```
+{
+  "existing_key" => ["some", "other", "words"],
+  "words" => ["words", "to", "load"]
+}
+```
+
+</td>
+</tr>
+</table>
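Read together, the LoadYamlFile examples show two modes: without `under_key`, the loaded YAML replaces the cleanup data; with it, the YAML is stored under that key and existing keys are kept. Below is a minimal Ruby sketch of that behavior, assuming the incoming data is either `nil` or a hash; `load_yaml_file` is a hypothetical helper, not DumpCleaner's code, and the dictionary is written to a temporary file only so the example runs on its own.

```ruby
require "tempfile"
require "yaml"

# Hypothetical helper mirroring the LoadYamlFile examples; not DumpCleaner's
# actual implementation.
def load_yaml_file(data, file:, under_key: nil)
  loaded = YAML.load_file(file)

  # Without under_key, the loaded YAML replaces the cleanup data entirely.
  return loaded if under_key.nil?

  # With under_key, the loaded YAML is added under that key and existing
  # keys are kept. Assumes the incoming data is nil or a Hash.
  (data || {}).merge(under_key => loaded)
end

# Write the example dictionary to a temporary file so the sketch runs alone.
plain, keyed, merged = Tempfile.create(["dictionary", ".yml"]) do |f|
  f.write("- words\n- to\n- load\n")
  f.flush
  [
    load_yaml_file(nil, file: f.path),
    load_yaml_file(nil, file: f.path, under_key: "words"),
    load_yaml_file({ "existing_key" => ["some", "other", "words"] },
                   file: f.path, under_key: "words")
  ]
end

p plain # prints ["words", "to", "load"]
p keyed
p merged
```

Using `Hash#merge` returns a new hash rather than mutating the caller's data; whether DumpCleaner itself copies or mutates is not stated here.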