The following document specifies the Flux language and query execution.
This document is a living document and does not represent the current implementation of Flux. Any section that is not currently implemented is commented with a IMPL#XXX where XXX is an issue number tracking discussion and progress towards implementation.
The Flux language is centered on querying and manipulating time series data.
The syntax of the language is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes ``.
Source code is encoded in UTF-8. The text need not be canonicalized.
This document will use the term character to refer to a Unicode code point.
The following terms are used to denote specific Unicode character classes:
newline = /* the Unicode code point U+000A */ .
unicode_char = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit = /* a Unicode code point classified as "Number, decimal digit" */ .
The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of character categories. Flux treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo as Unicode letters, and those in the Number category Nd as Unicode digits.
The underscore character _ (U+005F) is considered a letter.
letter = unicode_letter | "_" .
decimal_digit = "0" … "9" .
Comments serve as documentation.
Comments begin with the character sequence //
and stop at the end of the line.
Comments cannot start inside string or regexp literals. Comments act like newlines.
Flux is built up from tokens. There are several classes of tokens: identifiers, keywords, operators, and literals. White space, formed from spaces, horizontal tabs, carriage returns, and newlines, is ignored except as it separates tokens that would otherwise combine into a single token. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.
Identifiers name entities within a program. An identifier is a sequence of one or more letters and digits. An identifier must start with a letter.
identifier = letter { letter | unicode_digit } .
Examples:
a
_x
longIdentifierName
αβ
The following keywords are reserved and may not be used as identifiers:
and import not return option test
empty in or package builtin
IMPL#256 Add in and empty operator support
The following character sequences represent operators:
+ == != ( ) =>
- < !~ [ ] ^
* > =~ { }
/ <= = , :
% >= <- . |>
Numeric literals may be integers or floating point values. Literals have arbitrary precision and will be coerced to a specific type when used.
The following coercion rules apply to numeric literals:
- an integer literal can be coerced to an "int", "uint", or "float" type,
- an float literal can be coerced to a "float" type,
- an error will occur if the coerced type cannot represent the literal value.
IMPL#255 Allow numeric literal coercion.
An integer literal is a sequence of digits representing an integer value. Only decimal integers are supported.
int_lit = "0" | decimal_lit .
decimal_lit = ( "1" … "9" ) { decimal_digit } .
Examples:
0
42
317316873
A floating-point literal is a decimal representation of a floating-point value. It has an integer part, a decimal point, and a fractional part. The integer and fractional part comprise decimal digits. One of the integer part or the fractional part may be elided.
float_lit = decimals "." [ decimals ]
| "." decimals .
decimals = decimal_digit { decimal_digit } .
Examples:
0.
72.40
072.40 // == 72.40
2.71828
.26
IMPL#254 Parse float literals
A duration literal is a representation of a length of time. It has an integer part and a duration unit part. Multiple durations may be specified together and the resulting duration is the sum of each smaller part. When several durations are specified together, larger units must appear before smaller ones, and there can be no repeated units.
duration_lit = { int_lit duration_unit } .
duration_unit = "y" | "mo" | "w" | "d" | "h" | "m" | "s" | "ms" | "us" | "µs" | "ns" .
Units | Meaning | Base |
---|---|---|
y | year (12 months) | mo |
mo | month | mo |
w | week (7 days) | ns |
d | day | ns |
h | hour (60 minutes) | ns |
m | minute (60 seconds) | ns |
s | second | ns |
ms | milliseconds (1 thousandth of a second) | ns |
us or µs | microseconds (1 millionth of a second) | ns |
ns | nanoseconds (1 billionth of a second) | ns |
Durations represent a length of time. Lengths of time are dependent on specific instants in time they occur and as such, durations do not represent a fixed amount of time. A duration is composed of a tuple of months and nanoseconds along with whether the duration is positive or negative. Each duration unit corresponds to one of these two base units. It is possible to compose a duration of multiple base units.
Durations cannot be combined by addition and subtraction. This is because all of the values in the tuple must have the same sign and it is not possible to guarantee that is true when addition and subtraction are allowed. Durations can be multiplied by any integer value. The unary negative operator is the equivalent of multiplying the duration by -1. These operations are performed on each time unit independently.
Examples:
1s
10d
1h15m // 1 hour and 15 minutes
5w
1mo5d // 1 month and 5 days
-1mo5d // negative 1 month and 5 days
5w * 2 // 10 weeks
Durations can be added to date times to produce a new date time.
Addition and subtraction of durations to date times do not commute and are left associative. Addition and subtraction of durations to date times applies months, days and seconds in that order. When months are added to a date times and the resulting date is past the end of the month, the day is rolled back to the last day of the month.
Examples:
2018-01-01T00:00:00Z + 1d // 2018-01-02T00:00:00Z
2018-01-01T00:00:00Z + 1mo // 2018-02-01T00:00:00Z
2018-01-01T00:00:00Z + 2mo // 2018-03-01T00:00:00Z
2018-01-31T00:00:00Z + 2mo // 2018-03-31T00:00:00Z
2018-02-28T00:00:00Z + 2mo // 2018-04-28T00:00:00Z
2018-01-31T00:00:00Z + 1mo // 2018-02-28T00:00:00Z, February 31th is rolled back to the last day of the month, February 28th in 2018.
// Addition and subtraction of durations to date times does not commute
2018-02-28T00:00:00Z + 1mo + 1d // 2018-03-29T00:00:00Z
2018-02-28T00:00:00Z + 1d + 1mo // 2018-04-01T00:00:00Z
2018-01-01T00:00:00Z + 2mo - 1d // 2018-02-28T00:00:00Z
2018-01-01T00:00:00Z - 1d + 3mo // 2018-03-31T00:00:00Z
2018-01-31T00:00:00Z + 1mo + 1mo // 2018-03-28T00:00:00Z
2018-01-31T00:00:00Z + 2mo // 2018-03-31T00:00:00Z
// Addition and subtraction of durations to date times applies months, days and seconds in that order.
2018-01-28T00:00:00Z + 1mo + 2d // 2018-03-02T00:00:00Z
2018-01-28T00:00:00Z + 1mo2d // 2018-03-02T00:00:00Z
2018-01-28T00:00:00Z + 2d + 1mo // 2018-02-28T00:00:00Z, explicit left associative add of 2d first changes the result
2018-02-01T00:00:00Z + 2mo2d // 2018-04-03T00:00:00Z
2018-01-01T00:00:00Z + 1mo30d // 2018-03-02T00:00:00Z, Months are applied first to get February 1st, then days are added resulting in March 2 in 2018.
2018-01-31T00:00:00Z + 1mo1d // 2018-03-01T00:00:00Z, Months are applied first to get February 28th, then days are added resulting in March 1 in 2018.
// Multiplication works
2018-01-01T00:00:00Z + 1mo * 1 // 2018-02-01T00:00:00Z
2018-01-01T00:00:00Z + 1mo * 2 // 2018-03-01T00:00:00Z
2018-01-01T00:00:00Z + 1mo * 3 // 2018-04-01T00:00:00Z
2018-01-31T00:00:00Z + 1mo * 1 // 2018-02-28T00:00:00Z
2018-01-31T00:00:00Z + 1mo * 2 // 2018-03-31T00:00:00Z
2018-01-31T00:00:00Z + 1mo * 3 // 2018-04-30T00:00:00Z
IMPL#657 Implement Duration vectors
A date and time literal represents a specific moment in time. It has a date part, a time part and a time offset part. The format follows the RFC 3339 specification. The time is optional, when it is omitted the time is assumed to be midnight for the default location. The time_offset is optional, when it is omitted the location option is used to determine the offset.
date_time_lit = date [ "T" time ] .
date = year_lit "-" month "-" day .
year = decimal_digit decimal_digit decimal_digit decimal_digit .
month = decimal_digit decimal_digit .
day = decimal_digit decimal_digit .
time = hour ":" minute ":" second [ fractional_second ] [ time_offset ] .
hour = decimal_digit decimal_digit .
minute = decimal_digit decimal_digit .
second = decimal_digit decimal_digit .
fractional_second = "." { decimal_digit } .
time_offset = "Z" | ("+" | "-" ) hour ":" minute .
Examples:
1952-01-25T12:35:51Z
2018-08-15T13:36:23-07:00
2009-10-15T09:00:00 // October 15th 2009 at 9 AM in the default location
2018-01-01 // midnight on January 1st 2018 in the default location
IMPL#152 Implement shorthand time literals
A string literal represents a sequence of characters enclosed in double quotes. Within the quotes any character may appear except an unescaped double quote. String literals support several escape sequences.
\n U+000A line feed or newline
\r U+000D carriage return
\t U+0009 horizontal tab
\" U+0022 double quote
\\ U+005C backslash
\${ U+0024 U+007B dollar sign and opening curly bracket
Additionally any byte value may be specified via a hex encoding using \x
as the prefix.
string_lit = `"` { unicode_value | byte_value | StringExpression | newline } `"` .
byte_value = `\` "x" hex_digit hex_digit .
hex_digit = "0" … "9" | "A" … "F" | "a" … "f" .
unicode_value = unicode_char | escaped_char .
escaped_char = `\` ( "n" | "r" | "t" | `\` | `"` ) .
StringExpression = "${" Expression "}" .
Examples:
"abc"
"string with double \" quote"
"string with backslash \\"
"日本語"
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // the explicit UTF-8 encoding of the previous line
String literals are also interpolated for embedded expressions to be evaluated as strings.
Embedded expressions are enclosed within the literals ${
and }
respectively.
The expressions are evaluated in the scope containing the string literal.
The result of an expression is formatted as a string and replaces the string content between the brackets.
All types are formatted as strings according to their literal representation.
A function "printf" exists to allow more precise control over formatting of various types.
To include the literal ${
within a string it must be escaped.
IMPL#248 Add printf function
Interpolation example:
n = 42
"the answer is ${n}" // the answer is 42
"the answer is not ${n+1}" // the answer is not 43
"dollar sign opening curly bracket \${" // dollar sign opening curly bracket ${
String interpolation expressions must satisfy the Stringable constraint.
Arbitrary Expression Interpolation example:
n = duration(v: "1m")
"the answer is ${n}" // the answer is 1m
t0 = time(v: "2016-06-13T17:43:50.1004002Z")
"the answer is ${t0}" // the answer is 2016-06-13T17:43:50.1004002Z
A regular expression literal represents a regular expression pattern, enclosed in forward slashes.
Within the forward slashes, any unicode character may appear except for an unescaped forward slash.
The \x
hex byte value representation from string literals may also be present.
Regular expression literals support only the following escape sequences:
\/ U+002f forward slash
\\ U+005c backslash
regexp_lit = "/" regexp_char { regexp_char } "/" .
regexp_char = unicode_char | byte_value | regexp_escape_char .
regexp_escape_char = `\` (`/` | `\`)
Examples:
/.*/
/http:\/\/localhost:8086/
/^\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e(ZZ)?$/
/^日本語(ZZ)?$/ // the above two lines are equivalent
/\\xZZ/ // this becomes the literal pattern "\xZZ"
/a\/b\\c\d/ // escape sequences and character class shortcuts are supported
/(?:)/ // the empty regular expression
The regular expression syntax is defined by RE2.
A variable represents a storage location for a single value. Variables are immutable. Once a variable is given a value, it holds that value for the remainder of its lifetime.
An option represents a storage location for any value of a specified type. Options are mutable. An option can hold different values during its lifetime.
Below is a list of some built-in options that are currently implemented in the Flux language:
- now
- task
- location
The now
option is a function that returns a time value to be used as a proxy for the current system time.
// Query should execute as if the below time is the current system time
option now = () => 2006-01-02T15:04:05-07:00
The task
option is used by a scheduler to schedule the execution of a Flux query.
option task = {
name: "foo", // name is required
every: 1h, // task should be run at this interval
delay: 10m, // delay scheduling this task by this duration
cron: "0 2 * * *", // cron is a more sophisticated way to schedule. every and cron are mutually exclusive
retry: 5, // number of times to retry a failed query
}
The location
option is used to set the default time zone of all times in the script.
The location maps the UTC offset in use at that location for a given time.
The default value is set using the time zone of the running process.
option location = fixedZone(offset:-5h) // set timezone to be 5 hours west of UTC
option location = loadLocation(name:"America/Denver") // set location to be America/Denver
IMPL#660 Implement Location option
A type defines a set of values and operations on those values. Types are never explicitly declared as part of the syntax except as part of a builtin statement. Types are always inferred from the usage of the value. Type inference follows a Hindley-Milner style inference system.
A union type defines a set of types. In the rest of this section a union type will be specified as follows:
T = t1 | t2 | ... | tn
where t1
, t2
, ..., and tn
are types.
In the example above a value of type T
is either of type t1
, type t2
, ..., or type tn
.
These are the types from which all other Flux data types are constructed.
The null type represents a missing or unknown value.
The null type name is null
.
There is only one value that comprises the null type and that is the null value.
A type t
is nullable if it can be expressed as follows:
t = {s} | null
where {s}
defines a set of values.
A boolean type represents a truth value, corresponding to the preassigned variables true
and false
.
The boolean type name is bool
.
The boolean type is nullable and can be formally specified as follows:
bool = {true, false} | null
A numeric type represents sets of integer or floating-point values.
The following numeric types exist:
uint = {the set of all unsigned 64-bit integers} | null
int = {the set of all signed 64-bit integers} | null
float = {the set of all IEEE-754 64-bit floating-point numbers} | null
Note all numeric types are nullable.
A time type represents a single point in time with nanosecond precision.
The time type name is time
.
The time type is nullable.
A duration type represents a length of time with nanosecond precision.
The duration type name is duration
.
The duration type is nullable.
Durations can be added to times to produce a new time.
Examples:
2018-07-01T00:00:00Z + 1mo // 2018-08-01T00:00:00Z
2018-07-01T00:00:00Z + 2y // 2020-07-01T00:00:00Z
2018-07-01T00:00:00Z + 5h // 2018-07-01T05:00:00Z
A bytes type represents a sequence of byte values.
The bytes type name is bytes
.
A string type represents a possibly empty sequence of characters.
Strings are immutable: once created they cannot be modified.
The string type name is string
.
The string type is nullable.
Note that an empty string is distinct from a null value.
The length of a string is its size in bytes, not the number of characters, since a single character may be multiple bytes.
A regular expression type represents the set of all patterns for regular expressions.
The regular expression type name is regexp
.
The regular expression type is not nullable.
These are types constructed from basic types. Composite types are not nullable.
An array type represents a sequence of values of any other type. All values in the array must be of the same type. The length of an array is the number of elements in the array.
An record type represents a set of unordered key and value pairs. The key must always be a string. The value may be any other type, and need not be the same as other values within the record.
Keys on a record may only be referenced statically.
Type inference will determine the properties that are present on a record. If type inference determines all the properties on a record it is said to be bounded. Not all keys may be known on the type of a record in which case the record is said to be unbounded. An unbounded record may contain any property in addition to the properties it is known to contain.
A dictionary type is a collection that associates keys to values. Keys must be comparable and of the same type. Values must also be of the same type.
A function type represents a set of all functions with the same argument and result types.
A generator type represents a value that produces an unknown number of other values. The generated values may be of any other type but must all be the same type.
IMPL#658 Implement Generators types
Flux functions can be polymorphic, meaning they can be applied to arguments of different types. Flux supports parametric, record, and ad hoc polymorphism.
Parametric polymorphism is the notion that a function can be applied uniformly to arguments of any type. The identity function is one such example.
f = (x) => x
f(x: 1)
f(x: 1.1)
f(x: "1")
f(x: true)
f(x: f)
Record polymorphism is the notion that a function can be applied to different types of records.
john = {name:"John", lastName:"Smith"}
jane = {name:"Jane", age:44}
// John and Jane are records with different types.
// We can still define a function that can operate on both records safely.
// name returns the name of a person
name = (person) => person.name
name(person:john) // John
name(person:jane) // Jane
device = {id: 125325, lat: 15.6163, lon: 62.6623}
name(person:device) // Type error, "device" does not have a property name.
Records of differing types can be passed to the same function so long as they contain the necessary properties. The necessary properties are determined by the use of the record.
Ad hoc polymorphism is the notion that a function can be applied to arguments of different types, with different behavior depending on the type.
add = (a, b) => a + b
// Integer addition
add(a: 1, b: 1)
// String concatenation
add(a: "str", b: "ing")
// Addition not defined for boolean data types
add(a: true, b: false)
Type constraints are a type system concept used to implement static ad hoc polymorphism.
For example, add = (a, b) => a + b
is a function that is defined only for Addable
types.
If one were to pass a record to add
like so:
add(a: {}, b: {})
the result would be a compile-time type error because records are not addable. Like types, constraints are never explicitly declared but rather inferred from the context.
Addable types are those the binary arithmetic operator +
accepts.
Int, Uint, Float, and String types are Addable.
Subtractable types are those the binary arithmetic operator -
accepts.
Int, Uint, and Float types are Subtractable.
Divisible types are those the binary arithmetic operator \
accepts.
Int, Uint, and Float types are Divisible.
Int, Uint, and Float types are Numeric.
Comparable types are those the binary comparison operators <
, <=
, >
, or >=
accept.
Int, Uint, Float, String, Duration, and Time types are Comparable.
Equatable types are those that can be compared for equality using the ==
or !=
operators.
Bool, Int, Uint, Float, String, Duration, Time, Bytes, Array, and Record types are Equatable.
Nullable types are those that can be null. Bool, Int, Uint, Float, String, Duration, and Time types are Nullable.
Records are the only types that fall under this constraint.
Negatable types ore those the unary arithmetic operator -
accepts.
Int, Uint, Float, and Duration types are Negatable.
Duration and Time types are Timeable.
Stringable types can be evaluated and expressed in string interpolation. String, Int, Uint, Float, Bool, Time, and Duration types are Stringable.
A block is a possibly empty sequence of statements within matching brace brackets.
Block = "{" StatementList "} .
StatementList = { Statement } .
In addition to explicit blocks in the source code, there are implicit blocks:
- The universe block encompasses all Flux source text.
- Each package has a package block containing all Flux source text for that package.
- Each file has a file block containing all Flux source text in that file.
- Each function literal has its own function block even if not explicitly declared.
Blocks nest and influence scoping.
An assignment binds an identifier to a variable, option, or function. Every identifier in a program must be assigned.
Flux is lexically scoped using blocks:
- The scope of a preassigned identifier is in the universe block.
- The scope of an identifier denoting a variable, option, or function at the top level (outside any function) is the package block.
- The scope of the name of an imported package is the file block of the file containing the import declaration.
- The scope of an identifier denoting a function argument is the function body.
- The scope of an identifier assigned inside a function is the innermost containing block.
Note that the package clause is not an assignment. The package name does not appear in any scope. Its purpose is to identify the files belonging to the same package and to specify the default package name for import declarations.
IMPL#247 Add package/namespace support
VariableAssignment = identifier "=" Expression
A variable assignment creates a variable bound to an identifier and gives it a type and value. A variable keeps the same type and value for the remainder of its lifetime. An identifier assigned to a variable in a block cannot be reassigned in the same block. An identifier can be reassigned or shadowed in an inner block.
Examples:
n = 1
m = 2
x = 5.4
f = () => {
n = "a"
m = "b"
return a + b
}
OptionAssignment = "option" [ identifier "." ] identifier "=" Expression
An option assignment creates an option bound to an identifier and gives it a type and a value. Options may only be assigned in a package block. Once declared, an option may not be redeclared in the same package block. An option declared in one package may be reassigned a new value in another. An option keeps the same type for the remainder of its lifetime.
Examples:
// alert package
option severity = ["low", "moderate", "high"]
// foo package
import "alert"
option alert.severity = ["low", "critical"] // qualified option
option n = 2
f = (a, b) => a + b + n
x = f(a:1, b:1) // x = 4
An expression specifies the computation of a value by applying the operators and functions to operands.
Operands denote the elementary values in an expression.
Primary expressions are the operands for unary and binary expressions. A primary expressions may be a literal, an identifier denoting a variable, or a parenthesized expression.
PrimaryExpression = identifier | Literal | "(" Expression ")" .
Flux provides the logical operators and
and or
.
Flux's logical operators observe the short-circuiting behavior seen in other programming languages, meaning that the right-hand side (RHS) operand is conditionally evaluated depending on the result of evaluating the left-hand side (LHS) operand.
When the operator is and
:
- If the LHS operand evaluates to
false
, a value offalse
is produced and the RHS operand is not evaluated.
When the operator is or
:
- If the LHS operand evaluates to
true
, a value oftrue
is produced and the RHS operand is not evaluated.
Literals construct a value.
Literal = int_lit
| float_lit
| string_lit
| regex_lit
| duration_lit
| pipe_receive_lit
| RecordLiteral
| ArrayLiteral
| DictLiteral
| FunctionLiteral .
Record literals construct a value with the record type.
RecordLiteral = "{" RecordBody "}" .
RecordBody = WithProperties | PropertyList .
WithProperties = identifier "with" PropertyList .
PropertyList = [ Property { "," Property } ] .
Property = identifier [ ":" Expression ]
| string_lit ":" Expression .
Examples:
{a: 1, b: 2, c: 3}
{a, b, c}
{o with x: 5, y: 5}
{o with a, b}
Array literals construct a value with the array type.
ArrayLiteral = "[" ExpressionList "]" .
ExpressionList = [ Expression { "," Expression } ] .
Dictionary literals construct a value with the dict type.
DictLiteral = EmptyDict | "[" AssociativeList "]" .
EmptyDict = "[" ":" "]" .
AssociativeList = Association { "," AssociativeList } .
Association = Expression ":" Expression .
The keys can be arbitrary expressions. The type system will enforce that all keys are of the same type.
Examples:
a = "a"
b = [:] // empty dictionary
c = [a: 1, "b": 2] // dictionary mapping string values to integers
d = [a: 1, 2: 3] // type error: cannot mix string and integer keys
A function literal defines a new function with a body and parameters. The function body may be a block or a single expression. The function body must have a return statement if it is an explicit block, otherwise the expression is the return value.
FunctionLiteral = FunctionParameters "=>" FunctionBody .
FunctionParameters = "(" [ ParameterList [ "," ] ] ")" .
ParameterList = Parameter { "," Parameter } .
Parameter = identifier [ "=" Expression ] .
FunctionBody = Expression | Block .
Examples:
() => 1 // function returns the value 1
(a, b) => a + b // function returns the sum of a and b
(x=1, y=1) => x * y // function with default values
(a, b, c) => { // function with a block body
d = a + b
return d / c
}
All function literals are anonymous. A function may be given a name using a variable assignment.
add = (a,b) => a + b
mul = (a,b) => a * b
Function literals are closures: they may refer to variables defined is a surrounding block. Those variables are shared between the function literal and the surrounding block.
A call expressions invokes a function with the provided arguments. Arguments must be specified using the argument name, positional arguments are not supported. Argument order does not matter. When an argument has a default value, it is not required to be specified.
CallExpression = "(" PropertyList ")" .
Call expressions support a short notation in case the name of the argument matches the parameter name. This notation can be used only when every argument matches its parameter.
Examples:
add = (a,b) => a + b
a = 1
b = 2
add(a, b)
// is the same as
add(a: a, b: b)
// both FAIL: cannot mix short and long notation.
add(a: a, b)
add(a, b: b)
A pipe expression is a call expression with an implicit piped argument. Pipe expressions simplify creating long nested call chains.
Pipe expressions pass the result of the left hand expression as the pipe argument to the right hand call expression. Function literals specify which if any argument is the pipe argument using the pipe literal as the argument's default value. It is an error to use a pipe expression if the function does not declare a pipe argument.
pipe_receive_lit = "<-" .
Examples:
foo = () => // function body elided
bar = (x=<-) => // function body elided
baz = (y=<-) => // function body elided
foo() |> bar() |> baz() // equivalent to baz(x:bar(y:foo()))
Index expressions access a value from an array based on a numeric index.
IndexExpression = "[" Expression "]" .
Member expressions access a property of a record.
They are specified via an expression of the form rec.k
or rec["k"]
.
The property being accessed must be either an identifier or a string literal.
In either case the literal value is the name of the property being accessed, the identifier is not evaluated.
It is not possible to access an record's property using an arbitrary expression.
If rec
contains an entry with property k
, both rec.k
and rec["k"]
return the value associated with k
.
If rec
is bounded and does not contain a property k
, both rec.k
and rec["k"]
report a type checking error.
If rec
is unbounded and does not contain a property k
, both rec.k
and rec["k"]
return null.
MemberExpression = DotExpression | MemberBracketExpression
DotExpression = "." identifer
MemberBracketExpression = "[" string_lit "]" .
Conditional expressions evaluate a boolean-valued condition and if the result is true,
the expression following the then
keyword is evaluated and returned.
Otherwise the expression following the else
keyword is evaluated and returned.
In either case, the branch not taken is not evaluated;
only side effects associated with the branch that is taken will occur.
ConditionalExpression = "if" Expression "then" Expression "else" Expression .
Example:
color = if code == 0 then "green" else if code == 1 then "yellow" else "red"
Note according to the above definition, if a condition evaluates to a null or unknown value, the else branch is evaluated.
Operators combine operands into expressions. The precedence of the operators is given in the table below. Operators with a lower number have higher precedence.
Precedence | Operator | Description |
---|---|---|
1 | a() |
Function call |
a[] |
Member or index access | |
. |
Member access | |
2 | |> |
Pipe forward |
3 | () => 1 |
FunctionLiteral |
4 | ^ |
Exponentiation |
5 | * / % |
Multiplication, division, |
and modulo | ||
6 | + - |
Addition and subtraction |
7 | == != |
Comparison operators |
< <= |
||
> >= |
||
=~ !~ |
||
8 | not |
Unary logical operator |
exists |
Null check operator | |
9 | and |
Logical AND |
10 | or |
Logical OR |
11 | if/then/else |
Conditional |
The operator precedence is encoded directly into the grammar as the following.
Expression = ConditionalExpression .
ConditionalExpression = LogicalExpression
| "if" Expression "then" Expression "else" Expression .
LogicalExpression = UnaryLogicalExpression
| LogicalExpression LogicalOperator UnaryLogicalExpression .
LogicalOperator = "and" | "or" .
UnaryLogicalExpression = ComparisonExpression
| UnaryLogicalOperator UnaryLogicalExpression .
UnaryLogicalOperator = "not" | "exists" .
ComparisonExpression = MultiplicativeExpression
| ComparisonExpression ComparisonOperator MultiplicativeExpression .
ComparisonOperator = "==" | "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
AdditiveExpression = MultiplicativeExpression
| AdditiveExpression AdditiveOperator MultiplicativeExpression .
AdditiveOperator = "+" | "-" .
MultiplicativeExpression = ExponentExpression
| ExponentExpression ExponentOperator MultiplicativeExpression .
MultiplicativeOperator = "*" | "/" | "%" .
ExponentExpression = PipeExpression
| ExponentExpression ExponentOperator PipeExpression .
ExponentOperator = "^" .
PipeExpression = PostfixExpression
| PipeExpression PipeOperator UnaryExpression .
PipeOperator = "|>" .
UnaryExpression = PostfixExpression
| PrefixOperator UnaryExpression .
PrefixOperator = "+" | "-" .
PostfixExpression = PrimaryExpression
| PostfixExpression PostfixOperator .
PostfixOperator = MemberExpression
| CallExpression
| IndexExpression .
Dividing by zero or using the mod operator with a divisor of zero for integer and unsigned integer types will result in an error. Floating point divide by zero produces positive or negative infinity according to the IEEE-754 floating point specification.
Flux source is organized into packages. A package consists of one or more source files. Each source file is parsed individually and composed into a single package.
File = [ PackageClause ] [ ImportList ] StatementList .
ImportList = { ImportDeclaration } .
PackageClause = "package" identifier .
A package clause defines the name for the current package. Package names must be valid Flux identifiers. The package clause must be at the begining of any Flux source file. All files in the same package must declare the same package name. When a file does not declare a package clause, all identifiers in that file will belong to the special main package.
IMPL#247 Add package/namespace support
The main package is special for a few reasons:
- It defines the entrypoint of a Flux program
- It cannot be imported
- All statements are marked as producing side effects
A statement controls execution.
Statement = OptionAssignment
| BuiltinStatement
| VariableAssignment
| ReturnStatement
| ExpressionStatement .
ImportDeclaration = "import" [identifier] string_lit
Associated with every package is a package name and an import path.
The import statement takes a package's import path and brings all of the identifiers defined in that package into the current scope under a namespace.
The import statment defines the namespace through which to access the imported identifiers.
By default the identifier of this namespace is the package name unless otherwise specified.
For example, given a variable x
declared in package foo
, importing foo
and referencing x
would look like this:
import "import/path/to/package/foo"
foo.x
Or this:
import bar "import/path/to/package/foo"
bar.x
A package's import path is always absolute. A package may reassign a new value to an option identifier declared in one of its imported packages. A package cannot access nor modify the identifiers belonging to the imported packages of its imported packages. Every statement contained in an imported package is evaluated.
A terminating statement prevents execution of all statements that appear after it in the same block. A return statement is a terminating statement.
ReturnStatement = "return" Expression .
An expression statement is an expression where the computed value is discarded.
ExpressionStatement = Expression .
Examples:
1 + 1
f()
a
Side effects can occur in two ways.
- By reassigning builtin options
- By calling a function that produces side effects
A function produces side effects when it is explicitly declared to have side effects or when it calls a function that itself produces side effects.
Packages are initialized in the following order:
- All imported packages are initialized and assigned to their package identifier.
- All option declarations are evaluated and assigned regardless of order. An option cannot have a dependency on another option assigned in the same package block.
- All variable declarations are evaluated and assigned regardless of order. A variable cannot have a direct or indirect dependency on itself.
- Any package side effects are evaluated.
A package will only be initialized once across all file blocks and across all packages blocks regardless of how many times it is imported.
Initializing imported packages must be deterministic. Specifically after all imported packages are initialized, each option must be assigned the same value. Packages imported in the same file block are initialized in declaration order. Packages imported across different file blocks have no known order. When a set of imports modify the same option, they must be ordered by placing them in the same file block.
Flux contains many preassigned values. These preassigned values are defined in the source files for the various built-in packages.
When a built-in value is not expressible in Flux its value may be defined by the hosting environment. All such values must have a corresponding builtin statement to declare the existence and type of the built-in value.
BuiltinStatement = "builtin" identifer ":" TypeExpression .
TypeExpression = MonoType ["where" Constraints] .
MonoType = Tvar | Basic | Array | Record | Function .
Tvar = "A" … "Z" .
Basic = "int" | "uint" | "float" | "string" | "bool" | "time" | "duration" | "bytes" | "regexp" .
Array = "[" MonoType "]" .
Record = ( "{" [Properties] "}" ) | ( "{" Tvar "with" Properties "}" ) .
Function = "(" [Parameters] ")" "=>" MonoType .
Properties = Property { "," Property } .
Property = identifier ":" MonoType .
Parameters = Parameter { "," Parameter } .
Parameter = [ "<-" | "?" ] identifier ":" MonoType .
Constraints = Constraint { "," Constraint } .
Constraint = Tvar ":" Kinds .
Kinds = identifier { "+" identifier } .
Example
builtin filter : (<-tables: [T], fn: (r: T) => bool) => [T]
Days of the week are represented as integers in the range [0-6]
.
The following builtin values are defined:
Sunday = 0
Monday = 1
Tuesday = 2
Wednesday = 3
Thursday = 4
Friday = 5
Saturday = 6
Months are represented as integers in the range [1-12]
.
The following builtin values are defined:
January = 1
February = 2
March = 3
April = 4
May = 5
June = 6
July = 7
August = 8
September = 9
October = 10
November = 11
December = 12
These are builtin functions that all take a single time
argument and return an integer.
second
int Second returns the second of the minute for the provided time in the range[0-59]
.minute
int Minute returns the minute of the hour for the provided time in the range[0-59]
.hour
int Hour returns the hour of the day for the provided time in the range[0-59]
.weekDay
int WeekDay returns the day of the week for the provided time in the range[0-6]
.monthDay
int MonthDay returns the day of the month for the provided time in the range[1-31]
.yearDay
int YearDay returns the day of the year for the provided time in the range[1-366]
.month
int Month returns the month of the year for the provided time in the range[1-12]
.year
int Year returns the year for the provided time.week
int Week returns the ISO week of the year for the provided time in the range[1-53]
quarter
int Quarter returns the quarter of the year for the provided time in the range[1-4]
millisecond
int Millisecond returns the millisecond of the second for the provided time in the range[0-999]
microsecond
int Microsecond returns the microsecond of the second for the provided time in the range[0-999999]
nanosecond
int Nanosecond returns the nanosecond of the second for the provided time in the range[0-999999999]
date.truncate
takes in a time t and a Duration unit and returns the given time
truncated to the given unit.
Examples:
truncate(t: "2019-06-03T13:59:01.000000000Z", unit: 1s)
returns time2019-06-03T13:59:01.000000000Z
truncate(t: "2019-06-03T13:59:01.000000000Z", unit: 1m)
returns time2019-06-03T13:59:00.000000000Z
truncate(t: "2019-06-03T13:59:01.000000000Z", unit: 1h)
returns time2019-06-03T13:00:00.000000000Z
The builtin function systemTime
returns the current system time.
All calls to systemTime
within a single evaluation of a Flux script return the same time.
today()
returns the now()
timestamp truncated to the day unit.
Example:
option now = 2021-01-01T12:43:21Z
today() // Returns 2021-01-01T00:00:00Z
FixedZone creates a location based on a fixed time offset from UTC.
FixedZone has the following parameters:
Name | Type | Description |
---|---|---|
offset | duration | Offset is the offset from UTC for the time zone. Offset must be less than 24h. Defaults to 0, which produces the UTC location. |
Examples:
fixedZone(offset:-5h) // time zone 5 hours west of UTC
fixedZone(offset:4h30m) // time zone 4 and a half hours east of UTC
IMPL#156 Implement FixedZone function
LoadLoacation loads a locations from a time zone database.
LoadLocation has the following parameters:
Name | Type | Description |
---|---|---|
name | string | Name is the name of the location to load. The names correspond to names in the IANA tzdb. |
Examples:
loadLocation(name:"America/Denver")
loadLocation(name:"America/Chicago")
loadLocation(name:"Africa/Tunis")
IMPL#157 Implement LoadLoacation function
Flux employs a basic data model built from basic data types. The data model consists of tables, records, columns and streams.
A record is a tuple of named values and is represented using an record type.
A column has a label and a data type.
The available data types for a column are:
bool a boolean value, true or false.
uint an unsigned 64-bit integer
int a signed 64-bit integer
float an IEEE-754 64-bit floating-point number
string a sequence of unicode characters
bytes a sequence of byte values
time a nanosecond precision instant in time
duration a nanosecond precision duration of time
A table is set of records, with a common set of columns and a group key.
The group key is a list of columns. A table's group key denotes which subset of the entire dataset is assigned to the table. As such, all records within a table will have the same values for each column that is part of the group key. These common values are referred to as the group key value, and can be represented as a set of key value pairs.
A tables schema consists of its group key, and its column's labels and types.
IMPL#463 Specify the primitive types that make up stream and table types
A stream represents a potentially unbounded set of tables. A stream is grouped into individual tables using the group key. Within a stream each table's group key value is unique.
IMPL#463 Specify the primitive types that make up stream and table types
null
is a predeclared identifier representing a missing or unknown value.
null
is the only value comprising the null type.
Any non-boolean operator that operates on basic types, returns null when at least one of its operands is null. This can be explained intuitively with the following table and by thinking of a null value as an unknown value.
Expression | Evaluates To | Because |
---|---|---|
null + 5 | null | Adding 5 to an unknown value is still unknown |
null * 5 | null | Multiplying an unknown value by 5 is still unknown |
null == 5 | null | We don't know if an unknown value is equal to 5 |
null < 5 | null | We don't know if an unknown value is less than 5 |
null == null | null | We don't know if something unknown is equal to something else that is also unknown |
In other words, operating on something unknown produces something that is still unknown. The only place where this is not the case is in boolean logic.
Because boolean types are nullable, Flux implements ternary logic as a way of handling boolean operators with null operands. Again, by interpreting a null operand as an unknown value, we have the following definitions:
- not null = null
- null or false = null
- null or true = true
- null or null = null
- null and false = false
- null and true = null
- null and null = null
Note according to the definitions above, it is not possible to check whether or not an expression is null using the ==
and !=
operators as these operators will return null if any of their operands are null.
In order to perform such a check, Flux provides a built-in exists
operator defined as follows:
exists x
returns false ifx
is nullexists x
returns true ifx
is not null
The exists operator can also be applied to records as follows:
exists rec.x
returns false ifx
is not a property ofrec
exists rec.x
returns true ifx
is a property ofrec
Transformations define a change to a stream. Transformations may consume an input stream and always produce a new output stream. The order of group keys for the output stream will have a stable output order based on the input stream. The specific ordering may change between releases and it would not be considered a breaking change.
Most transformations output one table for every table they receive from the input stream. Transformations that modify the group keys or values will need to regroup the tables in the output stream. A transformation produces side effects when it is constructed from a function that produces side effects.
Transformations are represented using function types.
Some transformations, for instance map
and filter
, are represented using higher-order functions (functions that accepts other functions).
When specifying the function passed in, make sure that you use the same names for its parameters.
filter
, for instance, accepts argument fn
which is of type (r: A) => bool where A: Record
.
An invocation of filter
must take a function with one argument named r
:
from(bucket: "db")
|> filter(fn: (r) => ...)
This script would fail:
from(bucket: "db")
|> filter(fn: (v) => ...)
// FAILS!: 'v' != 'r'.
The reason is simple: Flux does not support positional arguments, so parameter names matter.
The transformation (in our example, filter
) must know the name of the parameter in the given function in order to invoke it properly.
The process happens the other way around, actually:
our filter
implementation supposes to invoke a function in this way:
fn(r: <the-record>)
So, you have to:
...
|> filter(fn: (r) => ...)
...
The following functions are preassigned in the universe block. These functions each define a transformation.
From produces a stream of tables from the specified bucket. Each unique series is contained within its own table. Each record in the table represents a single point in the series.
The tables schema will include the following columns:
_time
the time of the record_value
the value of the record_start
the inclusive lower time bound of all records_stop
the exclusive upper time bound of all records
Additionally any tags on the series will be added as columns.
The default group key for the tables is every column except _time
and _value
.
From has the following properties:
Name | Type | Description |
---|---|---|
bucket | string | Bucket is the name of the bucket to query. |
bucketID | string | BucketID is the string encoding of the ID of the bucket to query. |
Example:
from(bucket:"telegraf/autogen")
from(bucketID:"0261d8287f4d6000")
Buckets is a type of data source that retrieves a list of buckets that the caller is authorized to access.
It takes no input parameters and produces an output table with the following columns:
Name | Type | Description |
---|---|---|
name | string | The name of the bucket. |
id | string | The internal ID of the bucket. |
organization | string | The organization this bucket belongs to. |
organizationID | string | The internal ID of the organization. |
retentionPolicy | string | The name of the retention policy, if present. |
retentionPeriod | duration | The duration of time for which data is held in this bucket. |
Example:
buckets() |> filter(fn: (r) => r.organization == "my-org")
Yield indicates that the stream received by the yield operation should be delivered as a result of the query. A query may have multiple results, each identified by the name provided to yield.
Yield outputs the input stream unmodified.
Yield has the following properties:
Name | Type | Description |
---|---|---|
name | string | Unique name to give to yielded results. |
Example:
from(bucket: "telegraf/autogen") |> range(start: -5m) |> yield(name:"1")
Note: The yield
function produces side effects.
Fill will scan a stream for null values and replace them with a non-null value.
The output stream will be the same as the input stream, with all null values in the column replaced.
Fill has the following properties:
Name | Type | Description |
---|---|---|
column | string | The column to fill. Defaults to "_value" |
value | bool, int, uint, float, string, time | The constant value to use in place of nulls. The type must match the type of the valueColumn. |
usePrevious | bool | If set, then assign the value set in the previous non-null row. Cannot be used with value . |
AssertEquals is a function that will test whether two streams have identical data. It also outputs the data from the tested stream unchanged, so that this function can be used to perform in-line tests in a query.
AssertEquals has the following properties:
Name | Type | Description |
---|---|---|
name | string | Unique name given to this assertion. |
got | stream | The stream you are testing. May be piped-forward from another function. |
want | stream | A copy of the expected stream. |
Example:
option now = () => 2015-01-01T00:00:00Z
want = from(bucket: "backup-telegraf/autogen") |> range(start: -5m)
// in-line assertion
from(bucket: "telegraf/autogen") |> range(start: -5m) |> assertEquals(want: want)
// equivalent:
got = from(bucket: "telegraf/autogen") |> range(start: -5m)
assertEquals(got: got, want: want)
Diff is a function that will produce a diff between two table streams.
It will match tables from each stream that have the same group key. For each matched table, it will produce a diff. Any rows that were added or removed will be added to the table as a row.
An additional string column with the name _diff
will be created which will contain a "-"
if the row was present in the got
table and not in the want
table or "+"
if the opposite is true.
The diff
function is guaranteed to emit at least one row if the tables are different and no rows if the tables are the same. The exact diff that is produced may change.
Diff has the following properties:
Name | Type | Description |
---|---|---|
got | stream | The stream you are testing. May be piped-forward from another function. |
want | stream | A copy of the expected stream. |
option now = () => 2015-01-01T00:00:00Z
want = from(bucket: "backup-telegraf/autogen") |> range(start: -5m)
// in-line assertion
from(bucket: "telegraf/autogen") |> range(start: -5m) |> assertEquals(want: want)
// equivalent:
got = from(bucket: "telegraf/autogen") |> range(start: -5m)
diff(got: got, want: want)
Aggregate operations output a table for every input table they receive. A list of columns to aggregate must be provided to the operation. The aggregate function is applied to each column in isolation.
Any output table will have the following properties:
- It always contains a single record.
- It will have the same group key as the input table.
- It will contain a column for each provided aggregate column. The column label will be the same as the input table. The type of the column depends on the specific aggregate operation. The value of the column will be null if the input table is empty or the input column has only null values.
- It will not have a _time column
All aggregate operations have the following properties:
Name | Type | Description |
---|---|---|
column | string | Column specifies a column to aggregate. |
The default behavior of aggregates is to skip over null values.
An arbitrary aggregate function fn
is expressed logically using the reduce function:
fn = (column, tables=<-) => reduce(fn: (r, accumulator) => {
return if exists r.column then ... else ...
}, identity: ...)
For example, the sum
transformation is logically equivalent to:
sum = (column, tables=<-) => reduce(fn: (r, accumulator) => {
return if exists r.column then accumulator + r.column else accumulator
}, identity: 0)
AggregateWindow is a function that simplifies aggregating data over fixed windows of time. AggregateWindow windows the data, performs an aggregate operation, and then undoes the windowing to produce an output table for every input table.
AggregateWindow has the following properties:
Name | Type | Description |
---|---|---|
every | duration | Every specifies the window size to aggregate. |
fn | (tables: <-stream, columns: []string) => stream | Fn specifies the aggregate operation to perform. Any of the functions in this Aggregate section that accept a singular column parameter can be provided. |
column | string | Columns specifies column to aggregate. Defaults to "_value". |
timeSrc | string | TimeSrc is the name of a column from the group key to use as the source for the aggregated time. Defaults to "_stop". |
timeDst | string | TimeDst is the name of a new column in which the aggregated time is placed. Defaults to "_time". |
createEmpty | bool | CreateEmpty, if true, will create empty windows and fill them with a null aggregate value. Defaults to true. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
// Compute the mean over 1m intervals for the last 1h.
from(bucket: "telegraf/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> aggregateWindow(every: 1m, fn:mean)
Covariance is an aggregate operation. Covariance computes the covariance between two columns.
Covariance has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns specifies a list of the two columns to aggregate. This property is required and has no default. |
pearsonr | bool | Pearsonr indicates whether the result should be normalized to be the Pearson R coefficient. |
valueDst | string | ValueDst is the column into which the result will be placed. Defaults to _value . |
Additionally exactly two columns must be provided to the columns
property.
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> map(fn: (r) => ({r with x: r._value, y: r._value * r._value / 2}))
|> covariance(columns: ["x", "y"])
Cov computes the covariance between two streams by first joining the streams and then performing the covariance operation per joined table.
Cov has the following properties:
Name | Type | Description |
---|---|---|
x | stream | X is one of the input streams. |
y | stream | Y is one of the input streams. |
on | []string | On is the list of columns on which to join. |
pearsonr | bool | Pearsonr indicates whether the result should be normalized to be the Pearson R coefficient. |
valueDst | string | ValueDst is the column into which the result will be placed. Defaults to _value . |
Example:
cpu = from(bucket: "telegraf/autogen") |> range(start:-5m) |> filter(fn:(r) => r._measurement == "cpu")
mem = from(bucket: "telegraf/autogen") |> range(start:-5m) |> filter(fn:(r) => r._measurement == "mem")
cov(x: cpu, y: mem)
Pearsonr computes the Pearson R correlation coefficient bewteen two streams.
It is defined in terms of the cov
function:
pearsonr = (x,y,on) => cov(x:x, y:y, on:on, pearsonr:true)
Count is an aggregate operation. For each aggregated column, it outputs the number of records as an integer. It will count both null and non-null records.
Count has the following property:
Name | Type | Description |
---|---|---|
column | string | Columns specifies a column to aggregate. Defaults to "_value" . |
Example:
from(bucket: "telegraf/autogen") |> range(start: -5m) |> count()
Integral is an aggregate operation. For each aggregate column, it outputs the area under the curve of records. The curve is defined as a function where the domain is the record times and the range is the record values. This function will return an error if values in the time column are null or not sorted in ascending order. Null values in aggregate columns are skipped.
Integral has the following properties:
Name | Type | Description |
---|---|---|
column | string | Columns specifies a column to aggregate. Defaults to "_value" . |
unit | duration | Unit is the time duration to use when computing the integral. Defaults to 1s |
timeColumn | string | TimeColumn is the name of the column containing the time value. Defaults to _time . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> integral(unit:10s)
Mean is an aggregate operation. For each aggregated column, it outputs the mean of the non null records as a float.
Mean has the following property:
Name | Type | Description |
---|---|---|
column | string | Columns specifies a column to aggregate. Defaults to "_value" . |
Example:
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "mem" AND
r._field == "used_percent")
|> range(start:-12h)
|> window(every:10m)
|> mean()
Median is defined as:
median = (method="estimate_tdigest", compression=0.0, column="_value", tables=<-) =>
tables
|> quantile(q:0.5, method: method, compression: compression, column: column)
Is it simply a quantile
with the q
paramter always set to 0.5
.
It therefore shares all the same properties as the quantile function.
Example:
// Determine median cpu system usage:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> median()
Mode produces the mode for a given column. If there are multiple modes, all of them are returned in a table in sorted order. If there is no mode, null is returned. The following data types are supported: string, float64, int64, uint64, bool, time. Mode only considers non-null values.
Mode has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the column on which to track the mode. Defaults to _value . |
Example:
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "mem" AND
r._field == "used_percent")
|> range(start:-12h)
|> window(every:10m)
|> mode(column: "host")
Quantile is both an aggregate operation and a selector operation depending on selected options. In the aggregate methods, it outputs the value that represents the specified quantile of the non null record as a float.
Quantile has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column specifies a column to aggregate. Defaults to "_value" |
q | float | q is a value between 0 and 1 indicating the desired quantile. |
method | string | Method must be one of: estimate_tdigest, exact_mean, or exact_selector. |
compression | float | Compression indicates how many centroids to use when compressing the dataset. A larger number produces a more accurate result at the cost of increased memory requirements. Defaults to 1000. |
The method parameter must be one of:
estimate_tdigest
: an aggregate result that uses a tdigest data structure to compute an accurate quantile estimate on large data sources.exact_mean
: an aggregate result that takes the average of the two points closest to the quantile value.exact_selector
: see Quantile (selector)
Example:
// Determine the 0.99 quantile for cpu system usage:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> quantile(q: 0.99, method: "estimate_tdigest", compression: 1000.0)
Skew is an aggregate operation. For each aggregated column, it outputs the skew of the non null record as a float.
Skew has the following parameter:
Name | Type | Description |
---|---|---|
column | string | Column specifies a columns to aggregate. Defaults to "_value" . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> skew()
Spread is an aggregate operation.
For each aggregated column, it outputs the difference between the min and max values.
The type of the output column depends on the type of input column: for input columns with type uint
or int
, the output is an int
; for float
input columns the output is a float
.
All other input types are invalid.
Spread has the following parameter:
Name | Type | Description |
---|---|---|
column | string | Column specifies a column to aggregate. Defaults to "_value" . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> spread()
Stddev is an aggregate operation. For each aggregated column, it outputs the standard deviation of the non null record as a float.
Stddev has the following parameters:
Name | Type | Description |
---|---|---|
column | string | Columns specifies a column to aggregate. Defaults to "_value" . |
mode | string | The standard deviation mode. Must be one of "sample" or "population" . Defaults to "sample" . |
The mode
parameter specifies the type of standard deviation to calculate:
"sample"
: Calculates the sample standard deviation, where the data is considered to be part of a larger population."population"
: Calculates the population standard deviation, where the data is considered a population of its own.
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> stddev()
Stddev is an aggregate operation. For each aggregated column, it outputs the sum of the non null record. The output column type is the same as the input column type.
Sum has the following parameter:
Name | Type | Description |
---|---|---|
column | string | Column specifies a column to aggregate. Defaults to "_value" . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> sum()
Multiple aggregates can be applied to the same table using the aggregate
function.
IMPL#139 Add aggregate function
Selector operations output a table for every input table they receive. A single column on which to operate must be provided to the operation.
Any output table will have the following properties:
- It will have the same group key as the input table.
- It will contain the same columns as the input table.
- It will have a column
_time
which represents the time of the selected record. This can be set as the value of any time column on the input table. By default the_stop
time column is used.
All selector operations have the following properties:
column
string column specifies a which column to use when selecting.
First is a selector operation. First selects the first non null record from the input table.
Example:
from(bucket:"telegraf/autogen") |> first()
Last is a selector operation. Last selects the last non null record from the input table.
Example:
from(bucket: "telegraf/autogen") |> last()
Max is a selector operation. Max selects the maximum record from the input table.
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h)
|> filter(fn: (r) => r._measurement == "cpu" AND r._field == "usage_system")
|> max()
Min is a selector operation. Min selects the minimum record from the input table.
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h)
|> filter(fn: (r) => r._measurement == "cpu" AND r._field == "usage_system")
|> min()
Quantile is both an aggregate operation and a selector operation depending on selected options. In the aggregate methods, it outputs the value that represents the specified quantile of the non null record as a float.
Quantile has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column indicates which column will be used for the quantile computation. Defaults to "_value" |
q | float | q is a value between 0 and 1 indicating the desired quantile. |
method | string | Method must be one of: estimate_tdigest, exact_mean, exact_selector. |
The method parameter must be one of:
estimate_tdigest
: See Quantile (Aggregate).exact_mean
: See Quantile (Aggregate).exact_selector
: a selector result that returns the data point for which at leastq
points are less than.
Example:
// Determine the 0.99 quantile for cpu system usage:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> quantile(q: 0.99, method: "exact_selector")
Median is defined as:
median = (method="estimate_tdigest", compression=0.0, column="_value", tables=<-) =>
tables
|> quantile(q:0.5, method: method, compression: compression, column: column)
Is it simply a quantile
with the q
paramter always set to 0.5
.
It therefore shares all the same properties as the quantile function.
Example:
// Determine median cpu system usage:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_system")
|> median(method: "exact_selector")
Sample is a selector operation. Sample selects a subset of the records from the input table.
The following properties define how the sample is selected.
Name | Type | Description |
---|---|---|
n | int | Sample every Nth element. Must be a positive integer. |
pos | int | Pos is the offset from start of results to begin sampling. The pos must be less than n . If pos is less than 0, a random offset is used. Default is -1 (random offset). |
Example:
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system")
|> range(start:-1d)
|> sample(n: 5, pos: 1)
Filter applies a predicate function to each input record.
Only the records for which the predicate evaluates to true will be included in the output tables.
One output table is produced for each input table.
The output tables will have the same schema as their corresponding input tables.
If onEmpty
is set to drop
, then empty tables will be dropped.
If onEmpty
is set to keep
, then empty tables will be output to the next transformation.
The default is to drop empty tables.
Filter has the following properties:
Name | Type | Description |
---|---|---|
fn | (r: A) => bool where A: Record | Fn is a predicate function. Records which evaluate to true, will be included in the output tables. |
onEmpty | string | The behavior for empty tables. This can be keep or drop . The default is drop . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h)
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h)
|> filter(fn: (r) => r._measurement == "cpu", onEmpty: "keep")
Note according to the definition, records for which the predicate evaluates to null will not be included in the output.
There are six highest/lowest functions that compute the top or bottom N records from all tables in a stream based on a specific aggregation method.
- highestMax - computes the top N records from all tables using the maximum of each table.
- highestAverage - computes the top N records from all tables using the average of each table.
- highestCurrent - computes the top N records from all tables using the last value of each table.
- lowestMin - computes the bottom N records from all tables using the minimum of each table.
- lowestAverage - computes the bottom N records from all tables using the average of each table.
- lowestCurrent - computes the bottom N records from all tables using the last value of each table.
All of the highest/lowest functions take the following parameters:
Name | Type | Description |
---|---|---|
n | int | N is the number of records to select. |
column | string | Column is the column to use when aggregating. Defaults to "_value" . |
groupColumns | []string | GroupColumns are the columns on which to group to perform the aggregation. |
Histogram approximates the cumulative distribution function of a dataset by counting data frequencies for a list of bins. A bin is defined by an upper bound where all data points that are less than or equal to the bound are counted in the bin. The bin counts are cumulative.
Each input table is converted into a single output table representing a single histogram. The output table will have a the same group key as the input table. The columns not part of the group key will be removed and an upper bound column and a count column will be added.
Histogram has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the name of a column containing the input data values. The column type must be float. Defaults to _value . |
upperBoundColumn | string | UpperBoundColumn is the name of the column in which to store the histogram upper bounds. Defaults to le . |
countColumn | string | CountColumn is the name of the column in which to store the histogram counts. Defaults to _value . |
bins | []float | Bins is a list of upper bounds to use when computing the histogram frequencies. Each element in the array should contain a float value that represents the maximum value for a bin. |
normalize | bool | Normalize when true will convert the counts into frequencies values between 0 and 1. Normalized histograms cannot be aggregated by summing their counts. Defaults to false . |
Example:
histogram(bins:linearBins(start:0.0,width:10.0,count:10)) // compute the histogram of the data using 10 bins from 0,10,20,...,100
HistogramQuantile approximates a quantile given an histogram that approximates the cumulative distribution of the dataset. Each input table represents a single histogram. The histogram tables must have two columns, a count column and an upper bound column. The count is the number of values that are less than or equal to the upper bound value. The table can have any number of records, each representing an entry in the histogram. The counts must be monotonically increasing when sorted by upper bound. If any values in the count column or upper bound column are null, an error will be returned.
Linear interpolation between the two closest bounds is used to compute the quantile. If the either of the bounds used in interpolation are infinite, then the other finite bound is used and no interpolation is performed.
The output table will have a the same group key as the input table. The columns not part of the group key will be removed and a single value column of type float will be added. The count and upper bound columns must not be part of the group key. The value column represents the value of the desired quantile from the histogram.
HistogramQuantile has the following properties:
Name | Type | Description |
---|---|---|
quantile | float | Quantile is a value between 0 and 1 indicating the desired quantile to compute. |
countColumn | string | CountColumn is the name of the column containing the histogram counts. The count column type must be float. Defaults to _value . |
upperBoundColumn | string | UpperBoundColumn is the name of the column containing the histogram upper bounds. The upper bound column type must be float. Defaults to le . |
valueColumn | string | ValueColumn is the name of the output column which will contain the computed quantile. Defaults to _value . |
minValue | float | MinValue is the assumed minumum value of the dataset. Default to 0. |
When the quantile falls below the lowest upper bound, interpolation is performed between minValue and the lowest upper bound. When minValue is equal to negative infinity, the lowest upper bound is used.
Example:
histogramQuantile(quantile:0.9) // compute the 90th quantile using histogram data.
HourSelection retains all rows in the table within the specified hour range. Hours are specified in military time.
HourSelection has the following properties:
Name | Type | Description |
---|---|---|
start | int | Start is the hour retained data should begin (inclusive). |
stop | int | Stop is the hour retain data should end (inclusive). |
timeColumn | string | TimeColumn is the column that HourSelection should parse. Defaults to _time . |
Example:
hourSelection(start: 9, stop: 17) // retains all data that took place between 9am and 5pm
LinearBins produces a list of linearly separated floats.
LinearBins has the following properties:
Name | Type | Description |
---|---|---|
start | float | Start is the first value in the returned list. |
width | float | Width is the distance between subsequent bin values. |
count | int | Count is the number of bins to create. |
infinity | bool | Infinity when true adds an additional bin with a value of positive infinity. Defaults to true . |
LogarithmicBins produces a list of exponentially separated floats.
LogarithmicBins has the following properties:
Name | Type | Description |
---|---|---|
start | float | Start is the first value in the returned bin list. |
factor | float | Factor is the multiplier applied to each subsequent bin. |
count | int | Count is the number of bins to create. |
infinity | bool | Infinity when true adds an additional bin with a value of positive infinity. Defaults to true . |
Limit caps the number of records in output tables to a fixed size n
.
One output table is produced for each input table.
Each output table will contain the first n
records after the first offset
records of the input table.
If the input table has less than offset + n
records, all records except the first offset
ones will be output.
Limit has the following properties:
Name | Type | Description |
---|---|---|
n | int | N is the maximum number of records per table to output. |
offset | int | Offest is the number of records to skip per table before limiting to n . Defaults to 0. |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -1h)
|> limit(n: 10, offset: 1)
Map applies a function to each record of the input tables. The modified records are assigned to new tables based on the group key of the input table. The output tables are the result of applying the map function to each record on the input tables.
When the output record contains a different value for the group key the record is regroup into the appropriate table. When the output record drops a column that was part of the group key that column is removed from the group key.
Map has the following properties:
Name | Type | Description |
---|---|---|
fn | (r: A) => B where A: Record, B:Record | Function to apply to each record. The return value must be a record. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
The resulting table will only have columns present on the returned record of the map function.
Use the with
operator to preserve all columns from the input table in the output table.
Example:
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
|> range(start:-12h)
// Square the value
// The output table has all column from the input table because of the use of `with`.
|> map(fn: (r) => ({r with _value: r._value * r._value}))
Example (creating a new table):
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
|> range(start:-12h)
// create a new table by copying each row into a new format
// The output table now only has the columns `_time` and `app_server`.
|> map(fn: (r) => ({_time: r._time, app_server: r.service}))
Reduce aggregates records in each table according to the reducer fn
. The output for each table will be the group key of the table, plus columns corresponding to each field in the reducer record.
If the reducer record contains a column with the same name as a group key column, then the group key column's value is overwritten, and the outgoing group key is changed. However, if two reduced tables write to the same destination group key, then the function will error.
Reduce has the following properties:
Name | Type | Description |
---|---|---|
fn | (r: A, accumulator: B) => B where B: Record | Function to apply to each record with a reducer record of type B. |
identity | B | An initial value to use when creating a reducer. May be used more than once in asynchronous processing use cases. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example (compute the sum of the value column):
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
|> range(start:-12h)
|> reduce(fn: (r, accumulator) =>
({sum: r._value + accumulator.sum}), identity: {sum: 0.0}))
Example (compute the sum and count in a single reducer):
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
|> range(start:-12h)
|> reduce(fn: (r, accumulator) =>
({sum: r._value + accumulator.sum, count: accumulator.count + 1.0}), identity: {sum: 0.0, count:0.0}))
Example (compute the product of all values):
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system" AND
r.service == "app-server")
|> range(start:-12h)
|> reduce(fn: (r, accumulator) =>
({prod: r._value * accumulator.prod}), identity: {prod: 1.0}))
Range filters records based on provided time bounds. Each input tables records are filtered to contain only records that exist within the time bounds. Records with a null value for their time will be filtered. Each input table's group key value is modified to fit within the time bounds. Tables where all records exists outside the time bounds are filtered entirely.
IMPL#244 Update range to default to aligned window ranges.
Range has the following properties:
Name | Type | Description |
---|---|---|
start | time | Start inclusively specifies the lower bound (oldest) time of the range by which to filter records. |
stop | time | Stop exclusively specifies the upper bound (newest) time of the range by which to filter records. Defaults to the value of the now option time. |
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h, stop: -15m)
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system")
Example:
from(bucket:"telegraf/autogen")
|> range(start:2018-05-22T23:30:00Z, stop: 2018-05-23T00:00:00Z)
|> filter(fn: (r) => r._measurement == "cpu" AND
r._field == "usage_system")
Rename renames specified columns in a table. There are two variants: one which takes a map of old column names to new column names, and one which takes a mapping function. If a column is renamed and is part of the group key, the column name in the group key will be updated. If a specified column is not present in a table an error will be thrown.
Rename has the following properties:
Name | Type | Description |
---|---|---|
columns | record | Columns is a map of old column names to new names. Cannot be used with fn . |
fn | (column: string) => string | Fn defines a function mapping between old and new column names. Cannot be used with columns . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example usage:
Rename a single column:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> rename(columns: {host: "server"})
Rename all columns using fn
parameter:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> rename(fn: (column) => column + "_new")
Drop excludes specified columns from a table. Columns to exclude can be specified either through a list, or a predicate function. When a dropped column is part of the group key it will also be dropped from the key. If dropping a column in the group key would result in merging two tables with different schemas, an error will be thrown.
Drop has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns is an array of column to exclude from the resulting table. Cannot be used with fn . |
fn | (column: string) => bool | Fn is a predicate function, columns that evaluate to true are dropped. Cannot be used with columns . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example Usage:
Drop a list of columns:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> drop(columns: ["host", "_measurement"])
Drop columns matching a predicate:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> drop(fn: (column) => column =~ /usage*/)
Keep is the inverse of drop. It returns a table containing only columns that are specified,
ignoring all others.
Only columns in the group key that are also specified in keep
will be kept in the resulting group key.
If not all columns in the group key are kept, this can result in merging tables that will have the same group key.
If this would result in merging two tables with different schemas, an error will be thrown.
Keep has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns is an array of column to exclude from the resulting table. Cannot be used with fn . |
fn | (column: string) => bool | Fn is a predicate function, columns that evaluate to true are kept. Cannot be used with columns . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example Usage:
Keep a list of columns:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> keep(columns: ["_time", "_value"])
Keep all columns matching a predicate:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> keep(fn: (column) => column =~ /inodes*/)
Duplicate duplicates a specified column in a table.
If the specified column is not present in a table an error will be thrown.
If the specified column is part of the group key, it will be duplicated, but it will not be part of the group key of the output table.
If the column indicated by as
does not exist, a column will be added to the table.
If the column does exist, that column will be overwritten with the values specified by column
.
If the as
column is in the group key, there are two possible outcomes:
If the column indicated by column
is in the group key, then as
will remain in the group key and have the same group key value as column
.
If column
is not part of the group key, then as
is removed from the group key.
Duplicate has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the name of the column to duplicate. |
as | string | As is the name that should be assigned to the duplicate column. |
Example usage:
Duplicate column server
under the name host
:
from(bucket: "telegraf/autogen")
|> range(start:-5m)
|> filter(fn: (r) => r._measurement == "cpu")
|> duplicate(column: "host", as: "server")
Set assigns a static value to each record. The key may modify and existing column or it may add a new column to the tables. If the column that is modified is part of the group key, then the output tables will be regrouped as needed.
Set has the following properties:
Name | Type | Description |
---|---|---|
key | string | Key is the label for the column to set. |
value | string | Value is the string value to set. |
Example:
from(bucket: "telegraf/autogen") |> set(key: "mykey", value: "myvalue")
Sorts orders the records within each table.
One output table is produced for each input table.
The output tables will have the same schema as their corresponding input tables.
When sorting, nulls will always be first. When desc: false
is set, then nulls are less than every other value. When desc: true
, nulls are greater than every value.
Sort has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns is the sort order to use; precedence from left to right. Default is ["_value"] . |
desc | bool | Desc indicates results should be sorted in descending order. Default is false . |
Example:
from(bucket:"telegraf/autogen")
|> filter(fn: (r) => r._measurement == "system" AND
r._field == "uptime")
|> range(start:-12h)
|> sort(columns:["region", "host", "value"])
Tail caps the number of records in output tables to a fixed size n
.
One output table is produced for each input table.
Each output table will contain the last n
records before the last offset
records of the input table.
If the input table has less than offset + n
records, all records except the last offset
ones will be output.
Tail has the following properties:
Name | Type | Description |
---|---|---|
n | int | N is the maximum number of records per table to output. |
offset | int | Offset is the number of records to skip per table before tailing to n . Defaults to 0. |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -1h)
|> tail(n: 10, offset: 1)
Group groups records based on their values for specific columns. It produces tables with new group keys based on the provided properties.
Group has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns is a list used to calculate the new group key. Defaults to [] . |
mode | string | The grouping mode, can be one of "by" or "except" . Defaults to "by" . |
When using "by"
mode, the specified columns
are the new group key.
When using "except"
mode, the new group key is the difference between the columns of the table under exam and columns
.
Examples
By
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> group(columns: ["host", "_measurement"])
Or:
...
|> group(columns: ["host", "_measurement"], mode: "by")
Records are grouped by the "host"
and "_measurement"
columns.
The resulting group key is ["host", "_measurement"]
, so a new table for every different ["host", "_measurement"]
value is created.
Every table in the result contains every record for some ["host", "_measurement"]
value.
Every record in some resulting table has the same value for the columns "host"
and "_measurement"
.
Except
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> group(columns: ["_time"], mode: "except")
Records are grouped by the set of all columns in the table, excluding "_time"
.
For example, if the table has columns ["_time", "host", "_measurement", "_field", "_value"]
then the group key would be
["host", "_measurement", "_field", "_value"]
.
Single-table grouping
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> group()
Records are grouped into a single table.
The group key of the resulting table is empty.
Columns lists the column labels of input tables. For each input table, it outputs a table with the same group key columns, plus a new column containing the labels of the input table's columns. Each row in an output table contains the group key value and the label of one column of the input table. So, each output table has the same number of rows as the number of columns of the input table.
Columns has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the name of the output column to store the column labels. Defaults to _value . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> columns(column: "labels")
Getting every possible column label in a single table as output:
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> columns()
|> keep(columns: ["_value"])
|> group()
|> distinct()
Keys outputs the group key of input tables.
For each input table, it outputs a table with the same group key columns, plus a _value
column containing the labels of the input table's group key.
Each row in an output table contains the group key value and the label of one column in the group key of the input table.
So, each output table has the same number of rows as the size of the group key of the input table.
Keys has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the name of the output column to store the group key labels. Defaults to _value . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> keys(column: "keys")
Getting every possible key in a single table as output:
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> keys()
|> keep(columns: ["_value"])
|> group()
|> distinct()
KeyValues outputs a table with the input table's group key, plus two columns _key
and _value
that correspond to unique (column, value) pairs from the input table.
KeyValues has the following properties:
Name | Type | Description |
---|---|---|
keyColumns | []string | KeyColumns is a list of columns from which values are extracted. |
fn | (schema: schema) => []string | Fn is a schema function that may by used instead of keyColumns to identify the set of columns. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Additional requirements:
- Only one of
keyColumns
orfn
may be used in a single call. - All columns indicated must be of the same type.
- Each input table must have all of the columns listed by the
keyColumns
parameter.
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> filter(fn: (r) => r._measurement == "cpu")
|> keyValues(keyColumns: ["usage_idle", "usage_user"])
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> filter(fn: (r) => r._measurement == "cpu")
|> keyValues(fn: (schema) => schema.columns |> filter(fn:(v) => v.label =~ /usage_.*/))
filterColumns = (fn) => (schema) => schema.columns |> filter(fn:(v) => fn(column:v))
from(bucket: "telegraf/autogen")
|> range(start: -30m)
|> filter(fn: (r) => r._measurement == "cpu")
|> keyValues(fn: filterColumns(fn: (column) => column.label =~ /usage_.*/))
Examples:
Given the following input table with group key ["_measurement"]
:
| _time | _measurement | _value | tagA |
| ----- | ------------ | ------ | ---- |
| 00001 | "m1" | 1 | "a" |
| 00002 | "m1" | 2 | "b" |
| 00003 | "m1" | 3 | "c" |
| 00004 | "m1" | 4 | "b" |
keyValues(keyColumns: ["tagA"])
produces the following table with group key ["_measurement"]:
| _measurement | _key | _value |
| ------------ | ------ | ------ |
| "m1" | "tagA" | "a" |
| "m1" | "tagA" | "b" |
| "m1" | "tagA" | "c" |
keyColumns(keyColumns: ["tagB"])
produces the following error message:
received table with columns [_time, _measurement, _value, tagA] not having key columns [tagB]
Window groups records based on a time value. New columns are added to uniquely identify each window and those columns are added to the group key of the output tables.
A single input record will be placed into zero or more output tables, depending on the specific windowing function.
By default the start boundary of a window will align with the Unix epoch (zero time) modified by the offset of the location
option.
Window has the following properties:
Name | Type | Description |
---|---|---|
every | duration | Every is the duration of time between windows. Defaults to period 's value. One of every , period or intervals must be provided. |
period | duration | Period is the duration of the window. Period is the length of each interval. It can be negative, indicating the start and stop boundaries are reversed. Defaults to every 's value. One of every , period or intervals must be provided. |
offset | duration | Offset is the duration by which to shift the window boundaries. It can be negative, indicating that the offset goes backwards in time. Defaults to 0, which will align window end boundaries with the every duration. |
intervals | (start: time, stop: time) => [...]interval | Intervals is a set of intervals to be used as the windows. One of every , period or intervals must be provided. When intervals is provided, every , period , and offset must be zero. |
timeColumn | string | TimeColumn is the name of the time column to use. Defaults to _time . |
startColumn | string | StartColumn is the name of the column containing the window start time. Defaults to _start . |
stopColumn | string | StopColumn is the name of the column containing the window stop time. Defaults to _stop . |
createEmpty | bool | CreateEmpty specifies whether empty tables should be created. Defaults to false . |
Example:
from(bucket:"telegraf/autogen")
|> range(start:-12h)
|> window(every:10m)
|> max()
window(every:1h) // window the data into 1 hour intervals
window(intervals: intervals(every:1d, period:8h, offset:9h)) // window the data into 8 hour intervals starting at 9AM every day.
Pivot collects values stored vertically (column-wise) in a table and aligns them horizontally (row-wise) into logical sets.
Pivot has the following properties:
Name | Type | Description |
---|---|---|
rowKey | []string | RowKey is the list of columns used to uniquely identify a row for the output. |
columnKey | []string | ColumnKey is the list of columns used to pivot values onto each row identified by the rowKey. |
valueColumn | string | ValueColumn identifies the single column that contains the value to be moved around the pivot. |
The group key of the resulting table will be the same as the input tables, excluding the columns found in the columnKey
and valueColumn
.
This is because these columns are not part of the resulting output table.
Any columns in the original table that are not referenced in the rowKey
or the original table's group key will be dropped.
Every input row should have a 1:1 mapping to a particular row, column pair in the output table, determined by its values for the rowKey
and columnKey
.
In the case where more than one value is identified for the same row, column pair in the output, the last value
encountered in the set of table rows is taken as the result.
The output is constructed as follows:
- The set of columns for the new table is the
rowKey
unioned with the group key, but excluding the columns indicated by thecolumnKey
and thevalueColumn
. A new column is added to the set of columns for each unique value identified in the input by thecolumnKey
parameter. The label of a new column is the concatenation of the values atcolumnKey
(if the value is null,"null"
is used) using_
as a separator. - A new row is created for each unique value identified in the input by the
rowKey
parameter. - For each new row, values for group key columns stay the same, while values for new columns are determined from the input tables by the value in
valueColumn
at the row identified by therowKey
values and the new column's label. If no value is found, the value is set to null.
Example 1, align fields within each measurement that have the same timestamp:
from(bucket:"test")
|> range(start: 1970-01-01T00:00:00.000000000Z)
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
Input:
_time | _value | _measurement | _field |
---|---|---|---|
1970-01-01T00:00:00.000000001Z | 1.0 | "m1" | "f1" |
1970-01-01T00:00:00.000000001Z | 2.0 | "m1" | "f2" |
1970-01-01T00:00:00.000000001Z | null | "m1" | "f3" |
1970-01-01T00:00:00.000000001Z | 3.0 | "m1" | null |
1970-01-01T00:00:00.000000002Z | 4.0 | "m1" | "f1" |
1970-01-01T00:00:00.000000002Z | 5.0 | "m1" | "f2" |
null | 6.0 | "m1" | "f2" |
1970-01-01T00:00:00.000000002Z | null | "m1" | "f3" |
1970-01-01T00:00:00.000000003Z | null | "m1" | "f1" |
1970-01-01T00:00:00.000000003Z | 7.0 | "m1" | null |
1970-01-01T00:00:00.000000004Z | 8.0 | "m1" | "f3" |
Output:
_time | _measurement | f1 | f2 | f3 | null |
---|---|---|---|---|---|
1970-01-01T00:00:00.000000001Z | "m1" | 1.0 | 2.0 | null | 3.0 |
1970-01-01T00:00:00.000000002Z | "m1" | 4.0 | 5.0 | null | null |
null | "m1" | null | 6.0 | null | null |
1970-01-01T00:00:00.000000003Z | "m1" | null | null | null | 7.0 |
1970-01-01T00:00:00.000000004Z | "m1" | null | null | 8.0 | null |
Example 2, align fields and measurements that have the same timestamp.
Note the effect of:
- having null values in some
columnKey
value; - having more values for the same
rowKey
andcolumnKey
value (the 11th row overrides the 10th, and so does the 15th with the 14th).
from(bucket:"test")
|> range(start: 1970-01-01T00:00:00.000000000Z)
|> pivot(rowKey:["_time"], columnKey: ["_measurement", _field"], valueColumn: "_value")
Input:
_time | _value | _measurement | _field |
---|---|---|---|
1970-01-01T00:00:00.000000001Z | 1.0 | "m1" | "f1" |
1970-01-01T00:00:00.000000001Z | 2.0 | "m1" | "f2" |
1970-01-01T00:00:00.000000001Z | 3.0 | null | "f3" |
1970-01-01T00:00:00.000000001Z | 4.0 | null | null |
1970-01-01T00:00:00.000000002Z | 5.0 | "m1" | "f1" |
1970-01-01T00:00:00.000000002Z | 6.0 | "m1" | "f2" |
1970-01-01T00:00:00.000000002Z | 7.0 | "m1" | "f3" |
1970-01-01T00:00:00.000000002Z | 8.0 | null | null |
null | 9.0 | "m1" | "f3" |
1970-01-01T00:00:00.000000003Z | 10.0 | "m1" | null |
1970-01-01T00:00:00.000000003Z | 11.0 | "m1" | null |
1970-01-01T00:00:00.000000003Z | 12.0 | "m1" | "f3" |
1970-01-01T00:00:00.000000003Z | 13.0 | null | null |
null | 14.0 | "m1" | null |
null | 15.0 | "m1" | null |
Output:
_time | m1_f1 | m1_f2 | null_f3 | null_null | m1_f3 | m1_null |
---|---|---|---|---|---|---|
1970-01-01T00:00:00.000000001Z | 1.0 | 2.0 | 3.0 | 4.0 | null | null |
1970-01-01T00:00:00.000000002Z | 5.0 | 6.0 | null | 8.0 | 7.0 | null |
null | null | null | null | null | 9.0 | 15.0 |
1970-01-01T00:00:00.000000003Z | null | null | null | 13.0 | 12.0 | 11.0 |
Join merges two or more input streams, whose values are equal on a set of common columns, into a single output stream. Null values are not considered equal when comparing column values. The resulting schema is the union of the input schemas, and the resulting group key is the union of the input group keys.
Join has the following properties:
Name | Type | Description |
---|---|---|
tables | record | Tables is the map of streams to be joined. |
on | []string | On is the list of columns on which to join. |
method | string | Method must be one of: inner, cross, left, right, or full. Defaults to "inner" . |
Both tables
and on
are required parameters.
The on
parameter and the cross
method are mutually exclusive.
Join currently only supports two input streams.
IMPL#83 Add support for joining more than 2 streams
IMPL#84 Add support for different join types
Example:
Given the following two streams of data:
-
SF_Temperature
_time _field _value 0001 "temp" 70 0002 "temp" 75 0003 "temp" 72 -
NY_Temperature
_time _field _value 0001 "temp" 55 0002 "temp" 56 0003 "temp" 55
And the following join query:
join(tables: {sf: SF_Temperature, ny: NY_Temperature}, on: ["_time", "_field"])
The output will be:
_time | _field | _value_ny | _value_sf |
---|---|---|---|
0001 | "temp" | 55 | 70 |
0002 | "temp" | 56 | 75 |
0003 | "temp" | 55 | 72 |
The column schema of the output stream is the union of the input schemas, and the same goes for the output group key.
Columns that must be renamed due to ambiguity (i.e. columns that occur in more than one input stream) are renamed
according to the template <column>_<table>
.
Example:
-
SF_Temperature
-
Group Key for table
{ _field }
_time _field _value 0001 "temp" 70 0002 "temp" 75 0003 "temp" 72 -
NY_Temperature
-
Group Key for all tables
{ _time, _field }
_time _field _value 0001 "temp" 55 _time _field _value 0002 "temp" 56 _time _field _value 0003 "temp" 55
join(tables: {sf: SF_Temperature, ny: NY_Temperature}, on: ["_time"])
produces:
-
Group Key for all tables
{ _time, _field_ny, _field_sf }
_time _field_ny _field_sf _value_ny _value_sf 0001 "temp" "temp" 55 70 _time _field_ny _field_sf _value_ny _value_sf 0002 "temp" "temp" 56 75 _time _field_ny _field_sf _value_ny _value_sf 0003 "temp" "temp" 55 72
Union concatenates two or more input streams into a single output stream. In tables that have identical schema and group keys, contents of the tables will be concatenated in the output stream. The output schemas of the Union operation shall be the union of all input schemas.
Union does not preserve the sort order of the rows within tables. A sort operation may be added if a specific sort order is needed.
Union has the following properties:
Name | Type | Description |
---|---|---|
tables | []stream | Tables specifies the streams to union together. There must be at least two streams. |
For example, given this stream, SF_Weather
with group key "_field"
on both tables:
_time | _field | _value |
---|---|---|
0001 | "temp" | 70 |
0002 | "temp" | 75 |
_time | _field | _value |
---|---|---|
0001 | "humidity" | 81 |
0002 | "humidity" | 82 |
And this stream, NY_Weather
, also with group key "_field"
on both tables:
_time | _field | _value |
---|---|---|
0001 | "temp" | 55 |
0002 | "temp" | 56 |
_time | _field | _value |
---|---|---|
0001 | "pressure" | 29.82 |
0002 | "pressure" | 30.01 |
union(tables: [SF_Weather, NY_Weather])
produces this stream (whose tables are grouped by "_field"
):
_time | _field | _value |
---|---|---|
0001 | "temp" | 70 |
0002 | "temp" | 75 |
0001 | "temp" | 55 |
0002 | "temp" | 56 |
_time | _field | _value |
---|---|---|
0001 | "humidity" | 81 |
0002 | "humidity" | 82 |
_time | _field | _value |
---|---|---|
0001 | "pressure" | 29.82 |
0002 | "pressure" | 30.01 |
Unique returns a table with unique values in a specified column. In the case there are multiple rows taking on the same value in the provided column, the first row is kept and the remaining rows are discarded.
Unique has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column that is to have unique values. Defaults to _value . |
Cumulative sum computes a running sum for non null records in the table. The output table schema will be the same as the input table.
Cumulative sum has the following properties:
Name | Type | Description |
---|---|---|
columns | []string | Columns is a list of columns on which to operate. Defaults to ["_value"] . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "disk" and r._field == "used_percent")
|> cumulativeSum(columns: ["_value"])
Derivative computes the time based difference between subsequent non-null records. This function will return an error if values in the time column are null or not sorted in ascending order. If there are multiple rows with the same time value, only the first row will be used to compute the derivative.
Derivative has the following properties:
Name | Type | Description |
---|---|---|
unit | duration | Unit is the time duration to use for the result. Defaults to 1s . |
nonNegative | bool | NonNegative indicates if the derivative is allowed to be negative. If a value is encountered which is less than the previous value, then the derivative will be null for that row. |
columns | []string | Columns is a list of columns on which to compute the derivative Defaults to ["_value"] . |
timeColumn | string | TimeColumn is the column name for the time values. Defaults to _time . |
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "disk" and r._field == "used_percent")
|> derivative(nonNegative: true, columns: ["used_percent"])
Difference computes the difference between subsequent records.
Every user-specified column of numeric type is subtracted while others are kept intact.
Difference has the following properties:
Name | Type | Description |
---|---|---|
nonNegative | bool | NonNegative indicates if the difference is allowed to be negative. If a value is encountered which is less than the previous value then the result is null. |
columns | []string | Columns is a list of columns on which to compute the difference. Defaults to ["_value"] . |
keepFirst | bool | KeepFirst indicates if the first row should be kept. If true , then the difference will be null . Defaults to false . |
Rules for subtracting values for numeric types:
- the difference between two non-null values is their algebraic difference; or null, if the result is negative and
nonNegative: true
; - null minus some value is always null;
- some value
v
minus null isv
minus the last non-null value seen beforev
; or null ifv
is the first non-null value seen.
Example of difference:
_time | A | B | C | tag |
---|---|---|---|---|
0001 | null | 1 | 2 | tv |
0002 | 6 | 2 | null | tv |
0003 | 4 | 2 | 4 | tv |
0004 | 10 | 10 | 2 | tv |
0005 | null | null | 1 | tv |
Result (nonNegative: false
):
_time | A | B | C | tag |
---|---|---|---|---|
0002 | null | 1 | null | tv |
0003 | -2 | 0 | 2 | tv |
0004 | 6 | 8 | -2 | tv |
0005 | null | null | -1 | tv |
Result (nonNegative: true
):
_time | A | B | C | tag |
---|---|---|---|---|
0002 | null | 1 | null | tv |
0003 | null | 0 | 2 | tv |
0004 | 6 | 8 | null | tv |
0005 | null | null | null | tv |
Result (nonNegative: false
, keepFirst: true
):
_time | A | B | C | tag |
---|---|---|---|---|
0001 | null | null | null | tv |
0002 | null | 1 | null | tv |
0003 | -2 | 0 | 2 | tv |
0004 | 6 | 8 | -2 | tv |
0005 | null | null | -1 | tv |
Example of script:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
|> difference()
Elapsed returns the elapsed time between subsequent records.
Given an input table, elapsed
will return the same table with an additional elapsed
column and without the first
record as elapsed time is not defined.
Elapsed has the following properties:
Name | Type | Description |
---|---|---|
unit | duration | The unit in which the elapsed time is returned. Defaults to 1s . |
timeColumn | string | Name of the flux.TTime column on which to compute the elapsed time. Defaults to _time . |
columnName | string | Name of the column of elapsed times. Defaults to elapsed . |
Elapsed errors if the timeColumn cannot be found within the given table.
Increase returns the total non-negative difference between values in a table. A main usage case is tracking changes in counter values which may wrap over time when they hit a threshold or are reset. In the case of a wrap/reset, we can assume that the absolute delta between two points will be at least their non-negative difference.
increase = (tables=<-, columns=["_value"]) =>
tables
|> difference(nonNegative: true, columns:columns)
|> cumulativeSum()
Example:
Given the following input table.
| _time | _value |
| ----- | ------ |
| 00001 | 1 |
| 00002 | 5 |
| 00003 | 3 |
| 00004 | 4 |
increase()
produces the following table.
| _time | _value |
| ----- | ------ |
| 00002 | 4 |
| 00003 | 7 |
| 00004 | 8 |
Moving Average computes the moving average the _value
column.
It means the values of a user-defined period for a defined number of points.
Moving Average has the following properties:
Name | Type | Description |
---|---|---|
n | int | N specifies the number of points to mean. |
Rules for taking the moving average for numeric types:
- the average over a period populated by
n
values is equal to their algebraic mean - the average over a period populated by only null values is null
- moving averages that include null values skip over those values
- if
n
is less than the number of records in a table,movingAverage
returns the average of the available values
Examples of moving average (n
= 2):
_time | A | B | C | D | tag |
---|---|---|---|---|---|
0001 | null | 1 | 2 | null | tv |
0002 | 6 | 2 | null | null | tv |
0003 | 4 | null | 4 | 4 | tv |
Result:
_time | A | B | C | D | tag |
---|---|---|---|---|---|
0002 | 6 | 1.5 | 2 | null | tv |
0003 | 5 | 2 | 4 | 4 | tv |
Example of script:
// A 5 point moving average would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> movingAverage(n: 5)
Timed Moving Average means the values of the following records for a defined number of points, for the specified column.
timedMovingAverage = (every, period, column="_value", tables=<-) =>
tables
|> window(every: every, period: period)
|> mean(column:column)
|> duplicate(column: "_stop", as: "_time")
|> window(every: inf)
Timed Moving Average has the following properties:
Name | Type | Description |
---|---|---|
every | duration | Every specifies the frequency of windows. |
period | duration | Period specifies the window size to mean. |
column | string | Column specifies a column to aggregate. Defaults to "_value" |
Example:
// A 5 year moving average would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7y)
|> timedMovingAverage(every: 1y, period: 5y)
Exponential Moving Average computes the exponential moving average of the _value
column.
It is a weighted moving average that gives more weighting to recent data as opposed to older data.
Name | Type | Description |
---|---|---|
n | int | N specifies the number of points to mean. |
Rules for taking the exponential moving average for numeric types:
- the first value of an exponential moving average over
n
values is the algebraic mean of the firstn
values - subsequent values are calculated as
y(t) = x(t) * k + y(t-1) * (1 - k)
, where- the constant
k
is defined ask = 2 / (1 + n)
y(t)
is defined as exponential moving average at timet
x(t)
is defined as the value at timet
- the constant
exponentialMovingAverage
ignores ignores null values and does not count them in calculations
Examples (n = 2
):
_time | A | B | C | tag |
---|---|---|---|---|
0001 | 2 | null | 2 | tv |
0002 | null | 10 | 4 | tv |
0003 | 8 | 20 | 5 | tv |
Result:
_time | A | B | C | tag |
---|---|---|---|---|
0002 | 2 | 10 | 3 | tv |
0003 | 6 | 16.67 | 4.33 | tv |
Example of script:
// A 5 point exponential moving average would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> exponentialMovingAverage(n: 5)
doubleEMA
Double Exponential Moving Average computes the double exponential moving averages of the _value
column.
It is a weighted moving average that gives more weighting to recent data as opposed to older data, with less lag than a single exponential moving average.
Name | Type | Description |
---|---|---|
n | int | N specifies the number of points to mean. |
A double exponential moving average is defined as DEMA = 2 * EMA_N - EMA of EMA_N
, where
EMA
is an exponential moving average, andN = n
is the look-back period.
In flux, doubleEMA
is defined as
doubleEMA = (n, tables=<-) =>
tables
|> exponentialMovingAverage(n:n)
|> duplicate(column:"_value", as:"ema")
|> exponentialMovingAverage(n:n)
|> map(fn: (r) => ({r with _value: 2.0*r.ema - r._value}))
|> drop(columns: ["ema"])
The behavior of the exponential moving averages used for calculating the double exponential moving average is the same as defined for exponentialMovingAverage
A proper double exponential moving average requires at least 2 * n - 1
values.
Example (n = 10
):
_time | _value | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | - |
0011 | 11 | - |
0012 | 12 | - |
0013 | 13 | - |
0014 | 14 | - |
0015 | 15 | - |
0016 | 14 | - |
0017 | 13 | - |
0018 | 12 | - |
0019 | 11 | 13.57 |
0020 | 10 | 12.70 |
0021 | 9 | 11.70 |
0022 | 8 | 10.61 |
0023 | 7 | 9.466 |
0024 | 6 | 8.286 |
0025 | 5 | 7.090 |
0026 | 4 | 5.890 |
0027 | 3 | 4.694 |
0028 | 2 | 3.506 |
0029 | 1 | 2.331 |
Example of script:
// A 5 point double exponential moving average would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> doubleEMA(n: 5)
tripleEMA
Triple Exponential Moving Average computes the triple exponential moving averages of the _value
column.
It is a weighted moving average that gives more weighting to recent data as opposed to older data, with less lag than a double or single exponential moving average.
Name | Type | Description |
---|---|---|
n | int | N specifies the number of points to mean. |
A triple exponential moving average is defined as TEMA = (3 * EMA_1) - (3 * EMA_2) + EMA_3
, where
EMA_1
is the exponential moving average of the original data,EMA_2
is the exponential moving average ofEMA_1
,EMA_3
is the exponential moving average ofEMA_2
, andN = n
is the look-back period.
In flux, tripleEMA
is defined as
tripleEMA = (n, tables=<-) =>
tables
|> exponentialMovingAverage(n:n)
|> duplicate(column:"_value", as:"ema1")
|> exponentialMovingAverage(n:n)
|> duplicate(column:"_value", as:"ema2")
|> exponentialMovingAverage(n:n)
|> map(fn: (r) => ({r with _value: 3.0*r.ema1 - 3.0*r.ema2 + r._value}))
|> drop(columns: ["ema1", "ema2"])
The behavior of the exponential moving averages used for calculating the triple exponential moving average is the same as defined for exponentialMovingAverage
A proper double exponential moving average requires at least 3 * n - 2
values.
Example (n = 4
):
_time | _value | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | 10 |
0011 | 11 | 11 |
0012 | 12 | 12 |
0013 | 13 | 13 |
0014 | 14 | 14 |
0015 | 15 | 15 |
0016 | 14 | 14.43 |
0017 | 13 | 13.35 |
0018 | 12 | 12.56 |
0019 | 11 | 11 |
0020 | 10 | 9.907 |
0021 | 9 | 8.866 |
0022 | 8 | 7.859 |
0023 | 7 | 6.871 |
0024 | 6 | 5.891 |
0025 | 5 | 4.913 |
0026 | 4 | 3.933 |
0027 | 3 | 2.950 |
0028 | 2 | 1.963 |
0029 | 1 | 0.973 |
Example of script:
// A 5 point triple exponential moving average would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> tripleEMA(n: 5)
The triple exponential derivative indicator is an oscillator and can also be used as a momentum indicator. Triple exponential derivative calculates a triple exponential moving average of the log of the data input over the period of time. It prevents cycles that are shorter than the defined period from being considered by the indicator. Triple exponential derivative oscillates around a zero line.
When used as an oscillator, a positive value indicates an overbought market while a negative value indicates an oversold market.
When used as a momentum indicator, a positive value suggests momentum is increasing while a negative value suggests momentum is decreasing.
It acts on the _value
column.
Name | Type | Description |
---|---|---|
n | int | N specifies the sample size of the algorithm. |
A triple exponential derivative is defined as TRIX[i] = ((EMA3[i] / EMA3[i - 1]) - 1) * 100
, where
EMA_3
isEMA(EMA(EMA(data)))
If there are not enough values to computer a proper triple exponential derivative, the _value
column of the output table will be null
, and all other columns will
be the same as the last record of the input table.
The behavior of the exponential moving averages used for calculating the triple exponential derivative is the same as defined for exponentialMovingAverage
:
tripleExponentialDerivative
ignores null values and does not count them in calculations
Example (n = 4
):
_time | _value | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | _ |
0011 | 11 | 18.18 |
0012 | 12 | 15.38 |
0013 | 13 | 13.33 |
0014 | 14 | 11.76 |
0015 | 15 | 10.52 |
0016 | 14 | 8.304 |
0017 | 13 | 5.641 |
0018 | 12 | 3.039 |
0019 | 11 | 0.716 |
0020 | 10 | -1.28 |
0021 | 9 | -2.99 |
0022 | 8 | -4.49 |
0023 | 7 | -5.83 |
0024 | 6 | -7.09 |
0025 | 5 | -8.35 |
0026 | 4 | -9.67 |
0027 | 3 | -11.1 |
0028 | 2 | -12.8 |
0029 | 1 | -15.0 |
Example of script:
// A 14 point triple exponential derivative would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> tripleExponentialDerivative(n: 14)
The relative strength index (RSI) is a momentum indicator that compares the magnitude of recent increases and decreases
over a specified time period to measure speed and change of data movements. It acts on the _value
column.
Name | Type | Description |
---|---|---|
n | int | N specifies the sample size of the algorithm. |
Rules for calculating the relative strength index for numeric types:
- The general process of calculating is
RSI = 100 - (100 / (1 + (AVG GAIN / AVG LOSS)))
- For the first value of the RSI,
AVG GAIN
andAVG LOSS
are simplen
-period averages - For subsequent calculations,
AVG GAIN
=((PREVIOUS AVG GAIN) * (n - 1)) / n
AVG LOSS
=((PREVIOUS AVG LOSS) * (n - 1)) / n
relativeStrengthIndex
ignores null values and skips over them
Example of relative strength index (N
= 10):
_time | A | B | tag |
---|---|---|---|
0001 | 1 | 1 | tv |
0002 | 2 | 2 | tv |
0003 | 3 | 3 | tv |
0004 | 4 | 4 | tv |
0005 | 5 | 5 | tv |
0006 | 6 | 6 | tv |
0007 | 7 | 7 | tv |
0008 | 8 | 8 | tv |
0009 | 9 | 9 | tv |
0010 | 10 | 10 | tv |
0011 | 11 | 11 | tv |
0012 | 12 | 12 | tv |
0013 | 13 | 13 | tv |
0014 | 14 | 14 | tv |
0015 | 15 | 15 | tv |
0016 | 16 | 16 | tv |
0017 | 17 | null | tv |
0018 | 18 | 17 | tv |
Result:
_time | A | B | tag |
---|---|---|---|
0011 | 100 | 100 | tv |
0012 | 100 | 100 | tv |
0013 | 100 | 100 | tv |
0014 | 100 | 100 | tv |
0015 | 100 | 100 | tv |
0016 | 90 | 90 | tv |
0017 | 81 | 90 | tv |
0018 | 72.9 | 81 | tv |
Example of script:
// A 14 point relative search index plot would be called as such:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> relativeSearchIndex(n: 14)
Kaufman's Efficiency Ratio indicator is created by taking the absolute value of the results of Chande Momentum Oscillator and dividing by 100 to give the 0 to 1 range. Higher values represent a more efficient or trending market.
The default column KaufmansER
takes in is "_value".
Name | Type | Description |
---|---|---|
n | int | N specifies the period. |
Example:
_time | _value | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | - |
0011 | 11 | 1 |
0012 | 12 | 1 |
0013 | 13 | 1 |
0014 | 14 | 1 |
0015 | 15 | 1 |
0016 | 14 | 0.8 |
0017 | 13 | 0.6 |
0018 | 12 | 0.4 |
0019 | 11 | 0.2 |
0020 | 10 | 0 |
0021 | 9 | 0.2 |
0022 | 8 | 0.4 |
0023 | 7 | 0.6 |
0024 | 6 | 0.8 |
0025 | 5 | 1 |
0026 | 4 | 1 |
0027 | 3 | 1 |
0028 | 2 | 1 |
0029 | 1 | 1 |
Example of script:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> kaufmansER(n: 10)
Kaufman's Adaptive Moving Average is designed to account for market noise or volatility. This trend-following indicator can be used to identify the overall trend, time turning points and filter data movements.
Name | Type | Description |
---|---|---|
n | int | N specifies the period. |
column | string | Column is the column that KaufmansAMA should be performed on and is assumed to be "_value" if left unspecified. |
Example:
_time | _value | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | - |
0011 | 11 | 10.444444444444445 |
0012 | 12 | 11.135802469135802 |
0013 | 13 | 11.964334705075446 |
0014 | 14 | 12.869074836153025 |
0015 | 15 | 13.81615268675168 |
0016 | 14 | 13.871008014588556 |
0017 | 13 | 13.71308456353558 |
0018 | 12 | 13.553331356741122 |
0019 | 11 | 13.46599437575161 |
0020 | 10 | 13.4515677602438 |
0021 | 9 | 13.29930139347417 |
0022 | 8 | 12.805116570729282 |
0023 | 7 | 11.752584300922965 |
0024 | 6 | 10.036160535131101 |
0025 | 5 | 7.797866963961722 |
0026 | 4 | 6.109926091089845 |
0027 | 3 | 4.727736717272135 |
0028 | 2 | 3.515409287373408 |
0029 | 1 | 2.3974496040963373 |
Example of script:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> kaufmansAMA(n: 10, column: "_value")
Holt Winters applies the Holt-Winters damped prediction method with Nelder-Mead optimization to the given dataset.
In a table, the dataset is composed of the values in the timeColumn
and in the value column
.
The Holt-Winters method predicts n
seasonally adjusted values for the specified column
.
If the data presents no seasonality, the user can specify seasonality = 0
.
The user can request to include the fitted data (on the given dataset) in the result by setting withFit = true
.
The n
predicted values occur at the interval
specified by the user.
If your interval
is 6m
and n
is 3
, you will receive three predicted values that are each six minutes apart.
seasonality
is the seasonal pattern parameter and delimits the length of a seasonal pattern according to interval
.
If your interval
is 2m
and s
is 3
, then the seasonal pattern occurs every six minutes, that is, every three data points.
Before processing the data, Holt Winters discards rows that have a null timestamp. Null values are treated like missing data points and taken into account when applying Holt Winters.
HoltWinters supposes to work with evenly spaced values in time, so:
interval
is used to divide the data in time buckets;- if many values are in the same bucket, the first one is selected, the others are skipped;
- if no value is present for a bucket, a missing value is added for that bucket.
By default, HoltWinters uses the first value per time bucket.
The user can use windowing and selectors/aggregates to specify a different behavior.
The user can also use aggregateWindow
as a data preparation step.
Parameters:
Name | Type | Description |
---|---|---|
n | int | N specifies the number of values to predict. |
seasonality | int | Seasonality specifies the number of points that are in a season. Defaults to 0 . |
interval | duration | Interval is the interval between two data points. |
withFit | bool | WithFit specifies if the fitted data should be returned. Defaults to false . |
timeColumn | string | TimeColumn specifies the time column for the dataset. Defaults to "_time" . |
column | string | Column specifies the value column for the dataset. Defaults to "_value" . |
Examples:
from(bucket: "waterhouse/autogen")
|> range(start: -7y)
|> filter(fn: (r) => r._field == "water_level")
|> aggregateWindow(every: 379m, fn: first).
|> holtWinters(n: 10, seasonality: 4, interval: 379m)
For now, aggregateWindow
lacks of some parameters of window
.
For example, if you need offset
, you must apply this workaround:
from(bucket: "waterhouse/autogen")
|> range(start: -7y)
|> filter(fn: (r) => r._field == "water_level")
|> window(every: 379m, offset: 348m)
// The selector.
|> first()
// We need to timestamp every selected record per time bucket with the beginning
// of the window. So, we overwrite "_time" with "_start".
|> duplicate(column: "_start", as: "_time")
// We need to "un-window", otherwise the Holt-Winters algorithm would run
// for each window separately.
|> window(every: inf)
|> holtWinters(n: 10, seasonality: 4, interval: 379m)
The Chande Momentum Oscillator (CMO) is a technical momentum indicator developed by Tushar Chande. The CMO indicator is created by calculating the difference between the sum of all recent higher data points and the sum of all recent lower data points, then dividing the result by the sum of all data movement over a given time period. The result is multiplied by 100 to give the -100 to +100 range.
Name | Type | Description |
---|---|---|
n | int | N specifies the period. |
columns | []string | Columns is list of all columns that chandeMomentumOscillator should be performed on. |
Example:
_time | A | result |
---|---|---|
0001 | 1 | - |
0002 | 2 | - |
0003 | 3 | - |
0004 | 4 | - |
0005 | 5 | - |
0006 | 6 | - |
0007 | 7 | - |
0008 | 8 | - |
0009 | 9 | - |
0010 | 10 | - |
0011 | 11 | 100 |
0012 | 12 | 100 |
0013 | 13 | 100 |
0014 | 14 | 100 |
0015 | 15 | 100 |
0016 | 14 | 80 |
0017 | 13 | 60 |
0018 | 12 | 40 |
0019 | 11 | 20 |
0020 | 10 | 0 |
0021 | 9 | -20 |
0022 | 8 | -40 |
0023 | 7 | -60 |
0024 | 6 | -80 |
0025 | 5 | -100 |
0026 | 4 | -100 |
0027 | 3 | -100 |
0028 | 2 | -100 |
0029 | 1 | -100 |
Example of script:
from(bucket: "telegraf/autogen"):
|> range(start: -7d)
|> chandeMomentumOscillator(n: 10, columns: ["_value"])
Distinct produces the unique values for a given column. Null is considered its own distinct value if it is present.
Distinct has the following properties:
Name | Type | Description |
---|---|---|
column | string | Column is the column on which to track unique values. Defaults to _value . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "cpu")
|> distinct(column: "host")
TimeShift adds a fixed duration to time columns. The output table schema is the same as the input table. If the time is null, the time will continue to be null.
TimeShift has the following properties:
Name | Type | Description |
---|---|---|
duration | duration | Duration is the amount to add to each time value. May be a negative duration. |
columns | []string | Columns is list of all columns that should be shifted. Defaults to ["_start", "_stop", "_time"] . |
Example:
from(bucket: "telegraf/autogen")
|> range(start: -5m)
|> timeShift(duration: 1000h)
StateCount computes the number of consecutive records in a given state. The state is defined via a user-defined predicate. For each consecutive point for which the predicate evaluates as true, the state count will be incremented. When a point evaluates as false, the state count is reset.
The state count will be added as an additional column to each record. If the expression evaluates as false, the value will be -1. If the expression generates an error during evaluation, the point is discarded, and does not affect the state count.
StateCount has the following parameters:
Name | Type | Description |
---|---|---|
fn | (r: A) => bool where A: Record | Fn is a function that returns true when the record is in the desired state. |
column | string | Column is the name of the column to use to output the state count. Defaults to stateCount . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
from(bucket: "telegraf/autogen")
|> range(start: 2018-05-22T19:53:26Z)
|> stateCount(fn:(r) => r._value > 80)
StateDuration computes the duration of a given state. The state is defined via a user-defined predicate. For each consecutive point for which the predicate evaluates as true, the state duration will be incremented by the duration between points. When a point evaluates as false, the state duration is reset.
The state duration will be added as an additional column to each record. If the expression evaluates as false, the value will be -1. If the expression generates an error during evaluation, the point is discarded, and does not affect the state duration.
Note that as the first point in the given state has no previous point, its state duration will be 0.
The duration is represented as an integer in the units specified.
StateDuration requires sorted and not-null timestamps. So, if one of this requirements is not met, it returns an error.
StateDuration has the following parameters:
Name | Type | Description |
---|---|---|
fn | (r: A) => bool where A: Record | Fn is a function that returns true when the record is in the desired state. |
column | string | Column is the name of the column to use to output the state value. Defaults to stateDuration . |
timeColumn | string | TimeColumn is the name of the column used to extract timestamps. Defaults to _time . |
unit | duration | Unit is the dimension of the output value. Defaults to 1s . |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
from(bucket: "telegraf/autogen")
|> range(start: 2018-05-22T19:53:26Z)
|> stateDuration(fn:(r) => r._value > 80)
The To operation takes data from a stream and writes it to a bucket. To has the following properties:
Name | Type | Description |
---|---|---|
bucket | string | Bucket is the bucket name into which data will be written. |
bucketID | string | BucketID is the bucket ID into which data will be written. |
org | string | Org is the organization name of the bucket. |
orgID | string | OrgID is the organization ID of the bucket. |
host | string | Host is the location of a remote host to write to. Defaults to "" . |
token | string | Token is the authorization token to use when writing to a remote host. Defaults to "" . |
timeColumn | string | TimeColumn is the name of the time column of the output. Defaults to "_time" . |
tagColumns | []string | TagColumns is a list of columns to be used as tags in the output. Defaults to all columns of type string, excluding all value columns and the _field column if present. |
fieldFn | (r: A) => B where A: Record , B: Record | Function that takes a record from the input table and returns a record. For each record from the input table fieldFn returns a record that maps output field key to output value. Default: (r) => ({ [r._field]: r._value }) |
TODO(nathanielc): The fieldFn is not valid and needs to change. It uses dynamic record keys which is not allowed.
Either bucket
or bucketID
is required.
Both are mutually exclusive.
Similarly org
and orgID
are mutually exclusive and only required when writing to a remote host.
Both host
and token
are optional parameters, however if host
is specified, token
is required.
NOTE: make sure that fieldFn
's parameter names match the ones specified above (see why).
For example, given the following table:
_time | _start | _stop | _measurement | _field | _value |
---|---|---|---|---|---|
0005 | 0000 | 0009 | "a" | "temp" | 100.1 |
0006 | 0000 | 0009 | "a" | "temp" | 99.3 |
0007 | 0000 | 0009 | "a" | "temp" | 99.9 |
The default to
operation to(bucket:"my-bucket", org:"my-org")
is equivalent to writing the above data using the following line protocol:
_measurement=a temp=100.1 0005
_measurement=a temp=99.3 0006
_measurement=a temp=99.9 0007
For an example overriding to
's default settings, given the following table:
_time | _start | _stop | tag1 | tag2 | hum | temp |
---|---|---|---|---|---|---|
0005 | 0000 | 0009 | "a" | "b" | 55.3 | 100.1 |
0006 | 0000 | 0009 | "a" | "b" | 55.4 | 99.3 |
0007 | 0000 | 0009 | "a" | "b" | 55.5 | 99.9 |
The operation to(bucket:"my-bucket", org:"my-org", tagColumns:["tag1"], fieldFn: (r) => return {"hum": r.hum, "temp": r.temp})
is equivalent to writing the above data using the following line protocol:
_tag1=a hum=55.3,temp=100.1 0005
_tag1=a hum=55.4,temp=99.3 0006
_tag1=a hum=55.5,temp=99.9 0007
Note: The to
function produces side effects.
Top and Bottom sort a table and limits the table to only n records.
Top and Bottom have the following parameters:
Name | Type | Description |
---|---|---|
n | int | N is the number of records to keep. |
columns | []string | Columns provides the sort order for the tables. |
Example:
from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "net" and r._field == "bytes_sent")
|> top(n:10, columns:["_value"])
Tests whether a value is a member of a set.
Contains has the following parameters:
Name | Type | Description |
---|---|---|
value | bool, int, uint, float, string, time | The value to search for. |
set | array of bool, int, uint, float, string, time | The set of values to search. |
Example:
contains(value:1, set:[1,2,3])
will return true
.
These functions allow to extract a table from a stream of tables (tableFind
) and access its
columns (getColumn
) and records (getRecord
).
The example below provides an overview of these functions, further information can be found in the paragraphs below.
data = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
// extract the first available table for which "_field" is equal to "usage_idle"
t = data |> tableFind(fn: (key) => key._field == "usage_idle")
// this is the same as 'tableFind(tables: data, fn: (key) => key._field == "usage_idle")'
// extract the "_value" column from the table
values = t |> getColumn(column: "_value")
// the same as 'getColumn(table: t, column: "_value")'
// extract the first record from the table
r0 = t |> getRecord(idx: 0)
// the same as 'getRecord(table: t, idx: 0)'
TableFind extracts the first table in a stream of table whose group key values match a given predicate. If no table is found, the function errors.
It has the following parameters:
Name | Type | Description |
---|---|---|
fn | (key: A) => bool where A: Record | Fn is a predicate function. The result is the first table for which fn is true. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
t = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
|> tableFind(fn: (key) => key._field == "usage_idle")
// `t` contains the first table whose group key value for `_field` is "usage_idle" after filtering.
// `t` can be used as input to `getColumn` and `getRecord`.
GetColumn extracts a column from a table given its label. If the label is not present in the set of columns, the function errors.
It has the following parameters:
Name | Type | Description |
---|---|---|
column | string | Column is the name of the column to extract. |
Example:
vs = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
|> tableFind(fn: (key) => key._field == "usage_idle")
|> getColumn(column: "_value")
// use values
x = vs[0] + vs[1]
GetRecord extracts a record from a table given its index. If the index is out of bounds, the function errors.
It has the following parameters:
Name | Type | Description |
---|---|---|
idx | int | Idx is the index of the record to extract. |
Example:
r0 = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
|> tableFind(fn: (key) => key._field == "usage_idle")
|> getRecord(idx: 0)
// use values
x = r0._field + "--" + r0._measurement
FindColumn extracts the first table from a stream of tables where the group key values match a given predicate, and from that table extracts a specified column. The column is returned as an array. The function returns an empty array if no table is found, or if the column label is not present in the set of columns.
It has the following parameters:
Name | Type | Description |
---|---|---|
fn | (key: A) => bool where A: Record | Fn is a predicate function. The column is extracted from the first table for which fn is true. |
column | string | Column is the name of the column to extract. |
NOTE: make sure that fn
's parameter names match the ones specified above (see why).
Example:
vs = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
|> findColumn(fn: (key) => key._field == "usage_idle", column: "_value")
// use values
x = vs[0] + vs[1]
FindRecord extracts the first table from a stream of tables where the group key values match a given predicate, and from that table extracts a record at a specified index. The record is returned as a record value. The function returns an empty record if no table is found, or if the index is out of bounds.
It has the following parameters:
Name | Type | Description |
---|---|---|
fn | (key: A) => bool where A: Record | Fn is a predicate function. The record is extracted from the first table for which fn is true. |
idx | int | Idx is the index of the record to extract. |
Example:
r0 = from(bucket:"telegraf/autogen")
|> range(start: -5m)
|> filter(fn:(r) => r._measurement == "cpu")
|> findRecord(fn: (key) => key._field == "usage_idle", idx: 0)
// use values
x = r0._field + "--" + r0._measurement
Truncates all entries in a given time column to a specified unit.
Example:
from(bucket:"telegraph/autogen")
|> truncateTimeColumn(unit: 1s)
Currently, truncateTimeColumn
only works with the default time column _time
.
Convert a value to a bool.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toBool()
The function toBool
is defined as toBool = (tables=<-) => tables |> map(fn:(r) => ({r with _value: bool(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the bool
function.
Convert a value to a int.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toInt()
The function toInt
is defined as toInt = (tables=<-) => tables |> map(fn:(r) => ({r with _value: int(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the int
function.
Convert a value to a float.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toFloat()
The function toFloat
is defined as toFloat = (tables=<-) => tables |> map(fn:(r) => ({r with _value: float(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the float
function.
Convert a value to a duration.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toDuration()
The function toDuration
is defined as toDuration = (tables=<-) => tables |> map(fn:(r) => ({r with _value: duration(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the duration
function.
TODO: implement duration as a column type in tables (influxdata#470)
Convert a value to a string.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toString()
The function toString
is defined as toString = (tables=<-) => tables |> map(fn:(r) => ({r with _value: string(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the string
function.
Convert a value to a time.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toTime()
The function toTime
is defined as toTime = (tables=<-) => tables |> map(fn:(r) => ({r with _value: time(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the time
function.
Convert a value to a uint.
Example: from(bucket: "telegraf") |> filter(fn:(r) => r._measurement == "mem" and r._field == "used") |> toUInt()
The function toUInt
is defined as toUInt = (tables=<-) => tables |> map(fn:(r) => ({r with _value: uint(v:r._value)}))
.
If you need to convert other columns use the map
function directly with the uint
function.
IMPL#242 Update specification around type conversion functions.
Compare two strings lexicographically. Returns 0 if v==t, -1 if v < t, and +1 if v > t.
Example: compare(v: "a", t: "a")
returns the int 0
Reports whether substr is in v.
Example: containsStr(v: "abc", substr: "a")
returns the boolean true
.
Reports whether any value from chars is in v.
Example: containsAny(v: "abc", chars: "and")
returns the boolean true
.
Reports the number of non-overlapping instances in which substr appears in v.
Example: countStr(v: "aaaaa", substr: "a")
returns the int 5
.
Reports whether v and t, interpreted as UTF-8 strings, are equal under Unicode case-folding.
Example: equalFold(v: "Go", t: "go")
returns boolean true
.
Tests whether the string v begins with prefix.
Example: hasPrefix(v: "go gopher", t: "go")
returns boolean true
.
Tests whether the string v ends with suffix.
Example: hasSuffix(v: "go gopher", t: "go")
returns boolean false
.
Returns the index of the first instance of substr in v, or -1 if substr is not present in s.
Example: index(v: "go gopher", substr: "go")
returns int 0
.
Returns the index of the first instance of any value in substr in v, or -1 if all values in substr are not present in s.
Example: indexAny(v: "chicken", chars: "aeiouy")
returns int 2
.
Returns whether or not v is a digit.
Example: isDigit(v: "A")
returns boolean false
.
Returns whether or not v is a letter.
Example: isLetter(v: "A")
returns boolean true
.
Returns whether or not v is lowercase.
Example: isLower(v: "A")
returns boolean false
.
Returns whether or not v is uppercase.
Example: isUpper(v: "A")
returns boolean true
.
Concatenates the elements of inputted string array to create a single string with v as the separator.
Example: joinStr(arr: []string{"a", "b", "c"}, v: ",")
returns string a,b,c
.
Returns the index of the last instance of substr in v, or -1 if substr is not present in v.
Example: lastIndex(v: "go gopher", substr: "go")
returns int 3
.
Returns the index of the last instance of any value from chars in v, or -1 if no value from chars is present in v.
Example: lastIndexAny(v: "go gopher", substr: "go")
returns int 4
.
Returns the length of the given string, defined to be the number of utf code points.
Example: strlen(v: "apple")
returns the int 5
.
Example: strlen(v: "汉字")
returns the int 2
.
Returns a new string consisting of i copies of the string v.
Example: repeat("v: na", i: 2)
returns string nana
.
Returns a copy of the string v with the first i non-overlapping instances of t replaced by u.
Example: replace(v: "oink oink oink", t: "oink", u: "moo", i: 2)
returns string moo moo oink
.
Returns a copy of the string v with the all non-overlapping instances of t replaced by u.
Example: replaceAll(v: "oink oink oink", t: "oink", u: "moo")
returns string moo moo moo
.
Slices v into all substrings separated by t and returns a slice of the substrings between those separators.
Example: split(v: "a,b,c", t: ",")
returns []string ["a" "b" "c"]
.
Slices v into all substrings after each instance of t and returns a slice of the substrings between those separators.
Example: splitAfter(v: "a,b,c", t: ",")
returns []string ["a," "b," "c"]
.
Slices v into all substrings after each instance of t and returns a slice of the substrings between those separators. i determines the number of substrings to return.
Example: splitAfterN(v: "a,b,c", t: ",", i: 2)
returns []string ["a," "b,c"]
.
Slices v into all substrings separated by t and returns a slice of the substrings between those separators. i determines the number of substrings to return.
Example: splitN(v: "a,b,c", t: ",", i: 2)
returns []string ["a" "b,c"]
.
Returns substring as specified by the given indices start and end, based on utf code points.
Example: substring(v: "influx", start: 0, end: 3)
returns string inf
.
Convert a string to title case.
Example: title(v: "a flux of foxes")
returns the string A Flux Of Foxes
.
Convert a string to lower case.
Example: toLower(v: "KOALA")
returns the string koala
.
Returns a copy of the string v with all Unicode letters mapped to their title case.
Example: toTitle("loud noises")
returns the string LOUD NOISES
.
Convert a string to upper case.
Example: toUpper(v: "koala")
returns the string KOALA
.
Remove leading and trailing characters specified in cutset from a string.
Example: trim(v: ".abc.", cutset: ".")
returns the string abc
.
Remove leading characters specified in cutset from a string.
Example: trim(v: ".abc.", cutset: ".")
returns the string abc.
.
Remove a prefix from a string. Strings that do not start with the prefix are returned unchanged.
Example: trimPrefix(v: "123_abc", prefix: "123")
returns the string _abc
.
Remove trailing characters specified in cutset from a string.
Example: trim(v: ".abc.", cutset: ".")
returns the string .abc
.
Remove leading and trailing spaces from a string.
Example: trimSpace(v: " abc ")
returns the string abc
.
Remove a suffix from a string. Strings that do not end with the suffix are returned unchanged.
Example: trimSuffix(v: "abc_123", suffix: "123")
returns the string abc_
.
Parse a regular expression and return, if successful, a regular expression value that can be used to match against text.
Example: compile(v: "abcd")
returns the regular expression value abcd
.
Return a string holding the text of the leftmost match in v of the regular expression.
Example: findString(r: regexp.compile("foo.?"), v: "seafood fool")
returns the string food
.
Returns a two-element slice of integers defining the location of the leftmost match in v of the regular expression.
Example: findStringIndex(r: regexp.compile("ab?"), v: "tablett")
returns the int array [1 3]
.
Return the source text used to compile the regular expression.
Example: getString(v: regexp.compile("abcd"))
returns the regular expression value abcd
.
Report whether the string v contains any match of the regular expression r.
Example: matchString(r: regexp.compile("(gopher){2}"), v: "gophergophergopher")
returns boolean true
Returns a copy of v, replacing matches of the Regexp r with the replacement string t.
Example: replaceAllString(r: regexp.compile("a(x*)b"), v: "-ab-axxb-", t: "T")
returns string -T-T-
.
Return a string that escapes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.
Example: quoteMeta("Escaping symbols like: .+*?()|[]{}^$")
returns string Escaping symbols like: \.\+\*\?\(\)\|\[\]\{\}\^\$
.
Slices v into substrings separated by the expression and returns a slice of the substrings between those expression matches.
Example: splitRegex(r: regexp.compile("a*"), v: "abaabaccadaaae", i: 5)
returns string array ["", "b", "b", "c", "cadaaae"]
.
A composite data type is a collection of primitive data types that together have a higher meaning.
A trigger is associated with a table and contains logic for when it should fire. When a trigger fires its table is materialized. Materializing a table makes it available for any down stream operations to consume. Once a table is materialized it can no longer be modified.
Triggers can fire based on these inputs:
Input | Description |
---|---|
Current processing time | The current processing time is the system time when the trigger is being evaluated. |
Watermark time | The watermark time is a time where it is expected that no data will arrive that is older than it. |
Record count | The number of records currently in the table. |
Group key value | The group key value of the table. |
Additionally triggers can be finished, which means that they will never fire again. Once a trigger is finished, its associated table is deleted.
Currently all tables use an after watermark trigger which fires only once the watermark has exceeded the _stop
value of the table and then is immediately finished.
Data sources are responsible for informing about updates to the watermark.
IMPL#240 Make trigger support not dependent on specific columns
A query specification defines what data and operations to perform. The execution model reserves the right to perform those operations as efficiently as possible. The execution model may rewrite the query in anyway it sees fit while maintaining correctness.
Included with the specification of the language and execution model, is a specification of how to submit queries and read their responses over HTTP.
To submit a query for execution, make an HTTP POST request to the /v1/query
endpoint.
The POST request may either submit parameters as the POST body or a subset of the parameters as URL query parameters. The following parameters are supported:
Parameter | Description |
---|---|
query | Query is Flux text describing the query to run. Only one of query or spec may be specified. This parameter may be passed as a URL parameter. |
spec | Spec is a query specification. Only one of query or spec may be specified. |
dialect | Dialect is an object defining the options to use when encoding the response. |
When using the POST body to submit the query the Content-Type
HTTP header must contain the name of the request encoding being used.
Supported request content types:
application/json
- Use a JSON encoding of parameters.
Multiple response content types will be supported.
The desired response content type is specified using the Accept
HTTP header on the request.
Each response content type will have its own dialect options.
Supported response encodings:
test/csv
- Corresponds with the MIME type specified in RFC 4180. Details on the encoding format are specified below.
If no Accept
header is present it is assumed that text/csv
was specified.
The HTTP header Content-Type
of the response will specify the encoding of the response.
Make a request using a query string and URL query parameters:
POST /v1/query?query=%20from%28db%3A%22mydatabse%22%29%20%7C%3E%20last%28%29 HTTP/1.1
Make a request using a query string and the POST body as JSON:
POST /v1/query
{
"query": "from(bucket:\"mydatabase/autogen\") |> last()"
}
Make a request using a query specification and the POST body as JSON:
POST /v1/query
{
"spec": {
"operations": [
{
"kind": "from",
"id": "from0",
"spec": {
"db": "mydatabase"
}
},
{
"kind": "last",
"id": "last1",
"spec": {
"column": ""
}
}
],
"edges": [
{
"parent": "from0",
"child": "last1"
}
],
"resources": {
"priority": "high",
"concurrency_quota": 0,
"memory_bytes_quota": 0
}
}
}
Make a request using a query string and the POST body as JSON.
Dialect options are specified for the text/csv
format.
See below for details on specific dialect options.
POST /v1/query
{
"query": "from(bucket:\"mydatabase/autogen\") |> last()",
"dialect" : {
"header": true,
"annotations": ["datatype"]
}
}
The result of a query is any number of named streams.
As a stream consists of multiple tables each table is encoded as CSV textual data.
CSV data should be encoded using UTF-8, and should be in Unicode Normal Form C as defined in UAX15.
Line endings must be CRLF as defined by the text/csv
MIME type in RFC 4180
Each table may have the following rows:
- annotation rows - a set of rows describing properties about the columns of the table.
- header row - a single row that defines the column labels.
- record rows, a set of rows containing the record data, one record per row.
In addition to the columns on the tables themselves three additional columns may be added to the CSV table.
- annotation - Contains the name of an annotation. This column is optional, if it exists it is always the first column. The only valid values for the column are the list of supported annotations or an empty value.
- result - Contains the name of the result as specified by the query.
- table - Contains a unique ID for each table within a result.
Columns support the following annotations:
- datatype - a description of the type of data contained within the column.
- group - a boolean flag indicating if the column is part of the table's group key.
- default - a default value to be used for rows whose string value is the empty string.
- null - a value indicating that data is missing.
Multiple tables may be encoded into the same file or data stream. The table column indicates the table a row belongs to. All rows for a table must be contiguous.
It is possible that multiple tables in the same result do not share a common table scheme. It is also possible that a table has no records. In such cases an empty row delimits a new table boundary and new annotations and header rows follow. The empty row acts like a delimiter between two independent CSV files that have been concatenated together.
In the case were a table has no rows the default
annotation is used to provide the values of the group key.
Multiple results may be encoded into the same file or data stream. An empty row always delimits separate results within the same file. The empty row acts like a delimiter between two independent CSV files that have been concatenated together.
Annotations rows are prefixed with a comment marker. The first column contains the name of the annotation being defined. The subsequent columns contain the value of the annotation for the respective columns.
The datatype
annotation specifies the data types of the remaining columns.
The possible data types are:
Datatype | Flux type | Description |
---|---|---|
boolean | bool | a truth value, one of "true" or "false" |
unsignedLong | uint | an unsigned 64-bit integer |
long | int | a signed 64-bit integer |
double | float | a IEEE-754 64-bit floating-point number |
string | string | a UTF-8 encoded string |
base64Binary | bytes | a base64 encoded sequence of bytes as defined in RFC 4648 |
dateTime | time | an instant in time, may be followed with a colon : and a description of the format |
duration | duration | a length of time represented as an unsigned 64-bit integer number of nanoseconds |
The group
annotation specifies if the column is part of the table's group key.
Possible values are true
or false
.
The default
annotation specifies a default value, if it exists, for each column.
In order to fully encode a table with its group key the datatype
, group
and default
annotations must be used.
The null
annotation specifies the string value that indicates a missing value.
When the null
annotation is not specified the empty string value is the null
value for the column.
It is not possible to encode/decode a non-null string value that is the same as the null
annotation value for columns of type string.
When the default
annotation value of a column is the same as the null
annotation value of a column, it is interpreted as the column's default value is null.
When an error occurs during execution a table will be returned with the first column label as error
and the second column label as reference
.
The error's properties are contained in the second row of the table.
The error
column contains the error message and the reference
column contains a unique reference code that can be used to get further information about the problem.
When an error occurs before any results are materialized then the HTTP status code will indicate an error and the error details will be encoded in the csv table. When an error occurs after some results have already been sent to the client the error will be encoded as the next table and the rest of the results will be discarded. In such a case the HTTP status code cannot be changed and will remain as 200 OK.
Example error encoding without annotations:
error,reference
Failed to parse query,897
The CSV response format support the following dialect options:
Option | Description |
---|---|
header | Header is a boolean value, if true the header row is included, otherwise its is omitted. Defaults to true. |
delimiter | Delimiter is a character to use as the delimiting value between columns. Defaults to ",". |
quoteChar | QuoteChar is a character to use to quote values containing the delimiter. Defaults to " . |
annotations | Annotations is a list of annotations that should be encoded. If the list is empty the annotation column is omitted entirely. Defaults to an empty list. |
commentPrefix | CommentPrefix is a string prefix to add to comment rows. Defaults to "#". Annotations are always comment rows. |
For context the following example tables encode fictitious data in response to this query:
from(bucket:"mydb/autogen")
|> range(start:2018-05-08T20:50:00Z, stop:2018-05-08T20:51:00Z)
|> group(columns:["_start","_stop", "region", "host"])
|> mean()
|> group(columns:["_start","_stop", "region"])
|> yield(name:"mean")
Example encoding with of a single table with no annotations:
result,table,_start,_stop,_time,region,host,_value
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
Example encoding with two tables in the same result with no annotations:
result,table,_start,_stop,_time,region,host,_value
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,west,A,62.73
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,west,B,12.83
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,west,C,51.62
Example encoding with two tables in the same result with no annotations and no header row:
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,west,A,62.73
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,west,B,12.83
mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,west,C,51.62
Example encoding with two tables in the same result with the datatype annotation:
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
,result,table,_start,_stop,_time,region,host,_value
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,west,A,62.73
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,west,B,12.83
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,west,C,51.62
Example encoding with two tables in the same result with the datatype and group annotations:
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
#group,false,false,true,true,false,true,false,false
,result,table,_start,_stop,_time,region,host,_value
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,west,A,62.73
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,west,B,12.83
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,west,C,51.62
Example encoding with two tables with differing schemas in the same result with the datatype and group annotations:
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
#group,false,false,true,true,false,true,false,false
,result,table,_start,_stop,_time,region,host,_value
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,east,A,15.43
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,east,B,59.25
,mean,0,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,east,C,52.62
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
#group,false,false,true,true,false,true,false,false
,result,table,_start,_stop,_time,location,device,min,max
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,USA,5825,62.73,68.42
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,USA,2175,12.83,56.12
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,USA,6913,51.62,54.25
Example error encoding with the datatype annotation:
#datatype,string,long
,error,reference
,Failed to parse query,897
Example error encoding with after a valid table has already been encoded.
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
,result,table,_start,_stop,_time,region,host,_value
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:00Z,west,A,62.73
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:20Z,west,B,12.83
,mean,1,2018-05-08T20:50:00Z,2018-05-08T20:51:00Z,2018-05-08T20:50:40Z,west,C,51.62
#datatype,string,long
,error,reference
,query terminated: reached maximum allowed memory limits,576