PPL command implementation for appendCol (#990)
* Update grammar def

Signed-off-by: Andy Kwok <[email protected]>

* Skeleton for Append fields

Signed-off-by: Andy Kwok <[email protected]>

* Visitor skeleton

Signed-off-by: Andy Kwok <[email protected]>

* Update import

Signed-off-by: Andy Kwok <[email protected]>

* Update import

Signed-off-by: Andy Kwok <[email protected]>

* Update osrt

Signed-off-by: Andy Kwok <[email protected]>

* Changes

Signed-off-by: Andy Kwok <[email protected]>

* Consolidate String constant

Signed-off-by: Andy Kwok <[email protected]>

* Update projection clause

Signed-off-by: Andy Kwok <[email protected]>

* Remove dep on parent method

Signed-off-by: Andy Kwok <[email protected]>

* Consolidate relation inject logic

Signed-off-by: Andy Kwok <[email protected]>

* Move constant

Signed-off-by: Andy Kwok <[email protected]>

* Move out constant from lambda

Signed-off-by: Andy Kwok <[email protected]>

* Consolidate method

Signed-off-by: Andy Kwok <[email protected]>

* Update logic

Signed-off-by: Andy Kwok <[email protected]>

* Test 1 2

Signed-off-by: Andy Kwok <[email protected]>

* Test-cases 3 and 4

Signed-off-by: Andy Kwok <[email protected]>

* Update code format

Signed-off-by: Andy Kwok <[email protected]>

* Update code style

Signed-off-by: Andy Kwok <[email protected]>

* Update scala syntax

Signed-off-by: Andy Kwok <[email protected]>

* Override option

Signed-off-by: Andy Kwok <[email protected]>

* Update override option

Signed-off-by: Andy Kwok <[email protected]>

* Enable override option

Signed-off-by: Andy Kwok <[email protected]>

* Override impl

Signed-off-by: Andy Kwok <[email protected]>

* Minimise cmd permission

Signed-off-by: Andy Kwok <[email protected]>

* Refactor util class

Signed-off-by: Andy Kwok <[email protected]>

* Java doc

Signed-off-by: Andy Kwok <[email protected]>

* Integ test 1 2

Signed-off-by: Andy Kwok <[email protected]>

* Test cases 3 4

Signed-off-by: Andy Kwok <[email protected]>

* Test code comments

Signed-off-by: Andy Kwok <[email protected]>

* Code tidy

Signed-off-by: Andy Kwok <[email protected]>

* Code refactor

Signed-off-by: Andy Kwok <[email protected]>

* ScalaFmt

Signed-off-by: Andy Kwok <[email protected]>

* Remove sout

Signed-off-by: Andy Kwok <[email protected]>

* Update doc

Signed-off-by: Andy Kwok <[email protected]>

* Override option test case

Signed-off-by: Andy Kwok <[email protected]>

* Code style

Signed-off-by: Andy Kwok <[email protected]>

* Code comment

Signed-off-by: Andy Kwok <[email protected]>

* Deprecate visit child (1)

Signed-off-by: Andy Kwok <[email protected]>

* Minimise code diff

Signed-off-by: Andy Kwok <[email protected]>

* Update override logic

Signed-off-by: Andy Kwok <[email protected]>

* Update test-cases

Signed-off-by: Andy Kwok <[email protected]>

* Integ

Signed-off-by: Andy Kwok <[email protected]>

* Make append alias distinct

Signed-off-by: Andy Kwok <[email protected]>

* Update integ for distinct tables

Signed-off-by: Andy Kwok <[email protected]>

* Update limitation

Signed-off-by: Andy Kwok <[email protected]>

* Update code style

Signed-off-by: Andy Kwok <[email protected]>

* Update docs/ppl-lang/ppl-appendcol-command.md

Co-authored-by: Taylor Curran <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>

* Update docs/ppl-lang/ppl-appendcol-command.md

Co-authored-by: Taylor Curran <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>

* Update docs/ppl-lang/ppl-appendcol-command.md

Co-authored-by: Taylor Curran <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>

* Update readme

Signed-off-by: Andy Kwok <[email protected]>

* Mark var as final

Signed-off-by: Andy Kwok <[email protected]>

* Update join type

Signed-off-by: Andy Kwok <[email protected]>

* Update unit tests

Signed-off-by: Andy Kwok <[email protected]>

* Update existing integ test for full outer

Signed-off-by: Andy Kwok <[email protected]>

* Test cases for null

Signed-off-by: Andy Kwok <[email protected]>

* Update scalafmt

Signed-off-by: Andy Kwok <[email protected]>

* Update doc

Signed-off-by: Andy Kwok <[email protected]>

* Update doc

Signed-off-by: Andy Kwok <[email protected]>

* Multiple avg commands

Signed-off-by: Andy Kwok <[email protected]>

* Multiple avg commands

Signed-off-by: Andy Kwok <[email protected]>

* Remove debug

Signed-off-by: Andy Kwok <[email protected]>

* Additional example for conflicted columns

Signed-off-by: Andy Kwok <[email protected]>

* Code refactor

Signed-off-by: Andy Kwok <[email protected]>

---------

Signed-off-by: Andy Kwok <[email protected]>
Co-authored-by: Taylor Curran <[email protected]>
andy-k-improving and currantw authored Jan 8, 2025
1 parent 20ef890 commit 1cb538f
Showing 12 changed files with 1,510 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -1,5 +1,10 @@
## Example PPL Queries

#### **AppendCol**
[See additional command details](ppl-appendcol-command.md)
- `source=employees | stats avg(age) as avg_age1 by dept | fields dept, avg_age1 | APPENDCOL [ stats avg(age) as avg_age2 by dept | fields avg_age2 ];` (displays multiple statistics side by side)
- `source=employees | FIELDS name, dept, age | APPENDCOL OVERRIDE=true [ stats avg(age) as age ];` (When the override option is enabled, fields from the sub-query take precedence over fields in the main query in cases of field name conflicts)

#### **Comment**
[See additional command details](ppl-comment.md)
- `source=accounts | top gender // finds most common gender of all the accounts` (line comment)
2 changes: 2 additions & 0 deletions docs/ppl-lang/README.md
@@ -74,6 +74,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`expand commands`](ppl-expand-command.md)

- [`appendcol command`](ppl-appendcol-command.md)

* **Functions**

- [`Expressions`](functions/ppl-expressions.md)
120 changes: 120 additions & 0 deletions docs/ppl-lang/ppl-appendcol-command.md
@@ -0,0 +1,120 @@
## PPL `appendcol` command

### Description
Use the `appendcol` command to append the result of a sub-search as new columns alongside the input search results (the main search).

### Syntax - APPENDCOL
`APPENDCOL <override=?> [sub-search]...`

* `override=<boolean>`: Optional. Specifies whether columns from the main search should be overwritten by columns from the sub-search when their names conflict. When omitted, the command behaves as if it were set to `false` and conflicting columns from both searches are kept.
* sub-search: Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search as its input.
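
The examples below show a one-row sub-search result landing on the first row of the main search, with `NULL` in every other row, and the commit history above mentions a full outer join ("Update join type", "Update existing integ test for full outer"). A reasonable mental model, therefore, is that rows are paired by position and the shorter side is padded with `NULL`. The following Python sketch models only that assumption; `append_col` and the `(columns, rows)` table representation are hypothetical and are not the plugin's API.

```python
from itertools import zip_longest

# Hypothetical table representation: (column names, list of row tuples).
Table = tuple[list[str], list[tuple]]


def append_col(main: Table, sub: Table) -> Table:
    """Sketch of APPENDCOL without override: pair rows by position, padding the shorter side with None (NULL)."""
    main_cols, main_rows = main
    sub_cols, sub_rows = sub
    left_pad = (None,) * len(main_cols)
    right_pad = (None,) * len(sub_cols)
    rows = [
        (left if left is not None else left_pad) + (right if right is not None else right_pad)
        for left, right in zip_longest(main_rows, sub_rows)
    ]
    return main_cols + sub_cols, rows


main = (["name", "age"], [("Lisa", 35), ("Fred", 28)])
sub = (["AVG_AGE"], [(31.5,)])  # made-up single-row sub-search result
print(append_col(main, sub))
# (['name', 'age', 'AVG_AGE'], [('Lisa', 35, 31.5), ('Fred', 28, None)])
```

Padding on both sides keeps the sketch symmetric, so it also covers a sub-search that returns more rows than the main search.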


#### Example 1: Append the result of `stats avg(age) as AVG_AGE` to an existing search result

This example appends the result of the sub-search `stats avg(age) as AVG_AGE` alongside the main search.

PPL query:

    os> source=employees | FIELDS name, dept, age | APPENDCOL [ stats avg(age) as AVG_AGE ];
    fetched rows / total rows = 9/9
    +-------+-------------+-----+-------------------+
    | name  | dept        | age | AVG_AGE           |
    +-------+-------------+-----+-------------------+
    | Lisa  | Sales       | 35  | 31.22222222222222 |
    | Fred  | Engineering | 28  | NULL              |
    | Paul  | Engineering | 23  | NULL              |
    | Evan  | Sales       | 38  | NULL              |
    | Chloe | Engineering | 25  | NULL              |
    | Tom   | Engineering | 33  | NULL              |
    | Alex  | Sales       | 33  | NULL              |
    | Jane  | Marketing   | 28  | NULL              |
    | Jeff  | Marketing   | 38  | NULL              |
    +-------+-------------+-----+-------------------+
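
To check the numbers, the sketch below recomputes Example 1 in plain Python from the nine rows shown in the table (the ages sum to 281, so the average is 281 / 9). It assumes rows are paired by position, so the single sub-search row lands on the first main-search row and the other eight rows get `None` (`NULL`). It is an illustration, not the plugin's implementation.

```python
from itertools import zip_longest

# Main-search rows (name, dept, age), copied from the Example 1 output.
employees = [
    ("Lisa", "Sales", 35), ("Fred", "Engineering", 28), ("Paul", "Engineering", 23),
    ("Evan", "Sales", 38), ("Chloe", "Engineering", 25), ("Tom", "Engineering", 33),
    ("Alex", "Sales", 33), ("Jane", "Marketing", 28), ("Jeff", "Marketing", 38),
]

# Sub-search `stats avg(age) as AVG_AGE` over the same data yields exactly one row.
sub_rows = [(sum(age for _, _, age in employees) / len(employees),)]

# Pair rows by position; main rows without a sub-search partner get None (NULL).
result = [
    main_row + (sub_row if sub_row is not None else (None,))
    for main_row, sub_row in zip_longest(employees, sub_rows)
]

for row in result:
    print(row)
```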


#### Example 2: Compare multiple stats commands side by side with `appendcol`

This example demonstrates a common use case: performing multiple statistical calculations and displaying the results side by side in a horizontal layout.

PPL query:

    os> source=employees | stats avg(age) as avg_age1 by dept | fields dept, avg_age1 | APPENDCOL [ stats avg(age) as avg_age2 by dept | fields avg_age2 ];
    fetched rows / total rows = 3/3
    +-------------+----------+----------+
    | dept        | avg_age1 | avg_age2 |
    +-------------+----------+----------+
    | Engineering | 27.25    | 27.25    |
    | Sales       | 35.33    | 35.33    |
    | Marketing   | 33.00    | 33.00    |
    +-------------+----------+----------+
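
The sub-search in Example 2 ends with `fields avg_age2`, so its rows carry no `dept` value. Assuming rows are paired by position, each `avg_age2` lines up with the right department only because both searches compute the same grouping over the same data and emit the groups in the same order. The sketch below recomputes both per-department averages from the Example 1 data and pairs them positionally; `avg_by_dept` is a hypothetical helper, and its row order (first appearance in the data) may differ from the table above.

```python
from collections import defaultdict

# (dept, age) pairs taken from the employee data shown in Example 1.
rows = [
    ("Sales", 35), ("Engineering", 28), ("Engineering", 23), ("Sales", 38),
    ("Engineering", 25), ("Engineering", 33), ("Sales", 33), ("Marketing", 28), ("Marketing", 38),
]


def avg_by_dept(data):
    """Hypothetical helper mirroring `stats avg(age) by dept`; groups keep first-seen order."""
    groups = defaultdict(list)
    for dept, age in data:
        groups[dept].append(age)
    return [(dept, sum(ages) / len(ages)) for dept, ages in groups.items()]


main = avg_by_dept(rows)                        # (dept, avg_age1)
sub = [(avg,) for _, avg in avg_by_dept(rows)]  # `fields avg_age2` keeps only the average

# Positional pairing: row i of the main search is joined with row i of the sub-search.
for (dept, avg_age1), (avg_age2,) in zip(main, sub):
    print(f"{dept:<12} {avg_age1:6.2f} {avg_age2:6.2f}")
```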


#### Example 3: Append multiple sub-search results

This example demonstrates that multiple `APPENDCOL` commands can be chained to provide a single, comprehensive view.

PPL query:

    os> source=employees | FIELDS name, dept, age | APPENDCOL [ stats avg(age) as AVG_AGE ] | APPENDCOL [ stats max(age) as MAX_AGE ];
    fetched rows / total rows = 9/9
    +-------+-------------+-----+-------------------+---------+
    | name  | dept        | age | AVG_AGE           | MAX_AGE |
    +-------+-------------+-----+-------------------+---------+
    | Lisa  | Sales       | 35  | 31.22222222222222 | 38      |
    | Fred  | Engineering | 28  | NULL              | NULL    |
    | Paul  | Engineering | 23  | NULL              | NULL    |
    | Evan  | Sales       | 38  | NULL              | NULL    |
    | Chloe | Engineering | 25  | NULL              | NULL    |
    | Tom   | Engineering | 33  | NULL              | NULL    |
    | Alex  | Sales       | 33  | NULL              | NULL    |
    | Jane  | Marketing   | 28  | NULL              | NULL    |
    | Jeff  | Marketing   | 38  | NULL              | NULL    |
    +-------+-------------+-----+-------------------+---------+
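
Chaining `APPENDCOL` commands, as in Example 3, can be read as a left fold: each command appends one more sub-search result to the output of the previous step. The sketch below models this with `functools.reduce` under the same positional-pairing assumption; `append_one` is a hypothetical helper, and only the first row receives a value from each one-row sub-search.

```python
from functools import reduce
from itertools import zip_longest

# age column of the main search, from the Example 3 data.
ages = [35, 28, 23, 38, 25, 33, 33, 28, 38]
main = [(age,) for age in ages]

# Two chained single-row sub-searches: avg(age) as AVG_AGE, then max(age) as MAX_AGE.
sub_searches = [[(sum(ages) / len(ages),)], [(max(ages),)]]


def append_one(lhs, rhs):
    """Hypothetical helper: pair rows by position, padding the shorter side with None (NULL)."""
    lhs_pad = (None,) * (len(lhs[0]) if lhs else 0)
    rhs_pad = (None,) * (len(rhs[0]) if rhs else 0)
    return [
        (left if left is not None else lhs_pad) + (right if right is not None else rhs_pad)
        for left, right in zip_longest(lhs, rhs)
    ]


# Each APPENDCOL in the chain consumes the output of the previous one.
result = reduce(append_one, sub_searches, main)
print(result[0])
print(result[1])  # (28, None, None)
```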

#### Example 4: Override the main search in the case of a column name conflict

This example demonstrates how the `OVERRIDE` option overwrites the `age` column of the main search: when the option is set to `true` and the sub-search also produces a column named `age`, the sub-search column replaces the main-search column.

PPL query:

    os> source=employees | FIELDS name, dept, age | APPENDCOL OVERRIDE=true [ stats avg(age) as age ];
    fetched rows / total rows = 9/9
    +-------+-------------+-------------------+
    | name  | dept        | age               |
    +-------+-------------+-------------------+
    | Lisa  | Sales       | 31.22222222222222 |
    | Fred  | Engineering | NULL              |
    | Paul  | Engineering | NULL              |
    | Evan  | Sales       | NULL              |
    | Chloe | Engineering | NULL              |
    | Tom   | Engineering | NULL              |
    | Alex  | Sales       | NULL              |
    | Jane  | Marketing   | NULL              |
    | Jeff  | Marketing   | NULL              |
    +-------+-------------+-------------------+
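
One way to read `OVERRIDE=true` is that conflicting columns are dropped from the main search before the sub-search columns are appended, which matches the column order in Example 4 (`name`, `dept`, then the sub-search `age`). The sketch below models that under the positional-pairing assumption; `append_col_override` is a hypothetical name, not part of the plugin.

```python
from itertools import zip_longest


def append_col_override(main_cols, main_rows, sub_cols, sub_rows):
    """Hypothetical sketch: drop main columns that the sub-search also produces, then pair rows by position."""
    keep = [i for i, col in enumerate(main_cols) if col not in sub_cols]
    cols = [main_cols[i] for i in keep] + list(sub_cols)
    left_pad = (None,) * len(keep)
    right_pad = (None,) * len(sub_cols)
    rows = []
    for left, right in zip_longest(main_rows, sub_rows):
        kept = tuple(left[i] for i in keep) if left is not None else left_pad
        rows.append(kept + (right if right is not None else right_pad))
    return cols, rows


# First two main-search rows of Example 4; the sub-search value is avg(age) over all nine rows.
main_cols = ["name", "dept", "age"]
main_rows = [("Lisa", "Sales", 35), ("Fred", "Engineering", 28)]
cols, rows = append_col_override(main_cols, main_rows, ["age"], [(281 / 9,)])
print(cols)  # ['name', 'dept', 'age']
print(rows)
```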

#### Example 5: AppendCol command with duplicated columns

This example demonstrates what happens when conflicting columns exist and `override` is set to `false` or omitted.
Here, an average aggregation over the `age` column, grouped by `dept`, is performed in both the main query and the sub-query.
The main query returns `avg_age1` and `dept`, and the sub-query returns `avg_age2` and `dept`.
Because `override` is absent, the duplicated `dept` column is not dropped, so all four columns appear in the final result.

PPL query:

    os> source=employees | stats avg(age) as avg_age1 by dept | APPENDCOL [ stats avg(age) as avg_age2 by dept ];
    fetched rows / total rows = 3/3
    +----------+-------------+----------+-------------+
    | avg_age1 | dept        | avg_age2 | dept        |
    +----------+-------------+----------+-------------+
    | 35.33    | Sales       | 35.33    | Sales       |
    | 27.25    | Engineering | 27.25    | Engineering |
    | 33.00    | Marketing   | 33.00    | Marketing   |
    +----------+-------------+----------+-------------+
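
Without `override`, the output schema in Example 5 is the main-search columns followed by the sub-search columns, so a name produced by both sides appears twice. The sketch below models that column concatenation, together with the override case for comparison; `output_columns` is a hypothetical helper and the override behavior shown is an assumption consistent with Example 4.

```python
def output_columns(main_cols, sub_cols, override=False):
    """Hypothetical sketch of the APPENDCOL output schema."""
    if override:
        # Assumed: conflicting names are taken from the sub-search only.
        main_cols = [col for col in main_cols if col not in sub_cols]
    # Without override, duplicated names are kept as-is.
    return list(main_cols) + list(sub_cols)


print(output_columns(["avg_age1", "dept"], ["avg_age2", "dept"]))
# ['avg_age1', 'dept', 'avg_age2', 'dept']
print(output_columns(["avg_age1", "dept"], ["avg_age2", "dept"], override=True))
# ['avg_age1', 'avg_age2', 'dept']
```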


### Limitation
When `override` is set to `true`, the final command in the sub-search must be either `FIELDS` or `STATS`.
Otherwise, an `IllegalStateException` with the message `Not Supported operation: APPENDCOL should specify the output fields` is thrown.
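
The rule above can be pictured as a validation step on the sub-search: judging by the error message, the override needs to know which output fields the sub-search produces, which a trailing `FIELDS` or `STATS` makes explicit. The sketch below models the check on a list of command names; `validate_appendcol` is hypothetical, and it raises Python's `RuntimeError` in place of Java's `IllegalStateException`.

```python
NOT_SUPPORTED = "Not Supported operation: APPENDCOL should specify the output fields"


def validate_appendcol(sub_search_commands, override):
    """Hypothetical check: with override enabled, the sub-search must end in FIELDS or STATS."""
    if not override:
        return  # no restriction when override is false or omitted
    if not sub_search_commands or sub_search_commands[-1].upper() not in ("FIELDS", "STATS"):
        raise RuntimeError(NOT_SUPPORTED)  # stands in for Java's IllegalStateException


validate_appendcol(["stats"], override=True)           # accepted
validate_appendcol(["eval", "where"], override=False)  # accepted: override is off
try:
    validate_appendcol(["stats", "eval"], override=True)
except RuntimeError as err:
    print(err)
```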