[FEATURE]PPL new `trendline` command ( WMA) #831

salyh · 2024-10-29T20:50:07Z

This is the WMA part of the new spark PPL trendline command

See #655 for details

salyh · 2024-10-30T11:42:11Z

See eliatra#7 for an prototype

salyh · 2024-10-31T08:00:41Z

Consider #833 (comment) when implementing WMA

andy-k-improving · 2024-10-31T17:49:20Z

Hi @salyh @LantaoJin
I'm picking this task up base on the existing SMA implementation (#833), and the provided prototype (eliatra#7)

Thanks!

andy-k-improving · 2024-11-01T19:13:19Z

The below is my proposed implementation on Spark SQL with nth_value aggregation and associated Logical plan.



CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');


explain extended SELECT name, salary, 
(nth_value(salary, 3) OVER w *3 + nth_value(salary, 2) OVER w *2 + nth_value(salary, 1) OVER w *1)/6 AS WMA

FROM employees 
WINDOW w AS (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY age;



-- 'Sort ['age ASC NULLS FIRST], true
-- +- 'WithWindowDefinition [w=windowspecdefinition('con, 'age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$()))]
--    +- 'Project ['name, 'salary, 
            -- ((((unresolvedwindowexpression('nth_value('salary, 3), WindowSpecReference(w)) * 3) + 
            -- (unresolvedwindowexpression('nth_value('salary, 2), WindowSpecReference(w)) * 2)) + 
            -- (unresolvedwindowexpression('nth_value('salary, 1), WindowSpecReference(w)) * 1)) / 6) AS WMA#611]
--       +- 'UnresolvedRelation [employees], [], false

I have verified the calculation offline, this should produce the result that we are looking for in this ticket and I'm in the process to convert the same logic into code.

@salyh and @LantaoJin do you guys foresee any potential issue base on above?

andy-k-improving · 2024-11-04T20:04:07Z

The below is my proposed implementation on Spark SQL with nth_value aggregation and associated Logical plan.



CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');


explain extended SELECT name, salary, 
(nth_value(salary, 3) OVER w *3 + nth_value(salary, 2) OVER w *2 + nth_value(salary, 1) OVER w *1)/6 AS WMA

FROM employees 
WINDOW w AS (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY age;



-- 'Sort ['age ASC NULLS FIRST], true
-- +- 'WithWindowDefinition [w=windowspecdefinition('con, 'age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$()))]
--    +- 'Project ['name, 'salary, 
            -- ((((unresolvedwindowexpression('nth_value('salary, 3), WindowSpecReference(w)) * 3) + 
            -- (unresolvedwindowexpression('nth_value('salary, 2), WindowSpecReference(w)) * 2)) + 
            -- (unresolvedwindowexpression('nth_value('salary, 1), WindowSpecReference(w)) * 1)) / 6) AS WMA#611]
--       +- 'UnresolvedRelation [employees], [], false

I have verified the calculation offline, this should produce the result that we are looking for in this ticket and I'm in the process to convert the same logic into code.

@salyh and @LantaoJin do you guys foresee any potential issue base on above?

Also the function nth_value( ) require a sort field, as it can't be execute against the natural order of incoming table, hence I'm proposing to have the [sort <[+|-] sort-field>] as mandatory for WMA treandline.
Will that cause any problem, in term of breaking change on trend line PPL syntax?

andy-k-improving · 2024-11-07T01:35:52Z

PR for this issue: #872

salyh added enhancement New feature or request untriaged labels Oct 29, 2024

This was referenced Oct 29, 2024

[FEATURE]PPL new trendline command (SMA) #655

Closed

New trendline ppl command (SMA only) #833

Merged

YANG-DB self-assigned this Oct 29, 2024

YANG-DB added Lang:PPL Pipe Processing Language support 0.6 and removed untriaged labels Oct 29, 2024

YANG-DB added this to PPL Commands Oct 29, 2024

YANG-DB moved this to Design in PPL Commands Oct 29, 2024

YANG-DB changed the title ~~[FEATURE]PPL new trendline command (Weighted moving average - WMA)~~ [FEATURE]PPL new trendline command ( WMA) Oct 30, 2024

YANG-DB removed their assignment Oct 30, 2024

YANG-DB moved this from Design to InReview in PPL Commands Nov 12, 2024

YANG-DB moved this from InReview to Done in PPL Commands Nov 14, 2024

YANG-DB closed this as completed by moving to Done in PPL Commands Nov 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE]PPL new `trendline` command ( WMA) #831

[FEATURE]PPL new `trendline` command ( WMA) #831

salyh commented Oct 29, 2024

salyh commented Oct 30, 2024

salyh commented Oct 31, 2024

andy-k-improving commented Oct 31, 2024

andy-k-improving commented Nov 1, 2024

andy-k-improving commented Nov 4, 2024

andy-k-improving commented Nov 7, 2024

[FEATURE]PPL new trendline command ( WMA) #831

[FEATURE]PPL new trendline command ( WMA) #831

Comments

salyh commented Oct 29, 2024

salyh commented Oct 30, 2024

salyh commented Oct 31, 2024

andy-k-improving commented Oct 31, 2024

andy-k-improving commented Nov 1, 2024

andy-k-improving commented Nov 4, 2024

andy-k-improving commented Nov 7, 2024

[FEATURE]PPL new `trendline` command ( WMA) #831

[FEATURE]PPL new `trendline` command ( WMA) #831