Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE]PPL new trendline command ( WMA) #831

Closed
salyh opened this issue Oct 29, 2024 · 6 comments
Closed

[FEATURE]PPL new trendline command ( WMA) #831

salyh opened this issue Oct 29, 2024 · 6 comments
Labels
0.6 enhancement New feature or request Lang:PPL Pipe Processing Language support

Comments

@salyh
Copy link
Contributor

salyh commented Oct 29, 2024

This is the WMA part of the new spark PPL trendline command

See #655 for details

@salyh salyh added enhancement New feature or request untriaged labels Oct 29, 2024
@YANG-DB YANG-DB self-assigned this Oct 29, 2024
@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support 0.6 and removed untriaged labels Oct 29, 2024
@YANG-DB YANG-DB moved this to Design in PPL Commands Oct 29, 2024
@YANG-DB YANG-DB changed the title [FEATURE]PPL new trendline command (Weighted moving average - WMA) [FEATURE]PPL new trendline command ( WMA) Oct 30, 2024
@YANG-DB YANG-DB removed their assignment Oct 30, 2024
@salyh
Copy link
Contributor Author

salyh commented Oct 30, 2024

See eliatra#7 for an prototype

@salyh
Copy link
Contributor Author

salyh commented Oct 31, 2024

Consider #833 (comment) when implementing WMA

@andy-k-improving
Copy link
Contributor

Hi @salyh @LantaoJin
I'm picking this task up base on the existing SMA implementation (#833), and the provided prototype (eliatra#7)

Thanks!

@andy-k-improving
Copy link
Contributor

The below is my proposed implementation on Spark SQL with nth_value aggregation and associated Logical plan.



CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');


explain extended SELECT name, salary, 
(nth_value(salary, 3) OVER w *3 + nth_value(salary, 2) OVER w *2 + nth_value(salary, 1) OVER w *1)/6 AS WMA

FROM employees 
WINDOW w AS (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY age;



-- 'Sort ['age ASC NULLS FIRST], true
-- +- 'WithWindowDefinition [w=windowspecdefinition('con, 'age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$()))]
--    +- 'Project ['name, 'salary, 
            -- ((((unresolvedwindowexpression('nth_value('salary, 3), WindowSpecReference(w)) * 3) + 
            -- (unresolvedwindowexpression('nth_value('salary, 2), WindowSpecReference(w)) * 2)) + 
            -- (unresolvedwindowexpression('nth_value('salary, 1), WindowSpecReference(w)) * 1)) / 6) AS WMA#611]
--       +- 'UnresolvedRelation [employees], [], false

I have verified the calculation offline, this should produce the result that we are looking for in this ticket and I'm in the process to convert the same logic into code.

@salyh and @LantaoJin do you guys foresee any potential issue base on above?

@andy-k-improving
Copy link
Contributor

The below is my proposed implementation on Spark SQL with nth_value aggregation and associated Logical plan.



CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');


explain extended SELECT name, salary, 
(nth_value(salary, 3) OVER w *3 + nth_value(salary, 2) OVER w *2 + nth_value(salary, 1) OVER w *1)/6 AS WMA

FROM employees 
WINDOW w AS (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY age;



-- 'Sort ['age ASC NULLS FIRST], true
-- +- 'WithWindowDefinition [w=windowspecdefinition('con, 'age ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, currentrow$()))]
--    +- 'Project ['name, 'salary, 
            -- ((((unresolvedwindowexpression('nth_value('salary, 3), WindowSpecReference(w)) * 3) + 
            -- (unresolvedwindowexpression('nth_value('salary, 2), WindowSpecReference(w)) * 2)) + 
            -- (unresolvedwindowexpression('nth_value('salary, 1), WindowSpecReference(w)) * 1)) / 6) AS WMA#611]
--       +- 'UnresolvedRelation [employees], [], false

I have verified the calculation offline, this should produce the result that we are looking for in this ticket and I'm in the process to convert the same logic into code.

@salyh and @LantaoJin do you guys foresee any potential issue base on above?

Also the function nth_value( ) require a sort field, as it can't be execute against the natural order of incoming table, hence I'm proposing to have the [sort <[+|-] sort-field>] as mandatory for WMA treandline.
Will that cause any problem, in term of breaking change on trend line PPL syntax?

@andy-k-improving
Copy link
Contributor

PR for this issue: #872

@YANG-DB YANG-DB moved this from Design to InReview in PPL Commands Nov 12, 2024
@YANG-DB YANG-DB moved this from InReview to Done in PPL Commands Nov 14, 2024
@YANG-DB YANG-DB closed this as completed by moving to Done in PPL Commands Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 enhancement New feature or request Lang:PPL Pipe Processing Language support
Projects
Status: Done
Development

No branches or pull requests

3 participants