PoC: [GLUTEN-7745][VL] Incorporate SQL Union operator into Velox execution pipeline #7842

Draft: wants to merge 5 commits into base: main

zhztheplayer (Member) commented Nov 7, 2024

#7745

This is currently a PoC and does not work yet. It is blocked by a limitation in Velox's single-threaded execution mode: the plan below is rejected with the following error.

I20241107 15:06:23.310680 1849235 VeloxPlanConverter.cc:128] Plan Node: 
-- Aggregation[7][PARTIAL n7_0 := count_partial("n0_0")] -> n7_0:BIGINT
  -- LocalPartition[6][GATHER] -> n0_0:BIGINT
    -- Project[4][expressions: (n0_0:BIGINT, "n1_1")] -> n0_0:BIGINT
      -- Project[1][expressions: (n1_1:BIGINT, "n0_0")] -> n1_1:BIGINT
        -- TableScan[0][table: hive_table] -> n0_0:BIGINT
    -- Project[5][expressions: (n0_0:BIGINT, "n3_1")] -> n0_0:BIGINT
      -- Project[3][expressions: (n3_1:BIGINT, "n2_0")] -> n3_1:BIGINT
        -- TableScan[2][table: hive_table] -> n2_0:BIGINT
24/11/07 15:06:23 ERROR TaskResources: Task 11 failed by error: 
org.apache.gluten.exception.GlutenException: Task doesn't support single thread execution: -- Aggregation[7]

	at org.apache.gluten.vectorized.PlanEvaluatorJniWrapper.nativeCreateKernelWithIterator(Native Method)
	at org.apache.gluten.vectorized.NativePlanEvaluator.createKernelWithBatchIterator(NativePlanEvaluator.java:66)
	at org.apache.gluten.backendsapi.velox.VeloxIteratorApi.genFirstStageIterator(VeloxIteratorApi.scala:214)
	at org.apache.gluten.execution.GlutenWholeStageColumnarRDD.$anonfun$compute$1(GlutenWholeStageColumnarRDD.scala:88)
	at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
	at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.gluten.execution.GlutenWholeStageColumnarRDD.compute(GlutenWholeStageColumnarRDD.scala:77)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Related Velox code:

https://github.com/facebookincubator/velox/blob/12b52e70ec85ae0cdb4aa990797cddda9be5be27/velox/exec/Driver.h#L658-L660
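
To make the blocker easier to picture, here is a toy model of the plan above. It is only a sketch: `ToyPlanNode`, `makeNode`, `makeUnionPlan`, `makeSingleBranchPlan` and the check itself are hypothetical names, not Velox types or APIs. It assumes the task is rejected because the `LocalPartition` node that merges the two union branches needs a local exchange between pipelines, which a single-threaded task cannot run.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy plan node (NOT a Velox type): a name plus its upstream sources.
struct ToyPlanNode {
  std::string name;
  std::vector<std::shared_ptr<ToyPlanNode>> sources;
};

using ToyPlanNodePtr = std::shared_ptr<ToyPlanNode>;

ToyPlanNodePtr makeNode(
    std::string name,
    std::vector<ToyPlanNodePtr> sources = {}) {
  return std::make_shared<ToyPlanNode>(
      ToyPlanNode{std::move(name), std::move(sources)});
}

// Hypothetical check, assuming any LocalPartition in the tree rules out
// single-threaded execution. The real Velox logic lives in the code
// linked above and may consider more conditions.
bool supportsSingleThreadedExecution(const ToyPlanNodePtr& node) {
  if (node->name == "LocalPartition") {
    return false;
  }
  for (const auto& source : node->sources) {
    if (!supportsSingleThreadedExecution(source)) {
      return false;
    }
  }
  return true;
}

// Mirrors the logged plan: two scan branches gathered by a
// LocalPartition, then a partial aggregation on top.
ToyPlanNodePtr makeUnionPlan() {
  auto left =
      makeNode("Project", {makeNode("Project", {makeNode("TableScan")})});
  auto right =
      makeNode("Project", {makeNode("Project", {makeNode("TableScan")})});
  return makeNode("Aggregation", {makeNode("LocalPartition", {left, right})});
}

// The same shape with a single branch and no LocalPartition.
ToyPlanNodePtr makeSingleBranchPlan() {
  return makeNode(
      "Aggregation",
      {makeNode("Project", {makeNode("Project", {makeNode("TableScan")})})});
}
```

Under these assumptions, `supportsSingleThreadedExecution(makeUnionPlan())` is false while the single-branch plan passes, which matches the error reported for `Aggregation[7]`.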

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Nov 7, 2024
zhztheplayer changed the title from "PoC: [VL] Incorporate SQL Union operator into Velox execution pipeline" to "PoC: [GLUTEN-7745][VL] Incorporate SQL Union operator into Velox execution pipeline" Nov 7, 2024

github-actions bot commented Nov 7, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Nov 7, 2024

Run Gluten Clickhouse CI on x86

Comment on lines 595 to 596
// TODO
override def genUnionTransformerMetricsUpdater(metrics: Map[String, SQLMetric]): MetricsUpdater = MetricsUpdater.Todo
Member Author

TODO
