[WIP] Add support for HyperLogLogPlusPlus (HLL++) #11638
base: branch-25.02
Conversation
Force-pushed from d42d80a to 1945192
```scala
case class GpuHLL(childExpr: Expression, relativeSD: Double)
```
Let's call it by its full name, like GpuHyperLogLogPlusPlus, to better reflect the CPU version.
Done
```scala
  ReductionAggregation.HLL(numRegistersPerSketch), DType.STRUCT)
override lazy val groupByAggregate: GroupByAggregation =
  GroupByAggregation.HLL(numRegistersPerSketch)
override val name: String = "CudfHLL"
```
Not sure if "PlusPlus" is necessary.

Suggested change:

```diff
-override val name: String = "CudfHLL"
+override val name: String = "CudfHyperLogLogPlusPlus"
```
Done.
Force-pushed from 0a4939f to eb00c2b
Signed-off-by: Chong Gao <[email protected]>
Ready for review except test cases.
Looks good
```scala
expr[HyperLogLogPlusPlus](
  "Aggregation approximate count distinct",
  ExprChecks.reductionAndGroupByAgg(TypeSig.LONG, TypeSig.LONG,
    Seq(ParamCheck("input", TypeSig.cpuAtomics, TypeSig.all))),
```
nit: Using `cpuAtomics` for a GPU field gets to be kind of confusing. Could you please create a `gpuAtomics` instead?
Will update to support map, array, and list types now that NVIDIA/spark-rapids-jni#2575 is merged.
Explanation for HLL++: 6 bits are enough to store a register value.
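To make the 6-bit point concrete: an HLL++ register stores the maximum "leading zeros plus one" count observed in the non-index bits of a 64-bit hash, so with precision p >= 4 its value is at most 64 - p + 1 <= 61, which fits in 6 bits. A standalone sketch of the packing (illustrative helper, not spark-rapids code):

```java
// Sketch (hypothetical helper): pack 6-bit HLL++ registers into longs.
// With 64-bit hashes and precision p >= 4, a register value is at most
// 64 - p + 1 <= 61, so 6 bits per register suffice.
public class HllRegisterPacking {
  static final int BITS = 6;
  static final int PER_LONG = Long.SIZE / BITS; // 10 registers per long

  static long[] pack(int[] registers) {
    long[] words = new long[(registers.length + PER_LONG - 1) / PER_LONG];
    for (int i = 0; i < registers.length; i++) {
      words[i / PER_LONG] |= ((long) registers[i] & 0x3F) << (BITS * (i % PER_LONG));
    }
    return words;
  }

  static int get(long[] words, int i) {
    return (int) ((words[i / PER_LONG] >>> (BITS * (i % PER_LONG))) & 0x3F);
  }

  public static void main(String[] args) {
    int[] regs = new int[32]; // p = 5 -> 2^5 = 32 registers
    regs[0] = 61; regs[7] = 13; regs[31] = 1;
    long[] packed = pack(regs);
    System.out.println(packed.length);   // 4 longs hold 32 six-bit registers
    System.out.println(get(packed, 0));  // 61
    System.out.println(get(packed, 7));  // 13
    System.out.println(get(packed, 31)); // 1
  }
}
```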
TODO: @revans2 could you have a look first?
@ttnghia @revans2

```scala
case class CudfMergeHLLPP(override val dataType: DataType,
    precision: Int)
  override lazy val groupByAggregate: GroupByAggregation = {
    val hll = new HyperLogLogPlusPlusHostUDF(AggregationType.GroupByMerge, precision)
    // here hll is memory-leaked; if we close it, it causes a core dump
    GroupByAggregation.hostUDF(hll)
  }
}
```
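For context on what a GroupByMerge of sketches computes: merging two HLL++ sketches with the same precision reduces to an element-wise max over their register arrays. A minimal illustrative sketch (plain Java, not the JNI implementation):

```java
// Illustrative only: merging two HLL++ sketches with the same precision
// is an element-wise max over their register arrays.
public class HllMerge {
  static int[] merge(int[] a, int[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("precision mismatch");
    }
    int[] out = new int[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = Math.max(a[i], b[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] a = {3, 0, 7, 2};
    int[] b = {1, 5, 6, 2};
    // merged registers: [3, 5, 7, 2]
    System.out.println(java.util.Arrays.toString(merge(a, b)));
  }
}
```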
```java
static final class HostUDFAggregation extends Aggregation {
  // this refers to a `HyperLogLogPlusPlusHostUDF`
  private final HostUDFWrapper wrapper;

  private HostUDFAggregation(HostUDFWrapper wrapper) {
    super(Kind.HOST_UDF);
    this.wrapper = wrapper;
  }

  @Override
  long createNativeInstance() {
    return Aggregation.createHostUDFAgg(wrapper.udfNativeHandle);
  }
}
```

We can not close the native object here. Please discuss to find a solution.
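One possible shape for a fix, sketched with hypothetical names (not the cudf API): tie the native handle to an AutoCloseable wrapper so try-with-resources releases it deterministically instead of leaking:

```java
// Sketch with hypothetical names: tying a native handle's lifetime to
// try-with-resources so the JNI resource is released after the aggregation
// completes instead of leaking.
public class NativeUdfHandle implements AutoCloseable {
  private long nativeHandle;

  public NativeUdfHandle() {
    this.nativeHandle = createNative(); // stand-in for the JNI allocation
  }

  public long handle() {
    if (nativeHandle == 0) throw new IllegalStateException("already closed");
    return nativeHandle;
  }

  @Override
  public void close() {
    if (nativeHandle != 0) {
      destroyNative(nativeHandle); // stand-in for the JNI free
      nativeHandle = 0;            // idempotent: a second close() is a no-op
    }
  }

  // Stand-ins so the sketch is runnable without a real native library.
  private static long createNative() { return 42L; }
  private static void destroyNative(long h) { /* free native memory here */ }

  public static void main(String[] args) {
    try (NativeUdfHandle udf = new NativeUdfHandle()) {
      System.out.println(udf.handle() != 0); // true while in scope
    } // close() runs here, exactly once
  }
}
```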
HLLPP test cases passed, but there is a memory leak; refer to the comment above.
The ideal solution is to not trigger creating a native UDF from the Java side, and to put all the resource management into the C++ layer. Not sure if that's doable.
How about this?
I don't think this is that big of an issue. It will require a little bit of work though. There are two places we need to worry about this in a CudfAggregate (sql-plugin/src/main/scala/org/apache/spark/sql/rapids/aggregate/aggregateBase.scala, lines 285 to 286 in 03b85b1). The first is reductionAggregate, which is a function that takes an input and returns a Scalar, so we can modify the function to close the HostUDF when it is done internally. The second is groupByAggregate (sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala, lines 569 to 591 in 03b85b1). The main thing here is that we need to make sure that we have a way to create the HostUDF when we need it and then close things when the aggregation operation is done. https://github.com/NVIDIA/spark-rapids/blob/03b85b184e141849e414e90a83a4b589fe7e57fc/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L578C59-L578C67 is the place that I think we need to create it. Then, after the table has been processed (GpuAggregateExec.scala, line 588 in 03b85b1), we could have cudfAgg values that are also auto-closable and then we close them. The big thing with all of this is to document how all of it works.
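The create-when-needed / close-when-done lifecycle described above could look roughly like this (illustrative names only, not the plugin's actual types): the aggregate carries a factory rather than a live handle, the handle is created right before the aggregation runs, and it is closed once the table has been processed:

```java
import java.util.function.Supplier;

// Sketch of the lifecycle: create the UDF lazily from a factory, use it,
// and close it deterministically when the aggregation is done.
public class LazyUdfLifecycle {
  interface CloseableUdf extends AutoCloseable {
    long process(long input);
    @Override void close(); // no checked exception, to keep the sketch simple
  }

  static long runAggregation(Supplier<CloseableUdf> udfFactory, long input) {
    // Create the UDF only when the aggregation actually runs...
    try (CloseableUdf udf = udfFactory.get()) {
      return udf.process(input);
    } // ...and close it when the table has been processed.
  }

  public static void main(String[] args) {
    final boolean[] closed = {false};
    Supplier<CloseableUdf> factory = () -> new CloseableUdf() {
      @Override public long process(long input) { return input + 1; }
      @Override public void close() { closed[0] = true; }
    };
    long result = runAggregation(factory, 41L);
    System.out.println(result);     // 42
    System.out.println(closed[0]);  // true: the UDF was closed after use
  }
}
```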
closes #5199

Description

Spark approx_count_distinct (description link): Spark accepts one column (which can be a nested column) and a double literal relativeSD.

Depends on JNI PR: NVIDIA/spark-rapids-jni#2522
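For reference, the mapping from relativeSD (the allowed relative standard deviation) to the HLL++ precision p, and hence 2^p registers per sketch, is, as far as I can tell, the formula Spark's HyperLogLogPlusPlus uses; please verify against the Spark source:

```java
// Hedged sketch: map relativeSD to the HLL++ precision p (2^p registers).
// This mirrors the formula I believe Spark's HyperLogLogPlusPlus uses.
public class HllPrecision {
  static int precision(double relativeSD) {
    return (int) Math.ceil(2.0d * Math.log(1.106d / relativeSD) / Math.log(2.0d));
  }

  public static void main(String[] args) {
    int p = precision(0.05); // Spark's default relativeSD
    System.out.println(p);       // 9
    System.out.println(1 << p);  // 512 registers per sketch
  }
}
```

A tighter relativeSD costs memory quickly: halving the error roughly quadruples the register count.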
TODO
- Perf test

Correctness

The results are identical between CPU and GPU.
Signed-off-by: Chong Gao [email protected]