Is your feature request related to a problem? Please describe.
The current implementations of trunc and date_trunc use a kernel where the format is a column.
This works great. In my tests I saw a 180x speedup over the CPU (16 cores). But we could save a lot of memory if the format is a scalar, which I think is the most common case.
In the best case, where the format string is "DD", a new kernel would save about 6 bytes per row of input. On a date, those 6 bytes are a 150% increase in memory usage; on a timestamp, only a 75% increase. The worst case on a date is "QUARTER", a 275% increase in memory usage. For a timestamp it is "MICROSECOND", which would be a 187.5% increase in memory usage. This is probably minor, but it would be nice.
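For reference, the percentages above can be reproduced with a small calculation. This is only a sketch of the arithmetic, assuming each row of a format column costs the UTF-8 bytes of the string plus one 4-byte offset, that a date is 4 bytes, and that a timestamp is 8 bytes. The name `format_overhead_pct` is made up for illustration.

```python
# Assumed sizes (not taken from the implementation): one 4-byte offset per
# string row, 4-byte dates, 8-byte timestamps.
OFFSET_BYTES = 4
DATE_BYTES = 4
TIMESTAMP_BYTES = 8


def format_overhead_pct(fmt: str, value_bytes: int) -> float:
    """Per-row cost of a format column, as a percentage of the input value size."""
    per_row = len(fmt.encode("utf-8")) + OFFSET_BYTES
    return 100.0 * per_row / value_bytes


if __name__ == "__main__":
    cases = [
        ("DD", DATE_BYTES),
        ("DD", TIMESTAMP_BYTES),
        ("QUARTER", DATE_BYTES),
        ("MICROSECOND", TIMESTAMP_BYTES),
    ]
    for fmt, size in cases:
        print(f"{fmt!r} on a {size}-byte value: {format_overhead_pct(fmt, size)}%")
```

Under those assumptions "DD" costs 2 + 4 = 6 bytes per row, which matches the "about 6 bytes" figure.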
The current JNI implementation is already optimized for memory. Although the scalar is promoted into a column, that column has only one row, so there is no change in memory usage.
However, I agree that we can optimize further, but in terms of performance. Currently, the format string is parsed when processing every row, even when there is only one format value (a format column of size one). We can do better by parsing the scalar format on the host before calling the kernel, so the GPU does not have to parse it again for each row. I'll post a JNI PR shortly.
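A rough illustration of the parse-once idea, in plain Python rather than JNI/CUDA. This is not the actual implementation: the `Unit` enum, the alias table, and the function names are all hypothetical, and the set of accepted format strings here is only a small assumed subset.

```python
from datetime import date
from enum import Enum


class Unit(Enum):
    YEAR = 1
    QUARTER = 2
    MONTH = 3


# Hypothetical alias table; the real set of accepted formats may differ.
_ALIASES = {
    "YEAR": Unit.YEAR, "YYYY": Unit.YEAR, "YY": Unit.YEAR,
    "QUARTER": Unit.QUARTER,
    "MONTH": Unit.MONTH, "MON": Unit.MONTH, "MM": Unit.MONTH,
}


def parse_format(fmt: str):
    """Map a format string to a Unit, or None if it is not recognized."""
    return _ALIASES.get(fmt.upper())


def truncate(d: date, unit: Unit) -> date:
    """Truncate one date using an already-parsed unit (no string work here)."""
    if unit is Unit.YEAR:
        return d.replace(month=1, day=1)
    if unit is Unit.QUARTER:
        first_month = 3 * ((d.month - 1) // 3) + 1
        return d.replace(month=first_month, day=1)
    return d.replace(day=1)  # Unit.MONTH


def trunc_scalar_format(dates, fmt: str):
    """Parse the scalar format once, then apply the parsed unit to every row."""
    unit = parse_format(fmt)  # done once, outside the per-row loop
    if unit is None:
        return [None] * len(dates)
    return [truncate(d, unit) for d in dates]
```

For example, `trunc_scalar_format([date(2024, 5, 17)], "quarter")` parses `"quarter"` a single time and then only does date arithmetic per row, which is the part that would run in the kernel.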