Extract non-zero indexes / reduce array size? #141

polarathene · 2017-06-08T04:20:59Z


I am operating on Arrays of 100 million values and want to check whether any of them match one or more values in another Array. set_intersect() seems suited to this but is quite slow: 430ms, or 220ms when unique is set to true.

Another approach was to use eq() with locate(), which converts the 1 values in the eq() result into an array of indices with no zero elements. This took 480-647ms; eq() was very fast, but locate() accounted for most of the time. I also tried moving the result of eq() back to the host. That transfer takes too long, although the host-side equivalent of locate() is much faster:

    // host() needs a pre-allocated buffer with the same element count as the Array
    let r_eq_len = r_eq.elements() as usize;
    let mut r_eq_host: Vec<u8> = vec![0; r_eq_len];
    // Copy the Array from the device to the host Vec (this transfer took 290ms)
    r_eq.host::<u8>(&mut r_eq_host);
    // Collect the indices of the matching elements (this conversion took only 54ms)
    let r_eq_host_indices: Vec<usize> = r_eq_host
        .into_iter()
        .enumerate()
        .filter(|&(_, b)| b == 1)
        .map(|(i, _)| i)
        .collect();

The eq() operation on the GPU with ArrayFire is very fast. I can get the indices by multiplying the eq() result against an Array of the same size whose values are their own indices, which is again very fast. Removing the 0-value elements to reduce the transfer size (I only need a couple of values at most out of the 100 million elements that were computed) seems difficult without locate()?

How can I create a copy of an array without its 0-value elements? I saw trunc(), but it is unclear how it works from both the Rust and official docs. This step is the most expensive part and is what makes computing on the CPU faster overall. The actual computation on the GPU is very fast.

Answered by 9prady9

Jun 12, 2018

This is being addressed on an upstream issue arrayfire/arrayfire#1818


pavanky · 2017-06-08T04:58:08Z

Collaborator

@polarathene eq + locate + index is the only way to do this correctly for now. If the values are not sorted, set_intersect will not give correct results. trunc is for truncating floating point numbers and is not relevant.
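To make the eq + locate + index chain concrete, here is a minimal host-side sketch of its semantics in plain Rust. It is not ArrayFire code: `eq_mask`, `locate_nonzero` and `gather` are hypothetical names standing in for what `eq()`, `locate()` and indexing do on the device, and `u32` values are assumed purely for illustration.

```rust
/// Hypothetical stand-in for eq(): 1 where the element equals `target`, else 0.
fn eq_mask(values: &[u32], target: u32) -> Vec<u8> {
    values.iter().map(|&v| u8::from(v == target)).collect()
}

/// Hypothetical stand-in for locate(): indices of the non-zero mask elements.
fn locate_nonzero(mask: &[u8]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter(|&(_, &m)| m != 0)
        .map(|(i, _)| i)
        .collect()
}

/// Hypothetical stand-in for indexing: pick the elements at `indices`.
fn gather(values: &[u32], indices: &[usize]) -> Vec<u32> {
    indices.iter().map(|&i| values[i]).collect()
}
```

Calling `locate_nonzero(&eq_mask(&values, target))` gives the positions of the matches, and `gather(&values, &positions)` gives the matching values themselves.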

What do you intend to do with the output?


polarathene · 2017-06-08T06:30:26Z

Author

TL;DR: I need a mapping back to the original value (the one the hash is computed from) for a match to be of any use. There is no point knowing I've found the correct hash if it isn't mapped to the input value that produced it.

NVIDIA wrote a blog article in 2014 about filtering arrays by a predicate, and Thrust has a copy_if method. Could ArrayFire implement something similar, so that I could create an array from an existing one with the predicate element != 0?

While I can do this today, the current performance of locate() or set_intersect() causes the CPU version to outperform the GPU. I need to support processing large amounts of data in batches over long durations, where this overhead adds up.

1. Array of Strings (as bytes)
2. Hashing algorithm
3. Compare array of computed hashes against single or multiple hashes
4. Return matched hashes with original array indexes so that the original value is known (slow part)
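The steps above can be sketched on the CPU in plain Rust. This is only an illustration under assumptions: the thread does not name the hashing algorithm, so `toy_hash` (32-bit FNV-1a) is a stand-in, and `find_matches` is a hypothetical helper name.

```rust
/// Toy stand-in for the hashing step (32-bit FNV-1a). The actual algorithm
/// is not named in the thread, so this choice is an assumption.
fn toy_hash(input: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in input {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Hash every candidate, compare against the target hashes, and return
/// (original index, hash) for each match so the input can be recovered.
fn find_matches(candidates: &[&[u8]], targets: &[u32]) -> Vec<(usize, u32)> {
    candidates
        .iter()
        .map(|c| toy_hash(c))
        .enumerate()
        .filter(|(_, h)| targets.contains(h))
        .collect()
}
```

Returning the index alongside the hash is what makes a match useful: `candidates[index]` is the input that produced it.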

This is meant to serve a similar purpose to the program Hashcat (C with OpenCL). I am using a very simple hashing algorithm for learning purposes; later I would like to port some hashing algorithms over to make use of the benefits of ArrayFire and Rust, and compare performance.

I have an efficient permutation generator that can, for example, generate all permutations with repetition for a string of length 5 over lower-case letters: 26^5 = 11,881,376 permutations. For larger keyspaces such as 26^10 (141,167,095,653,376), batches of 100 million processed per second would take around 16 days. I believe the GPU can do more than this? The CPU version can process 100 million in 200ms. With ArrayFire the hashing algorithm is very fast (0.056662 ms for 100 million), but filtering the results and transferring them to and from the host can take 3-5x as long as the CPU version. About 2GB of vRAM is used with 100% utilization on the GPU, and 39% of PCIe 3.0 bandwidth on x16.
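The keyspace arithmetic can be checked with a few lines of Rust. This is a small sketch; `keyspace` and `days_to_exhaust` are hypothetical helper names, and the rate of 100 million candidates per second is the figure assumed in the post.

```rust
/// Number of strings of exactly `length` symbols over an alphabet of
/// `alphabet` symbols, or None if the count overflows u64.
fn keyspace(alphabet: u64, length: u32) -> Option<u64> {
    alphabet.checked_pow(length)
}

/// Days needed to exhaust `total` candidates at `per_second` candidates/s.
fn days_to_exhaust(total: u64, per_second: u64) -> f64 {
    total as f64 / per_second as f64 / 86_400.0
}
```

`keyspace(26, 5)` is 11,881,376 and `keyspace(26, 10)` is 141,167,095,653,376; at 100 million per second the latter takes roughly 16.3 days.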

If this could perform better, would it be a good showcase project for ArrayFire?


polarathene · 2017-06-08T06:34:18Z

Author

> If the values are not sorted, set_intersect will not give correct results.

As far as I can tell, I was given the correct results with this method. Given a small array of possible keys to intersect, it correctly returned the only valid one I expected. If it had a way to provide the indexes into the larger array, it would perform better than locate.


polarathene · 2017-06-08T07:45:57Z

Author

I tried a different approach using sort_index(). I used count_all() to get the number of non-zero elements and created a new range array whose values are their own indices, with its dim size matching the number returned from count_all() (e.g. 9). sort_index() in descending order then gave me two arrays (values and indices), which I could index with index_gen() and the range array to extract the first 9 elements.
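A host-side model of this approach, in plain Rust, looks roughly as follows. It is a sketch of the idea rather than ArrayFire code: `count_nonzero` and `sort_index_desc` are hypothetical stand-ins for count_all() and sort_index(), and `u32` values are assumed.

```rust
/// Count the non-zero elements (the role count_all() plays above).
fn count_nonzero(values: &[u32]) -> usize {
    values.iter().filter(|&&v| v != 0).count()
}

/// Sort in descending order and return (sorted values, original indices),
/// a host-side stand-in for sort_index().
fn sort_index_desc(values: &[u32]) -> (Vec<u32>, Vec<usize>) {
    let mut order: Vec<usize> = (0..values.len()).collect();
    // sort_by is stable, so equal values keep their original relative order.
    order.sort_by(|&a, &b| values[b].cmp(&values[a]));
    let sorted = order.iter().map(|&i| values[i]).collect();
    (sorted, order)
}

/// Keep only the leading non-zero entries as (original index, value) pairs.
fn nonzero_via_sort(values: &[u32]) -> Vec<(usize, u32)> {
    let n = count_nonzero(values);
    let (sorted, order) = sort_index_desc(values);
    order.into_iter().zip(sorted).take(n).collect()
}
```

The whole array is sorted just to pull a handful of elements to the front, so the cost is that of a full sort of all elements regardless of how few are non-zero.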

Unfortunately, sort_index() was the main performance problem, taking 1020ms.


polarathene · 2017-06-08T10:00:16Z

Author

This is apparently known as stream compaction, as requested in issue #124. It can take a predicate and perform better than Thrust's copy_if(), as described here (links to an article and a GitHub repo with a CUDA implementation).

Other search results mention using scan() to achieve this (issue #124 has a C++ example for ArrayFire using it), but scan() only allows a handful of binary ops rather than a predicate or boolean input. The scan over 100 million elements took 72ms for me. If there were something like scan_predicate() that took a bool array from something like eq() and returned an array containing only the true values, might that be the fastest? A tuple that also contains the indices, like the one sort_index() returns, would be great too.
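For anyone unfamiliar with the technique, scan-based stream compaction can be sketched sequentially in plain Rust. This is a minimal illustration, not the ArrayFire or CUDA implementation: `exclusive_scan` and `compact_with_indices` are hypothetical names, and the mask is assumed to be the 0/1 output of something like eq().

```rust
/// Exclusive prefix sum of a 0/1 mask: out[i] = number of set flags before i.
fn exclusive_scan(mask: &[u8]) -> Vec<usize> {
    let mut running = 0usize;
    mask.iter()
        .map(|&m| {
            let before = running;
            running += usize::from(m != 0);
            before
        })
        .collect()
}

/// Compact `values` to the elements whose mask flag is set, returning
/// (original index, value) pairs in their original order.
fn compact_with_indices(values: &[u32], mask: &[u8]) -> Vec<(usize, u32)> {
    assert_eq!(values.len(), mask.len());
    let offsets = exclusive_scan(mask);
    // Output length = last offset plus the last flag.
    let kept = match (offsets.last(), mask.last()) {
        (Some(&o), Some(&m)) => o + usize::from(m != 0),
        _ => 0,
    };
    let mut out = vec![(0usize, 0u32); kept];
    // Scatter: each kept element writes to its scanned offset. On a GPU
    // every iteration of this loop could be an independent thread.
    for i in 0..values.len() {
        if mask[i] != 0 {
            out[offsets[i]] = (i, values[i]);
        }
    }
    out
}
```

The scan gives every surviving element a unique output slot, so the final scatter needs no coordination between elements, and the result that has to be transferred is only as large as the number of matches.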


9prady9 · 2018-06-12T08:12:19Z

Maintainer

This is being addressed on an upstream issue arrayfire/arrayfire#1818

