Extract non-zero indexes / reduce array size? #141
-
Operating on Arrays of 100 million values, I want to check whether any of them match one or more values in another Array. Another approach was to use
// Transfer requires Vector of same array size upfront
let r_eq_len = r_eq.elements() as usize;
let mut r_eq_host: Vec<u8> = vec![0; r_eq_len];
// Move Array to host Vec
r_eq.host::<u8>(&mut r_eq_host);
// conversion only takes 54ms, cost to move to host for this though took 290ms
let r_eq_host_indices: Vec<usize> = r_eq_host.into_iter().enumerate().filter(|&(_, b)| b == 1).map(|(i, _)| i).collect();
How can I create a copy of an array without its 0-valued elements? I saw
Replies: 6 comments
-
@polarathene What do you intend to do with the output?
-
TL;DR: I need a mapping back to the original value (the one the hash is computed against) for a match to be of any use; it's no good knowing I've found the correct hash if it isn't mapped to the input value that produced it. NVIDIA wrote a blog article in 2014 about filtering arrays via a predicate, Thrust has a method While I can do this, the current performance with
This is meant to achieve a similar purpose to the program Hashcat (C with OpenCL). I am using a very simple hashing algorithm for learning purposes; later I would like to port some hashing algorithms over to make use of the benefits of ArrayFire and Rust, and compare performance. I have an efficient permutation generator that can, for example, generate all permutations with repetition for a string of length 5 over lower-case letters: 26^5 = 11,881,376 permutations. At larger keyspaces such as 26^10 (141,167,095,653,376), batches of 100 mil processed per second would take around 16 days. I believe the GPU can do more than this? The CPU version can process 100 mil in 200ms. With ArrayFire the hashing algorithm is very fast (0.056662 ms for 100 mil), but filtering the results and transferring them to and from the host can take 3-5x as long as the CPU version. About 2GB of vRAM is used, with 100% utilization on the GPU and 39% of PCIe 3.0 bandwidth on x16. If this could perform better, it would be a good showcase project for ArrayFire?
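The keyspace figures above, and the mapping from a surviving index back to the input that produced it, can be sketched in plain Rust. This is a minimal illustration, not code from the project: the function names are hypothetical, and `candidate_at` assumes the generator enumerates candidates in lexicographic order, so that an index is simply the candidate read as a base-26 number. A generator with a different ordering would need a different decoding.

```rust
/// Number of candidates of exactly `length` symbols drawn from an
/// alphabet of `alphabet_len` symbols, or `None` if it overflows u64.
fn keyspace(alphabet_len: u64, length: u32) -> Option<u64> {
    alphabet_len.checked_pow(length)
}

/// Days needed to cover `keyspace` candidates at `per_second` candidates/s.
fn days_to_exhaust(keyspace: u64, per_second: u64) -> f64 {
    keyspace as f64 / per_second as f64 / 86_400.0
}

/// Decode a linear index into the candidate it denotes.
/// ASSUMPTION (hypothetical): candidates are enumerated in lexicographic
/// order over a non-empty ASCII `alphabet`, so the index is the candidate
/// read as a base-`alphabet.len()` number, most significant symbol first.
fn candidate_at(mut index: u64, alphabet: &[u8], length: usize) -> String {
    let base = alphabet.len() as u64;
    let mut out = vec![alphabet[0]; length];
    // Fill from the last (least significant) position backwards.
    for slot in out.iter_mut().rev() {
        *slot = alphabet[(index % base) as usize];
        index /= base;
    }
    String::from_utf8(out).expect("alphabet is assumed to be ASCII")
}
```

With a decoding like this, only the indices of matching hashes need to come back from the device; the candidates themselves can be rebuilt on the host.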
-
As far as I can tell, I was being given the correct results with this method. Given a small array of possible keys to intersect, it correctly returned the only valid one I expected. If it had a way to provide the indexes into the larger array, it would be better than locate performance.
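For reference, the behaviour being asked for here (an intersection that also reports positions in the larger array) can be sketched on the host in plain Rust. This is not ArrayFire code and says nothing about GPU performance; the function name and the `u32` hash type are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Return the positions in `haystack` whose value appears in `targets`.
/// Hypothetical helper: `u32` stands in for whatever the hash type is.
fn matching_indices(haystack: &[u32], targets: &[u32]) -> Vec<usize> {
    // Build the small lookup set once; each membership test is then O(1) on average.
    let wanted: HashSet<u32> = targets.iter().copied().collect();
    haystack
        .iter()
        .enumerate()
        .filter(|(_, value)| wanted.contains(*value))
        .map(|(index, _)| index)
        .collect()
}
```

The returned indices are in ascending order, because the larger array is scanned front to back.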
-
I tried a different approach using Unfortunately the
-
This is apparently known as Stream Compaction, as requested in issue #124. It can take a predicate and perform better than Thrust's Other search results mention using
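As a rough illustration of what stream compaction does, here is a sequential Rust sketch of the usual three-step structure: flag each element, take an exclusive prefix sum of the flags, then scatter. On a GPU each step would run in parallel; this host version only shows the data flow. The function name is hypothetical, and `u32` indices assume the input has fewer than 2^32 elements.

```rust
/// Stream compaction, sequential sketch: return the indices of the
/// elements of `input` that satisfy `pred`.
/// ASSUMPTION: `input.len()` fits in a u32.
fn compact_indices<T>(input: &[T], pred: impl Fn(&T) -> bool) -> Vec<u32> {
    // 1. Flag pass: 1 where the predicate holds, 0 elsewhere.
    let flags: Vec<u32> = input.iter().map(|x| pred(x) as u32).collect();

    // 2. Exclusive prefix sum: offsets[i] = number of kept elements before i.
    let mut offsets = Vec::with_capacity(flags.len());
    let mut running = 0u32;
    for &f in &flags {
        offsets.push(running);
        running += f;
    }

    // 3. Scatter: each kept element writes its index to its output slot.
    let mut out = vec![0u32; running as usize];
    for (i, &f) in flags.iter().enumerate() {
        if f == 1 {
            out[offsets[i] as usize] = i as u32;
        }
    }
    out
}
```

Applied to a 0/1 match mask with the predicate `|&b| b == 1`, this yields the same indices as the host-side `filter`/`map` chain in the original post, but the output is only as large as the number of matches.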
-
This is being addressed in an upstream issue, arrayfire/arrayfire#1818.