This is really odd.
The computation time both for creating the graphs and for searching
is nearly 2× longer for Float64 compared to Float32 data points.
How can this be? (The C++ library doesn't show this behaviour)
I thought my architecture was optimized for 64-bit.
using HNSW
dim = 128
num_elements = 10000
data = [rand(Float32,dim) for n=1:num_elements];
hnsw = HierarchicalNSW(data; efConstruction = 200, M = 16)
@time add_to_graph!(hnsw)
# --> 4.820807 seconds (415.10 k allocations: 158.142 MiB, 1.29% gc time)
data = [rand(Float64,dim) for n=1:num_elements]
hnsw = HierarchicalNSW(data; efConstruction = 200, M = 16)
@code_warntype add_to_graph!(hnsw) #to trigger compilation
@time add_to_graph!(hnsw)
# --> 7.995207 seconds (379.92 k allocations: 294.682 MiB, 0.71% gc time)
I'm afraid this is more normal than it looks, and actually a good result ^_^. The slower speed is mostly due to the fact that twice as many Float32s as Float64s fit inside the CPU's registers (and cache). Most probably, the compiler vectorizes the inner loops: a SIMD register holds twice as many 32-bit values as 64-bit ones, so each vector instruction processes twice as many Float32 elements (things are probably more complicated in practice, but modern superscalar architectures do this sort of thing quite well). With Julia's proficiency at generating highly optimized code, this is probably the case.
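To make the register and cache argument concrete, here is a rough sketch of the arithmetic. It assumes 256-bit vector registers (for example AVX); the thread does not state the actual SIMD width, and `REGISTER_BITS`, `lanes` and `payload_bytes` are names made up for this illustration. It only counts bytes and lanes and says nothing about what HNSW.jl actually compiles to.

```julia
# Assumption: 256-bit SIMD registers (e.g. AVX). Adjust for your CPU.
const REGISTER_BITS = 256

# Number of values of type T that fit into one vector register.
lanes(T) = REGISTER_BITS ÷ (8 * sizeof(T))

# Raw size of the data set from the issue: 10000 vectors of dimension 128.
const DIM = 128
const NUM_ELEMENTS = 10000
payload_bytes(T) = DIM * NUM_ELEMENTS * sizeof(T)

for T in (Float32, Float64)
    println(T, ": ", lanes(T), " lanes per register, ",
            payload_bytes(T), " bytes of vector data")
end
```

Under these assumptions, Float64 gets half the lanes per vector instruction and twice the memory traffic, which is consistent with a slowdown approaching 2×.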
This effect can be observed for matrix multiplication...
using BenchmarkTools
benchmark(N) = begin
    A = rand(Float32, N, N);
    B = rand(Float64, N, N);
    At = A'
    Bt = B'
    println("Matrix multiplication $N x $N:")
    print("\tFloat32 ");
    @btime $A * $At;
    print("\tFloat64 ");
    @btime $B * $Bt;
end

# run
benchmark.([10, 100, 1000]);
Results on my box (Core i7 3840M, 2.8 GHz, 4 cores, 8 MB cache):