Benchmarking of Qsim with Cirq #127

Open
karlunho opened this issue Jun 9, 2020 · 4 comments

Comments

@karlunho

karlunho commented Jun 9, 2020

Need to do a benchmark of qsim with Cirq, and publish documentation.

@alexandrupaler
Collaborator

alexandrupaler commented Jun 9, 2020

Would the following help?
https://colab.research.google.com/drive/1354IM8hwPU0la4Jv7-b2vwVqrEgasOsq?usp=sharing

This simple benchmark shows that qsimcirq is 5x faster. The benchmark is based on

def test_cirq_qsim_simulate_fullstate(self):

@95-martin-orion
Collaborator

We have a simple comparison of qsim vs. the Cirq simulator in the tutorial doc, but I imagine this issue is looking towards a more complete benchmark.

@balopat
Contributor

balopat commented Dec 3, 2020

Cross-reference: quantumlib/Cirq#1124. We could possibly implement this under the same suite of benchmarks.

@TimoEckstein

Hello,

I compared qsim with the standard Cirq simulator on two AMD EPYC 7502 CPUs (64 cores) with 512 GB of RAM, and compared both with the best GPU simulator for Cirq I was able to find (QuLacs-GPU), on a V100 GPU among others (see plot 1).
For this comparison, I measured the runtime of the quantum Fourier transform (QFT) as a function of the number of simulated qubits. One reason for choosing QFT as a benchmark is that comparable data exists (e.g. https://arxiv.org/abs/2009.01845, which compares a custom simulator with the standard Cirq Python simulator). Another reason is that QFT, with its very sparse 2-qubit controlled-Z power gates and otherwise only 1-qubit gates, is close to a worst-case scenario for qsim and its gate fusion. It should therefore give a reasonable lower bound on the performance improvement of qsim over the Cirq Python simulator.
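
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the script used for the plots. The helper names (`best_of`, `benchmark`, `qft_runner`) are hypothetical, the QFT is the textbook construction without the final qubit reversal (an assumption), and the thread count and fusion size in the usage comment are illustrative values only.

```python
import time


def best_of(run, repeats=3):
    """Return the shortest wall-clock time (seconds) over `repeats` calls to run()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark(make_runner, qubit_counts, repeats=3):
    """Map each qubit count n to the best runtime of make_runner(n)().

    make_runner(n) does the setup (circuit and simulator construction) and
    returns a zero-argument callable, so only the simulation itself is timed.
    """
    return {n: best_of(make_runner(n), repeats) for n in qubit_counts}


def qft_runner(simulator_factory):
    """Build a make_runner that simulates a QFT circuit on a Cirq-style simulator.

    Cirq is imported lazily so the timing helpers above work without it.
    """
    def make_runner(num_qubits):
        import cirq

        qubits = cirq.LineQubit.range(num_qubits)
        circuit = cirq.Circuit()
        # Textbook QFT without the final qubit reversal (assumption):
        # a Hadamard on each qubit, followed by controlled-Z power gates
        # with exponents 1/2, 1/4, ... from the qubits after it.
        for i, target in enumerate(qubits):
            circuit.append(cirq.H(target))
            for j, control in enumerate(qubits[i + 1:], start=1):
                circuit.append(cirq.CZ(control, target) ** (1 / 2 ** j))
        simulator = simulator_factory()
        return lambda: simulator.simulate(circuit)

    return make_runner


# Example usage (needs `pip install cirq qsimcirq`):
#   import cirq, qsimcirq
#   sizes = range(10, 21, 2)
#   cirq_times = benchmark(qft_runner(cirq.Simulator), sizes)
#   qsim_times = benchmark(
#       qft_runner(lambda: qsimcirq.QSimSimulator({"t": 8, "f": 2})), sizes)
```

Taking the minimum over a few repeats reduces noise from other processes. The qsim options `t` (threads) and `f` (maximum fused gate size) should be set to match the hardware and the fusion setting being compared.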

Even so, plot 1 shows that qsim with fused gates and 64 threads starts outperforming every other Cirq simulation method considered for >21 qubits. At 29 qubits, when the GPUs run out of RAM, qsim on the two AMD EPYC 7502 CPUs is about a factor of 2 faster than QuLacs-GPU on a V100 GPU; with qsim on a single AMD EPYC 7502 CPU, the two are comparable.
Plot 2 shows how qsim uses the available resources, as well as the effect of fused gates, which reduce the total data volume by a factor of approx. 2.5. Similarly, fused gates reduce the runtime by a factor of 2. The maximum memory bandwidth of one AMD EPYC 7502 CPU is about 200 GB/s, of which about half is used. Likewise, the peak floating-point performance is about 1,100 GFLOP/s for one CPU, which indicates that qsim with fused gates uses nearly 70% of it.
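
As a rough sanity check, the utilization figures above can be turned into absolute numbers. This is only a back-of-the-envelope sketch: the function names are hypothetical, the fractions are the approximate ones quoted above, and the 8 bytes per amplitude (single-precision complex) is an assumption. A simulator that stores double-precision amplitudes would need 16.

```python
# Peak figures quoted above for one AMD EPYC 7502 CPU.
PEAK_BANDWIDTH_GB_S = 200.0  # "about 200 GB/s"
PEAK_GFLOPS = 1100.0         # "about 1,100 GFLOP/s"


def achieved(peak, fraction):
    """Absolute rate reached when `fraction` of `peak` is used."""
    return peak * fraction


def state_vector_gib(num_qubits, bytes_per_amplitude=8):
    """Memory for a full state vector of 2**num_qubits amplitudes, in GiB.

    bytes_per_amplitude=8 assumes single-precision complex amplitudes.
    """
    return (2 ** num_qubits) * bytes_per_amplitude / 2 ** 30


if __name__ == "__main__":
    bandwidth = achieved(PEAK_BANDWIDTH_GB_S, 0.5)  # "about half"
    gflops = achieved(PEAK_GFLOPS, 0.7)             # "nearly 70%"
    print(f"bandwidth used: ~{bandwidth:.0f} GB/s")
    print(f"compute used:   ~{gflops:.0f} GFLOP/s")
    print(f"29-qubit state: {state_vector_gib(29):.1f} GiB at 8 B/amplitude")
```

Since each added qubit doubles the state vector, the memory limit of a device translates directly into a maximum qubit count for full state-vector simulation.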

plot1.pdf
plot2.pdf
