Adds working bitstring Sample function to MPS. #490
Merged
MichaelBroughton merged 6 commits into quantumlib:master from MichaelBroughton:mps_sample on Dec 22, 2021
Changes from 2 commits
Commits
f3d63cb Adds working bitstring Sample function to MPS. (MichaelBroughton)
dd0f7f1 remove iostream include. (MichaelBroughton)
fa36fd7 remove use of gmock. (MichaelBroughton)
70f7a2b Moved to O(n) sampling algorithm. (MichaelBroughton)
e4ef583 remove unused includes. (MichaelBroughton)
fc1fcc8 Merge branch 'master' into mps_sample (MichaelBroughton)
rmlarose commented:
Thanks @MichaelBroughton for adding this! If I understand the implementation correctly, you can reduce the time from n^2 to n, where n is the number of sites, by storing intermediate contractions. That is, first compute \rho_{1,...,n-1} (the rdm obtained by tracing out the last qubit) and store it in memory, then from this compute \rho_{1,...,n-2} and store it, then from this ... compute \rho_{1}. Then sample bits from 1 to n, reusing these already-computed rdms.
I hope that makes sense... I can't find a reference implementation right now. Basically the idea is not to contract edges more than once - e.g., currently to get \rho_{1} and \rho_{2}, the last n - 2 edges of the MPS are individually contracted twice, but you can avoid this.
MichaelBroughton commented:
Hi @rmlarose, I'm not sure I follow. Can you write out the approach here quickly using something like Python and einsum?
rmlarose commented:
Actually, looking back, I think you may already be doing what I suggested, since scratch and scratch2 get modified in each call to ReduceDensityMatrix. For an n=3 MPS, the sequence of rdms should be:
[diagram]
That is, for each i you want to obtain rho_1...i from rho_1...(i + 1), not from rho_1...n. If you are already doing this (it's not 100% clear to me how scratch and scratch2 get modified in ReduceDensityMatrix), you can ignore my comment.
MichaelBroughton commented:
I'm not entirely sure I see how this contraction process is O(n) and not O(n^2). The contraction process in this PR is this:
[diagram]
Step 1 in this operation always considers tensors [0,...n]. You compute n RDMs at the cost of n contractions each, so it's O(n^2) to draw a single bitstring. Looking at your proposal, IIUC you do this:
Suppose I have this MPS:
[diagram]
and partway through my sampling process I need to compute the RDM on qubit 3 (having already sampled and projected down qubits 0, 1, and 2):
[diagram]
With my approach I would just naively recontract everything together. IIUC, your approach would keep another temporary variable containing the contractions of the projections from tensor 0 up to i - 1 when I want to compute the RDM for qubit i. This way I can use the fact that I have previously computed:
[diagram]
Where I could store this:
[diagram]
And then (from the diagram above) I can contract 2 into 0,1 -> 0,1,2 and have the entire left side of my 3-RDM calculation done by simply remembering some of the computations I did in the previous step. This means on iteration number i:
[diagram]
There will be n steps, where step i requires n - i + 1 contractions -> \sum_{i=1}^{n} (n - i) = n(n - 1)/2 -> O(n^2) with a prefactor of 1/2 instead of a prefactor of 1. How do I get this down to O(n)? You can't "cache" tensor contractions on the right side of your RDM index as you go using this same approach, can you?
If we can get it down to O(n) I'd say it's worth updating in this PR, but if it's just the prefactor of 2 I'd say that's fine to come in a subsequent PR.
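For concreteness, here is roughly what I mean by the recontract-everything approach, in Python/einsum rather than the actual C++ in this PR (just a sketch; the (Dl, 2, Dr) site-tensor shapes with trivial boundary bonds, and all names, are illustrative assumptions):

```python
import numpy as np

def sample_naive(mps, rng):
    """Recontract-everything sampling sketch: O(n^2) work per bitstring.
    mps: list of n tensors of shape (Dl, 2, Dr) with size-1 boundary bonds.
    rng: a numpy Generator, e.g. np.random.default_rng()."""
    n = len(mps)
    bits = []
    projected = list(mps)
    for i in range(n):
        # Rebuild the left environment over sites [0, i) from scratch.
        left = np.ones((1, 1))
        for k in range(i):
            left = np.einsum('ab,apc,bpd->cd', left, projected[k],
                             projected[k].conj())
        # Rebuild the right environment over sites (i, n-1] from scratch.
        right = np.ones((1, 1))
        for k in range(n - 1, i, -1):
            right = np.einsum('apc,bpd,cd->ab', mps[k], mps[k].conj(), right)
        # 2x2 rdm on site i, conditioned on the bits sampled so far.
        rdm = np.einsum('ab,apc,bqd,cd->pq', left, mps[i], mps[i].conj(),
                        right)
        p0 = rdm[0, 0].real / np.trace(rdm).real
        bit = 0 if rng.random() < p0 else 1
        bits.append(bit)
        # Project site i onto the sampled bit for later steps.
        proj = np.zeros(2)
        proj[bit] = 1.0
        projected[i] = np.einsum('apc,p,q->aqc', mps[i], proj, proj)
    return bits
```

Each of the n steps rebuilds O(n) environment tensors from scratch, hence the O(n^2) per bitstring.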
rmlarose commented:
Thanks @MichaelBroughton for the very detailed and clear response. It seems my original comment does apply.
Indeed this is precisely the idea. There is one important change to your sequence of diagrams to get to the O(n) algorithm. Below I sketch out the full O(n) sampling algorithm, which proceeds in two "phases".
Phase 1: Computing RDMs
Starting from the first step with an n=4 MPS, you have this tensor network:
[diagram]
First, you store this. (Note this is not a single-site RDM, which I think is the key difference from how you are thinking about this.) Then you connect the rightmost free legs and contract every edge on the last tensor to get the RDM on the first three sites:
[diagram]
Store this. Then connect the rightmost free legs and contract every edge on the last tensor to get the RDM on the first two sites:
[diagram]
Store this. Then connect ... to get the RDM on the first site:
[diagram]
Store this. Note: there are n steps here, and each step requires contracting O(1) edges, yielding O(n) total contracted edges. The output of this phase is the list of n "sub tensor networks" shown above.
Phase 2: Sampling
To sample, you iterate backwards through this list. Starting with the last "sub tensor network", namely the RDM on the first site,
[diagram]
sample a bit a. Next, move to the previous "sub tensor network", namely the RDM on the first two sites, and attach a (as a vector, |0> or |1>, corresponding to the bit) to the first site:
[diagram]
Contract this tensor network to get a 2x2 rdm, then sample a bit b. Next, move to the previous "sub tensor network", namely the RDM on the first three sites, and attach a and b (as vectors) to the first two sites:
[diagram]
Contract this tensor network to get a 2x2 rdm, then sample a bit c. Note: in the previous step, the tensor network
[diagram]
was already contracted. To make "Phase 2" also have O(n) contractions, you need to store and reuse this result.
Finally, repeat the same process to get the last bit. Namely, form
[diagram]
contract to get a 2x2 rdm, then sample a bit d. Note again that the tensor network
[diagram]
appearing in this diagram has already been contracted in the previous step.
Output abcd. Note: this phase has n steps, each of which requires O(1) contractions, yielding O(n) contractions. The total for both phase 1 and phase 2 is then O(n).
Pseudocode
If helpful, I sketched this procedure out in the following rough-around-the-edges pseudocode.
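Roughly, in Python with numpy.einsum (all shapes and names here are just illustrative; I'm assuming (Dl, 2, Dr) site tensors with trivial boundary bonds):

```python
import numpy as np

def sample_bitstring(mps, rng):
    """Two-phase O(n)-contraction sampling sketch.
    mps: list of n tensors of shape (Dl, 2, Dr) with size-1 boundary bonds.
    rng: a numpy Generator, e.g. np.random.default_rng()."""
    n = len(mps)

    # Phase 1: build the list of "sub tensor networks". Entry j holds the
    # first (n - j) site tensors plus a cap matrix that closes off the
    # traced-out right end of the chain.
    cap = np.ones((1, 1))
    rdms = [(list(mps), cap)]  # the full network on all n sites
    for k in range(n - 1, 0, -1):
        # Connect the rightmost free legs and contract every edge on the
        # last tensor: O(1) contractions per step, n steps total.
        cap = np.einsum('apc,bpd,cd->ab', mps[k], mps[k].conj(), cap)
        rdms.append((list(mps[:k]), cap))

    # Phase 2: iterate backwards through the list, attaching the bits
    # sampled so far as |0>/|1> vectors and reusing the contracted left part.
    bits = []
    left = np.ones((1, 1))  # contraction of all previously projected sites
    for tensors, right_cap in reversed(rdms):
        site = tensors[-1]  # the one site still open in this sub network
        rdm = np.einsum('ab,apc,bqd,cd->pq', left, site, site.conj(),
                        right_cap)
        p0 = rdm[0, 0].real / np.trace(rdm).real
        bit = 0 if rng.random() < p0 else 1
        bits.append(bit)
        proj = np.zeros(2)
        proj[bit] = 1.0
        vec = np.einsum('apc,p->ac', site, proj)
        left = np.einsum('ab,ac,bd->cd', left, vec, vec.conj())
    return bits
```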
MichaelBroughton commented:
I see. This makes sense. One final question: I'm still not sold on the Phase 1 portion. Looking at the pseudocode here:
rdms will initially contain an n-site tensor network. After the first loop iteration it will contain an n-site network and an (n-1)-site network, etc., all the way down to a 1-site network. In terms of space requirements it feels like \sum_{i=1}^{n} i = n(n + 1)/2 -> O(n^2) has shown up again, only now in space and not time. Am I still missing something?
rmlarose commented:
You're absolutely right that phase 1, as I've described it, requires O(n^2) memory. However, after talking with a "famous" MPS guy - who has formally kicked me out of the tensor network community for describing it this way 😄 - there's a better way to do phase 1 which requires only O(n) memory:
You still store the copy of the first tensor network (using the same n=4 example):
[diagram]
But now you just compute and store the following tensors by successively connecting top/bottom free indices and contracting:
[diagram]
Explicitly, as an example, --(tw) is
[diagram]
(i and j indicate these legs are connected, respectively).
The time is still O(n), but now the space is also O(n).
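A sketch of this phase 1 variant in Python/einsum (same illustrative shape conventions as before; the function and variable names are just placeholders):

```python
import numpy as np

def transfer_caps(mps):
    """caps[k] is sites k..n-1 of the bra/ket chain with physical legs
    connected and everything contracted down to a single (Dl, Dl) matrix.
    mps: list of n tensors of shape (Dl, 2, Dr) with size-1 boundary bonds."""
    n = len(mps)
    caps = [None] * (n + 1)
    caps[n] = np.ones((1, 1))  # trivial cap past the last site
    for k in range(n - 1, 0, -1):
        # Connect the top/bottom free legs of site k and absorb the site
        # into the previous cap: O(1) contractions per step, n steps, and
        # only n small matrices are ever stored.
        caps[k] = np.einsum('apc,bpd,cd->ab', mps[k], mps[k].conj(),
                            caps[k + 1])
    return caps
```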
The only difference to phase 2 is how the RDMs are obtained. For example, in the first step you form
[diagram]
and contract to get the 2x2 density matrix, then sample a bit a. Note that
[diagram]
is selected from the copy of the tensor network in the first step of phase 1.
Next you form
[diagram]
and contract, sample, etc.
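And the corresponding phase 2, reusing the cached left part as before (again a sketch under the same illustrative conventions):

```python
import numpy as np

def sample_with_caps(mps, caps, rng):
    """O(n)-time, O(n)-memory sampling sketch using caps from transfer_caps().
    rng: a numpy Generator, e.g. np.random.default_rng()."""
    n = len(mps)
    bits = []
    left = np.ones((1, 1))  # running contraction of the projected left part
    for i in range(n):
        # caps[i + 1] closes off the untouched right half; caps[n] is trivial.
        rdm = np.einsum('ab,apc,bqd,cd->pq', left, mps[i], mps[i].conj(),
                        caps[i + 1])
        p0 = rdm[0, 0].real / np.trace(rdm).real
        bit = 0 if rng.random() < p0 else 1
        bits.append(bit)
        proj = np.zeros(2)
        proj[bit] = 1.0
        vec = np.einsum('apc,p->ac', mps[i], proj)
        left = np.einsum('ab,ac,bd->cd', left, vec, vec.conj())
    return bits
```

Usage would then be something like sample_with_caps(mps, transfer_caps(mps), np.random.default_rng()).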
MichaelBroughton commented:
Alright then. I've gone ahead and implemented this. It's a lot faster on the big systems, which is great. I didn't look too carefully into profiling it for things like cache friendliness, but generally speaking it looks like it's spending a lot of time doing BLAS operations and moving data:
[profiler output]
This might be a good follow-up for anyone interested in trying to squeeze some more performance out.