The State of Quantum RAM: An Enlightening New Investigation

The realization of quantum RAM, a.k.a. QRAM, would be a game changer for the quantum industry. Very promising proposals from almost two decades ago initially inspired quantum algorithm engineers. But the lack of success stories has disillusioned many by now. Information about the state of QRAM is rare and hardly known. A highly interesting, recent preprint sheds light on the matter: a thorough investigation of the state of quantum RAM and its fundamental challenges. Yet, the text is also long and rich in technical details. In this article you will find an overview of my lessons learned from studying the survey. I have simplified the selected highlights to help you get into the subject yourself.

Symbolic picture of QRAM

Image by Gerd Altmann from Pixabay

QRAM: The missing Game Changer

Data plays a central role in universal computation. It is data that breathes life into most algorithms. Making data memory fast and randomly accessible was a major step in the evolution of digital computers: RAM. The same holds for quantum computation. Unfortunately, the situation for quantum RAM, a.k.a. QRAM, is even more complicated here:

Imagine a superfast quantum algorithm that has to perform a lookup on a “table” $ T $ of $ N = 2^n $ bits of data. We want to store this table and make it randomly accessible for a quantum computer. In the quantum world, a QRAM lookup of the (classical) data has to perform the following operation on an address register A and an output register O:

$$
\lvert i \rangle_A \lvert 0 \rangle_O \longrightarrow \lvert i \rangle_A \lvert T_i \rangle_O
$$

A simple circuit implementation to access the table is a series of controlled operations of the type

IF address register = i THEN
   add T_i to the output register

For such a lookup, $ O(N) $ gates are executed altogether. But this would destroy any superpolynomial speedup in $ N $ of the algorithm!
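To make the $ O(N) $ scaling concrete, here is a minimal NumPy sketch (my own toy construction, not taken from the survey): it builds the lookup as one controlled bit flip per table entry and applies it to a superposition of addresses.

import numpy as np

n = 3                                   # address qubits
N = 2 ** n
T = np.random.randint(0, 2, size=N)     # the classical data table, one bit per address

# registers: n address qubits plus a single output qubit, basis index = 2*i + o
def lookup_unitary(T):
    U = np.eye(2 * N)
    for i in range(N):                  # one controlled operation per table entry -> O(N) gates
        if T[i] == 1:                   # "IF address register = i THEN flip the output qubit"
            U[[2 * i, 2 * i + 1]] = U[[2 * i + 1, 2 * i]]
    return U

# uniform superposition of addresses, output qubit in |0>
state = np.zeros(2 * N)
state[0::2] = 1 / np.sqrt(N)
out = lookup_unitary(T) @ state

for i in range(N):                      # each address i is now correlated with its bit T_i
    print(i, T[i], out[2 * i], out[2 * i + 1])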

So, to maintain a superfast quantum speedup, you need to have a superefficient QRAM. Exactly such a QRAM was sketched in the groundbreaking “bucket-brigade” paper by Giovannetti, Lloyd and Maccone in 2008 i – more about this later. Afterwards, QRAM played a key role in various superfast quantum algorithms. The most famous among them is probably the HHL algorithm for solving equations in linear algebra, which achieves a runtime of $ O(\log N) $. Along the way, QRAM was one of the concepts that helped to drive the first wave of quantum machine learning algorithms. But the problem was: the physical realization of the “bucket-brigade” QRAM was not in sight. This and other constraints led Scott Aaronson to write “Quantum Machine Learning Algorithms: Read the Fine Print” in 2015 ii. In it, he also coined the term “passive” QRAM. Afterwards, not only did the first wave of quantum machine learning algorithms ebb significantly; the enthusiasm for QRAM dropped as well.

“QRAM: A Survey and Critique”: Lifting the Fog

In my impression the situation has not changed and is still … very foggy. Once in a while I hear the rather wistful question “… and what about QRAM?”, but not even the hardware teams seem to have a real answer. This is all very disappointing.

Therefore, I was very surprised to stumble upon a recent preprint from May 2023 by Samuel Jaques and Arthur Rattew (University of Oxford) with a pretty remarkable rating of 58 on SciRate iii. Its title “QRAM: A Survey and Critique” immediately got me hooked.

The study is a thorough investigation of the concepts, challenges and use cases for QRAM. It discusses several strategies for hardware and software implementations. The authors also propose regimes where QRAM could work and encourage scenarios for further research. The preprint has finally given me an understanding of the fundamental problems that the realization of QRAM faces. But I was also surprised to find out that there exist promising works that incorporate a realistic concept of QRAM.

As a survey of various papers, the preprint is a broad collection of technical details, and even the main text is about 30 pages long. So, if you are also curious about the subject but would like to speed things up, then in the following text you will find a personal overview of my lessons learned from reading it. For the purpose of this article I have rearranged and simplified the selected highlights of the work. This should help you get started with the survey yourself.

The Bucket-Brigade: The Mother of all QRAM

But first, let’s step back for a second to the mother of all quantum RAM: the “bucket-brigade”. The original work also sketches a “proof of principle” implementation. This is how it works:

A binary tree

A binary tree in general
Image by Rune Magnussen via Wikimedia

The implementation is a binary tree of trapped ions or atoms. Each atom is restricted to three states “wait”, “left” and “right”. Thus, each node is a “qutrit”. In the “wait” state, the node is inactive. This is the initial state of all nodes. The excited states “left” and “right” couple to the left and right spatial direction of the next layer. The coupling is mediated through photons in the following way:

If a quantum processor needs to access an address encoded as qubits in an address register, each qubit in the address register generates a photon, starting from most significant to least significant qubit. The state of the qubit is translated into the photon’s polarisation states $ \lvert left \rangle_P$ and $ \lvert right \rangle_P$. If this photon reaches a node, the following can happen:

    • $ \lvert W \rangle_A $: If the node’s atom is in the wait-state, the photon gets absorbed and generates an excited state $ \lvert L \rangle_A $ or $ \lvert R \rangle_A $ depending on its own polarization

    • $ \lvert L \rangle_A $: If the node’s atom is in the left-state, the photon is deflected to the node in the left spatial direction of the next layer, using stimulated Raman emission, without changing its own state or the state of the forwarding atom. At the next node, the same three cases are possible.

    • $ \lvert R \rangle_A $: If the atom is in the right-state, the photon is deflected to the node in the right spatial direction of the next layer, without changing states as above. Again, at the next node, the same three cases are possible.

This initializes a path of active nodes through the binary tree according to the qubits in the address register, from most significant to least significant: a bucket-brigade. Once all address qubits have been sent this way, the data bit in the leaf is read out by generating a polarized photon which gets routed in the same way, but from bottom to top.
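To get a feeling for the routing logic, here is a small classical trace of a single, non-superposed address travelling through the tree (my own sketch; the real device of course routes superpositions of addresses coherently):

n = 3                                    # address bits = tree depth
address = [1, 0, 1]                      # most significant bit first
data = [0, 1, 1, 0, 1, 0, 0, 1]          # one classical bit per leaf (2^n leaves)

# node states: 'W' (wait), 'L' (route left), 'R' (route right)
tree = {layer: ['W'] * (2 ** layer) for layer in range(n)}

for bit in address:                      # send one "photon" per address bit
    layer, pos = 0, 0
    while tree[layer][pos] != 'W':       # follow the already activated path ...
        pos = 2 * pos + (0 if tree[layer][pos] == 'L' else 1)
        layer += 1
    tree[layer][pos] = 'L' if bit == 0 else 'R'   # ... and get absorbed at the first waiting node

pos = 0                                  # read out: follow the activated path down to a leaf
for layer in range(n):
    pos = 2 * pos + (0 if tree[layer][pos] == 'L' else 1)

print(pos, data[pos])                    # pos == 5 == 0b101, i.e. the addressed leaf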

A nice feature of the bucket-brigade model is that an error in any node affects the outcome only logarithmically, as only the nodes in the active path are able to propagate errors all the way up. The authors state that this may be improved even further by error correcting the top layers (somehow), as errors at this stage have greater impact.

As appealing as the bucket-brigade is, it also illustrates all the problems that challenge the realization of QRAM.

Active versus Passive Gate-QRAM

Gate-QRAM

In their preprint, Jaques and Rattew call a specialized hardware device like the one described above “gate-QRAM“. It is absolutely crucial that such a device executes with as few interventions as possible. Ideally, such a system evolves passively through a single time-independent Hamiltonian. The authors call such an evolution “ballistic” and compare it to a Galton box:

A Galton box

Galton box, an example of a “ballistic” device
Image by Rodrigo Argenton via Wikimedia

They formulate the key argument of their work as follows:

“In the active model, there is a powerful opportunity cost argument: in many applications, one could repurpose the control hardware for the qubits in the QRAM (or the qubits themselves) to run an extremely parallel classical algorithm to achieve the same results just as fast.” If gate-QRAM needs $O(N)$ interventions, then it is qualitatively no better than “circuit-QRAM“, a software implementation like the simple one sketched above (more about this further down). For example, the original bucket-brigade paper suggests stimulated Raman emission for the atom-photon interactions, which Jaques and Rattew judge to make it an active device. More generally, the preprint formalizes the problem of active interventions in a theorem which indicates that there is a trade-off between the number of interventions and time and energy: if one is fixed, the other must grow with the QRAM size.

Note that this requirement applies not only to routing to the correct memory cell, but also to reading out the data. As the authors state: “For perspective, if we lose a single visible photon’s worth of energy for every bit in memory for each access, accessing $2^{50}$ bits costs about 1 kJ. As we move into cryptographic scales, QRAM becomes clearly absurd. A single photon’s worth of energy per bit means 1 giga-Joule per access if we have $ 2^{80} $ bits” (more about these applications and memory scales later).

Reversibility

A purely passive gate-QRAM that is driven by a single time-independent Hamiltonian would be reversible. Thus, any implementation faces the challenge that excited states relax to lower energy states. Nodes will need to be carefully engineered to control this without interventions. Yet, as for the bucket-brigade, relaxation is also required: after each QRAM readout the nodes need to return to the initial waiting state. This may be achieved if the emission times of the atoms in the layers at the bottom are designed to be shorter than the ones further up the active path. Thus, the last atom would emit the first photon, which would be neatly routed along the remaining active path. In this sense, atom after atom would relax and emit photons which are correctly routed, and the address register would be restored. For this to happen, emission and absorption errors need to be extremely small. The preprint quantifies this further. Because of all these challenges, the authors conclude: “If this interaction problem is solved, the QRAM routing tree looks like a full-fledged trapped ion quantum computer”.

Please note that besides all of these challenges, we are not even talking about coherence times yet!

Proposals for physical QRAM-Implementations

The authors describe several proposals in the literature for implementing bucket-brigade QRAM. Unfortunately, according to Jaques and Rattew, they all fall short in one way or another, as outlined in the survey. In particular, besides the original design with trapped ions, they explain approaches using

    • Transmons
    • Photonic transistors
    • Heralded Routers
    • Photonic CNOT

Among these, the transmon proposal seems most promising, as it provides enough parameters to tune the nodes for constructing a truly passive device iv.

Besides these, the authors describe other QRAM designs in the literature that do not follow the bucket-brigade approach. Again, they also explain in detail why these fail to provide a passive device:

    • Time-bins
    • Quantum Optical Fanout
    • Phase Gate Fanout
    • Derangement Codes

QRAM errors

From the previous arguments we should realize by now that error correcting the individual nodes in gate-QRAM will essentially mirror error corrected software implementations. So Jaques and Rattew propose to interpret gate-QRAM as a single physical gate and to correct its output errors. This poses certain problems, as this physical device would need to couple to logical qubits. As the authors outline, it is unlikely that the QRAM gate acts transversally with respect to the error correcting code in the quantum computer. This means we cannot just couple the noisy QRAM to the physical qubits and let the error correcting code do the rest for us. Instead they propose a distillation of the QRAM gate, very much like the distillation of T-gates in the surface code. This means we use $ d $ calls to the noisy QRAM gate and distill these outcomes to a state of much higher fidelity. Unfortunately, as the authors argue in a theorem, to achieve a high fidelity state, $ d $ has to grow quadratically with the size of the QRAM. But this is asymptotically no better than error corrected circuit-QRAM, i.e. software implementations.

Applications for QRAM and their Scaling

Symbolic image of software applications

Image by Mudassar Iqbal from Pixabay

The preprint lists several interesting applications for QRAM along with numerous references. To my surprise, their focus is not on data for context (like big data, machine learning, finance, …). Instead, these works rather use QRAM to speed up quantum calculations. Some of them incorporate optimized software implementations of QRAM (Jaques and Rattew call these “circuit-QRAM”), which are used to compile detailed resource estimations of the use cases. The preprint summarizes the QRAM-specific scaling of these studies and outlines details about the QRAM implementations.

Optimizing calculations

As the authors state for the first application:

“Optimizing calculations. Many quantum algorithms perform complicated classical functions in superposition, such as modular exponentiation in Shor’s algorithm or inverse square roots in chemistry problems. Recent techniques … choose instead to classically precompute the result for a large portion of inputs, and then use … [QRAM] to look up the result.”.

For instance, in one referenced paper Google’s Craig Gidney uses a technique called “windowed arithmetic” v. In it, he replaces arithmetic operations that are controlled by a qubit register with a QRAM lookup that uses the same register as the address register. The operations are merged into a single windowed batch lookup on the QRAM. Gidney gives the following motivation for this procedure:

“At first glance, this may seem like a bad idea. This optimization saves twenty [quantum] multiplications, but generating the lookup circuit is going to require classically computing all $ 2^{20} $ possible results and so will take millions of multiplications. But physical qubits are noisy …, and quantum error correction is expensive … . Unless huge advances are made on these problems, fault tolerant quantum computers will be at least a billion times less efficient than classical computers on an operation-by-operation basis. Given the current state of the art, trading twenty quantum multiplications for a measly million classical multiplications is a great deal.”.
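To illustrate the trade purely classically, here is a toy sketch of windowed modular exponentiation (my own construction, not Gidney’s circuit; the window size w and the function name are arbitrary). The precomputed table plays the role of the QRAM data and the exponent window plays the role of the address register:

def modexp_windowed(b, e, N, w=4):
    """Modular exponentiation, processing the exponent in w-bit windows."""
    table = [pow(b, j, N) for j in range(2 ** w)]      # classical precomputation ("QRAM data")
    result = 1
    for shift in reversed(range(0, e.bit_length(), w)):
        window = (e >> shift) & (2 ** w - 1)           # the "address register"
        result = pow(result, 2 ** w, N)                # shift the accumulator by one window
        result = (result * table[window]) % N          # one lookup + one multiplication ...
        # ... instead of w separate multiplications controlled by the individual exponent bits
    return result

assert modexp_windowed(7, 123456, 10403) == pow(7, 123456, 10403)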

By the way, see my article Does Quadratic Quantum Speedup have a problem? to find out more about the problems that Gidney is referring to.

Jaques and Rattew point out the following nice aspect of windowed modular exponentiation: Shor’s algorithm requires only about 17 applications of QRAM lookups. So we could afford much higher error rates for each QRAM lookup than for other gates. Note that passive gate-QRAM would only speed up this type of optimization if the table of data does not have to be regenerated for each application. This could be the case for Google’s qubitization paper vi, which uses QRAM to look up the value of a function of the Coulomb potential.

In the exemplary papers the QRAM size $ N $ is rather manageable (around $ 2^{15} $ to $2^{25} $). This means that in these cases a lookup cost of $ O(N) $ is actually acceptable and the studies still achieve an overall quantum advantage.

Other applications

Another QRAM application that the authors of the preprint give is the “Dihedral Hidden Subgroup”. And again by the way, see my general introduction to this kind of problem, From Shor to the Quantum Hidden Subgroup Problem – A jumpstart. In my article I also introduce Greg Kuperberg’s sieve algorithm for solving the Dihedral Hidden Subgroup problem. Almost ten years later he published an improved algorithm that relies on QRAM vii. The goal of the algorithm is to construct an optimized “phase vector state” to read off the characteristics of the hidden subgroup.

The new algorithm uses a table of “phase multipliers” constructed from previous measurements. Using QRAM lookups, Kuperberg’s algorithm iteratively constructs improved phase vector states from the previous vector states that came along with the previous measurements. For such cryptographic applications Jaques and Rattew give a mid-sized scaling of $ 2^{18} $ to $ 2^{51} $ bits. In these regimes a passive gate-QRAM would start to speed up calculations significantly.

Much worse QRAM scalings are needed for attacks on quantum-safe cryptography. For these applications, the preprint gives a range of $ 2^{49} $ up to $ 2^{93} $ bits. As these also tend to look for a single element in memory, the error rate needs to be extraordinarily low. This is also the case for a Grover search on a database.

These regimes also indicate at which scales context data may be loaded into the quantum computer using circuit-QRAM or active gate-QRAM without ruining the overall performance of a fast quantum algorithm. In the case of quantum linear algebra, such as the HHL algorithm, the preprint presents a detailed case study considering all kinds of runtime costs (obvious and hidden).

Circuit-QRAM

Symbolic image of circuit-QRAM

Credits Hann, Connor & Lee, Gideon & Girvin, Steven & Jiang, Liang. (2020)
The resilience of quantum random access memory to generic noise. / CC BY 4.0

From the arguments in the previous sections it should be obvious that the authors of the preprint have little hope for the case of truly passive gate-QRAM. Yet in the last section it also became apparent that there exist regimes for which software-implemented QRAM could be sufficient. As mentioned, the work provides detailed descriptions of various implementations, with different pros and cons regarding gate complexity, depth, T-gate count, space usage and error scaling.

A bucket-brigade implementation

For instance, a bucket-brigade may be implemented by using two qubits for each node in the tree: a routing qubit and a controlling qubit. The routing qubit takes the role of the photon, which is routed along the active path. The controlling qubit manages the activation state of each node and controls the routing: for each node, the routing qubit is swapped either to the left or the right routing qubit in the next layer of the active path. This swapping is controlled by the controlling qubit of the node. If the routing qubit reaches an inactive layer, it is not swapped to the routing qubit but to the controlling qubit of the inactive node, thus making it active and letting it control the routing of the subsequent routing qubits. After the full path is activated, the data qubit of the leaf controls a CNOT on the routing qubit of the last layer. This routing qubit is then swapped layer by layer as before, yet from bottom to top. This implementation has a gate count of $ O(N \log N) $ and its depth is $ O(\log N) $. As described above for the physical implementation, the tolerable error rate per node may be relatively large compared to simpler QRAM implementations, for which one would generally expect that error rates of order $ O(1/N) $ are required.
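Here is a toy classical trace of that bookkeeping for a single basis-state address (my own sketch; the real circuit applies its controlled SWAPs across whole layers, so that superposed addresses are routed coherently):

n = 3                                     # address bits
address = [1, 0, 1]                       # most significant bit first
data = [0, 1, 1, 0, 1, 0, 0, 1]           # data bits at the leaves

# the controlling qubit of each node (None = node still inactive)
control = {layer: [None] * (2 ** layer) for layer in range(n)}

for bit in address:                       # the address bit enters as a routing qubit ...
    layer, pos = 0, 0
    while control[layer][pos] is not None:
        pos = 2 * pos + control[layer][pos]   # controlled SWAP towards left/right child
        layer += 1
    control[layer][pos] = bit             # ... and is swapped into the first inactive controller

leaf = 0                                  # readout: the leaf's data bit is CNOT-ed onto a
for layer in range(n):                    # routing qubit and swapped back up the same path
    leaf = 2 * leaf + control[layer][leaf]

routing_qubit = data[leaf]                # result delivered to the output register
print(leaf, routing_qubit)                # leaf == 5 == 0b101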

Have fun reading

You will find even more aspects in the survey that I did not cover (e.g. a detailed comparison with classical RAM and the case study on quantum linear algebra) – so, check out the preprint yourself. I should mention that, at first, it took me a while to get into the work due to some technical formalities: the preprint starts by proposing a general naming scheme for QRAM (following Greg Kuperberg’s paper), as the naming is used rather inconsistently throughout the literature. Thus, the authors use the terms QRACM and QRAQM for quantumly addressable (in superposition) classical data and quantum data, respectively viii. Although Samuel Jaques was kind enough to provide further explanations to me, I decided to use the term QRAM in my article and keep things simple. Also, the authors themselves state that in the end the difference between QRACM and QRAQM may be neglected, as the addressing and routing works the same in either case and is the main challenge for QRAM. In this sense, they also use QRAM as a generic term.

Footnotes

i https://arxiv.org/abs/0708.1879: “Quantum random access memory“, paper by Vittorio Giovannetti, Seth Lloyd, Lorenzo Maccone (2008)

ii https://scottaaronson.com/papers/qml.pdf: “Quantum Machine Learning Algorithms: Read the Fine Print” , survey by Scott Aaronson (2015).

iii https://arxiv.org/abs/2305.10310: “QRAM: A Survey and Critique”, preprint by Samuel Jaques, Arthur G. Rattew (May 2023)

iv https://studenttheses.universiteitleiden.nl/access/item%3A2603592/view: “A transmon based quantum switch for a quantum random access memory”, Master’s thesis Arnau Sala Cadellans (2015)

v https://arxiv.org/abs/1905.07682: “Windowed quantum arithmetic”, by Craig Gidney (2019)

vi https://arxiv.org/abs/1902.02134: “Qubitization of Arbitrary Basis Quantum Chemistry Leveraging Sparsity and Low Rank Factorization”, by D. Berry, C. Gidney, M. Motta, J. McClean, R. Babbush (2019)

vii https://arxiv.org/abs/1112.3333: “Another subexponential-time quantum algorithm for the dihedral hidden subgroup problem“, by G. Kuperberg (2011)

viii Jaques and Rattew use the term QRACM for lookups in a data table $ T $

$$
\lvert i \rangle_A \lvert 0 \rangle_O \longrightarrow \lvert i \rangle_A \lvert T_i \rangle_O
$$
whereas QRAQM supports superpositioning of tables, which I symbolically rewrite as
$$
\lvert i \rangle_A \lvert 0 \rangle_O \lvert T \rangle_T \longrightarrow \lvert i \rangle_A \lvert T_i \rangle_O \lvert T \rangle_T
$$
Note, that this type of “quantum data memory” does not support lookups of general wavefunctions $ \psi_i $
$$
\lvert i \rangle_A \lvert 0 \rangle_O \longrightarrow \lvert i \rangle_A \lvert \psi_i \rangle_O
$$
This is state preparation and not possible with QRAM, although most techniques for QRAM may be used for this as well.

Does Quadratic Quantum Speedup have a problem?

As quantum computing evolves, the community focuses on the next steps towards fault tolerance. Recent estimates of the overhead of error correcting codes under reasonable hardware assumptions arrive at rather unfavorable results for quadratic quantum speedups. As this might affect important use cases for quantum computing, we should take a closer look at these results.

Image by PublicDomainPictures from Pixabay

Google and the problem with quadratic quantum speedups

As quantum computing evolves, the community starts to focus on the next steps towards fault tolerance. First trial implementations of error correcting codes, such as the surface code, have been successfully executed, and companies start to work out industrially relevant use cases beyond NISQ in detail. These developments start to highlight an aspect that is often overlooked: the computational overhead of fault tolerance. In this light, Google’s Quantum AI team published a study in 2021, “Focus beyond quadratic speedups for error-corrected quantum advantage” i, in which they provide some “back of the envelope” calculations and arrive at rather unfavorable results for the quantum case. These roughly estimate the runtime of quantum algorithms with a small polynomial speedup and ask the question: how long would such an algorithm need to run in order to beat the classical runtime? They simplify the aspect in the following generic form:

Consider a quantum algorithm that repeatedly executes a quantum primitive $ M $ times (e.g. an oracle for optimization). If the runtime of each round is $ t_Q $, the total quantum runtime is:

$$
T_{total}^{Q} = M \cdot t_Q
$$

If the classical algorithm needs to execute the same primitive $ M^d $ times (e.g. $ d=2 $ for a quadratic speedup as in Grover search), the total classical runtime is:

$$
T_{total}^{C} = M^d \cdot t_C
$$

The central question is: how long are $ t_Q $ and $ t_C $? They argue that the quantum runtime for such algorithms is mostly governed by costly, error corrected Toffoli gates, which are the quantum version of classical NAND gates. Toffoli gates are not “native” to the surface code and require the generation or “distillation” of so-called magic states from a number of noisy states.
Considering state of the art surface code implementations and reasonable assumptions for superconducting quantum computers, they predict a Toffoli execution time of 170 µs (they also mention the case of trapped ions). How many Toffoli executions can we expect until we achieve a quantum advantage? In the case of a Grover search implementation for optimization, Google’s team estimates a count of order $ N $ Toffolis ($ N $ being the number of qubits) for a typical oracle, where $ N $ should be at least 100 to hit quantum advantage. Thus

$$
t_Q = 440 ms
$$

On the classical side they give an estimate of

$$
t_C = 33 ns
$$

Such a quantum algorithm would have to execute for 2.4 hours in order to break even with the classical algorithm. Of course, a Grover search for $ N = 100 $ would need only about 10 cycles to finish, but these are just estimates to give an impression. Google’s team also explains that considering even larger $ N $s (and more interesting problems) would be even less favorable for the quantum side. I guess you could also argue to sequentially execute many $ N=100 $ problem instances until you reach the breakeven time.

Now, 2.4 hours seems like good news – until you consider multiple CPU cores and a classical algorithm exploiting parallelization (which would be easy for optimization). Even worse: in order to execute the surface code to error-correct the quantum computer, you also need classical processing. And you need lots of it. Google’s team names “thousands of classical CPUs”. And if you use an additional 3000 CPUs on the classical side as well, the breakeven time increases to 1 year!

BTW, they also compare annealing algorithms and Sherrington-Kirkpatrick model instances, arriving at even worse results for quantum computers. Now, this looks very disappointing for quadratic speedups, and let’s hope for significantly better error correcting codes and quantum hardware. But Google’s team also considers the cases $ d = 3, 4 $, obtaining drastically better results for the quantum algorithms, especially for quartic speedups. Unfortunately, there aren’t that many such algorithms, and most speedups for optimization are at most quadratic ii.
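To make the structure of the argument explicit, here is a small sketch of the generic breakeven estimate (my own rearrangement of the two runtime formulas above; the numbers are placeholders, not the exact figures from the paper):

def breakeven(t_Q, t_C, d, cores=1):
    """Smallest M with M * t_Q <= M**d * t_C / cores, and the quantum wall-clock
    time at that point. Follows directly from the two runtime formulas above."""
    M = (cores * t_Q / t_C) ** (1.0 / (d - 1))
    return M, M * t_Q

# placeholder values: 10 ms per quantum primitive call, 10 ns per classical call
for d in (2, 3, 4):
    M, T = breakeven(t_Q=10e-3, t_C=10e-9, d=d)
    print(f"d={d}: breakeven after {M:.1e} calls, {T / 3600:.1e} hours of quantum runtime")

# classical parallelism makes it worse: for d=2 the breakeven time grows linearly
# with the number of classical cores
print(breakeven(t_Q=10e-3, t_C=10e-9, d=2, cores=3000)[1] / 3600, "hours")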

Google’s team states that their and similar results are already “folk wisdom” among the small community that studies error-correction. Indeed for instance Microsoft’s Matthias Troyer formulated similar concerns in the past iii.

A detailed study by Montanaro et al

Already in 2019, Montanaro et al published a much more detailed resource estimation on the same subject, “Applying quantum algorithms to constraint satisfaction problems” iv. Again, at first sight the researchers calculate very promising prospects for best case random instances of the classes 12-SAT, 14-SAT and graph coloring. These classes are among the hardest combinatorial optimization problems out there. Montanaro et al even compared the quantum algorithms against leading classical algorithms for the same task. In particular, they also achieved their performance results by trading space for time, resulting in a very large physical qubit overhead. You may have noticed that Google’s estimates assume that the Toffoli gates are executed sequentially. Actually, they mention the option of performing magic state distillation for Toffoli gates in parallel. They argue that, due to the runtime of the involved Clifford gates and the routing overhead, such an effect can only accelerate the quantum algorithm by a factor of 10 to 100.

But even Montanaro et al conclude, and again they formulate this in great detail: once the classical processing overhead for the surface code is taken into account, this quantum advantage vanishes. BTW, Google’s team states that error correction needs to improve by about three orders of magnitude to enable favorable results. But this would require significant breakthroughs. Let’s hope that this will not be the end of the story. Nevertheless, once more it emphasizes the role of exponential quantum speedups.

Exponential Quantum Advantage: Aaronson’s state of the union

If you wonder what Scott Aaronson, one of quantum computing’s most renowned scientists, has to say about the current state of exponential quantum speedups, you just need to look inside a transcript of his talk at the Solvay Physics Conference in Brussels, May 2022. It is on quantum computing’s top five list at SciRate for the past year and highly informative v. In it, Aaronson also mentions Google’s work from above and the importance of exponential asymptotics to quickly reach the breakeven time compared to classical algorithms. In some sense it reminds me of his “QML – Read the fine print” paper, because it is very readable and kind of … well, relentless. But you should really read it yourself.

In this light, it is reasonable to ask how far the boundaries of known, super fast quantum algorithms can be pushed. In my next article, I will discuss this in more detail for the case of quantum computing‘s arguably most central algorithm – Shor’s algorithm:

From Shor to the Quantum Hidden Subgroup Problem – A jumpstart

Footnotes

i https://arxiv.org/abs/2011.04149: “Focus beyond quadratic speedups for error-corrected quantum advantage” paper by R. Babbush, J. McClean, M. Newman, C. Gidney, S. Boixo, and H. Neven

ii https://www.youtube.com/watch?v=1-2LIopvNIk: “Quantum Algorithms for Optimization”, talk by Ronald de Wolf at the Quantum Colloquium of the Simons Institute. He introduces various strategies for fault tolerant quantum computers. In the following panel discussion, with Edward Farhi present, the case of QAOA is also discussed. BTW, de Wolf also mentions a simpler kind of argument for the topic of this section, the problem with quadratic speedups.

iii https://www.youtube.com/watch?v=WY3htdKUGsA: “Towards Practical Quantum Advantage”, talk by Matthias Troyer at the Quantum Colloquium of the Simons Institute, in which he roughly estimates the overhead of quantum error correction as a constant factor of 10 to 12 orders of magnitude.

iv https://arxiv.org/abs/1810.05582: “Applying quantum algorithms to constraint satisfaction problems”, paper by E. Campbell, A. Khurana and A. Montanaro which calculates detailed resource estimates for the fault tolerant computation in comparison to best classical algorithms.

v https://arxiv.org/abs/2209.06930: “How Much Structure Is Needed for Huge Quantum Speedups?” Transcript of Scott Aaronson‘s talk at the Solvay Physics Conference in Brussels, May 2022.

Challenging the “Exponential Quantum Advantage Hypothesis“ for Quantum Chemistry

Ever since Feynman’s statement “Nature isn’t classical, dammit”, there has been a general agreement in the quantum computing community that fault tolerant quantum computers will generically provide an exponential quantum advantage in quantum chemistry for problems of practical interest. A well-known quantum chemist from Caltech now challenges this “Exponential Quantum Advantage Hypothesis” for a most important class of problems – we should be well advised to pay attention to his insights.

Symbolic sketch for Quantum Chemistry

Image by Irene from Pixabay

Quantum computing and the “Exponential Quantum Advantage Hypothesis” (EQA) for chemistry

Quantum computing has several great use cases of tremendous public interest. Yet not for all of those use cases is the predicted benefit well established. But there is one field for which an exponential quantum advantage will remain, no matter what. The purpose for which quantum computers had been proposed in the first place: the simulation of nature at its fundamental level, and its very prominent industrial version: quantum chemistry. OK, we are not sure if this advantage could already be achieved in the current NISQ era. But even if algorithms like the variational eigensolver will not do the job, then fault tolerant quantum computing definitely will.

This is not just my expectation. It is a general agreement which has almost the status of a fact in the quantum computing community. Rock solid and absolutely hype free. And come on, why not?! We know that nature at this level mathematically lives in an exponentially large Hilbert space. We know since Seth Lloyd’s seminal paper on “Universal Quantum Simulators” that quantum dynamics is of polynomial complexity for quantum computing and thus efficiently solvable. And even for calculating the energy levels of quantum systems, we know a generic algorithm with polynomial complexity: quantum phase estimation.

A Quantum Chemist from Caltech

Recently, I was very surprised to hear a different view on this matter in the Quantum Colloquium of the Simons Institute (organized by Umesh Vazirani, one of quantum computing’s “fathers”). In his talk, Garnet Chan from Caltech discusses the somewhat bold question “Is There Evidence of Exponential Quantum Advantage in Quantum Chemistry?” i. As often in this series, the talk is followed by a panel discussion with other leading scientists ii – but more about this later. And of course there is also a preprint accompanying Chan’s talk iii.

Now, Chan is a respected scientist from a leading university … and he knows both worlds very well: his main domains are theoretical quantum chemistry and quantum many-body physics. But he has also worked on some interesting papers on quantum computing, e.g. together with Google’s and IBM’s quantum teams (iv, v). As Vazirani states in his introduction: “I am particularly looking forward to Garnet’s talk … he has been the one who has been most clear in expressing the big picture as well as the subtleties of the problems of simulating quantum chemistry”.

I find the fact that such experts start to get really serious with quantum computing extremely promising. It drives the research in an industrially relevant direction and forces quantum computing to start proving itself in practice. What I also find remarkable regarding his preprint on the EQA hypothesis is its list of co-authors: among them are leading scientists in quantum computing like John Preskill himself and Ryan Babbush from Google.

So, we should be well advised to pay attention to his insights and his opinion.

Quantum Chemistry: The ground state energy problem

As Chan explains, among the various questions one can ask in quantum chemistry, probably the most central one is the question of the ground state energy of an electronic Hamiltonian. The reason is that one can deduce many things from it:

      • If you calculate the energy for two different configurations of molecules, the difference will be the reaction energy

      • If you do the same for two different phases of matter, you can study thermodynamic stability

      • If you calculate the energy as a function of the nuclear coordinates, you get the potential surface for analyzing catalytic and reaction cascades

      • If you study the energy in terms of externally generated perturbations you can analyze aspects of spectroscopy

The textbook algorithm for calculating this ground state energy is quantum phase estimation (QPE). It efficiently splits any state into a sum over eigenstate/eigenvalue pairs of a Hamiltonian, weighted by the individual overlaps of the trial state with these eigenstates. A measurement on the eigenvalue register collapses the superposition into a single pair. The probability that this is the ground state depends on the mentioned overlap.
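As a toy illustration of that last point, here is a sketch that emulates only the measurement statistics of an ideal QPE run, not the circuit itself (my own construction):

import numpy as np

rng = np.random.default_rng(1)
dim = 16

A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                          # a small random Hermitian "Hamiltonian"
energies, eigvecs = np.linalg.eigh(H)      # eigh returns ascending eigenvalues

trial = rng.normal(size=dim)
trial /= np.linalg.norm(trial)             # some trial state

# ideal QPE returns eigenvalue E_k with probability |<k|trial>|^2
probs = (eigvecs.T @ trial) ** 2
S = abs(eigvecs[:, 0] @ trial)             # overlap with the ground state
print("overlap with the ground state:", S)
print("probability to read off E_0  :", probs[0])   # equals S**2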

In this procedure the exact Hamiltonian is discretized with a “basis set”, a more or less generic set of states / orbitals of size L which relates to the system size. The general expectation is that this is a generic method to calculate the ground state energy of interesting chemical systems with an exponential advantage in L compared to classical methods.

Quantum Advantage: The problem with the initial overlap

Chan points out that the weakness of QPE is certainly not its polynomial circuit size. The problem is the overlap S of the initial trial state with the true, unknown ground state. This quantity controls how often we need to execute the circuit in order to hit the ground state and not some other eigenstate of the system. Thus, the complexity of the quantum algorithm including all its necessary repetitions scales like:

$$
poly(1 / S)
$$

(Here I leave out other dependencies as they are irrelevant for this article)

By the way, according to Chan, post-QPE-methods exist, but these show the same behavior.

If we choose some arbitrary quantum state, we should expect an overlap which shrinks as the size of the Hilbert space increases. As this size scales exponentially with the system size L, we are left with an exponentially small overlap. This destroys the efficiency of the QPE strategy.

More specifically: if the initial state is chosen as a product state of L non-interacting subsystems and each subsystem has a local overlap of $ s < 1 $, then the global overlap will become exponentially small

$$
s^{O(L)}
$$
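In numbers (my own toy values, taking $ 1/S^2 $ as a representative poly(1/S) repetition count, since the success probability per run is $ S^2 $ if $ S $ denotes the overlap amplitude):

s = 0.95                           # assumed local overlap per subsystem
for L in (10, 50, 100, 200):
    S = s ** L                     # global overlap of the product trial state
    print(f"L = {L:3d}:  S = {S:.1e},  repetitions ~ 1/S^2 = {1 / S**2:.1e}")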

Thus, in order to achieve an efficient, polynomial quantum performance, we need an initial state with good overlap, even though its energy will not match the ground state energy well. If it matched in both respects, it would be a good approximation of the ground state in the first place and we would not need to bother calculating.

So the big question for Chan is: can we generically construct such initial states for systems where classical methods fail at estimating the ground state energy efficiently?

The search for the right initial states in quantum chemistry

According to Chan, there is currently no indication that we could construct such states. The argument from above about initial product states also works for single Slater determinant states. These are more or less the electronic or fermionic version of generic product states, respecting the fermionic anti-commutation rules of multi-electron systems (hence the trick with the determinant).

Another candidate would be adiabatic state preparation, slowly adding a perturbation to some initial Hamiltonian system and its trial ground state. This method seems promising, but according to Chan it relies on heuristics, which fails the initial requirement of a generic method.

Now, you can improve any type of state preparation by exploiting known properties of the specific system (like its symmetries) and physical intuitions. But as he argues, this also applies to any classical method, leading to improvements on the other side as well.

Chan gives a few numerical simulations to support his claims. Specifically, he calculates the overlap of the “best” Slater determinant with a classically estimated or extrapolated ground state for different iron-sulfur (Fe-S) clusters, including the famous FeMo cofactor of nature’s nitrogenase. The qubit count for mapping the basis into a quantum computer register ranges from 40 to 154 qubits. In his simulations the overlap indeed drops sharply as this count increases. This does not change significantly when improving the Slater determinant ansatz.

Evidence for classical hardness for a Quantum Advantage?

As for the other side of the EQA hypothesis, he asks whether there is indeed evidence of exponential hardness for classical methods and gives some counterexamples. The first class of such systems has “single-reference” character, and he gives some reasoning why these are representative of large parts of chemical space (after all, the preprint contains 77 pages). In single-reference systems, a single Slater determinant qualitatively describes the ground state locally, which can be estimated well by classical methods such as coupled cluster.

The other class of systems he discusses is the Heisenberg model. This is a model for quantum magnets, but according to Chan it also serves as a representative model for correlated materials. He names several efficient classical methods which can address these systems (like tensor networks / PEPS, sign-free quantum Monte Carlo and, to some extent, DMRG).

The Quantum Computing Colloquium of the Simons Institute

If you haven’t done so already, I definitely recommend checking out the YouTube channel of the Quantum Colloquium vi. It started in the middle of the pandemic and is the source of many highly interesting talks by well-known speakers, with a high-class audience, usually followed by a panel discussion with top scientists. I am myself a co-organizer of a Meetup group on quantum computing and I have to admit: if your name is Umesh Vazirani, I guess you have a few other options on how to organize these things.

What I found very remarkable in the discussion afterwards was the amount of agreement among the panelists with Chan’s talk. I have also learned that if you are a computer scientist, even a top scientist, you argue very cautiously when it comes to quantum chemistry. I guess this is just very respectable. The panelists agree with Chan that a solution of the overlap problem in state preparation is totally unknown.

Nathan Wiebe agrees with all the data, while coming to different conclusions, because the requirements for the quantum and classical calculations need to be specified further. Indeed, as also Chan and co-author Lin Lin point out, classical heuristics are carried out in polynomial time, without necessarily achieving a specific accuracy, while the calculation error in quantum computing can be efficiently controlled.

Wiebe also doubts whether conclusions about the asymptotics can be drawn from small-size examples.

Birgitta Whaley, a chemist and Vazirani’s colleague from Berkeley, outlines that multi-reference systems (for which multiple Slater determinants play a role) and conical intersection problems of potential surfaces definitely show exponential hardness for classical methods. Also, for “non-orthogonal reference states” (I guess meaning: reference states expanded in non-orthogonal orbitals) coupled cluster expansions tend to continue forever.

She and also Isaac Chuang advocate looking for other problems in quantum chemistry which are better suited for quantum computers. Whaley specifically names spin ordering, which may be formulated as a decision problem (yet not a binary one).

What about Quantum Advantage for quantum computing?

What is the conclusion from all of this? Despite everything, even Chan expects that quantum computers will open the door to answering lots of interesting questions, and he can’t wait to get his hands on fault tolerant quantum computing. While he questions a generic exponential advantage, he assumes that there will be lots of room for drastic polynomial advantages. After all, every improvement in classical quantum chemistry has been a polynomial one. Among the panelists there are also doubts whether asymptotic complexity is the right tool to study quantum advantage.

In the end, I find this reassuring. Indeed, even for the ground state energy problem, there have been detailed resource estimations for the related quantum circuits in the past which calculate a quantum advantage – in wall-clock time. One example is the famous FeMo cofactor of nitrogenase (btw: the initial paper and some further work was co-authored by panelist Nathan Wiebe) vii. Another nice example is a recent thorough investigation by Google Quantum AI and Boehringer Ingelheim of the class of cytochrome P450 enzymes viii.

So I think there is no reason to doubt quantum advantage – especially for quantum chemistry. But perhaps we should be more careful about statements regarding exponential speedups – even for quantum chemistry.

Footnotes

i https://www.youtube.com/watch?v=DZPH7ENcRLU: “Is There Evidence of Exponential Quantum Advantage in Quantum Chemistry?“ talk by Garnet Chan / Caltech on the Quantum Colloquium of the Simons Institute.

ii https://www.youtube.com/watch?v=YO5-rojZSYg: Panel discussion to Garnet Chan‘s talk with various leading scientists.

iii https://arxiv.org/abs/2208.02199: “Is there evidence for exponential quantum advantage in quantum chemistry?“ Preprint by Garnet Chan et al.

iv https://arxiv.org/abs/2203.15291: “Simulating challenging correlated molecules and materials on the Sycamore quantum processor“ paper by Chan and Google AI

v https://arxiv.org/abs/2001.03685: “Quantum algorithms for quantum chemistry and quantum materials science“ paper by Chan and IBM‘s quantum group

vi https://www.youtube.com/playlist?list=PLgKuh-lKre11rEOFaO62MJCzBXfjZSDqA: YouTube-channel of the Quantum Colloquium / Simons Institute

vii https://arxiv.org/abs/1605.03590: “Elucidating Reaction Mechanisms on Quantum Computers”, the seminal FeMoCo paper by ETH Zürich and Microsoft which has started extensive research. A more recent advance regarding the molecule was published by Google in https://arxiv.org/abs/2011.03494: “Even more efficient quantum computations of chemistry through tensor hypercontraction”

viii https://www.pnas.org/doi/10.1073/pnas.2203533119: “Reliably assessing the electronic structure of cytochrome P450 on today’s classical computers and tomorrow’s quantum computers”, paper by Google Quantum AI and Boehringer Ingelheim

Is Google’s quantum supremacy milestone now really obsolete?

In August 2022, news spread through the media that a team of researchers from the Chinese Academy of Sciences managed to implement Google’s quantum supremacy task on a classical supercomputer. Given that Google’s 2019 milestone plays a prominent role for the quantum computing community and also for the communication with the public, I think the community needs to deal with the details of this new paper.

 

The Sycamore chip

The Sycamore chip (credits Google, CC BY 3.0, via Wikimedia Commons)

Google‘s quantum supremacy milestone

You know the story: in October 2019, Google published a seminal milestone in quantum computing. In a paper which Google’s Quantum AI team managed to keep secret almost until the end, they announced their new quantum computer Sycamore with 53 qubits i. But most importantly, they also described a quantum algorithm that managed to outperform even the largest classical supercomputer by a factor of 200 seconds to 10,000 years.

Despite some criticism, the work has been regarded as one of the major milestones for quantum computing in recent history. Over time, criticism grew louder, and in August 2022 news spread through the media that a team of researchers from the Chinese Academy of Sciences had finally managed to match the performance on the same task using a classical supercomputer. BTW, the preprint dates back to November of the previous year.

In October 2019 I myself wrote a pretty enthusiastic article about quantum supremacy in my German online book, and now I have looked into the paper “Solving the sampling problem of the Sycamore quantum circuits” by Pan, Chen and Zhang ii.

Quantum Random Circuit Sampling

The idea of Google’s supremacy experiment is simple and pretty brilliant. Once a quantum circuit is complex enough, no classical supercomputer should be able to simulate it due to the exponentially large Hilbert space of the resulting state. Google achieved just this by randomly drawing a 53-qubit circuit from a set of one- and two-qubit quantum gates. The two-qubit gates are iSWAP-like gates that are especially hard to simulate for a classical computer on a large scale. The complexity of the circuit grows exponentially in circuit depth, and at 20 cycles Google hit the supremacy regime. The result of the algorithm is a distribution of random looking 53-bit strings, like a “speckled intensity pattern produced by light interference in laser scatter”, as the paper states.

A very central feature of the paper is the answer to the following question: how do you actually prove that the quantum computer did what it was supposed to do and didn’t just produce random looking trash?

For one thing, they reproduced the result of their quantum hardware for less complex circuits of the same style on classical hardware; this is how Google’s team got the 10,000 year estimate. But most of all: they used a reliable benchmark tool, which they had already introduced in 2017 iii, to measure the correctness of the resulting distribution …

The key point: Google‘s cross entropy benchmark

The basic idea of the cross entropy benchmarking fidelity is very clever: if we measure the quantum circuit, we will most likely obtain a bit string with a large amplitude in the resulting quantum state of the circuit. If we now calculate the ideal probability of just this bit string on a classical computer, we should get a value which is larger than the average. Now we collect a large number of measurement results and their bit strings. If we sum up all their calculated ideal probabilities, we should definitely get a value that is “suspiciously large” (as Scott Aaronson calls it).

BTW 1: Calculating “a few” amplitudes of the ideal circuit is nothing compared with calculating the full exponentially large state vector, as you will see later in this text.

BTW 2: Could you “spoof” the result of the cross entropy fidelity? Let’s say we don’t measure a quantum circuit but just select a series of random bit strings. What would the fidelity look like?

In detail the fidelity is defined as

$$
F_{XEB} = \tfrac{2^n}{k} \big(
\sum^k_{i=1} |\langle 0^n | C | x_i \rangle|^2
\big) -1
$$

Here, $ n $ is the number of qubits, $ k $ is the number of samples, $ C $ is the quantum circuit and $ x_i $ is the $i$-th sampled bit string.

Now, $ \tfrac{1}{k} \big( \sum … \big) $ is like a Monte Carlo integration, and if the $ x_i $ were drawn from the uniform distribution, then this integration would converge to the uniform summation over the full computational basis, giving $ \tfrac{1}{2^n} \sum_x |\langle 0^n | C | x \rangle|^2 = \tfrac{1}{2^n} $, as $ C $ is unitary. So for randomly guessed $ x_i $ we have

$$
F_{XEB} = 0
$$

Question: But could you improve this by guessing better?

Answer: This is exactly what Pan, Chen and Zhang did!

An important fact about Google’s result is: as Sycamore is a NISQ device, it produces imperfect results and the measured distribution is not the true distribution. A perfect quantum device would actually generate a fidelity of 1 (the argument for this is much more subtle iv). But Google scored a fidelity of

$$
F_{XEB} = 0.002
$$

So to rival Google’s result, you do not need a perfect result either.
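Here is a minimal NumPy sketch of the benchmark itself (my own toy setup: a Haar-random state stands in for the ideal output of a random circuit). It reproduces the two limiting cases from above, $ F_{XEB} \approx 1 $ for sampling the true distribution and $ F_{XEB} \approx 0 $ for random guessing:

import numpy as np

rng = np.random.default_rng(0)
n = 10
dim = 2 ** n

# Haar-random state as a stand-in for the ideal output state of a random circuit
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
p = np.abs(psi) ** 2                           # ideal probabilities of all bit strings

def f_xeb(samples):
    return dim * p[samples].mean() - 1.0       # the formula from above

k = 100_000
ideal_samples = rng.choice(dim, size=k, p=p)   # "measuring" the ideal device
random_guesses = rng.integers(dim, size=k)     # uniformly random bit strings

print(f_xeb(ideal_samples))    # close to 1
print(f_xeb(random_guesses))   # close to 0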

Interpretations of “Solving the sampling problem”

The 10,000 year estimate by Google was also caused by the type of simulation that they used. As there is no way to store a state of a $ 2^{53} \approx 10^{16} $ dimensional Hilbert space in RAM, they used a hybrid Schrödinger-Feynman algorithm that breaks the circuit into patches and recombines them via path integrals at the end. This is memory efficient but becomes exponentially more computationally expensive than a pure Schrödinger simulation. But Google’s team also mentions in the paper: “We expect that lower simulation costs than reported here will eventually be achieved, but we also expect they will be consistently outpaced by hardware improvements on larger quantum processors.”

Just a few days after the release, IBM’s quantum team proposed a first instance of the former: an alternative approach that aimed at using the massive hard drive of the Summit supercomputer instead of RAM. They estimated a runtime of 2 days. Now, this approach still consumes exponential resources, and probably due to practical reasons they never executed the algorithm. So trying to solve the problem by executing a full amplitude simulation and sampling from it seems unrealistic, which the Chinese team now emphasizes as well.

We all know that a quantum computer itself does just this: a full amplitude generation. But of course, we are never able to see the full amplitude result; all we see are repeated measurement results. So in this sense: if you are able to somehow generate a series of bit strings from a given quantum circuit which scores at least 0.002 in the cross entropy benchmark, what should this be called? Is this “solving the sampling problem” or is this just “spoofing the Sycamore test”? I think both terms are fair to use.

The classical sampling algorithm by Pan, Chen and Zhang

So a full amplitude simulation won’t work. If we just did a single amplitude simulation of one bit string, this would vastly reduce the computational cost. Just think of a simple example like $ \langle 0^n | H^{\otimes n} | 1^n \rangle $. You never need to write down the full state vector, you just focus / project on the given bit strings. Pan, Chen and Zhang view the supremacy circuit as a gigantic three-dimensional tensor network with certain boundary conditions for the initial and the final state. To request a single amplitude for a certain computational basis state / bit string, both boundaries are fixed. The case in their work is a little different:

They choose their 53-bit strings in the following way:

      1. 6 bits are variable
      2. For the remaining 47 bits, they generate $ 2^{20} \approx 10^6 $ randomly chosen strings

Altogether this covers almost $ 10^8 $ of all $ 2^{53} $ possible bit strings, so the full set is roughly $ 10^8 $ times larger. It is remarkable that this proved to be enough samples. They call the collection the “sparse state”. For each of the 47-bit strings, they then keep, among the completions of the 6 variable bits, only the most significant ones, i.e. the ones that generate the highest scores on the cross entropy fidelity. This leaves about 1 million uncorrelated bit strings, which is comparable with the figures from Google’s experiment.
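The effect of this kind of post-selection on the benchmark can be seen in a toy version (my own sketch, reusing the random-state stand-in from the snippet above rather than the actual tensor network computation): out of a set of candidate bit strings with known ideal probabilities, keep only the top scorers.

import numpy as np

rng = np.random.default_rng(2)
n = 12
dim = 2 ** n

psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # stand-in for the ideal output state
psi /= np.linalg.norm(psi)
p = np.abs(psi) ** 2

def f_xeb(samples):
    return dim * p[samples].mean() - 1.0

candidates = rng.integers(dim, size=4096)                # random candidate bit strings
best = candidates[np.argsort(p[candidates])[-256:]]      # keep only the highest-scoring ones

print("random guessing :", f_xeb(candidates))            # ~ 0
print("top-k selection :", f_xeb(best))                  # clearly > 0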

In a sense this is similar to what Ewin Tang does in her quantum inspired algorithms: Sample first and then calculate (quantum algorithms do the opposite).

There is another trick that the researchers use to simplify the calculation: they drill K holes into the tensor network, which makes the computation far easier but decreases the fidelity by a factor of $ 2^{-K} $. I think this is also a fair alteration of the sampling problem, since Sycamore also inserts errors – unintentionally, of course.

The remaining problem is still highly difficult. The team manages to calculate all $ 10^8 $ amplitudes with a single tensor network contraction. The basic purpose of the sparse state is to multiply each gate, or contract each subtensor, starting from the bottom right of the circuit (qubit no. 53 of the final state) and then iteratively work your way up and towards the initial state. Each iteration step projects the subtensor only onto those product states / bit combinations that are included in the sparse state. For instance, if we contract a two-qubit gate for qubits no. 53 and 52 and in the sparse state only the combinations 01 and 10 exist for bits no. 53 and 52, then the remaining two combinations are dropped from the computation. This decreases the complexity of the sparse state calculation, compared to a full amplitude simulation, approximately by a factor of 10 billion.

All in all the Chinese team arrives at a fidelity

$$
F_{XEB} = 0.0037
$$

in about 15 hours using 512 GPUs. They argue that using even more advanced hardware would finally beat Google’s 200 seconds.

The future of quantum supremacy experiments

The paper by Pan, Chen and Zhang finally puts the 10,000 year estimate of Google’s work to rest for good. You could argue that there remains a quantum advantage regarding computing cycles or carbon footprint (as Scott Aaronson notes).

In my opinion, Google’s result is still a milestone, maybe a less spectacular one by now. The current development was kind of expected by the team and the community. Also, we should get used to the fact that newly detected quantum advantages might be caught up by some improvement on the classical side.

But what about the other part of the Google-quote from above “…but we also expect they [the classical simulations] will be consistently outpaced by hardware improvements on larger quantum processors.”?

The quantum algorithm could be improved in various ways by hardware advances. Most of all: better gate fidelity, which would allow deeper circuits and better cross entropy fidelity scores. At some point, enhanced random guessing and drilling holes would probably not be good enough on the classical side anymore. Of course, as always in quantum computing, more qubits would also work pretty well.

Might there be some news in preparation? Alongside a preprint about toric code implementations, Google Quantum AI recently mentioned (by the way) a planar 72-qubit quantum processor, Sycamore’s big brother v. I very much wonder if the team is still on the quantum supremacy track …

Yet there is an intrinsic problem with random quantum circuit sampling. At some point, the heart of all the arguments, the cross entropy benchmark, will hit the quantum supremacy regime itself, making it impossible to verify the measured distribution on classical hardware. So at some point conceptually new quantum supremacy experiments are probably needed. Now, this is related to a much more interesting question:

At what tasks will we be able to observe exponential quantum advantages in the near future?

I will deal with this question in a future post. So stay tuned …

Footnotes

i https://www.nature.com/articles/s41586-019-1666-5: “Quantum supremacy using a programmable superconducting processor” by Google Quantum AI

ii https://arxiv.org/abs/2111.03011: “Solving the sampling problem of the Sycamore quantum circuits” by Pan, Chen and Zhang from the Chinese Academy of Sciences

iii https://arxiv.org/abs/1608.00263: “Characterizing Quantum Supremacy in Near-Term Devices” by Google Quantum AI

iv https://ion.nechita.net/wp-content/uploads/2021/01/Nechita-Qsup-jan-2021.pdf: “Mathematical aspects of Google’s quantum supremacy experiment” by Ion Nechita. Since I could not come up with the explanation myself of why $ F_{XEB} = 1 $ for ideal circuits, this is the best / easiest explanation I have found. For more and more samples N, the distribution converges to the “Porter-Thomas distribution” $ N \cdot exp(-Nx) $, which leads to an easy integral and the result from above. If you have a simpler explanation, feel free to tell me.

v https://arxiv.org/pdf/2207.06431.pdf: “Suppressing quantum errors by scaling a surface code logical qubit” by Google Quantum AI