A Quantum distance-based classifier (part 2)

Robert Wezeman, TNO

Table of Contents

$$ \newcommand{\ket}[1]{\left|{#1}\right\rangle} $$

Introduction

In the first part of this notebook series on a distance-based classifier, we looked at how to implement a distance-based classifier on Quantum Inspire using QASM code. We considered the simplest possible case: using two data points, each with two features, to assign a label to (classify) a random test point, see image. In this notebook we extend the previous classifier by increasing the number of data points to four, and we demonstrate how to use the ProjectQ framework to do this.

Back to Table of Contents

Problem

Again we have the following binary classification problem: Given the data set $$\mathcal{D} = \Big\{ ({\bf x}_1, y_1), \ldots ({\bf x}_M , y_M) \Big\},$$

consisting of $M$ data points $x_i\in\mathbb{R}^n$ and corresponding labels $y_i\in \{-1, 1\}$, give a prediction for the label $\tilde{y}$ corresponding to an unlabeled data point $\bf\tilde{x}$. The classifier we shall implement with our quantum circuit is a distance-based classifier and is given by \begin{equation}\newcommand{\sgn}{{\rm sgn}}\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\label{eq:classifier} \tilde{y} = \sgn\left(\sum_{m=0}^{M-1} y_m \left[1-\frac{1}{4M}\abs{{\bf\tilde{x}}-{\bf x}_m}^2\right]\right). \hspace{3cm} (1)\end{equation}

This classifier resembles a weighted nearest-neighbor model: all $M$ data points contribute to the prediction, each with a weight determined by its distance to the test point. To implement this classifier on a quantum computer we use amplitude encoding. For the details see the previous notebook.
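For reference, the classifier of Eq. (1) can also be evaluated classically in a few lines of NumPy. The sketch below is our own illustration (the function name `classify` and the example points are hypothetical, not part of the original code):

```python
import numpy as np

def classify(x_tilde, data, labels):
    """Evaluate Eq. (1): the sign of the distance-weighted sum of the labels.

    x_tilde : unit-normalised feature vector of the unlabelled point
    data    : array of shape (M, n) with unit-normalised training points
    labels  : array of M labels in {-1, +1}
    """
    M = len(data)
    weights = 1.0 - np.sum((x_tilde - data) ** 2, axis=1) / (4.0 * M)
    return int(np.sign(np.sum(labels * weights)))

# Hypothetical example: two training points on the unit circle
data = np.array([[0.0, 1.0], [1.0, 0.0]])
labels = np.array([-1, 1])
print(classify(np.array([0.8, 0.6]), data, labels))   # prints 1
```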

Back to Table of Contents

Theory

The algorithm requires a $n$-qubit quantum system to be in the following state initially \begin{equation}\label{eq:prepstate} \ket{\mathcal{D}} = \frac{1}{\sqrt{2M}} \sum_{m=0}^{M-1} \ket{m}\Big(\ket{0}\ket{\psi_{\bf\tilde{{x}}}} + \ket{1}\ket{\psi_{\bf{x}_m}}\Big)\ket{y_m}.\hspace{3cm} (2) \end{equation}

Here $\ket{m}$ is the $m^{th}$ state of the computational basis used to keep track of the $m^{th}$ training input. The second register is a single ancillary qubit entangled with the third register. The excited state of the ancillary qubit is entangled with the $m^{th}$ training state $\ket{\psi_{{x}_m}}$, while the ground state is entangled with the new input state $\ket{\psi_{\tilde{x}}}$. The last register encodes the label of the $m^{th}$ training data point by \begin{equation} \begin{split} y_m = -1 \Longleftrightarrow& \ket{y_m} = \ket{0},\\ y_m = 1 \Longleftrightarrow& \ket{y_m} = \ket{1}. \end{split} \end{equation} Once in this state the algorithm only consists of the following three operations:

  1. Apply a Hadamard gate on the second register.

  2. Measure the second register. We restart the algorithm if we measure a $\ket{1}$ and only continue when we are in the $\ket{0}$ branch.

  3. Measure the last qubit $\ket{y_m}$.

In the special case where the number of training points is the same for both labels, this last measurement relates to the classifier described in the previous section by \begin{equation} \tilde{y} = \left\{ \begin{array}{lr} -1 & : p(q_4 = 0 ) > p(q_4 = 1)\\ +1 & : p(q_4 = 0 ) < p(q_4 = 1) \end{array} \right. \end{equation} By taking $\tilde{y}$ to be the most frequent outcome over many measurement shots, we obtain the desired distance-based classifier.

In the previous notebook we saw the implementation for $M=2$ data points, each with $N=2$ features. Now we will consider the case of $M=4$ data points, each with $N=2$ features.

Back to Table of Contents

Algorithm

To describe the desired initial state for $M = 4$ data points with $N = 2$ features we need 5 qubits: two for the index register $\ket{m}$, one ancillary qubit, one for the feature register and one for the label $\ket{y_m}$.

Furthermore, these four data points require us to implement a triple-controlled $R_y$ gate, or $CCCR_y$. ProjectQ constructs this gate for us automatically, at the cost of two extra ancillary qubits, resulting in a total of 7 qubits. The algorithm is divided into different parts:

After performing step $C1$ four times with the right angles ($\ket{{\bf x}_m}=R_y(\theta_m)\ket{0}$) and in the right order, alternated with step $C2$, the system will be in the state $$\ket{\mathcal{D}_C} =\frac{1}{\sqrt{8}} \sum_{m=0}^{3}\ket{m}\Big(\ket{0}\ket{\tilde{{\bf x}}}+\ket{1}\ket{\bf x_m}\Big)\ket{0}$$
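To make the structure of these steps concrete, here is a minimal ProjectQ sketch of the state preparation. The angles `theta_tilde` and `theta[m]`, and the final label flip, are placeholders that depend on the data; the triple-controlled $R_y$ is expressed with the `Control` meta-instruction and is decomposed by ProjectQ using the extra ancillary qubits mentioned above.

```python
from projectq import MainEngine
from projectq.ops import All, H, Measure, Ry, X
from projectq.meta import Control

# Placeholder angles: theta_tilde encodes the test point, theta[m] encodes x_m
theta_tilde = 0.0
theta = [0.0, 0.0, 0.0, 0.0]

eng = MainEngine()                 # ProjectQ simulator backend by default
index = eng.allocate_qureg(2)      # |m>, keeps track of the training points
ancilla = eng.allocate_qubit()     # selects test point (|0>) or training point (|1>)
feature = eng.allocate_qubit()     # feature register (one qubit for N = 2 features)
label = eng.allocate_qubit()       # |y_m>

# Uniform superposition over the index register and the ancillary qubit
All(H) | index
H | ancilla[0]

# Entangle the test point with the ancilla branch |0>
X | ancilla[0]
with Control(eng, ancilla):
    Ry(theta_tilde) | feature[0]
X | ancilla[0]

# Steps C1 and C2: load x_m conditioned on index == m and ancilla == 1.
# Here each C1 is wrapped in X gates that map |m> onto |11>; this is
# equivalent to alternating C1 with the X gates of step C2.
for m in range(4):
    flips = [index[i] for i in range(2) if not (m >> i) & 1]
    for q in flips:
        X | q
    with Control(eng, [index[0], index[1], ancilla[0]]):   # the triple-controlled Ry
        Ry(theta[m]) | feature[0]
    for q in flips:
        X | q

# The label qubit is then flipped controlled on the index register for the
# training points with label +1 (which values of m these are depends on the data).

All(Measure) | index + ancilla + feature + label
eng.flush()
```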

Back to Table of Contents

Implementation

We will implement the above algorithm using the ProjectQ framework. First we need to import some modules and set up the authentication for connecting to the Quantum Inspire API.
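A minimal sketch of those imports and of the authentication step, assuming the quantuminspire SDK is installed and that the environment variables `QI_EMAIL` and `QI_PASSWORD` hold valid credentials (these names are placeholders):

```python
import os

from projectq import MainEngine
from quantuminspire.api import QuantumInspireAPI
from quantuminspire.credentials import get_basic_authentication
from quantuminspire.projectq.backend_qx import QIBackend

QI_URL = os.getenv('API_URL', 'https://api.quantum-inspire.com/')

# Placeholder credentials: replace with your own Quantum Inspire account details
authentication = get_basic_authentication(os.getenv('QI_EMAIL'), os.getenv('QI_PASSWORD'))
qi_api = QuantumInspireAPI(QI_URL, authentication, project_name='distance-based classifier')

# ProjectQ engine that sends the generated circuit to the Quantum Inspire backend;
# the engine_list is left at ProjectQ's default here.
qi_backend = QIBackend(quantum_inspire_api=qi_api)
engine = MainEngine(backend=qi_backend)
```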

Before we consider the four-point classifier, let us first review the two-point classifier. Consider the following data points from the Iris flower dataset: \begin{equation} \begin{split} &\ket{\psi_{\tilde{x}}} = \;\;\;0.9999 \ket{0} -0.0011\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_0}} = -0.4583 \ket{0} - 0.8889\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_1}} = -0.3728 \ket{0} +0.9279\ket{1}, \hspace{2cm}y = 0. \end{split} \end{equation}
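Each of these states can be prepared from $\ket{0}$ by a single $R_y$ rotation. As a small aid, here is a hypothetical helper (our own, not part of the original code) that computes the rotation angle from the two real amplitudes:

```python
import numpy as np

def angle_from_amplitudes(a, b):
    """Return theta such that Ry(theta)|0> = a|0> + b|1>, for real a, b with a^2 + b^2 = 1."""
    return 2.0 * np.arctan2(b, a)

theta_tilde = angle_from_amplitudes(0.9999, -0.0011)   # test point
theta_x0 = angle_from_amplitudes(-0.4583, -0.8889)     # training point x_0, label 1
theta_x1 = angle_from_amplitudes(-0.3728, 0.9279)      # training point x_1, label 0
```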

The code for this implementation is shown below and treated in detail in the previous notebook.

With this combination of data points the classifier predicts the wrong label 0 for the test point, because: \begin{equation} 0.04 + 0.1043719 > 0.0366663 + 0.1019124 \rightarrow \text{Assign label 0 to } \tilde{y} \end{equation} The left figure gives intuition for why the classifier fails to predict the correct label: for this specific combination of data points, the test point is closer to the data point with label 0 than to the data point with label 1. As the prediction is based on only these two data points, this wrong prediction is to be expected. Note that this problem has nothing to do with the quantum computer used for the calculation: the same result would be obtained with a classical distance-based classifier.

By adding two more data points, one for each label, we improve the classifier so that it is better able to predict the correct label. We add the following two points to the calculation: \begin{equation} \begin{split} &\ket{\psi_{x_2}} = -0.3728 \ket{0} +0.9279\ket{1}, \hspace{2cm}y = 1,\\ &\ket{\psi_{x_3}} = -0.4583 \ket{0} - 0.8889\ket{1}, \hspace{2cm}y = 0. \end{split} \end{equation}
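Their rotation angles follow in the same way, reusing the hypothetical `angle_from_amplitudes` helper sketched above:

```python
theta_x2 = angle_from_amplitudes(-0.3728, 0.9279)    # new training point x_2, label 1
theta_x3 = angle_from_amplitudes(-0.4583, -0.8889)   # new training point x_3, label 0
```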

Consider the quantum circuit for the four-point classifier below.

Just as in the previous notebook, we need to count only those outcomes where the third qubit is equal to 0. That is, we only consider the 16 outcomes in the set: $$\{00000, 01000, 10000, 11000, 00001, 01001, 10001, 11001, 00010, 01010, 10010, 11010, 00011, 01011, 10011, 11011\}$$

The label $\tilde{y}$ is then given by a majority vote:

\begin{equation} \begin{split} \tilde{y}_{0} = \# \{00000, 01000, 10000, 11000, 00010, 01010, 10010, 11010\}\\ \tilde{y}_{1} = \#\{00001, 01001, 10001, 11001, 00011, 01011, 10011, 11011\} \end{split} \end{equation}\begin{equation} \tilde{y} = \left\{ \begin{array}{lr} -1 & : \tilde{y}_{0} > \tilde{y}_{1}\\ +1 & : \tilde{y}_{1} > \tilde{y}_{0} \end{array} \right. \end{equation}
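Assuming the backend returns a histogram `counts` that maps 5-bit strings (ordered as in the set above, with the ancillary qubit as the third bit and the label qubit as the last bit) to the number of shots, the post-selection and majority vote could look like the sketch below; the example histogram is hypothetical:

```python
def majority_vote(counts):
    """Post-select on the ancillary (third) bit being 0 and vote on the label (last) bit."""
    y0 = sum(n for s, n in counts.items() if s[2] == '0' and s[4] == '0')
    y1 = sum(n for s, n in counts.items() if s[2] == '0' and s[4] == '1')
    return -1 if y0 > y1 else +1

# Hypothetical histogram: the '00100' shots are discarded by the post-selection
print(majority_vote({'00000': 220, '00001': 310, '00100': 470}))   # prints 1
```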

By adding these two points we see that the classifier now gives the correct label for the test data. To see how well this four-point distance-based classifier performs, we also apply it to randomly selected training and test data.
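A sketch of how such random training and test points could be drawn from the Iris set, assuming scikit-learn is available; the choice of the first two features and the standardise-then-normalise preprocessing are our assumptions, mirroring the data points used above:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler, normalize

# Load the first two Iris classes, keep two features, standardise and
# project every point onto the unit circle (needed for amplitude encoding).
iris = datasets.load_iris()
mask = iris.target < 2
points = normalize(StandardScaler().fit_transform(iris.data[mask][:, :2]))
labels = iris.target[mask]                          # 0 or 1

rng = np.random.default_rng()
# Pick two random training points per label and one random test point.
train0 = rng.choice(np.where(labels == 0)[0], size=2, replace=False)
train1 = rng.choice(np.where(labels == 1)[0], size=2, replace=False)
train = np.concatenate([train0, train1])
test = rng.choice(np.setdiff1d(np.arange(len(labels)), train))

x_train, y_train = points[train], labels[train]
x_test, y_test = points[test], labels[test]
```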

Conclusion and further work

In general, a distance-based classifier becomes better at predicting labels as more data is included. In this notebook we showed how to extend the previous algorithm to four data points. We saw an example where the two-point distance-based classifier fails to predict the right label, while, when extended to four data points, it classifies the test point correctly. This is what we expect classically; the key takeaway, however, is that the quantum algorithm itself (step f) did not change. It is independent of the size of the dataset, and it is from this that we can expect significant speedups.

Extending the classifier to more data points is now analogous. Extending the classifier to $8$ data points is a nice challenge for the reader to test their understanding; a solution is given below.

In this notebook we used the ProjectQ backend to generate the quantum algorithm. Note that controlled gate operations in general require ancillary qubits. The ProjectQ code does not minimize the number of ancillary qubits needed in the algorithm; this could be improved.

The next notebook in this series will look at how to include more than two features for the data.

Back to Table of Contents

References

Solution for 8 data points