A Quantum distance-based classifier (part 3)

Robert Wezeman, TNO

Table of Contents

- Introduction
- Problem
- Theory
- Algorithm for the arbitrary state preparation
- Implementation of the distance-based classifier
- Conclusion and further work
- References

$$ \newcommand{\ket}[1]{\left|{#1}\right\rangle} $$

Introduction

This notebook is the third in the series on the quantum distance-based classifier. In the first notebook, we looked at how to build a distance-based classifier with two data points, each having two features. In the second notebook, we looked at how to increase the number of data points. In this notebook we will look at how to increase the number of features.

Back to Table of Contents

Problem

We repeat the problem description from the previous notebooks. We define the following binary classification problem: Given the data set $$\mathcal{D} = \Big\{ ({\bf x}_1, y_1), \ldots ({\bf x}_M , y_M) \Big\},$$

consisting of $M$ data points $x_i\in\mathbb{R}^n$ and corresponding labels $y_i\in \{-1, 1\}$, give a prediction for the label $\tilde{y}$ corresponding to an unlabeled data point $\bf\tilde{x}$. The classifier we shall implement with our quantum circuit is a distance-based classifier and is given by \begin{equation}\newcommand{\sgn}{{\rm sgn}}\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\label{eq:classifier} \tilde{y} = \sgn\left(\sum_{m=0}^{M-1} y_m \left[1-\frac{1}{4M}\abs{{\bf\tilde{x}}-{\bf x}_m}^2\right]\right). \hspace{3cm} (1)\end{equation}

This is a weighted nearest-neighbour model over all $M$ data points, in which each data point is given a weight that decreases with its distance to $\bf\tilde{x}$. To implement this classifier on a quantum computer we use amplitude encoding; the details can be found in the previous notebooks.
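To make Eq. (1) concrete, here is a minimal classical sketch in Python (numpy only; the function name and the assumption that the data is passed as normalised feature vectors are ours, not the notebook's code):

```python
import numpy as np

def classify(x_tilde, data, labels):
    """Classical evaluation of Eq. (1): a distance-weighted vote over all M points.

    x_tilde : normalised feature vector of the unlabeled point
    data    : array of M normalised feature vectors
    labels  : array of M labels in {-1, +1}
    """
    M = len(data)
    weights = 1 - np.sum(np.abs(x_tilde - data) ** 2, axis=1) / (4 * M)
    return int(np.sign(np.sum(labels * weights)))
```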

Back to Table of Contents

Theory

In the previous notebooks we encoded the data points, each having two features, with a simple $R_y(\theta)$ rotation. The angle of this rotation was chosen such that it rotated the state $\ket{0}$ to the state $\ket{\bf{x}}$ corresponding to the data point. If we want to include more features for our data points, we need to generalize this rotation. Instead of a simple rotation, we now need a combination of gates that maps $\ket{0\ldots 0} \mapsto \ket{\bf x} = \sum_i a_i \ket{i}$, where $\ket{i}$ is the $i^{th}$ entry of the computational basis $\left\{\ket{0\ldots0},\ldots,\ket{1\ldots1}\right\}$. Again we only work with normalised data, meaning $\sum_i \lvert a_i \rvert^2=1$. The general procedure for initializing a quantum register in an arbitrary superposed state can be found in the article by Long and Sun. In this notebook we will consider how to implement their scheme for 2 qubits, that is, for up to 4 features: \begin{equation} \ket{00} \mapsto a_{00} \ket{00} + a_{01} \ket{01} + a_{10} \ket{10} + a_{11} \ket{11} \end{equation}

For the implementation we closely follow the reference and use the single-qubit rotation gate $U(\theta)$, defined by \begin{equation} U(\theta) = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ \sin(\theta) & -\cos(\theta) \end{pmatrix} = R_y(2\theta) \cdot Z, \end{equation} and controlled versions of it. Because these gates only ever act on $\ket{0}$ in our scheme, we can even drop the $Z$ gate.

For two qubits the scheme consists of three steps:

  1. Apply a single-qubit rotation $U(\alpha_1)$ on the first qubit: \begin{equation} U(\alpha_1)\ket{0}\otimes\ket{0}= \sqrt{\abs{a_{00}}^2 + \abs{a_{01}}^2} \ket{00} + \sqrt{\abs{a_{10}}^2 + \abs{a_{11}}^2} \ket{10}, \end{equation} where $\alpha_1$ is given by \begin{equation} \alpha_1 = \arctan\left(\sqrt{\frac{\abs{a_{10}}^2 + \abs{a_{11}}^2}{\abs{a_{00}}^2 + \abs{a_{01}}^2}}\right) \end{equation}
  2. Next, apply a rotation $U(\alpha_2)$ on the second qubit, controlled on the first qubit being 0. Choose $\alpha_2$ such that: \begin{equation} \cos(\alpha_2) = \frac{a_{00}}{\sqrt{\abs{a_{00}}^2+\abs{a_{01}}^2}}, \hspace{1cm} \sin(\alpha_2) = \frac{a_{01}}{\sqrt{\abs{a_{00}}^2+\abs{a_{01}}^2}} \end{equation}
  3. Lastly, apply a rotation $U(\alpha_3)$ on the second qubit, controlled on the first qubit being 1. Choose $\alpha_3$ such that: \begin{equation} \cos(\alpha_3) = \frac{a_{10}}{\sqrt{\abs{a_{10}}^2+\abs{a_{11}}^2}}, \hspace{1cm} \sin(\alpha_3) = \frac{a_{11}}{\sqrt{\abs{a_{10}}^2+\abs{a_{11}}^2}} \end{equation}

The angles $\alpha_2$ and $\alpha_3$ are chosen such that the square-root terms cancel, leaving us with the desired result: \begin{equation} \begin{split} \sqrt{\abs{a_{00}}^2 + \abs{a_{01}}^2}& \ket{0}\otimes U(\alpha_2)\ket{0} + \sqrt{\abs{a_{10}}^2 + \abs{a_{11}}^2} \ket{1}\otimes U(\alpha_3)\ket{0}\\ &=a_{00} \ket{00} + a_{01} \ket{01} + a_{10} \ket{10} + a_{11} \ket{11} \end{split} \end{equation}

Note: this circuit also makes it possible to encode 3 features, by simply setting $a_{11}=0$.

Schematically, the circuit for this scheme looks like this (○ denotes a control on the qubit being $\ket{0}$, ● a control on it being $\ket{1}$):
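```
q0: ──U(α1)──○─────────●─────────
             │         │
q1: ────────U(α2)─────U(α3)──────
```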

Back to Table of Contents

Algorithm for the arbitrary state preparation

In this notebook we will work with the Qiskit backend for Quantum Inspire. Let us first take a closer look at the state preparation part of the circuit, which is used to prepare an arbitrary state. The following code loads the scaled and normalised data of the Iris data set containing all 4 features. Let us consider the first point in this set.

We want to build the circuit such that: \begin{equation}\ket{00} \mapsto -0.3270 \ket{00} + 0.4737 \ket{01} - 0.5700 \ket{10} - 0.5864 \ket{11}\end{equation}

As always, the first step is to set up a connection with Quantum Inspire:
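A minimal connection sketch with the quantuminspire SDK (assuming an account whose API token was stored earlier with quantuminspire.credentials.save_account; the backend name is the public QX simulator):

```python
import os
from quantuminspire.credentials import load_account, get_token_authentication
from quantuminspire.qiskit import QI

QI_URL = os.getenv('API_URL', 'https://api.quantum-inspire.com/')

token = load_account()  # reads the token saved earlier with save_account()
QI.set_authentication(get_token_authentication(token), QI_URL)
qi_backend = QI.get_backend('QX single-node simulator')
```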

We can now construct a function which builds the quantum circuit as discussed above. Note that Qiskit contains the general single-qubit unitary gate $u3(\theta, \phi, \lambda)$, which is related to our definition of $U(\alpha)$ by: $$U(\alpha) = u3(2\alpha, 0, \pi).$$

Unfortunately, these $u3$ gates are not yet implemented on the Quantum Inspire backend, and thus we need to make do with regular $R_y$ rotations and CNOT gates.
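A sketch of such a decomposition (state_preparation is a hypothetical helper, not the notebook's exact code): since every $U(\alpha)$ here acts on $\ket{0}$, each (controlled) $U(\alpha)$ reduces to a (controlled) $R_y(2\alpha)$, and a controlled $R_y$ splits into two $R_y$ rotations and two CNOTs:

```python
import numpy as np
from qiskit import QuantumCircuit

def state_preparation(a):
    """Map |00> to a[0]|00> + a[1]|01> + a[2]|10> + a[3]|11>
    using only Ry rotations and CNOTs.

    a : normalised real amplitudes ordered as (a00, a01, a10, a11),
        with qubit 0 the first qubit in the notation |q0 q1>.
    """
    a00, a01, a10, a11 = a
    alpha1 = np.arctan2(np.sqrt(a10**2 + a11**2), np.sqrt(a00**2 + a01**2))
    alpha2 = np.arctan2(a01, a00)
    alpha3 = np.arctan2(a11, a10)

    qc = QuantumCircuit(2)
    # Step 1: U(alpha1) on qubit 0; acting on |0> this is just Ry(2*alpha1).
    qc.ry(2 * alpha1, 0)
    # Step 2: Ry(2*alpha2) on qubit 1 if qubit 0 is |0>, identity otherwise:
    # Ry(a), CX, Ry(a), CX gives Ry(2a) for control 0 and I for control 1.
    qc.ry(alpha2, 1)
    qc.cx(0, 1)
    qc.ry(alpha2, 1)
    qc.cx(0, 1)
    # Step 3: Ry(2*alpha3) on qubit 1 if qubit 0 is |1>, identity otherwise:
    # Ry(a), CX, Ry(-a), CX gives I for control 0 and Ry(2a) for control 1.
    qc.ry(alpha3, 1)
    qc.cx(0, 1)
    qc.ry(-alpha3, 1)
    qc.cx(0, 1)
    return qc
```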

We can compare these results with what we expect when measuring the state $\ket{\psi} = a_{00} \ket{00} + a_{01} \ket{01} + a_{10} \ket{10} + a_{11} \ket{11}$ and obtaining the outcome $X$:

| Outcome $x$ | $\text{Prob}(X=x)$ |
| --- | --- |
| $\ket{00}$ | $\abs{a_{00}}^2$ |
| $\ket{01}$ | $\abs{a_{01}}^2$ |
| $\ket{10}$ | $\abs{a_{10}}^2$ |
| $\ket{11}$ | $\abs{a_{11}}^2$ |
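As a quick sanity check against this table, one can sample the hypothetical state_preparation circuit sketched above on a local simulator; BasicAer and execute are the classic Qiskit entry points of this notebook's era:

```python
from qiskit import BasicAer, execute

a = [-0.3270, 0.4737, -0.5700, -0.5864]
qc = state_preparation(a)
qc.measure_all()

job = execute(qc, BasicAer.get_backend('qasm_simulator'), shots=8192)
counts = job.result().get_counts()
shots = sum(counts.values())
# Qiskit bit strings are little-endian ('q1 q0'); reverse them to read |q0 q1>.
for bits, n in sorted(counts.items()):
    print(bits[::-1], round(n / shots, 3))
# Expected |a_i|^2: 0.107 (00), 0.224 (01), 0.325 (10), 0.344 (11)
```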

In the next section we use this feature-encoding scheme in the distance-based classifier for 2 data points, each having 4 features. In the algorithm, however, the feature encoding is done in a controlled fashion, with the index qubit as control. Therefore we also need the above circuit transformed to a controlled version. This can be done by replacing every gate with a controlled equivalent.
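One way to sketch this in Qiskit is to promote the whole encoding circuit to a gate and add a control; to_gate and control are standard Qiskit methods, though for a hardware backend the result still has to be transpiled to natively supported gates:

```python
from qiskit import QuantumCircuit

# Controlled feature encoding: apply state_preparation only when the index qubit is |1>.
encode = state_preparation(a).to_gate(label='encode')

qc = QuantumCircuit(3)                   # qubit 0: index qubit, qubits 1-2: feature register
qc.h(0)                                  # put the index qubit in superposition
qc.append(encode.control(1), [0, 1, 2])  # controlled equivalent of every encoding gate
```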

Back to Table of Contents

Implementation of the distance-based classifier

We first consider the case with 2 data points and just 2 features, randomly chosen from the Iris data set, so that we can later compare the results of the distance-based classifier with 2 features to the implementation with 4 features.

The above is the same implementation of the algorithm as in the first notebook. The following code runs the algorithm on randomly selected data from the Iris set.

Below the implementation for four features is given. Note that the structure of the algorithm is similar to the case with two features. We use one ancillary qubit for the controlled encoding of the features.

The following code runs the algorithm for 2 + 1 random data points (two labeled points and one unlabeled point), each with 4 features.

In the limit of an infinite number of shots, Quantum Inspire returns the true probability distribution, which coincides with the classical solution of the distance-based classifier. The following table shows the quality of the distance-based classifier as a function of the number of data points and the number of included features. The table contains the fraction of correct predictions for randomly selected data from the Iris set; the results are over a sample of 10,000 runs and can be reproduced using the quality_classifier method of the DataPlotter class.

| Fraction of correct predictions | 2 features | 3 features | 4 features |
| --- | --- | --- | --- |
| 2 data points | 0.9426 | 0.9870 | 0.9940 |
| 4 data points | 0.9735 | 0.9933 | 0.9986 |
| 8 data points | 0.9803 | 0.9975 | 0.9998 |

These results show why one is interested not only in extending the number of data points but also in including more features for the data.

Conclusion and further work

In this notebook we demonstrated how to extend the distance-based classifier to handle data containing up to four features. We saw that the quality of the distance-based classifier improves both by including more data points and by including data with more features.

So far in this notebook series we have only looked at binary classification: the results belong either to the class with label 0 or to the class with label 1. For some problems one is interested in distinguishing between more than two classes, for example in number recognition. A possible next extension of the current classifier is to make it able to classify between more than two labels. This can be done by encoding the label in multiple qubits instead of one.

We have only tested the distance-based classifier on rescaled data from the Iris data set, which is classified well by the distance-based classifier. For other data sets this need not be the case. Consider a data set in which, after scaling, the classes lie in concentric circles; at first glance we would not expect the distance-based classifier to yield good predictions. Such problems can possibly be solved by alternative data pre-processing or by a totally different type of classifier. Selecting the right data pre-processing and the corresponding classifier is not a task for the quantum computer but for the data analyst. It will be interesting to see different classifiers implemented on quantum computers in the near future.

Back to Table of Contents

References

1. Long, G.-L. and Sun, Y., "Efficient scheme for initializing a quantum register with an arbitrary superposed state", Phys. Rev. A 64, 014303 (2001).