" Building wheel for promise (setup.py) ... \u001b[?25ldone\n",
"\u001b[33mWARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.\n",
"\u001b[?25h Created wheel for promise: filename=promise-2.3-py3-none-any.whl size=21495 sha256=9675d5917247a843670272d14b85c27811d276d4f06faeb1bc0a545e304f40d5\n",
" Stored in directory: /home/jovyan/.cache/pip/wheels/29/93/c6/762e359f8cb6a5b69c72235d798804cae523bbe41c2aa8333d\n",
"\u001b[33mWARNING: You are using pip version 20.2.4; however, version 22.0.3 is available.\n",
"You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
"You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
]
]
}
}
...
@@ -207,8 +192,8 @@
...
@@ -207,8 +192,8 @@
"name": "stdout",
"name": "stdout",
"output_type": "stream",
"output_type": "stream",
"text": [
"text": [
"\u001b[1mDownloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /home/jovyan/tensorflow_datasets/mnist/3.0.1...\u001b[0m\n",
"\u001b[1mDownloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to ~/tensorflow_datasets/mnist/3.0.1...\u001b[0m\n",
"\u001b[1mDataset mnist downloaded and prepared to /home/jovyan/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.\u001b[0m\n"
"\u001b[1mDataset mnist downloaded and prepared to ~/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.\u001b[0m\n"
]
]
}
}
],
],
...
@@ -396,8 +381,8 @@
...
@@ -396,8 +381,8 @@
"name": "stdout",
"name": "stdout",
"output_type": "stream",
"output_type": "stream",
"text": [
"text": [
"\u001b[1mDownloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to /home/jovyan/tensorflow_datasets/fashion_mnist/3.0.1...\u001b[0m\n",
"\u001b[1mDownloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to ~/tensorflow_datasets/fashion_mnist/3.0.1...\u001b[0m\n",
"\u001b[1mDataset fashion_mnist downloaded and prepared to /home/jovyan/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.\u001b[0m\n"
"\u001b[1mDataset fashion_mnist downloaded and prepared to ~/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.\u001b[0m\n"
]
]
}
}
],
],
...
@@ -1433,7 +1418,7 @@
...
@@ -1433,7 +1418,7 @@
"name": "stdout",
"name": "stdout",
"output_type": "stream",
"output_type": "stream",
"text": [
"text": [
"Got 192 / 500 correct => accuracy: 0.384000\n"
"Got 394 / 500 correct => accuracy: 0.788000\n"
]
]
}
}
],
],
...
...
%% Cell type:markdown id: tags:

# Exercise 1 - K-Nearest Neighbor Classifier for MNIST

%% Cell type:markdown id: tags:

In this exercise, we'll apply KNN classifiers to the MNIST dataset. The aim of the exercise is to get acquainted with the MNIST dataset.

This guide uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow.

%% Cell type:markdown id: tags:

## Install and import dependencies

We'll need [TensorFlow Datasets](https://www.tensorflow.org/datasets/), an API that simplifies downloading and accessing datasets and provides several sample datasets to work with. We're also using a few helper libraries.

%% Cell type:code id: tags:

``` python
!pip install -U tensorflow_datasets
```
%% Output

Requirement already up-to-date: tensorflow_datasets in /opt/conda/lib/python3.7/site-packages (4.6.0)
Building wheel for promise (setup.py) ... done
Created wheel for promise: filename=promise-2.3-py3-none-any.whl size=21495 sha256=9675d5917247a843670272d14b85c27811d276d4f06faeb1bc0a545e304f40d5
Stored in directory: /home/jovyan/.cache/pip/wheels/29/93/c6/762e359f8cb6a5b69c72235d798804cae523bbe41c2aa8333d
WARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.
You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.
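The notebook's import cell is not included in this excerpt. A minimal sketch of the imports the later cells rely on, assuming the usual aliases (`np`, `plt`, `tf`, `tfds`):

``` python
# Sketch only: the original import cell is not shown in this excerpt.
# These imports and aliases are assumptions based on what the later cells use.
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow_datasets as tfds
```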
%% Cell type:markdown id: tags:

This guide uses the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, often used as the "Hello, World" of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc.).

We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access MNIST directly from TensorFlow, using the [Datasets](https://www.tensorflow.org/datasets) API:

%% Output

Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to ~/tensorflow_datasets/mnist/3.0.1...
Dataset mnist downloaded and prepared to ~/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.
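The code cell that produced the download output above is not part of this excerpt. A minimal loading sketch, assuming the standard `tfds.load` call that returns supervised `(image, label)` pairs plus a `DatasetInfo` object:

``` python
# Sketch only: the notebook's actual loading cell is not shown here.
# tfds.load downloads MNIST on first use and caches it under ~/tensorflow_datasets.
dataset, metadata = tfds.load('mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
```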
%% Cell type:markdown id: tags:

Loading the dataset returns metadata as well as a *training dataset* and a *test dataset*.

* The model is trained using `train_dataset`.
* The model is tested against `test_dataset`.

The images are 28 $\times$ 28 arrays, with pixel values in the range `[0, 255]`. The *labels* are an array of integers in the range `[0, 9]`; these correspond to the handwritten digits.

Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:
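The corresponding cell is not shown in this excerpt. One hedged way to define the class names and pull the example counts from the dataset metadata (assuming `metadata` is the `DatasetInfo` object returned by `tfds.load(..., with_info=True)`):

``` python
# Sketch: for MNIST the class names are simply the ten digits.
class_names = [str(digit) for digit in range(10)]

# Example counts read from the dataset metadata; `metadata` is assumed to be
# the DatasetInfo object returned by tfds.load(..., with_info=True).
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
```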
Let's explore the format of the dataset before training the model. The following shows that there are 60,000 images in the training set and 10,000 images in the test set:

%% Cell type:code id: tags:

``` python
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples: {}".format(num_test_examples))
```

%% Output

Number of training examples: 60000
Number of test examples: 10000
%% Cell type:markdown id: tags:

Let's plot an image to see what it looks like.

%% Cell type:code id: tags:

``` python
# Take a single image, and remove the color dimension by reshaping
for image, label in test_dataset.take(1):
    break
image = image.numpy().reshape((28, 28))

# Plot the image - voila, an example of a handwritten digit
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
```

%% Output
%% Cell type:markdown id: tags:

Display the first 25 images from the *test set* and show the class name below each image. Verify that the data is in the correct format and that we're ready to build and train the network.

%% Cell type:code id: tags:

``` python
plt.figure(figsize=(10, 10))
i = 0
for (image, label) in test_dataset.take(25):
    image = image.numpy().reshape((28, 28))
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
    i += 1
plt.show()
```

%% Output
%% Cell type:markdown id: tags:

## Import the Fashion MNIST dataset

%% Cell type:markdown id: tags:

If numbers are not your thing, use the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 $\times$ 28 pixels), as seen here:

<b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>

You may use Fashion MNIST for variety, and because it's a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code.

We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access Fashion MNIST directly from TensorFlow, using the [Datasets](https://www.tensorflow.org/datasets) API:

%% Output

Downloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to ~/tensorflow_datasets/fashion_mnist/3.0.1...
Dataset fashion_mnist downloaded and prepared to ~/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.
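As with MNIST, the loading cell itself is not part of this excerpt; a corresponding sketch, again assuming the standard `tfds.load` call:

``` python
# Sketch only: same pattern as for MNIST, just with the fashion_mnist dataset name.
dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
```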
%% Cell type:markdown id: tags:

Loading the dataset returns metadata as well as a *training dataset* and a *test dataset*.

* The model is trained using `train_dataset`.
* The model is tested against `test_dataset`.

The images are 28 $\times$ 28 arrays, with pixel values in the range `[0, 255]`. The *labels* are an array of integers in the range `[0, 9]`; these correspond to the *class* of clothing the image represents:
<table>
  <tr>
    <th>Label</th>
    <th>Class</th>
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td>
  </tr>
  <tr>
    <td>2</td>
    <td>Pullover</td>
  </tr>
  <tr>
    <td>3</td>
    <td>Dress</td>
  </tr>
  <tr>
    <td>4</td>
    <td>Coat</td>
  </tr>
  <tr>
    <td>5</td>
    <td>Sandal</td>
  </tr>
  <tr>
    <td>6</td>
    <td>Shirt</td>
  </tr>
  <tr>
    <td>7</td>
    <td>Sneaker</td>
  </tr>
  <tr>
    <td>8</td>
    <td>Bag</td>
  </tr>
  <tr>
    <td>9</td>
    <td>Ankle boot</td>
  </tr>
</table>
Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:
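The cell that stores the names is not shown in this excerpt; given the label table above, a sketch would be:

``` python
# Sketch: class names taken from the label table above; the original cell is not shown.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# As before, example counts can be read from the tfds metadata (assumed DatasetInfo object).
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
```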
Let's explore the format of the dataset before training the model. The following shows that there are 60,000 images in the training set and 10,000 images in the test set:

%% Cell type:code id: tags:

``` python
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples: {}".format(num_test_examples))
```

%% Output

Number of training examples: 60000
Number of test examples: 10000
%% Cell type:markdown id: tags:

Let's plot an image to see what it looks like.

%% Cell type:code id: tags:

``` python
# Take a single image, and remove the color dimension by reshaping
for image, label in test_dataset.take(1):
    break
image = image.numpy().reshape((28, 28))

# Plot the image - voila, an article of clothing
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
```

%% Output
%% Cell type:markdown id: tags:

Display the first 25 images from the *test set* and show the class name below each image. Verify that the data is in the correct format and that we're ready to build and train the network.

%% Cell type:code id: tags:

``` python
plt.figure(figsize=(10, 10))
i = 0
for (image, label) in test_dataset.take(25):
    image = image.numpy().reshape((28, 28))
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
    i += 1
plt.show()
```

%% Output
%% Cell type:markdown id: tags:

Decide whether you want to work with the traditional or the Fashion MNIST dataset, then extract 5,000 training examples and 500 test examples, flattened into 784-dimensional vectors (see the sketch below).
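The extraction cell is only partially visible in this excerpt; below is one hedged way to build such subsets from the `tf.data` datasets. The names `x_train`, `y_train`, and `x_test` are assumptions (only `y_test` appears in the surviving code), chosen to match the `(500, 784)` shapes printed afterwards.

``` python
# Sketch: flatten the 28x28 images into 784-dimensional vectors and take a small
# subset (5,000 training / 500 test examples), matching the shapes printed below.
# Variable names other than y_test are assumptions.
import numpy as np

def to_numpy_subset(dataset, n):
    images, labels = [], []
    for image, label in dataset.take(n):
        images.append(image.numpy().reshape(-1))   # (784,)
        labels.append(label.numpy())
    return np.array(images, dtype=np.float32), np.array(labels)

x_train, y_train = to_numpy_subset(train_dataset, 5000)
x_test, y_test = to_numpy_subset(test_dataset, 500)
```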
print("Shape of test data labels : ",y_test.shape)
print("Shape of test data labels : ",y_test.shape)
```
```
%% Output
%% Output
Shape of image test data : (500, 784)
Shape of image test data : (500, 784)
Shape of test data labels : (500,)
Shape of test data labels : (500,)
%% Cell type:markdown id: tags:

# Exercises

1. Apply the Nearest Neighbour classifier with L1 distance to this subset of the dataset, determine the accuracy on the test dataset, and plot the confusion matrix.
2. Apply the K-Nearest Neighbour classifier with $k=5$ and L2 distance to this subset of the dataset, determine the accuracy on the test dataset, and plot the confusion matrix.
3. Determine by means of 5-fold cross-validation the best value of $k$ in the set $\{1, 4, 5, 10, 12, 18, 20\}$ (see the sketch after this list for one possible structure of the loop).
4. Scale the pixel values to the interval $[0, 1]$ and compute the test accuracy for the best value of $k$ determined in exercise 3.
5. Implement the cosine distance measure in the k-nearest neighbour classifier. The cosine distance between two vectors $a$ and $b$ can be computed by

```python
from numpy.linalg import norm
from numpy import dot

dists[a, b] = 1 - dot(a, b) / (norm(a) * norm(b))
```
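For exercise 3, this is not the notebook's own solution, just one possible sketch of the cross-validation loop. It assumes the flattened arrays `x_train` and `y_train` from the extraction sketch above and the `KNearestNeighbor` class defined in Solution 1 below (including its distance and label-prediction methods).

``` python
# Sketch of 5-fold cross-validation for choosing k (not the notebook's solution).
# Assumes x_train, y_train and the KNearestNeighbor class defined below.
import numpy as np

k_choices = [1, 4, 5, 10, 12, 18, 20]
num_folds = 5

x_folds = np.array_split(x_train, num_folds)
y_folds = np.array_split(y_train, num_folds)

accuracies = {}
for k in k_choices:
    fold_accs = []
    for i in range(num_folds):
        # Use fold i for validation and the remaining folds for training.
        x_val, y_val = x_folds[i], y_folds[i]
        x_tr = np.concatenate(x_folds[:i] + x_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])

        knn = KNearestNeighbor()
        knn.train(x_tr, y_tr)
        y_pred = knn.predict(x_val, k=k)
        fold_accs.append(np.mean(y_pred == y_val))
    accuracies[k] = np.mean(fold_accs)

best_k = max(accuracies, key=accuracies.get)
print("Cross-validation accuracies:", accuracies)
print("Best k:", best_k)
```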
%% Cell type:markdown id: tags:

# Solution 1

%% Cell type:code id: tags:

``` python
class KNearestNeighbor():
    """ a kNN classifier with L2 distance """

    def __init__(self):
        pass

    def train(self, X, y):
        """
        Train the classifier. For k-nearest neighbors this is just
        memorizing the training data.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data
          consisting of num_train samples each of dimension D.
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label for X[i].
        """
        self.X_train = X.astype('float')
        self.y_train = y

    def predict(self, X, k=1, num_loops=0):
        """
        Predict labels for test data using this classifier.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data consisting
          of num_test samples each of dimension D.
        - k: The number of nearest neighbors that vote for the predicted labels.
        - num_loops: Determines which implementation to use to compute distances
          between training points and testing points.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        if num_loops == 0:
            dists = self.compute_distances_no_loops(X)
        elif num_loops == 1:
            dists = self.compute_distances_one_loop(X)
        elif num_loops == 2:
            dists = self.compute_distances_two_loops(X)
        else:
            raise ValueError('Invalid value %d for num_loops' % num_loops)

        return self.predict_labels(dists, k=k)

    def compute_distances_two_loops(self, X):
        """
        Compute the distance between each test point in X and each
        training point in self.X_train using a nested loop over both
        the training data and the test data.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data.

        Returns:
        - dists: A numpy array of shape (num_test, num_train) where
          dists[i, j] is the Euclidean distance between the ith test