diff --git a/notebooks/Block_5/Jupyter Notebook Block 5 - Object Detection and Segmentation.ipynb b/notebooks/Block_5/Jupyter Notebook Block 5 - Object Detection and Segmentation.ipynb index a18828c356dfe43d38fe4fbf9f2c017c99725a0b..6e5a31745e81b142722fcb12449da372b1df07a6 100644 --- a/notebooks/Block_5/Jupyter Notebook Block 5 - Object Detection and Segmentation.ipynb +++ b/notebooks/Block_5/Jupyter Notebook Block 5 - Object Detection and Segmentation.ipynb @@ -2369,6 +2369,245 @@ "2. [Yolo](https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.\n", + "\n", + "It is challenging because it requires both successful object localization, in order to locate and draw a bounding box around each object in an image, and object classification, to predict the correct class of each localized object.\n", + "\n", + "The “You Only Look Once,” or YOLO, family of models is a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon et al. and first described in the 2015 paper [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/abs/1506.02640).\n", + "\n", + "The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet, based on VGG) that splits the input into a grid of cells, where each cell directly predicts a bounding box and an object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.\n", + "\n", + "At the time of writing, there are three main variations of the approach: YOLOv1, YOLOv2, and YOLOv3. 
The first version proposed the general architecture, the second refined the design and made use of predefined anchor boxes to improve bounding box proposals, and the third further refined the model architecture and training process.\n", + "\n", + "Although the accuracy of these models is close to, but not as good as, that of Region-Based Convolutional Neural Networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real time on video or camera feed input.\n", + "\n", + "A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Object Detection With YOLOv3\n", + "\n", + "The [keras-yolo3](https://github.com/experiencor/keras-yolo3) project provides a lot of capability for using YOLOv3 models, including object detection, transfer learning, and training new models from scratch.\n", + "\n", + "In this section, we will use a pre-trained model to perform object detection on an unseen photograph. This capability is available in a single Python file in the repository called [yolo3_one_file_to_detect_them_all.py](https://raw.githubusercontent.com/experiencor/keras-yolo3/master/yolo3_one_file_to_detect_them_all.py) that has about 435 lines. This script is, in fact, a program that will use pre-trained weights to prepare a model, use that model to perform object detection, and output the detection results. It also depends upon OpenCV.\n", + "\n", + "Instead of using this program directly, we will reuse elements from it and develop our own scripts to first prepare and save a Keras YOLOv3 model, and then load the model to make a prediction for a new photograph. Note that the pre-trained DarkNet weights file `yolov3.weights` (linked from the keras-yolo3 project) must be downloaded into the working directory first; otherwise loading the weights will fail with a `FileNotFoundError`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create a YOLOv3 Keras model and save it to file\n", + "# based on https://github.com/experiencor/keras-yolo3\n", + "import struct\n", + "import numpy as np\n", + "from keras.layers import Conv2D\n", + "from keras.layers import Input\n", + "from 
keras.layers import BatchNormalization\n", + "from keras.layers import LeakyReLU\n", + "from keras.layers import ZeroPadding2D\n", + "from keras.layers import UpSampling2D\n", + "from keras.layers.merge import add, concatenate\n", + "from keras.models import Model\n", + "\n", + "def _conv_block(inp, convs, skip=True):\n", + " x = inp\n", + " count = 0\n", + " for conv in convs:\n", + " if count == (len(convs) - 2) and skip:\n", + " skip_connection = x\n", + " count += 1\n", + " if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top\n", + " x = Conv2D(conv['filter'],\n", + " conv['kernel'],\n", + " strides=conv['stride'],\n", + " padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding as darknet prefer left and top\n", + " name='conv_' + str(conv['layer_idx']),\n", + " use_bias=False if conv['bnorm'] else True)(x)\n", + " if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)\n", + " if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)\n", + " return add([skip_connection, x]) if skip else x\n", + "\n", + "def make_yolov3_model():\n", + " input_image = Input(shape=(None, None, 3))\n", + " # Layer 0 => 4\n", + " x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},\n", + " {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},\n", + " {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},\n", + " {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])\n", + " # Layer 5 => 8\n", + " x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},\n", + " {'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},\n", + " {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 
'layer_idx': 7}])\n", + " # Layer 9 => 11\n", + " x = _conv_block(x, [{'filter': 64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},\n", + " {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])\n", + " # Layer 12 => 15\n", + " x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},\n", + " {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},\n", + " {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])\n", + " # Layer 16 => 36\n", + " for i in range(7):\n", + " x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},\n", + " {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])\n", + " skip_36 = x\n", + " # Layer 37 => 40\n", + " x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},\n", + " {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},\n", + " {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])\n", + " # Layer 41 => 61\n", + " for i in range(7):\n", + " x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},\n", + " {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])\n", + " skip_61 = x\n", + " # Layer 62 => 65\n", + " x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},\n", + " {'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},\n", + " {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])\n", + " # Layer 66 => 74\n", + " for i in range(3):\n", + " x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 
'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},\n", + " {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])\n", + " # Layer 75 => 79\n", + " x = _conv_block(x, [{'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},\n", + " {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},\n", + " {'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},\n", + " {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},\n", + " {'filter': 512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)\n", + " # Layer 80 => 82\n", + " yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 80},\n", + " {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)\n", + " # Layer 83 => 86\n", + " x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)\n", + " x = UpSampling2D(2)(x)\n", + " x = concatenate([x, skip_61])\n", + " # Layer 87 => 91\n", + " x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},\n", + " {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},\n", + " {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},\n", + " {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},\n", + " {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)\n", + " # Layer 92 => 94\n", + " yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 92},\n", + " {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], 
skip=False)\n", + " # Layer 95 => 98\n", + " x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 96}], skip=False)\n", + " x = UpSampling2D(2)(x)\n", + " x = concatenate([x, skip_36])\n", + " # Layer 99 => 106\n", + " yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 99},\n", + " {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 100},\n", + " {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 101},\n", + " {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 102},\n", + " {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 103},\n", + " {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 104},\n", + " {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)\n", + " model = Model(input_image, [yolo_82, yolo_94, yolo_106])\n", + " return model\n", + "\n", + "class WeightReader:\n", + " def __init__(self, weight_file):\n", + " with open(weight_file, 'rb') as w_f:\n", + " major,\t= struct.unpack('i', w_f.read(4))\n", + " minor,\t= struct.unpack('i', w_f.read(4))\n", + " revision, = struct.unpack('i', w_f.read(4))\n", + " if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:\n", + " w_f.read(8)\n", + " else:\n", + " w_f.read(4)\n", + " transpose = (major > 1000) or (minor > 1000)\n", + " binary = w_f.read()\n", + " self.offset = 0\n", + " self.all_weights = np.frombuffer(binary, dtype='float32')\n", + "\n", + " def read_bytes(self, size):\n", + " self.offset = self.offset + size\n", + " return self.all_weights[self.offset-size:self.offset]\n", + "\n", + " def load_weights(self, model):\n", + " for i in range(106):\n", + " try:\n", + " conv_layer = model.get_layer('conv_' + str(i))\n", + " print(\"loading weights of convolution #\" + 
str(i))\n", + " if i not in [81, 93, 105]:\n", + " norm_layer = model.get_layer('bnorm_' + str(i))\n", + " size = np.prod(norm_layer.get_weights()[0].shape)\n", + " beta = self.read_bytes(size) # bias\n", + " gamma = self.read_bytes(size) # scale\n", + " mean = self.read_bytes(size) # mean\n", + " var = self.read_bytes(size) # variance\n", + " weights = norm_layer.set_weights([gamma, beta, mean, var])\n", + " if len(conv_layer.get_weights()) > 1:\n", + " bias = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))\n", + " kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))\n", + " kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))\n", + " kernel = kernel.transpose([2,3,1,0])\n", + " conv_layer.set_weights([kernel, bias])\n", + " else:\n", + " kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))\n", + " kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))\n", + " kernel = kernel.transpose([2,3,1,0])\n", + " conv_layer.set_weights([kernel])\n", + " except ValueError:\n", + " print(\"no convolution #\" + str(i))\n", + "\n", + " def reset(self):\n", + " self.offset = 0\n", + "\n", + "# define the model\n", + "model = make_yolov3_model()\n", + "# load the model weights\n", + "weight_reader = WeightReader('yolov3.weights')\n", + "# set the model weights into the model\n", + "weight_reader.load_weights(model)\n", + "# save the model to file\n", + "model.save('model.h5')" + ] + }, { "cell_type": "markdown", "metadata": {
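The `WeightReader` class above first parses a small binary header (three `int32` fields for major, minor, and revision, followed by a seen-images counter whose width depends on the version) before treating the remainder of the file as raw `float32` weights. That layout can be exercised in isolation with synthetic bytes, without the real `yolov3.weights` file (a sketch, not part of keras-yolo3):

```python
import struct
import numpy as np

# build a synthetic Darknet-style weight file in memory:
# three int32 header fields, a seen-images counter, then float32 weights
major, minor, revision = 0, 2, 0
header = struct.pack('iii', major, minor, revision)
header += struct.pack('q', 32013312)   # (major*10 + minor) >= 2 -> 8-byte counter
weights = np.arange(6, dtype='float32').tobytes()
blob = header + weights

# parse it the same way WeightReader does
m, n, r = struct.unpack('iii', blob[:12])
offset = 12 + (8 if ((m * 10 + n) >= 2 and m < 1000 and n < 1000) else 4)
all_weights = np.frombuffer(blob[offset:], dtype='float32')
print(all_weights)  # -> [0. 1. 2. 3. 4. 5.]
```

`read_bytes` then simply walks an offset forward through `all_weights`, handing consecutive slices to the batch-norm and convolution layers in network order.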
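As noted in the introduction, the network produces a large number of candidate bounding boxes that must be consolidated by a post-processing step; the usual technique is non-maximum suppression (NMS) over the boxes' overlap. A minimal NumPy sketch of the idea (illustrative function names, not the keras-yolo3 API):

```python
import numpy as np

def iou(box_a, box_b):
    # boxes are (x1, y1, x2, y2); returns intersection-over-union
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    # greedily keep the highest-scoring box, drop boxes overlapping it too much
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        order = np.array([j for j in order[1:] if iou(boxes[i], boxes[j]) < iou_thresh])
    return keep

# two heavily overlapping candidates and one separate box
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # -> [0, 2]
```

The two overlapping candidates collapse to the single higher-scoring box, while the distant box survives untouched; class-aware detectors typically run this per class.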