{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dWyPGNkCGhIX"
},
"source": [
"# Part I : Create Your Own Dataset and Train it with ConvNets\n",
"\n",
"In this part of the notebook, you will set up your own dataset for image classification. Please specify \n",
"under `queries` the image categories you are interested in. Under `limit` specify the number of images \n",
"you want to download for each image category. \n",
"\n",
"You do not need to understand the class `simple_image_download`, just execute the cell after you have specified \n",
"the download folder.\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "8rckz3ZuGhIc",
"outputId": "6f615f06-759a-4eea-839e-658155df8d36"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"Saved 2 images\n",
"Found 2 image links\n",
"ERROR - Could not save https://upload.wikimedia.org/wikipedia/commons/5/59/Marion_Cotillard_at_2019_Cannes.jpg - cannot identify image file <_io.BytesIO object at 0x7f1a0b4d6d70>\n",
"Saved 1 images\n"
}
],
"from selenium import webdriver\n",
"from selenium.webdriver.firefox.options import Options\n",
"from Image_crawling import Image_crawling\n",
"queries = [\"brad pitt\",\"johnny depp\", \"leonardo dicaprio\", \"robert de niro\", \"angelina jolie\", \"sandra bullock\", \"catherine deneuve\", \"marion cotillard\"]\n",
"#queries = [\"Bart Simpson\",\"Homer Simpson\"]\n",
"download_folder = \"./brandnew_images/train/\"\n",
"waittime = 0.1 # Time to wait between actions, depends on the number of pictures you want to crawl. More pictures means you need to wait longer for them to load. \n",
"# Set options\n",
"options = webdriver.FirefoxOptions()\n",
"options.add_argument('--headless')\n",
"# Create Driver\n",
"driver = webdriver.Firefox(options=options, executable_path=\"/usr/bin/geckodriver\")\n",
"# create instance of crawler\n",
"image_crawling = Image_crawling(driver, waittime=waittime)\n",
"for query in queries:\n",
" # Craws image urls:\n",
" image_urls = image_crawling.fetch_image_urls(query, limit)\n",
" \n",
" # download images\n",
" image_crawling.download_image(download_folder + query)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CRHl9UX6GhIs"
},
"source": [
"Please check carefully the downloaded images, there may be a lot of garbage! You definitely need to \n",
"clean the data.\n",
"\n",
"In the following, you will apply data augmentation to your data set."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "3SX21FtcGhIu"
},
"outputs": [],
"source": [
"# General imports\n",
"import tensorflow as tf\n",
"tf.compat.v1.enable_eager_execution(\n",
" config=None, device_policy=None, execution_mode=None\n",
")\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Shortcuts to keras if (however from tensorflow)\n",
"from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Conv2D, MaxPooling2D\n",
"from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense\n",
"from tensorflow.keras.callbacks import TensorBoard \n",
"\n",
"# Shortcut for displaying images\n",
"def plot_img(img):\n",
" plt.imshow(img, cmap='gray')\n",
" plt.axis(\"off\")\n",
" plt.show()\n",
" \n",
"# The target image size can be fixed here (quadratic)\n",
"# the ImageDataGenerator() automatically scales the images accordingly (aspect ratio is changed)\n",
"image_size = 150"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "rN_Mp1rmGhI1",
"outputId": "6417b1f9-e7d4-4d56-a213-191f9d17524a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 480 images belonging to 8 classes.\n"
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)"
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# These are the class names; this defines the ordering of the classes\n",
"class_names = [\"brad pitt\", \"johnny depp\", \"leonardo dicaprio\", \"robert de niro\",\n",
" \"angelina jolie\", \"sandra bullock\", \"catherine deneuve\", \"marion cotillard\"]\n",
"\n",
"\n",
"# Class ImageDataGenerator() returns an iterator holding one batch of images\n",
"# the constructor takes arguments defining the different image transformations\n",
"# for augmentation purposes (rotation, x-/y-shift, intensity scaling - here 1./255 \n",
"# to scale range to [0, 1], shear, zoom, flip, ... )\n",
"train_datagen = ImageDataGenerator(\n",
" rotation_range=10,\n",
" width_shift_range=0.2,\n",
" height_shift_range=0.2,\n",
" rescale=1./255,\n",
" shear_range=0.2,\n",
" zoom_range=0.2,\n",
" horizontal_flip=True,\n",
" fill_mode='nearest')\n",
"\n",
"\n",
"dir_iter = train_datagen.flow_from_directory('./train/', \n",
" target_size=(image_size, image_size),\n",
" classes=class_names,\n",
" batch_size=25, class_mode='sparse', shuffle=False)\n",
"dir_iter[0][1]"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "V2fYccc8GhJF"
},
"source": [
"Before you continue, you need to split the downloaded images into a `train` folder and into a `validation` folder."
]
},
{
"cell_type": "raw",
"metadata": {
"colab_type": "raw",
"id": "VamXG4FoGhJH"
},
"source": [
"./\n",
"├── train\n",
"│ ├── brad pitt\n",
"│ └── johnny deep\n",
"| ├── leonardo di caprio\n",
"| └── ...\n",
"│ \n",
"└── validation\n",
" ├── brad pitt\n",
" ├── johnny deep\n",
" ├── leonardo di caprio\n",
" └── ..."
]
},
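The split described above can be scripted. The following is a minimal sketch, assuming each class has its own subfolder under the training folder and that moving a fixed number of randomly chosen files per class is acceptable. The function name `split_train_validation` and the parameter `n_valid_per_class` are hypothetical and not part of the notebook.

```python
import random
import shutil
from pathlib import Path


def split_train_validation(train_dir, validation_dir, n_valid_per_class=10, seed=0):
    """Move n_valid_per_class randomly chosen files of every class folder
    from train_dir into a folder of the same name under validation_dir."""
    train_dir, validation_dir = Path(train_dir), Path(validation_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    moved = {}
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        target = validation_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        selected = files[:n_valid_per_class]
        for f in selected:
            shutil.move(str(f), str(target / f.name))
        moved[class_dir.name] = len(selected)
    return moved
```

If every actor has 70 cleaned images, `split_train_validation("./train", "./validation", n_valid_per_class=10)` would produce the 480/80 split used later in this notebook.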
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9322su6vGhJJ"
},
"source": [
"If you want to use the example of this jupyter notebook, you can use the images provided in the ./train and ./validation folders."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xPqJWgeAGhJL"
},
"source": [
"## Define a ConvNet Model"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "UuJV4JBKGhJO"
},
"source": [
"batch_size = 20\n",
"num_train_images = 480\n",
"num_valid_images = 80\n",
"num_classes = 8\n",
"\n",
"model_scratch = Sequential()\n",
"model_scratch.add(Conv2D(32, (3, 3), input_shape=(image_size, image_size, 3)))\n",
"model_scratch.add(Activation('relu'))\n",
"model_scratch.add(MaxPooling2D(pool_size=(2, 2)))\n",
"\n",
"model_scratch.add(Conv2D(32, (3, 3)))\n",
"model_scratch.add(Activation('relu'))\n",
"model_scratch.add(MaxPooling2D(pool_size=(2, 2)))\n",
"\n",
"model_scratch.add(Conv2D(64, (3, 3)))\n",
"model_scratch.add(Activation('relu'))\n",
"model_scratch.add(MaxPooling2D(pool_size=(2, 2)))\n",
"\n",
"# this converts our 3D feature maps to 1D feature vectors\n",
"model_scratch.add(Flatten()) \n",
"model_scratch.add(Dense(64))\n",
"model_scratch.add(Activation('relu'))\n",
"model_scratch.add(Dropout(0.5))\n",
"model_scratch.add(Dense(num_classes))\n",
"model_scratch.add(Activation('softmax'))\n",
"\n",
"model_scratch.compile(loss='categorical_crossentropy',\n",
" optimizer='adam',\n",
" metrics=['accuracy'])\n",
"\n"
]
},
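The layer shapes and parameter counts of `model_scratch` can be traced by hand, which helps when checking the output of `model_scratch.summary()`. The sketch below assumes the Keras defaults used above (`padding='valid'` and stride 1 for `Conv2D`, non-overlapping 2x2 windows for `MaxPooling2D`); the helper names are made up for illustration.

```python
def conv_valid(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Spatial size after non-overlapping max pooling (rounds down)."""
    return size // pool


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


def trace_model_scratch(image_size=150, num_classes=8):
    size, channels, params = image_size, 3, 0
    for filters in (32, 32, 64):          # the three Conv2D + MaxPooling2D stages
        params += conv_params(3, channels, filters)
        size = max_pool(conv_valid(size))
        channels = filters
    flat = size * size * channels          # length of the Flatten() output
    params += dense_params(flat, 64) + dense_params(64, num_classes)
    return size, flat, params


print(trace_model_scratch())  # (17, 18496, 1212968)
```

Almost all of the parameters sit in the first `Dense` layer, because it connects the 18,496 flattened features to 64 units.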
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "JFdkIokMGhJT",
"outputId": "63e7d032-4083-4fe0-d970-c10bf0c39a94"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 480 images belonging to 8 classes.\n",
"Found 80 images belonging to 8 classes.\n"
]
}
],
"source": [
"# This is the augmentation configuration we will use for training\n",
"train_datagen = ImageDataGenerator(\n",
" rescale=1./255,\n",
" shear_range=0.2,\n",
" zoom_range=0.2,\n",
" horizontal_flip=True)\n",
"\n",
"# This is the augmentation configuration we will use for validation:\n",
"# only rescaling\n",
"validation_datagen = ImageDataGenerator(rescale=1./255)\n",
"\n",
"# This is a generator that will read pictures found in\n",
"# subfolers of './train', and indefinitely generate\n",
"# batches of augmented image data\n",
"train_generator = train_datagen.flow_from_directory(\n",
" './train', # this is the target directory\n",
" target_size=(image_size, image_size), # all images will be resized to 150x150\n",
" classes=class_names,\n",
" batch_size=batch_size) \n",
"\n",
"# This is a similar generator, for validation data\n",
"validation_generator = validation_datagen.flow_from_directory(\n",
" './validation',\n",
" target_size = (image_size, image_size),\n",
" classes = class_names,\n",
" batch_size = batch_size)"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "cytHiQUTGhJb"
},
"outputs": [],
"source": [
"logdir = os.path.join(\"logs\", datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n",
"tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "C7dCbyXPGhJg",
"outputId": "98b4085e-ed6d-43e2-831f-aec32161583f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2/24 [=>............................] - ETA: 34s - loss: 2.1133 - accuracy: 0.0500 "
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.7/site-packages/PIL/Image.py:952: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images\n",
" \"Palette images with Transparency expressed in bytes should be \"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"24/24 [==============================] - 33s 1s/step - loss: 2.0931 - accuracy: 0.1250 - val_loss: 2.0772 - val_accuracy: 0.1500\n",
"18/24 [=====================>........] - ETA: 6s - loss: 2.0769 - accuracy: 0.1333"
]
}
],
"source": [
"history = model_scratch.fit(\n",
" train_generator,\n",
" steps_per_epoch = num_train_images // batch_size,\n",
" epochs = 20,\n",
" validation_data = validation_generator,\n",
" validation_steps = num_valid_images // batch_size,\n",
" callbacks = [tensorboard_callback])"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "wt_ONw5PGhJm",
"outputId": "e75d8a73-da49-4dbe-ffcf-7cb316be39a2"
},
"source": [
"plt.plot(history.history['accuracy'])\n",
"plt.plot(history.history['val_accuracy'])\n",
"plt.title('model accuracy')\n",
"plt.ylabel('accuracy')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='lower right')\n",
"plt.show()\n",
"plt.plot(history.history['loss'])\n",
"plt.plot(history.history['val_loss'])\n",
"plt.title('model loss')\n",
"plt.ylabel('loss')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tensorboard"
]
},
{
"cell_type": "code",
"# Load the TensorBoard notebook extension\n",
"os.makedirs(logdir, exist_ok=True)\n",
"%tensorboard --logdir logs"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Y8oAT4oUGhJs"
},
"source": [
"# Part II : Transfer Learning\n",
"\n",
"\n",
"Having to train an image-classification model using very little data is a common situation,\n",
"which you’ll likely encounter in practice if you ever do computer vision in a\n",
"professional context. A “few” samples can mean anywhere from a few hundred to a\n",
"few tens of thousands of images. As a practical example, we’ll focus on classifying\n",
"560 images belongig to 8 actors. We’ll use 480 pictures for training, and 80 for validation.\n",
"\n",
"## 2.1 Feature Extraction with a Pretrained Model\n",
"Feature extraction consists of using the representations learned by a previously\n",
"trained model to extract interesting features from new samples. These features are\n",
"then run through a new classifier, which is trained from scratch.\n",
"As you saw previously, ConvNets used for image classification comprise two parts:\n",
"they start with a series of pooling and convolution layers, and they end with a densely\n",
"connected classifier. The first part is called the _convolutional base_ of the model. In the\n",
"case of convnets, feature extraction consists of taking the convolutional base of a previously\n",
"trained network, running the new data through it, and training a new classifier\n",
"on top of the output.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"source": [
"from IPython.display import Image\n",
"Image(\"./Images/feature_extraction.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why only reuse the convolutional base? Could we reuse the densely connected\n",
"classifier as well? In general, doing so should be avoided. The reason is that the representations\n",
"learned by the convolutional base are likely to be more generic and, therefore,\n",
"more reusable: the feature maps of a ConvNet are presence maps of generic\n",
"concepts over a picture, which are likely to be useful regardless of the computer vision\n",
"problem at hand. But the representations learned by the classifier will necessarily be\n",
"specific to the set of classes on which the model was trained—they will only contain\n",
"information about the presence probability of this or that class in the entire picture.\n",
"Additionally, representations found in densely connected layers no longer contain any information about where objects are located in the input image; these layers get rid of\n",
"the notion of space, whereas the object location is still described by convolutional feature\n",
"maps. For problems where object location matters, densely connected features\n",
"are largely useless.\n",
"\n",
"\n",
"Note that the level of generality (and therefore reusability) of the representations\n",
"extracted by specific convolution layers depends on the depth of the layer in the\n",
"model. Layers that come earlier in the model extract local, highly generic feature\n",
"maps (such as visual edges, colors, and textures), whereas layers that are higher up\n",
"extract more-abstract concepts (such as “cat ear” or “dog eye”). So if your new dataset\n",
"differs a lot from the dataset on which the original model was trained, you may be better\n",
"off using only the first few layers of the model to do feature extraction, rather than\n",
"using the entire convolutional base.\n",
"\n",
"\n",
"\n",
"In this case, because the ImageNet class set does not contain images of actors, we’ll \n",
"choose not to use the densely connected layers, in order to cover\n",
"the more general case where the class set of the new problem doesn’t overlap the\n",
"class set of the original model. Let’s put this into practice by using the convolutional\n",
"base of the VGG16 network, trained on ImageNet, to extract interesting features\n",
"from actors, and then train a classifier for the 8 actors on top of\n",
"these features.\n",
"\n",
"The VGG16 model, among others, comes prepackaged with Keras. You can import\n",
"it from the `keras.applications` module. Many other image-classification models (all\n",
"pretrained on the ImageNet dataset) are available as part of `keras.applications`:\n",
"\n",
"\n",
"- Xception\n",
"- ResNet\n",
"- MobileNet\n",
"- EfficientNet\n",
"- DenseNet\n",
"- etc.\n",
"\n",
"Let's instantiate the VGG16 model."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "4Luec7pbGhJv",
"scrolled": true
},
"outputs": [],
"source": [
"# General imports\n",
"import sys\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import confusion_matrix\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"\n",
"# Shortcuts to keras if (however from tensorflow)\n",
"from tensorflow.keras import applications\n",
"from tensorflow.keras import optimizers\n",
"from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Conv2D, MaxPool2D\n",
"from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense\n",
"from tensorflow.keras.callbacks import TensorBoard \n",
"\n",
"# Shortcut for displaying images\n",
"def plot_img(img):\n",
" plt.imshow(img, cmap='gray')\n",
" plt.axis(\"off\")\n",
" plt.show()\n",
" \n",
"# The target image size can be fixed here (quadratic)\n",
"# The ImageDataGenerator() automatically scales the images accordingly (aspect ratio is changed)\n",
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "eRes_n9BGhJ0"
},
"conv_base = keras.applications.vgg16.VGG16(weights=\"imagenet\",\n",
" include_top=False,\n",
" input_shape=(image_size, image_size, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vEIWLeqSGhJ5"
},
"source": [
"You pass three arguments to the constructor:\n",
"\n",
"- `weights` specifies the weight checkpoint from which to initialize the model.\n",
"\n",
"- `include_top` refers to including (or not) the densely connected classifier on\n",
"top of the network. By default, this densely connected classifier corresponds to\n",
"the 1'000 classes from ImageNet. Because we intend to use our own densely\n",
"connected classifier (with 8 classes of actors), we don’t need to\n",
"include it.\n",
"- `input_shape` is the shape of the image tensors that we’ll feed to the network.\n",
"This argument is purely optional: if we don’t pass it, the network will be able to\n",
"process inputs of any size. Here we pass it so that we can visualize (in the following\n",
"summary) how the size of the feature maps shrinks with each new convolution\n",
"and pooling layer."
]
},
{
"cell_type": "markdown",
"Here’s the detail of the architecture of the VGG16 convolutional base. It’s similar to\n",
"the simple convnets you’re already familiar with:"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DBSrhVORGhKH"
},
"source": [
"\n",
"The final feature map (output volume) has shape $(5, 5, 512)$. That's the feature on top of which we will stick a densely connected classifier.\n",
"\n",
"At this point, there are two ways how we could proceed:\n",
"\n",
"- __Approach 1__: Run the convolutional base over our dataset, record its output to a NumPy array\n",
"on disk, and then use this data as input to a standalone, densely connected classifier\n",
"similar to those you saw in Block 4 of this course. This solution is fast and\n",
"cheap to run, because it only requires running the convolutional base once for\n",
"every input image, and the convolutional base is by far the most expensive part\n",
"of the pipeline. But for the same reason, this technique won’t allow us to use\n",
"data augmentation.\n",
"\n",
"- __Approach 2__: Extend the model we have (`conv_base`) by adding `Dense` layers on top, and run\n",
"the whole thing from end to end on the input data. This will allow us to use\n",
"data augmentation, because every input image goes through the convolutional\n",
"base every time it’s seen by the model. But for the same reason, this technique is\n",
"far more expensive than the first.\n",
"We’ll cover both techniques. Let’s walk through the code required to set up the first\n",
"one: recording the output of `conv_base` on our data and using these outputs as inputs\n",
"to a new model."
]
},
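The $(5, 5, 512)$ shape quoted above can be checked by hand. The VGG16 convolutional base uses `same`-padded convolutions, which keep the spatial size, and five 2x2 max-pooling stages, each of which halves it (rounding down); the last block has 512 filters. The sketch below assumes a 180x180 input, as used for the model input later in this notebook; the function name is illustrative only.

```python
def vgg16_base_output_shape(height, width):
    """Output shape of the VGG16 convolutional base (include_top=False).

    Only the five max-pooling stages change the spatial size; each one
    halves height and width, rounding down.
    """
    for _ in range(5):
        height //= 2
        width //= 2
    return (height, width, 512)


print(vgg16_base_output_shape(180, 180))  # (5, 5, 512)
print(vgg16_base_output_shape(150, 150))  # (4, 4, 512)
```

With the 150x150 images from Part I, the base would produce $(4, 4, 512)$ feature maps instead, so the classifier input shape has to match the image size actually used.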
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "mlpIDmSCGhKI"
},
"source": [
"### 1. Approach : Fast feature extraction without data augmentation\n",
"We’ll start by extracting features as NumPy arrays by calling the `predict()` method of\n",
"the `conv_base` model on our training, and validation datasets.\n",
"Let’s iterate over our datasets to extract the VGG16 features."
]
},
{
"cell_type": "code",
"from tensorflow.keras.utils import image_dataset_from_directory\n",
"train_dataset = image_dataset_from_directory(\n",
" './train',\n",
" batch_size=32,\n",
" label_mode=\"categorical\")\n",
"validation_dataset = image_dataset_from_directory(\n",
" './validation',\n",
" batch_size=32,\n",
" label_mode=\"categorical\")"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"def get_features_and_labels(dataset):\n",
" all_features = []\n",
" all_labels = []\n",
" for images, labels in dataset:\n",
" preprocessed_images = keras.applications.vgg16.preprocess_input(images)\n",
" features = conv_base.predict(preprocessed_images)\n",
" all_features.append(features)\n",
" all_labels.append(labels)\n",
" return np.concatenate(all_features), np.concatenate(all_labels)\n",
"train_features, train_labels = get_features_and_labels(train_dataset)\n",
"val_features, val_labels = get_features_and_labels(validation_dataset)"
]
},
"Importantly, `predict()` only expects images, not labels, but our current dataset yields\n",
"batches that contain both images and their labels. Moreover, the VGG16 model expects\n",
"inputs that are preprocessed with the function `keras.applications.vgg16.preprocess_input`, which scales pixel values to an appropriate range.\n",
"The extracted features are currently of shape `(samples, 5, 5, 512)`:"
]
},
{
"cell_type": "code",
]
},
{
"cell_type": "markdown",
"And the labels are now referring to the order of the folders"
]
},
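Unless `class_names` is passed, `image_dataset_from_directory` assigns class indices by sorting the folder names alphanumerically, so the indices here differ from the `class_names` order used in Part I. The following plain-Python sketch mimics that mapping and the one-hot vectors produced by `label_mode="categorical"`; the helper names are illustrative and not part of Keras.

```python
def class_indices(folder_names):
    """Map each class folder to its index, using sorted folder names."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


def one_hot(index, num_classes):
    """One-hot vector with a 1.0 at the given class index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


folders = ["brad pitt", "johnny depp", "leonardo dicaprio", "robert de niro",
           "angelina jolie", "sandra bullock", "catherine deneuve", "marion cotillard"]

indices = class_indices(folders)
print(indices["angelina jolie"])         # 0
print(indices["brad pitt"])              # 1
print(one_hot(indices["brad pitt"], 8))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Keep this ordering in mind when interpreting predictions or a confusion matrix: index 0 is "angelina jolie" here, not "brad pitt" as in Part I.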
{
"cell_type": "code",
]
},
{
"cell_type": "code",
"print(val_features.shape)\n",
"print(val_labels.shape)"
]
},
{
"cell_type": "code",
"# Note the use of the Flatten\n",
"# layer before passing the\n",
"# features to a Dense layer\n",
"x = layers.Flatten()(inputs)\n",
"x = layers.Dense(256)(x)\n",
"x = layers.Dropout(0.7)(x)\n",
"outputs = layers.Dense(8, activation=\"softmax\")(x)\n",
"model = keras.Model(inputs, outputs)"
]
},
{
"cell_type": "code",
"model.summary()"
]
},
{
"cell_type": "code",
"source": [
"model.compile(loss=\"categorical_crossentropy\",\n",
" optimizer=\"rmsprop\",\n",
" metrics=[\"accuracy\"])\n",
"\n",
"logdir = os.path.join(\"logs\", datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n",
"\n",
"\n",
"callbacks = [\n",
" keras.callbacks.ModelCheckpoint(filepath=\"feature_extraction.keras\", save_best_only=True, monitor=\"val_loss\"),\n",
" tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)\n",
"]\n",
"\n",
"history = model.fit(\n",
"train_features, train_labels,\n",
"epochs=30,\n",
"validation_data=(val_features, val_labels),\n",
"callbacks=callbacks\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we’ll also use a `ModelCheckpoint` callback to save the model after each\n",
"epoch. We’ll configure it with the path specifying where to save the file, as well as the\n",
"arguments `save_best_only=True` and `monitor=\"val_loss\"`: they tell the callback to\n",
"only save a new file (overwriting any previous one) when the current value of the\n",
"`val_loss` metric is lower than at any previous time during training. This guarantees\n",
"that your saved file will always contain the state of the model corresponding to its bestperforming\n",
"training epoch, in terms of its performance on the validation data. As a\n",
"result, we won’t have to retrain a new model for a lower number of epochs if we start\n",
"overfitting: we can just reload our saved file."
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s look at the loss and accuracy curves during training:"
]
},
{
"cell_type": "code",
"source": [
"plt.plot(history.history['accuracy'])\n",
"plt.plot(history.history['val_accuracy'])\n",
"plt.title('model accuracy')\n",
"plt.ylabel('accuracy')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='lower right')\n",
"plt.show()\n",
"plt.plot(history.history['loss'])\n",
"plt.plot(history.history['val_loss'])\n",
"plt.title('model loss')\n",
"plt.ylabel('loss')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We reach a validation accuracy of about 32% — much worse than we achieved in the\n",
"previous section with the small model trained from scratch. \n",
"\n",
"The learning curves indicate that we’re overfitting almost from the start—\n",
"despite using dropout with a fairly large rate. That’s because this technique doesn’t\n",
"use data augmentation, which is essential for preventing overfitting with small image\n",
"datasets."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the TensorBoard notebook extension\n",
"%load_ext tensorboard\n",
"\n",
"%tensorboard --logdir logs"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DJT-DgHvGhKu"
},
"source": [
"### 2. Approach : Feature Extraction with Data Augmentation\n",
"\n",
"\n",
"Now let’s review the second technique we mentioned for doing feature extraction,\n",
"which is much slower and more expensive, but which allows us to use data augmentation\n",
"during training: creating a model that chains the `conv_base` with a new dense\n",
"classifier, and training it end to end on the inputs.\n",
"In order to do this, we will first freeze the convolutional base. Freezing a layer or set of\n",
"layers means preventing their weights from being updated during training. If we don’t\n",
"do this, the representations that were previously learned by the convolutional base will\n",
"be modified during training. Because the Dense layers on top are randomly initialized,\n",
"very large weight updates would be propagated through the network, effectively\n",
"destroying the representations previously learned.\n",
"\n",
"In Keras, we freeze a layer or model by setting its trainable attribute to `False`. "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "50DF9pH1GhKw"
},
"source": [
"#### Instantiating and freezing the VGG16 convolutional base"
]
},
{
"cell_type": "code",
"conv_base = keras.applications.vgg16.VGG16(weights=\"imagenet\", include_top=False)\n",
"conv_base.trainable = False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Setting trainable to `False` empties the list of trainable weights of the layer or model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing the list of trainable weights before and after freezing"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"conv_base.trainable = True"
]
},
{
"cell_type": "code",
"source": [
"print(\"This is the number of trainable weights before freezing the conv base:\", len(conv_base.trainable_weights))"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"conv_base.trainable = False"
]
},
{
"cell_type": "code",
"source": [
"print(\"This is the number of trainable weights after freezing the conv base:\", len(conv_base.trainable_weights))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create a new model that chains together\n",
"\n",
"1. A data augmentation stage\n",
"2. Our frozen convolutional base \n",
"3. A dense classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Adding a data augmentation stage and a classifier to the convolutional base"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_augmentation = keras.Sequential(\n",
"[\n",
"layers.RandomFlip(\"horizontal\"),\n",
"layers.RandomRotation(0.1),\n",
"layers.RandomZoom(0.2),\n",
"]\n",
")"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"inputs = keras.Input(shape=(180, 180, 3))\n",
"# Apply data augmentation\n",
"x = data_augmentation(inputs)\n",
"# Apply input value scaling\n",
"x = keras.applications.vgg16.preprocess_input(x)\n",
"x = conv_base(x)\n",
"x = layers.Flatten()(x)\n",
"x = layers.Dense(256)(x)\n",
"x = layers.Dropout(0.5)(x)\n",
"outputs = layers.Dense(8, activation=\"softmax\")(x)\n",
"model = keras.Model(inputs, outputs)\n",
"model.compile(loss=\"categorical_crossentropy\",\n",
" optimizer=\"rmsprop\",\n",
" metrics=[\"accuracy\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this setup, only the weights from the two Dense layers that we added will be\n",
"trained. That’s a total of four weight tensors: two per layer (the main weight matrix\n",
"and the bias vector). \n",
"\n",
"Note that in order for these changes to take effect, you must first\n",
"compile the model. If you ever modify weight trainability after compilation, you\n",
"should then recompile the model, or these changes will be ignored.\n",
"\n",
"Let’s train our model. Thanks to data augmentation, it will take much longer for\n",
"the model to start overfitting, so we can train for more epochs — let’s do 50.\n",
"\n",
"__NOTE__ This technique is expensive enough that you should only attempt it if\n",
"you have access to a GPU (such as the free GPU available in Colab) — it’s\n",
"intractable on CPU. If you can’t run your code on GPU, then the previous\n",
"technique is the way to go."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"logdir = os.path.join(\"logs\", datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n",
"\n",
"\n",
" keras.callbacks.ModelCheckpoint(filepath=\"feature_extraction_with_augmentation.keras\", save_best_only=True, monitor=\"val_loss\"),\n",
" tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)\n",
"]\n"
]
},
{
"cell_type": "code",
"source": [
"history = model.fit(\n",
"train_dataset,\n",
"epochs=50,\n",
"validation_data=validation_dataset,\n",
"callbacks=callbacks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s plot the results again. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(history.history['accuracy'])\n",
"plt.plot(history.history['val_accuracy'])\n",
"plt.title('model accuracy')\n",
"plt.ylabel('accuracy')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='lower right')\n",
"plt.show()\n",
"plt.plot(history.history['loss'])\n",
"plt.plot(history.history['val_loss'])\n",
"plt.title('model loss')\n",
"plt.ylabel('loss')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we reach a validation accuracy of over 98%. This is a strong improvement over the previous model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the TensorBoard notebook extension\n",
"%load_ext tensorboard\n",
"%tensorboard --logdir logs"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"## Fine Tuning\n",
"\n",
"Another widely used technique for model reuse, complementary to feature extraction, is _fine-tuning_.\n",
"Fine-tuning consists of unfreezing a few of the top layers of a frozen model base used\n",
"for feature extraction, and jointly training both the newly added part of the model (in this case, the\n",
"fully connected classifier) and these top layers. This is called _fine-tuning_ because it slightly\n",
"adjusts the more abstract representations of the model being reused in order to make them more relevant for the problem at hand.\n",
"\n",
"I stated earlier that it’s necessary to freeze the convolutional base of VGG16 in order to be able to\n",
"train a randomly initialized classifier on top. For the same reason, it’s only possible to fine-tune the top\n",
"layers of the convolutional base once the classifier on top has already been trained. If the classifier isn’t\n",
"already trained, the error signal propagating through the network during training will be too\n",
"large, and the representations previously learned by the layers being fine-tuned will be destroyed. Thus\n",
"the steps for fine-tuning a network are as follows:\n",
"\n",
"1. Add our custom network on top of an already-trained base network.\n",
"2. Freeze the base network.\n",
"3. Train the part we added.\n",
"4. Unfreeze some layers in the base network. (Note that you should not unfreeze “batch normalization” layers; this caveat does not apply here, since VGG16 contains no such layers.)\n",
"5. Jointly train both these layers and the part we added.\n",
"\n",
"We already completed the first three steps when doing feature extraction. Let’s proceed with step 4:\n",
"we’ll unfreeze our `conv_base` and then freeze individual layers inside it.\n",
"\n",
"As a reminder, this is what our convolutional base looks like:"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "cnObzTupGhLV",
"outputId": "3754b2b3-8885-44b3-cb87-82612d223ec3"
},
"outputs": [],
"source": [
"conv_base.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"We will fine-tune the last three convolutional layers, which means all layers up to `block4_pool` should be frozen, and the layers `block5_conv1`, `block5_conv2`, and `block5_conv3` should be trainable.\n",
"\n",
"Why not fine-tune more layers? Why not fine-tune the entire convolutional base?\n",
"You could. But you need to consider the following:\n",
"\n",
"- Earlier layers in the convolutional base encode more generic, reusable features, whereas layers higher up encode more specialized features. It’s more useful to fine-tune the more specialized features, because these are the ones that need to be repurposed on your new problem. There would be fast-decreasing returns in fine-tuning lower layers.\n",
"- The more parameters you’re training, the more you’re at risk of overfitting. The convolutional base has 15 million parameters, so it would be risky to attempt to train it on your small dataset.\n",
"\n",
"Thus, in this situation, it’s a good strategy to fine-tune only the top two or three layers in the convolutional base. Let’s set this up, starting from where we left off in the previous example."
]
},
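{
"cell_type": "markdown",
"metadata": {},
"source": [
"The figure of roughly 15 million parameters can be checked by hand. The sketch below assumes the standard VGG16 layout (13 convolutional layers with 3x3 kernels and one bias per output channel); the variable names are only illustrative.\n",
"\n",
"```python\n",
"# (input_channels, output_channels) of the 13 convolutional layers of VGG16, in order.\n",
"VGG16_CONV_CHANNELS = [\n",
"    (3, 64), (64, 64),\n",
"    (64, 128), (128, 128),\n",
"    (128, 256), (256, 256), (256, 256),\n",
"    (256, 512), (512, 512), (512, 512),\n",
"    (512, 512), (512, 512), (512, 512),\n",
"]\n",
"\n",
"\n",
"def conv3x3_params(in_channels, out_channels):\n",
"    # 3x3 kernel weights plus one bias per output channel.\n",
"    return 3 * 3 * in_channels * out_channels + out_channels\n",
"\n",
"\n",
"per_layer = [conv3x3_params(c_in, c_out) for c_in, c_out in VGG16_CONV_CHANNELS]\n",
"total_params = sum(per_layer)\n",
"block5_params = sum(per_layer[-3:])  # the three layers we fine-tune\n",
"\n",
"print(total_params)   # 14714688\n",
"print(block5_params)  # 7079424\n",
"```\n",
"\n",
"Under these assumptions, the three `block5` convolutional layers alone account for a little under half of the base, so even this restricted fine-tuning updates about 7 million weights."
]
},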
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Freezing all layers until the fourth from the last"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "tBXYN1t2GhLc",
"outputId": "b33ae8d1-925b-4e8a-f15d-a62356070896"
},
"outputs": [],
"source": [
"conv_base.trainable = True\n",
"for layer in conv_base.layers[:-4]:\n",
" layer.trainable = False"
]
},
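{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the slice `[:-4]` selects without loading the model, the sketch below rebuilds the list of layer names of the VGG16 convolutional base in plain Python. The name of the input layer varies between Keras versions, so `'input'` is a placeholder, not the real name.\n",
"\n",
"```python\n",
"# Layer names of the VGG16 convolutional base, in order.\n",
"layer_names = ['input']  # placeholder name for the input layer\n",
"for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):\n",
"    layer_names += ['block{}_conv{}'.format(block, i) for i in range(1, n_convs + 1)]\n",
"    layer_names.append('block{}_pool'.format(block))\n",
"\n",
"frozen = layer_names[:-4]     # what the loop sets to trainable = False\n",
"trainable = layer_names[-4:]  # what stays trainable\n",
"\n",
"print(len(layer_names))  # 19\n",
"print(frozen[-1])        # block4_pool\n",
"print(trainable)         # ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']\n",
"```\n",
"\n",
"The last four layers include `block5_pool`, which has no weights, so only the three `block5` convolutional layers actually receive updates."
]
},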
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "XWw1mYfUGhLg"
},
"source": [
"Now we can begin fine-tuning the model. We’ll do this with the `RMSprop` optimizer, using a very low learning rate. The reason for using a low learning rate is that we want to limit the magnitude of the modifications we make to the representations of the three\n",
"layers we’re fine-tuning. Updates that are too large may harm these representations."
]
},
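{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sketch illustrates why the learning rate bounds how far the fine-tuned weights can move. It applies a textbook RMSprop update to a single scalar weight with a constant, invented gradient. The actual Keras optimizer may differ in details, and the comparison value of 1e-3 is just an assumed larger learning rate, so treat this as an illustration under those assumptions.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def rmsprop_step(weight, grad, mean_square, learning_rate, rho=0.9, epsilon=1e-7):\n",
"    # Textbook RMSprop: scale the step by a running average of squared gradients.\n",
"    mean_square = rho * mean_square + (1.0 - rho) * grad * grad\n",
"    weight = weight - learning_rate * grad / (math.sqrt(mean_square) + epsilon)\n",
"    return weight, mean_square\n",
"\n",
"\n",
"def total_drift(learning_rate, steps=100, grad=0.5):\n",
"    # Distance a weight starting at 1.0 travels after `steps` identical updates.\n",
"    weight, mean_square = 1.0, 0.0\n",
"    for _ in range(steps):\n",
"        weight, mean_square = rmsprop_step(weight, grad, mean_square, learning_rate)\n",
"    return abs(weight - 1.0)\n",
"\n",
"\n",
"drift_large = total_drift(1e-3)\n",
"drift_small = total_drift(1e-5)\n",
"print(drift_large, drift_small)\n",
"```\n",
"\n",
"In this toy setting the drift is proportional to the learning rate, so the weight trained with `1e-5` moves about a hundred times less than the one trained with `1e-3`."
]
},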
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "4YBjFhSVGhLh",
"outputId": "c688820a-0f28-4aa0-b247-15a9684fa08f"
},
"outputs": [],
"source": [
"model.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),\n",
" metrics=[\"accuracy\"])\n",
"logdir = os.path.join(\"logs\", datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n",
"callbacks = [\n",
" keras.callbacks.ModelCheckpoint(filepath=\"fine_tuning.keras\", save_best_only=True, monitor=\"val_loss\"),\n",
" tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)\n",
"]\n",
"\n",
"history = model.fit(train_dataset,\n",
" epochs=30,\n",
" validation_data=validation_dataset,\n",
" callbacks=callbacks)"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "9rwSMMQaGhLx",
"outputId": "0a58db5a-0f22-45e8-d1fb-0a664fceaf4d"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot(history.history['accuracy'])\n",
"plt.plot(history.history['val_accuracy'])\n",
"plt.title('model accuracy')\n",
"plt.ylabel('accuracy')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='lower right')\n",
"plt.show()\n",
"plt.plot(history.history['loss'])\n",
"plt.plot(history.history['val_loss'])\n",
"plt.title('model loss')\n",
"plt.ylabel('loss')\n",
"plt.xlabel('epoch')\n",
"plt.legend(['train', 'valid'], loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the TensorBoard notebook extension\n",
"%load_ext tensorboard\n",
"%tensorboard --logdir logs"
]
},
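{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Validation generator without shuffling, so predictions stay aligned with the directory order\n",
"validation_generator_no_shuffle = validation_datagen.flow_from_directory(\n",
"    './validation',\n",
"    target_size=(image_size, image_size),\n",
"    batch_size=num_valid_images,\n",
"    classes=class_names,\n",
"    shuffle=False)\n",
"\n",
"# One batch holds the whole validation set, so a single prediction step covers it.\n",
"# `predict` replaces the deprecated `predict_generator`.\n",
"prediction = model_fine_tuned.predict(validation_generator_no_shuffle, steps=1)"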
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "WoDOi_F8GhL5",
"outputId": "17c21c92-2a5d-4e21-c367-57e818046762"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 8 1 0 1 0 0 0 0 ], brad pitt\n",
"[ 1 7 0 1 0 0 1 0 ], johnny deep\n",
"[ 2 0 8 0 0 0 0 0 ], leonardo dicaprio\n",
"[ 0 0 0 9 0 0 0 1 ], robert de niro\n",
"[ 0 0 0 0 8 0 1 1 ], angelina jolie\n",
"[ 0 0 0 0 1 7 0 2 ], sandra bullock\n",
"[ 1 0 0 0 0 1 7 1 ], catherine deneuve\n",
"[ 0 0 0 0 2 2 0 6 ], marion cotillard\n"
]
}
],
"source": [
"Y_valid = np.zeros((num_valid_images,1),dtype=int)\n",
"\n",
"step = num_valid_images // num_classes\n",
"for ind in range(num_classes):\n",
" Y_valid[ind*step:(ind+1)*step] = ind\n",
" \n",
"confmat = confusion_matrix(Y_valid,np.argmax(prediction,axis=1)) \n",
"\n",
"for i0 in range(num_classes):\n",
" sys.stdout.write('[')\n",
" for i1 in range(num_classes):\n",
" sys.stdout.write('{:3d} '.format(confmat[i0,i1]))\n",
" \n",
" sys.stdout.write('], {}\\n'.format(class_names[i0]))\n",
" \n",
"sys.stdout.flush()"
]
},
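{
"cell_type": "markdown",
"metadata": {},
"source": [
"The confusion matrix printed above can be condensed into a few summary numbers. The sketch below copies the matrix from the cell output and assumes the scikit-learn convention that rows are true classes and columns are predicted classes.\n",
"\n",
"```python\n",
"# Confusion matrix copied from the output above (rows: true class, columns: predicted class).\n",
"confmat_values = [\n",
"    [8, 1, 0, 1, 0, 0, 0, 0],\n",
"    [1, 7, 0, 1, 0, 0, 1, 0],\n",
"    [2, 0, 8, 0, 0, 0, 0, 0],\n",
"    [0, 0, 0, 9, 0, 0, 0, 1],\n",
"    [0, 0, 0, 0, 8, 0, 1, 1],\n",
"    [0, 0, 0, 0, 1, 7, 0, 2],\n",
"    [1, 0, 0, 0, 0, 1, 7, 1],\n",
"    [0, 0, 0, 0, 2, 2, 0, 6],\n",
"]\n",
"\n",
"correct = sum(confmat_values[i][i] for i in range(len(confmat_values)))\n",
"total = sum(sum(row) for row in confmat_values)\n",
"overall_accuracy = correct / total\n",
"\n",
"# Recall per class: correct predictions divided by the number of images of that class.\n",
"per_class_recall = [row[i] / sum(row) for i, row in enumerate(confmat_values)]\n",
"\n",
"print(overall_accuracy)  # 0.75\n",
"print(per_class_recall)  # [0.8, 0.7, 0.8, 0.9, 0.8, 0.7, 0.7, 0.6]\n",
"```\n",
"\n",
"The last class has the lowest recall, which is why the next cell inspects its misclassified images."
]
},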
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "nNp0qChLGhL-",
"outputId": "f22e9bfe-e5da-4d57-fbdc-2ea55d6681e7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"wrong classification for: marion cotillard\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"matched to: angelina jolie\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"matched to: sandra bullock\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
]
}
],
"source": [
"# Choose the class label you want to check\n",
"clbl = 7\n",
"step = num_valid_images // num_classes\n",
"pred_labels = np.argmax(prediction[clbl*step:(clbl+1)*step],axis=1)\n",
"wrong_labels = np.transpose(np.nonzero(pred_labels != clbl))\n",
"\n",
"print('wrong classification for: {}'.format(class_names[clbl]))\n",
"for i0 in wrong_labels:\n",
" img = validation_generator_no_shuffle[0][0][clbl*step + i0,...]\n",
" plot_img(img.reshape(150,150,3))\n",
" print('matched to: {}'.format(class_names[pred_labels[i0][0]]))\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "KaLGByZgGhMD"
},
"source": [
"## Tensorboard"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "RdVEKygQGhMF",
"outputId": "2161da8b-9605-4bcc-e60a-6ea83fccd401"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Traceback (most recent call last):\n",
" File \"/opt/conda/bin/tensorboard\", line 5, in <module>\n",
" from tensorboard.main import run_main\n",
" File \"/opt/conda/lib/python3.7/site-packages/tensorboard/main.py\", line 27, in <module>\n",
" from tensorboard import default\n",
" File \"/opt/conda/lib/python3.7/site-packages/tensorboard/default.py\", line 33, in <module>\n",
" from tensorboard.plugins.audio import audio_plugin\n",
" File \"/opt/conda/lib/python3.7/site-packages/tensorboard/plugins/audio/audio_plugin.py\", line 23, in <module>\n",
" from tensorboard import plugin_util\n",
" File \"/opt/conda/lib/python3.7/site-packages/tensorboard/plugin_util.py\", line 24, in <module>\n",
" import markdown\n",
" File \"/opt/conda/lib/python3.7/site-packages/markdown/__init__.py\", line 29, in <module>\n",
" from .core import Markdown, markdown, markdownFromFile # noqa: E402\n",
" File \"/opt/conda/lib/python3.7/site-packages/markdown/core.py\", line 26, in <module>\n",
" from . import util\n",
" File \"/opt/conda/lib/python3.7/site-packages/markdown/util.py\", line 88, in <module>\n",
" INSTALLED_EXTENSIONS = metadata.entry_points(group='markdown.extensions')\n",
"TypeError: entry_points() got an unexpected keyword argument 'group'\n"
]
}
],
"source": [
"!tensorboard --port=8061 --logdir=logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "R0dfpdDOGhM2"
},
"source": [
"# Part IV : Object Detection with Mask R-CNN\n",
"\n",
"### Please run this section on Colab !"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vOAEQt-pGhM3"
},
"source": [
"Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.\n",
"\n",
"It is a challenging problem that builds upon methods for object recognition (where are the objects?), object localization (what is their extent?), and object classification (what are they?).\n",
"\n",
"In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. Most notable is the R-CNN family (Region-Based Convolutional Neural Networks), and in particular the more recent Mask R-CNN, which is capable of achieving state-of-the-art results on a range of object detection tasks.\n",
"\n",
"In this section, we will discover how to use the __Mask R-CNN__ model to detect objects in new photographs.\n",
"\n",
"After completing this tutorial, you will know:\n",
"\n",
"- The region-based Convolutional Neural Network family of models for object detection and the most recent variation called Mask R-CNN.\n",
"\n",
"- The best-of-breed open source library implementation of the Mask R-CNN for the Keras deep learning library.\n",
" \n",
"- How to use a pre-trained Mask R-CNN to perform object localization and detection on new photographs.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ra-bXlWXGhM4"
},
"source": [
"## Mask R-CNN for Object Detection\n",
"\n",
"Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.\n",
"\n",
"It is a challenging computer vision task that requires both successful object localization in order to locate and draw a bounding box around each object in an image, and object classification to predict the correct class of object that was localized.\n",
"\n",
"An extension of object detection involves marking the specific pixels in the image that belong to each detected object instead of using coarse bounding boxes during object localization. This harder version of the problem is generally referred to as object segmentation or, more precisely, instance segmentation.\n",
"\n",
"The __Region-Based__ Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick, et al.\n",
"\n",
"There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. The salient aspects of each variation can be summarized as follows:\n",
"\n",
"- __R-CNN__: Bounding boxes are proposed by the “selective search” algorithm; each proposed region is stretched to a fixed size and passed through a deep convolutional neural network, such as AlexNet, to extract features, before a final set of object classifications is made with linear SVMs.\n",
"\n",
"- __Fast R-CNN__: Simplified design with a single model, bounding boxes are still specified as input, but a region-of-interest pooling layer is used after the deep CNN to consolidate regions and the model predicts both class labels and regions of interest directly.\n",
" \n",
"- __Faster R-CNN__: Addition of a Region Proposal Network that interprets features extracted from the deep CNN and learns to propose regions-of-interest directly.\n",
" \n",
"- __Mask R-CNN__: Extension of Faster R-CNN that adds an output model for predicting a mask for each detected object.\n",
"\n",
"The Mask R-CNN model, introduced in the 2017 paper titled [Mask R-CNN](https://arxiv.org/abs/1703.06870), is the most recent variation of this family of models and supports both object detection and object segmentation. The paper also provides a summary of the model lineage up to that point.\n",
"\n"
]
},
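{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between a coarse bounding box and a pixel mask, as described above, can be shown on a toy example. The sketch below is plain Python and does not use the Mask R-CNN library; the helper `mask_to_box` and the tiny mask are invented for illustration.\n",
"\n",
"```python\n",
"def mask_to_box(mask):\n",
"    # Return (top, left, bottom, right) of the tightest box around the nonzero\n",
"    # pixels, with bottom and right exclusive, or None for an empty mask.\n",
"    rows = [r for r, row in enumerate(mask) if any(row)]\n",
"    if not rows:\n",
"        return None\n",
"    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]\n",
"    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)\n",
"\n",
"\n",
"# A 4x5 binary mask of a small plus-shaped object.\n",
"toy_mask = [\n",
"    [0, 0, 0, 0, 0],\n",
"    [0, 0, 1, 0, 0],\n",
"    [0, 1, 1, 1, 0],\n",
"    [0, 0, 1, 0, 0],\n",
"]\n",
"\n",
"top, left, bottom, right = mask_to_box(toy_mask)\n",
"box_area = (bottom - top) * (right - left)\n",
"mask_area = sum(sum(row) for row in toy_mask)\n",
"\n",
"print((top, left, bottom, right))  # (1, 1, 4, 4)\n",
"print(box_area, mask_area)         # 9 5\n",
"```\n",
"\n",
"The box covers 9 pixels while the object itself occupies only 5, which is the extra precision a mask provides over a bounding box."
]
},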
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "GlXwuVoOGhM7"
},
"source": [
"### Matterport Mask R-CNN Project\n",
"\n",
"Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model.\n",
"\n",
"Source code is available for each version of the R-CNN model, provided in separate GitHub repositories with prototype models based on the Caffe deep learning framework. For example:\n",
"\n",
"- R-CNN: [Regions with Convolutional Neural Network Features, GitHub](https://github.com/rbgirshick/rcnn)\n",
"\n",
"- Fast R-CNN, [GitHub](https://github.com/rbgirshick/fast-rcnn)\n",
"\n",
"- Faster R-CNN Python Code, [GitHub](https://github.com/rbgirshick/py-faster-rcnn)\n",
"\n",
"- Detectron, Facebook AI, [GitHub](https://github.com/facebookresearch/Detectron)\n",
"\n",
"Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.\n",
"\n",
"The best-of-breed third-party implementation of Mask R-CNN is the [Mask R-CNN](https://github.com/matterport/Mask_RCNN) project developed by Matterport. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.\n",
"\n",
"Nevertheless, it is an open source project, subject to the whims of the project developers. As such, I have a fork of the project available, just in case there are major changes to the API in the future.\n",
"\n",
"The project is light on API documentation, although it does provide a number of examples in the form of Python Notebooks that you can use to understand how to use the library by example. Two notebooks that may be helpful to review are:\n",
"\n",
"- Mask R-CNN Demo, [Notebook](https://github.com/matterport/Mask_RCNN/blob/master/samples/demo.ipynb)\n",
"\n",
"- Mask R-CNN – Inspect Trained Model, [Notebook](https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb)\n",
"\n",
"There are perhaps three main use cases for using the Mask R-CNN model with the Matterport library; they are:\n",
"\n",
"- __Object Detection Application__: Use a pre-trained model for object detection on new images.\n",
"\n",
"- __New Model via Transfer Learning__: Use a pre-trained model as a starting point in developing a model for a new object detection dataset.\n",
" \n",
"- __New Model from Scratch__: Develop a new model from scratch for an object detection dataset.\n",
"\n",
"In order to get familiar with the model and the library, we will look at the first example in the next section.\n",
"\n",
"#### Object Detection With Mask R-CNN\n",
"\n",
"In this section, we will use the Matterport Mask R-CNN library to perform object detection on arbitrary photographs.\n",
"\n",
"Much like using a pre-trained deep CNN for image classification, such as VGG-16 trained on the ImageNet dataset, we can use a pre-trained Mask R-CNN model to detect objects in new photographs. In this case, we will use a Mask R-CNN trained on the [MS COCO object detection problem](http://cocodataset.org/#home).\n",
"\n",
"#### Mask R-CNN Installation\n",
"\n",
"The first step is to install the library.\n",
"\n",
"At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.\n",
"\n",
"Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the [installation instructions](https://github.com/matterport/Mask_RCNN#installation) buried in the library’s readme file.\n",
"\n",
"#### Step 0. Open Colab and Upload this Notebook\n",
"\n",
"#### Step 1. Clone the Mask R-CNN GitHub Repository\n",
"\n",
"This is as simple as running the following command from your command line:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 104
},
"colab_type": "code",
"id": "HGiDmuejGhM8",
"outputId": "ce5ca013-96e5-4766-d2ed-b4cde9b3ca94"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'Mask_RCNN'...\n",
"remote: Enumerating objects: 956, done.\u001b[K\n",
"remote: Total 956 (delta 0), reused 0 (delta 0), pack-reused 956\u001b[K\n",
"Receiving objects: 100% (956/956), 111.84 MiB | 30.52 MiB/s, done.\n",
"Resolving deltas: 100% (570/570), done.\n"
]
}
],
"source": [
"!git clone https://github.com/matterport/Mask_RCNN.git"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "S7uXyFVPGhNA"
},
"source": [
"This will create a new local directory with the name Mask_RCNN that looks as follows:"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "raw",
"id": "DhKn5ytcGhNA"
},
"source": [
"Mask_RCNN\n",
"├── assets\n",
"├── build\n",
"│ ├── bdist.macosx-10.13-x86_64\n",
"│ └── lib\n",
"│ └── mrcnn\n",
"├── dist\n",
"├── images\n",
"├── mask_rcnn.egg-info\n",
"├── mrcnn\n",
"└── samples\n",
" ├── balloon\n",
" ├── coco\n",
" ├── nucleus\n",
" └── shapes"
]
},
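{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to confirm that the clone succeeded is to check for the main subdirectories from the listing above. This is only a sketch: the helper name is hypothetical, and it checks just `assets`, `images`, `mrcnn` and `samples`, on the assumption that `build`, `dist` and `mask_rcnn.egg-info` are created later by the installation step.\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"# Subdirectories expected right after cloning (an assumption, see above).\n",
"EXPECTED_SUBDIRS = ['assets', 'images', 'mrcnn', 'samples']\n",
"\n",
"\n",
"def missing_subdirectories(repo_root, expected=EXPECTED_SUBDIRS):\n",
"    # Return the expected subdirectories that do not exist under repo_root.\n",
"    root = Path(repo_root)\n",
"    return [name for name in expected if not (root / name).is_dir()]\n",
"\n",
"\n",
"# An empty list means every expected directory was found.\n",
"print(missing_subdirectories('Mask_RCNN'))\n",
"```\n",
"\n",
"If the list is not empty, the clone is probably incomplete or the notebook is running from a different working directory."
]
},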
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "WvFlDgvJGhNB"
},
"source": [
"#### Step 2. Install the Mask R-CNN Library\n",
"\n",
"The library can be installed directly via pip.\n",
"\n",
"Change directory into the _Mask_RCNN_ directory and run the installation script.\n",
"\n",
"From the command line, type the following:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "aEUeZhX5GhNB",
"outputId": "be5de5a1-e821-477c-ce28-91bb9f8c3194"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 1)) (1.18.2)\n",
"Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 2)) (1.4.1)\n",
"Requirement already satisfied: Pillow in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 3)) (7.0.0)\n",
"Requirement already satisfied: cython in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 4)) (0.29.15)\n",
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 5)) (3.2.1)\n",
"Requirement already satisfied: scikit-image in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 6)) (0.16.2)\n",
"Requirement already satisfied: tensorflow>=1.3.0 in /tensorflow-1.15.0/python3.6 (from -r requirements.txt (line 7)) (1.15.0)\n",
"Requirement already satisfied: keras>=2.0.8 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 8)) (2.2.5)\n",
"Requirement already satisfied: opencv-python in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 9)) (4.1.2.30)\n",
"Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 10)) (2.8.0)\n",
"Requirement already satisfied: imgaug in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 11)) (0.2.9)\n",
"Requirement already satisfied: IPython[all] in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 12)) (5.5.0)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->-r requirements.txt (line 5)) (1.1.0)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->-r requirements.txt (line 5)) (2.8.1)\n",
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->-r requirements.txt (line 5)) (0.10.0)\n",
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->-r requirements.txt (line 5)) (2.4.6)\n",
"Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->-r requirements.txt (line 6)) (2.4)\n",
"Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->-r requirements.txt (line 6)) (1.1.1)\n",
"Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->-r requirements.txt (line 6)) (2.4.1)\n",
"Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (0.9.0)\n",
"Requirement already satisfied: keras-applications>=1.0.8 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.0.8)\n",
"Requirement already satisfied: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.1.0)\n",
"Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (0.34.2)\n",
"Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.12.0)\n",
"Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.24.3)\n",
"Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (0.2.0)\n",
"Requirement already satisfied: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (3.10.0)\n",
"Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.1.0)\n",
"Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (0.8.1)\n",
"Requirement already satisfied: gast==0.2.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (0.2.2)\n",
"Requirement already satisfied: tensorboard<1.16.0,>=1.15.0 in /tensorflow-1.15.0/python3.6 (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.15.0)\n",
"Requirement already satisfied: tensorflow-estimator==1.15.1 in /tensorflow-1.15.0/python3.6 (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.15.1)\n",
"Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.12.1)\n",
"Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow>=1.3.0->-r requirements.txt (line 7)) (3.2.0)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from keras>=2.0.8->-r requirements.txt (line 8)) (3.13)\n",
"Requirement already satisfied: Shapely in /usr/local/lib/python3.6/dist-packages (from imgaug->-r requirements.txt (line 11)) (1.7.0)\n",
"Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (2.1.3)\n",
"Requirement already satisfied: pickleshare in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (0.7.5)\n",
"Requirement already satisfied: decorator in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (4.4.2)\n",
"Requirement already satisfied: pexpect; sys_platform != \"win32\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (4.8.0)\n",
"Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (4.3.3)\n",
"Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (1.0.18)\n",
"Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (0.8.1)\n",
"Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (46.0.0)\n",
"Requirement already satisfied: qtconsole; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (4.7.1)\n",
"Requirement already satisfied: nbconvert; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (5.6.1)\n",
"Requirement already satisfied: ipyparallel; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (6.2.4)\n",
"Requirement already satisfied: ipywidgets; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (7.5.1)\n",
"Requirement already satisfied: Sphinx>=1.3; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (1.8.5)\n",
"Requirement already satisfied: notebook; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (5.2.2)\n",
"Requirement already satisfied: testpath; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (0.4.4)\n",
"Requirement already satisfied: nose>=0.10.1; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (1.3.7)\n",
"Requirement already satisfied: nbformat; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (5.0.4)\n",
"Requirement already satisfied: ipykernel; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (4.6.1)\n",
"Requirement already satisfied: requests; extra == \"all\" in /usr/local/lib/python3.6/dist-packages (from IPython[all]->-r requirements.txt (line 12)) (2.21.0)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.16.0,>=1.15.0->tensorflow>=1.3.0->-r requirements.txt (line 7)) (1.0.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.16.0,>=1.15.0->tensorflow>=1.3.0->-r requirements.txt (line 7)) (3.2.1)\n",
"Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.6/dist-packages (from pexpect; sys_platform != \"win32\"->IPython[all]->-r requirements.txt (line 12)) (0.6.0)\n",
"Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->IPython[all]->-r requirements.txt (line 12)) (0.2.0)\n",
"Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->IPython[all]->-r requirements.txt (line 12)) (0.1.8)\n",
"Requirement already satisfied: qtpy in /usr/local/lib/python3.6/dist-packages (from qtconsole; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.9.0)\n",
"Requirement already satisfied: jupyter-client>=4.1 in /usr/local/lib/python3.6/dist-packages (from qtconsole; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (5.3.4)\n",
"Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from qtconsole; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (4.6.3)\n",
"Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.8.4)\n",
"Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.6.0)\n",
"Requirement already satisfied: jinja2>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2.11.1)\n",
"Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.3)\n",
"Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.4.2)\n",
"Requirement already satisfied: bleach in /usr/local/lib/python3.6/dist-packages (from nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (3.1.3)\n",
"Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.6/dist-packages (from ipyparallel; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (17.0.0)\n",
"Requirement already satisfied: tornado>=4 in /usr/local/lib/python3.6/dist-packages (from ipyparallel; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (4.5.3)\n",
"Requirement already satisfied: widgetsnbextension~=3.5.0 in /usr/local/lib/python3.6/dist-packages (from ipywidgets; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (3.5.1)\n",
"Requirement already satisfied: docutils>=0.11 in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.15.2)\n",
"Requirement already satisfied: babel!=2.0,>=1.3 in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2.8.0)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (20.3)\n",
"Requirement already satisfied: imagesize in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.2.0)\n",
"Requirement already satisfied: alabaster<0.8,>=0.7 in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.7.12)\n",
"Requirement already satisfied: sphinxcontrib-websupport in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.2.1)\n",
"Requirement already satisfied: snowballstemmer>=1.1 in /usr/local/lib/python3.6/dist-packages (from Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2.0.0)\n",
"Requirement already satisfied: terminado>=0.3.3; sys_platform != \"win32\" in /usr/local/lib/python3.6/dist-packages (from notebook; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.8.3)\n",
"Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2.6.0)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2.8)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (3.0.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2019.11.28)\n",
"Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.24.3)\n",
"Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2>=2.4->nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (1.1.1)\n",
"Requirement already satisfied: webencodings in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (0.5.1)\n",
"Requirement already satisfied: pytz>=2015.7 in /usr/local/lib/python3.6/dist-packages (from babel!=2.0,>=1.3->Sphinx>=1.3; extra == \"all\"->IPython[all]->-r requirements.txt (line 12)) (2018.9)\n",
"WARNING:root:Fail load requirements file, so using default ones.\n",
"running install\n",
"running bdist_egg\n",
"running egg_info\n",
"creating mask_rcnn.egg-info\n",
"writing mask_rcnn.egg-info/PKG-INFO\n",
"writing dependency_links to mask_rcnn.egg-info/dependency_links.txt\n",
"writing top-level names to mask_rcnn.egg-info/top_level.txt\n",
"writing manifest file 'mask_rcnn.egg-info/SOURCES.txt'\n",
"reading manifest template 'MANIFEST.in'\n",
"writing manifest file 'mask_rcnn.egg-info/SOURCES.txt'\n",
"installing library code to build/bdist.linux-x86_64/egg\n",
"running install_lib\n",
"running build_py\n",
"creating build\n",
"creating build/lib\n",
"creating build/lib/mrcnn\n",
"copying mrcnn/visualize.py -> build/lib/mrcnn\n",
"copying mrcnn/parallel_model.py -> build/lib/mrcnn\n",
"copying mrcnn/model.py -> build/lib/mrcnn\n",
"copying mrcnn/utils.py -> build/lib/mrcnn\n",
"copying mrcnn/config.py -> build/lib/mrcnn\n",
"copying mrcnn/__init__.py -> build/lib/mrcnn\n",
"creating build/bdist.linux-x86_64\n",
"creating build/bdist.linux-x86_64/egg\n",
"creating build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/visualize.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/parallel_model.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/model.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/utils.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/config.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"copying build/lib/mrcnn/__init__.py -> build/bdist.linux-x86_64/egg/mrcnn\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/visualize.py to visualize.cpython-36.pyc\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/parallel_model.py to parallel_model.cpython-36.pyc\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/model.py to model.cpython-36.pyc\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/utils.py to utils.cpython-36.pyc\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/config.py to config.cpython-36.pyc\n",
"byte-compiling build/bdist.linux-x86_64/egg/mrcnn/__init__.py to __init__.cpython-36.pyc\n",
"creating build/bdist.linux-x86_64/egg/EGG-INFO\n",
"copying mask_rcnn.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO\n",
"copying mask_rcnn.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO\n",
"copying mask_rcnn.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO\n",
"copying mask_rcnn.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO\n",
"zip_safe flag not set; analyzing archive contents...\n",
"creating dist\n",
"creating 'dist/mask_rcnn-2.1-py3.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it\n",
"removing 'build/bdist.linux-x86_64/egg' (and everything under it)\n",
"Processing mask_rcnn-2.1-py3.6.egg\n",
"Removing /usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg\n",
"Copying mask_rcnn-2.1-py3.6.egg to /usr/local/lib/python3.6/dist-packages\n",
"mask-rcnn 2.1 is already the active version in easy-install.pth\n",
"\n",
"Installed /usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg\n",
"Processing dependencies for mask-rcnn==2.1\n",
"Finished processing dependencies for mask-rcnn==2.1\n"
]
}
],
"source": [
"import os\n",
"os.chdir('./Mask_RCNN')\n",
"!pip3 install -r requirements.txt\n",
"!python3 setup.py install "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DlySPeHPGhNE"
},
"source": [
"The library will then install directly and you will see a lot of successful installation messages ending with the following:"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "raw",
"id": "nAww1LboGhNF"
},
"source": [
"...\n",
"Finished processing dependencies for mask-rcnn==2.1"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "55X0zSm7GhNG"
},
"source": [
"#### Step 3: Confirm the Library Was Installed\n",
"\n",
"It is always a good idea to confirm that the library was installed correctly.\n",
"\n",
"You can confirm that the library was installed correctly by querying it via the pip command; for example:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 191
},
"colab_type": "code",
"id": "kKXRZ1vTGhNG",
"outputId": "9f0df55c-755f-4e11-a6c3-e8b7418eefcb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name: mask-rcnn\n",
"Version: 2.1\n",
"Summary: Mask R-CNN for object detection and instance segmentation\n",
"Home-page: https://github.com/matterport/Mask_RCNN\n",
"Author: Matterport\n",
"Author-email: waleed.abdulla@gmail.com\n",
"License: MIT\n",
"Location: /usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg\n",
"Requires: \n",
"Required-by: \n"
]
}
],
"source": [
"!pip3 show mask-rcnn"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "f0vwUrMcGhNJ"
},
"source": [
"### Example of Object Localization\n",
"\n",
"We are going to use a pre-trained Mask R-CNN model to detect objects on a new photograph.\n",
"\n",
"#### Step 1. Download Model Weights\n",
"\n",
"First, download the weights for the pre-trained model, specifically a Mask R-CNN trained on the MS Coco dataset.\n",
"\n",
"The weights are available from the project GitHub project and the file is about 250 megabytes. Download the model weights to a file with the name ‘mask_rcnn_coco.h5‘ in your current working directory.\n",
"\n",
"[Download Weights (mask_rcnn_coco.h5)](https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5) (246 megabytes)\n",
"\n",
"#### Step 2. Download Sample Photograph\n",
"\n",
"We also need a photograph in which to detect objects.\n",
"\n",
"Download from Ilias the photograph to your current working directory with the filename ‘african-elephant.jpg‘\n",
"\n",
"\n",
"african-elephant.jpg"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "n8ccmDSvGhNK"
},
"source": [
"#### Step 3. Load Model and Make Prediction\n",
"\n",
"First, the model must be defined via an instance MaskRCNN class.\n",
"\n",
"This class requires a configuration object as a parameter. The configuration object defines how the model might be used during training or inference.\n",
"\n",
"In this case, the configuration will only specify the number of images per batch, which will be one, and the number of classes to predict.\n",
"\n",
"You can see the full extent of the configuration object and the properties that you can override in the [config.py](https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/config.py) file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "qAfMaOOzGhNL"
},
"outputs": [],
"source": [
"%tensorflow_version 1.x\n",
"from mrcnn.config import Config\n",
"from mrcnn.model import MaskRCNN\n",
"# define the test configuration\n",
"class TestConfig(Config):\n",
" NAME = \"test\"\n",
" GPU_COUNT = 1\n",
" IMAGES_PER_GPU = 1\n",
" NUM_CLASSES = 1 + 80"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "1CmHYT4RGhNN"
},
"source": [
"We can now define the MaskRCNN instance.\n",
"\n",
"We will define the model as type “inference” indicating that we are interested in making predictions and not training. We must also specify a directory where any log messages could be written, which in this case will be the current working directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Sg482-mcGhNO"
},
"outputs": [],
"source": [
"# define the model\n",
"rcnn = MaskRCNN(mode='inference', model_dir='./', config=TestConfig())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install 'h5py==2.10.0' --force-reinstall"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9BtI50MlGhNR"
},
"source": [
"The next step is to load the weights that we downloaded. You should save it on google drive and then load it."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "_TWgehzsNOSV",
"outputId": "73225d99-e9df-4d1c-c733-a092c97e336c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 245
},
"colab_type": "code",
"id": "46t9gwLdGhNR",
"outputId": "842b58f4-2678-4ad9-bbcf-aac4656392b7"
},