From ea2937b9164cbb3b4dc8d3161deaa12b7ddcd90a Mon Sep 17 00:00:00 2001
From: Mirko Birbaumer <mirko.birbaumer@hslu.ch>
Date: Sun, 3 Apr 2022 20:34:17 +0000
Subject: [PATCH] Adaptation of VAE to tensorflow 2.7

---
 ...am, Neural Style Transfer, and GAN's.ipynb | 1385 +++++------------
 1 file changed, 389 insertions(+), 996 deletions(-)

diff --git a/notebooks/Block_7/Jupyter Notebook Block 7 - Generative Models - DeepDream, Neural Style Transfer, and GAN's.ipynb b/notebooks/Block_7/Jupyter Notebook Block 7 - Generative Models - DeepDream, Neural Style Transfer, and GAN's.ipynb
index 66cd9bc..7e1f590 100644
--- a/notebooks/Block_7/Jupyter Notebook Block 7 - Generative Models - DeepDream, Neural Style Transfer, and GAN's.ipynb	
+++ b/notebooks/Block_7/Jupyter Notebook Block 7 - Generative Models - DeepDream, Neural Style Transfer, and GAN's.ipynb	
@@ -11,15 +11,18 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Sampling from a latent space of images to create entirely new images or edit existing\n",
-    "ones is currently the most popular and successful application of creative AI. In this section\n",
-    "and the next, we’ll review some high-level concepts pertaining to image generation,\n",
-    "alongside implementations details relative to the two main techniques in this\n",
-    "domain: _variational autoencoders_ (VAEs) and _generative adversarial networks_ (GANs). \n",
+    "The most popular and successful application of creative AI today is image generation:\n",
+    "learning latent visual spaces and sampling from them to create entirely new pictures\n",
+    "interpolated from real ones — pictures of imaginary people, imaginary places, imaginary\n",
+    "cats and dogs, and so on.\n",
     "\n",
-    "The techniques we present here aren’t specific to images—you could develop latent spaces\n",
-    "of sound, music, or even text, using GANs and VAEs—but in practice, the most interesting\n",
-    "results have been obtained with pictures, and that’s what we focus on here."
+    "In this section and the next, we’ll review some high-level concepts pertaining to\n",
+    "image generation, alongside implementation details relative to the two main techniques\n",
+    "in this domain: _variational autoencoders_ (VAEs) and _generative adversarial networks_\n",
+    "(GANs). Note that the techniques we will discuss here aren’t specific to images — you\n",
+    "could develop latent spaces of sound, music, or even text, using GANs and VAEs — but\n",
+    "in practice, the most interesting results have been obtained with pictures, and that’s\n",
+    "what we’ll focus on here."
    ]
   },
   {
@@ -33,14 +36,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The key idea of image generation is to develop a low-dimensional latent space of representations\n",
-    "(which naturally is a vector space) where any point can be mapped to a\n",
-    "realistic-looking image. The module capable of realizing this mapping, taking as input\n",
-    "a latent point and outputting an image (a grid of pixels), is called a _generator_ (in the\n",
-    "case of GANs) or a _decoder_ (in the case of VAEs). \n",
+    "The key idea of image generation is to develop a low-dimensional _latent space_ of representations\n",
+    "(which, like everything else in deep learning, is a vector space), where any\n",
+    "point can be mapped to a “valid” image: an image that looks like the real thing. The\n",
+    "module capable of realizing this mapping, taking as input a latent point and outputting\n",
+    "an image (a grid of pixels), is called a _generator_ (in the case of GANs) or a _decoder_\n",
+    "(in the case of VAEs). Once such a latent space has been learned, you can sample\n",
+    "points from it, and, by mapping them back to image space, generate images that have\n",
+    "never been seen before (see the figure below). These new images are the in-betweens of\n",
+    "the training images.\n",
     "\n",
-    "Once such a latent space has been developed, you can sample points from it, either deliberately \n",
-    "or at random, and, by mapping them to image space, generate images that have never been seen before.\n",
     "\n",
     "<img src='./Bilder/latent_space.jpg'>"
    ]
@@ -49,11 +54,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "GANs and VAEs are two different strategies for learning such latent spaces of image\n",
-    "representations, each with its own characteristics. VAEs are great for learning latent\n",
-    "spaces that are well structured, where specific directions encode a meaningful axis of\n",
-    "variation in the data. GANs generate images that can potentially be highly realistic, but\n",
-    "the latent space they come from may not have as much structure and continuity."
+    "GANs and VAEs are two different strategies for learning such latent spaces of\n",
+    "image representations, each with its own characteristics. VAEs are great for learning\n",
+    "latent spaces that are well structured, where specific directions encode a meaningful\n",
+    "axis of variation in the data (see the figure below). GANs generate images that can potentially\n",
+    "be highly realistic, but the latent space they come from may not have as much\n",
+    "structure and continuity."
    ]
   },
   {
@@ -62,6 +68,7 @@
    "source": [
     "## Concept vectors for image editing\n",
     "\n",
+    "\n",
     "The idea of concept vectors is the following : given a latent space of representations, or an\n",
     "embedding space, certain directions in the space may encode interesting axes of variation\n",
     "in the original data. \n",
@@ -97,8 +104,8 @@
     "Variational autoencoders, simultaneously discovered by Kingma and Welling in\n",
     "December 2013 and Rezende, Mohamed, and Wierstra in January 2014 are a kind\n",
     "of generative model that’s especially appropriate for the task of image editing via concept\n",
-    "vectors. They’re a modern take on autoencoders — a type of network that aims to\n",
-    "encode an input to a low-dimensional latent space and then decode it back—that\n",
+    "vectors. They’re a modern take on autoencoders - a type of network that aims to\n",
+    "encode an input to a low-dimensional latent space and then decode it back - that\n",
     "mixes ideas from deep learning with Bayesian inference.\n",
     "\n",
     "\n",
@@ -129,25 +136,25 @@
     "with a little bit of statistical magic that forces them to learn continuous, highly structured\n",
     "latent spaces. They have turned out to be a powerful tool for image generation.\n",
     "\n",
-    "\n",
-    "A VAE, instead of compressing its input image into a fixed code in the latent space,\n",
-    "turns the image into the parameters of a statistical distribution: a mean and a variance.\n",
-    "Essentially, this means you’re assuming the input image has been generated by a\n",
-    "statistical process, and that the randomness of this process should be taken into\n",
+    "A VAE, instead of compressing its input image into a fixed code in the latent\n",
+    "space, turns the image into the parameters of a statistical distribution: a mean and a\n",
+    "variance. Essentially, this means we’re assuming the input image has been generated\n",
+    "by a statistical process, and that the randomness of this process should be taken into\n",
     "account during encoding and decoding. The VAE then uses the mean and variance\n",
-    "parameters to randomly sample one element of the distribution, and decodes that element\n",
-    "back to the original input, see the Figure below: \n",
-    "\n",
+    "parameters to randomly sample one element of the distribution, and decodes that\n",
+    "element back to the original input , see the Figure below: \n",
     "\n",
     "<img src='./Bilder/vae_illustration.jpg'>\n",
     "\n",
     "A VAE maps an image to two vectors, `z_mean` and `z_log_sigma`, which define\n",
     "a probability distribution over the latent space, used to sample a latent point to decode.\n",
     "\n",
+    "The stochasticity of this process improves robustness and forces the latent space to encode meaningful \n",
+    "representations everywhere: every point sampled in the latent space is decoded to a valid\n",
+    "output.\n",
+    "\n",
     "\n",
-    "The stochasticity of this process\n",
-    "improves robustness and forces the latent space to encode meaningful representations\n",
-    "everywhere: every point sampled in the latent space is decoded to a valid output."
+    "\n"
    ]
   },
   {
@@ -156,7 +163,7 @@
    "source": [
     "In technical terms, here’s how a VAE works:\n",
     "\n",
-    "1. An encoder module turns the input samples input_img into two parameters in\n",
+    "1. An encoder module turns the input samples `input_img` into two parameters in\n",
     "a latent space of representations, `z_mean` and `z_log_variance`\n",
     "\n",
     "2. You randomly sample a point z from the latent normal distribution that’s\n",
@@ -209,9 +216,352 @@
    "metadata": {},
    "source": [
     "You can then train the model using the reconstruction loss and the regularization loss.\n",
-    "The following listing shows the encoder network you’ll use, mapping images to the\n",
+    "\n",
+    "For the regularization loss, we typically use an expression (the Kullback–Leibler divergence)\n",
+    "meant to nudge the distribution of the encoder output toward a well-rounded\n",
+    "normal distribution centered around 0. This provides the encoder with a sensible\n",
+    "assumption about the structure of the latent space it’s modeling."
+   ]
+  },
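+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For reference, if the encoder output is read as a normal distribution with mean $\\mu$ and variance\n",
+    "$\\sigma^2$, and the prior is a standard normal centered around 0, this Kullback–Leibler term has the\n",
+    "well-known closed form\n",
+    "\n",
+    "$$D_{KL} = -\\frac{1}{2}\\sum_i \\left(1 + \\log \\sigma_i^2 - \\mu_i^2 - \\sigma_i^2\\right).$$\n",
+    "\n",
+    "The per-dimension term inside the sum is exactly what the `kl_loss` line in the custom training step\n",
+    "further down computes, with `z_mean` in the role of $\\mu$ and `z_log_var` in the role of $\\log \\sigma^2$\n",
+    "(the code then averages this term rather than summing it, which only rescales the loss)."
+   ]
+  },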
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Implementing a VAE with Keras\n",
+    "\n",
+    "We’re going to be implementing a VAE that can generate MNIST digits. It’s going to\n",
+    "have three parts:\n",
+    "\n",
+    "1. An _encoder network_ that turns a real image into a mean and a variance in the latent space\n",
+    "\n",
+    "2. A _sampling layer_ that takes such a mean and variance, and uses them to sample a random point from the latent space\n",
+    "\n",
+    "3. A _decoder network_ that turns points from the latent space back into images\n",
+    "\n",
+    "The following listing shows the encoder network we’ll use, mapping images to the\n",
     "parameters of a probability distribution over the latent space. It’s a simple convnet\n",
-    "that maps the input image x to two vectors, `z_mean` and `z_log_var`."
+    "that maps the input image `x` to two vectors, `z_mean` and `z_log_var`. One important\n",
+    "detail is that we use strides for downsampling feature maps instead of max pooling.\n",
+    "(remember the U-Net in the image segmentation section). Recall\n",
+    "that, in general, strides are preferable to max pooling for any model that cares about\n",
+    "information location — that is to say, where stuff is in the image — and this one does, since\n",
+    "it will have to produce an image encoding that can be used to reconstruct a valid\n",
+    "image."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### VAE encoder network"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensorflow import keras\n",
+    "from tensorflow.keras import layers\n",
+    "\n",
+    "# Dimensionality of the latent space: a 2D plane\n",
+    "latent_dim = 2\n",
+    "encoder_inputs = keras.Input(shape=(28, 28, 1))\n",
+    "x = layers.Conv2D(32, 3, activation=\"relu\", strides=2, padding=\"same\")(encoder_inputs)\n",
+    "x = layers.Conv2D(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
+    "x = layers.Flatten()(x)\n",
+    "x = layers.Dense(16, activation=\"relu\")(x)\n",
+    "# The input image ends up being encoded into these\n",
+    "# two parameters\n",
+    "z_mean = layers.Dense(latent_dim, name=\"z_mean\")(x)\n",
+    "z_log_var = layers.Dense(latent_dim, name=\"z_log_var\")(x)\n",
+    "encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name=\"encoder\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Its summary looks like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model: \"encoder\"\n",
+      "__________________________________________________________________________________________________\n",
+      " Layer (type)                   Output Shape         Param #     Connected to                     \n",
+      "==================================================================================================\n",
+      " input_2 (InputLayer)           [(None, 28, 28, 1)]  0           []                               \n",
+      "                                                                                                  \n",
+      " conv2d_2 (Conv2D)              (None, 14, 14, 32)   320         ['input_2[0][0]']                \n",
+      "                                                                                                  \n",
+      " conv2d_3 (Conv2D)              (None, 7, 7, 64)     18496       ['conv2d_2[0][0]']               \n",
+      "                                                                                                  \n",
+      " flatten_1 (Flatten)            (None, 3136)         0           ['conv2d_3[0][0]']               \n",
+      "                                                                                                  \n",
+      " dense_1 (Dense)                (None, 16)           50192       ['flatten_1[0][0]']              \n",
+      "                                                                                                  \n",
+      " z_mean (Dense)                 (None, 2)            34          ['dense_1[0][0]']                \n",
+      "                                                                                                  \n",
+      " z_log_var (Dense)              (None, 2)            34          ['dense_1[0][0]']                \n",
+      "                                                                                                  \n",
+      "==================================================================================================\n",
+      "Total params: 69,076\n",
+      "Trainable params: 69,076\n",
+      "Non-trainable params: 0\n",
+      "__________________________________________________________________________________________________\n"
+     ]
+    }
+   ],
+   "source": [
+    "encoder.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next is the code for using `z_mean` and `z_log_var`, the parameters of the statistical distribution\n",
+    "assumed to have produced input_img, to generate a latent space point `z`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Latent-space-sampling layer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tensorflow as tf\n",
+    "class Sampler(layers.Layer):\n",
+    "    def call(self, z_mean, z_log_var):\n",
+    "        batch_size = tf.shape(z_mean)[0]\n",
+    "        z_size = tf.shape(z_mean)[1]\n",
+    "        # Draw a batch of random normal\n",
+    "        # Apply the VAE vectors.\n",
+    "        epsilon = tf.random.normal(shape=(batch_size, z_size))\n",
+    "        return z_mean + tf.exp(0.5 * z_log_var) * epsilon"
+   ]
+  },
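+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A short note on the sampling formula: because the encoder outputs the _logarithm_ of the variance,\n",
+    "the factor `tf.exp(0.5 * z_log_var)` is simply the standard deviation, $\\sigma = \\exp(\\tfrac{1}{2} \\log \\sigma^2)$.\n",
+    "Writing the sample as $z = \\mu + \\sigma \\cdot \\epsilon$ with an external random $\\epsilon$ (the\n",
+    "_reparameterization trick_) keeps the sampling step differentiable with respect to `z_mean` and\n",
+    "`z_log_var`, so gradients can flow back into the encoder during training."
+   ]
+  },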
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following listing shows the decoder implementation. We reshape the vector `z` to\n",
+    "the dimensions of an image and then use a few convolution layers to obtain a final\n",
+    "image output that has the same dimensions as the original input_img."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Input where we’ll feed z\n",
+    "latent_inputs = keras.Input(shape=(latent_dim,))\n",
+    "# Produce the same number of coefficients that we\n",
+    "# had at the level of the Flatten layer in the encoder\n",
+    "x = layers.Dense(7 * 7 * 64, activation=\"relu\")(latent_inputs)\n",
+    "# Revert the Flatten layer of the encoder\n",
+    "x = layers.Reshape((7, 7, 64))(x)\n",
+    "# Revert the Conv2D layers of the encoder\n",
+    "x = layers.Conv2DTranspose(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
+    "x = layers.Conv2DTranspose(32, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
+    "# The output ends up with shape (28, 28, 1)\n",
+    "decoder_outputs = layers.Conv2D(1, 3, activation=\"sigmoid\", padding=\"same\")(x)\n",
+    "decoder = keras.Model(latent_inputs, decoder_outputs, name=\"decoder\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Its summary looks like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model: \"decoder\"\n",
+      "_________________________________________________________________\n",
+      " Layer (type)                Output Shape              Param #   \n",
+      "=================================================================\n",
+      " input_3 (InputLayer)        [(None, 2)]               0         \n",
+      "                                                                 \n",
+      " dense_2 (Dense)             (None, 3136)              9408      \n",
+      "                                                                 \n",
+      " reshape (Reshape)           (None, 7, 7, 64)          0         \n",
+      "                                                                 \n",
+      " conv2d_transpose (Conv2DTra  (None, 14, 14, 64)       36928     \n",
+      " nspose)                                                         \n",
+      "                                                                 \n",
+      " conv2d_transpose_1 (Conv2DT  (None, 28, 28, 32)       18464     \n",
+      " ranspose)                                                       \n",
+      "                                                                 \n",
+      " conv2d_4 (Conv2D)           (None, 28, 28, 1)         289       \n",
+      "                                                                 \n",
+      "=================================================================\n",
+      "Total params: 65,089\n",
+      "Trainable params: 65,089\n",
+      "Non-trainable params: 0\n",
+      "_________________________________________________________________\n"
+     ]
+    }
+   ],
+   "source": [
+    "decoder.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let’s create the VAE model itself. This is your first example of a model that isn’t\n",
+    "doing supervised learning (an autoencoder is an example of _self-supervised learning_,\n",
+    "because it uses its inputs as targets). Whenever you depart from classic supervised\n",
+    "learning, it’s common to subclass the `Model` class and implement a custom `train_\n",
+    "step()` to specify the new training logic.\n",
+    "That’s what we’ll do here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class VAE(keras.Model):\n",
+    "    def __init__(self, encoder, decoder, **kwargs):\n",
+    "        super().__init__(**kwargs)\n",
+    "        self.encoder = encoder\n",
+    "        self.decoder = decoder\n",
+    "        self.sampler = Sampler()\n",
+    "        # We use these metrics to keep track of the loss averages\n",
+    "        # over each epoch.\n",
+    "        self.total_loss_tracker = keras.metrics.Mean(name=\"total_loss\")\n",
+    "        self.reconstruction_loss_tracker = keras.metrics.Mean(\n",
+    "        name=\"reconstruction_loss\")\n",
+    "        self.kl_loss_tracker = keras.metrics.Mean(name=\"kl_loss\")\n",
+    "\n",
+    "    \n",
+    "\n",
+    "        # We list the metrics in the metrics\n",
+    "        # property to enable the model to reset\n",
+    "        # them after each epoch (or between\n",
+    "        # multiple calls to fit()/evaluate())\n",
+    "        @property\n",
+    "        def metrics(self):\n",
+    "            return [self.total_loss_tracker,\n",
+    "                    self.reconstruction_loss_tracker,\n",
+    "                selfself.kl_loss_tracker]\n",
+    "\n",
+    "        def train_step(self, data):\n",
+    "            with tf.GradientTape() as tape:\n",
+    "                z_mean, z_log_var = self.encoder(data)\n",
+    "                z = self.sampler(z_mean, z_log_var)\n",
+    "                reconstruction = decoder(z)\n",
+    "                # We sum the reconstruction loss over the spatial\n",
+    "                # dimensions (axes 1 and 2) and take its mean over the\n",
+    "                # batch dimension.\n",
+    "                reconstruction_loss = tf.reduce_mean(\n",
+    "                tf.reduce_sum(keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2))\n",
+    "                )\n",
+    "                # Add the regularization term (Kullback–Leibler divergence)\n",
+    "                kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))\n",
+    "                total_loss = reconstruction_loss + tf.reduce_mean(kl_loss)\n",
+    "            \n",
+    "            grads = tape.gradient(total_loss, self.trainable_weights)\n",
+    "            self.optimizer.apply_gradients(zip(grads, self.trainable_weights))\n",
+    "            self.total_loss_tracker.update_state(total_loss)\n",
+    "            self.reconstruction_loss_tracker.update_state(reconstruction_loss)\n",
+    "            self.kl_loss_tracker.update_state(kl_loss)\n",
+    "            return {\n",
+    "            \"total_loss\": self.total_loss_tracker.result(),\n",
+    "            \"reconstruction_loss\": self.reconstruction_loss_tracker.result(),\n",
+    "            \"kl_loss\": self.kl_loss_tracker.result(),\n",
+    "            }\n",
+    "    \n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we’re ready to instantiate and train the model on MNIST digits. Because the\n",
+    "loss is taken care of in the custom layer, we don’t specify an external loss at compile\n",
+    "time (`loss=None`), which in turn means we won’t pass target data during training - as\n",
+    "you can see, we only pass `x_train` to the model in `fit()`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Training the VAE"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch 1/30\n"
+     ]
+    },
+    {
+     "ename": "NotImplementedError",
+     "evalue": "Exception encountered when calling layer \"vae_1\" (type VAE).\n\nWhen subclassing the `Model` class, you should implement a `call()` method.\n\nCall arguments received:\n  • inputs=tf.Tensor(shape=(128, 28, 28, 1), dtype=float32)\n  • training=True\n  • mask=None",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-11-97ae65a7d644>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     11\u001b[0m \u001b[0;31m# Note that we don’t pass targets in fit(), since train_step()\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     12\u001b[0m \u001b[0;31m# doesn’t expect any\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0mvae\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmnist_digits\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mepochs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbatch_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m128\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py\u001b[0m in \u001b[0;36merror_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m     65\u001b[0m     \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m  \u001b[0;31m# pylint: disable=broad-except\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     66\u001b[0m       \u001b[0mfiltered_tb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_process_traceback_frames\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__traceback__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 67\u001b[0;31m       \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwith_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfiltered_tb\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     68\u001b[0m     \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     69\u001b[0m       \u001b[0;32mdel\u001b[0m \u001b[0mfiltered_tb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mcall\u001b[0;34m(self, inputs, training, mask)\u001b[0m\n\u001b[1;32m    473\u001b[0m         \u001b[0ma\u001b[0m \u001b[0mlist\u001b[0m \u001b[0mof\u001b[0m \u001b[0mtensors\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mthere\u001b[0m \u001b[0mare\u001b[0m \u001b[0mmore\u001b[0m \u001b[0mthan\u001b[0m \u001b[0mone\u001b[0m \u001b[0moutputs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    474\u001b[0m     \"\"\"\n\u001b[0;32m--> 475\u001b[0;31m     raise NotImplementedError('When subclassing the `Model` class, you should '\n\u001b[0m\u001b[1;32m    476\u001b[0m                               'implement a `call()` method.')\n\u001b[1;32m    477\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;31mNotImplementedError\u001b[0m: Exception encountered when calling layer \"vae_1\" (type VAE).\n\nWhen subclassing the `Model` class, you should implement a `call()` method.\n\nCall arguments received:\n  • inputs=tf.Tensor(shape=(128, 28, 28, 1), dtype=float32)\n  • training=True\n  • mask=None"
+     ]
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()\n",
+    "# We train on all MNIST digits, so we concatenate\n",
+    "# the training and test samples\n",
+    "mnist_digits = np.concatenate([x_train, x_test], axis=0)\n",
+    "mnist_digits = np.expand_dims(mnist_digits, -1).astype(\"float32\") / 255\n",
+    "vae = VAE(encoder, decoder)\n",
+    "# Note that we don’t pass a loss argument in compile(), since the loss\n",
+    "# is already part of the train_step().\n",
+    "vae.compile(optimizer=keras.optimizers.Adam(), run_eagerly=True)\n",
+    "# Note that we don’t pass targets in fit(), since train_step()\n",
+    "# doesn’t expect any\n",
+    "vae.fit(mnist_digits, epochs=30, batch_size=128)"
    ]
   },
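+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check once training has finished, we can decode an arbitrary latent point back into\n",
+    "an image (a minimal sketch; the latent coordinates below are chosen arbitrarily):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Decode a single, arbitrarily chosen point of the 2D latent space.\n",
+    "z_sample = np.array([[0.5, -0.5]])\n",
+    "generated = vae.decoder.predict(z_sample)\n",
+    "\n",
+    "# The decoder outputs a (1, 28, 28, 1) array; display it as a grayscale image.\n",
+    "plt.imshow(generated[0].reshape(28, 28), cmap=\"Greys_r\")\n",
+    "plt.axis(\"off\")\n",
+    "plt.show()"
+   ]
+  },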
   {
@@ -535,7 +885,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Part IV : Adverserial Networks"
+    "# Part II : Adverserial Networks"
    ]
   },
   {
@@ -961,7 +1311,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Part I : Deep Dreams\n",
+    "# Part III : Deep Dreams\n",
     "\n",
     "Unzip the `Bilder.zip` file in the same directory where you run this notebook."
    ]
@@ -1399,7 +1749,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Part II : Neural style transfer\n",
+    "# Part IV : Neural style transfer\n",
     "\n",
     "In addition to DeepDream, another major development in deep-learning-driven\n",
     "image modification is __neural style transfer__, introduced by Leon Gatys et al. in the summer\n",
@@ -2012,963 +2362,6 @@
     "\n",
     "3. Give more weight on content image or style image."
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Part III : Generating Images with Variational Autoencoders (VAE)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Sampling from a latent space of images to create entirely new images or edit existing\n",
-    "ones is currently the most popular and successful application of creative AI. In this section\n",
-    "and the next, we’ll review some high-level concepts pertaining to image generation,\n",
-    "alongside implementations details relative to the two main techniques in this\n",
-    "domain: _variational autoencoders_ (VAEs) and _generative adversarial networks_ (GANs). \n",
-    "\n",
-    "The techniques we present here aren’t specific to images—you could develop latent spaces\n",
-    "of sound, music, or even text, using GANs and VAEs—but in practice, the most interesting\n",
-    "results have been obtained with pictures, and that’s what we focus on here."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Sampling from Latent Spaces of Images"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The key idea of image generation is to develop a low-dimensional latent space of representations\n",
-    "(which naturally is a vector space) where any point can be mapped to a\n",
-    "realistic-looking image. The module capable of realizing this mapping, taking as input\n",
-    "a latent point and outputting an image (a grid of pixels), is called a _generator_ (in the\n",
-    "case of GANs) or a _decoder_ (in the case of VAEs). \n",
-    "\n",
-    "Once such a latent space has been developed, you can sample points from it, either deliberately \n",
-    "or at random, and, by mapping them to image space, generate images that have never been seen before.\n",
-    "\n",
-    "<img src='./Bilder/latent_space.jpg'>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "GANs and VAEs are two different strategies for learning such latent spaces of image\n",
-    "representations, each with its own characteristics. VAEs are great for learning latent\n",
-    "spaces that are well structured, where specific directions encode a meaningful axis of\n",
-    "variation in the data. GANs generate images that can potentially be highly realistic, but\n",
-    "the latent space they come from may not have as much structure and continuity."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Concept vectors for image editing\n",
-    "\n",
-    "The idea of concept vectors is the following : given a latent space of representations, or an\n",
-    "embedding space, certain directions in the space may encode interesting axes of variation\n",
-    "in the original data. \n",
-    "\n",
-    "In a latent space of images of faces, for instance, there may\n",
-    "be a smile vector $s$, such that if latent point $z$ is the embedded representation of a certain\n",
-    "face, then latent point $z + s$ is the embedded representation of the same face,\n",
-    "smiling. \n",
-    "\n",
-    "\n",
-    "Once you’ve identified such a vector, it then becomes possible to edit images\n",
-    "by projecting them into the latent space, moving their representation in a meaningful\n",
-    "way, and then decoding them back to image space. There are concept vectors for\n",
-    "essentially any independent dimension of variation in image space—in the case of\n",
-    "faces, you may discover vectors for adding sunglasses to a face, removing glasses, turning\n",
-    "a male face into a female face, and so on. \n",
-    "\n",
-    "\n",
-    "The Figure below is an example of a smile vector,\n",
-    "a concept vector discovered by Tom White from the Victoria University School of\n",
-    "Design in New Zealand, using VAEs trained on a dataset of faces of celebrities (the\n",
-    "CelebA dataset).\n",
-    "\n",
-    "<img src='./Bilder/smile_vector.jpg'>\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Variational autoencoders\n",
-    "\n",
-    "Variational autoencoders, simultaneously discovered by Kingma and Welling in\n",
-    "December 2013 and Rezende, Mohamed, and Wierstra in January 2014 are a kind\n",
-    "of generative model that’s especially appropriate for the task of image editing via concept\n",
-    "vectors. They’re a modern take on autoencoders — a type of network that aims to\n",
-    "encode an input to a low-dimensional latent space and then decode it back—that\n",
-    "mixes ideas from deep learning with Bayesian inference.\n",
-    "\n",
-    "\n",
-    "A classical image autoencoder takes an image, maps it to a latent vector space via\n",
-    "an encoder module, and then decodes it back to an output with the same dimensions\n",
-    "as the original image, via a decoder module,  see the figure below: \n",
-    "\n",
-    "<img src='./Bilder/autoencoder.jpg'>\n",
-    "\n",
-    "It’s then trained by\n",
-    "using as target data the same images as the input images, meaning the autoencoder\n",
-    "learns to reconstruct the original inputs. By imposing various constraints on the code\n",
-    "(the output of the encoder), you can get the autoencoder to learn more-or-less interesting\n",
-    "latent representations of the data. \n",
-    "\n",
-    "Most commonly, you’ll constrain the code to\n",
-    "be low-dimensional and sparse (mostly zeros), in which case the encoder acts as a way\n",
-    "to compress the input data into fewer bits of information."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In practice, such classical autoencoders don’t lead to particularly useful or nicely\n",
-    "structured latent spaces. They’re not much good at compression, either. For these reasons,\n",
-    "they have largely fallen out of fashion. VAEs, however, augment autoencoders\n",
-    "with a little bit of statistical magic that forces them to learn continuous, highly structured\n",
-    "latent spaces. They have turned out to be a powerful tool for image generation.\n",
-    "\n",
-    "\n",
-    "A VAE, instead of compressing its input image into a fixed code in the latent space,\n",
-    "turns the image into the parameters of a statistical distribution: a mean and a variance.\n",
-    "Essentially, this means you’re assuming the input image has been generated by a\n",
-    "statistical process, and that the randomness of this process should be taken into\n",
-    "account during encoding and decoding. The VAE then uses the mean and variance\n",
-    "parameters to randomly sample one element of the distribution, and decodes that element\n",
-    "back to the original input, see the Figure below: \n",
-    "\n",
-    "\n",
-    "<img src='./Bilder/vae_illustration.jpg'>\n",
-    "\n",
-    "A VAE maps an image to two vectors, `z_mean` and `z_log_sigma`, which define\n",
-    "a probability distribution over the latent space, used to sample a latent point to decode.\n",
-    "\n",
-    "\n",
-    "The stochasticity of this process\n",
-    "improves robustness and forces the latent space to encode meaningful representations\n",
-    "everywhere: every point sampled in the latent space is decoded to a valid output."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In technical terms, here’s how a VAE works:\n",
-    "\n",
-    "1. An encoder module turns the input samples input_img into two parameters in\n",
-    "a latent space of representations, `z_mean` and `z_log_variance`\n",
-    "\n",
-    "2. You randomly sample a point z from the latent normal distribution that’s\n",
-    "assumed to generate the input image, via\n",
-    "`z = z_mean + exp(z_log_variance) * epsilon`\n",
-    "\n",
-    "where epsilon is a random tensor of small values.\n",
-    "\n",
-    "3. A decoder module maps this point in the latent space back to the original input\n",
-    "image.\n",
-    "\n",
-    "\n",
-    "\n",
-    "Because `epsilon` is random, the process ensures that every point that’s close to the latent location\n",
-    "where you encoded `input_img` (z-mean) can be decoded to something similar to\n",
-    "`input_img`, thus forcing the latent space to be continuously meaningful. Any two close points\n",
-    "in the latent space will decode to highly similar images. Continuity, combined with the low\n",
-    "dimensionality of the latent space, forces every direction in the latent space to encode a meaningful\n",
-    "axis of variation of the data, making the latent space very structured and thus highly suitable\n",
-    "to manipulation via concept vectors.\n",
-    "\n",
-    "\n",
-    "\n",
-    "The parameters of a VAE are trained via two loss functions: a _reconstruction loss_ that\n",
-    "forces the decoded samples to match the initial inputs, and a _regularization loss_ that\n",
-    "helps learn well-formed latent spaces and reduce overfitting to the training data. Let’s\n",
-    "quickly go over a Keras implementation of a VAE. Schematically, it looks like this:"
-   ]
-  },
-  {
-   "cell_type": "raw",
-   "metadata": {},
-   "source": [
-    "# Encodes the input into a mean and variance parameter\n",
-    "z_mean, z_log_variance = encoder(input_img)\n",
-    "\n",
-    "# Draws a latent point using a small random epsilon\n",
-    "z = z_mean + exp(z_log_variance) * epsilon\n",
-    "\n",
-    "# Decodes z back to an image\n",
-    "reconstructed_img = decoder(z)\n",
-    "\n",
-    "# Instantiates the autoencoder model, which maps an \n",
-    "# input image to its reconstruction\n",
-    "model = Model(input_img, reconstructed_img)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "You can then train the model using the reconstruction loss and the regularization loss.\n",
-    "The following listing shows the encoder network you’ll use, mapping images to the\n",
-    "parameters of a probability distribution over the latent space. It’s a simple convnet\n",
-    "that maps the input image x to two vectors, `z_mean` and `z_log_var`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Latent-space-sampling function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import numpy as np\n",
-    "import tensorflow as tf\n",
-    "from tensorflow import keras\n",
-    "from tensorflow.keras import layers\n",
-    "\n",
-    "class Sampling(layers.Layer):\n",
-    "    \"\"\"Uses (z_mean, z_log_var) to sample z, the vector encoding a digit.\"\"\"\n",
-    "\n",
-    "    def call(self, inputs):\n",
-    "        z_mean, z_log_var = inputs\n",
-    "        batch = tf.shape(z_mean)[0]\n",
-    "        dim = tf.shape(z_mean)[1]\n",
-    "        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))\n",
-    "        return z_mean + tf.exp(0.5 * z_log_var) * epsilon\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### VAE encoder network"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "latent_dim = 2\n",
-    "\n",
-    "encoder_inputs = keras.Input(shape=(28, 28, 1))\n",
-    "x = layers.Conv2D(32, 3, activation=\"relu\", strides=2, padding=\"same\")(encoder_inputs)\n",
-    "x = layers.Conv2D(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
-    "x = layers.Flatten()(x)\n",
-    "x = layers.Dense(16, activation=\"relu\")(x)\n",
-    "z_mean = layers.Dense(latent_dim, name=\"z_mean\")(x)\n",
-    "z_log_var = layers.Dense(latent_dim, name=\"z_log_var\")(x)\n",
-    "z = Sampling()([z_mean, z_log_var])\n",
-    "encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name=\"encoder\")\n",
-    "encoder.summary()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Next is the code for using `z_mean` and `z_log_var`, the parameters of the statistical distribution\n",
-    "assumed to have produced `input_img`, to generate a latent space point z.\n",
-    "Here, you wrap some arbitrary code (built on top of Keras backend primitives) into a\n",
-    "`Lambda` layer. In Keras, everything needs to be a layer, so code that isn’t part of a builtin\n",
-    "layer should be wrapped in a `Lambda` (or in a custom layer)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### VAE decoder network, mapping latent space points to images\n",
-    "\n",
-    "The following listing shows the decoder implementation. You reshape the vector z to\n",
-    "the dimensions of an image and then use a few convolution layers to obtain a final\n",
-    "image output that has the same dimensions as the original `input_img`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "latent_inputs = keras.Input(shape=(latent_dim,))\n",
-    "x = layers.Dense(7 * 7 * 64, activation=\"relu\")(latent_inputs)\n",
-    "x = layers.Reshape((7, 7, 64))(x)\n",
-    "x = layers.Conv2DTranspose(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
-    "x = layers.Conv2DTranspose(32, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n",
-    "decoder_outputs = layers.Conv2DTranspose(1, 3, activation=\"sigmoid\", padding=\"same\")(x)\n",
-    "decoder = keras.Model(latent_inputs, decoder_outputs, name=\"decoder\")\n",
-    "decoder.summary()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The dual loss of a VAE doesn’t fit the traditional expectation of a sample-wise function\n",
-    "of the form `loss(input, target)`. Thus, you’ll set up the loss by writing a custom\n",
-    "layer that internally uses the built-in `add_loss` layer method to create an arbitrary loss."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Custom layer used to compute the VAE loss"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class VAE(keras.Model):\n",
-    "    def __init__(self, encoder, decoder, **kwargs):\n",
-    "        super(VAE, self).__init__(**kwargs)\n",
-    "        self.encoder = encoder\n",
-    "        self.decoder = decoder\n",
-    "\n",
-    "    def train_step(self, data):\n",
-    "        if isinstance(data, tuple):\n",
-    "            data = data[0]\n",
-    "        with tf.GradientTape() as tape:\n",
-    "            z_mean, z_log_var, z = encoder(data)\n",
-    "            reconstruction = decoder(z)\n",
-    "            reconstruction_loss = tf.reduce_mean(\n",
-    "                keras.losses.binary_crossentropy(data, reconstruction)\n",
-    "            )\n",
-    "            reconstruction_loss *= 28 * 28\n",
-    "            kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)\n",
-    "            kl_loss = tf.reduce_mean(kl_loss)\n",
-    "            kl_loss *= -0.5\n",
-    "            total_loss = reconstruction_loss + kl_loss\n",
-    "        grads = tape.gradient(total_loss, self.trainable_weights)\n",
-    "        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))\n",
-    "        return {\n",
-    "            \"loss\": total_loss,\n",
-    "            \"reconstruction_loss\": reconstruction_loss,\n",
-    "            \"kl_loss\": kl_loss,\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Finally, you’re ready to instantiate and train the model. Because the loss is taken care\n",
-    "of in the custom layer, you don’t specify an external loss at compile time (`loss=None`),\n",
-    "which in turn means you won’t pass target data during training (as you can see, you\n",
-    "only pass `x_train` to the model in `fit`)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Training the VAE"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()\n",
-    "mnist_digits = np.concatenate([x_train, x_test], axis=0)\n",
-    "mnist_digits = np.expand_dims(mnist_digits, -1).astype(\"float32\") / 255\n",
-    "\n",
-    "vae = VAE(encoder, decoder)\n",
-    "vae.compile(optimizer=keras.optimizers.Adam())\n",
-    "vae.fit(mnist_digits, epochs=30, batch_size=128)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Once such a model is trained—on MNIST, in this case—you can use the decoder network\n",
-    "to turn arbitrary latent space vectors into images."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Sampling a grid of points from the 2D latent space and decoding them to images"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "\n",
-    "def plot_latent_space(vae, n=30, figsize=15):\n",
-    "    # display a n*n 2D manifold of digits\n",
-    "    digit_size = 28\n",
-    "    scale = 1.0\n",
-    "    figure = np.zeros((digit_size * n, digit_size * n))\n",
-    "    # linearly spaced coordinates corresponding to the 2D plot\n",
-    "    # of digit classes in the latent space\n",
-    "    grid_x = np.linspace(-scale, scale, n)\n",
-    "    grid_y = np.linspace(-scale, scale, n)[::-1]\n",
-    "\n",
-    "    for i, yi in enumerate(grid_y):\n",
-    "        for j, xi in enumerate(grid_x):\n",
-    "            z_sample = np.array([[xi, yi]])\n",
-    "            x_decoded = vae.decoder.predict(z_sample)\n",
-    "            digit = x_decoded[0].reshape(digit_size, digit_size)\n",
-    "            figure[\n",
-    "                i * digit_size : (i + 1) * digit_size,\n",
-    "                j * digit_size : (j + 1) * digit_size,\n",
-    "            ] = digit\n",
-    "\n",
-    "    plt.figure(figsize=(figsize, figsize))\n",
-    "    start_range = digit_size // 2\n",
-    "    end_range = n * digit_size + start_range\n",
-    "    pixel_range = np.arange(start_range, end_range, digit_size)\n",
-    "    sample_range_x = np.round(grid_x, 1)\n",
-    "    sample_range_y = np.round(grid_y, 1)\n",
-    "    plt.xticks(pixel_range, sample_range_x)\n",
-    "    plt.yticks(pixel_range, sample_range_y)\n",
-    "    plt.xlabel(\"z[0]\")\n",
-    "    plt.ylabel(\"z[1]\")\n",
-    "    plt.imshow(figure, cmap=\"Greys_r\")\n",
-    "    plt.show()\n",
-    "\n",
-    "\n",
-    "plot_latent_space(vae)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The grid of sampled digits (see the Figure above) shows a completely continuous\n",
-    "distribution of the different digit classes, with one digit morphing\n",
-    "into another as you follow a path through latent space. Specific directions\n",
-    "in this space have a meaning: for example, there’s a direction for\n",
-    "“four-ness,” “one-ness,” and so on.\n",
-    "\n",
-    "\n",
-    "In the next section, we’ll cover in detail the other major tool for generating\n",
-    "artificial images: generative adversarial networks (GANs)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Display how the latent space clusters different digit classes"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def plot_label_clusters(vae, data, labels):\n",
-    "    # display a 2D plot of the digit classes in the latent space\n",
-    "    z_mean, _, _ = vae.encoder.predict(data)\n",
-    "    plt.figure(figsize=(12, 10))\n",
-    "    plt.scatter(z_mean[:, 0], z_mean[:, 1], c=labels)\n",
-    "    plt.colorbar()\n",
-    "    plt.xlabel(\"z[0]\")\n",
-    "    plt.ylabel(\"z[1]\")\n",
-    "    plt.show()\n",
-    "\n",
-    "\n",
-    "(x_train, y_train), _ = keras.datasets.mnist.load_data()\n",
-    "x_train = np.expand_dims(x_train, -1).astype(\"float32\") / 255\n",
-    "\n",
-    "plot_label_clusters(vae, x_train, y_train)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Wrapping up \n",
-    "\n",
-    "Wrapping up\n",
-    "- Image generation with deep learning is done by learning latent spaces that capture\n",
-    "statistical information about a dataset of images. By sampling and decoding\n",
-    "points from the latent space, you can generate never-before-seen images.\n",
-    "There are two major tools to do this: VAEs and GANs.\n",
-    "\n",
-    "- VAEs result in highly structured, continuous latent representations. For this reason,\n",
-    "they work well for doing all sorts of image editing in latent space: face\n",
-    "swapping, turning a frowning face into a smiling face, and so on. They also work\n",
-    "nicely for doing latent-space-based animations, such as animating a walk along a\n",
-    "cross section of the latent space, showing a starting image slowly morphing into\n",
-    "different images in a continuous way.\n",
-    "\n",
-    "- GANs enable the generation of realistic single-frame images but may not induce\n",
-    "latent spaces with solid structure and high continuity.\n",
-    "Most successful practical applications I have seen with images rely on VAEs, but GANs\n",
-    "are extremely popular in the world of academic research—at least, circa 2016–2017.\n",
-    "You’ll find out how they work and how to implement one in the next section.\n",
-    "\n",
-    "## Extensions for VAE\n",
-    "\n",
-    "\n",
-    "To play further with image generation, I suggest working with the [Largescale\n",
-    "Celeb Faces Attributes](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) (CelebA) dataset. It’s a free-to-download image\n",
-    "dataset containing more than 200,000 celebrity portraits. It’s great for experimenting\n",
-    "with concept vectors in particular—it definitely beats MNIST."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Part IV : Adverserial Networks"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Generative adversarial networks (GANs), introduced in 2014 by \n",
-    "[Goodfellow et al.,](https://arxiv.org/abs/1406.2661) are\n",
-    "an alternative to VAEs for learning latent spaces of images. They enable the generation\n",
-    "of fairly realistic synthetic images by forcing the generated images to be statistically\n",
-    "almost indistinguishable from real ones.\n",
-    "\n",
-    "\n",
-    "An intuitive way to understand GANs is to imagine a forger trying to create a fake\n",
-    "Picasso painting. At first, the forger is pretty bad at the task. He mixes some of his\n",
-    "fakes with authentic Picassos and shows them all to an art dealer. The art dealer makes\n",
-    "an authenticity assessment for each painting and gives the forger feedback about what\n",
-    "makes a Picasso look like a Picasso. The forger goes back to his studio to prepare some\n",
-    "new fakes. As times goes on, the forger becomes increasingly competent at imitating\n",
-    "the style of Picasso, and the art dealer becomes increasingly expert at spotting fakes.\n",
-    "In the end, they have on their hands some excellent fake Picassos.\n",
-    "\n",
-    "\n",
-    "That’s what a GAN is: a forger network and an expert network, each being trained\n",
-    "to best the other. As such, a GAN is made of two parts:\n",
-    "\n",
-    "- _Generator network_ — Takes as input a random vector (a random point in the\n",
-    "latent space), and decodes it into a synthetic image\n",
-    "- _Discriminator network_ (or adversary) — Takes as input an image (real or synthetic),\n",
-    "and predicts whether the image came from the training set or was created by\n",
-    "the generator network.\n",
-    "\n",
-    "The generator network is trained to be able to fool the discriminator network, and\n",
-    "thus it evolves toward generating increasingly realistic images as training goes on: artificial\n",
-    "images that look indistinguishable from real ones, to the extent that it’s impossible\n",
-    "for the discriminator network to tell the two apart (see figure 8.15). Meanwhile,\n",
-    "the discriminator is constantly adapting to the gradually improving capabilities of the\n",
-    "generator, setting a high bar of realism for the generated images. Once training is\n",
-    "over, the generator is capable of turning any point in its input space into a believable\n",
-    "image. Unlike VAEs, this latent space has fewer explicit guarantees of meaningful\n",
-    "structure; in particular, it isn’t continuous.\n",
-    "\n",
-    "\n",
-    "<img src='./Bilder/gan_illustration.jpg'>\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Remarkably, a GAN is a system where the optimization minimum isn’t fixed, unlike in\n",
-    "any other training setup you’ve encountered in this book. Normally, gradient descent\n",
-    "consists of rolling down hills in a static loss landscape. But with a GAN, every step\n",
-    "taken down the hill changes the entire landscape a little. \n",
-    "\n",
-    "It’s a dynamic system where\n",
-    "the optimization process is seeking not a minimum, but an equilibrium between two\n",
-    "forces. For this reason, GANs are notoriously difficult to train—getting a GAN to work\n",
-    "requires lots of careful tuning of the model architecture and training parameters"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### A schematic GAN implementation\n",
-    "\n",
-    "In this section, we’ll explain how to implement a GAN in Keras, in its barest form—\n",
-    "because GANs are advanced, diving deeply into the technical details would be out of\n",
-    "scope for this chapter. The specific implementation is a _deep convolutional GAN_ (DCGAN):\n",
-    "a GAN where the generator and discriminator are deep convnets. In particular, it uses\n",
-    "a `Conv2DTranspose` layer for image upsampling in the generator.\n",
-    "You’ll train the GAN on images from CIFAR10, a dataset of 50,000 32 × 32 RGB\n",
-    "images belonging to 10 classes (5,000 images per class). To make things easier, you’ll\n",
-    "only use images belonging to the class “frog.”\n",
-    "Schematically, the GAN looks like this:\n",
-    "\n",
-    "1. A generator network maps vectors of shape (`latent_dim`) to images of shape\n",
-    "$(32, 32, 3)$.\n",
-    "\n",
-    "2. A discriminator network maps images of shape $(32, 32, 3)$ to a binary score\n",
-    "estimating the probability that the image is real.\n",
-    "\n",
-    "3. A gan network chains the generator and the discriminator together: \n",
-    "`gan(x) = discriminator(generator(x))`. Thus this gan network maps latent space vectors\n",
-    "to the discriminator’s assessment of the realism of these latent vectors as\n",
-    "decoded by the generator.\n",
-    "\n",
-    "\n",
-    "4. You train the discriminator using examples of real and fake images along with\n",
-    "“real”/“fake” labels, just as you train any regular image-classification model.\n",
-    "\n",
-    "5. To train the generator, you use the gradients of the generator’s weights with\n",
-    "regard to the loss of the gan model. This means, at every step, you move the\n",
-    "weights of the generator in a direction that makes the discriminator more likely\n",
-    "to classify as “real” the images decoded by the generator. In other words, you\n",
-    "train the generator to fool the discriminator."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### A bag of tricks\n",
-    "\n",
-    "The process of training GANs and tuning GAN implementations is notoriously difficult.\n",
-    "There are a number of known tricks you should keep in mind. Like most things\n",
-    "in deep learning, it’s more alchemy than science: these tricks are heuristics, not\n",
-    "theory-backed guidelines. They’re supported by a level of intuitive understanding of\n",
-    "the phenomenon at hand, and they’re known to work well empirically, although not\n",
-    "necessarily in every context.\n",
-    "Here are a few of the tricks used in the implementation of the GAN generator and\n",
-    "discriminator in this section. It isn’t an exhaustive list of GAN-related tips; you’ll find\n",
-    "many more across the GAN literature:\n",
-    "\n",
-    "- We use `tanh` as the last activation in the generator, instead of `sigmoid`, which is\n",
-    "more commonly found in other types of models.\n",
-    "\n",
-    "- We sample points from the latent space using a _normal distribution_ (Gaussian distribution),\n",
-    "not a uniform distribution.\n",
-    "\n",
-    "- Stochasticity is good to induce robustness. Because GAN training results in a\n",
-    "dynamic equilibrium, GANs are likely to get stuck in all sorts of ways. Introducing\n",
-    "randomness during training helps prevent this. We introduce randomness\n",
-    "in two ways: by using dropout in the discriminator and by adding random noise\n",
-    "to the labels for the discriminator.\n",
-    "\n",
-    "- Sparse gradients can hinder GAN training. In deep learning, sparsity is often a\n",
-    "desirable property, but not in GANs. Two things can induce gradient sparsity:\n",
-    "max pooling operations and `ReLU` activations. Instead of max pooling, we recommend\n",
-    "using strided convolutions for downsampling, and we recommend\n",
-    "using a `LeakyReLU` layer instead of a ReLU activation. It’s similar to `ReLU`, but it\n",
-    "relaxes sparsity constraints by allowing small negative activation values.\n",
-    "\n",
-    "- In generated images, it’s common to see checkerboard artifacts caused by\n",
-    "unequal coverage of the pixel space in the generator (see figure 8.17). To fix\n",
-    "this, we use a kernel size that’s divisible by the stride size whenever we use a\n",
-    "strided `Conv2DTranpose` or `Conv2D` in both the generator and the discriminator."
-   ]
-  },
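-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To make two of these tricks concrete, here is a minimal illustrative sketch: a\n",
-    "downsampling block built from a strided convolution plus `LeakyReLU` (instead of max\n",
-    "pooling plus `ReLU`), and hard 0/1 labels with a little random noise added. The helper\n",
-    "name `downsample_block`, the filter count, and the noise amplitude are arbitrary choices\n",
-    "for illustration; they are not part of the models built below."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from tensorflow.keras import layers\n",
-    "import numpy as np\n",
-    "\n",
-    "# Downsampling block: a strided convolution followed by LeakyReLU,\n",
-    "# instead of MaxPooling2D + ReLU, to avoid overly sparse gradients.\n",
-    "# Note that the kernel size (4) is divisible by the stride (2).\n",
-    "def downsample_block(x, filters):\n",
-    "    x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)\n",
-    "    return layers.LeakyReLU()(x)\n",
-    "\n",
-    "inputs = layers.Input(shape=(32, 32, 3))\n",
-    "print(downsample_block(inputs, 128).shape)  # (None, 16, 16, 128)\n",
-    "\n",
-    "# Noisy labels for the discriminator: start from hard 0/1 targets\n",
-    "# and perturb them slightly to inject randomness into training.\n",
-    "hard_labels = np.concatenate([np.ones((8, 1)), np.zeros((8, 1))])\n",
-    "noisy_labels = hard_labels + 0.05 * np.random.random(hard_labels.shape)\n",
-    "print(noisy_labels[:4])"
-   ]
-  },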
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## The generator\n",
-    "\n",
-    "First, let’s develop a `generator` model that turns a vector (from the latent space—\n",
-    "during training it will be sampled at random) into a candidate image. One of the\n",
-    "many issues that commonly arise with GANs is that the generator gets stuck with generated\n",
-    "images that look like noise. A possible solution is to use dropout on both the discriminator\n",
-    "and the generator."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### GAN generator network"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import tensorflow\n",
-    "from tensorflow import keras\n",
-    "from tensorflow.keras import layers\n",
-    "import numpy as np\n",
-    "latent_dim = 32\n",
-    "height = 32\n",
-    "width = 32\n",
-    "channels = 3\n",
-    "\n",
-    "generator_input = keras.Input(shape=(latent_dim,))\n",
-    "# Transforms the input into a 16 × 16 128-channel feature map\n",
-    "x = layers.Dense(128 * 16 * 16)(generator_input)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Reshape((16, 16, 128))(x)\n",
-    "x = layers.Conv2D(256, 5, padding='same')(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "# Upsamples to 32 × 32\n",
-    "x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Conv2D(256, 5, padding='same')(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Conv2D(256, 5, padding='same')(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "\n",
-    "# Produces a 32 × 32 1-channel feature map (shape of a CIFAR10 image)\n",
-    "x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)\n",
-    "# Instantiates the generator model, which maps the input\n",
-    "# of shape (latent_dim,) into an image of shape (32, 32, 3)\n",
-    "generator = keras.models.Model(generator_input, x)\n",
-    "generator.summary()"
-   ]
-  },
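-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Before wiring the generator into the full GAN, it can be useful to check that it\n",
-    "produces outputs of the expected shape. The quick check below is only a sanity test,\n",
-    "not part of the training setup, and it assumes the generator cell above has been run;\n",
-    "the variable names are just illustrative."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Decode a few random latent vectors with the (still untrained) generator.\n",
-    "# The output should have shape (4, 32, 32, 3).\n",
-    "test_vectors = np.random.normal(size=(4, latent_dim))\n",
-    "test_images = generator.predict(test_vectors)\n",
-    "print(test_images.shape)\n",
-    "# The last activation is tanh, so values lie roughly in [-1, 1]\n",
-    "print(test_images.min(), test_images.max())"
-   ]
-  },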
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### The discriminator\n",
-    "\n",
-    "Next, you’ll develop a discriminator model that takes as input a candidate image\n",
-    "(real or synthetic) and classifies it into one of two classes: “generated image” or “real\n",
-    "image that comes from the training set.”"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The GAN discriminator network"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "discriminator_input = layers.Input(shape=(height, width, channels))\n",
-    "x = layers.Conv2D(128, 3)(discriminator_input)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Conv2D(128, 4, strides=2)(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Conv2D(128, 4, strides=2)(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Conv2D(128, 4, strides=2)(x)\n",
-    "x = layers.LeakyReLU()(x)\n",
-    "x = layers.Flatten()(x)\n",
-    "# One dropout layer: an important trick!\n",
-    "x = layers.Dropout(0.4)(x)\n",
-    "# Classification layer\n",
-    "x = layers.Dense(1, activation='sigmoid')(x)\n",
-    "# Instantiates the discriminator model, which turns\n",
-    "# a (32, 32, 3) input into a binary classification\n",
-    "# decision (fake/real)\n",
-    "discriminator = tensorflow.keras.models.Model(discriminator_input, x)\n",
-    "discriminator.summary()\n",
-    "discriminator_optimizer = tensorflow.keras.optimizers.RMSprop(\n",
-    "lr=0.0008,\n",
-    "    # Uses gradient clipping (by value) in the optimizer\n",
-    "clipvalue=1.0,\n",
-    "    # To stabilize training, uses learning-rate decay\n",
-    "decay=1e-8)\n",
-    "discriminator.compile(optimizer=discriminator_optimizer,\n",
-    "loss='binary_crossentropy')"
-   ]
-  },
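-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The same kind of quick check works for the discriminator: it should map a batch of\n",
-    "images to one probability per image. The snippet below is illustrative only and assumes\n",
-    "the generator and discriminator cells above have been run."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Score a few generated images with the untrained discriminator.\n",
-    "# The output should have shape (4, 1), one probability per image.\n",
-    "sample_images = generator.predict(np.random.normal(size=(4, latent_dim)))\n",
-    "sample_scores = discriminator.predict(sample_images)\n",
-    "print(sample_scores.shape)\n",
-    "print(sample_scores[:2])"
-   ]
-  },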
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## The adversarial network\n",
-    "\n",
-    "Finally, you’ll set up the GAN, which chains the generator and the discriminator.\n",
-    "When trained, this model will move the generator in a direction that improves its ability\n",
-    "to fool the discriminator. This model turns latent-space points into a classification\n",
-    "decision—“fake” or “real”—and it’s meant to be trained with labels that are always\n",
-    "“these are real images.” So, training gan will update the weights of `generator` in a way\n",
-    "that makes discriminator more likely to predict “real” when looking at fake images.\n",
-    "It’s very important to note that you set the `discriminator` to be frozen during training\n",
-    "(non-trainable): its weights won’t be updated when training gan. If the discriminator\n",
-    "weights could be updated during this process, then you’d be training the discriminator\n",
-    "to always predict “real,” which isn’t what you want!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Adversarial network"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "discriminator.trainable = False\n",
-    "gan_input = tensorflow.keras.Input(shape=(latent_dim,))\n",
-    "gan_output = discriminator(generator(gan_input))\n",
-    "gan = tensorflow.keras.models.Model(gan_input, gan_output)\n",
-    "gan_optimizer = tensorflow.keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)\n",
-    "gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')"
-   ]
-  },
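-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To convince yourself that the freeze works as described, you can compare how many\n",
-    "weight tensors `gan` reports as trainable with the generator’s own count; the two\n",
-    "numbers should match, because the frozen discriminator contributes none. This is a\n",
-    "quick inspection only, assuming the cells above have been run. Keep in mind that in\n",
-    "tf.keras the `trainable` attribute is taken into account when a model is compiled,\n",
-    "which is why the order of the lines above matters."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Because discriminator.trainable was set to False before gan was built and\n",
-    "# compiled, gan only reports the generator's weights as trainable. The standalone\n",
-    "# discriminator still trains normally, since it was compiled while its\n",
-    "# trainable flag was still True.\n",
-    "print('generator trainable tensors:', len(generator.trainable_weights))\n",
-    "print('gan trainable tensors:', len(gan.trainable_weights))"
-   ]
-  },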
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## How to train your DCGAN\n",
-    "\n",
-    "Now you can begin training. To recapitulate, this is what the training loop looks like\n",
-    "schematically. For each epoch, you do the following:\n",
-    "1. Draw random points in the latent space (random noise).\n",
-    "\n",
-    "2. Generate images with `generator` using this random noise\n",
-    "\n",
-    "3. Mix the generated images with real ones\n",
-    "\n",
-    "4. Train `discriminator` using these mixed images, with corresponding targets:\n",
-    "either “real” (for the real images) or “fake” (for the generated images)\n",
-    "\n",
-    "5. Draw new random points in the latent space\n",
-    "\n",
-    "6. Train gan using these random vectors, with targets that all say “these are real\n",
-    "images.” This updates the weights of the generator (only, because the discriminator\n",
-    "is frozen inside gan) to move them toward getting the discriminator to\n",
-    "predict “these are real images” for generated images: this trains the generator\n",
-    "to fool the discriminator.\n",
-    "Let’s implement it."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Implementing GAN training"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "from tensorflow.keras.preprocessing import image\n",
-    "# Loads CIFAR10 data\n",
-    "(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()\n",
-    "\n",
-    "# Selects frog images (class 6)\n",
-    "x_train = x_train[y_train.flatten() == 6]\n",
-    "\n",
-    "# Normalizes data\n",
-    "x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.\n",
-    "iterations = 10000\n",
-    "batch_size = 20\n",
-    "\n",
-    "# Specifies where you want to save generated images\n",
-    "save_dir = './data/'\n",
-    "start = 0\n",
-    "for step in range(iterations):\n",
-    "    # Samples random points in the latent space\n",
-    "    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))\n",
-    "    # Decodes them to fake images\n",
-    "    generated_images = generator.predict(random_latent_vectors)\n",
-    "    # Combines them with real images\n",
-    "    stop = start + batch_size\n",
-    "    real_images = x_train[start: stop]\n",
-    "    combined_images = np.concatenate([generated_images, real_images])\n",
-    "    # Assembles labels, discriminating real from fake images\n",
-    "    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])\n",
-    "    # Adds random noise to the labels—an important trick!\n",
-    "    labels += 0.05 * np.random.random(labels.shape)\n",
-    "    d_loss = discriminator.train_on_batch(combined_images, labels)\n",
-    "    # Samples random points in the latent space\n",
-    "    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))\n",
-    "    # Assembles labels that say “these are all real images” (it’s a lie!)\n",
-    "    misleading_targets = np.zeros((batch_size, 1))\n",
-    "    # Trains the generator (via the gan model, where the discriminator weights are frozen)\n",
-    "    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)\n",
-    "    \n",
-    "    start += batch_size\n",
-    "    if start > len(x_train) - batch_size:\n",
-    "        start = 0\n",
-    "    \n",
-    "    # Occasionally saves and plots (every 100 steps)\n",
-    "    if step % 100 == 0:\n",
-    "        # Saves model weights\n",
-    "        gan.save_weights('gan.h5')\n",
-    "        # Prints metrics\n",
-    "        print('discriminator loss:', d_loss)\n",
-    "        print('adversarial loss:', a_loss)\n",
-    "        # Saves one generated image\n",
-    "        img = image.array_to_img(generated_images[0] * 255., scale=False)\n",
-    "        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))\n",
-    "        # Saves one real image for comparison\n",
-    "        img = image.array_to_img(real_images[0] * 255., scale=False)\n",
-    "        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))\n",
-    "        \n",
-    "        \n",
-    "        "
-   ]
-  },
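-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "After (or during) training, you will probably want to look at more than one sample at\n",
-    "a time. The snippet below is a small optional visualization step, assuming the training\n",
-    "cell above has been run and that `matplotlib` is available: it decodes a batch of random\n",
-    "latent vectors and displays the results in a grid."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "# Decode a batch of random latent vectors with the trained generator\n",
-    "random_latent_vectors = np.random.normal(size=(16, latent_dim))\n",
-    "images = generator.predict(random_latent_vectors)\n",
-    "\n",
-    "# The generator ends with tanh, so rescale from [-1, 1] to [0, 1] for display\n",
-    "images = (images + 1.0) / 2.0\n",
-    "\n",
-    "plt.figure(figsize=(6, 6))\n",
-    "for i in range(16):\n",
-    "    plt.subplot(4, 4, i + 1)\n",
-    "    plt.imshow(images[i].clip(0.0, 1.0))\n",
-    "    plt.axis('off')\n",
-    "plt.show()"
-   ]
-  },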
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "When training, you may see the adversarial loss begin to increase considerably, while\n",
-    "the discriminative loss tends to zero—the discriminator may end up dominating the\n",
-    "generator. If that’s the case, try reducing the discriminator learning rate, and increase\n",
-    "the dropout rate of the discriminator."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Wrapping up\n",
-    "- A GAN consists of a generator network coupled with a discriminator network.\n",
-    "The discriminator is trained to differenciate between the output of the generator\n",
-    "and real images from a training dataset, and the generator is trained to fool the\n",
-    "discriminator. Remarkably, the generator never sees images from the training\n",
-    "set directly; the information it has about the data comes from the discriminator.\n",
-    "\n",
-    "- GANs are difficult to train, because training a GAN is a dynamic process rather\n",
-    "than a simple gradient descent process with a fixed loss landscape. Getting a\n",
-    "GAN to train correctly requires using a number of heuristic tricks, as well as\n",
-    "extensive tuning.\n",
-    "\n",
-    "- GANs can potentially produce highly realistic images. But unlike VAEs, the\n",
-    "latent space they learn doesn’t have a neat continuous structure and thus may\n",
-    "not be suited for certain practical applications, such as image editing via latentspace\n",
-    "concept vectors."
-   ]
   }
  ],
  "metadata": {
-- 
GitLab