{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part I: Language Models and Recurrent Neural Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many classic texts are no longer protected by copyright.\n",
    "\n",
    "This means you can download the full text of these books for free and use it in experiments such as training generative models. Perhaps the best place to find books that are no longer protected by copyright is [Project Gutenberg](https://www.gutenberg.org/).\n",
    "\n",
    "In this tutorial we are going to use [Goethe's Faust I](http://www.gutenberg.org/files/21000/21000-8.txt)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " mögt ihr walten,\n",
      "Wie ihr aus Dunst und Nebel um mich steigt;\n",
      "Mein Busen fühlt sich jugendlich erschüttert\n",
      "Vom Zauberhauch der euren Zug umwittert.\n",
      "\n",
      "Ihr bringt mit euch die Bilder froher Tage,\n",
      "Und man\n",
      "corpus length in characters: 218851\n"
     ]
    }
    ],
   "source": [
    "from __future__ import print_function\n",
    "from tensorflow.keras.callbacks import LambdaCallback\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense\n",
    "from tensorflow.keras.layers import LSTM\n",
    "from tensorflow.keras.models import load_model\n",
    "from tensorflow.keras.optimizers import RMSprop\n",
    "from tensorflow.keras.utils import get_file\n",
    "from tensorflow.keras.layers import Bidirectional\n",
    "from tensorflow.keras.layers import Input, Embedding, Dropout, Activation\n",
    "import numpy as np\n",
    "import random\n",
    "import sys\n",
    "import io\n",
    "import string\n",
    "\n",
    "# If you prefer Nietzsche in English, use this text instead\n",
    "# path = get_file(\n",
    "#                 'nietzsche.txt',\n",
    "#                  origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')\n",
    "\n",
    "path = get_file('21000-8.txt',\n",
    "                origin='http://www.gutenberg.org/files/21000/21000-8.txt')\n",
    "\n",
    "with io.open(path, encoding='ISO-8859-1', errors='ignore') as f:\n",
    "    text = f.read()\n",
    "\n",
    "# print 200 characters from near the start of the text\n",
    "print(text[1200:1400])\n",
    "\n",
    "# print corpus length\n",
    "print('corpus length in characters:', len(text))"
   ]
  },
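  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The explicit `encoding='ISO-8859-1'` in the cell above matters because the German text contains umlauts. As a minimal sketch (the sample word and the variable names `raw`, `good` and `bad` are illustrative assumptions, not taken from the downloaded file), the snippet below shows what can happen when Latin-1 bytes are decoded with a different codec while errors are ignored:\n",
    "\n",
    "```python\n",
    "# encode an illustrative word as ISO-8859-1 (Latin-1) bytes\n",
    "raw = 'früh'.encode('iso-8859-1')\n",
    "\n",
    "# decoding with the matching codec restores the umlaut\n",
    "good = raw.decode('iso-8859-1')\n",
    "\n",
    "# decoding as UTF-8 with errors='ignore' silently drops the umlaut byte\n",
    "bad = raw.decode('utf-8', errors='ignore')\n",
    "\n",
    "print(good, bad)\n",
    "```\n",
    "\n",
    "A silently shortened word like this would end up as a separate, wrong vocabulary entry, so it is worth checking the printed excerpt for intact umlauts."
   ]
  },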
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clean Text\n",
    "\n",
    "We need to transform the raw text into a sequence of tokens (words) that we can use to train the model.\n",
    "\n",
    "Based on a review of the raw text above, the cleaning function applies the operations listed below. You may want to explore further cleaning operations yourself as an extension.\n",
    "\n",
    "- Replace `--` with a space so that words split more cleanly.\n",
    "- Split words on white space.\n",
    "- Remove all punctuation from words to reduce the vocabulary size (e.g. ‘What?’ becomes ‘What’).\n",
    "- Remove all tokens that are not alphabetic, which drops standalone punctuation and numbers.\n",
    "- Normalize all words to lowercase to reduce the vocabulary size.\n",
    "\n",
    "Vocabulary size matters a great deal in language modeling: a smaller vocabulary results in a smaller model that trains faster.\n",
    "\n",
    "We can implement these cleaning operations, in this order, in a function. Below is the function `clean_doc()`, which takes a loaded document as an argument and returns a list of clean tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# turn a doc into clean tokens\n",
    "def clean_doc(doc):\n",
    "    # replace '--' with a space ' '\n",
    "    doc = doc.replace('--', ' ')\n",
    "    # split into tokens by white space\n",
    "    tokens = doc.split()\n",
    "    # remove punctuation from each token\n",
    "    table = str.maketrans('', '', string.punctuation)\n",
    "    tokens = [w.translate(table) for w in tokens]\n",
    "    # remove remaining tokens that are not alphabetic\n",
    "    tokens = [word for word in tokens if word.isalpha()]\n",
    "    # make lower case\n",
    "    tokens = [word.lower() for word in tokens]\n",
    "    return tokens"
   ]
  },
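  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the effect of each step visible, here is a minimal sketch that applies the same operations inline to a short sample sentence. The sentence and the variable names `sample` and `words` are assumptions chosen for illustration; they are not taken from the corpus.\n",
    "\n",
    "```python\n",
    "import string\n",
    "\n",
    "sample = 'Habe nun, ach! Philosophie -- Juristerei und Medicin, 1808.'\n",
    "\n",
    "# replace double hyphens with a space, then split on white space\n",
    "words = sample.replace('--', ' ').split()\n",
    "# strip ASCII punctuation from every token\n",
    "table = str.maketrans('', '', string.punctuation)\n",
    "words = [w.translate(table) for w in words]\n",
    "# keep alphabetic tokens only and lowercase them\n",
    "words = [w.lower() for w in words if w.isalpha()]\n",
    "\n",
    "print(words)\n",
    "```\n",
    "\n",
    "The number `1808` disappears because it is not alphabetic. Note that `string.punctuation` only covers ASCII punctuation, so a token that still carries a typographic character such as a curly quote would fail the `isalpha()` check and be dropped entirely."
   ]
  },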
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can run this cleaning operation on our loaded document and print out some of the tokens and statistics as a sanity check. After this, `tokens` is one long list containing the whole corpus, word by word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['ihr', 'naht', 'euch', 'wieder', 'schwankende', 'gestalten', 'die', 'früh', 'sich', 'einst', 'dem', 'trüben', 'blick', 'gezeigt', 'versuch', 'ich', 'wohl', 'euch', 'diesmal', 'fest', 'zu', 'halten', 'fühl', 'ich', 'mein', 'herz', 'noch', 'jenem', 'wahn', 'geneigt', 'ihr', 'drängt', 'euch', 'zu', 'nun', 'gut', 'so', 'mögt', 'ihr', 'walten', 'wie', 'ihr', 'aus', 'dunst', 'und', 'nebel', 'um', 'mich', 'steigt', 'mein', 'busen', 'fühlt', 'sich', 'jugendlich', 'erschüttert', 'vom', 'zauberhauch', 'der', 'euren', 'zug', 'umwittert', 'ihr', 'bringt', 'mit', 'euch', 'die', 'bilder', 'froher', 'tage', 'und', 'manche', 'liebe', 'schatten', 'steigen', 'auf', 'gleich', 'einer', 'alten', 'halbverklungnen', 'sage', 'kommt', 'erste', 'lieb', 'und', 'freundschaft', 'mit', 'herauf', 'der', 'schmerz', 'wird', 'neu', 'es', 'wiederholt', 'die', 'klage', 'des', 'lebens', 'labyrinthisch', 'irren', 'lauf', 'und', 'nennt', 'die', 'guten', 'die', 'um', 'schöne', 'stunden', 'vom', 'glück', 'getäuscht', 'vor', 'mir', 'hinweggeschwunden', 'sie', 'hören', 'nicht', 'die', 'folgenden', 'gesänge', 'die', 'seelen', 'denen', 'ich', 'die', 'ersten', 'sang', 'zerstoben', 'ist', 'das', 'freundliche', 'gedränge', 'verklungen', 'ach', 'der', 'erste', 'wiederklang', 'mein', 'leidlied', 'ertönt', 'der', 'unbekannten', 'menge', 'ihr', 'beyfall', 'selbst', 'macht', 'meinem', 'herzen', 'bang', 'und', 'was', 'sich', 'sonst', 'an', 'meinem', 'lied', 'erfreuet', 'wenn', 'es', 'noch', 'lebt', 'irrt', 'in', 'der', 'welt', 'zerstreuet', 'und', 'mich', 'ergreift', 'ein', 'längst', 'entwöhntes', 'sehnen', 'nach', 'jenem', 'stillen', 'ernsten', 'geisterreich', 'es', 'schwebet', 'nun', 'in', 'unbestimmten', 'tönen', 'mein', 'lispelnd', 'lied', 'der', 'aeolsharfe', 'gleich', 'ein', 'schauer', 'faßt', 'mich', 'thräne', 'folgt', 'den', 'thränen', 'das']\n",
      "Total Tokens: 33600\n",
      "Unique Tokens: 7005\n"
     ]
    }
    ],
   "source": [
    "# clean document\n",
    "tokens = clean_doc(text)\n",
    "# skip the Project Gutenberg introduction (the first 122 tokens)\n",
    "tokens = tokens[122:]\n",
    "print(tokens[:200])\n",
    "print('Total Tokens: %d' % len(tokens))\n",
    "print('Unique Tokens: %d' % len(set(tokens)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also get some statistics about the clean document.\n",
    "\n",
    "We can see that there are 33'600 words in the clean text and a vocabulary of 7'005 unique words. This is smallish, so models fit on this data should be manageable on modest hardware."
   ]
  },
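  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two statistics above can be broken down further by word frequency. The sketch below uses `collections.Counter` on a hypothetical toy list; `toy_tokens` is an assumption standing in for the full `tokens` list, and the same calls should work on the real data.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "toy_tokens = ['ihr', 'naht', 'euch', 'wieder', 'ihr', 'euch', 'ihr']\n",
    "\n",
    "# count how often each distinct token occurs\n",
    "counts = Counter(toy_tokens)\n",
    "\n",
    "print('Total Tokens: %d' % len(toy_tokens))\n",
    "print('Unique Tokens: %d' % len(counts))\n",
    "# the two most frequent tokens with their counts\n",
    "print(counts.most_common(2))\n",
    "```\n",
    "\n",
    "Looking at the most frequent words is a quick way to spot tokens that do not belong to the play itself."
   ]
  },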
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save Clean Text\n",
    "\n",
    "We can organize the long list of tokens into sequences of 50 input words and 1 output word.\n",
    "That is, sequences of 51 words.\n",
    "\n",
    "We can do this by iterating over the list of tokens from index 51 onwards and taking the 51 tokens before that index as a sequence, then repeating this process to the end of the list of tokens.\n",
    "\n",
    "We will join the tokens of each sequence into a space-separated string for later storage in a file.\n",
    "The code to split the list of clean tokens into sequences with a length of 51 tokens is listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total Sequences: 33549\n",
      "['ihr naht euch wieder schwankende gestalten die früh sich einst dem trüben blick gezeigt versuch ich wohl euch diesmal fest zu halten fühl ich mein herz noch jenem wahn geneigt ihr drängt euch zu nun gut so mögt ihr walten wie ihr aus dunst und nebel um mich steigt mein busen', 'naht euch wieder schwankende gestalten die früh sich einst dem trüben blick gezeigt versuch ich wohl euch diesmal fest zu halten fühl ich mein herz noch jenem wahn geneigt ihr drängt euch zu nun gut so mögt ihr walten wie ihr aus dunst und nebel um mich steigt mein busen fühlt', 'euch wieder schwankende gestalten die früh sich einst dem trüben blick gezeigt versuch ich wohl euch diesmal fest zu halten fühl ich mein herz noch jenem wahn geneigt ihr drängt euch zu nun gut so mögt ihr walten wie ihr aus dunst und nebel um mich steigt mein busen fühlt sich']\n"
     ]
    }
   ],
   "source": [
    "# organize into sequences of tokens\n",
    "length = 50 + 1\n",
    "sequences = list()\n",
    "for i in range(length, len(tokens)):\n",
    "    # select sequence of tokens\n",
    "    seq = tokens[i-length:i]\n",
    "    # convert into a line\n",
    "    line = ' '.join(seq)\n",
    "    # store\n",
    "    sequences.append(line)\n",
    "print('Total Sequences: %d' % len(sequences))\n",
    "print(sequences[0:3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running this cell creates a long list of lines. The printed count \n",
    "shows that we have exactly 33'549 training patterns (33'600 tokens \n",
    "minus the sequence length of 51) to fit our model."
   ]
  },
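  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sliding window is easier to follow on a tiny example. The sketch below repeats the same loop with a hypothetical six-token list and a window of 3 input words plus 1 output word; `toy_tokens` and `windows` are illustrative names, not part of the tutorial code.\n",
    "\n",
    "```python\n",
    "toy_tokens = ['a', 'b', 'c', 'd', 'e', 'f']\n",
    "length = 3 + 1  # 3 input words plus 1 output word\n",
    "\n",
    "windows = []\n",
    "for i in range(length, len(toy_tokens)):\n",
    "    # the window covers the `length` tokens that end just before index i\n",
    "    windows.append(' '.join(toy_tokens[i - length:i]))\n",
    "\n",
    "print(windows)\n",
    "print(len(windows))\n",
    "```\n",
    "\n",
    "With 6 tokens and a window of 4 the loop yields 6 - 4 = 2 lines, the same arithmetic as 33'600 - 51 = 33'549 for the real corpus. The last possible window, the one ending at the final token (`'c d e f'` here), is not produced because `range` stops before `len(toy_tokens)`; an upper bound of `len(toy_tokens) + 1` would include it."
   ]
  },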
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can save the sequences to a new file for later loading.\n",
    "\n",
    "We can define a new function for saving lines of text to a file. \n",
    "This new function is called `save_doc()` and is listed below. It \n",
    "takes as input a list of lines and a filename. The lines are written, \n",
    "one per line, as plain text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save lines to file, one sequence per line\n",
    "def save_doc(lines, filename):\n",
    "    data = '\\n'.join(lines)\n",
    "    # the context manager closes the file even if writing fails\n",
    "    with open(filename, 'w') as file:\n",
    "        file.write(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can call this function and save our training sequences to the file `goethe_sequences.txt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save sequences to file\n",
    "out_filename = 'goethe_sequences.txt'\n",
    "save_doc(sequences, out_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the file with your text editor.\n",
    "\n",
    "You will see that each line is shifted along one word, with a new word at the end to be predicted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train Language Model\n",
    "\n",
    "We can now train a statistical language model from the prepared data.\n",
    "\n",
    "The model we will train is a neural language model. It has a few unique characteristics:\n",
    "\n",
    "- It uses a distributed representation for words so that different words with similar meanings \n",
    "  will have a similar representation.\n",
    "- It learns the representation at the same time as learning the model.\n",
    "- It learns to predict the probability for the next word using the context of the last 50 words.\n",
    "\n",
    "Specifically, we will use an Embedding Layer to learn the representation of words, and a Long Short-Term Memory (LSTM) recurrent neural network to learn to predict words based on their context.\n",
    "\n",
    "Let’s start by loading our training data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Sequences\n",
    "\n",
    "We can load our training data using the `load_doc()` function defined below.\n",
    "\n",
    "Once loaded, we can split the data into separate training sequences by splitting on newlines.\n",
    "\n",
    "The snippet below loads `goethe_sequences.txt` from the current working directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file as read only\n",
    "    file = open(filename, 'r')\n",
    "    # read all text\n",
    "    text = file.read()\n",
    "    # close the file\n",
    "    file.close()\n",
    "    return text\n",
    " \n",
    "# load\n",
    "in_filename = 'goethe_sequences.txt'\n",
    "doc = load_doc(in_filename)\n",
    "lines = doc.split('\\n')"
   ]
  },
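  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the sequences contain umlauts, it can be worth checking that saving and loading round-trip cleanly. The sketch below is one way to do that under stated assumptions: it writes to a temporary directory, uses a hypothetical file name `toy_sequences.txt`, and passes `encoding='utf-8'` explicitly, which the `save_doc()` and `load_doc()` helpers in this notebook do not do (they rely on the platform default).\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "lines_out = ['ihr naht euch wieder', 'die früh sich einst']\n",
    "path = os.path.join(tempfile.mkdtemp(), 'toy_sequences.txt')\n",
    "\n",
    "# write one sequence per line with an explicit encoding\n",
    "with open(path, 'w', encoding='utf-8') as f:\n",
    "    f.write('\\n'.join(lines_out))\n",
    "\n",
    "# read the file back with the same encoding and split it into lines\n",
    "with open(path, 'r', encoding='utf-8') as f:\n",
    "    lines_in = f.read().split('\\n')\n",
    "\n",
    "print(lines_in == lines_out)\n",
    "```\n",
    "\n",
    "Using the same explicit encoding for writing and reading keeps the result independent of the operating system's default."
   ]
  },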
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can encode the training data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encode Sequences\n",
    "\n",
    "The word embedding layer expects input sequences to consist of integers.\n",
    "\n",
    "We can map each word in our vocabulary to a unique integer and encode our input \n",
    "sequences. Later, when we make predictions, we can convert each prediction to an integer \n",
    "and look up the associated word in the same mapping.\n",
    "\n",
    "To do this encoding, we will use the `Tokenizer` class in the Keras API.\n",
    "\n",
    "First, the Tokenizer must be fitted on the entire training dataset, which means it \n",
    "finds all of the unique words in the data and assigns each a unique integer, \n",
    "starting at 1 for the most frequent word.\n",
    "\n",
    "We can then use the fitted Tokenizer to encode all of the training sequences, converting \n",
    "each sequence from a list of words to a list of integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[21, 2606, 33, 103, 2605, 1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190], [2606, 33, 103, 2605, 1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484], [33, 103, 2605, 1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19], [103, 2605, 1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607], [2605, 1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608], [1639, 3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608, 111], [3, 654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608, 111, 2609], [654, 19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608, 111, 2609, 4], [19, 944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 
564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608, 111, 2609, 4, 1190], [944, 27, 2604, 356, 7002, 2602, 2, 61, 33, 2601, 389, 8, 483, 355, 2, 44, 134, 51, 564, 782, 7000, 21, 321, 33, 8, 47, 102, 14, 6998, 21, 6996, 25, 21, 55, 2600, 1, 652, 69, 22, 482, 44, 190, 484, 19, 2607, 2608, 111, 2609, 4, 1190, 1640]]\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "# integer encode sequences of words\n",
    "tokenizer = Tokenizer()\n",
    "tokenizer.fit_on_texts(lines)\n",
    "sequences = tokenizer.texts_to_sequences(lines)\n",
    "print(sequences[0:10])"
   ]
  },
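  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what this encoding amounts to, here is a rough pure-Python approximation of the mapping on two toy lines. It is a sketch for intuition only, not the Keras implementation; `toy_lines`, `word_index`, `index_word` and `encoded` are illustrative names, and the real `Tokenizer` additionally filters punctuation and lowercases its input by default.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "toy_lines = ['und ich und du', 'ich und wir']\n",
    "\n",
    "# count how often every word occurs across all lines\n",
    "counts = Counter(word for line in toy_lines for word in line.split())\n",
    "\n",
    "# rank words by descending frequency; ties keep first-seen order\n",
    "ranked = sorted(counts, key=lambda w: -counts[w])\n",
    "# the most frequent word gets index 1; index 0 stays unused\n",
    "word_index = {word: i for i, word in enumerate(ranked, start=1)}\n",
    "# reverse mapping, used later to turn predicted integers back into words\n",
    "index_word = {i: word for word, i in word_index.items()}\n",
    "\n",
    "# encode every line as a list of integers\n",
    "encoded = [[word_index[w] for w in line.split()] for line in toy_lines]\n",
    "\n",
    "print(word_index)\n",
    "print(encoded)\n",
    "```\n",
    "\n",
    "Because index 0 is never assigned to a word, an embedding layer fed with such indices needs room for one more entry than there are words in the vocabulary."
   ]
  },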
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'und': 1, 'ich': 2, 'die': 3, 'der': 4, 'nicht': 5, 'ein': 6, 'das': 7, 'zu': 8, 'in': 9, 'ist': 10, 'du': 11, 'sie': 12, 'es': 13, 'so': 14, 'mephistopheles': 15, 'den': 16, 'mit': 17, 'faust': 18, 'sich': 19, 'mir': 20, 'ihr': 21, 'mich': 22, 'was': 23, 'er': 24, 'wie': 25, 'auf': 26, 'dem': 27, 'nur': 28, 'the': 29, 'von': 30, 'an': 31, 'doch': 32, 'euch': 33, 'wenn': 34, 'im': 35, 'dich': 36, 'da': 37, 'wir': 38, 'of': 39, 'daß': 40, 'man': 41, 'als': 42, 'auch': 43, 'mein': 44, 'dir': 45, 'uns': 46, 'nun': 47, 'hier': 48, 'des': 49, 'margarete': 50, 'noch': 51, 'nach': 52, 'schon': 53, 'project': 54, 'aus': 55, 'wird': 56, 'sind': 57, 'ihn': 58, 'or': 59, 'to': 60, 'wohl': 61, 'hat': 62, 'vor': 63, 'für': 64, 'bin': 65, 'you': 66, 'and': 67, 'einen': 68, 'um': 69, 'will': 70, 'kann': 71, 'denn': 72, 'eine': 73, 'soll': 74, 'am': 75, 'war': 76, 'durch': 77, 'zum': 78, 'a': 79, 'ach': 80, 'wer': 81, 'gutenbergtm': 82, 'geist': 83, 'muß': 84, 'alle': 85, 'gleich': 86, 'welt': 87, 'gar': 88, 'bey': 89, 'dann': 90, 'o': 91, 'nichts': 92, 'ihm': 93, 'allein': 94, 'wo': 95, 'einem': 96, 'alles': 97, 'this': 98, 'marthe': 99, 'with': 100, 'work': 101, 'gut': 102, 'wieder': 103, 'recht': 104, 'meine': 105, 'ja': 106, 'selbst': 107, 'mann': 108, 'immer': 109, 'herr': 110, 'vom': 111, 'seyn': 112, 'hab': 113, 'gehn': 114, 'leben': 115, 'viel': 116, 'mag': 117, 'mehr': 118, 'teufel': 119, 'laß': 120, 'any': 121, 'bist': 122, 'gern': 123, 'gott': 124, 'gretchen': 125, 'hast': 126, 'fort': 127, 'meinem': 128, 'zur': 129, 'ins': 130, 'nacht': 131, 'tag': 132, 'works': 133, 'herz': 134, 'nie': 135, 'zeit': 136, 'erst': 137, 'menschen': 138, 'kein': 139, 'ganz': 140, 'frosch': 141, 'kommt': 142, 'geht': 143, 'dieser': 144, 'sey': 145, 'ists': 146, 'steht': 147, 'weiß': 148, 'herrn': 149, 'deine': 150, 'weh': 151, 'dein': 152, 'kind': 153, 'electronic': 154, 'gutenberg': 155, 'macht': 156, 'haben': 157, 'seyd': 158, 'ab': 159, 'komm': 160, 'mutter': 161, 'is': 162, 
'allen': 163, 'aber': 164, 'hin': 165, 'diesem': 166, 'sagen': 167, 'for': 168, 'herzen': 169, 'seh': 170, 'sehr': 171, 'bald': 172, 'einmal': 173, 'all': 174, 'sonst': 175, 'freund': 176, 'haus': 177, 'lange': 178, 'schüler': 179, 'über': 180, 'wär': 181, 'siebel': 182, 'liebe': 183, 'nein': 184, 'brust': 185, 'habt': 186, 'sein': 187, 'meinen': 188, 'foundation': 189, 'busen': 190, 'läßt': 191, 'deinen': 192, 'himmel': 193, 'gewiß': 194, 'dort': 195, 'meiner': 196, 'wagner': 197, 'altmayer': 198, 'by': 199, 'not': 200, 'einer': 201, 'diesen': 202, 'andern': 203, 'kraft': 204, 'lang': 205, 'davon': 206, 'blut': 207, 'hexe': 208, 'terms': 209, 'if': 210, 'lieb': 211, 'sehn': 212, 'natur': 213, 'sieht': 214, 'erde': 215, 'seine': 216, 'hand': 217, 'diese': 218, 'that': 219, 'brander': 220, 'be': 221, 'are': 222, 'hören': 223, 'jetzt': 224, 'geschehn': 225, 'schönen': 226, 'eben': 227, 'gethan': 228, 'deinem': 229, 'her': 230, 'weit': 231, 'agreement': 232, 'schöne': 233, 'sagt': 234, 'drum': 235, 'augen': 236, 'jeder': 237, 'beym': 238, 'ewig': 239, 'frey': 240, 'keine': 241, 'warum': 242, 'tritt': 243, 'from': 244, 'do': 245, 'glück': 246, 'machen': 247, 'genug': 248, 'wenig': 249, 'seht': 250, 'oder': 251, 'ihre': 252, 'unter': 253, 'singt': 254, 'mußt': 255, 'kopf': 256, 'weg': 257, 'darf': 258, 'chor': 259, 'schön': 260, 'ende': 261, 'liegt': 262, 'herein': 263, 'faßt': 264, 'oft': 265, 'lust': 266, 'keinen': 267, 'groß': 268, 'ganze': 269, 'seele': 270, 'nieder': 271, 'thut': 272, 'jedem': 273, 'frau': 274, 'at': 275, 'states': 276, 'donations': 277, 'ohne': 278, 'könnt': 279, 'aller': 280, 'werden': 281, 'tausend': 282, 'hinaus': 283, 'sinn': 284, 'wein': 285, 'stimme': 286, 'heut': 287, 'damit': 288, 'license': 289, 'may': 290, 'other': 291, 'tage': 292, 'heute': 293, 'freude': 294, 'ihren': 295, 'etwas': 296, 'schritt': 297, 'geben': 298, 'hinein': 299, 'zeiten': 300, 'zurück': 301, 'mädchen': 302, 'alte': 303, 'dran': 304, 'feuer': 305, 'ganzen': 306, 
'siehst': 307, 'seiner': 308, 'seinem': 309, 'kannst': 310, 'habe': 311, 'kommen': 312, 'andre': 313, 'weib': 314, 'deiner': 315, 'gehen': 316, 'sieh': 317, 'liebchen': 318, 'copyright': 319, 'full': 320, 'drängt': 321, 'literary': 322, 'archive': 323, 'alten': 324, 'erste': 325, 'guten': 326, 'menge': 327, 'zwar': 328, 'dieß': 329, 'großen': 330, 'laßt': 331, 'halb': 332, 'sag': 333, 'schwer': 334, 'heran': 335, 'wollen': 336, 'hölle': 337, 'meer': 338, 'scheint': 339, 'liebt': 340, 'sinnen': 341, 'welch': 342, 'neuen': 343, 'tod': 344, 'kunst': 345, 'füßen': 346, 'glas': 347, 'oben': 348, 'müssen': 349, 'luft': 350, 'sah': 351, 'ueber': 352, 'wollt': 353, 'it': 354, 'fühl': 355, 'blick': 356, 'lied': 357, 'noth': 358, 'bis': 359, 'kreis': 360, 'leicht': 361, 'herren': 362, 'trank': 363, 'morgen': 364, 'thier': 365, 'führen': 366, 'sollt': 367, 'ruh': 368, 'heißt': 369, 'buch': 370, 'geister': 371, 'hört': 372, 'welche': 373, 'eins': 374, 'hör': 375, 'los': 376, 'vorüber': 377, 'hält': 378, 'seinen': 379, 'geschwind': 380, 'fenster': 381, 'wort': 382, 'thiere': 383, 'willst': 384, 'valentin': 385, 'copy': 386, 'paragraph': 387, 'on': 388, 'fest': 389, 'ersten': 390, 'lebt': 391, 'zieht': 392, 'bleibt': 393, 'gebt': 394, 'besten': 395, 'armen': 396, 'lassen': 397, 'weiter': 398, 'giebt': 399, 'mensch': 400, 'möchte': 401, 'ob': 402, 'lieben': 403, 'licht': 404, 'todt': 405, 'schauen': 406, 'nah': 407, 'möcht': 408, 'kaum': 409, 'hätte': 410, 'ichs': 411, 'vater': 412, 'hätt': 413, 'zwey': 414, 'acht': 415, 'schmuck': 416, 'arm': 417, 'dieses': 418, 'hinter': 419, 'stehn': 420, 'gute': 421, 'dazu': 422, 'vorbey': 423, 'ihrem': 424, 'leib': 425, 'mirs': 426, 'leibe': 427, 'public': 428, 'united': 429, 'can': 430, 'use': 431, 'trademark': 432, 'as': 433, 'must': 434, 'access': 435, 'refund': 436, 'manche': 437, 'lebens': 438, 'weil': 439, 'dichter': 440, 'führe': 441, 'vielleicht': 442, 'such': 443, 'jedes': 444, 'eurem': 445, 'trägt': 446, 'hatte': 447, 'keiner': 
448, 'besser': 449, 'doctor': 450, 'werd': 451, 'gesellen': 452, 'hinauf': 453, 'zeichen': 454, 'anders': 455, 'tragen': 456, 'flamme': 457, 'sitzt': 458, 'kurz': 459, 'gefühl': 460, 'je': 461, 'hatt': 462, 'klein': 463, 'wirst': 464, 'engel': 465, 'junge': 466, 'voll': 467, 'heraus': 468, 'eines': 469, 'pudel': 470, 'leise': 471, 'thun': 472, 'wirds': 473, 'sollst': 474, 'böser': 475, 'ort': 476, 'platz': 477, 'domain': 478, 'agree': 479, 'your': 480, 'we': 481, 'steigt': 482, 'halten': 483, 'fühlt': 484, 'bringt': 485, 'schmerz': 486, 'lauf': 487, 'person': 488, 'beste': 489, 'frisch': 490, 'freylich': 491, 'sehen': 492, 'jener': 493, 'gewalt': 494, 'jahre': 495, 'lesen': 496, 'jeden': 497, 'dringt': 498, 'einander': 499, 'regt': 500, 'sinne': 501, 'art': 502, 'lachen': 503, 'guter': 504, 'eure': 505, 'alter': 506, 'worte': 507, 'hilft': 508, 'kleine': 509, 'sterne': 510, 'drey': 511, 'herrlich': 512, 'umher': 513, 'hohe': 514, 'stets': 515, 'find': 516, 'rechten': 517, 'gleichen': 518, 'schaffen': 519, 'thor': 520, 'jahr': 521, 'herum': 522, 'wissen': 523, 'würde': 524, 'allem': 525, 'drein': 526, 'arme': 527, 'vergebens': 528, 'jammer': 529, 'schätzen': 530, 'genießen': 531, 'fasse': 532, 'hervor': 533, 'gesang': 534, 'gesellschaft': 535, 'dabey': 536, 'ey': 537, 'manchen': 538, 'bösen': 539, 'juchhe': 540, 'ging': 541, 'laut': 542, 'volk': 543, 'ward': 544, 'niemand': 545, 'brauch': 546, 'schafft': 547, 'rechte': 548, 'mach': 549, 'glaub': 550, 'sachen': 551, 'bißchen': 552, 'elend': 553, 'lieschen': 554, 'associated': 555, 'no': 556, 'without': 557, 'set': 558, 'forth': 559, 'fee': 560, 'laws': 561, 'information': 562, 'including': 563, 'jenem': 564, 'sage': 565, 'herauf': 566, 'neu': 567, 'stunden': 568, 'stillen': 569, 'behagen': 570, 'sitzen': 571, 'gewesen': 572, 'fast': 573, 'bricht': 574, 'wunder': 575, 'wirkt': 576, 'augenblick': 577, 'wollte': 578, 'bringen': 579, 'schlecht': 580, 'eurer': 581, 'knecht': 582, 'schlägt': 583, 'wahrheit': 584, 'bereit': 
585, 'schein': 586, 'tiefe': 587, 'darum': 588, 'wasser': 589, 'fehlt': 590, 'treten': 591, 'weise': 592, 'aufs': 593, 'land': 594, 'rings': 595, 'schlag': 596, 'gegeben': 597, 'nase': 598, 'herzlich': 599, 'ferne': 600, 'verlieren': 601, 'gedanken': 602, 'theil': 603, 'schau': 604, 'worten': 605, 'schweben': 606, 'fühle': 607, 'gesicht': 608, 'verzeiht': 609, 'unser': 610, 'nennen': 611, 'tief': 612, 'hoffnung': 613, 'unsre': 614, 'finden': 615, 'alt': 616, 'streben': 617, 'werde': 618, 'ließ': 619, 'freuden': 620, 'bruder': 621, 'nehmen': 622, 'frauen': 623, 'hexen': 624, 'nimmt': 625, 'he': 626, 'sohn': 627, 'traum': 628, 'indessen': 629, 'herbey': 630, 'schwelle': 631, 'fängt': 632, 'indem': 633, 'thür': 634, 'kleinen': 635, 'bleiben': 636, 'fluch': 637, 'dus': 638, 'große': 639, 'kerl': 640, 'ihrer': 641, 'halt': 642, 'kessel': 643, 'rette': 644, 'ebook': 645, 'permission': 646, 'distributing': 647, 'copies': 648, 'section': 649, 'provide': 650, 'which': 651, 'nebel': 652, 'about': 653, 'früh': 654, 'tönen': 655, 'thränen': 656, 'director': 657, 'beyden': 658, 'hohen': 659, 'erscheint': 660, 'sollte': 661, 'spaß': 662, 'gegenwart': 663, 'solch': 664, 'froh': 665, 'fällt': 666, 'bewegt': 667, 'wesen': 668, 'eh': 669, 'vielen': 670, 'jugend': 671, 'spiel': 672, 'dies': 673, 'muth': 674, 'findet': 675, 'thaten': 676, 'fassen': 677, 'engen': 678, 'sonne': 679, 'deines': 680, 'plagen': 681, 'fürwahr': 682, 'dirs': 683, 'herab': 684, 'steh': 685, 'darfst': 686, 'geistern': 687, 'armer': 688, 'klug': 689, 'pein': 690, 'heilgen': 691, 'wonne': 692, 'liegen': 693, 'dahin': 694, 'lampe': 695, 'wies': 696, 'grab': 697, 'verschwindet': 698, 'selber': 699, 'gebracht': 700, 'wahrlich': 701, 'läuft': 702, 'dießmal': 703, 'spiegel': 704, 'fließen': 705, 'gaben': 706, 'raum': 707, 'gift': 708, 'hundert': 709, 'überall': 710, 'stand': 711, 'hebt': 712, 'riegel': 713, 'letzte': 714, 'christ': 715, 'klang': 716, 'weiber': 717, 'selig': 718, 'genuß': 719, 'ziehen': 720, 'stadt': 
721, 'stehen': 722, 'seite': 723, 'neue': 724, 'trinkt': 725, 'hause': 726, 'höre': 727, 'wurden': 728, 'großer': 729, 'fliegen': 730, 'reich': 731, 'glauben': 732, 'werth': 733, 'schloß': 734, 'wissenschaft': 735, 'stunde': 736, 'berg': 737, 'schöner': 738, 'verloren': 739, 'glieder': 740, 'feld': 741, 'schaden': 742, 'finde': 743, 'freundlich': 744, 'brennt': 745, 'hoff': 746, 'erden': 747, 'magst': 748, 'durchs': 749, 'singen': 750, 'jungen': 751, 'fühlen': 752, 'jung': 753, 'verflucht': 754, 'unten': 755, 'ehre': 756, 'edlen': 757, 'grad': 758, 'denken': 759, 'müßt': 760, 'stellt': 761, 'kam': 762, 'könig': 763, 'lieber': 764, 'ding': 765, 'bitte': 766, 'leid': 767, 'konnte': 768, 'garten': 769, 'heinrich': 770, 'text': 771, 'have': 772, 'distribute': 773, 'charge': 774, 'distribution': 775, 'comply': 776, 'paid': 777, 'its': 778, 'tax': 779, 'volunteers': 780, 'web': 781, 'wahn': 782, 'ebooks': 783, 'bang': 784, 'schauer': 785, 'folgt': 786, 'lustige': 787, 'wünschte': 788, 'leute': 789, 'thu': 790, 'bunten': 791, 'willen': 792, 'tiefer': 793, 'geboren': 794, 'vernunft': 795, 'liebsten': 796, 'sucht': 797, 'endlich': 798, 'denkt': 799, 'treibt': 800, 'satt': 801, 'mancher': 802, 'kalt': 803, 'wilde': 804, 'dirne': 805, 'holden': 806, 'geh': 807, 'klingt': 808, 'ruft': 809, 'kräfte': 810, 'sichs': 811, 'ihrs': 812, 'gieb': 813, 'blumen': 814, 'drang': 815, 'trug': 816, 'spricht': 817, 'sogleich': 818, 'fahren': 819, 'tönt': 820, 'reise': 821, 'schnell': 822, 'breiten': 823, 'grund': 824, 'felsen': 825, 'gewöhnlich': 826, 'sehe': 827, 'springt': 828, 'kommst': 829, 'straße': 830, 'strebt': 831, 'dank': 832, 'dunkeln': 833, 'lebendig': 834, 'sprechen': 835, 'leider': 836, 'können': 837, 'weder': 838, 'dafür': 839, 'freud': 840, 'rechts': 841, 'geld': 842, 'hund': 843, 'geistes': 844, 'mund': 845, 'weben': 846, 'eigner': 847, 'ha': 848, 'trieb': 849, 'näher': 850, 'wagen': 851, 'bins': 852, 'schaff': 853, 'menschheit': 854, 'ergetzen': 855, 'erkennen': 856, 
'gnug': 857, 'bitt': 858, 'solche': 859, 'leiden': 860, 'stelle': 861, 'wald': 862, 'meister': 863, 'gunst': 864, 'keinem': 865, 'nachbar': 866, 'saft': 867, 'gruß': 868, 'süßen': 869, 'rath': 870, 'wahrhaftig': 871, 's': 872, 'ers': 873, 'solchen': 874, 'gewinnen': 875, 'verderben': 876, 'breit': 877, 'tanz': 878, 'band': 879, 'links': 880, 'stein': 881, 'saß': 882, 'schönes': 883, 'trinken': 884, 'kennen': 885, 'ziehn': 886, 'glut': 887, 'spur': 888, 'stock': 889, 'ruhig': 890, 'heiligen': 891, 'ihnen': 892, 'original': 893, 'geschrieben': 894, 'anfang': 895, 'offen': 896, 'grunde': 897, 'regen': 898, 'komme': 899, 'sobald': 900, 'weine': 901, 'langen': 902, 'bequemen': 903, 'grade': 904, 'krone': 905, 'stroh': 906, 'lippen': 907, 'nehmt': 908, 'freyheit': 909, 'zeitvertreib': 910, 'ordnung': 911, 'nimmermehr': 912, 'wohin': 913, 'bock': 914, 'chorus': 915, 'genung': 916, 'floh': 917, 'kater': 918, 'schelm': 919, 'gehts': 920, 'schatz': 921, 'konnt': 922, 'kästchen': 923, 'schrein': 924, 'heimlich': 925, 'küssen': 926, 'genau': 927, 'holder': 928, 'gabst': 929, 'irrlicht': 930, 'editions': 931, 'distributed': 932, 'free': 933, 'compliance': 934, 'state': 935, 'under': 936, 'posted': 937, 'owner': 938, 'but': 939, 'medium': 940, 'replacement': 941, 'limited': 942, 'received': 943, 'einst': 944, 'denen': 945, 'ernsten': 946, 'schwebet': 947, 'weiten': 948, 'unsrer': 949, 'besonders': 950, 'wirs': 951, 'anblick': 952, 'reine': 953, 'gesetzt': 954, 'reden': 955, 'phantasie': 956, 'stück': 957, 'ganzes': 958, 'solcher': 959, 'spielen': 960, 'nähe': 961, 'schauspiel': 962, 'schmerzen': 963, 'element': 964, 'braucht': 965, 'lieder': 966, 'hals': 967, 'ziel': 968, 'pflicht': 969, 'indeß': 970, 'tiefen': 971, 'fels': 972, 'wette': 973, 'gerne': 974, 'würd': 975, 'gras': 976, 'sogar': 977, 'speise': 978, 'bewußt': 979, 'jede': 980, 'bekennen': 981, 'staub': 982, 'schlange': 983, 'last': 984, 'reichen': 985, 'schwebt': 986, 'zimmer': 987, 'sessel': 988, 'zweifel': 989, 
'ehr': 990, 'kerker': 991, 'ans': 992, 'statt': 993, 'umsonst': 994, 'mond': 995, 'haupt': 996, 'antlitz': 997, 'erbärmlich': 998, 'grauen': 999, 'heben': 1000, 'gottheit': 1001, 'klopft': 1002, 'fülle': 1003, 'zusammen': 1004, 'andrer': 1005, 'flammen': 1006, 'mittel': 1007, 'sterben': 1008, 'trunk': 1009, 'sieben': 1010, 'büßen': 1011, 'meiden': 1012, 'gang': 1013, 'herrliche': 1014, 'gefühle': 1015, 'kleiner': 1016, 'meines': 1017, 'leichten': 1018, 'schlüssel': 1019, 'euer': 1020, 'schleicht': 1021, 'höhle': 1022, 'gefahr': 1023, 'viele': 1024, 'hoher': 1025, 'ton': 1026, 'chöre': 1027, 'liebende': 1028, 'jenen': 1029, 'kuß': 1030, 'holdes': 1031, 'letzten': 1032, 'gehe': 1033, 'sicher': 1034, 'laufen': 1035, 'führt': 1036, 'gefällt': 1037, 'bessers': 1038, 'kehrt': 1039, 'begegnen': 1040, 'lohn': 1041, 'zog': 1042, 'sinken': 1043, 'entfernt': 1044, 'juchheisa': 1045, 'heisa': 1046, 'kreise': 1047, 'bring': 1048, 'tropfen': 1049, 'tagen': 1050, 'droben': 1051, 'schickt': 1052, 'glücklich': 1053, 'fragt': 1054, 'beyde': 1055, 'neues': 1056, 'flügel': 1057, 'boden': 1058, 'goldne': 1059, 'vorwärts': 1060, 'fremde': 1061, 'abend': 1062, 'blickst': 1063, 'schwarzen': 1064, 'schien': 1065, 'seines': 1066, 'eng': 1067, 'still': 1068, 'gottes': 1069, 'nimm': 1070, 'kennt': 1071, 'lernen': 1072, 'hoch': 1073, 'bleibe': 1074, 'natürlich': 1075, 'gefallen': 1076, 'steckt': 1077, 'böse': 1078, 'entsteht': 1079, 'entgegen': 1080, 'rein': 1081, 'fragen': 1082, 'fangen': 1083, 'zarten': 1084, 'wären': 1085, 'hinüber': 1086, 'laube': 1087, 'stürzt': 1088, 'ums': 1089, 'junker': 1090, 'kleide': 1091, 'entbehren': 1092, 'ohren': 1093, 'rast': 1094, 'trauben': 1095, 'höchsten': 1096, 'ruhn': 1097, 'scheiden': 1098, 'gesehn': 1099, 'gelingen': 1100, 'gemacht': 1101, 'einzig': 1102, 'fuß': 1103, 'sechs': 1104, 'gib': 1105, 'übergeben': 1106, 'baum': 1107, 'etwa': 1108, 'herüber': 1109, 'treiben': 1110, 'heilig': 1111, 'übel': 1112, 'fehlen': 1113, 'trefflich': 1114, 'versteht': 
1115, 'weißt': 1116, 'sorgen': 1117, 'sies': 1118, 'blocksberg': 1119, 'tisch': 1120, 'thät': 1121, 'angst': 1122, 'loch': 1123, 'stimmen': 1124, 'begehrt': 1125, 'rief': 1126, 'waren': 1127, 'bravo': 1128, 'spüre': 1129, 'höhe': 1130, 'just': 1131, 'gehört': 1132, 'verfluchte': 1133, 'dieb': 1134, 'fährt': 1135, 'welchen': 1136, 'au': 1137, 'leuten': 1138, 'tanzend': 1139, 'nahmen': 1140, 'fräulein': 1141, 'sprach': 1142, 'hart': 1143, 'süße': 1144, 'kirche': 1145, 'ring': 1146, 'mans': 1147, 'armes': 1148, 'hände': 1149, 'nimmer': 1150, 'machst': 1151, 'leuchtet': 1152, 'general': 1153, 'puck': 1154, 'laub': 1155, 'ketten': 1156, 'online': 1157, 'creating': 1158, 'part': 1159, 'copying': 1160, 'unless': 1161, 'mission': 1162, 'using': 1163, 'phrase': 1164, 'collection': 1165, 'individual': 1166, 'located': 1167, 'right': 1168, 'displaying': 1169, 'support': 1170, 'freely': 1171, 'format': 1172, 'status': 1173, 'anyone': 1174, 'fees': 1175, 'providing': 1176, 'requirements': 1177, 'additional': 1178, 'form': 1179, 'provided': 1180, 'within': 1181, 'days': 1182, 'contact': 1183, 'cannot': 1184, 'damages': 1185, 'us': 1186, 'has': 1187, 'site': 1188, 'email': 1189, 'euren': 1190, 'bilder': 1191, 'steigen': 1192, 'irren': 1193, 'nennt': 1194, 'seelen': 1195, 'sang': 1196, 'ergreift': 1197, 'längst': 1198, 'sehnen': 1199, 'hofft': 1200, 'gelassen': 1201, 'gewöhnt': 1202, 'gefällig': 1203, 'strom': 1204, 'enge': 1205, 'ficht': 1206, 'brot': 1207, 'sprich': 1208, 'unsres': 1209, 'gestalt': 1210, 'nachwelt': 1211, 'knaben': 1212, 'dächt': 1213, 'brav': 1214, 'zeigt': 1215, 'verstand': 1216, 'gaffen': 1217, 'zwingen': 1218, 'hilfts': 1219, 'bedenkt': 1220, 'wen': 1221, 'eilt': 1222, 'volles': 1223, 'plagt': 1224, 'thoren': 1225, 'zweck': 1226, 'ziele': 1227, 'freventlich': 1228, 'zurücke': 1229, 'länge': 1230, 'reihe': 1231, 'sturm': 1232, 'pfade': 1233, 'blätter': 1234, 'entzückt': 1235, 'volle': 1236, 'bekannt': 1237, 'packt': 1238, 'klarheit': 1239, 'irrthum': 1240, 
'sammelt': 1241, 'ehren': 1242, 'erfreuen': 1243, 'quell': 1244, 'triebe': 1245, 'hängen': 1246, 'fern': 1247, 'kranz': 1248, 'verehren': 1249, 'kinder': 1250, 'poeten': 1251, 'braut': 1252, 'wißt': 1253, 'unsern': 1254, 'gebraucht': 1255, 'schnelle': 1256, 'himmlischen': 1257, 'nachher': 1258, 'unbegreiflich': 1259, 'werke': 1260, 'dreht': 1261, 'michael': 1262, 'kette': 1263, 'fragst': 1264, 'hättst': 1265, 'altes': 1266, 'schönsten': 1267, 'niemals': 1268, 'vollen': 1269, 'zieh': 1270, 'führ': 1271, 'wege': 1272, 'erlaubt': 1273, 'voller': 1274, 'muhme': 1275, 'erscheinen': 1276, 'erscheinung': 1277, 'hübsch': 1278, 'erster': 1279, 'medicin': 1280, 'magister': 1281, 'quer': 1282, 'schier': 1283, 'pfaffen': 1284, 'könnte': 1285, 'lehren': 1286, 'länger': 1287, 'geheimniß': 1288, 'brauche': 1289, 'mitternacht': 1290, 'pult': 1291, 'papier': 1292, 'thau': 1293, 'verfluchtes': 1294, 'neben': 1295, 'erblickt': 1296, 'fließt': 1297, 'junges': 1298, 'adern': 1299, 'innre': 1300, 'toben': 1301, 'reinen': 1302, 'webt': 1303, 'goldnen': 1304, 'hängt': 1305, 'stürmen': 1306, 'zagen': 1307, 'schwindet': 1308, 'geheimnißvoll': 1309, 'zuckt': 1310, 'mächtig': 1311, 'angezogen': 1312, 'furchtsam': 1313, 'wurm': 1314, 'wem': 1315, 'ebenbild': 1316, 'pfarrer': 1317, 'kindern': 1318, 'darnach': 1319, 'gewinn': 1320, 'pergament': 1321, 'quillt': 1322, 'gedacht': 1323, 'zuletzt': 1324, 'munde': 1325, 'namen': 1326, 'gelehrt': 1327, 'frage': 1328, 'immerfort': 1329, 'erdensohn': 1330, 'dessen': 1331, 'lehret': 1332, 'gehorchen': 1333, 'empfangen': 1334, 'gelangen': 1335, 'ewigen': 1336, 'solltet': 1337, 'bart': 1338, 'stehst': 1339, 'brauchte': 1340, 'trübe': 1341, 'schwitzen': 1342, 'nützen': 1343, 'fläschchen': 1344, 'feinen': 1345, 'lockt': 1346, 'sphären': 1347, 'kehre': 1348, 'rücken': 1349, 'weicht': 1350, 'ganzer': 1351, 'erstanden': 1352, 'dumpfen': 1353, 'glaube': 1354, 'holde': 1355, 'heißen': 1356, 'schoos': 1357, 'tanzt': 1358, 'magd': 1359, 'bürgermädchen': 1360, 
'gewogen': 1361, 'besen': 1362, 'bürger': 1363, 'täglich': 1364, 'mögen': 1365, 'wüßt': 1366, 'mauern': 1367, 'mühen': 1368, 'streifen': 1369, 'höhen': 1370, 'kleider': 1371, 'linde': 1372, 'toll': 1373, 'ellenbogen': 1374, 'dumm': 1375, 'warm': 1376, 'betrogen': 1377, 'seit': 1378, 'gemeynt': 1379, 'lehrt': 1380, 'gehst': 1381, 'reihen': 1382, 'höh': 1383, 'schritte': 1384, 'beten': 1385, 'dacht': 1386, 'küche': 1387, 'freyer': 1388, 'andere': 1389, 'darauf': 1390, 'mörder': 1391, 'lobt': 1392, 'betrüben': 1393, 'braver': 1394, 'hoffen': 1395, 'stille': 1396, 'thal': 1397, 'wellen': 1398, 'seen': 1399, 'blatt': 1400, 'wäre': 1401, 'länder': 1402, 'schaar': 1403, 'häufen': 1404, 'füße': 1405, 'gespenst': 1406, 'geselle': 1407, 'sprichst': 1408, 'springen': 1409, 'ofen': 1410, 'stiller': 1411, 'gast': 1412, 'unserm': 1413, 'seel': 1414, 'gewohnt': 1415, 'verstehn': 1416, 'feder': 1417, 'heulen': 1418, 'halbe': 1419, 'gange': 1420, 'gefangen': 1421, 'mache': 1422, 'hinan': 1423, 'also': 1424, 'nennst': 1425, 'lügner': 1426, 'finsterniß': 1427, 'körpern': 1428, 'wußte': 1429, 'begraben': 1430, 'feuchten': 1431, 'dürft': 1432, 'entfernen': 1433, 'thüre': 1434, 'wärst': 1435, 'gesetz': 1436, 'schließen': 1437, 'gegangen': 1438, 'halte': 1439, 'künste': 1440, 'scheinen': 1441, 'stürzen': 1442, 'flieget': 1443, 'schuld': 1444, 'wiedersehn': 1445, 'wunsch': 1446, 'ewige': 1447, 'erfüllen': 1448, 'geschenkt': 1449, 'verhaßt': 1450, 'tanze': 1451, 'jemand': 1452, 'süß': 1453, 'voraus': 1454, 'mammon': 1455, 'trümmern': 1456, 'diener': 1457, 'rothes': 1458, 'bäume': 1459, 'dienen': 1460, 'legen': 1461, 'top': 1462, 'vergessen': 1463, 'treue': 1464, 'zerrissen': 1465, 'end': 1466, 'möglich': 1467, 'setz': 1468, 'vier': 1469, 'beine': 1470, 'heide': 1471, 'geführt': 1472, 'schicksal': 1473, 'höflichkeit': 1474, 'weisheit': 1475, 'essen': 1476, 'lernt': 1477, 'paßt': 1478, 'fünf': 1479, 'denke': 1480, 'schwarz': 1481, 'geschlecht': 1482, 'betrifft': 1483, 'kräftig': 1484, 
'ziemlich': 1485, 'schwör': 1486, 'gönn': 1487, 'giebts': 1488, 'machts': 1489, 'welcher': 1490, 'lüfte': 1491, 'pfuy': 1492, 'wenigstens': 1493, 'wacht': 1494, 'lob': 1495, 'wards': 1496, 'lag': 1497, 'freuen': 1498, 'streuen': 1499, 'unglück': 1500, 'wirth': 1501, 'siehts': 1502, 'spürt': 1503, 'setzen': 1504, 'hans': 1505, 'schneider': 1506, 'stern': 1507, 'knicken': 1508, 'ersticken': 1509, 'sticht': 1510, 'lebe': 1511, 'gutes': 1512, 'bohrt': 1513, 'pfropfen': 1514, 'platze': 1515, 'geberden': 1516, 'wart': 1517, 'seitwärts': 1518, 'brenne': 1519, 'zauberey': 1520, 'weibe': 1521, 'gibts': 1522, 'thieren': 1523, 'brey': 1524, 'kochen': 1525, 'kätzinn': 1526, 'erkennst': 1527, 'wedel': 1528, 'bild': 1529, 'geschrey': 1530, 'entzwey': 1531, 'angesicht': 1532, 'pferdefuß': 1533, 'gesehen': 1534, 'geblieben': 1535, 'stinkt': 1536, 'dünkt': 1537, 'gehend': 1538, 'sünden': 1539, 'beichte': 1540, 'frieden': 1541, 'gelegenheit': 1542, 'verdrießen': 1543, 'nachbarinn': 1544, 'kenne': 1545, 'süßer': 1546, 'wirft': 1547, 'fromm': 1548, 'sand': 1549, 'meint': 1550, 'treu': 1551, 'becher': 1552, 'gab': 1553, 'geschmeide': 1554, 'liebchens': 1555, 'stätte': 1556, 'scherzen': 1557, 'zeugniß': 1558, 'reisender': 1559, 'soldat': 1560, 'tränken': 1561, 'bett': 1562, 'kammer': 1563, 'vergnügen': 1564, 'bächlein': 1565, 'glüht': 1566, 'gretchens': 1567, 'glaubst': 1568, 'schlief': 1569, 'neige': 1570, 'gleicht': 1571, 'stumm': 1572, 'walpurgisnacht': 1573, 'gipfel': 1574, 'manches': 1575, 'brocktophantasmist': 1576, 'oberon': 1577, 'ariel': 1578, 'fliegenschnauz': 1579, 'mückennas': 1580, 'grill': 1581, 'umgebracht': 1582, 'bists': 1583, 'been': 1584, 'print': 1585, 'printed': 1586, 'file': 1587, 'found': 1588, 'means': 1589, 'these': 1590, 'used': 1591, 'receive': 1592, 'complying': 1593, 'derivative': 1594, 'they': 1595, 'please': 1596, 'read': 1597, 'return': 1598, 'obtain': 1599, 'entity': 1600, 'people': 1601, 'who': 1602, 'most': 1603, 'see': 1604, 'below': 1605, 'future': 
1606, 'performing': 1607, 'where': 1608, 'outside': 1609, 'following': 1610, 'give': 1611, 'notice': 1612, 'paragraphs': 1613, 'up': 1614, 'official': 1615, 'user': 1616, 'royalty': 1617, 'applicable': 1618, 'donate': 1619, 'payments': 1620, 'money': 1621, 'writing': 1622, 'defect': 1623, 'efforts': 1624, 'defective': 1625, 'equipment': 1626, 'disclaimer': 1627, 'except': 1628, 'liability': 1629, 'written': 1630, 'warranties': 1631, 'limitation': 1632, 'foundations': 1633, 'number': 1634, 'make': 1635, 'our': 1636, 'help': 1637, 'how': 1638, 'gestalten': 1639, 'zug': 1640, 'umwittert': 1641, 'froher': 1642, 'schatten': 1643, 'freundschaft': 1644, 'wiederholt': 1645, 'gesänge': 1646, 'zerstoben': 1647, 'gedränge': 1648, 'beyfall': 1649, 'irrt': 1650, 'thräne': 1651, 'strenge': 1652, 'mild': 1653, 'theater': 1654, 'deutschen': 1655, 'möchten': 1656, 'erstaunen': 1657, 'verlegen': 1658, 'hellem': 1659, 'verschiedne': 1660, 'wider': 1661, 'strudel': 1662, 'herzens': 1663, 'segen': 1664, 'lippe': 1665, 'gelungen': 1666, 'glänzt': 1667, 'volkes': 1668, 'wünscht': 1669, 'gewisser': 1670, 'chören': 1671, 'merkt': 1672, 'vieles': 1673, 'breite': 1674, 'gewonnen': 1675, 'masse': 1676, 'zufrieden': 1677, 'stücken': 1678, 'ragout': 1679, 'glücken': 1680, 'fühlet': 1681, 'ächten': 1682, 'künstler': 1683, 'merk': 1684, 'wirken': 1685, 'werkzeug': 1686, 'holz': 1687, 'spalten': 1688, 'schreibt': 1689, 'zerstreut': 1690, 'putz': 1691, 'musen': 1692, 'höchste': 1693, 'deinetwillen': 1694, 'wodurch': 1695, 'schlingt': 1696, 'ewge': 1697, 'zwingt': 1698, 'gleiche': 1699, 'leidenschaften': 1700, 'wüthen': 1701, 'grünen': 1702, 'götter': 1703, 'wächst': 1704, 'greift': 1705, 'erquickt': 1706, 'schönste': 1707, 'lauscht': 1708, 'offenbarung': 1709, 'werk': 1710, 'nahrung': 1711, 'jenes': 1712, 'weinen': 1713, 'dankbar': 1714, 'gedrängter': 1715, 'gebar': 1716, 'versprach': 1717, 'brach': 1718, 'thäler': 1719, 'ungebändigt': 1720, 'jene': 1721, 'allenfalls': 1722, 'drängen': 1723, 
'nächte': 1724, 'schopfe': 1725, 'wirket': 1726, 'probirt': 1727, 'himmelslicht': 1728, 'felsenwänden': 1729, 'schöpfung': 1730, 'wandelt': 1731, 'erzengel': 1732, 'engeln': 1733, 'stärke': 1734, 'ergründen': 1735, 'pracht': 1736, 'wechselt': 1737, 'schäumt': 1738, 'bilden': 1739, 'wüthend': 1740, 'tiefsten': 1741, 'flammt': 1742, 'verzeih': 1743, 'sonn': 1744, 'welten': 1745, 'wunderlich': 1746, 'nennts': 1747, 'gnaden': 1748, 'läg': 1749, 'grase': 1750, 'begräbt': 1751, 'dient': 1752, 'besondre': 1753, 'gährung': 1754, 'tollheit': 1755, 'frucht': 1756, 'künftgen': 1757, 'zieren': 1758, 'sacht': 1759, 'todten': 1760, 'befangen': 1761, 'meisten': 1762, 'maus': 1763, 'erfassen': 1764, 'beschämt': 1765, 'dauert': 1766, 'bange': 1767, 'thätigkeit': 1768, 'geb': 1769, 'erfreut': 1770, 'brechen': 1771, 'unruhig': 1772, 'theologie': 1773, 'bemühn': 1774, 'heiße': 1775, 'gescheidter': 1776, 'fürchte': 1777, 'bilde': 1778, 'bessern': 1779, 'bekehren': 1780, 'herrlichkeit': 1781, 'manch': 1782, 'kund': 1783, 'schweiß': 1784, 'letztenmal': 1785, 'büchern': 1786, 'wiesen': 1787, 'gesund': 1788, 'trüb': 1789, 'bedeckt': 1790, 'gewölb': 1791, 'angeraucht': 1792, 'instrumenten': 1793, 'lebendigen': 1794, 'schuf': 1795, 'umgiebt': 1796, 'rauch': 1797, 'weite': 1798, 'geleit': 1799, 'unterweist': 1800, 'erklärt': 1801, 'zügen': 1802, 'beschaut': 1803, 'schwingen': 1804, 'dringen': 1805, 'faß': 1806, 'quellen': 1807, 'welke': 1808, 'unwillig': 1809, 'höher': 1810, 'neuem': 1811, 'verbirgt': 1812, 'fühls': 1813, 'hingegeben': 1814, 'neigt': 1815, 'kräften': 1816, 'zittert': 1817, 'weichen': 1818, 'wendet': 1819, 'declamiren': 1820, 'komödiant': 1821, 'gebannt': 1822, 'schmaus': 1823, 'gaumen': 1824, 'werdet': 1825, 'rechter': 1826, 'wenns': 1827, 'ernst': 1828, 'nöthig': 1829, 'dürren': 1830, 'heilge': 1831, 'woraus': 1832, 'durst': 1833, 'stillt': 1834, 'erquickung': 1835, 'weiser': 1836, 'puppen': 1837, 'wenigen': 1838, 'erkannt': 1839, 'thöricht': 1840, 'pöbel': 1841, 
'besprechen': 1842, 'klebt': 1843, 'verzweiflung': 1844, 'empfinden': 1845, 'genoß': 1846, 'freye': 1847, 'ahndungsvoll': 1848, 'hinweggerafft': 1849, 'vermessen': 1850, 'augenblicke': 1851, 'stießest': 1852, 'fremd': 1853, 'fremder': 1854, 'gewühle': 1855, 'geheime': 1856, 'deckt': 1857, 'hof': 1858, 'dolch': 1859, 'trifft': 1860, 'beweinen': 1861, 'göttern': 1862, 'staube': 1863, 'gequält': 1864, 'hie': 1865, 'grinsest': 1866, 'hirn': 1867, 'dämmrung': 1868, 'spottet': 1869, 'offenbaren': 1870, 'nützt': 1871, 'schwere': 1872, 'lieblich': 1873, 'helle': 1874, 'grüße': 1875, 'inbegriff': 1876, 'neuer': 1877, 'bahn': 1878, 'aether': 1879, 'beben': 1880, 'schaale': 1881, 'gäste': 1882, 'zugebracht': 1883, 'reiche': 1884, 'witz': 1885, 'zeigen': 1886, 'bereitet': 1887, 'sterblichen': 1888, 'umwanden': 1889, 'gewißheit': 1890, 'hatten': 1891, 'binden': 1892, 'reinlich': 1893, 'heilsam': 1894, 'wag': 1895, 'stürzte': 1896, 'entstehn': 1897, 'kindlichem': 1898, 'leide': 1899, 'schmachtend': 1900, 'banden': 1901, 'einige': 1902, 'mühle': 1903, 'wandern': 1904, 'handwerksbursch': 1905, 'zweyter': 1906, 'zweyten': 1907, 'dritter': 1908, 'bier': 1909, 'graut': 1910, 'großes': 1911, 'plan': 1912, 'schmach': 1913, 'könnten': 1914, 'hinten': 1915, 'nachbarin': 1916, 'schlimmer': 1917, 'zahlen': 1918, 'vorher': 1919, 'bettler': 1920, 'feyern': 1921, 'gespräch': 1922, 'krieg': 1923, 'schlagen': 1924, 'gläschen': 1925, 'fluß': 1926, 'hinab': 1927, 'segnet': 1928, 'stolz': 1929, 'nehme': 1930, 'soldaten': 1931, 'burgen': 1932, 'stolzen': 1933, 'kühn': 1934, 'farben': 1935, 'druck': 1936, 'behend': 1937, 'lustigen': 1938, 'pfaden': 1939, 'ehrenvoll': 1940, 'rohen': 1941, 'getrieben': 1942, 'nennens': 1943, 'fiedelbogen': 1944, 'frische': 1945, 'tanzten': 1946, 'roth': 1947, 'vertraut': 1948, 'bauer': 1949, 'verschmäht': 1950, 'frischem': 1951, 'damals': 1952, 'proben': 1953, 'helfer': 1954, 'helfen': 1955, 'hülfe': 1956, 'vortheil': 1957, 'tänzer': 1958, 'käm': 1959, 'rasten': 1960, 
'pest': 1961, 'hohn': 1962, 'innern': 1963, 'unendlichen': 1964, 'kühner': 1965, 'arzeney': 1966, 'höllischen': 1967, 'erleben': 1968, 'ehrst': 1969, 'brauchen': 1970, 'betrachte': 1971, 'rückt': 1972, 'entzündet': 1973, 'beruhigt': 1974, 'erwacht': 1975, 'eile': 1976, 'kranich': 1977, 'empfunden': 1978, 'hold': 1979, 'würdig': 1980, 'lerne': 1981, 'liebeslust': 1982, 'gewaltsam': 1983, 'erd': 1984, 'gewänder': 1985, 'dunstkreis': 1986, 'mittag': 1987, 'west': 1988, 'schwarm': 1989, 'betrügen': 1990, 'stellen': 1991, 'lügen': 1992, 'erstaunt': 1993, 'ergreifen': 1994, 'saat': 1995, 'stoppel': 1996, 'hältst': 1997, 'irr': 1998, 'legt': 1999, 'bauch': 2000, 'hunde': 2001, 'wartet': 2002, 'gezogen': 2003, 'studirzimmer': 2004, 'verlassen': 2005, 'auen': 2006, 'ungestümen': 2007, 'renne': 2008, 'lege': 2009, 'draußen': 2010, 'ergetzt': 2011, 'willkommner': 2012, 'zelle': 2013, 'bächen': 2014, 'umfassen': 2015, 'quillen': 2016, 'unmöglich': 2017, 'erleuchtet': 2018, 'getrost': 2019, 'freyen': 2020, 'drinnen': 2021, 'folg': 2022, 'spruch': 2023, 'viere': 2024, 'salamander': 2025, 'undene': 2026, 'winden': 2027, 'silphe': 2028, 'verschwinden': 2029, 'kobold': 2030, 'elemente': 2031, 'leucht': 2032, 'incubus': 2033, 'schluß': 2034, 'beschwören': 2035, 'flüchtling': 2036, 'beugen': 2037, 'schaaren': 2038, 'schwillt': 2039, 'haaren': 2040, 'zerfließen': 2041, 'steige': 2042, 'decke': 2043, 'meisters': 2044, 'erwarte': 2045, 'dreymal': 2046, 'glühende': 2047, 'fahrender': 2048, 'diensten': 2049, 'sünde': 2050, 'rang': 2051, 'körper': 2052, 'kenn': 2053, 'fängst': 2054, 'plumpe': 2055, 'zeug': 2056, 'trocknen': 2057, 'warmen': 2058, 'suche': 2059, 'beginnen': 2060, 'gesteh': 2061, 'kleines': 2062, 'bannt': 2063, 'kamst': 2064, 'außen': 2065, 'sache': 2066, 'jetzo': 2067, 'höchst': 2068, 'mähr': 2069, 'zweytenmale': 2070, 'beliebt': 2071, 'geruch': 2072, 'beysammen': 2073, 'wolken': 2074, 'söhne': 2075, 'geistige': 2076, 'folget': 2077, 'bänder': 2078, 'decken': 2079, 'fürs': 
2080, 'traube': 2081, 'edle': 2082, 'hügel': 2083, 'hellen': 2084, 'bewegen': 2085, 'zerstreuen': 2086, 'liebender': 2087, 'schläft': 2088, 'ratten': 2089, 'mäuse': 2090, 'träume': 2091, 'vertragen': 2092, 'edler': 2093, 'seide': 2094, 'hahnenfeder': 2095, 'hut': 2096, 'degen': 2097, 'entsetzen': 2098, 'wach': 2099, 'morgens': 2100, 'ängstlich': 2101, 'lager': 2102, 'strecken': 2103, 'schrecken': 2104, 'wohnt': 2105, 'rasch': 2106, 'blend': 2107, 'träumen': 2108, 'schmeichelt': 2109, 'kühnen': 2110, 'geduld': 2111, 'unsichtbar': 2112, 'mächtiger': 2113, 'klagen': 2114, 'baue': 2115, 'lebenslauf': 2116, 'rathen': 2117, 'locken': 2118, 'frißt': 2119, 'pack': 2120, 'dienst': 2121, 'verbinden': 2122, 'drüben': 2123, 'scheinet': 2124, 'künftig': 2125, 'verbinde': 2126, 'gebe': 2127, 'gefaßt': 2128, 'gold': 2129, 'gewinnt': 2130, 'zeig': 2131, 'guts': 2132, 'fesseln': 2133, 'fallen': 2134, 'wessen': 2135, 'paar': 2136, 'befreyen': 2137, 'opfer': 2138, 'scheuen': 2139, 'wachs': 2140, 'faden': 2141, 'rauschen': 2142, 'rollen': 2143, 'verdruß': 2144, 'maß': 2145, 'rede': 2146, 'greifen': 2147, 'wiege': 2148, 'belehren': 2149, 'schweifen': 2150, 'feurig': 2151, 'nordens': 2152, 'bleibst': 2153, 'haar': 2154, 'flieht': 2155, 'henker': 2156, 'weniger': 2157, 'ennuyiren': 2158, 'buben': 2159, 'rock': 2160, 'starren': 2161, 'müßte': 2162, 'kurze': 2163, 'ehrfurcht': 2164, 'wählt': 2165, 'facultät': 2166, 'hinnen': 2167, 'theurer': 2168, 'kreuz': 2169, 'fäden': 2170, 'dritt': 2171, 'orten': 2172, 'geworden': 2173, 'beschreiben': 2174, 'prächtig': 2175, 'buche': 2176, 'erben': 2177, 'unsinn': 2178, 'plage': 2179, 'irre': 2180, 'bereiten': 2181, 'rauben': 2182, 'ringsum': 2183, 'vertrauen': 2184, 'unterm': 2185, 'übersteigt': 2186, 'streicht': 2187, 'blicken': 2188, 'grau': 2189, 'grün': 2190, 'goldner': 2191, 'vermag': 2192, 'welchem': 2193, 'leichte': 2194, 'lebensart': 2195, 'pferde': 2196, 'nimmst': 2197, 'keller': 2198, 'leipzig': 2199, 'zeche': 2200, 'gesichter': 2201, 
'bringst': 2202, 'schwein': 2203, 'gewölbe': 2204, 'tara': 2205, 'lara': 2206, 'römsche': 2207, 'garstig': 2208, 'dankt': 2209, 'singe': 2210, 'angeführt': 2211, 'jauchzend': 2212, 'fuhr': 2213, 'zahm': 2214, 'dingen': 2215, 'volke': 2216, 'schwanz': 2217, 'bildet': 2218, 'fremden': 2219, 'spät': 2220, 'gereist': 2221, 'gegen': 2222, 'schwach': 2223, 'eignen': 2224, 'miß': 2225, 'hosen': 2226, 'angethan': 2227, 'minister': 2228, 'königinn': 2229, 'finger': 2230, 'fein': 2231, 'gäb': 2232, 'gästen': 2233, 'verlang': 2234, 'maul': 2235, 'bohrer': 2236, 'mancherley': 2237, 'verstopft': 2238, 'gesagt': 2239, 'seltsamen': 2240, 'hörner': 2241, 'glaubet': 2242, 'brunnen': 2243, 'helft': 2244, 'theuer': 2245, 'unterstehn': 2246, 'hokuspokus': 2247, 'sollen': 2248, 'pfropf': 2249, 'messer': 2250, 'geberde': 2251, 'falsch': 2252, 'stuhl': 2253, 'betrug': 2254, 'herde': 2255, 'darneben': 2256, 'tolle': 2257, 'verschwunden': 2258, 'verjüngen': 2259, 'arzt': 2260, 'vieh': 2261, 'geschäftig': 2262, 'wunderbare': 2263, 'kanns': 2264, 'erblickend': 2265, 'schornstein': 2266, 'quirlt': 2267, 'hohl': 2268, 'inwendig': 2269, 'scherben': 2270, 'sieb': 2271, 'herunter': 2272, 'topf': 2273, 'nöthigt': 2274, 'himmlisch': 2275, 'auszuspüren': 2276, 'sitz': 2277, 'bisher': 2278, 'allerley': 2279, 'wunderliche': 2280, 'zerbrechen': 2281, 'verrückt': 2282, 'brennen': 2283, 'schlich': 2284, 'gebein': 2285, 'gläser': 2286, 'grimm': 2287, 'vorm': 2288, 'rothen': 2289, 'weile': 2290, 'klauen': 2291, 'jahren': 2292, 'verlier': 2293, 'satan': 2294, 'lacht': 2295, 'zuweilen': 2296, 'gedeihen': 2297, 'winkt': 2298, 'fausten': 2299, 'zehn': 2300, 'vollbracht': 2301, 'ungestört': 2302, 'verborgen': 2303, 'sibylle': 2304, 'schale': 2305, 'rand': 2306, 'hinunter': 2307, 'spüren': 2308, 'leibhaftig': 2309, 'zugleich': 2310, 'unschuldig': 2311, 'vierzehn': 2312, 'geschicht': 2313, 'wacker': 2314, 'stirne': 2315, 'vorwelt': 2316, 'gehangen': 2317, 'würdest': 2318, 'leidlich': 2319, 'habs': 2320, 'müh': 
2321, 'drückt': 2322, 'wenden': 2323, 'drauß': 2324, 'buhle': 2325, 'ritter': 2326, 'putzt': 2327, 'schönheit': 2328, 'golde': 2329, 'wüßte': 2330, 'kriegt': 2331, 'ungerechtes': 2332, 'weihen': 2333, 'vernommen': 2334, 'drauf': 2335, 'geschmeid': 2336, 'solls': 2337, 'gassen': 2338, 'stündchen': 2339, 'besuch': 2340, 'michs': 2341, 'hoffe': 2342, 'verzweifelt': 2343, 'erzählt': 2344, 'padua': 2345, 'kühlen': 2346, 'dreyhundert': 2347, 'übrigen': 2348, 'liebenswürdig': 2349, 'seys': 2350, 'gewerb': 2351, 'schiff': 2352, 'nahm': 2353, 'liebs': 2354, 'schändlich': 2355, 'schatze': 2356, 'liebte': 2357, 'ungefähr': 2358, 'warten': 2359, 'wills': 2360, 'kuppler': 2361, 'heilger': 2362, 'bethören': 2363, 'gluth': 2364, 'nenne': 2365, 'schone': 2366, 'zunge': 2367, 'küßt': 2368, 'hagestolz': 2369, 'unschuld': 2370, 'langsam': 2371, 'wars': 2372, 'schmeckt': 2373, 'eures': 2374, 'bekommen': 2375, 'schlug': 2376, 'dom': 2377, 'bös': 2378, 'verstehst': 2379, 'händedruck': 2380, 'pärchen': 2381, 'guckt': 2382, 'fassend': 2383, 'bester': 2384, 'führst': 2385, 'busch': 2386, 'sichern': 2387, 'öffnen': 2388, 'begierde': 2389, 'schuhu': 2390, 'moos': 2391, 'mark': 2392, 'schnee': 2393, 'entflohn': 2394, 'weidet': 2395, 'hilf': 2396, 'siedet': 2397, 'tapfer': 2398, 'fühlst': 2399, 'messe': 2400, 'name': 2401, 'stich': 2402, 'ahndungsvoller': 2403, 'alleine': 2404, 'würden': 2405, 'übrig': 2406, 'ungeheuer': 2407, 'geschenke': 2408, 'mägdlein': 2409, 'bloß': 2410, 'schmerzenreiche': 2411, 'gnädig': 2412, 'flor': 2413, 'dämmert': 2414, 'wahres': 2415, 'zither': 2416, 'parire': 2417, 'heraustretend': 2418, 'hur': 2419, 'schande': 2420, 'altar': 2421, 'laden': 2422, 'orgel': 2423, 'gräber': 2424, 'cum': 2425, 'quid': 2426, 'sum': 2427, 'miser': 2428, 'tunc': 2429, 'dicturus': 2430, 'lustig': 2431, 'ei': 2432, 'bach': 2433, 'wurzeln': 2434, 'bande': 2435, 'wandrer': 2436, 'gedrängten': 2437, 'drehen': 2438, 'zipfel': 2439, 'trüber': 2440, 'schlünde': 2441, 'nacken': 2442, 'packen': 
2443, 'strömt': 2444, 'brocken': 2445, 'baubo': 2446, 'gabel': 2447, 'schritten': 2448, 'wind': 2449, 'sausen': 2450, 'sprüht': 2451, 'räthsel': 2452, 'lösen': 2453, 'nackt': 2454, 'trauen': 2455, 'aufmerksam': 2456, 'blickt': 2457, 'bund': 2458, 'neuigkeiten': 2459, 'gefiel': 2460, 'unerhört': 2461, 'lässest': 2462, 'mitten': 2463, 'däucht': 2464, 'dilettanten': 2465, 'hochzeit': 2466, 'golden': 2467, 'zweye': 2468, 'orchester': 2469, 'musikanten': 2470, 'neugieriger': 2471, 'hexenheer': 2472, 'capellmeister': 2473, 'windfahne': 2474, 'frommen': 2475, 'teufeln': 2476, 'rohr': 2477, 'lasse': 2478, 'qual': 2479, 'geschöpf': 2480, 'teuflischen': 2481, 'wandle': 2482, 'lauern': 2483, 'fliege': 2484, 'aufzuschließen': 2485, 'flehen': 2486, 'gerettet': 2487, 'folge': 2488, 'edition': 2489, 'made': 2490, 'font': 2491, 'out': 2492, 'antiqua': 2493, 'should': 2494, 'files': 2495, 'formats': 2496, 'produced': 2497, 'one': 2498, 'old': 2499, 'owns': 2500, 'paying': 2501, 'royalties': 2502, 'rules': 2503, 'protect': 2504, 'concept': 2505, 'registered': 2506, 'anything': 2507, 'nearly': 2508, 'purpose': 2509, 'research': 2510, 'away': 2511, 'redistribution': 2512, 'start': 2513, 'before': 2514, 'promoting': 2515, 'way': 2516, 'available': 2517, 'redistributing': 2518, 'intellectual': 2519, 'property': 2520, 'destroy': 2521, 'obtaining': 2522, 'bound': 2523, 'only': 2524, 'there': 2525, 'things': 2526, 'even': 2527, 'based': 2528, 'references': 2529, 'removed': 2530, 'keeping': 2531, 'check': 2532, 'country': 2533, 'concerning': 2534, 'sentence': 2535, 'active': 2536, 'immediate': 2537, 'prominently': 2538, 'copied': 2539, 'cost': 2540, 'included': 2541, 'wwwgutenbergorg': 2542, 'does': 2543, 'contain': 2544, 'through': 2545, 'both': 2546, 'marked': 2547, 'than': 2548, 'plain': 2549, 'vanilla': 2550, 'ascii': 2551, 'upon': 2552, 'specified': 2553, 'date': 2554, 'prepare': 2555, 'receipt': 2556, 'physical': 2557, 'accordance': 2558, 'employees': 2559, 'considerable': 2560, 
'effort': 2561, 'computer': 2562, 'damage': 2563, 'warranty': 2564, 'costs': 2565, 'expenses': 2566, 'legal': 2567, 'breach': 2568, 'explanation': 2569, 'lieu': 2570, 'electronically': 2571, 'second': 2572, 'implied': 2573, 'certain': 2574, 'law': 2575, 'shall': 2576, 'permitted': 2577, 'cause': 2578, 'b': 2579, 'readable': 2580, 'widest': 2581, 'computers': 2582, 'generations': 2583, 'created': 2584, 'page': 2585, 'exempt': 2586, 'federal': 2587, 'contributions': 2588, 'office': 2589, 'dr': 2590, 'locations': 2591, 'httppglaforg': 2592, 'many': 2593, 'small': 2594, 'keep': 2595, 'solicit': 2596, 'particular': 2597, 'visit': 2598, 'accepted': 2599, 'dunst': 2600, 'diesmal': 2601, 'versuch': 2602, 'new': 2603, 'trüben': 2604, 'schwankende': 2605, 'naht': 2606, 'jugendlich': 2607, 'erschüttert': 2608, 'zauberhauch': 2609, 'halbverklungnen': 2610, 'klage': 2611, 'labyrinthisch': 2612, 'getäuscht': 2613, 'hinweggeschwunden': 2614, 'folgenden': 2615, 'freundliche': 2616, 'verklungen': 2617, 'wiederklang': 2618, 'leidlied': 2619, 'ertönt': 2620, 'unbekannten': 2621, 'erfreuet': 2622, 'zerstreuet': 2623, 'entwöhntes': 2624, 'geisterreich': 2625, 'unbestimmten': 2626, 'lispelnd': 2627, 'aeolsharfe': 2628, 'weich': 2629, 'besitze': 2630, 'verschwand': 2631, 'wirklichkeiten': 2632, 'vorspiel': 2633, 'theaterdichter': 2634, 'trübsal': 2635, 'beygestanden': 2636, 'landen': 2637, 'unternehmung': 2638, 'pfosten': 2639, 'breter': 2640, 'aufgeschlagen': 2641, 'jedermann': 2642, 'erwartet': 2643, 'augenbraunen': 2644, 'volks': 2645, 'versöhnt': 2646, 'schrecklich': 2647, 'gelesen': 2648, 'bedeutung': 2649, 'bude': 2650, 'gewaltig': 2651, 'wiederholten': 2652, 'wehen': 2653, 'gnadenpforte': 2654, 'zwängt': 2655, 'vieren': 2656, 'stößen': 2657, 'kasse': 2658, 'hungersnoth': 2659, 'beckerthüren': 2660, 'billet': 2661, 'hälse': 2662, 'deren': 2663, 'entflieht': 2664, 'verhülle': 2665, 'wogende': 2666, 'himmelsenge': 2667, 'blüht': 2668, 'götterhand': 2669, 'erschaffen': 2670, 
'erpflegen': 2671, 'entsprungen': 2672, 'schüchtern': 2673, 'vorgelallt': 2674, 'mißrathen': 2675, 'verschlingt': 2676, 'wilden': 2677, 'augenblicks': 2678, 'durchgedrungen': 2679, 'vollendeter': 2680, 'aechte': 2681, 'unverloren': 2682, 'machte': 2683, 'mitwelt': 2684, 'braven': 2685, 'behaglich': 2686, 'mitzutheilen': 2687, 'laune': 2688, 'erbittern': 2689, 'erschüttern': 2690, 'musterhaft': 2691, 'empfindung': 2692, 'leidenschaft': 2693, 'narrheit': 2694, 'schaun': 2695, 'abgesponnen': 2696, 'staunend': 2697, 'vielgeliebter': 2698, 'manchem': 2699, 'vorgelegt': 2700, 'ausgedacht': 2701, 'dargebracht': 2702, 'publikum': 2703, 'zerpflücken': 2704, 'solches': 2705, 'handwerk': 2706, 'zieme': 2707, 'saubern': 2708, 'pfuscherey': 2709, 'maxime': 2710, 'vorwurf': 2711, 'ungekränkt': 2712, 'habet': 2713, 'weiches': 2714, 'langeweile': 2715, 'übertischten': 2716, 'mahle': 2717, 'allerschlimmste': 2718, 'journale': 2719, 'maskenfesten': 2720, 'neugier': 2721, 'beflügelt': 2722, 'damen': 2723, 'gage': 2724, 'träumet': 2725, 'dichterhöhe': 2726, 'beseht': 2727, 'gönner': 2728, 'roh': 2729, 'kartenspiel': 2730, 'solchem': 2731, 'verirren': 2732, 'verwirren': 2733, 'befriedigen': 2734, 'entzückung': 2735, 'menschenrecht': 2736, 'vergönnt': 2737, 'verscherzen': 2738, 'besiegt': 2739, 'einklang': 2740, 'fadens': 2741, 'gleichgültig': 2742, 'drehend': 2743, 'spindel': 2744, 'unharmonsche': 2745, 'verdrießlich': 2746, 'theilt': 2747, 'fließend': 2748, 'belebend': 2749, 'rythmisch': 2750, 'einzelne': 2751, 'allgemeinen': 2752, 'weihe': 2753, 'herrlichen': 2754, 'accorden': 2755, 'abendroth': 2756, 'glühn': 2757, 'schüttet': 2758, 'frühlingsblüten': 2759, 'geliebten': 2760, 'flicht': 2761, 'unbedeutend': 2762, 'ehrenkranz': 2763, 'verdiensten': 2764, 'sichert': 2765, 'olymp': 2766, 'vereinet': 2767, 'offenbart': 2768, 'dichtrischen': 2769, 'geschäfte': 2770, 'liebesabenteuer': 2771, 'zufällig': 2772, 'verflochten': 2773, 'angefochten': 2774, 'versieht': 2775, 'roman': 2776, 
'menschenleben': 2777, 'lebts': 2778, 'interessant': 2779, 'bildern': 2780, 'fünkchen': 2781, 'gebraut': 2782, 'auferbaut': 2783, 'blüte': 2784, 'sauget': 2785, 'zärtliche': 2786, 'gemüthe': 2787, 'melancholsche': 2788, 'aufgeregt': 2789, 'schwung': 2790, 'fertig': 2791, 'werdender': 2792, 'ununterbrochen': 2793, 'verhüllten': 2794, 'knospe': 2795, 'reichlich': 2796, 'füllten': 2797, 'schmerzenvolle': 2798, 'hasses': 2799, 'bedarfst': 2800, 'schlachten': 2801, 'feinde': 2802, 'allerliebste': 2803, 'schnellen': 2804, 'laufes': 2805, 'erreichten': 2806, 'winket': 2807, 'heftgen': 2808, 'wirbeltanz': 2809, 'schmausend': 2810, 'vertrinket': 2811, 'bekannte': 2812, 'saitenspiel': 2813, 'anmuth': 2814, 'einzugreifen': 2815, 'selbgesteckten': 2816, 'holdem': 2817, 'hinzuschweifen': 2818, 'minder': 2819, 'kindisch': 2820, 'wahre': 2821, 'gewechselt': 2822, 'complimente': 2823, 'drechselt': 2824, 'nützliches': 2825, 'stimmung': 2826, 'zaudernden': 2827, 'kommandirt': 2828, 'poesie': 2829, 'bedürfen': 2830, 'stark': 2831, 'getränke': 2832, 'schlürfen': 2833, 'unverzüglich': 2834, 'geschieht': 2835, 'verpassen': 2836, 'mögliche': 2837, 'entschluß': 2838, 'beherzt': 2839, 'bühnen': 2840, 'schonet': 2841, 'prospecte': 2842, 'maschinen': 2843, 'dürfet': 2844, 'verschwenden': 2845, 'vögeln': 2846, 'schreitet': 2847, 'breterhaus': 2848, 'bedächtger': 2849, 'prolog': 2850, 'heerscharen': 2851, 'raphael': 2852, 'brudersphären': 2853, 'wettgesang': 2854, 'vorgeschriebne': 2855, 'vollendet': 2856, 'donnergang': 2857, 'gabriel': 2858, 'paradieseshelle': 2859, 'schauervoller': 2860, 'flüssen': 2861, 'fortgerissen': 2862, 'schnellem': 2863, 'sphärenlauf': 2864, 'stürme': 2865, 'brausen': 2866, 'wirkung': 2867, 'blitzendes': 2868, 'verheeren': 2869, 'donnerschlags': 2870, 'boten': 2871, 'sanfte': 2872, 'wandeln': 2873, 'tags': 2874, 'nahst': 2875, 'befinde': 2876, 'sahst': 2877, 'gesinde': 2878, 'verhöhnt': 2879, 'pathos': 2880, 'brächte': 2881, 'abgewöhnt': 2882, 'gleichem': 2883, 
'himmelslichts': 2884, 'brauchts': 2885, 'thierischer': 2886, 'verlaub': 2887, 'ew': 2888, 'langbeinigen': 2889, 'cicaden': 2890, 'fliegt': 2891, 'fliegend': 2892, 'liedchen': 2893, 'quark': 2894, 'anzuklagen': 2895, 'dauern': 2896, 'jammertagen': 2897, 'kennst': 2898, 'irdisch': 2899, 'fordert': 2900, 'näh': 2901, 'befriedigt': 2902, 'tiefbewegte': 2903, 'verworren': 2904, 'gärtner': 2905, 'bäumchen': 2906, 'grünt': 2907, 'blüt': 2908, 'wettet': 2909, 'erlaubniß': 2910, 'verboten': 2911, 'anam': 2912, 'frischen': 2913, 'wangen': 2914, 'leichnam': 2915, 'katze': 2916, 'überlassen': 2917, 'urquell': 2918, 'drange': 2919, 'weges': 2920, 'gelange': 2921, 'triumph': 2922, 'fressen': 2923, 'berühmte': 2924, 'gehaßt': 2925, 'verneinen': 2926, 'schalk': 2927, 'wenigsten': 2928, 'allzuleicht': 2929, 'erschlaffen': 2930, 'unbedingte': 2931, 'reizt': 2932, 'göttersöhne': 2933, 'werdende': 2934, 'umfaß': 2935, 'schranken': 2936, 'schwankender': 2937, 'befestiget': 2938, 'dauernden': 2939, 'schließt': 2940, 'vertheilen': 2941, 'hüte': 2942, 'menschlich': 2943, 'tragödie': 2944, 'hochgewölbten': 2945, 'gothischen': 2946, 'pulte': 2947, 'philosophie': 2948, 'juristerey': 2949, 'durchaus': 2950, 'studirt': 2951, 'heißem': 2952, 'zuvor': 2953, 'ziehe': 2954, 'zehen': 2955, 'krumm': 2956, 'verbrennen': 2957, 'laffen': 2958, 'doctoren': 2959, 'schreiber': 2960, 'scrupel': 2961, 'entrissen': 2962, 'magie': 2963, 'ergeben': 2964, 'sauerm': 2965, 'erkenne': 2966, 'innersten': 2967, 'zusammenhält': 2968, 'wirkenskraft': 2969, 'samen': 2970, 'kramen': 2971, 'sähst': 2972, 'mondenschein': 2973, 'herangewacht': 2974, 'trübselger': 2975, 'erschienst': 2976, 'bergeshöhn': 2977, 'lichte': 2978, 'bergeshöle': 2979, 'dämmer': 2980, 'wissensqualm': 2981, 'entladen': 2982, 'baden': 2983, 'steck': 2984, 'dumpfes': 2985, 'mauerloch': 2986, 'gemahlte': 2987, 'scheiben': 2988, 'beschränkt': 2989, 'bücherhauf': 2990, 'würme': 2991, 'nagen': 2992, 'umsteckt': 2993, 'gläsern': 2994, 'büchsen': 2995, 
'umstellt': 2996, 'vollgepfropft': 2997, 'urväter': 2998, 'hausrath': 2999, 'gestopft': 3000, 'klemmt': 3001, 'unerklärter': 3002, 'lebensregung': 3003, 'hemmt': 3004, 'moder': 3005, 'thiergeripp': 3006, 'todtenbein': 3007, 'flieh': 3008, 'geheimnißvolle': 3009, 'nostradamus': 3010, 'erkennest': 3011, 'seelenkraft': 3012, 'trocknes': 3013, 'antwortet': 3014, 'makrokosmus': 3015, 'heilges': 3016, 'lebensglück': 3017, 'neuglühend': 3018, 'nerv': 3019, 'rinnen': 3020, 'schrieb': 3021, 'füllen': 3022, 'geheimnißvollem': 3023, 'enthüllen': 3024, 'wirkende': 3025, 'erkenn': 3026, 'geisterwelt': 3027, 'verschlossen': 3028, 'bade': 3029, 'unverdrossen': 3030, 'irdsche': 3031, 'himmelskräfte': 3032, 'eimer': 3033, 'segenduftenden': 3034, 'harmonisch': 3035, 'durchklingen': 3036, 'unendliche': 3037, 'brüste': 3038, 'quellt': 3039, 'tränkt': 3040, 'schmacht': 3041, 'erdgeistes': 3042, 'glüh': 3043, 'herumzuschlagen': 3044, 'schiffbruchs': 3045, 'knirschen': 3046, 'wölkt': 3047, 'dampft': 3048, 'zucken': 3049, 'rothe': 3050, 'strahlen': 3051, 'weht': 3052, 'schwebst': 3053, 'erflehter': 3054, 'enthülle': 3055, 'reißt': 3056, 'gefühlen': 3057, 'erwühlen': 3058, 'kostet': 3059, 'röthliche': 3060, 'abgewendet': 3061, 'schreckliches': 3062, 'sphäre': 3063, 'gesogen': 3064, 'ertrag': 3065, 'flehst': 3066, 'erathmend': 3067, 'seelenflehn': 3068, 'uebermenschen': 3069, 'ruf': 3070, 'erschuf': 3071, 'hegte': 3072, 'freudebeben': 3073, 'erschwoll': 3074, 'deß': 3075, 'erklang': 3076, 'hauch': 3077, 'lebenstiefen': 3078, 'weggekrümmter': 3079, 'flammenbildung': 3080, 'lebensfluthen': 3081, 'thatensturm': 3082, 'wall': 3083, 'webe': 3084, 'geburt': 3085, 'ewiges': 3086, 'wechselnd': 3087, 'glühend': 3088, 'sausenden': 3089, 'webstuhl': 3090, 'wirke': 3091, 'lebendiges': 3092, 'kleid': 3093, 'umschweifst': 3094, 'geschäftiger': 3095, 'gleichst': 3096, 'begreifst': 3097, 'zusammenstürzend': 3098, 'kenns': 3099, 'famulus': 3100, 'schönstes': 3101, 'nichte': 3102, 'gesichte': 3103, 
'trockne': 3104, 'schleicher': 3105, 'stören': 3106, 'schlafrocke': 3107, 'nachtmütze': 3108, 'griechisch': 3109, 'trauerspiel': 3110, 'profitiren': 3111, 'öfters': 3112, 'rühmen': 3113, 'museum': 3114, 'feyertag': 3115, 'fernglas': 3116, 'ueberredung': 3117, 'leiten': 3118, 'werdets': 3119, 'erjagen': 3120, 'urkräftigem': 3121, 'hörer': 3122, 'leimt': 3123, 'blast': 3124, 'kümmerlichen': 3125, 'aschenhäufchen': 3126, 'raus': 3127, 'bewundrung': 3128, 'affen': 3129, 'vortrag': 3130, 'redners': 3131, 'redlichen': 3132, 'schellenlauter': 3133, 'nachzujagen': 3134, 'blinkend': 3135, 'schnitzel': 3136, 'kräuselt': 3137, 'unerquicklich': 3138, 'nebelwind': 3139, 'herbstlich': 3140, 'säuselt': 3141, 'kritischen': 3142, 'bestreben': 3143, 'erwerben': 3144, 'halben': 3145, 'erreicht': 3146, 'bronnen': 3147, 'versetzen': 3148, 'vergangenheit': 3149, 'siegeln': 3150, 'bespiegeln': 3151, 'kehrichtfaß': 3152, 'rumpelkammer': 3153, 'höchstens': 3154, 'staatsaction': 3155, 'trefflichen': 3156, 'pragmatischen': 3157, 'maximen': 3158, 'ziemen': 3159, 'jeglicher': 3160, 'wahrten': 3161, 'offenbarten': 3162, 'gekreutzigt': 3163, 'verbrannt': 3164, 'müssens': 3165, 'unterbrechen': 3166, 'fortgewacht': 3167, 'ostertage': 3168, 'eifer': 3169, 'studien': 3170, 'beflissen': 3171, 'schalem': 3172, 'zeuge': 3173, 'gierger': 3174, 'gräbt': 3175, 'regenwürmer': 3176, 'menschenstimme': 3177, 'geisterfülle': 3178, 'umgab': 3179, 'ertönen': 3180, 'ärmlichsten': 3181, 'erdensöhnen': 3182, 'rissest': 3183, 'zerstören': 3184, 'riesengroß': 3185, 'zwerg': 3186, 'gedünkt': 3187, 'ewger': 3188, 'himmelsglanz': 3189, 'abgestreift': 3190, 'cherub': 3191, 'schaffend': 3192, 'götterleben': 3193, 'vermaß': 3194, 'donnerwort': 3195, 'anzuziehn': 3196, 'besessen': 3197, 'selgen': 3198, 'fühlte': 3199, 'grausam': 3200, 'ungewisse': 3201, 'menschenloos': 3202, 'hemmen': 3203, 'herrlichsten': 3204, 'stoff': 3205, 'beßre': 3206, 'erstarren': 3207, 'irdischen': 3208, 'kühnem': 3209, 'flug': 3210, 
'hoffnungsvoll': 3211, 'erweitert': 3212, 'zeitenstrudel': 3213, 'scheitert': 3214, 'sorge': 3215, 'nistet': 3216, 'wiegt': 3217, 'störet': 3218, 'masken': 3219, 'bebst': 3220, 'verlierst': 3221, 'gefühlt': 3222, 'wurme': 3223, 'durchwühlt': 3224, 'nährend': 3225, 'wandrers': 3226, 'vernichtet': 3227, 'wand': 3228, 'fächern': 3229, 'verenget': 3230, 'trödel': 3231, 'tausendfachem': 3232, 'tand': 3233, 'mottenwelt': 3234, 'dränget': 3235, 'glücklicher': 3236, 'hohler': 3237, 'schädel': 3238, 'verwirret': 3239, 'gesucht': 3240, 'jämmerlich': 3241, 'geirret': 3242, 'instrumente': 3243, 'rad': 3244, 'kämmen': 3245, 'walz': 3246, 'bügel': 3247, 'kraus': 3248, 'lichten': 3249, 'schleyers': 3250, 'berauben': 3251, 'zwingst': 3252, 'hebeln': 3253, 'schrauben': 3254, 'geräthe': 3255, 'rolle': 3256, 'schmauchte': 3257, 'weniges': 3258, 'verpraßt': 3259, 'belastet': 3260, 'ererbt': 3261, 'vätern': 3262, 'erwirb': 3263, 'besitzen': 3264, 'erschafft': 3265, 'heftet': 3266, 'magnet': 3267, 'nächtgen': 3268, 'mondenglanz': 3269, 'umweht': 3270, 'einzige': 3271, 'phiole': 3272, 'andacht': 3273, 'herunterhole': 3274, 'verehr': 3275, 'menschenwitz': 3276, 'schlummersäfte': 3277, 'auszug': 3278, 'tödlich': 3279, 'erweise': 3280, 'gelindert': 3281, 'gemindert': 3282, 'fluthstrom': 3283, 'ebbet': 3284, 'hinausgewiesen': 3285, 'spiegelfluth': 3286, 'erglänzt': 3287, 'ufern': 3288, 'feuerwagen': 3289, 'durchdringen': 3290, 'reiner': 3291, 'götterwonne': 3292, 'verdienest': 3293, 'erdensonne': 3294, 'entschlossen': 3295, 'vermesse': 3296, 'pforten': 3297, 'aufzureißen': 3298, 'beweisen': 3299, 'manneswürde': 3300, 'götterhöhe': 3301, 'quaal': 3302, 'verdammt': 3303, 'durchgang': 3304, 'hinzustreben': 3305, 'heiter': 3306, 'entschließen': 3307, 'krystallne': 3308, 'futterale': 3309, 'glänztest': 3310, 'väter': 3311, 'freudenfeste': 3312, 'erheitertest': 3313, 'künstlich': 3314, 'trinkers': 3315, 'reimweis': 3316, 'erklären': 3317, 'höhlung': 3318, 'auszuleeren': 3319, 'erinnert': 3320, 
'jugendnacht': 3321, 'eilig': 3322, 'trunken': 3323, 'brauner': 3324, 'flut': 3325, 'erfüllt': 3326, 'wähle': 3327, 'festlich': 3328, 'setzt': 3329, 'glockenklang': 3330, 'chorgesang': 3331, 'verderblichen': 3332, 'schleichenden': 3333, 'erblichen': 3334, 'mängel': 3335, 'tiefes': 3336, 'summen': 3337, 'heller': 3338, 'verkündiget': 3339, 'glocken': 3340, 'osterfestes': 3341, 'feyerstunde': 3342, 'tröstlichen': 3343, 'grabes': 3344, 'engelslippen': 3345, 'bunde': 3346, 'spezereyen': 3347, 'gepflegt': 3348, 'treuen': 3349, 'hingelegt': 3350, 'tücher': 3351, 'betrübende': 3352, 'übende': 3353, 'prüfung': 3354, 'bestanden': 3355, 'gelind': 3356, 'himmelstöne': 3357, 'weiche': 3358, 'botschaft': 3359, 'glaubens': 3360, 'liebstes': 3361, 'woher': 3362, 'nachricht': 3363, 'himmelsliebe': 3364, 'ernster': 3365, 'sabathstille': 3366, 'glockentones': 3367, 'gebet': 3368, 'brünstiger': 3369, 'hinzugehn': 3370, 'verkündete': 3371, 'muntre': 3372, 'spiele': 3373, 'frühlingsfeyer': 3374, 'freyes': 3375, 'erinnrung': 3376, 'tönet': 3377, 'himmelslieder': 3378, 'jünger': 3379, 'begrabene': 3380, 'lebend': 3381, 'erhabene': 3382, 'erhoben': 3383, 'werdelust': 3384, 'schaffender': 3385, 'verwesung': 3386, 'reißet': 3387, 'freudig': 3388, 'thätig': 3389, 'preisenden': 3390, 'beweisenden': 3391, 'brüderlich': 3392, 'speisenden': 3393, 'predigend': 3394, 'reisenden': 3395, 'verheißenden': 3396, 'spaziergänger': 3397, 'handwerksbursche': 3398, 'jägerhaus': 3399, 'wasserhof': 3400, 'thust': 3401, 'vierter': 3402, 'burgdorf': 3403, 'händel': 3404, 'sorte': 3405, 'fünfter': 3406, 'überlustiger': 3407, 'gesell': 3408, 'juckt': 3409, 'drittenmal': 3410, 'fell': 3411, 'orte': 3412, 'dienstmädchen': 3413, 'pappeln': 3414, 'krauskopf': 3415, 'blitz': 3416, 'wackern': 3417, 'dirnen': 3418, 'schreiten': 3419, 'begleiten': 3420, 'starkes': 3421, 'beizender': 3422, 'toback': 3423, 'geschmack': 3424, 'allerbeste': 3425, 'mägden': 3426, 'niedlich': 3427, 'genirt': 3428, 'wildpret': 3429, 'samstags': 
3430, 'sontags': 3431, 'caressiren': 3432, 'burgemeister': 3433, 'dreister': 3434, 'wohlgeputzt': 3435, 'backenroth': 3436, 'belieb': 3437, 'anzuschauen': 3438, 'mildert': 3439, 'leyern': 3440, 'aerndetag': 3441, 'feyertagen': 3442, 'kriegsgeschrey': 3443, 'türkey': 3444, 'völker': 3445, 'schiffe': 3446, 'gleiten': 3447, 'abends': 3448, 'fried': 3449, 'friedenszeiten': 3450, 'köpfe': 3451, 'bleibs': 3452, 'geputzt': 3453, 'vergaffen': 3454, 'agathe': 3455, 'öffentlich': 3456, 'sanct': 3457, 'andreas': 3458, 'leiblich': 3459, 'zeigte': 3460, 'krystall': 3461, 'soldatenhaft': 3462, 'mehreren': 3463, 'verwegnen': 3464, 'zinnen': 3465, 'höhnenden': 3466, 'trompete': 3467, 'werben': 3468, 'eise': 3469, 'befreyt': 3470, 'bäche': 3471, 'frühlings': 3472, 'belebenden': 3473, 'thale': 3474, 'grünet': 3475, 'hoffnungsglück': 3476, 'winter': 3477, 'schwäche': 3478, 'rauhe': 3479, 'berge': 3480, 'dorther': 3481, 'sendet': 3482, 'fliehend': 3483, 'ohnmächtige': 3484, 'körnigen': 3485, 'eises': 3486, 'grünende': 3487, 'flur': 3488, 'duldet': 3489, 'weißes': 3490, 'ueberall': 3491, 'bildung': 3492, 'beleben': 3493, 'fehlts': 3494, 'revier': 3495, 'geputzte': 3496, 'hohlen': 3497, 'finstren': 3498, 'buntes': 3499, 'gewimmel': 3500, 'sonnt': 3501, 'auferstehung': 3502, 'auferstanden': 3503, 'niedriger': 3504, 'häuser': 3505, 'gemächern': 3506, 'handwerks': 3507, 'gewerbes': 3508, 'giebeln': 3509, 'dächern': 3510, 'straßen': 3511, 'quetschender': 3512, 'kirchen': 3513, 'ehrwürdiger': 3514, 'gärten': 3515, 'felder': 3516, 'zerschlägt': 3517, 'nachen': 3518, 'überladen': 3519, 'kahn': 3520, 'berges': 3521, 'fernen': 3522, 'blinken': 3523, 'farbige': 3524, 'dorfs': 3525, 'getümmel': 3526, 'wahrer': 3527, 'jauchzet': 3528, 'spazieren': 3529, 'feind': 3530, 'fiedeln': 3531, 'schreien': 3532, 'kegelschieben': 3533, 'verhaßter': 3534, 'bauern': 3535, 'schäfer': 3536, 'putzte': 3537, 'bunter': 3538, 'jacke': 3539, 'tanzte': 3540, 'drückte': 3541, 'hastig': 3542, 'stieß': 3543, 'sagte': 
3544, 'ungezogen': 3545, 'hurtig': 3546, 'gings': 3547, 'röcke': 3548, 'flogen': 3549, 'ruhten': 3550, 'athmend': 3551, 'hüft': 3552, 'belogen': 3553, 'schmeichelte': 3554, 'scholl': 3555, 'geschrei': 3556, 'volksgedräng': 3557, 'hochgelahrter': 3558, 'nehmet': 3559, 'krug': 3560, 'gefüllt': 3561, 'wünsche': 3562, 'zahl': 3563, 'hegt': 3564, 'zugelegt': 3565, 'erquickungstrank': 3566, 'erwiedr': 3567, 'heil': 3568, 'frohen': 3569, 'vormals': 3570, 'fieberwuth': 3571, 'entriß': 3572, 'seuche': 3573, 'junger': 3574, 'gingt': 3575, 'krankenhaus': 3576, 'leiche': 3577, 'kamt': 3578, 'bestandet': 3579, 'harte': 3580, 'half': 3581, 'gesundheit': 3582, 'bewährten': 3583, 'gebückt': 3584, 'wagnern': 3585, 'verehrung': 3586, 'fiedel': 3587, 'stockt': 3588, 'weilt': 3589, 'mützen': 3590, 'beugten': 3591, 'knie': 3592, 'venerabile': 3593, 'wandrung': 3594, 'gedankenvoll': 3595, 'quälte': 3596, 'fasten': 3597, 'seufzen': 3598, 'händeringen': 3599, 'himmels': 3600, 'erzwingen': 3601, 'könntest': 3602, 'ruhmes': 3603, 'dunkler': 3604, 'ehrenmann': 3605, 'redlichkeit': 3606, 'jedoch': 3607, 'grillenhafter': 3608, 'mühe': 3609, 'sann': 3610, 'adepten': 3611, 'schwarze': 3612, 'recepten': 3613, 'widrige': 3614, 'zusammengoß': 3615, 'rother': 3616, 'leu': 3617, 'lauen': 3618, 'bad': 3619, 'lilie': 3620, 'vermählt': 3621, 'offnem': 3622, 'flammenfeuer': 3623, 'brautgemach': 3624, 'erschien': 3625, 'königin': 3626, 'patienten': 3627, 'starben': 3628, 'fragte': 3629, 'genas': 3630, 'latwergen': 3631, 'thälern': 3632, 'bergen': 3633, 'getobt': 3634, 'tausende': 3635, 'welkten': 3636, 'frechen': 3637, 'übertrug': 3638, 'gewissenhaft': 3639, 'pünctlich': 3640, 'auszuüben': 3641, 'jüngling': 3642, 'vermehrst': 3643, 'höhrem': 3644, 'irrthums': 3645, 'aufzutauchen': 3646, 'trübsinn': 3647, 'verkümmern': 3648, 'abendsonneglut': 3649, 'grünumgebnen': 3650, 'hütten': 3651, 'schimmern': 3652, 'überlebt': 3653, 'fördert': 3654, 'säh': 3655, 'abendstrahl': 3656, 'höhn': 3657, 'silberbach': 3658, 
'ströme': 3659, 'hemmte': 3660, 'göttergleichen': 3661, 'schluchten': 3662, 'erwärmten': 3663, 'buchten': 3664, 'erstaunten': 3665, 'göttin': 3666, 'wegzusinken': 3667, 'ewges': 3668, 'entweicht': 3669, 'flügeln': 3670, 'körperlicher': 3671, 'eingeboren': 3672, 'blauen': 3673, 'schmetternd': 3674, 'lerche': 3675, 'schroffen': 3676, 'fichtenhöhen': 3677, 'adler': 3678, 'ausgebreitet': 3679, 'flächen': 3680, 'heimat': 3681, 'grillenhafte': 3682, 'feldern': 3683, 'vogels': 3684, 'fittig': 3685, 'beneiden': 3686, 'geistesfreuden': 3687, 'winternächte': 3688, 'wärmet': 3689, 'entrollst': 3690, 'pergamen': 3691, 'triebs': 3692, 'wohnen': 3693, 'trennen': 3694, 'derber': 3695, 'klammernden': 3696, 'organen': 3697, 'dust': 3698, 'gefilden': 3699, 'ahnen': 3700, 'zwischen': 3701, 'herrschend': 3702, 'steiget': 3703, 'duft': 3704, 'buntem': 3705, 'zaubermantel': 3706, 'trüg': 3707, 'köstlichsten': 3708, 'feil': 3709, 'königsmantel': 3710, 'berufe': 3711, 'wohlbekannte': 3712, 'strömend': 3713, 'überbreitet': 3714, 'tausendfältige': 3715, 'enden': 3716, 'norden': 3717, 'scharfe': 3718, 'geisterzahn': 3719, 'pfeilgespitzten': 3720, 'zungen': 3721, 'vertrocknend': 3722, 'nähren': 3723, 'lungen': 3724, 'wüste': 3725, 'scheitel': 3726, 'aue': 3727, 'ersäufen': 3728, 'gewandt': 3729, 'gesandt': 3730, 'lispeln': 3731, 'englisch': 3732, 'ergraut': 3733, 'gekühlt': 3734, 'schätzt': 3735, 'wichtig': 3736, 'betracht': 3737, 'bemerkst': 3738, 'weitem': 3739, 'schneckenkreise': 3740, 'jagt': 3741, 'feuerstrudel': 3742, 'hinterdrein': 3743, 'augentäuschung': 3744, 'magisch': 3745, 'schlingen': 3746, 'künftgem': 3747, 'ungewiß': 3748, 'umspringen': 3749, 'unbekannte': 3750, 'knurrt': 3751, 'zweifelt': 3752, 'wedelt': 3753, 'pudelnärrisch': 3754, 'stehest': 3755, 'verliere': 3756, 'dressur': 3757, 'verdient': 3758, 'studenten': 3759, 'trefflicher': 3760, 'scolar': 3761, 'stadtthor': 3762, 'hereintretend': 3763, 'ahndungsvollem': 3764, 'heilgem': 3765, 'bessre': 3766, 'weckt': 3767, 
'entschlafen': 3768, 'reget': 3769, 'menschenliebe': 3770, 'schnoperst': 3771, 'bestes': 3772, 'kissen': 3773, 'bergigen': 3774, 'rennen': 3775, 'pflege': 3776, 'blühn': 3777, 'sehnt': 3778, 'quelle': 3779, 'knurre': 3780, 'thierische': 3781, 'passen': 3782, 'verhöhnen': 3783, 'beschwerlich': 3784, 'murren': 3785, 'beknurren': 3786, 'befriedigung': 3787, 'versiegen': 3788, 'durste': 3789, 'erfahrung': 3790, 'mangel': 3791, 'ersetzen': 3792, 'ueberirdische': 3793, 'nirgends': 3794, 'würdger': 3795, 'testament': 3796, 'drängts': 3797, 'grundtext': 3798, 'aufzuschlagen': 3799, 'redlichem': 3800, 'heilige': 3801, 'geliebtes': 3802, 'deutsch': 3803, 'übertragen': 3804, 'volum': 3805, 'übersetzen': 3806, 'geiste': 3807, 'bedenke': 3808, 'zeile': 3809, 'übereile': 3810, 'niederschreibe': 3811, 'warnt': 3812, 'schreibe': 3813, 'theilen': 3814, 'bellen': 3815, 'störenden': 3816, 'ungern': 3817, 'heb': 3818, 'gastrecht': 3819, 'geschehen': 3820, 'wirklichkeit': 3821, 'hundes': 3822, 'bracht': 3823, 'nilpferd': 3824, 'feurigen': 3825, 'schrecklichem': 3826, 'gebiß': 3827, 'höllenbrut': 3828, 'salomonis': 3829, 'bleibet': 3830, 'haußen': 3831, 'eisen': 3832, 'fuchs': 3833, 'zagt': 3834, 'höllenluchs': 3835, 'losgemacht': 3836, 'glühen': 3837, 'kennte': 3838, 'eigenschaft': 3839, 'verschwind': 3840, 'rauschend': 3841, 'fließe': 3842, 'meteorenschöne': 3843, 'häußliche': 3844, 'keines': 3845, 'grinst': 3846, 'stärker': 3847, 'borstigen': 3848, 'verworfnes': 3849, 'entsprossnen': 3850, 'unausgesprochnen': 3851, 'gegossnen': 3852, 'durchstochnen': 3853, 'elephant': 3854, 'füllt': 3855, 'drohe': 3856, 'versenge': 3857, 'heiliger': 3858, 'lohe': 3859, 'stärkste': 3860, 'künsten': 3861, 'gekleidet': 3862, 'scholastikus': 3863, 'wozu': 3864, 'lärm': 3865, 'pudels': 3866, 'kern': 3867, 'scolast': 3868, 'casus': 3869, 'salutire': 3870, 'gelehrten': 3871, 'weidlich': 3872, 'verachtet': 3873, 'trachtet': 3874, 'allzudeutlich': 3875, 'weist': 3876, 'fliegengott': 3877, 'verderber': 3878, 
'räthselwort': 3879, 'verneint': 3880, 'wärs': 3881, 'entstünde': 3882, 'zerstörung': 3883, 'eigentliches': 3884, 'bescheidne': 3885, 'sprech': 3886, 'narrenwelt': 3887, 'theils': 3888, 'anfangs': 3889, 'stolze': 3890, 'streitig': 3891, 'gelingts': 3892, 'verhaftet': 3893, 'strömts': 3894, 'hemmts': 3895, 'würdgen': 3896, 'pflichten': 3897, 'vernichten': 3898, 'entgegenstellt': 3899, 'unternommen': 3900, 'beyzukommen': 3901, 'schütteln': 3902, 'brand': 3903, 'geruhig': 3904, 'verdammten': 3905, 'menschenbrut': 3906, 'anzuhaben': 3907, 'zirkulirt': 3908, 'frisches': 3909, 'rasend': 3910, 'entwinden': 3911, 'keime': 3912, 'kalten': 3913, 'vorbehalten': 3914, 'aparts': 3915, 'setzest': 3916, 'schaffenden': 3917, 'kalte': 3918, 'teufelsfaust': 3919, 'tückisch': 3920, 'ballt': 3921, 'chaos': 3922, 'wunderlicher': 3923, 'wirklich': 3924, 'besinnen': 3925, 'nächstenmale': 3926, 'besuche': 3927, 'rauchfang': 3928, 'hinausspaziere': 3929, 'verbietet': 3930, 'hinderniß': 3931, 'drudenfuß': 3932, 'pentagramma': 3933, 'winkel': 3934, 'zufall': 3935, 'getroffen': 3936, 'gefangner': 3937, 'ohngefähr': 3938, 'merkte': 3939, 'hereingesprungen': 3940, 'gespenster': 3941, 'hereingeschlüpft': 3942, 'knechte': 3943, 'ließe': 3944, 'verspricht': 3945, 'abgezwackt': 3946, 'zunächst': 3947, 'diesesmal': 3948, 'entlassen': 3949, 'belieben': 3950, 'nachgestellt': 3951, 'garn': 3952, 'bedingniß': 3953, 'vertreiben': 3954, 'jahres': 3955, 'einerley': 3956, 'leeres': 3957, 'zauberspiel': 3958, 'letzen': 3959, 'bereitung': 3960, 'voran': 3961, 'fanget': 3962, 'wölbungen': 3963, 'reizender': 3964, 'schaue': 3965, 'blaue': 3966, 'zerronnen': 3967, 'sternelein': 3968, 'funkeln': 3969, 'mildere': 3970, 'sonnen': 3971, 'darein': 3972, 'himmlischer': 3973, 'beugung': 3974, 'sehnende': 3975, 'neigung': 3976, 'flatternde': 3977, 'sprossende': 3978, 'ranken': 3979, 'lastende': 3980, 'behälter': 3981, 'drängender': 3982, 'kelter': 3983, 'schäumende': 3984, 'rieseln': 3985, 'gesteine': 3986, 'genügen': 
3987, 'grünender': 3988, 'geflügel': 3989, 'schlürfet': 3990, 'inseln': 3991, 'gauklend': 3992, 'jauchzende': 3993, 'tanzende': 3994, 'glimmenklimmen': 3995, 'schwimmen': 3996, 'seliger': 3997, 'huld': 3998, 'luftgen': 3999, 'treulich': 4000, 'eingesungen': 4001, 'concert': 4002, 'umgaukelt': 4003, 'traumgestalten': 4004, 'versenkt': 4005, 'wahns': 4006, 'zauber': 4007, 'zerspalten': 4008, 'bedarf': 4009, 'rattenzahns': 4010, 'raschelt': 4011, 'frösche': 4012, 'wanzen': 4013, 'läuse': 4014, 'befiehlt': 4015, 'benagen': 4016, 'oel': 4017, 'betupft': 4018, 'hervorgehupft': 4019, 'spitze': 4020, 'bannte': 4021, 'vornen': 4022, 'kante': 4023, 'biß': 4024, 'fauste': 4025, 'erwachend': 4026, 'abermals': 4027, 'geisterreiche': 4028, 'vorgelogen': 4029, 'entsprang': 4030, 'gefällst': 4031, 'grillen': 4032, 'verjagen': 4033, 'rothem': 4034, 'goldverbrämten': 4035, 'mäntelchen': 4036, 'starrer': 4037, 'spitzen': 4038, 'rathe': 4039, 'dergleichen': 4040, 'gleichfalls': 4041, 'anzulegen': 4042, 'losgebunden': 4043, 'erfahrest': 4044, 'erdelebens': 4045, 'gewähren': 4046, 'heiser': 4047, 'bittre': 4048, 'ahndung': 4049, 'eigensinnigem': 4050, 'krittel': 4051, 'mindert': 4052, 'lebensfratzen': 4053, 'hindert': 4054, 'niedersenkt': 4055, 'innerstes': 4056, 'erregen': 4057, 'thront': 4058, 'daseyn': 4059, 'erwünscht': 4060, 'seelig': 4061, 'siegesglanze': 4062, 'blutgen': 4063, 'lorbeern': 4064, 'schläfe': 4065, 'windet': 4066, 'durchrastem': 4067, 'mädchens': 4068, 'entseelt': 4069, 'gesunken': 4070, 'braunen': 4071, 'ausgetrunken': 4072, 'spioniren': 4073, 'scheints': 4074, 'allwissend': 4075, 'schrecklichen': 4076, 'bekannter': 4077, 'rest': 4078, 'anklang': 4079, 'betrog': 4080, 'lock': 4081, 'gaukelwerk': 4082, 'umspannt': 4083, 'trauerhöle': 4084, 'schmeichelkräften': 4085, 'meinung': 4086, 'womit': 4087, 'umfängt': 4088, 'blenden': 4089, 'heuchelt': 4090, 'ruhms': 4091, 'namensdauer': 4092, 'besitz': 4093, 'pflug': 4094, 'müßigem': 4095, 'polster': 4096, 'zurechte': 4097, 
'balsamsaft': 4098, 'liebeshuld': 4099, 'geisterchor': 4100, 'zerstört': 4101, 'zerfällt': 4102, 'halbgott': 4103, 'zerschlagen': 4104, 'verlorne': 4105, 'erdensöhne': 4106, 'prächtiger': 4107, 'beginne': 4108, 'altklug': 4109, 'einsamkeit': 4110, 'säfte': 4111, 'stocken': 4112, 'gram': 4113, 'geyer': 4114, 'schlechteste': 4115, 'stoßen': 4116, 'vereint': 4117, 'dagegen': 4118, 'frist': 4119, 'egoist': 4120, 'nützlich': 4121, 'bedingung': 4122, 'deutlich': 4123, 'wink': 4124, 'kümmern': 4125, 'schlägst': 4126, 'haßt': 4127, 'sättigt': 4128, 'quecksilber': 4129, 'zerrinnt': 4130, 'aeugeln': 4131, 'verbindet': 4132, 'götterlust': 4133, 'meteor': 4134, 'fault': 4135, 'begrünen': 4136, 'auftrag': 4137, 'schreckt': 4138, 'ruhe': 4139, 'schmausen': 4140, 'faulbett': 4141, 'schmeichelnd': 4142, 'belügen': 4143, 'biet': 4144, 'verweile': 4145, 'todtenglocke': 4146, 'schallen': 4147, 'dienstes': 4148, 'uhr': 4149, 'zeiger': 4150, 'bedenk': 4151, 'werdens': 4152, 'beharre': 4153, 'frag': 4154, 'doctorschmaus': 4155, 'sterbens': 4156, 'zeilen': 4157, 'geschriebnes': 4158, 'forderst': 4159, 'pedant': 4160, 'manneswort': 4161, 'gekannt': 4162, 'gesprochnes': 4163, 'schalten': 4164, 'strömen': 4165, 'versprechen': 4166, 'gelegt': 4167, 'beglückt': 4168, 'gereuen': 4169, 'beschrieben': 4170, 'beprägt': 4171, 'erstirbt': 4172, 'herrschaft': 4173, 'leder': 4174, 'erz': 4175, 'marmor': 4176, 'griffel': 4177, 'meißel': 4178, 'schreiben': 4179, 'wahl': 4180, 'rednerey': 4181, 'hitzig': 4182, 'übertreiben': 4183, 'blättchen': 4184, 'unterzeichnest': 4185, 'tröpfchen': 4186, 'völlig': 4187, 'gnüge': 4188, 'fratze': 4189, 'besondrer': 4190, 'furcht': 4191, 'bündniß': 4192, 'breche': 4193, 'verspreche': 4194, 'gebläht': 4195, 'gehör': 4196, 'verschließt': 4197, 'denkens': 4198, 'ekelt': 4199, 'sinnlichkeit': 4200, 'undurchdrungnen': 4201, 'zauberhüllen': 4202, 'begebenheit': 4203, 'wechseln': 4204, 'rastlos': 4205, 'bethätigt': 4206, 'beliebts': 4207, 'naschen': 4208, 'fliehen': 4209, 
'erhaschen': 4210, 'bekomm': 4211, 'blöde': 4212, 'hörest': 4213, 'taumel': 4214, 'weih': 4215, 'schmerzlichsten': 4216, 'verliebtem': 4217, 'haß': 4218, 'erquickendem': 4219, 'wissensdrang': 4220, 'geheilt': 4221, 'verschließen': 4222, 'zugetheilt': 4223, 'tiefste': 4224, 'eigen': 4225, 'erweitern': 4226, 'zerscheitern': 4227, 'harten': 4228, 'kaut': 4229, 'bahre': 4230, 'sauerteig': 4231, 'verdaut': 4232, 'ewgen': 4233, 'glanze': 4234, 'taugt': 4235, 'ließet': 4236, 'associirt': 4237, 'qualitäten': 4238, 'ehrenscheitel': 4239, 'löwen': 4240, 'hirsches': 4241, 'schnelligkeit': 4242, 'italiäners': 4243, 'daurbarkeit': 4244, 'großmuth': 4245, 'arglist': 4246, 'jugendtrieben': 4247, 'plane': 4248, 'verlieben': 4249, 'mikrokosmus': 4250, 'erringen': 4251, 'perrücken': 4252, 'millionen': 4253, 'ellenhohe': 4254, 'socken': 4255, 'schätze': 4256, 'menschengeists': 4257, 'herbeygerafft': 4258, 'niedersetze': 4259, 'innerlich': 4260, 'händ': 4261, 'h': 4262, 'hintern': 4263, 'genieße': 4264, 'hengste': 4265, 'zwanzig': 4266, 'speculirt': 4267, 'dürrer': 4268, 'grüne': 4269, 'weide': 4270, 'marterort': 4271, 'jungens': 4272, 'wanst': 4273, 'dreschen': 4274, 'knabe': 4275, 'ungetröstet': 4276, 'mütze': 4277, 'maske': 4278, 'köstlich': 4279, 'kleidet': 4280, 'überlaß': 4281, 'witze': 4282, 'viertelstündchen': 4283, 'fahrt': 4284, 'fausts': 4285, 'langem': 4286, 'verachte': 4287, 'allerhöchste': 4288, 'zauberwerken': 4289, 'lügengeist': 4290, 'bestärken': 4291, 'unbedingt': 4292, 'übereiltes': 4293, 'überspringt': 4294, 'schlepp': 4295, 'flache': 4296, 'unbedeutenheit': 4297, 'zappeln': 4298, 'kleben': 4299, 'unersättlichkeit': 4300, 'speis': 4301, 'giergen': 4302, 'erflehn': 4303, 'alhier': 4304, 'ergebenheit': 4305, 'umgethan': 4306, 'leidlichem': 4307, 'hieraußen': 4308, 'aufrichtig': 4309, 'hallen': 4310, 'keineswegs': 4311, 'beschränkter': 4312, 'grünes': 4313, 'sälen': 4314, 'bänken': 4315, 'vergeht': 4316, 'gewohnheit': 4317, 'willig': 4318, 'ernährt': 4319, 'brüsten': 
4320, 'gelüsten': 4321, 'hangen': 4322, 'hingelangen': 4323, 'sommerfeiertagen': 4324, 'zuerst': 4325, 'collegium': 4326, 'logicum': 4327, 'dressirt': 4328, 'spanische': 4329, 'stiefeln': 4330, 'eingeschnürt': 4331, 'bedächtiger': 4332, 'hinschleiche': 4333, 'gedankenbahn': 4334, 'irlichtelire': 4335, 'gedankenfabrik': 4336, 'webermeisterstück': 4337, 'schifflein': 4338, 'schießen': 4339, 'ungesehen': 4340, 'verbindungen': 4341, 'philosoph': 4342, 'beweist': 4343, 'zweyte': 4344, 'vierte': 4345, 'zweyt': 4346, 'viert': 4347, 'preisen': 4348, 'weber': 4349, 'lebendigs': 4350, 'theile': 4351, 'encheiresin': 4352, 'naturae': 4353, 'chimie': 4354, 'verstehen': 4355, 'nächstens': 4356, 'reduciren': 4357, 'gehörig': 4358, 'klassificiren': 4359, 'mühlrad': 4360, 'metaphysik': 4361, 'tiefsinnig': 4362, 'vorerst': 4363, 'wahr': 4364, 'glockenschlag': 4365, 'präparirt': 4366, 'paragraphos': 4367, 'einstudirt': 4368, 'schreibens': 4369, 'befleißt': 4370, 'dictirt': 4371, 'zweymal': 4372, 'besitzt': 4373, 'rechtsgelehrsamkeit': 4374, 'lehre': 4375, 'krankheit': 4376, 'schleppen': 4377, 'geschlechte': 4378, 'wohlthat': 4379, 'enkel': 4380, 'abscheu': 4381, 'vermehrt': 4382, 'belehrt': 4383, 'studiren': 4384, 'falschen': 4385, 'verborgnes': 4386, 'unterscheiden': 4387, 'schwört': 4388, 'haltet': 4389, 'sichre': 4390, 'pforte': 4391, 'tempel': 4392, 'begriff': 4393, 'allzu': 4394, 'quälen': 4395, 'begriffe': 4396, 'streiten': 4397, 'system': 4398, 'jota': 4399, 'wörtchen': 4400, 'fingerzeig': 4401, 'eher': 4402, 'tons': 4403, 'durchstudirt': 4404, 'wissenschaftlich': 4405, 'schweift': 4406, 'wohlgebaut': 4407, 'kühnheit': 4408, 'tausendfach': 4409, 'puncte': 4410, 'curiren': 4411, 'halbweg': 4412, 'ehrbar': 4413, 'titel': 4414, 'vertraulich': 4415, 'willkomm': 4416, 'tappt': 4417, 'siebensachen': 4418, 'pülslein': 4419, 'drücken': 4420, 'fasset': 4421, 'schlauen': 4422, 'schlanke': 4423, 'hüfte': 4424, 'geschnürt': 4425, 'theorie': 4426, 'andermal': 4427, 'beschweren': 4428, 
'stammbuch': 4429, 'überreichen': 4430, 'liest': 4431, 'eritis': 4432, 'sicut': 4433, 'deus': 4434, 'scientes': 4435, 'bonum': 4436, 'et': 4437, 'malum': 4438, 'ehrerbietiegehrerbietig': 4439, 'empfiehlt': 4440, 'gottähnlichkeit': 4441, 'nutzen': 4442, 'cursum': 4443, 'durchschmarutzen': 4444, 'schicken': 4445, 'vertraust': 4446, 'mantel': 4447, 'bündel': 4448, 'feuerluft': 4449, 'gratulire': 4450, 'auerbachs': 4451, 'lustiger': 4452, 'nasses': 4453, 'lichterloh': 4454, 'dummheit': 4455, 'sauerey': 4456, 'gießt': 4457, 'beydes': 4458, 'doppelt': 4459, 'entzweyt': 4460, 'offner': 4461, 'runda': 4462, 'sauft': 4463, 'schreyt': 4464, 'holla': 4465, 'ho': 4466, 'baumwolle': 4467, 'sprengt': 4468, 'wiederschallt': 4469, 'basses': 4470, 'grundgewalt': 4471, 'kehlen': 4472, 'gestimmt': 4473, 'hälts': 4474, 'politisch': 4475, 'leidig': 4476, 'reichlichen': 4477, 'kaiser': 4478, 'kanzler': 4479, 'oberhaupt': 4480, 'papst': 4481, 'erwählen': 4482, 'qualität': 4483, 'ausschlag': 4484, 'erhöht': 4485, 'schwing': 4486, 'nachtigall': 4487, 'grüß': 4488, 'zehentausendmal': 4489, 'verwehren': 4490, 'liebste': 4491, 'rühme': 4492, 'bescheert': 4493, 'kreuzweg': 4494, 'schäkern': 4495, 'galopp': 4496, 'meckern': 4497, 'echtem': 4498, 'fleisch': 4499, 'gruße': 4500, 'eingeschmissen': 4501, 'schlagend': 4502, 'gehorchet': 4503, 'gesteht': 4504, 'verliebte': 4505, 'standsgebühr': 4506, 'neusten': 4507, 'schnitt': 4508, 'rundreim': 4509, 'ratt': 4510, 'kellernest': 4511, 'lebte': 4512, 'fett': 4513, 'butter': 4514, 'ränzlein': 4515, 'angemästt': 4516, 'luther': 4517, 'köchinn': 4518, 'gestellt': 4519, 'soff': 4520, 'pfützen': 4521, 'zernagt': 4522, 'zerkratzt': 4523, 'aengstesprung': 4524, 'zugelaufen': 4525, 'fiel': 4526, 'heerd': 4527, 'schnaufen': 4528, 'lachte': 4529, 'vergifterinn': 4530, 'pfeift': 4531, 'platten': 4532, 'bursche': 4533, 'schmerbauch': 4534, 'kahlen': 4535, 'platte': 4536, 'geschwollnen': 4537, 'ratte': 4538, 'zirkeltanz': 4539, 'katzen': 4540, 'kopfweh': 4541, 
'borgt': 4542, 'vergnügt': 4543, 'unbesorgt': 4544, 'wunderlichen': 4545, 'paris': 4546, 'glase': 4547, 'kinderzahn': 4548, 'burschen': 4549, 'würmer': 4550, 'unzufrieden': 4551, 'marktschreyer': 4552, 'sinds': 4553, 'schraube': 4554, 'völkchen': 4555, 'kragen': 4556, 'gegrüßt': 4557, 'gegengruß': 4558, 'ansehend': 4559, 'hinkt': 4560, 'trunks': 4561, 'verwöhnter': 4562, 'rippach': 4563, 'aufgebrochen': 4564, 'gespeist': 4565, 'letztemal': 4566, 'gesprochen': 4567, 'vettern': 4568, 'wußt': 4569, 'aufgetragen': 4570, 'verstehts': 4571, 'pfiffiger': 4572, 'patron': 4573, 'warte': 4574, 'irrte': 4575, 'hörten': 4576, 'geübte': 4577, 'wölbung': 4578, 'wiederklingen': 4579, 'virtuos': 4580, 'nagelneues': 4581, 'spanien': 4582, 'weins': 4583, 'horcht': 4584, 'saubrer': 4585, 'vergeßt': 4586, 'einzuschärfen': 4587, 'genauste': 4588, 'mißt': 4589, 'falten': 4590, 'werfen': 4591, 'sammet': 4592, 'daran': 4593, 'geschwister': 4594, 'fraun': 4595, 'hofe': 4596, 'geplagt': 4597, 'zofe': 4598, 'gestochen': 4599, 'genagt': 4600, 'durften': 4601, 'jucken': 4602, 'ergehn': 4603, 'spitzt': 4604, 'tränke': 4605, 'beschweret': 4606, 'werthen': 4607, 'nehms': 4608, 'loben': 4609, 'judiciren': 4610, 'rheine': 4611, 'fässer': 4612, 'dahinten': 4613, 'körbchen': 4614, 'wünschet': 4615, 'schmecken': 4616, 'meynt': 4617, 'stell': 4618, 'aha': 4619, 'abzulecken': 4620, 'wählen': 4621, 'rheinwein': 4622, 'vaterland': 4623, 'verleiht': 4624, 'allerbesten': 4625, 'tischrand': 4626, 'verschafft': 4627, 'taschenspielersachen': 4628, 'champagner': 4629, 'mussirend': 4630, 'wachspropfenwachspfropfen': 4631, 'echter': 4632, 'deutscher': 4633, 'franzen': 4634, 'nähert': 4635, 'gestehn': 4636, 'sauren': 4637, 'echten': 4638, 'tokayer': 4639, 'gewagt': 4640, 'gefragt': 4641, 'nachdem': 4642, 'löcher': 4643, 'gebohrt': 4644, 'weinstock': 4645, 'ziegenbock': 4646, 'saftig': 4647, 'reben': 4648, 'hölzerne': 4649, 'genießt': 4650, 'verlangte': 4651, 'hütet': 4652, 'vergießt': 4653, 'kannibalisch': 4654, 
'säuen': 4655, 'wohls': 4656, 'abzufahren': 4657, 'bestialität': 4658, 'unvorsichtig': 4659, 'besprechend': 4660, 'fegefeuer': 4661, 'bezahlt': 4662, 'hießen': 4663, 'sachte': 4664, 'weinfaß': 4665, 'besenstiel': 4666, 'grob': 4667, 'schläge': 4668, 'regnen': 4669, 'stoßt': 4670, 'vogelfrey': 4671, 'ernsthafter': 4672, 'gebild': 4673, 'verändern': 4674, 'welches': 4675, 'weinberge': 4676, 'siebeln': 4677, 'beibey': 4678, 'wechselseitig': 4679, 'spaße': 4680, 'sinke': 4681, 'kellerthüre': 4682, 'reiten': 4683, 'bleyschwer': 4684, 'tische': 4685, 'wendend': 4686, 'lug': 4687, 'däuchte': 4688, 'tränk': 4689, 'hexenküche': 4690, 'niedrigen': 4691, 'dampfe': 4692, 'meerkatze': 4693, 'sorgt': 4694, 'überläuft': 4695, 'meerkater': 4696, 'wärmt': 4697, 'wände': 4698, 'seltsamsten': 4699, 'hexenhausrath': 4700, 'ausgeschmückt': 4701, 'widersteht': 4702, 'zauberwesen': 4703, 'versprichst': 4704, 'genesen': 4705, 'wust': 4706, 'raserey': 4707, 'sudelköcherey': 4708, 'dreyßig': 4709, 'irgend': 4710, 'balsam': 4711, 'ausgefunden': 4712, 'dochdich': 4713, 'capitel': 4714, 'begib': 4715, 'fang': 4716, 'hacken': 4717, 'graben': 4718, 'erhalte': 4719, 'beschränkten': 4720, 'ernähre': 4721, 'ungemischter': 4722, 'leb': 4723, 'raub': 4724, 'acker': 4725, 'ärndest': 4726, 'düngen': 4727, 'achtzig': 4728, 'spaten': 4729, 'brauen': 4730, 'brücken': 4731, 'bauen': 4732, 'feine': 4733, 'zierliches': 4734, 'schmause': 4735, 'pflegt': 4736, 'schwärmen': 4737, 'pfoten': 4738, 'wärmen': 4739, 'findest': 4740, 'abgeschmackt': 4741, 'discours': 4742, 'bettelsuppen': 4743, 'publicum': 4744, 'würfle': 4745, 'bestellt': 4746, 'affe': 4747, 'lotto': 4748, 'meerkätzchen': 4749, 'kugel': 4750, 'gespielt': 4751, 'rollt': 4752, 'beständig': 4753, 'thon': 4754, 'holt': 4755, 'durchsehen': 4756, 'nähernd': 4757, 'alberne': 4758, 'tropf': 4759, 'unhöfliches': 4760, 'gestanden': 4761, 'genähert': 4762, 'zauberspiegel': 4763, 'leihe': 4764, 'schnellsten': 4765, 'gefild': 4766, 'wage': 4767, 
'hingestreckten': 4768, 'himmeln': 4769, 'gescheidtes': 4770, 'schätzchen': 4771, 'bräutigam': 4772, 'heim': 4773, 'dehnend': 4774, 'spielend': 4775, 'throne': 4776, 'zepter': 4777, 'bewegungen': 4778, 'großem': 4779, 'leimen': 4780, 'ungeschickt': 4781, 'stücke': 4782, 'herumspringen': 4783, 'reimen': 4784, 'deutend': 4785, 'schwanken': 4786, 'glückt': 4787, 'obiger': 4788, 'stellung': 4789, 'aufrichtige': 4790, 'ausser': 4791, 'überzulaufen': 4792, 'grosse': 4793, 'entsetzlichem': 4794, 'gefahren': 4795, 'verdammtes': 4796, 'sau': 4797, 'versäumst': 4798, 'versengst': 4799, 'feuerpein': 4800, 'schaumlöffel': 4801, 'spritzt': 4802, 'winseln': 4803, 'umkehrt': 4804, 'töpfe': 4805, 'tact': 4806, 'aas': 4807, 'melodey': 4808, 'zurücktritt': 4809, 'gerippe': 4810, 'scheusal': 4811, 'zerschmettre': 4812, 'katzengeister': 4813, 'wamms': 4814, 'respect': 4815, 'versteckt': 4816, 'raben': 4817, 'cultur': 4818, 'beleckt': 4819, 'erstreckt': 4820, 'nordische': 4821, 'phantom': 4822, 'schweif': 4823, 'missen': 4824, 'bedien': 4825, 'falscher': 4826, 'waden': 4827, 'verbitt': 4828, 'fabelbuch': 4829, 'baron': 4830, 'cavalier': 4831, 'cavaliere': 4832, 'zweifelst': 4833, 'wapen': 4834, 'unanständige': 4835, 'gebärde': 4836, 'unmäßig': 4837, 'umzugehn': 4838, 'bekannten': 4839, 'ältste': 4840, 'bitten': 4841, 'doppeln': 4842, 'flasche': 4843, 'nasche': 4844, 'mindsten': 4845, 'unvorbereitet': 4846, 'sprüche': 4847, 'tasse': 4848, 'klingen': 4849, 'musik': 4850, 'meerkatzen': 4851, 'fackel': 4852, 'rasenden': 4853, 'abgeschmackteste': 4854, 'possen': 4855, 'strenger': 4856, 'emphase': 4857, 'hex': 4858, 'neun': 4859, 'keins': 4860, 'hexeneinmaleins': 4861, 'fieber': 4862, 'vollkommner': 4863, 'widerspruch': 4864, 'kluge': 4865, 'verbreiten': 4866, 'schwätzt': 4867, 'narrn': 4868, 'befassen': 4869, 'glaubt': 4870, 'müsse': 4871, 'narren': 4872, 'treffliche': 4873, 'graden': 4874, 'schluck': 4875, 'ceremonien': 4876, 'schenkt': 4877, 'löst': 4878, 'mög': 4879, 'schlückchen': 4880, 
'walpurgis': 4881, 'würkung': 4882, 'nothwendig': 4883, 'transpiriren': 4884, 'inn': 4885, 'äußres': 4886, 'müßiggang': 4887, 'lehr': 4888, 'hernach': 4889, 'empfindest': 4890, 'innigem': 4891, 'cupido': 4892, 'frauenbild': 4893, 'muster': 4894, 'helenen': 4895, 'anzutragen': 4896, 'ungeleitet': 4897, 'sitt': 4898, 'tugendreich': 4899, 'schnippisch': 4900, 'wange': 4901, 'vergess': 4902, 'niederschlägt': 4903, 'geprägt': 4904, 'angebunden': 4905, 'entzücken': 4906, 'liederlich': 4907, 'blum': 4908, 'dünkelt': 4909, 'pflücken': 4910, 'lobesan': 4911, 'ruht': 4912, 'geschieden': 4913, 'geschöpfchen': 4914, 'verführen': 4915, 'sprecht': 4916, 'franzos': 4917, 'laßts': 4918, 'brimborium': 4919, 'püppchen': 4920, 'geknetet': 4921, 'zugerichtt': 4922, 'welsche': 4923, 'appetit': 4924, 'schimpf': 4925, 'allemal': 4926, 'einzunehmen': 4927, 'list': 4928, 'engelsschatz': 4929, 'ruheplatz': 4930, 'halstuch': 4931, 'strumpfband': 4932, 'förderlich': 4933, 'dienstlich': 4934, 'künftger': 4935, 'weiden': 4936, 'sorg': 4937, 'geschenk': 4938, 'schenken': 4939, 'reüssiren': 4940, 'vergrabnen': 4941, 'revidiren': 4942, 'reinliches': 4943, 'zöpfe': 4944, 'flechtend': 4945, 'aufbindend': 4946, 'keck': 4947, 'einigem': 4948, 'stillschweigen': 4949, 'herumspürend': 4950, 'aufschauend': 4951, 'willkommen': 4952, 'dämmerschein': 4953, 'heiligthum': 4954, 'durchwebst': 4955, 'ergreif': 4956, 'liebespein': 4957, 'lebst': 4958, 'athmet': 4959, 'zufriedenheit': 4960, 'armuth': 4961, 'seligkeit': 4962, 'ledernen': 4963, 'bette': 4964, 'offnen': 4965, 'väterthron': 4966, 'kinderwangen': 4967, 'ahnherrn': 4968, 'geküßt': 4969, 'füll': 4970, 'säuseln': 4971, 'mütterlich': 4972, 'teppich': 4973, 'kräuseln': 4974, 'göttergleich': 4975, 'hütte': 4976, 'himmelreich': 4977, 'bettvorhang': 4978, 'wonnegraus': 4979, 'säumen': 4980, 'bildetest': 4981, 'eingebornen': 4982, 'warmem': 4983, 'angefüllt': 4984, 'reinem': 4985, 'entwirkte': 4986, 'götterbild': 4987, 'hergeführt': 4988, 'innig': 4989, 
'gerührt': 4990, 'armselger': 4991, 'zauberduft': 4992, 'drangs': 4993, 'liebestraum': 4994, 'träte': 4995, 'frevel': 4996, 'hingeschmolzen': 4997, 'hergenommen': 4998, 'stellts': 4999, 'vergehn': 5000, 'sächelchen': 5001, 'wahren': 5002, 'lüsternheit': 5003, 'tageszeit': 5004, 'weitre': 5005, 'sparen': 5006, 'geitzig': 5007, 'kratz': 5008, 'reib': 5009, 'händen': 5010, 'hörsal': 5011, 'stünd': 5012, 'physik': 5013, 'metaphysika': 5014, 'schwül': 5015, 'dumpfig': 5016, 'übern': 5017, 'auszieht': 5018, 'thule': 5019, 'sterbend': 5020, 'darüber': 5021, 'leert': 5022, 'gingen': 5023, 'daraus': 5024, 'zählt': 5025, 'städt': 5026, 'gönnt': 5027, 'königsmahle': 5028, 'hohem': 5029, 'vätersaale': 5030, 'zecher': 5031, 'lebensgluth': 5032, 'warf': 5033, 'fluth': 5034, 'thäten': 5035, 'eröffnet': 5036, 'einzuräumen': 5037, 'schmuckkästchen': 5038, 'wunderbar': 5039, 'drinne': 5040, 'brachts': 5041, 'pfand': 5042, 'lieh': 5043, 'schlüsselchen': 5044, 'edelfrau': 5045, 'feiertage': 5046, 'gehören': 5047, 'ohrring': 5048, 'läßts': 5049, 'erbarmen': 5050, 'spazirgang': 5051, 'verschmähten': 5052, 'ärgers': 5053, 'fluchen': 5054, 'kneipt': 5055, 'verschoben': 5056, 'kleidets': 5057, 'rasender': 5058, 'angeschafft': 5059, 'pfaff': 5060, 'fängts': 5061, 'schnuffelt': 5062, 'gebetbuch': 5063, 'riechts': 5064, 'möbel': 5065, 'profan': 5066, 'klar': 5067, 'befängt': 5068, 'zehrt': 5069, 'wollens': 5070, 'himmelsmanna': 5071, 'margretlein': 5072, 'schiefes': 5073, 'geschenkter': 5074, 'gaul': 5075, 'gottlos': 5076, 'hierher': 5077, 'gesinnt': 5078, 'überwindet': 5079, 'magen': 5080, 'aufgefressen': 5081, 'übergessen': 5082, 'kirch': 5083, 'verdauen': 5084, 'allgemeiner': 5085, 'jud': 5086, 'strich': 5087, 'spange': 5088, 'kett': 5089, 'wärens': 5090, 'pfifferling': 5091, 'obs': 5092, 'korb': 5093, 'nüsse': 5094, 'erbaut': 5095, 'unruhvoll': 5096, 'ders': 5097, 'kummer': 5098, 'kinderspiel': 5099, 'richts': 5100, 'häng': 5101, 'gnädger': 5102, 'verliebter': 5103, 'verpufft': 5104, 
'verzeihs': 5105, 'stracks': 5106, 'weint': 5107, 'todtenschein': 5108, 'gretelchen': 5109, 'kniee': 5110, 'ebenholz': 5111, 'reicher': 5112, 'thäts': 5113, 'glückselge': 5114, 'creatur': 5115, 'leg': 5116, 'spazier': 5117, 'spiegelglas': 5118, 'anlaß': 5119, 'kettchen': 5120, 'perle': 5121, 'ohr': 5122, 'vorhängel': 5123, 'guckend': 5124, 'verzeihn': 5125, 'erbeten': 5126, 'ehrerbietig': 5127, 'margareten': 5128, 'schwerdlein': 5129, 'vornehmen': 5130, 'genommen': 5131, 'nachmittage': 5132, 'denk': 5133, 'scharf': 5134, 'freut': 5135, 'verlange': 5136, 'frohere': 5137, 'grüßen': 5138, 'vergeh': 5139, 'traurige': 5140, 'verlust': 5141, 'tode': 5142, 'antonius': 5143, 'wohlgeweihten': 5144, 'ruhebette': 5145, 'messen': 5146, 'taschen': 5147, 'leer': 5148, 'schaustück': 5149, 'säckels': 5150, 'spart': 5151, 'angedenken': 5152, 'aufbewahrt': 5153, 'hungert': 5154, 'bettelt': 5155, 'madam': 5156, 'verzettelt': 5157, 'bereute': 5158, 'fehler': 5159, 'bejammerte': 5160, 'unglücklich': 5161, 'requiem': 5162, 'wäret': 5163, 'derweil': 5164, 'galan': 5165, 'größten': 5166, 'himmelsgaben': 5167, 'landes': 5168, 'gibt': 5169, 'sterbebette': 5170, 'mist': 5171, 'halbgefaultem': 5172, 'starb': 5173, 'fand': 5174, 'hassen': 5175, 'erinnerung': 5176, 'tödtet': 5177, 'vergäb': 5178, 'weinend': 5179, 'vergeben': 5180, 'lügt': 5181, 'grabs': 5182, 'fabelte': 5183, 'kenner': 5184, 'allerweitsten': 5185, 'plackerey': 5186, 'malta': 5187, 'betet': 5188, 'brünstig': 5189, 'günstig': 5190, 'türkisch': 5191, 'fahrzeug': 5192, 'fing': 5193, 'sultans': 5194, 'führte': 5195, 'tapferkeit': 5196, 'empfing': 5197, 'gebührte': 5198, 'wohlgemessnes': 5199, 'vergraben': 5200, 'winde': 5201, 'napel': 5202, 'spazirte': 5203, 'treus': 5204, 'spürte': 5205, 'hindern': 5206, 'betraurt': 5207, 'züchtig': 5208, 'visirte': 5209, 'unterweil': 5210, 'herziger': 5211, 'närrchen': 5212, 'allzuviele': 5213, 'würfelspiel': 5214, 'nachgesehen': 5215, 'beding': 5216, 'hielte': 5217, 'unschuldigs': 5218, 'wann': 
5219, 'gestorben': 5220, 'wochenblättchen': 5221, 'zweyer': 5222, 'zeugen': 5223, 'allerwegs': 5224, 'richter': 5225, 'jungfrau': 5226, 'knab': 5227, 'fräuleins': 5228, 'erweist': 5229, 'schamroth': 5230, 'könige': 5231, 'hinterm': 5232, 'mehpistopheles': 5233, 'fördern': 5234, 'ah': 5235, 'kurzer': 5236, 'marthen': 5237, 'auserlesen': 5238, 'zigeunerwesen': 5239, 'gültig': 5240, 'ihres': 5241, 'ehherrn': 5242, 'ausgereckte': 5243, 'sancta': 5244, 'simplicitas': 5245, 'bezeugt': 5246, 'wärt': 5247, 'erstemal': 5248, 'abgelegt': 5249, 'drin': 5250, 'definitionen': 5251, 'frecher': 5252, 'gestehen': 5253, 'schwerdleins': 5254, 'gewußt': 5255, 'sophiste': 5256, 'seelenlieb': 5257, 'schwören': 5258, 'ewiger': 5259, 'überallmächtgem': 5260, 'empfinde': 5261, 'gewühl': 5262, 'schweife': 5263, 'greife': 5264, 'unendlich': 5265, 'teuflisch': 5266, 'lügenspiel': 5267, 'lunge': 5268, 'behalten': 5269, 'behälts': 5270, 'schwätzens': 5271, 'ueberdruß': 5272, 'vorzüglich': 5273, 'faustens': 5274, 'spazirend': 5275, 'schont': 5276, 'beschämen': 5277, 'gütigkeit': 5278, 'fürlieb': 5279, 'erfahrnen': 5280, 'unterhalten': 5281, 'unterhält': 5282, 'incommodirt': 5283, 'rauh': 5284, 'reist': 5285, 'verläßt': 5286, 'raschen': 5287, 'kömmt': 5288, 'schleifen': 5289, 'grausen': 5290, 'werther': 5291, 'berathet': 5292, 'geläufig': 5293, 'freunde': 5294, 'häufig': 5295, 'verständiger': 5296, 'verständig': 5297, 'eitelkeit': 5298, 'kurzsinn': 5299, 'einfalt': 5300, 'erkennt': 5301, 'demuth': 5302, 'niedrigkeit': 5303, 'liebevoll': 5304, 'austheilenden': 5305, 'augenblickchen': 5306, 'wirthschaft': 5307, 'versehen': 5308, 'fegen': 5309, 'stricken': 5310, 'nähn': 5311, 'spat': 5312, 'accurat': 5313, 'einzuschränken': 5314, 'hinterließ': 5315, 'vermögen': 5316, 'häuschen': 5317, 'gärtchen': 5318, 'schwesterchen': 5319, 'übernähm': 5320, 'glich': 5321, 'vaters': 5322, 'erholte': 5323, 'würmchen': 5324, 'erzog': 5325, 'milch': 5326, 'zappelte': 5327, 'reinste': 5328, 'durfte': 5329, 'schwieg': 
5330, 'aufstehn': 5331, 'tänzelnd': 5332, 'waschtrog': 5333, 'markt': 5334, 'muthig': 5335, 'schwerlich': 5336, 'käme': 5337, 'gefunden': 5338, 'irgendwo': 5339, 'gebunden': 5340, 'sprichwort': 5341, 'herd': 5342, 'braves': 5343, 'perlen': 5344, 'höflich': 5345, 'aufgenommen': 5346, 'versteh': 5347, 'gütig': 5348, 'kanntest': 5349, 'saht': 5350, 'verzeihst': 5351, 'frechheit': 5352, 'unterfangen': 5353, 'jüngst': 5354, 'bestürzt': 5355, 'übels': 5356, 'betragen': 5357, 'freches': 5358, 'unanständiges': 5359, 'anzuwandeln': 5360, 'handeln': 5361, 'begonnte': 5362, 'pflückt': 5363, 'sternblume': 5364, 'zupft': 5365, 'strauß': 5366, 'rupft': 5367, 'murmelt': 5368, 'murmelst': 5369, 'himmelsangesicht': 5370, 'lezteletzte': 5371, 'ausrupfend': 5372, 'blumenwort': 5373, 'götterausspruch': 5374, 'überläufts': 5375, 'schaudre': 5376, 'unaussprechlich': 5377, 'hinzugeben': 5378, 'kommend': 5379, 'bät': 5380, 'nachbarn': 5381, 'gered': 5382, 'aufgeflogen': 5383, 'muthwillge': 5384, 'sommervögel': 5385, 'gartenhäuschen': 5386, 'fingerspitze': 5387, 'ritze': 5388, 'neckst': 5389, 'treff': 5390, 'gebend': 5391, 'stampfend': 5392, 'geleiten': 5393, 'ade': 5394, 'baldig': 5395, 'unwissend': 5396, 'begreife': 5397, 'findt': 5398, 'erhabner': 5399, 'bat': 5400, 'zugewendet': 5401, 'königreich': 5402, 'staunenden': 5403, 'erlaubst': 5404, 'vergönnest': 5405, 'freunds': 5406, 'lehrst': 5407, 'brüder': 5408, 'walde': 5409, 'braust': 5410, 'knarrt': 5411, 'riesenfichte': 5412, 'stürzend': 5413, 'nachbaräste': 5414, 'nachbarstämme': 5415, 'quetschend': 5416, 'streift': 5417, 'fall': 5418, 'dumpf': 5419, 'donnert': 5420, 'zeigst': 5421, 'besänftigend': 5422, 'silberne': 5423, 'lindern': 5424, 'betrachtung': 5425, 'vollkommnes': 5426, 'empfind': 5427, 'gefährten': 5428, 'frech': 5429, 'erniedrigt': 5430, 'worthauch': 5431, 'facht': 5432, 'wildes': 5433, 'tauml': 5434, 'verschmacht': 5435, 'hättest': 5436, 'ernste': 5437, 'unhold': 5438, 'barsch': 5439, 'ennüyirt': 5440, 'kribskrabs': 
5441, 'imagination': 5442, 'curirt': 5443, 'erdball': 5444, 'abspazirt': 5445, 'höhlen': 5446, 'felsenritzen': 5447, 'versitzen': 5448, 'schlurfst': 5449, 'dumpfem': 5450, 'triefendem': 5451, 'gestein': 5452, 'kröte': 5453, 'lebenskraft': 5454, 'wandel': 5455, 'oede': 5456, 'ahnden': 5457, 'wärest': 5458, 'gönnen': 5459, 'überirdisches': 5460, 'gebirgen': 5461, 'wonniglich': 5462, 'aufschwellen': 5463, 'ahndungsdrang': 5464, 'durchwühlen': 5465, 'tagewerk': 5466, 'stolzer': 5467, 'liebewonniglich': 5468, 'überfließen': 5469, 'intuition': 5470, 'gesittet': 5471, 'keuschen': 5472, 'keusche': 5473, 'gelegentlich': 5474, 'vorzulügen': 5475, 'abgetrieben': 5476, 'währt': 5477, 'aufgerieben': 5478, 'graus': 5479, 'dadrinne': 5480, 'übermächtig': 5481, 'liebeswuth': 5482, 'übergeflossen': 5483, 'geschmolznen': 5484, 'gegossen': 5485, 'seicht': 5486, 'anstatt': 5487, 'wäldern': 5488, 'thronen': 5489, 'affenjunge': 5490, 'belohnen': 5491, 'stadtmauer': 5492, 'vöglein': 5493, 'tagelang': 5494, 'munter': 5495, 'meist': 5496, 'betrübt': 5497, 'ausgeweint': 5498, 'verliebt': 5499, 'gelt': 5500, 'fange': 5501, 'verruchter': 5502, 'hebe': 5503, 'begier': 5504, 'verrückten': 5505, 'seyst': 5506, 'beneide': 5507, 'berühren': 5508, 'beneidet': 5509, 'zwillingspaar': 5510, 'rosen': 5511, 'entfliehe': 5512, 'schimpft': 5513, 'bub': 5514, 'erkannte': 5515, 'edelsten': 5516, 'beruf': 5517, 'himmelsfreud': 5518, 'erwarmen': 5519, 'unbehauste': 5520, 'unmensch': 5521, 'wassersturz': 5522, 'brauste': 5523, 'begierig': 5524, 'abgrund': 5525, 'kindlich': 5526, 'hüttchen': 5527, 'alpenfeld': 5528, 'häusliches': 5529, 'umfangen': 5530, 'gottverhaßte': 5531, 'faßte': 5532, 'untergraben': 5533, 'mußtest': 5534, 'verkürzen': 5535, 'mags': 5536, 'geschick': 5537, 'zusammenstürzen': 5538, 'tröste': 5539, 'köpfchen': 5540, 'ausgang': 5541, 'eingeteufelt': 5542, 'abgeschmackters': 5543, 'stube': 5544, 'spinnrade': 5545, 'vergällt': 5546, 'zerstückt': 5547, 'mundes': 5548, 'lächeln': 5549, 
'zauberfluß': 5550, 'vergehen': 5551, 'marthens': 5552, 'versprich': 5553, 'religion': 5554, 'sacramente': 5555, 'verlangen': 5556, 'priester': 5557, 'antwort': 5558, 'spott': 5559, 'frager': 5560, 'mißhör': 5561, 'unterwinden': 5562, 'allumfasser': 5563, 'allerhalter': 5564, 'erhält': 5565, 'wölbt': 5566, 'dadroben': 5567, 'hierunten': 5568, 'blickend': 5569, 'aug': 5570, 'auge': 5571, 'ewigem': 5572, 'sichtbar': 5573, 'erfüll': 5574, 'nenn': 5575, 'nenns': 5576, 'schall': 5577, 'umnebelnd': 5578, 'himmelsgluth': 5579, 'sagens': 5580, 'sprache': 5581, 'möchts': 5582, 'schief': 5583, 'christenthum': 5584, 'innrer': 5585, 'widrig': 5586, 'puppe': 5587, 'fürcht': 5588, 'sehne': 5589, 'unrecht': 5590, 'käuze': 5591, 'spöttisch': 5592, 'ergrimmt': 5593, 'antheil': 5594, 'stirn': 5595, 'schnürt': 5596, 'übermannt': 5597, 'meyn': 5598, 'antipathie': 5599, 'betroffen': 5600, 'umhüllen': 5601, 'tiefem': 5602, 'schlaf': 5603, 'hoffentlich': 5604, 'grasaff': 5605, 'spionirt': 5606, 'ausführlich': 5607, 'katechisirt': 5608, 'mädels': 5609, 'interessirt': 5610, 'schlicht': 5611, 'altem': 5612, 'duckt': 5613, 'machend': 5614, 'quäle': 5615, 'übersinnlicher': 5616, 'sinnlicher': 5617, 'mägdelein': 5618, 'nasführet': 5619, 'spottgeburt': 5620, 'dreck': 5621, 'physiognomie': 5622, 'meisterlich': 5623, 'mäskchen': 5624, 'weissagt': 5625, 'verborgnen': 5626, 'genie': 5627, 'dichs': 5628, 'krügen': 5629, 'bärbelchen': 5630, 'bethört': 5631, 'vornehmthun': 5632, 'füttert': 5633, 'ißt': 5634, 'ergangen': 5635, 'spaziren': 5636, 'dorf': 5637, 'tanzplatz': 5638, 'curtesirt': 5639, 'pastetchen': 5640, 'bildt': 5641, 'ehrlos': 5642, 'schämen': 5643, 'anzunehmen': 5644, 'gekos': 5645, 'geschleck': 5646, 'blümchen': 5647, 'bedauerst': 5648, 'spinnen': 5649, 'nachts': 5650, 'hinunterließ': 5651, 'buhlen': 5652, 'thürbank': 5653, 'ducken': 5654, 'sünderhemdchen': 5655, 'kirchbuß': 5656, 'narr': 5657, 'flinker': 5658, 'anderwärts': 5659, 'kränzel': 5660, 'reißen': 5661, 'häckerling': 5662, 
'schmählen': 5663, 'schwärzts': 5664, 'zwinger': 5665, 'mauerhöhle': 5666, 'andachtsbild': 5667, 'mater': 5668, 'dolorosa': 5669, 'blumenkrüge': 5670, 'davor': 5671, 'krüge': 5672, 'schwert': 5673, 'sohnes': 5674, 'seufzer': 5675, 'schickst': 5676, 'wühlet': 5677, 'banget': 5678, 'verlanget': 5679, 'wehe': 5680, 'zerbricht': 5681, 'bethaut': 5682, 'frühen': 5683, 'hell': 5684, 'gelag': 5685, 'berühmen': 5686, 'gepriesen': 5687, 'vollem': 5688, 'verschwemmt': 5689, 'aufgestemmt': 5690, 'schwadroniren': 5691, 'streiche': 5692, 'lächelnd': 5693, 'kriege': 5694, 'trauten': 5695, 'gretel': 5696, 'schwester': 5697, 'reicht': 5698, 'kling': 5699, 'schrieen': 5700, 'zier': 5701, 'saßen': 5702, 'lober': 5703, 'auszuraufen': 5704, 'wänden': 5705, 'stichelreden': 5706, 'naserümpfen': 5707, 'schurke': 5708, 'beschimpfen': 5709, 'schuldner': 5710, 'zufallswörtchen': 5711, 'zusammenschmeißen': 5712, 'felle': 5713, 'sakristey': 5714, 'aufwärts': 5715, 'lämpchens': 5716, 'flämmert': 5717, 'schwächer': 5718, 'nächtig': 5719, 'kätzlein': 5720, 'schmächtig': 5721, 'feuerleitern': 5722, 'leis': 5723, 'tugendlich': 5724, 'diebsgelüst': 5725, 'rammeley': 5726, 'spukt': 5727, 'übermorgen': 5728, 'dorthinten': 5729, 'flimmern': 5730, 'kesselchen': 5731, 'herauszuheben': 5732, 'schielte': 5733, 'neulich': 5734, 'löwenthaler': 5735, 'perlenschnüren': 5736, 'kunststück': 5737, 'sing': 5738, 'moralisch': 5739, 'cathrinchen': 5740, 'frühem': 5741, 'tagesblicke': 5742, 'dinger': 5743, 'lockst': 5744, 'vermaledeyter': 5745, 'rattenfänger': 5746, 'instrument': 5747, 'sänger': 5748, 'schedelspalten': 5749, 'gewichen': 5750, 'flederwisch': 5751, 'zugestoßen': 5752, 'lahm': 5753, 'stoß': 5754, 'lümmel': 5755, 'mörderlich': 5756, 'polizey': 5757, 'blutbann': 5758, 'abzufinden': 5759, 'schilt': 5760, 'rauft': 5761, 'schreit': 5762, 'allmächtiger': 5763, 'sterbe': 5764, 'bälder': 5765, 'heult': 5766, 'klagt': 5767, 'gescheidt': 5768, 'fingst': 5769, 'mehre': 5770, 'dutzend': 5771, 'schleyer': 5772, 
'ermorden': 5773, 'häßlicher': 5774, 'tageslicht': 5775, 'brave': 5776, 'bürgersleut': 5777, 'angesteckten': 5778, 'leichen': 5779, 'metze': 5780, 'seitab': 5781, 'verzagen': 5782, 'spitzenkragen': 5783, 'wohlbehagen': 5784, 'finstre': 5785, 'jammerecken': 5786, 'krüpel': 5787, 'verstecken': 5788, 'vermaledeyt': 5789, 'befehlt': 5790, 'lästrung': 5791, 'kupplerisches': 5792, 'vergebung': 5793, 'höllenpein': 5794, 'sprachst': 5795, 'schwersten': 5796, 'herzensstoß': 5797, 'todesschlaf': 5798, 'stirbt': 5799, 'amt': 5800, 'vielem': 5801, 'tratst': 5802, 'vergriffnen': 5803, 'büchelchen': 5804, 'gebete': 5805, 'lalltest': 5806, 'kinderspiele': 5807, 'missethat': 5808, 'betst': 5809, 'hinüberschlief': 5810, 'quillend': 5811, 'ängstet': 5812, 'irae': 5813, 'illa': 5814, 'solvet': 5815, 'saeclum': 5816, 'favilla': 5817, 'orgelton': 5818, 'posaune': 5819, 'aschenruh': 5820, 'flammenqualen': 5821, 'aufgeschaffen': 5822, 'bebt': 5823, 'athem': 5824, 'versetzte': 5825, 'löste': 5826, 'judex': 5827, 'ergo': 5828, 'sedebit': 5829, 'quidquid': 5830, 'latet': 5831, 'adparebit': 5832, 'nil': 5833, 'inultum': 5834, 'remanebit': 5835, 'mauernpfeiler': 5836, 'verbirg': 5837, 'sünd': 5838, 'quem': 5839, 'patronum': 5840, 'rogaturus': 5841, 'vix': 5842, 'justus': 5843, 'sit': 5844, 'securus': 5845, 'verklärte': 5846, 'schauerts': 5847, 'ohnmacht': 5848, 'harzgebirg': 5849, 'gegend': 5850, 'schirke': 5851, 'verlangst': 5852, 'besenstiele': 5853, 'allerderbsten': 5854, 'beinen': 5855, 'genügt': 5856, 'knotenstock': 5857, 'verkürzt': 5858, 'labyrinth': 5859, 'hinzuschleichen': 5860, 'ersteigen': 5861, 'sprudelnd': 5862, 'würzt': 5863, 'frühling': 5864, 'birken': 5865, 'fichte': 5866, 'winterlich': 5867, 'frost': 5868, 'traurig': 5869, 'unvollkommne': 5870, 'scheibe': 5871, 'monds': 5872, 'später': 5873, 'rennt': 5874, 'erlaub': 5875, 'fodern': 5876, 'lodern': 5877, 'leichtes': 5878, 'naturell': 5879, 'zickzack': 5880, 'denkts': 5881, 'nachzuahmen': 5882, 'teufels': 5883, 'blas': 5884, 
'flackerleben': 5885, 'merke': 5886, 'zaubertoll': 5887, 'weisen': 5888, 'wechselgesang': 5889, 'zaubersphäre': 5890, 'eingegangen': 5891, 'öden': 5892, 'räumen': 5893, 'bäumen': 5894, 'klippen': 5895, 'bücken': 5896, 'felsennasen': 5897, 'schnarchen': 5898, 'blasen': 5899, 'steine': 5900, 'rasen': 5901, 'eilet': 5902, 'liebesklage': 5903, 'himmelstage': 5904, 'echo': 5905, 'hallet': 5906, 'uhu': 5907, 'kauz': 5908, 'kibitz': 5909, 'häher': 5910, 'molche': 5911, 'gesträuche': 5912, 'dicke': 5913, 'bäuche': 5914, 'schlangen': 5915, 'sande': 5916, 'belebten': 5917, 'derben': 5918, 'masern': 5919, 'steckenstrecken': 5920, 'polypenfasern': 5921, 'tausendfärbig': 5922, 'schaarenweise': 5923, 'funkenwürmer': 5924, 'schwärmezügen': 5925, 'verwirrenden': 5926, 'geleite': 5927, 'schneiden': 5928, 'lichter': 5929, 'mehren': 5930, 'blähen': 5931, 'mittelgipfel': 5932, 'seltsam': 5933, 'glimmert': 5934, 'gründe': 5935, 'morgenröthlich': 5936, 'abgrunds': 5937, 'wittert': 5938, 'dampf': 5939, 'schwaden': 5940, 'zarter': 5941, 'strecke': 5942, 'ecke': 5943, 'vereinzelt': 5944, 'sprühen': 5945, 'funken': 5946, 'ausgestreuter': 5947, 'felsenwand': 5948, 'feste': 5949, 'pallast': 5950, 'windsbraut': 5951, 'schlägen': 5952, 'felsens': 5953, 'rippen': 5954, 'gruft': 5955, 'verdichtet': 5956, 'wälder': 5957, 'kracht': 5958, 'aufgescheucht': 5959, 'eulen': 5960, 'splittern': 5961, 'säulen': 5962, 'grüner': 5963, 'palläste': 5964, 'girren': 5965, 'aeste': 5966, 'stämme': 5967, 'mächtiges': 5968, 'dröhnen': 5969, 'knarren': 5970, 'gähnen': 5971, 'fürchterlich': 5972, 'verworrenen': 5973, 'falle': 5974, 'krachen': 5975, 'übertrümmerten': 5976, 'klüfte': 5977, 'zischen': 5978, 'hörst': 5979, 'entlang': 5980, 'wüthender': 5981, 'zaubergesang': 5982, 'gelb': 5983, 'hauf': 5984, 'urian': 5985, 'f': 5986, 'tfarzt': 5987, 'st': 5988, 'tstinckt': 5989, 'reitet': 5990, 'mutterschwein': 5991, 'demdenn': 5992, 'gebürt': 5993, 'tüchtig': 5994, 'hexenhauf': 5995, 'uebern': 5996, 'ilsenstein': 5997, 
'eule': 5998, 'nest': 5999, 'fahre': 6000, 'reitst': 6001, 'geschunden': 6002, 'wunden': 6003, 'toller': 6004, 'kratzt': 6005, 'erstickt': 6006, 'platzt': 6007, 'hexenmeister': 6008, 'halbes': 6009, 'schleichen': 6010, 'schneck': 6011, 'hälfte': 6012, 'eilen': 6013, 'sprunge': 6014, 'felsensee': 6015, 'waschen': 6016, 'blank': 6017, 'unfruchtbar': 6018, 'schweigt': 6019, 'zauberchor': 6020, 'feuerfunken': 6021, 'felsenspalte': 6022, 'erreichen': 6023, 'verlorner': 6024, 'halbhexe': 6025, 'tripple': 6026, 'salbe': 6027, 'lumpen': 6028, 'segel': 6029, 'trog': 6030, 'flog': 6031, 'streichet': 6032, 'hexenheit': 6033, 'stößt': 6034, 'ruscht': 6035, 'klappert': 6036, 'zischt': 6037, 'plappert': 6038, 'hexenelement': 6039, 'getrennt': 6040, 'hingerissen': 6041, 'hausrecht': 6042, 'voland': 6043, 'satz': 6044, 'gedräng': 6045, 'entweichen': 6046, 'besondrem': 6047, 'sträuchen': 6048, 'schlupfen': 6049, 'widerspruchs': 6050, 'wandlen': 6051, 'beliebig': 6052, 'hieselbst': 6053, 'isoliren': 6054, 'muntrer': 6055, 'klub': 6056, 'wirbelrauch': 6057, 'knüpft': 6058, 'hausen': 6059, 'hergebracht': 6060, 'hexchen': 6061, 'blos': 6062, 'verhüllen': 6063, 'meinetwillen': 6064, 'geschnarr': 6065, 'gewöhnen': 6066, 'tret': 6067, 'sagst': 6068, 'schwazt': 6069, 'kocht': 6070, 'einzuführen': 6071, 'zaubrer': 6072, 'produziren': 6073, 'incognito': 6074, 'galatag': 6075, 'orden': 6076, 'knieband': 6077, 'zeichnet': 6078, 'schnecke': 6079, 'herangekrochen': 6080, 'tastenden': 6081, 'abgerochen': 6082, 'verläugn': 6083, 'werber': 6084, 'einigen': 6085, 'verglimmende': 6086, 'kohlen': 6087, 'mitte': 6088, 'fände': 6089, 'saus': 6090, 'umzirkt': 6091, 'jugendbraus': 6092, 'nationen': 6093, 'allzuweit': 6094, 'lobe': 6095, 'galten': 6096, 'parvenü': 6097, 'sollten': 6098, 'erhalten': 6099, 'wollten': 6100, 'autor': 6101, 'überhaupt': 6102, 'schrift': 6103, 'mäßig': 6104, 'klugem': 6105, 'inhalt': 6106, 'naseweis': 6107, 'jüngsten': 6108, 'gereift': 6109, 'hexenberg': 6110, 'ersteige': 6111, 
'fäßchen': 6112, 'trödelhexe': 6113, 'waaren': 6114, 'dahier': 6115, 'tüchtgen': 6116, 'gereicht': 6117, 'geflossen': 6118, 'kelch': 6119, 'gesunden': 6120, 'verzehrend': 6121, 'heißes': 6122, 'ergossen': 6123, 'verführt': 6124, 'schwerdt': 6125, 'gebrochen': 6126, 'hinterrücks': 6127, 'gegenmann': 6128, 'durchstochen': 6129, 'verleg': 6130, 'vergesse': 6131, 'heiß': 6132, 'schieben': 6133, 'geschoben': 6134, 'lilith': 6135, 'adams': 6136, 'prangt': 6137, 'erlangt': 6138, 'gesprungen': 6139, 'apfelbaum': 6140, 'aepfel': 6141, 'glänzten': 6142, 'reizten': 6143, 'stieg': 6144, 'aepfelchen': 6145, 'paradiese': 6146, 'wüsten': 6147, 'gespaltnen': 6148, 'ungeheures': 6149, 'biete': 6150, 'scheut': 6151, 'untersteht': 6152, 'bewiesen': 6153, 'ordentlichen': 6154, 'ball': 6155, 'tanzen': 6156, 'beschwätzen': 6157, 'ärgert': 6158, 'wolltet': 6159, 'hieß': 6160, 'begrüßen': 6161, 'aufgeklärt': 6162, 'teufelspack': 6163, 'regel': 6164, 'dennoch': 6165, 'spukts': 6166, 'tegel': 6167, 'hinausgekehrt': 6168, 'sags': 6169, 'geistesdespotismus': 6170, 'exerziren': 6171, 'fortgetanzt': 6172, 'nehm': 6173, 'bezwingen': 6174, 'pfütze': 6175, 'soulagirt': 6176, 'blutegel': 6177, 'steiß': 6178, 'ergötzen': 6179, 'kurirt': 6180, 'getreten': 6181, 'gesange': 6182, 'sprang': 6183, 'mäuschen': 6184, 'schäferstunde': 6185, 'mephisto': 6186, 'blasses': 6187, 'schiebt': 6188, 'geschloßnen': 6189, 'zauberbild': 6190, 'leblos': 6191, 'idol': 6192, 'erstarrt': 6193, 'verkehrt': 6194, 'meduse': 6195, 'geboten': 6196, 'verführter': 6197, 'sonderbar': 6198, 'schnürchen': 6199, 'schmücken': 6200, 'breiter': 6201, 'messerrücken': 6202, 'ebenfalls': 6203, 'perseus': 6204, 'hats': 6205, 'abgeschlagen': 6206, 'hügelchen': 6207, 'prater': 6208, 'servibilis': 6209, 'soviel': 6210, 'allhier': 6211, 'dilettant': 6212, 'spielens': 6213, 'verschwinde': 6214, 'dilettirts': 6215, 'vorhang': 6216, 'aufzuziehn': 6217, 'walpurgisnachtstraum': 6218, 'oberons': 6219, 'titanias': 6220, 'intermezzo': 6221, 
'theatermeister': 6222, 'ruhen': 6223, 'miedings': 6224, 'wackre': 6225, 'feuchtes': 6226, 'scene': 6227, 'herold': 6228, 'solln': 6229, 'funfzig': 6230, 'streit': 6231, 'zeigts': 6232, 'verbunden': 6233, 'queer': 6234, 'schleift': 6235, 'hinterher': 6236, 'fratzen': 6237, 'gatten': 6238, 'lernens': 6239, 'titania': 6240, 'schmollt': 6241, 'grillt': 6242, 'behende': 6243, 'tutti': 6244, 'fortissimo': 6245, 'anverwandten': 6246, 'solo': 6247, 'dudelsack': 6248, 'seifenblase': 6249, 'schneckeschnickeschnack': 6250, 'stumpfe': 6251, 'spinnenfuß': 6252, 'krötenbauch': 6253, 'flügelchen': 6254, 'wichtchen': 6255, 'thierchen': 6256, 'gedichtchen': 6257, 'sprung': 6258, 'honigthau': 6259, 'düfte': 6260, 'trippelst': 6261, 'maskeradenspott': 6262, 'orthodox': 6263, 'außer': 6264, 'griechenlands': 6265, 'nordischer': 6266, 'ergreife': 6267, 'skizzenweise': 6268, 'bereite': 6269, 'italiänschen': 6270, 'purist': 6271, 'geludert': 6272, 'gepudert': 6273, 'puder': 6274, 'graue': 6275, 'weibchen': 6276, 'derbes': 6277, 'leibchen': 6278, 'matrone': 6279, 'maulen': 6280, 'zart': 6281, 'verfaulen': 6282, 'umschwärmt': 6283, 'nackte': 6284, 'tacte': 6285, 'wünschen': 6286, 'lauter': 6287, 'bräute': 6288, 'junggesellen': 6289, 'hoffnungsvollsten': 6290, 'verschlingen': 6291, 'behendem': 6292, 'xenien': 6293, 'insekten': 6294, 'scharfen': 6295, 'scheren': 6296, 'papa': 6297, 'hennings': 6298, 'naiv': 6299, 'hätten': 6300, 'musaget': 6301, 'anzuführen': 6302, 'cidevant': 6303, 'genius': 6304, 'deutsche': 6305, 'parnaß': 6306, 'steife': 6307, 'schnopert': 6308, 'schnopern': 6309, 'klaren': 6310, 'fischen': 6311, 'mischen': 6312, 'weltkind': 6313, 'vehikel': 6314, 'conventikel': 6315, 'trommeln': 6316, 'unisonen': 6317, 'dommeln': 6318, 'dogmatiker': 6319, 'schreyn': 6320, 'critik': 6321, 'gäbs': 6322, 'idealist': 6323, 'herrisch': 6324, 'närrisch': 6325, 'realist': 6326, 'baß': 6327, 'stehe': 6328, 'erstenmal': 6329, 'supernaturalist': 6330, 'freue': 6331, 'skeptiker': 6332, 
'flämmchen': 6333, 'glaubn': 6334, 'reimt': 6335, 'gewandten': 6336, 'sanssouci': 6337, 'heer': 6338, 'geschöpfen': 6339, 'köpfen': 6340, 'unbehülflichen': 6341, 'bissen': 6342, 'erschranzt': 6343, 'befohlen': 6344, 'unsere': 6345, 'schuhe': 6346, 'durchgetanzt': 6347, 'nackten': 6348, 'sohlen': 6349, 'irrlichter': 6350, 'sumpfe': 6351, 'entstanden': 6352, 'glänzenden': 6353, 'galanten': 6354, 'sternschnuppe': 6355, 'schoß': 6356, 'feuerscheine': 6357, 'liege': 6358, 'massiven': 6359, 'ringsherum': 6360, 'gräschen': 6361, 'tretet': 6362, 'mastig': 6363, 'elephantenkälber': 6364, 'plumpst': 6365, 'derbe': 6366, 'rosenhügel': 6367, 'pianissimo': 6368, 'wolkenzug': 6369, 'nebelflor': 6370, 'erhellen': 6371, 'verzweifelnd': 6372, 'verirrt': 6373, 'missethäterinn': 6374, 'entsetzlichen': 6375, 'qualen': 6376, 'eingesperrt': 6377, 'unselige': 6378, 'verräthrischer': 6379, 'nichtswürdiger': 6380, 'verheimlicht': 6381, 'wälze': 6382, 'ingrimmend': 6383, 'trutze': 6384, 'unerträgliche': 6385, 'unwiederbringlichen': 6386, 'richtenden': 6387, 'gefühllosen': 6388, 'wiegst': 6389, 'abgeschmackten': 6390, 'zerstreuungen': 6391, 'verbirgst': 6392, 'wachsenden': 6393, 'hülflos': 6394, 'abscheuliches': 6395, 'unthier': 6396, 'unendlicher': 6397, 'hundsgestalt': 6398, 'nächtlicher': 6399, 'weiseweile': 6400, 'herzutrotten': 6401, 'harmlosen': 6402, 'kollern': 6403, 'niederstürzenden': 6404, 'schultern': 6405, 'wandl': 6406, 'lieblingsbildung': 6407, 'krieche': 6408, 'trete': 6409, 'verworfnen': 6410, 'menschenseele': 6411, 'elendes': 6412, 'versank': 6413, 'genugthat': 6414, 'windenden': 6415, 'todesnoth': 6416, 'verzeihenden': 6417, 'wühlt': 6418, 'einzigen': 6419, 'tausenden': 6420, 'gränze': 6421, 'witzes': 6422, 'überschnappt': 6423, 'gemeinschaft': 6424, 'durchführen': 6425, 'schwindel': 6426, 'drangen': 6427, 'fletsche': 6428, 'gefräßigen': 6429, 'zähne': 6430, 'eckelts': 6431, 'herrlicher': 6432, 'würdigtest': 6433, 'kennest': 6434, 'schandgesellen': 6435, 'schmieden': 6436, 
'letzt': 6437, 'endigst': 6438, 'gräßlichsten': 6439, 'jahrtausende': 6440, 'rächers': 6441, 'wild': 6442, 'greifst': 6443, 'donner': 6444, 'elenden': 6445, 'entgegnenden': 6446, 'zerschmettern': 6447, 'tyrannenart': 6448, 'verlegenheiten': 6449, 'bringe': 6450, 'aussetzest': 6451, 'wisse': 6452, 'blutschuld': 6453, 'erschlagenen': 6454, 'rächende': 6455, 'wiederkehrenden': 6456, 'mord': 6457, 'befrey': 6458, 'thürners': 6459, 'umnebeln': 6460, 'bemächtige': 6461, 'menschenhand': 6462, 'wache': 6463, 'zauberpferde': 6464, 'entführe': 6465, 'pferden': 6466, 'daher': 6467, 'brausend': 6468, 'rabenstein': 6469, 'neigen': 6470, 'hexenzunft': 6471, 'eisernen': 6472, 'thürchen': 6473, 'entwohnter': 6474, 'mauer': 6475, 'verbrechen': 6476, 'zauderst': 6477, 'fürchtest': 6478, 'zögert': 6479, 'gessen': 6480, 'schwesterlein': 6481, 'hub': 6482, 'bein': 6483, 'waldvögelein': 6484, 'aufschließend': 6485, 'ahndet': 6486, 'geliebte': 6487, 'klirren': 6488, 'rauscht': 6489, 'verbergend': 6490, 'bittrer': 6491, 'hinwälzend': 6492, 'wächter': 6493, 'schlafe': 6494, 'schreyen': 6495, 'knieen': 6496, 'holst': 6497, 'erbarme': 6498, 'zeitig': 6499, 'überstehen': 6500, 'herzt': 6501, 'kränken': 6502, 'mährchen': 6503, 'endigt': 6504, 'deuten': 6505, 'jammerknechtschaft': 6506, 'knien': 6507, 'anzurufen': 6508, 'stufen': 6509, 'furchtbarem': 6510, 'grimme': 6511, 'getöse': 6512, 'freundes': 6513, 'rufen': 6514, 'wehren': 6515, 'klappen': 6516, 'grimmigen': 6517, 'liebenden': 6518, 'kerkers': 6519, 'retten': 6520, 'erstenmale': 6521, 'heitere': 6522, 'fortstrebend': 6523, 'weilest': 6524, 'liebkosend': 6525, 'eilest': 6526, 'hasts': 6527, 'verlernt': 6528, 'halse': 6529, 'überdrang': 6530, 'küßtest': 6531, 'wolltest': 6532, 'küsse': 6533, 'küss': 6534, 'umfaßt': 6535, 'brachte': 6536, 'herze': 6537, 'tausendfacher': 6538, 'gewendet': 6539, 'scheust': 6540, 'befreyst': 6541, 'ertränkt': 6542, 'feucht': 6543, 'wische': 6544, 'stecke': 6545, 'vergangne': 6546, 'vergangen': 6547, 
'schmiegen': 6548, 'süßes': 6549, 'dahinaus': 6550, 'lauert': 6551, 'ruhebett': 6552, 'wolle': 6553, 'fliehn': 6554, 'betteln': 6555, 'bösem': 6556, 'gewissen': 6557, 'steg': 6558, 'planke': 6559, 'teich': 6560, 'zappelt': 6561, 'besinne': 6562, 'sizt': 6563, 'wackelt': 6564, 'kopfe': 6565, 'nickt': 6566, 'freuten': 6567, 'glückliche': 6568, 'hinweg': 6569, 'mörderisch': 6570, 'hochzeittag': 6571, 'warst': 6572, 'kranze': 6573, 'glocke': 6574, 'stäbchen': 6575, 'blutstuhl': 6576, 'entrückt': 6577, 'schärfe': 6578, 'zückt': 6579, 'unnützes': 6580, 'zaudern': 6581, 'plaudern': 6582, 'schaudern': 6583, 'schicke': 6584, 'gericht': 6585, 'lagert': 6586, 'bewahren': 6587, 'grauts': 6588, 'gerichtet': 6589, 'innen': 6590, 'verhallend': 6591, 'anmerkungen': 6592, 'transkription': 6593, 'elektronische': 6594, 'wurde': 6595, 'grundlage': 6596, 'erschienenen': 6597, 'erstausgabe': 6598, 'erstellt': 6599, 'strikt': 6600, 'korrekturen': 6601, 'späteren': 6602, 'druckausgaben': 6603, 'eckigen': 6604, 'klammern': 6605, 'originalbuch': 6606, 'frakturschrift': 6607, 'gedruckt': 6608, 'textauszeichnungen': 6609, 'folgendermaßen': 6610, 'ersezt': 6611, 'sperrung': 6612, 'gesperrter': 6613, 'antiquaschrift': 6614, 'antiquatext': 6615, 'transcribers': 6616, 'note': 6617, 'prepared': 6618, 'first': 6619, 'published': 6620, 'strictly': 6621, 'follows': 6622, 'corrections': 6623, 'later': 6624, 'denoted': 6625, 'square': 6626, 'brackets': 6627, 'book': 6628, 'fraktur': 6629, 'markedup': 6630, 'replaced': 6631, 'spacedout': 6632, 'spaced': 6633, 'johann': 6634, 'wolfgang': 6635, 'goethe': 6636, 'named': 6637, 'various': 6638, 'markus': 6639, 'brenner': 6640, 'proofreading': 6641, 'team': 6642, 'httpwwwpgdpnet': 6643, 'scans': 6644, 'material': 6645, 'klassik': 6646, 'stiftung': 6647, 'weimar': 6648, 'herzogin': 6649, 'anna': 6650, 'amalia': 6651, 'bibliothek': 6652, 'updated': 6653, 'replace': 6654, 'previous': 6655, 'renamed': 6656, 'special': 6657, 'apply': 6658, 'specific': 6659, 
'very': 6660, 'easy': 6661, 'creation': 6662, 'reports': 6663, 'performances': 6664, 'modified': 6665, 'given': 6666, 'practically': 6667, 'subject': 6668, 'especially': 6669, 'commercial': 6670, 'httpgutenbergorglicense': 6671, 'reading': 6672, 'indicate': 6673, 'understand': 6674, 'accept': 6675, 'trademarkcopyright': 6676, 'abide': 6677, 'cease': 6678, 'possession': 6679, 'whom': 6680, 'few': 6681, 'lot': 6682, 'follow': 6683, 'preserve': 6684, 'pglaf': 6685, 'compilation': 6686, 'claim': 6687, 'prevent': 6688, 'long': 6689, 'course': 6690, 'hope': 6691, 'sharing': 6692, 'easily': 6693, 'same': 6694, 'attached': 6695, 'when': 6696, 'share': 6697, 'others': 6698, 'place': 6699, 'govern': 6700, 'what': 6701, 'countries': 6702, 'constant': 6703, 'change': 6704, 'addition': 6705, 'downloading': 6706, 'makes': 6707, 'representations': 6708, 'appear': 6709, 'whenever': 6710, 'appears': 6711, 'accessed': 6712, 'displayed': 6713, 'performed': 6714, 'viewed': 6715, 'anywhere': 6716, 'almost': 6717, 'restrictions': 6718, 'whatsoever': 6719, 'reuse': 6720, 'derived': 6721, 'indicating': 6722, 'charges': 6723, 'appearing': 6724, 'either': 6725, 'imposed': 6726, 'linked': 6727, 'beginning': 6728, 'unlink': 6729, 'detach': 6730, 'remove': 6731, 'containing': 6732, 'display': 6733, 'perform': 6734, 'redistribute': 6735, 'convert': 6736, 'binary': 6737, 'compressed': 6738, 'nonproprietary': 6739, 'proprietary': 6740, 'word': 6741, 'processing': 6742, 'hypertext': 6743, 'however': 6744, 'version': 6745, 'expense': 6746, 'exporting': 6747, 'request': 6748, 'alternate': 6749, 'include': 6750, 'viewing': 6751, 'reasonable': 6752, 'pay': 6753, 'gross': 6754, 'profits': 6755, 'derive': 6756, 'calculated': 6757, 'method': 6758, 'already': 6759, 'calculate': 6760, 'taxes': 6761, 'owed': 6762, 'agreed': 6763, 'each': 6764, 'legally': 6765, 'required': 6766, 'periodic': 6767, 'returns': 6768, 'clearly': 6769, 'sent': 6770, 'address': 6771, 'notifies': 6772, 'she': 6773, 'require': 6774, 
'possessed': 6775, 'discontinue': 6776, 'discovered': 6777, 'reported': 6778, 'wish': 6779, 'group': 6780, 'different': 6781, 'expend': 6782, 'identify': 6783, 'transcribe': 6784, 'proofread': 6785, 'despite': 6786, 'stored': 6787, 'defects': 6788, 'incomplete': 6789, 'inaccurate': 6790, 'corrupt': 6791, 'data': 6792, 'transcription': 6793, 'errors': 6794, 'infringement': 6795, 'damaged': 6796, 'disk': 6797, 'virus': 6798, 'codes': 6799, 'described': 6800, 'party': 6801, 'disclaim': 6802, 'remedies': 6803, 'negligence': 6804, 'strict': 6805, 'contract': 6806, 'those': 6807, 'distributor': 6808, 'liable': 6809, 'actual': 6810, 'direct': 6811, 'indirect': 6812, 'consequential': 6813, 'punitive': 6814, 'incidental': 6815, 'possibility': 6816, 'discover': 6817, 'receiving': 6818, 'sending': 6819, 'elect': 6820, 'choose': 6821, 'opportunity': 6822, 'demand': 6823, 'further': 6824, 'opportunities': 6825, 'fix': 6826, 'problem': 6827, 'asis': 6828, 'express': 6829, 'merchantibility': 6830, 'fitness': 6831, 'some': 6832, 'allow': 6833, 'disclaimers': 6834, 'exclusion': 6835, 'types': 6836, 'violates': 6837, 'interpreted': 6838, 'maximum': 6839, 'invalidity': 6840, 'unenforceability': 6841, 'provision': 6842, 'void': 6843, 'remaining': 6844, 'provisions': 6845, 'indemnity': 6846, 'indemnify': 6847, 'agent': 6848, 'employee': 6849, 'production': 6850, 'promotion': 6851, 'harmless': 6852, 'arise': 6853, 'directly': 6854, 'indirectly': 6855, 'occur': 6856, 'alteration': 6857, 'modification': 6858, 'additions': 6859, 'deletions': 6860, 'c': 6861, 'synonymous': 6862, 'variety': 6863, 'obsolete': 6864, 'middleaged': 6865, 'exists': 6866, 'because': 6867, 'hundreds': 6868, 'walks': 6869, 'life': 6870, 'financial': 6871, 'assistance': 6872, 'need': 6873, 'critical': 6874, 'reaching': 6875, 'gutenbergtms': 6876, 'goals': 6877, 'ensuring': 6878, 'remain': 6879, 'come': 6880, 'secure': 6881, 'permanent': 6882, 'learn': 6883, 'more': 6884, 'sections': 6885, 'httpwwwpglaforg': 6886, 
'non': 6887, 'profit': 6888, 'educational': 6889, 'corporation': 6890, 'organized': 6891, 'mississippi': 6892, 'granted': 6893, 'internal': 6894, 'revenue': 6895, 'service': 6896, 'identification': 6897, 'letter': 6898, 'httppglaforgfundraising': 6899, 'deductible': 6900, 'extent': 6901, 'principal': 6902, 'melan': 6903, 'fairbanks': 6904, 'ak': 6905, 'scattered': 6906, 'throughout': 6907, 'numerous': 6908, 'business': 6909, 'north': 6910, 'salt': 6911, 'lake': 6912, 'city': 6913, 'ut': 6914, 'businesspglaforg': 6915, 'gregory': 6916, 'newby': 6917, 'chief': 6918, 'executive': 6919, 'gbnewbypglaforg': 6920, 'depends': 6921, 'survive': 6922, 'wide': 6923, 'spread': 6924, 'carry': 6925, 'increasing': 6926, 'licensed': 6927, 'machine': 6928, 'accessible': 6929, 'array': 6930, 'outdated': 6931, 'particularly': 6932, 'important': 6933, 'maintaining': 6934, 'irs': 6935, 'committed': 6936, 'regulating': 6937, 'charities': 6938, 'charitable': 6939, 'uniform': 6940, 'takes': 6941, 'much': 6942, 'paperwork': 6943, 'meet': 6944, 'confirmation': 6945, 'send': 6946, 'determine': 6947, 'while': 6948, 'met': 6949, 'solicitation': 6950, 'know': 6951, 'prohibition': 6952, 'against': 6953, 'accepting': 6954, 'unsolicited': 6955, 'donors': 6956, 'approach': 6957, 'offers': 6958, 'international': 6959, 'gratefully': 6960, 'statements': 6961, 'treatment': 6962, 'alone': 6963, 'swamp': 6964, 'staff': 6965, 'pages': 6966, 'current': 6967, 'donation': 6968, 'methods': 6969, 'addresses': 6970, 'ways': 6971, 'checks': 6972, 'credit': 6973, 'card': 6974, 'httppglaforgdonate': 6975, 'professor': 6976, 'originator': 6977, 'library': 6978, 'could': 6979, 'shared': 6980, 'thirty': 6981, 'years': 6982, 'loose': 6983, 'network': 6984, 'volunteer': 6985, 'often': 6986, 'several': 6987, 'confirmed': 6988, 'thus': 6989, 'necessarily': 6990, 'paper': 6991, 'main': 6992, 'pg': 6993, 'search': 6994, 'facility': 6995, 'walten': 6996, 'httpwwwgutenbergorg': 6997, 'mögt': 6998, 'includes': 6999, 'geneigt': 
7000, 'produce': 7001, 'gezeigt': 7002, 'subscribe': 7003, 'newsletter': 7004, 'hear': 7005}\n"
     ]
    }
   ],
   "source": [
    "print(tokenizer.word_index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access the mapping of words to integers as a dictionary attribute called `word_index` on the \n",
    "Tokenizer object.\n",
    "\n",
    "We need to know the size of the vocabulary to define the embedding layer later. We can determine \n",
    "it by taking the length of the mapping dictionary.\n",
    "\n",
    "Words are assigned integer values from 1 to the total number of words (here 7005); the Keras Tokenizer \n",
    "reserves index 0 and never assigns it to a word. The Embedding layer allocates one vector per index, \n",
    "and because arrays are zero-indexed, an array that reaches index 7005 must have length \n",
    "7005 + 1 = 7006.\n",
    "\n",
    "Therefore, when specifying the vocabulary size to the Embedding layer, we specify it as 1 \n",
    "larger than the actual vocabulary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7006\n"
     ]
    }
   ],
   "source": [
    "# vocabulary size\n",
    "vocab_size = len(tokenizer.word_index) + 1\n",
    "print(vocab_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sequence Inputs and Output\n",
    "\n",
    "Now that we have encoded the input sequences, we need to separate them into input (X) and \n",
    "output (y) elements.\n",
    "\n",
    "We can do this with array slicing.\n",
    "\n",
    "After separating, we need to one-hot-encode the output word. This means converting it from \n",
    "an integer to a vector with one entry for each word in the vocabulary: every entry is 0 except \n",
    "for a 1 at the index given by the word's integer value.\n",
    "\n",
    "This way the model learns to predict a probability distribution over the next word, and \n",
    "the ground truth it learns from is 0 for all words except the actual word that comes \n",
    "next.\n",
    "\n",
    "Keras provides the `to_categorical()` function, which can be used to one-hot-encode the output word of each \n",
    "input-output sequence pair.\n",
    "\n",
    "Finally, we need to tell the Embedding layer how long the input sequences are. We know that there \n",
    "are 50 words because we built the sequences that way, but a good generic way to specify the length is to use the \n",
    "second dimension (number of columns) of the input data’s shape. That way, if you change the length \n",
    "of sequences when preparing data, you do not need to change this data loading code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "50\n",
      "[  21 2606   33  103 2605 1639    3  654   19  944   27 2604  356 7002\n",
      " 2602    2   61   33 2601  389    8  483  355    2   44  134   51  564\n",
      "  782 7000   21  321   33    8   47  102   14 6998   21 6996   25   21\n",
      "   55 2600    1  652   69   22  482   44]\n",
      "[0. 0. 0. ... 0. 0. 0.]\n",
      "(33549, 7006)\n"
     ]
    }
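   ],
   "source": [
    "# separate into input (all but the last word) and output (the last word)\n",
    "from tensorflow.keras.utils import to_categorical\n",
    "sequences = np.array(sequences)\n",
    "X, y = sequences[:,:-1], sequences[:,-1]\n",
    "# one-hot-encode the target word over the whole vocabulary\n",
    "y = to_categorical(y, num_classes=vocab_size)\n",
    "# the sequence length is the number of columns of X\n",
    "seq_length = X.shape[1]\n",
    "print(seq_length)\n",
    "print(X[0])\n",
    "print(y[0])\n",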
    "print(y.shape)"
   ]
  },
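  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rough estimate of what the one-hot encoding costs in memory, as a back-of-the-envelope sketch. It assumes that `to_categorical()` returns 4-byte `float32` values (its default `dtype`) and uses the shape `(33549, 7006)` printed above:\n",
    "\n",
    "$$33549 \\times 7006 \\times 4 \\text{ bytes} = 940177176 \\text{ bytes} \\approx 0.94 \\text{ GB}$$\n",
    "\n",
    "For a corpus of this size that still fits into memory on most machines, but the cost grows with both the number of sequences and the vocabulary size."
   ]
  },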
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fit Model\n",
    "\n",
    "We can now define and fit our language model on the training data.\n",
    "\n",
    "The learned embedding needs to know the size of the vocabulary and the \n",
    "length of input sequences, as previously discussed. It also has a parameter \n",
    "that specifies how many dimensions are used to represent each word, that is, \n",
    "the size of the embedding vector space.\n",
    "\n",
    "Common values are 50, 100, and 300. We will use 50 here, but consider testing smaller or larger values.\n",
    "\n",
    "We will use two LSTM hidden layers with 100 memory units each. Setting `return_sequences=True` on the first LSTM layer \n",
    "makes it output its 100-dimensional hidden state at every position, and this sequence in turn serves as the input to the second LSTM layer. \n",
    "More memory cells and a deeper network may achieve better results.\n",
    "\n",
    "A fully connected (dense) layer with 100 neurons takes the final output of the second LSTM layer and interprets the features extracted from the sequence. The output layer predicts the next word as a single vector the size of the vocabulary, with a probability for each word. A softmax activation function ensures that the outputs form a normalized probability distribution."
   ]
  },
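  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define model\n",
    "model = Sequential()\n",
    "# learned 50-dimensional embedding for each of the vocab_size word indices\n",
    "model.add(Embedding(vocab_size, 50, input_length=seq_length))\n",
    "# first LSTM returns its hidden state at every position for the next LSTM\n",
    "model.add(LSTM(100, return_sequences=True))\n",
    "# second LSTM returns only its final hidden state\n",
    "model.add(LSTM(100))\n",
    "model.add(Dense(100, activation='relu'))\n",
    "# softmax over the vocabulary gives the next-word probabilities\n",
    "model.add(Dense(vocab_size, activation='softmax'))"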
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A summary of the defined network is printed as a sanity check to ensure we have constructed what we intended."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"sequential_1\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "embedding_1 (Embedding)      (None, 50, 50)            350300    \n",
      "_________________________________________________________________\n",
      "lstm_2 (LSTM)                (None, 50, 100)           60400     \n",
      "_________________________________________________________________\n",
      "lstm_3 (LSTM)                (None, 100)               80400     \n",
      "_________________________________________________________________\n",
      "dense_2 (Dense)              (None, 100)               10100     \n",
      "_________________________________________________________________\n",
      "dense_3 (Dense)              (None, 7006)              707606    \n",
      "=================================================================\n",
      "Total params: 1,208,806\n",
      "Trainable params: 1,208,806\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "print(model.summary())"
   ]
  },
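  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check, the parameter counts in the summary above can be reproduced by hand. This is a sketch that assumes the standard Keras layer definitions: an LSTM layer with $u$ units and $d$-dimensional input has four gates, each with $u(d + u)$ weights and $u$ biases, and a Dense layer with $n$ inputs and $m$ outputs has $nm + m$ parameters.\n",
    "\n",
    "$$\\text{Embedding: } 7006 \\times 50 = 350300$$\n",
    "\n",
    "$$\\text{First LSTM: } 4 \\times (100 \\times (50 + 100) + 100) = 60400$$\n",
    "\n",
    "$$\\text{Second LSTM: } 4 \\times (100 \\times (100 + 100) + 100) = 80400$$\n",
    "\n",
    "$$\\text{Dense: } 100 \\times 100 + 100 = 10100$$\n",
    "\n",
    "$$\\text{Output: } 100 \\times 7006 + 7006 = 707606$$\n",
    "\n",
    "$$\\text{Total: } 350300 + 60400 + 80400 + 10100 + 707606 = 1208806$$\n",
    "\n",
    "More than half of the parameters sit in the output layer, because it needs one weight vector per vocabulary entry."
   ]
  },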
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, the model is compiled with the categorical cross-entropy loss. Technically, the model is learning a multi-class classification, and this is the suitable loss function for this type of problem. The efficient Adam variant of mini-batch gradient descent is used as the optimizer, and the accuracy of the model is reported during training.\n",
    "\n",
    "Finally, the model is fit on the data for up to 200 training epochs with a modest batch size of 128 to speed things up.\n",
    "\n",
    "Training may take a few hours on modern hardware without GPUs. You can speed it up with a larger batch size and/or fewer training epochs."
   ]
  },
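  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put the loss values into perspective, here is a small worked example. It assumes the standard definition of categorical cross-entropy with the natural logarithm. For a one-hot target $y$ and predicted probabilities $\\hat{y}$ over the $V = 7006$ vocabulary entries, the loss for a single example is\n",
    "\n",
    "$$L = -\\sum_{i=0}^{V-1} y_i \\log \\hat{y}_i = -\\log \\hat{y}_{\\text{next}}$$\n",
    "\n",
    "where $\\hat{y}_{\\text{next}}$ is the probability the model assigns to the word that actually comes next. A model that spreads its probability uniformly over the vocabulary would score\n",
    "\n",
    "$$-\\log \\frac{1}{7006} = \\ln 7006 \\approx 8.85$$\n",
    "\n",
    "so a training loss below this value indicates that the model does better than uniform guessing."
   ]
  },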
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 33549 samples\n",
      "Epoch 1/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 7.6015 - accuracy: 0.0256WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 53s 2ms/sample - loss: 7.6019 - accuracy: 0.0256\n",
      "Epoch 2/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 7.1517 - accuracy: 0.0273WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 7.1512 - accuracy: 0.0273\n",
      "Epoch 3/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 7.0093 - accuracy: 0.0265WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 7.0095 - accuracy: 0.0265\n",
      "Epoch 4/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.8323 - accuracy: 0.0302WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 6.8321 - accuracy: 0.0302\n",
      "Epoch 5/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.6149 - accuracy: 0.0361WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 6.6146 - accuracy: 0.0361\n",
      "Epoch 6/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.4739 - accuracy: 0.0386WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 6.4739 - accuracy: 0.0386\n",
      "Epoch 7/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.3614 - accuracy: 0.0415WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 6.3614 - accuracy: 0.0416\n",
      "Epoch 8/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.2550 - accuracy: 0.0440WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 6.2547 - accuracy: 0.0439\n",
      "Epoch 9/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.1668 - accuracy: 0.0447WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 6.1670 - accuracy: 0.0447\n",
      "Epoch 10/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.0883 - accuracy: 0.0478WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 6.0882 - accuracy: 0.0478\n",
      "Epoch 11/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 6.0216 - accuracy: 0.0502WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 6.0221 - accuracy: 0.0502\n",
      "Epoch 12/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.9525 - accuracy: 0.0515WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.9524 - accuracy: 0.0515\n",
      "Epoch 13/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.8864 - accuracy: 0.0522WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.8869 - accuracy: 0.0522\n",
      "Epoch 14/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.8199 - accuracy: 0.0543WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 5.8197 - accuracy: 0.0544\n",
      "Epoch 15/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.7495 - accuracy: 0.0550WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.7498 - accuracy: 0.0550\n",
      "Epoch 16/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.6753 - accuracy: 0.0578WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.6755 - accuracy: 0.0578\n",
      "Epoch 17/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.5986 - accuracy: 0.0590WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 5.5989 - accuracy: 0.0590\n",
      "Epoch 18/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.5267 - accuracy: 0.0613WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.5264 - accuracy: 0.0613\n",
      "Epoch 19/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.4496 - accuracy: 0.0628WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.4498 - accuracy: 0.0628\n",
      "Epoch 20/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.3814 - accuracy: 0.0650WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.3812 - accuracy: 0.0650\n",
      "Epoch 21/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.3068 - accuracy: 0.0672WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 5.3069 - accuracy: 0.0672\n",
      "Epoch 22/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.2312 - accuracy: 0.0700WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 5.2310 - accuracy: 0.0700\n",
      "Epoch 23/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.1572 - accuracy: 0.0718WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.1572 - accuracy: 0.0718\n",
      "Epoch 24/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.0897 - accuracy: 0.0745WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 5.0898 - accuracy: 0.0745\n",
      "Epoch 25/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 5.0156 - accuracy: 0.0769WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 5.0158 - accuracy: 0.0769\n",
      "Epoch 26/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.9431 - accuracy: 0.0803WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.9432 - accuracy: 0.0802\n",
      "Epoch 27/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.8671 - accuracy: 0.0843WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.8673 - accuracy: 0.0843\n",
      "Epoch 28/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.7983 - accuracy: 0.0873WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.7986 - accuracy: 0.0873\n",
      "Epoch 29/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.7307 - accuracy: 0.0912WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 4.7307 - accuracy: 0.0912\n",
      "Epoch 30/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.6548 - accuracy: 0.0961WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 4.6546 - accuracy: 0.0962\n",
      "Epoch 31/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.5846 - accuracy: 0.1013WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.5846 - accuracy: 0.1013\n",
      "Epoch 32/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.5059 - accuracy: 0.1088WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 4.5060 - accuracy: 0.1088\n",
      "Epoch 33/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.4415 - accuracy: 0.1146WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.4416 - accuracy: 0.1146\n",
      "Epoch 34/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.3711 - accuracy: 0.1222WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.3711 - accuracy: 0.1222\n",
      "Epoch 35/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.3028 - accuracy: 0.1302WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.3029 - accuracy: 0.1302\n",
      "Epoch 36/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.2402 - accuracy: 0.1380WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 4.2402 - accuracy: 0.1379\n",
      "Epoch 37/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.1797 - accuracy: 0.1457WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 4.1796 - accuracy: 0.1457\n",
      "Epoch 38/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.1106 - accuracy: 0.1569WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.1106 - accuracy: 0.1570\n",
      "Epoch 39/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 4.0483 - accuracy: 0.1647WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 4.0485 - accuracy: 0.1647\n",
      "Epoch 40/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.9894 - accuracy: 0.1724WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.9895 - accuracy: 0.1724\n",
      "Epoch 41/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.9309 - accuracy: 0.1811WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.9309 - accuracy: 0.1811\n",
      "Epoch 42/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.8759 - accuracy: 0.1899WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.8759 - accuracy: 0.1899\n",
      "Epoch 43/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.8169 - accuracy: 0.1981WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.8172 - accuracy: 0.1980\n",
      "Epoch 44/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.7583 - accuracy: 0.2088WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.7582 - accuracy: 0.2088\n",
      "Epoch 45/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.7058 - accuracy: 0.2167WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.7060 - accuracy: 0.2167\n",
      "Epoch 46/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.6524 - accuracy: 0.2301WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.6523 - accuracy: 0.2301\n",
      "Epoch 47/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.6044 - accuracy: 0.2360WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.6045 - accuracy: 0.2360\n",
      "Epoch 48/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.5476 - accuracy: 0.2443WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.5478 - accuracy: 0.2443\n",
      "Epoch 49/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.5011 - accuracy: 0.2523WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.5014 - accuracy: 0.2523\n",
      "Epoch 50/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.4601 - accuracy: 0.2603WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.4600 - accuracy: 0.2602\n",
      "Epoch 51/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.4058 - accuracy: 0.2702WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.4059 - accuracy: 0.2702\n",
      "Epoch 52/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.3570 - accuracy: 0.2793WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.3572 - accuracy: 0.2793\n",
      "Epoch 53/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.3133 - accuracy: 0.2850WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.3131 - accuracy: 0.2850\n",
      "Epoch 54/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.2754 - accuracy: 0.2924WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.2753 - accuracy: 0.2924\n",
      "Epoch 55/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.2363 - accuracy: 0.3007WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.2361 - accuracy: 0.3008\n",
      "Epoch 56/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.1929 - accuracy: 0.3083WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.1927 - accuracy: 0.3084\n",
      "Epoch 57/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.1515 - accuracy: 0.3143WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.1519 - accuracy: 0.3143\n",
      "Epoch 58/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.1186 - accuracy: 0.3206WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.1188 - accuracy: 0.3206\n",
      "Epoch 59/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.0711 - accuracy: 0.3320WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.0712 - accuracy: 0.3320\n",
      "Epoch 60/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.0397 - accuracy: 0.3364WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 3.0399 - accuracy: 0.3364\n",
      "Epoch 61/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 3.0060 - accuracy: 0.3419WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 3.0060 - accuracy: 0.3419\n",
      "Epoch 62/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.9942 - accuracy: 0.3403WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.9945 - accuracy: 0.3402\n",
      "Epoch 63/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.9397 - accuracy: 0.3525WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.9392 - accuracy: 0.3526\n",
      "Epoch 64/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.8967 - accuracy: 0.3592WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.8966 - accuracy: 0.3592\n",
      "Epoch 65/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.8706 - accuracy: 0.3664WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.8707 - accuracy: 0.3663\n",
      "Epoch 66/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.8385 - accuracy: 0.3706WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.8387 - accuracy: 0.3705\n",
      "Epoch 67/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.8082 - accuracy: 0.3757WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.8081 - accuracy: 0.3757\n",
      "Epoch 68/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.7849 - accuracy: 0.3771WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.7850 - accuracy: 0.3771\n",
      "Epoch 69/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.7496 - accuracy: 0.3854WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.7495 - accuracy: 0.3854\n",
      "Epoch 70/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.7183 - accuracy: 0.3910WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.7185 - accuracy: 0.3910\n",
      "Epoch 71/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.6926 - accuracy: 0.3980WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.6927 - accuracy: 0.3980\n",
      "Epoch 72/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.6641 - accuracy: 0.4029WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.6643 - accuracy: 0.4028\n",
      "Epoch 73/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.6417 - accuracy: 0.4046WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.6418 - accuracy: 0.4046\n",
      "Epoch 74/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.6221 - accuracy: 0.4098WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.6222 - accuracy: 0.4098\n",
      "Epoch 75/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.5986 - accuracy: 0.4140WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.5987 - accuracy: 0.4141\n",
      "Epoch 76/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.5569 - accuracy: 0.4222WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.5570 - accuracy: 0.4221\n",
      "Epoch 77/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.5347 - accuracy: 0.4278WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.5349 - accuracy: 0.4278\n",
      "Epoch 78/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.5190 - accuracy: 0.4299WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.5192 - accuracy: 0.4299\n",
      "Epoch 79/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.5160 - accuracy: 0.4271WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.5157 - accuracy: 0.4272\n",
      "Epoch 80/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.4712 - accuracy: 0.4372WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.4711 - accuracy: 0.4372\n",
      "Epoch 81/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.4473 - accuracy: 0.4414WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 2.4472 - accuracy: 0.4414\n",
      "Epoch 82/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.4155 - accuracy: 0.4514WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.4154 - accuracy: 0.4515\n",
      "Epoch 83/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3977 - accuracy: 0.4535WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.3975 - accuracy: 0.4535\n",
      "Epoch 84/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3798 - accuracy: 0.4548WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.3797 - accuracy: 0.4549\n",
      "Epoch 85/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3571 - accuracy: 0.4598WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.3570 - accuracy: 0.4598\n",
      "Epoch 86/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3404 - accuracy: 0.4629WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.3407 - accuracy: 0.4628\n",
      "Epoch 87/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3224 - accuracy: 0.4678WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.3225 - accuracy: 0.4678\n",
      "Epoch 88/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.3009 - accuracy: 0.4702WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.3010 - accuracy: 0.4701\n",
      "Epoch 89/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.2780 - accuracy: 0.4740WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.2782 - accuracy: 0.4739\n",
      "Epoch 90/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.2663 - accuracy: 0.4786WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.2663 - accuracy: 0.4786\n",
      "Epoch 91/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.2450 - accuracy: 0.4796WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.2452 - accuracy: 0.4796\n",
      "Epoch 92/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.2341 - accuracy: 0.4837WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.2344 - accuracy: 0.4837\n",
      "Epoch 93/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.2122 - accuracy: 0.4875WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.2124 - accuracy: 0.4874\n",
      "Epoch 94/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.1793 - accuracy: 0.4960WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.1793 - accuracy: 0.4960\n",
      "Epoch 95/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.1565 - accuracy: 0.5007WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.1566 - accuracy: 0.5006\n",
      "Epoch 96/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.1420 - accuracy: 0.5010WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.1420 - accuracy: 0.5010\n",
      "Epoch 97/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.1189 - accuracy: 0.5081WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.1189 - accuracy: 0.5081\n",
      "Epoch 98/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.1183 - accuracy: 0.5060WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.1186 - accuracy: 0.5060\n",
      "Epoch 99/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.0925 - accuracy: 0.5143WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 2.0924 - accuracy: 0.5143\n",
      "Epoch 100/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.0765 - accuracy: 0.5176WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.0764 - accuracy: 0.5177\n",
      "Epoch 101/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.0518 - accuracy: 0.5211WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.0518 - accuracy: 0.5211\n",
      "Epoch 102/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.0427 - accuracy: 0.5235WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.0425 - accuracy: 0.5235\n",
      "Epoch 103/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 2.0103 - accuracy: 0.5310WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 2.0104 - accuracy: 0.5309\n",
      "Epoch 104/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9949 - accuracy: 0.5353WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9947 - accuracy: 0.5354\n",
      "Epoch 105/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9812 - accuracy: 0.5350WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9813 - accuracy: 0.5350\n",
      "Epoch 106/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9673 - accuracy: 0.5397WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9677 - accuracy: 0.5396\n",
      "Epoch 107/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9609 - accuracy: 0.5386WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9611 - accuracy: 0.5386\n",
      "Epoch 108/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9310 - accuracy: 0.5461WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9313 - accuracy: 0.5460\n",
      "Epoch 109/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.9283 - accuracy: 0.5480WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.9284 - accuracy: 0.5479\n",
      "Epoch 110/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8997 - accuracy: 0.5525WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8997 - accuracy: 0.5525\n",
      "Epoch 111/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8885 - accuracy: 0.5559WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8887 - accuracy: 0.5559\n",
      "Epoch 112/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8712 - accuracy: 0.5583WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8712 - accuracy: 0.5583\n",
      "Epoch 113/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8687 - accuracy: 0.5610WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8686 - accuracy: 0.5611\n",
      "Epoch 114/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8495 - accuracy: 0.5645WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8495 - accuracy: 0.5645\n",
      "Epoch 115/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8277 - accuracy: 0.5694WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8275 - accuracy: 0.5695\n",
      "Epoch 116/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.8087 - accuracy: 0.5715WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.8084 - accuracy: 0.5716\n",
      "Epoch 117/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7855 - accuracy: 0.5787WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.7852 - accuracy: 0.5788\n",
      "Epoch 118/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7622 - accuracy: 0.5824WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.7620 - accuracy: 0.5823\n",
      "Epoch 119/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7621 - accuracy: 0.5810WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.7619 - accuracy: 0.5810\n",
      "Epoch 120/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7591 - accuracy: 0.5838WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.7592 - accuracy: 0.5837\n",
      "Epoch 121/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7241 - accuracy: 0.5911WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.7242 - accuracy: 0.5910\n",
      "Epoch 122/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7056 - accuracy: 0.5967WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.7059 - accuracy: 0.5966\n",
      "Epoch 123/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.7064 - accuracy: 0.5963WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.7064 - accuracy: 0.5963\n",
      "Epoch 124/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6958 - accuracy: 0.5958WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.6960 - accuracy: 0.5957\n",
      "Epoch 125/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6797 - accuracy: 0.6000WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.6800 - accuracy: 0.6000\n",
      "Epoch 126/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6657 - accuracy: 0.6036WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.6660 - accuracy: 0.6036\n",
      "Epoch 127/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6428 - accuracy: 0.6090WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.6426 - accuracy: 0.6091\n",
      "Epoch 128/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6215 - accuracy: 0.6142WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.6215 - accuracy: 0.6142\n",
      "Epoch 129/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6349 - accuracy: 0.6091WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.6348 - accuracy: 0.6091\n",
      "Epoch 130/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.6177 - accuracy: 0.6121WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.6178 - accuracy: 0.6120\n",
      "Epoch 131/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5900 - accuracy: 0.6198WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5900 - accuracy: 0.6198\n",
      "Epoch 132/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5646 - accuracy: 0.6270WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5643 - accuracy: 0.6271\n",
      "Epoch 133/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5621 - accuracy: 0.6268WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5622 - accuracy: 0.6268\n",
      "Epoch 134/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5661 - accuracy: 0.6259WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5663 - accuracy: 0.6258\n",
      "Epoch 135/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5490 - accuracy: 0.6277WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.5489 - accuracy: 0.6277\n",
      "Epoch 136/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5199 - accuracy: 0.6350WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5198 - accuracy: 0.6350\n",
      "Epoch 137/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.5261 - accuracy: 0.6338WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.5261 - accuracy: 0.6338\n",
      "Epoch 138/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4908 - accuracy: 0.6413WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.4905 - accuracy: 0.6414\n",
      "Epoch 139/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4639 - accuracy: 0.6512WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.4637 - accuracy: 0.6513\n",
      "Epoch 140/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4618 - accuracy: 0.6485WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.4618 - accuracy: 0.6485\n",
      "Epoch 141/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4472 - accuracy: 0.6539WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.4471 - accuracy: 0.6539\n",
      "Epoch 142/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4373 - accuracy: 0.6557WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.4372 - accuracy: 0.6557\n",
      "Epoch 143/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4504 - accuracy: 0.6492WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.4505 - accuracy: 0.6492\n",
      "Epoch 144/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4328 - accuracy: 0.6540WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.4330 - accuracy: 0.6539\n",
      "Epoch 145/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.4204 - accuracy: 0.6559WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.4203 - accuracy: 0.6559\n",
      "Epoch 146/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3985 - accuracy: 0.6629WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3986 - accuracy: 0.6629\n",
      "Epoch 147/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3763 - accuracy: 0.6692WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3763 - accuracy: 0.6692\n",
      "Epoch 148/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3575 - accuracy: 0.6757WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3575 - accuracy: 0.6757\n",
      "Epoch 149/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3612 - accuracy: 0.6738WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3615 - accuracy: 0.6737\n",
      "Epoch 150/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3751 - accuracy: 0.6689WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3752 - accuracy: 0.6689\n",
      "Epoch 151/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3438 - accuracy: 0.6749WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3441 - accuracy: 0.6747\n",
      "Epoch 152/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3398 - accuracy: 0.6752WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3397 - accuracy: 0.6752\n",
      "Epoch 153/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3187 - accuracy: 0.6811WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.3185 - accuracy: 0.6812\n",
      "Epoch 154/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.3064 - accuracy: 0.6835WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.3065 - accuracy: 0.6835\n",
      "Epoch 155/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2743 - accuracy: 0.6935WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 49s 1ms/sample - loss: 1.2742 - accuracy: 0.6935\n",
      "Epoch 156/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2659 - accuracy: 0.6971WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.2661 - accuracy: 0.6971\n",
      "Epoch 157/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2636 - accuracy: 0.6966WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2634 - accuracy: 0.6966\n",
      "Epoch 158/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2448 - accuracy: 0.6991WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2450 - accuracy: 0.6991\n",
      "Epoch 159/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2827 - accuracy: 0.6882WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2828 - accuracy: 0.6882\n",
      "Epoch 160/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2369 - accuracy: 0.6989WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2370 - accuracy: 0.6989\n",
      "Epoch 161/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2147 - accuracy: 0.7076WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2145 - accuracy: 0.7077\n",
      "Epoch 162/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2147 - accuracy: 0.7052WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.2147 - accuracy: 0.7052\n",
      "Epoch 163/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.2157 - accuracy: 0.7043WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.2156 - accuracy: 0.7043\n",
      "Epoch 164/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1859 - accuracy: 0.7139WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 1.1859 - accuracy: 0.7138\n",
      "Epoch 165/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1531 - accuracy: 0.7207WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.1531 - accuracy: 0.7208\n",
      "Epoch 167/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1415 - accuracy: 0.7232WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 1.1414 - accuracy: 0.7232\n",
      "Epoch 170/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1168 - accuracy: 0.7312WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 1.1166 - accuracy: 0.7312\n",
      "Epoch 171/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1078 - accuracy: 0.7320WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 1.1080 - accuracy: 0.7319\n",
      "Epoch 172/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1192 - accuracy: 0.7290WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 1.1190 - accuracy: 0.7291\n",
      "Epoch 173/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1254 - accuracy: 0.7261WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.1256 - accuracy: 0.7260\n",
      "Epoch 174/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.1307 - accuracy: 0.7236WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.1307 - accuracy: 0.7236\n",
      "Epoch 175/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0979 - accuracy: 0.7309WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0978 - accuracy: 0.7309\n",
      "Epoch 176/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0695 - accuracy: 0.7407WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0695 - accuracy: 0.7407\n",
      "Epoch 177/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0534 - accuracy: 0.7440WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0534 - accuracy: 0.7440\n",
      "Epoch 178/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0387 - accuracy: 0.7470WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0389 - accuracy: 0.7470\n",
      "Epoch 179/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0216 - accuracy: 0.7530WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0216 - accuracy: 0.7529\n",
      "Epoch 180/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0215 - accuracy: 0.7544WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 48s 1ms/sample - loss: 1.0214 - accuracy: 0.7544\n",
      "Epoch 181/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0024 - accuracy: 0.7589WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0024 - accuracy: 0.7589\n",
      "Epoch 182/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0034 - accuracy: 0.7576WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0032 - accuracy: 0.7577\n",
      "Epoch 183/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9853 - accuracy: 0.7616WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9855 - accuracy: 0.7615\n",
      "Epoch 184/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9828 - accuracy: 0.7642WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9830 - accuracy: 0.7641\n",
      "Epoch 185/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9833 - accuracy: 0.7598WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9837 - accuracy: 0.7597\n",
      "Epoch 186/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9894 - accuracy: 0.7581WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9898 - accuracy: 0.7580\n",
      "Epoch 187/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 1.0017 - accuracy: 0.7554WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 1.0016 - accuracy: 0.7554\n",
      "Epoch 188/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9590 - accuracy: 0.7679WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9590 - accuracy: 0.7679\n",
      "Epoch 189/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9440 - accuracy: 0.7728WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9441 - accuracy: 0.7728\n",
      "Epoch 190/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9322 - accuracy: 0.7750WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9323 - accuracy: 0.7750\n",
      "Epoch 191/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9246 - accuracy: 0.7771WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9244 - accuracy: 0.7772\n",
      "Epoch 192/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9264 - accuracy: 0.7781WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9264 - accuracy: 0.7781\n",
      "Epoch 193/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9295 - accuracy: 0.7738WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 46s 1ms/sample - loss: 0.9294 - accuracy: 0.7738\n",
      "Epoch 194/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.9172 - accuracy: 0.7759WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.9172 - accuracy: 0.7760\n",
      "Epoch 195/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8771 - accuracy: 0.7895WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8771 - accuracy: 0.7895\n",
      "Epoch 196/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8834 - accuracy: 0.7864WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8836 - accuracy: 0.7864\n",
      "Epoch 197/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8764 - accuracy: 0.7882WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8765 - accuracy: 0.7882\n",
      "Epoch 198/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8671 - accuracy: 0.7904WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8670 - accuracy: 0.7904\n",
      "Epoch 199/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8518 - accuracy: 0.7943WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8519 - accuracy: 0.7943\n",
      "Epoch 200/200\n",
      "33536/33549 [============================>.] - ETA: 0s - loss: 0.8349 - accuracy: 0.8001WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,accuracy\n",
      "WARNING:tensorflow:Can save best model only with val_loss available, skipping.\n",
      "33549/33549 [==============================] - 47s 1ms/sample - loss: 0.8348 - accuracy: 0.8001\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<tensorflow.python.keras.callbacks.History at 0x7fe36a0318d0>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from tensorflow.keras import callbacks\n",
    "# fit() receives no validation data, so monitor the training loss;\n",
    "# the default metric, val_loss, would not be available\n",
    "model_checkpoint = callbacks.ModelCheckpoint(\"my_checkpoint.h5\", monitor='loss', save_best_only=True)\n",
    "early_stopping = callbacks.EarlyStopping(monitor='loss', patience=50)\n",
    "# compile model\n",
    "model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
    "# fit model\n",
    "model.fit(X, y, batch_size=256, epochs=200, callbacks=[early_stopping, model_checkpoint])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save Model\n",
    "\n",
    "At the end of the run, we save the trained model to file.\n",
    "\n",
    "Here, we use the Keras model API to save the model to the file `model_goethe_generator.h5` in the current working directory.\n",
    "\n",
    "Later, when we load the model to make predictions, we will also need the mapping of words to integers. This mapping is stored in the `Tokenizer` object, which we can save as well using Pickle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "# save the model to file\n",
    "model.save('model_goethe_generator.h5')\n",
    "# save the tokenizer; the with-block closes the file so the data is flushed to disk\n",
    "with open('tokenizer_goethe.pkl', 'wb') as f:\n",
    "    pickle.dump(tokenizer, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use Language Model\n",
    "\n",
    "Now that we have a trained language model, we can use it.\n",
    "\n",
    "In this case, we can use it to generate new sequences of text that have the same statistical properties as the source text.\n",
    "\n",
    "The generated text has little practical use in this example, but it shows concretely what the language model has learned.\n",
    "\n",
    "We will start by loading the training sequences again.\n",
    "\n",
    "### Load Data\n",
    "\n",
    "We can use the same code from the previous section to load the training data sequences of text.\n",
    "\n",
    "Specifically, the `load_doc()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file read-only; the with-block closes it automatically\n",
    "    with open(filename, 'r') as file:\n",
    "        # read all text\n",
    "        text = file.read()\n",
    "    return text\n",
    "\n",
    "# load cleaned text sequences\n",
    "in_filename = 'goethe_sequences.txt'\n",
    "doc = load_doc(in_filename)\n",
    "lines = doc.split('\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need the text so that we can choose a source sequence as input to the model for generating a new sequence of text.\n",
    "\n",
    "The model will require 50 words as input.\n",
    "\n",
    "Later, we will need to specify the expected length of input. We can determine this from the input sequences by calculating the length of one line of the loaded data and subtracting 1 for the expected output word that is also on the same line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "50\n"
     ]
    }
   ],
   "source": [
    "seq_length = len(lines[0].split()) - 1\n",
    "print(seq_length)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Model\n",
    "\n",
    "We can now load the model from file.\n",
    "\n",
    "Keras provides the `load_model()` function for loading the model, ready for use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: h5py<3.0.0 in /opt/conda/lib/python3.7/site-packages (2.10.0)\n",
      "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from h5py<3.0.0) (1.14.0)\n",
      "Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.7/site-packages (from h5py<3.0.0) (1.19.1)\n",
      "\u001b[33mWARNING: You are using pip version 20.1.1; however, version 21.0.1 is available.\n",
      "You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
     ]
    },
    {
     "ename": "AttributeError",
     "evalue": "'str' object has no attribute 'decode'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-58-7352f2d33ec0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# load the model\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msystem\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"pip3 install 'h5py<3.0.0'\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mload_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'model_goethe_generator.h5'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py\u001b[0m in \u001b[0;36mload_model\u001b[0;34m(filepath, custom_objects, compile)\u001b[0m\n\u001b[1;32m    144\u001b[0m   if (h5py is not None and (\n\u001b[1;32m    145\u001b[0m       isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):\n\u001b[0;32m--> 146\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0mhdf5_format\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_model_from_hdf5\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcustom_objects\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcompile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    147\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    148\u001b[0m   \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msix\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstring_types\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py\u001b[0m in \u001b[0;36mload_model_from_hdf5\u001b[0;34m(filepath, custom_objects, compile)\u001b[0m\n\u001b[1;32m    164\u001b[0m     \u001b[0;32mif\u001b[0m \u001b[0mmodel_config\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    165\u001b[0m       \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'No model found in config file.'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 166\u001b[0;31m     \u001b[0mmodel_config\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloads\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'utf-8'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    167\u001b[0m     model = model_config_lib.model_from_config(model_config,\n\u001b[1;32m    168\u001b[0m                                                custom_objects=custom_objects)\n",
      "\u001b[0;31mAttributeError\u001b[0m: 'str' object has no attribute 'decode'"
     ]
    }
   ],
   "source": [
    "# load the model\n",
    "!pip3 install 'h5py<3.0.0'\n",
    "model = load_model('model_goethe_generator.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also load the tokenizer from file using the Pickle API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the tokenizer; the with-block closes the file afterwards\n",
    "with open('tokenizer_goethe.pkl', 'rb') as f:\n",
    "    tokenizer = pickle.load(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate Text\n",
    "\n",
    "The first step in generating text is preparing a seed input.\n",
    "\n",
    "We will select a random line of text from the input text for this purpose. Once selected, we will print it so that we have some idea of what was used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gleich in die hölle springen xenien als insekten sind wir da mit kleinen scharfen scheren satan unsern herrn papa nach würden zu verehren hennings seht wie sie in gedrängter schaar naiv zusammen scherzen am ende sagen sie noch gar sie hätten gute herzen musaget ich mag in diesem hexenheer mich gar\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# set a seed text\n",
    "from random import randint\n",
    "# randint is inclusive at both ends, so the upper bound must be len(lines) - 1\n",
    "seed_text = lines[randint(0, len(lines) - 1)]\n",
    "print(seed_text + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can generate new words, one at a time.\n",
    "\n",
    "First, the seed text must be encoded to integers using the same tokenizer that we used when training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 50)\n",
      "[[  86    9    3  337 1409 6293   42 6294   57   38   37   17  635 6295\n",
      "  6296 2294 1254  149 6297   52 2405    8 1249 6298  250   25   12    9\n",
      "  1715 1403 6299 1004 1557   75  261  167   12   51   88   12 6300  421\n",
      "   169 6301    2  117    9  166 2472   22]]\n"
     ]
    }
   ],
   "source": [
    "encoded = tokenizer.texts_to_sequences([seed_text])[0]\n",
    "# truncate sequences to a fixed length\n",
    "encoded = np.array(encoded)[0:50].reshape((1,50))\n",
    "print(encoded.shape)\n",
    "print(encoded)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model can predict the next word directly: `model.predict_classes()` returns the index of the word with the highest probability. This method was removed in TensorFlow 2.6; on newer versions, take `np.argmax(model.predict(...), axis=-1)` instead."
   ]
  },
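  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Picking the highest-probability word is often called greedy decoding. As a sketch in symbols (the notation is introduced here for illustration and does not appear elsewhere in this notebook): if $V$ is the vocabulary, $w_1, \\dots, w_{50}$ are the 50 input words, and $P$ is the probability distribution output by the model, the predicted word is\n",
    "\n",
    "$$\\hat{w}_{51} = \\arg\\max_{w \\in V} P(w \\mid w_1, \\dots, w_{50})$$"
   ]
  },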
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[183]\n"
     ]
    }
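   ],
   "source": [
    "# predict the index of the most probable next word\n",
    "# (model.predict_classes() was removed in TensorFlow 2.6; argmax over predict() gives the same result)\n",
    "yhat = np.argmax(model.predict(encoded, verbose=0), axis=-1)\n",
    "print(yhat)"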
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then look up the index in the Tokenizer's mapping to get the associated word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "liebe\n"
     ]
    }
   ],
   "source": [
    "out_word = ''\n",
    "for word, index in tokenizer.word_index.items():\n",
    "    if index == yhat:\n",
    "        out_word = word\n",
    "        print(out_word)\n",
    "        break"
   ]
  },
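  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above scans the vocabulary on every lookup. A minimal sketch of an alternative, assuming a toy `word_index` dictionary with made-up values (not the real Goethe vocabulary) and a hypothetical helper `lookup()`: invert the mapping once, then look words up by index.\n",
    "\n",
    "```python\n",
    "# Toy stand-in for tokenizer.word_index (hypothetical values, for illustration only)\n",
    "word_index = {'und': 1, 'ich': 2, 'liebe': 3}\n",
    "\n",
    "# Invert the mapping once: index -> word\n",
    "index_word = {index: word for word, index in word_index.items()}\n",
    "\n",
    "def lookup(index):\n",
    "    # Return the word for an index, or '' if the index is unknown\n",
    "    return index_word.get(index, '')\n",
    "\n",
    "print(lookup(3))\n",
    "```\n",
    "\n",
    "Recent versions of the Keras `Tokenizer` should also expose such an inverse mapping as `tokenizer.index_word`; it is worth checking that your installed version provides it before relying on it."
   ]
  },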
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then append this word to the seed text and repeat the process.\n",
    "\n",
    "As words are appended, the input sequence grows beyond the length the model expects. We can truncate it to the desired length after it has been encoded to integers. Keras provides the `pad_sequences()` function, which we can use to perform this truncation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "# encoded already has shape (1, n), so it is passed directly rather than wrapped in another list\n",
    "encoded = pad_sequences(encoded, maxlen=seq_length, truncating='pre')"
   ]
  },
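  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the effect of `truncating='pre'` concrete, here is a plain-Python sketch of what happens to a single sequence that is too long: the oldest tokens are dropped from the front and the most recent `maxlen` tokens are kept. The helper name `truncate_pre` is made up for illustration and is not part of Keras; it assumes a positive `maxlen` and does not pad short sequences.\n",
    "\n",
    "```python\n",
    "def truncate_pre(sequence, maxlen):\n",
    "    # Keep only the last `maxlen` items, dropping tokens from the front\n",
    "    return list(sequence[-maxlen:])\n",
    "\n",
    "tokens = [11, 12, 13, 14, 15]\n",
    "print(truncate_pre(tokens, 3))\n",
    "```\n",
    "\n",
    "Dropping from the front suits text generation, because the words closest to the prediction point are the ones that stay in the model's input window."
   ]
  },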
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can wrap all of this into a function called `generate_seq()` that takes as input the model, the tokenizer, input sequence length, the seed text, and the number of words to generate. It then returns a sequence of words generated by the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generate a sequence from a language model\n",
    "def generate_seq(model, tokenizer, seq_length, seed_text, n_words):\n",
    "    result = list()\n",
    "    in_text = seed_text\n",
    "    # generate a fixed number of words\n",
    "    for _ in range(n_words):\n",
    "        # encode the text as integer\n",
    "        encoded = tokenizer.texts_to_sequences([in_text])[0]\n",
    "        # truncate sequences to a fixed length\n",
    "        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')\n",
    "        # predict the index of the most probable next word\n",
    "        # (model.predict_classes() was removed in TensorFlow 2.6)\n",
    "        yhat = np.argmax(model.predict(encoded, verbose=0), axis=-1)\n",
    "        # map predicted word index to word\n",
    "        out_word = ''\n",
    "        for word, index in tokenizer.word_index.items():\n",
    "            if index == yhat:\n",
    "                out_word = word\n",
    "                break\n",
    "        # append to input\n",
    "        in_text += ' ' + out_word\n",
    "        result.append(out_word)\n",
    "    return ' '.join(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are now ready to generate a sequence of new words given some seed text.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ringsum gegenwart erstanden erstanden erstanden judex judex fremder fremder fremder seltsamen seltsamen moralisch moralisch wiederholt zittert zittert wiederholt wiederholt zittert wiederholt seltsamen seltsamen wiederholt wiederholt zittert zittert werden wohlbekannte wohlbekannte wohlbekannte einzunehmen gegenwart meinem meinem meinem himmelslicht himmelslicht himmelslicht hieß hieß hieß hieß damaged damaged samstags genoß genoß genoß genoß\n"
     ]
    }
   ],
   "source": [
    "# generate new text\n",
    "generated = generate_seq(model, tokenizer, seq_length, seed_text, 50)\n",
    "print(generated)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extensions for Your Course Project\n",
    "\n",
    "This section lists some ideas for extending the example that you may wish to explore in your project.\n",
    "\n",
    "0. __Choose Dataset__: Besides books, consider other text collections such as cooking recipes (for gin, beer, bread, etc.). Report your experimentally verified results.\n",
    "\n",
    "1. __Sentence-Wise Model__: Split the raw data based on sentences and pad each sentence to a fixed length (e.g. the longest sentence length).\n",
    "   \n",
    "2. __Simplify Vocabulary__: Explore a simpler vocabulary, perhaps with stemmed words or stop words removed.\n",
    "    \n",
    "3. __Tune Model__: Tune the model, such as the size of the embedding or number of memory cells in the hidden layer, to see if you can develop a better model.\n",
    "    \n",
    "4. __Deeper Model__: Extend the model to have multiple LSTM hidden layers, perhaps with dropout, to see if you can develop a better model.\n",
    "    \n",
    "5. __Pre-Trained Word Embedding__: Extend the model to use pre-trained word2vec or GloVe vectors to see if it results in a better model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part II : Attention Basics\n",
    "In this notebook, we look at how attention is implemented. We will focus on implementing attention in isolation from a larger model. That's because when implementing attention in a real-world model, much of the effort goes into piping the data and juggling the various vectors rather than into the concepts of attention themselves.\n",
    "\n",
    "We will implement attention scoring as well as calculating an attention context vector.\n",
    "\n",
    "## Attention Scoring\n",
    "### Inputs to the scoring function\n",
    "Let's start by looking at the inputs we'll give to the scoring function. We will assume we're in the first step of the decoding phase. The first input to the scoring function is the hidden state of the decoder (assuming a toy RNN with three hidden nodes, which is too small for real use but easier to illustrate):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "dec_hidden_state = [5,1,20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualize this vector:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJAAAAEYCAYAAACz0n+5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAARzklEQVR4nO2de7BV1X3HP19Qx4QrD0vFKxDNVEMabUFFGkfaYEVEg2I7DIU/IlgtJhNrnKZTLdUosUkcMzFDpUZpYMSMNQ8NyiSIEMeMkkblEfAFCFEc7i2PGORxTQi98OsfZyOHw3nce9bZZ9297+/D7Llnr/1Y69z74bfW3mev35GZ4Tj10id2A5xs4wI5QbhAThAukBOEC+QE4QI5QbhAOULScEnPS3pT0huSvpSUnypphaTNyc9BFY6fkeyzWdKMLtXp94Hyg6RWoNXM1ko6BVgDXAvMBHab2b2SbgcGmdltJceeCqwGRgOWHHuhmb1frU6PQDnCzLab2drk9X5gAzAUmAwsSnZbREGqUq4AVpjZ7kSaFcDEWnW6QDlF0lnA+cDLwBAz255s2gEMKXPIUGBb0XpbUlaVE8Ka2SW8jzyKilfmaE63fzd3c/dNwKyiovlmNv+YSqQW4EngVjPbJx2t1sxMUsP+Js0QiC3Pb2lGNT2asy89uyHnSWSZX2m7pBMpyPOYmf04Kd4pqdXMtifjpF1lDm0HxhWtDwN+Xqs93oXFRHUs1U5XCDULgA1mdn/RpiXAkauqGcDTZQ5/FpggaVBylTYhKatKUyKQU57irqVBXAJ8DnhN0rqkbDZwL/BDSTcA7wJTk/pHA583sxvNbLeke4BVyXFfNbPdtSp0gWLSYH/MbGWVs15WZv/VwI1F6wuBhd2p0wWKiPo0PAI1HRcoIil0YU3HBYpJ9v1xgWKShwjkl/FOEB6BIpKHCOQCxST7/rhAUXGBnBC8C3PCyL4/LlBMPAI5YWTfHxcoJh6BnDCy748LFBOPQE4Y2ffHBYqKC+SE4F2YE0b2/XGBYuIRyAkj+/64QDHxCOQE4QI5YaTgj6SFwCRgl5mdl5T9ABiR7DIQ2GNmo8ocuxXYDxwCOs1sdK36XKCIpBSBHgHmAY8eKTCzvyuq81vA3irHX2pm73W1MhcoJin4Y2YvJKldjq+uYOxU4K8bVZ/PyohJg5MrdIG/BHaa2eYK2w1YLmmNpFkV9jkGj0ARqacLS/6wVfMDVWE68HiV7WPNrF3SacAKSRvN7IVqJ3SBYlJHRKmVH6hiVdIJwN8CF1Y5d3vyc5ekxcAYoKpA3oVFRFK3lwDGAxvNrK1CW/oliTmR1I9CfqDXa53UBYpJCmMgSY8DvwRGSGpLcgIBTKOk+5J0hqSlyeoQYKWk9cArwE/NbFmt+nLfhV0/+3o+cvJH6NOnD3379GXu7Lmxm/QhaVzGm9n0CuUzy5T9L3BV8vptYGR368u9QADf+KdvMKBlQOxmHE/2b0TXFkjSJynkGT6S8rUdWGJmG9JsWG8gDx9lVB0DSboN+D6F/yuvJIuAx5OM5z0eSdw5905u+fotPPPiM7GbcyzNvw/UcGpFoBuAc83s/4oLJd0PvEEheWOP5r5/vo/BgwazZ98e7ph7B8NPH85555wXu1kFeqAQ3aXWVdhh4Iwy5a3JtrJImiVptaTV8+d3+5ZFQxk8aDAAA/sP5OJRF7PpnU1R21OM6vjX06gVgW4FnpO0maNp8D8GnA3cXOmgkptdFivR+IE/HOCwHeajJ3+UA384wNoNa5n+2bIXKXHoeT50m6oCmdkySZ+gcEeyeBC9yswOpd24UN7f9z5fe+hrABw6fIjPXPQZRp9b8wmFppGHQXTNqzAzOwy81IS2NJzWP25l3p3zYjejMtn3p3fcB+qp9IoI5KRI9v1xgWLimeqdILwLc8LIvj8uUFRcICcE78KcMLLvjwsUE49AThjZ98cFiolHICeM7PvjAsWkJz7f
011coJhk3x8XKCoukBNCHgbRPjM1JunMTF0oaZek14vK7pbULmldslxV4diJkjZJ2tLVWTcuUERSmhv/CDCxTPm3zWxUsiwt3SipL/CfwJXAp4Dpkj5VqzIXKCYpRKAkHcvuOlozBthiZm+b2UEK8wEn1zrIBYpIk7Nz3Czp1aSLG1Rm+1COzrwBaOPoRIqKuEAxqSMCFc+5S5auZBL7DvAnwChgO/CtRr0FvwqLSD0RpZ4EU2a2s6jO/wJ+Uma3dmB40fqwpKwqHoFi0qS58ZJai1b/hvKJo1YB50j6uKSTKOQTWlLr3B6BIpLGfaAkwdQ4YLCkNuAuYJykURSSaG4Fbkr2PQP4rpldZWadkm4GngX6AgvN7I1a9blAMUkh/ldIMLWgwr4fJphK1pcCx13iV8MFikge7kS7QDHJvj8uUEzyEIH8KswJwiNQRPIQgVygmGTfHxcoJh6BnDCy748LFBUXyAnBuzAnjOz70xyBzr707GZUkzk8AjlhZN+f5gjUsbejGdX0aFoGtBxX5hHICSP7/rhAMfEI5ISRfX9coJh4BHLCyL4//jyQE4ZHoIj4Vx04QfgYyAkj+/64QDFJaWLhQmASsMvMzkvKvglcDRwEfg1cb2Z7yhy7FdgPHAI6zazm1zv6IDom6UxtfoTj8wOtAM4zsz8H3gL+tcrxlyY5hLr03aAuUETSSO9SLj+QmS03s85k9SUKiRMaggsUkyYlVyjh74FnKmwzYLmkNV1MG+NjoKjUIUTyhy3+485PUr505dh/AzqBxyrsMtbM2iWdBqyQtDGJaBVxgSLSrPxASV0zKQyuLzMzq3Du9uTnLkmLKaS9qyqQd2ExaV5+oInAvwDXmNnvKuzTT9IpR14DEyifR+gYXKCIpDGITvID/RIYIalN0g3APOAUCt3SOkkPJfueIelIOpchwEpJ64FXgJ+a2bJa9XkXFpMUbiTWmx/IzN4GRna3PhcoIv5RhhNG9v1xgWKShwjkg2gnCI9AEclDBHKBYpJ9f1ygqLhATgjehTlhZN8fFygmHoGcIHxWhhNG9v1xgWLiXZgTRvb9cYGi4gI5IeShC8v9h6lz7pnD+CvGM3Xa1NhNOZ44szIaSu4FuvqzV/PA3AdiN6MsTf7a71TIvUAXXHABA/oPiN2M8vTmCCTp+kY2pDeiOv71NEIi0JxKGyTNkrRa0ur587s9han3kIMIVPUqTNKrlTZRmAZSlpLJb+Z5osvTE8c03aXWZfwQ4Arg/ZJyAf+TSot6E9n3p2YX9hOgxczeLVm2Aj9PvXUNYPYds5l5w0y2vruVKyddyVNPPxW7SR+S0sTChZJ2SXq9qOxUSSskbU5+Dqpw7Ixkn82SZnTpPVSYJt1IvAvjw686OMaAn93xs27/8sf/+/iqFkn6K6ADeLQowdR9wG4zu1fS7cAgM7ut5LhTgdXAaApZOtYAF5pZae9zDLm/jO/RpDCILpcfCJgMLEpeLwKuLXPoFcAKM9udSLOC4xNVHYd/lBGRJg6ih5jZ9uT1DspfAA0FthWttyVlVfEIFJM6IlDxLZJk6VIiqCMkqV0aNm7xCBSRJuYH2imp1cy2S2oFdpXZpx0YV7Q+jC5cKHkEiknzbiQuAY5cVc0Ani6zz7PABEmDkqu0CUlZVVygiDQxP9C9wOWSNgPjk3UkjZb0XQAz2w3cA6xKlq8mZVXxLiwiaTxUXyE/EMBlZfZdDdxYtL4QWNid+lygmOTgTrQLFBMXyAmhJz6e0V1coJhk3x8XKCa94XEOJ02y748LFBOPQE4Y2ffHBYqJRyAnjOz74wLFxCOQE0b2/XGBouICOSF4F+aEkX1/XKCYeARywsi+Py5QTDwCOWFk3x8XKCb+QJkTRvb9cYFi4l914ISRfX98YmFMGj2xUNIISeuKln2Sbi3ZZ5ykvUX7fCXkPXgEikmDI5CZbQJGAUjqS2G+++Iyu75oZpMaUacLFJGU7wNdBvzazN5NsxLvwmKSbnKF
acDjFbZdLGm9pGcknVtX2xNcoIjUMwbqSn4gSScB1wA/KlPtWuBMMxsJPAA8FfIemtKFJfkBnVLq6MG6mB/oSmCtme0sc/y+otdLJT0oabCZvdf91vgYKCopjoGmU6H7knQ6sNPMTNIYCr3Qb+utqCkCzVHFpPa9hrvsruMLU/BHUj/gcuCmorLPA5jZQ8AU4AuSOoHfA9MsIFWvR6CYpCCQmX0A/FFJ2UNFr+cB8xpVnwsUEX+cwwkj+/64QDHJQwTy+0BOEB6BIpKHCOQCxST7/rhAMfEI5ISRfX9coKi4QE4I3oU5QfhD9U4Y2ffHBYqJd2FOGNn3xwWKiUcgJ4zs++MCxcQjkBNG9v3xxzmcMDwCRcS7MCeM7PvjAsXEI5ATRvb9cYFikkYEkrQV2A8cAjrNbHTJdgFzgauA3wEzzWxtvfW5QDFJLwJdWmWu+5XAOcnyF8B3kp914QJFJNIYaDLwaDKd+SVJAyW1mtn2ek7m94Fikk5+IAOWS1pTLvULMBTYVrTelpTVhUegmNQRgBIpisWYn6R8OcJYM2uXdBqwQtJGM3shrKGVcYEiUk8XVis/kJm1Jz93SVoMjAGKBWoHhhetD0vK6sK7sJg0uAuT1E/SKUdeAxOA10t2WwJcpwKfBvbWO/4Bj0BRSWEQPQRYnJz3BOC/zWxZSX6gpRQu4bdQuIy/PqRCFygmjU/z+zYwskx5cX4gA77YqDpdoIj4rIweSP9h/bn20WtpGdKCmbF2/lpe/o+XOXnQyUz5wRQGnjWQPVv38MTUJziw50DUtubhs7DcDaIPdx5m+ZeX8+C5D7Lg0wu46IsXMfhPBzP29rG889w7zPvEPN557h3G3j42dlNzQU2BJH1S0mWSWkrKJ6bXrPrp2NHBjl/tAOBgx0F+s+E39B/anxGTR7B+0XoA1i9az4hrR8RsJtD478qIQVWBJN0CPA38I/C6pMlFm7+eZsMawYAzB9B6fittL7fRMqSFjh0dQEGyliE9IHd1upnqm0KtMdA/ABeaWYeks4AnJJ1lZnPpkW/nKCf2O5GpT05l2a3LOLj/4HHbAzLbNo4e/RvsGrUE6mNmHQBmtlXSOAoSnUmVt198u/3hhx9uTEu7QZ8T+jD1yam89thrbFy8EYCOnR20nF6IQi2nt/DBrg+a3q5SemKX1F1qjYF2Shp1ZCWRaRIwGPizSgeZ2XwzG21mo2fNKvd5Xrpcs+Aa3tvwHi99+6UPy95a8hYjZxRukYycMZJNT29qeruOoxd0YdcBncUFZtZJ4VZ480NLFxh+yXBGXjeSna/u5KZfFZK1Pzf7OVbeu5IpP5zC+Tecz9539/KjqeW+h6S55CECVRXIzNqqbPtF45sTzrZfbKv41QrfG/+9JremBtn3J383ErNE7iOQkzLZ98cFiolHICeM7PvjAkXFBXJC8C7MCSP7/rhAMfEI5ISRfX9coJgoBwa5QDHJvj8uUEz8oXonCB9EO2Fk35/8zcrIFI2f2jxc0vOS3pT0hqQvldlnnKS9ktYly1dC3oJHoIik0IV1Al82s7XJHPk1klaY2Zsl+71oZpMaUaFHoJg0OAKZ2fYj6erMbD+wgYDcP13BBYpIPfPCJM2StLpoKfvQeTKL5nzg5TKbL5a0XtIzks4NeQ/ehcWkjh6sVn4ggGQS6JPArWa2r2TzWuDMZKrWVcBTFPIl1oVHoIikMTNV0okU5HnMzH5cut3M9hVN1VoKnChpcL3vwQWKSeOvwgQsADaY2f0V9jk92Q9JYyg48Nt634J3YRFJ4SrsEuBzwGuS1iVls4GPwYd5gqYAX5DUCfwemGYB03RdoJg0PsHUylpnNbN5wLxG1ekCxSQHd6JdoIj44xxOGNn3xwWKiX8a74SRfX9coJh4BHLCyL4/LlBMPAI5YWTfHxcoJh6BnDBy8FG2CxQRj0BOGNn3BzUh4XYPyOjdYzhGmY69Hd3+3bQMaOlR2jVDoB6BpFkl3y3qNIAcDOO6
TPMznvcCepNATgq4QE4QvUkgH/+kQK8ZRDvp0JsikJMCuRdI0kRJmyRtkXR77PbkjVx3YZL6Am8BlwNtwCpgeplsFU6d5D0CjQG2mNnbZnYQ+D4wucYxTjfIu0BDgW1F622knO6kt5F3gZyUybtA7cDwovVhSZnTIPIu0CrgHEkfl3QSMA1YErlNuSLXzwOZWaekm4Fngb7AQjN7I3KzckWuL+Od9Ml7F+akjAvkBOECOUG4QE4QLpAThAvkBOECOUG4QE4Q/w85D+BWR2XEqQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 108x324 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# Let's visualize our decoder hidden state as a column vector\n",
    "plt.figure(figsize=(1.5, 4.5))\n",
    "sns.heatmap(np.array(dec_hidden_state).reshape(-1, 1), annot=True, cmap=sns.light_palette(\"purple\", as_cmap=True), linewidths=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our first scoring function will score a single annotation (encoder hidden state), which looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "annotation = [3,12,45] #e.g. Encoder hidden state"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAIYAAAEYCAYAAACZYo4WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAO60lEQVR4nO3df5BV5X3H8feXXVBWNlUqIkoUR02JgYYGQ3FMGyISqdGapplMxGCm44hmSmtqiJLUTELbMGbGhGSaX6wGTVN/EWsSyzQqFRmaEFeKUhWJihYbkR+hRYVx3HWXb/+4Z/GyPveePXfPvc+5dz+vmTvuPXfvuY+Xzz7Pc8693/OYuyMy2KjYDZBiUjAkSMGQIAVDghQMCVIwJEjBaEFm1mZmj5vZmuT+bWb232a2JbnNSNtHe91bKTFcA2wD3lG27fPufs9Qd6Aeo8WY2WTgI8Atw9mPgtF6vglcBxwatP2rZvaEma0ws6PSdtKIoUTn3N9iR9y7wzK/N3YZVwGLyjZ1uXsXgJldBOx1981mNqfsd74A7AbGAF3A9cDfVXudhswxDr66vxEvU2jjfue4XPaThKCrwsPnAn9qZhcCRwPvMLN/dvdPJY/3mNmtwJK019FQEpXVcKvM3b/g7pPdfQrwSWCdu3/KzCYBmJkBHwWeSmuZjkpisur/0Dm63cwmUErWFuDqtCcoGFHVLxjuvh5Yn/x8XtbnKxhRFXckVzBiatxQkpmCEVVxg1HcvkyiUo8Rk4YSCVMwJMBrCEajoqRgxKShRMIUDAlSMCRIwZAQzTEkTMGQIAVDQjSUSJiCIUEKhgQpGBJS4DmGvo8hQQpGVPmWDxze69uLmk8zs24z225md5vZmLR9KBgx2ajst6EZKGoe8DVghbufAewHrkjbgYIRVf49xuCi5qTI6DxgoNL9h5SKjqrS5DOm+kw+v0mpqLkzuf+7wCvu3pfcfwk4OW0n6jGiyt5jmNkiM/vPstvhAufyoubhtkw9RlTZe4ysRc3At4Bjzaw96TUmAzvTXkc9RkSOZb5V3V+4qPky4GHg48mvfRr4WVrbFIyYzLLfanM9cK2Zbac05/hB2hM0lETVsKLmF4BZWZ6vYERV3FPiCkZMBf6spKWD0dPTw5VXfYbe3l76+/uZO/c8rl50ZexmlVEwohgzZgzf/+636ejo4M2+Pq64chHnnnMO06dPi920RBMHw8ymApfw1tmyncB97r6t8rOKwczo6OgAoK+vj76+viL/WxRK1cNVM7seuIvS2/locjPgTjNbWv/mDV9/fz+XXraQeRf8CbNnzWL6tKL0FjTycDWztB7jCuA97v5m+UYz+wawFbixXg3LS1tbG3fe/iMOHDjA5667nu3PP88Zp58eu1mJ4nZfaSe4DgEnBbZP4u1Xnj2s/Hx+V1els7eN1dnZydkzZ7LxV4/EbkqZ+nwfIw9pPcZngYfM7DngN8m2U4AzgMWVnjTofL7HugDs/v37aW9vp7OzkzfeeIPu7kf59OULo7QlqFkPV939fjN7F6WzZuWTz03u3l/vxg3Xvn37+PKyv6f/UD9+yDn//Ln88R99IHazyhQ3GNaA5TWj9RhFklwy+ogkHPrX92R+80ddvLUhaWrp8xiF16xDidSbgiFBxf3Wg4IRk4YSCVMwJEjBkAAv8FBS3NmPRKUeIyr1GBKU74doZna0mT1qZv9lZlvNbFmyXSs1N5X85xg9wHnuftDMRgO/MLOfJ49lWqlZwYgq32B46YOvg8nd0cmtpg/DNJREVZdq9zYz2wLsBda6e3fyUKaVmhWMmGr4al+1omYAd+939xmUalRnmdk0Sis1TwXeD4ynVJlWlYaSqHIvai7/vVfM7GFgvrvflGzWSs3NIfejkglmdmzy81hgHvBrrdTcbPI/KpkE/NDM2ij90a929zVmtk4rNTeV3I9KngD+ILBdKzU3l+Ke+VQwolIwJKTAn64qGFEpGBKk
YEhIcXOhYMRV3POLCkZMmnxKSC1ruzeKghGVgiFBCoaEaI4hYQqGBCkYEqKhRMIUDAlSMCRohAcjuTCZDFbgOUZxP8WRqBrSY7z+4kONeJlC6zh1bmCregwJyfki81Wq3bWEd3PJvXZ1oNr9vcAMYL6ZzUZLeDebfIPhJaFq98xLeCsYMdVhvZLB1e7A82gJ72aTvcfIWu1Oqco9M53giqoh1e7noCW8m4yNyn6rtrtwtfs2aljCWz1GRHX4zmelavengbvM7B+Ax9ES3kXXsGp3LeHdVAr8WYmCEZWCIUEKhgQpGBJS3FwoGHEVNxkKRlQKhoTocFXCFAwJUjAkREOJhCkYEqRgSJCCISGaY0iYgiFBCoaEaCiRMAVDAop8AViVD8SUf1HzO83sYTN7OilqvibZ/hUz21m2hPeFaU1TjxFV7j1GH/A5d3/MzDqBzWa2NnlsRdkym6kUjKhyLx/YBexKfj5gZtsYQp1qiIaSmOpQ1PzWrm0KpRqTgSW8FydLeK8ys9RrXykYUeVf1AxgZuOAfwE+6+6vAd8DTqd0zYxdwNfTWtZyQ8lXvv4jNjzyJOOP7eSem78EwIque9nwyJOMHt3G5EkTWLZkIZ3jOiK3FOpR1GxmoymF4nZ3vzd5zp6yx28G1qS9Tsv1GBfPm813li8+Ytvs903lxzffwOqVN3Dq5BNYddcDkVo3SP5HJUapLnWbu3+jbPuksl/7M0biEt4zf/9MXt79v0dsO+fssw7/PH3qafz7fzze6GZVkPtRybnAQuDJ5OIpAF8ELjWzGZSurrMDuCptRzUHw8z+wt1vrfX5sfzsgY18+IMzYzcjkftRyS8q7PTfsu5rOEPJskoPlE+QurpSr/HRMLfc8XPa2tq4cG6mwu86yvcaXHmq2mOY2ROVHgImVnreoAmSF+E6n/c9+Cs2dD/Fyq9dgxXlw6uitCMgbSiZCFxA6RKA5QzYWJcW1cEvN23lttVrueWmv2Hs0amXuGyg5g3GGmCcu28Z/ICZra9Hg4Zr6fJVbH7iWV559SAXLPgiVy/8CLfe/SC9vW/ymaX/CMD0d0/hhmsWRG4pFDkY5u71fo1CDCWxJZeMPiIJvZv/NvObP2bmVxuSppY7XG0uxe0xFIyoFAwJUjAkpIkPV6WuFAwJKPJ3PhWMmFIuAR2TghGVegwJ0eRTwhQMCVIwJEjBkBDNMSSsuMEo7oG0RKVgxNS4oubxZrbWzJ5L/qtKtGLL/cvAA0XNZwGzgb80s7OApcBD7n4m8FByvyoFI6p8g+Huu9z9seTnA5RWUDwZuITSCs0wxJWaNfmMqn6Tz0FFzROTSniA3VT5hv8A9Rgx1TDHqLGo+TAvfck39bum6jGiakxRM7DHzCa5+66kjnVv2uuox4gq3zlGpaJm4D5KKzSDVmpuAvmf+axU1HwjsNrMrgBeBD6RtiMFI6qGFTUDzM2yLwUjquKeElcwInJ9iCZhCoYEKRgSoqFEwhQMCVIwJERDiYQpGBKkYEjQCA9Gcv0pGUxzDAkb6cG4o7hvQMMsCH1pqrjvi3qMmDSUSJiCIUEKhoRoKJGw4gZD3xKXIAUjqtxrV0mWz9xrZk+Vbcu8UrOCEZGbZb4NwW3A/MD2Fe4+I7mlLoWlYESVf4/h7huA/xtuyxSMqPIPRhVaqblp1KmoOUArNTeX7H+XaUXNFZ6jlZqbSs6XWqr8Mlqpucnkf4LLzO4E5gDHm9lLwJeBOQ1bqVnykH8w3P3SwOYfZN2PghFVcU+JKxgx6UM0CVMwJEjBkJACDyU6jyFB6jGiKm6PoWBEpWBISIHnGApGVAqGBCkYEqKhREK0trtUoGBIiIYSCVMwJEjBkCAFQ0KsuJ9htmww+g/Bn998ChM7+1i54GWW/nQij77YQedR/QDc+NE9vPvEnsitLK6WDcY/dR/L6cf3crDnrb/K6+b9lvlnHYzYqkHqcFRiZquAi4C97j4t2TYeuBuYQulb4p9w9/3V9pPa
l5nZVDObmyzZWL49VDhbCLtfa2f9c+P4+Ptejd2UFHUpUbyNtxc157tSs5n9NaUV9/4KeMrMLil7ePlQWhnD8vsn8Pnzf8uoQe/jinXHc/H3TmX5/RPo7SvCxK9hRc25r9R8JTDT3Q8mK//eY2ZT3P1bQ2plBA8/ewzjj+ln2kk9dO8Ye3j7tXP3MWFcP2/2G19acwJdvzyOxR8cdlH4MDXsLcy8UnNaMEa5+0EAd99hZnMoheNUqvxfJYW2iwBWrlzJonGVfjN/j/3PWNY9cwwbnjuNnj7jYM8oltx7Ijd9bDcAY9qdj814jVUbUwu+66+GOUb5e5voSupZh8Td3cxSV2q20orOFRuxDrjW3beUbWsHVgGXuXvbUNoS6wKw3TvGsmrjcaxc8DJ7D7RxQmc/7rD8gQkc1e4sOX9f4xpTugDsEW/E6zu7U/+BBus4+Q9T38ykd19TNvl8BphTtlLzenf/vWr7SOsxLgf6yje4ex9wuZmtTGtgkSy5dxL7X2/DHaae2MOyi/akP6nuGvYHM7BS840McaXmqj1GTqL1GIUS6jFe3pS9xzjp/VXfzPKiZmAPpaLmnwKrgVNIVmp296oTrJY9j9EcGlbUDFqpuZkUtydVMGLS9zEkTMGQAH3nUypQMCSkwHOM4n5TRKJSjxFVcXsMBSOmAg8lCkZUCoYEFXeKp2DEpKFEwhQMCVIwJERDiYQpGBKkYEiIhhIJUzAkSMGQBjGzHcABoB/oc/eza9mPghFT/eYYH3L3YVVTKRhRFXcoKe6nOCOAY5lvQ9otPGhmm4e4WG+QeoyY6lPU/AF332lmJwBrzezXyaURMlEwosoejLSVmt19Z/LfvWb2E2AWkDkYGkqiyvfCKWZ2jJl1DvwMfJghrMocoh4jqtwnnxOBn1hpiGoH7nD3+2vZkYIRU86Hq+7+AvDePPalYERV3MNVBSMqBUNC9OmqhBX3oFDBiKnAPUZxIytRNabHWFD3C8A1qeL2GI24al8hmNmiLBdKHelG0lBS8yeNI9FICoZkoGBI0EgKhuYXGYyYyadkM5J6DMmg5YNhZvPN7Bkz225mqUs+SUlLDyVm1gY8C8wDXgI2AZe6+9NRG9YEWr3HmAVsd/cX3L0XuIvS+mCSotWDcTLwm7L7LyXbJEWrB0Nq1OrB2Am8s+z+5GSbpGj1YGwCzjSz08xsDPBJSuuDSYqW/qKOu/eZ2WLgAaANWOXuWyM3qym09OGq1K7VhxKpkYIhQQqGBCkYEqRgSJCCIUEKhgQpGBL0/zoXS/HKfsFBAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 108x324 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Let's visualize the single annotation\n",
    "plt.figure(figsize=(1.5, 4.5))\n",
    "sns.heatmap(np.transpose(np.matrix(annotation)), annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### IMPLEMENT: Scoring a Single Annotation\n",
    "Let's calculate the dot product of a single annotation. Numpy's [dot()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) is a good candidate for this operation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "927"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def single_dot_attention_score(dec_hidden_state, enc_hidden_state):\n",
    "    # TODO: return the dot product of the two vectors\n",
    "    return np.dot(dec_hidden_state, enc_hidden_state)\n",
    "    \n",
    "single_dot_attention_score(dec_hidden_state, annotation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Annotations Matrix\n",
    "Let's now look at scoring all the annotations at once. To do that, here's our annotation matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "annotations = np.transpose([[3,12,45], [59,2,5], [1,43,5], [4,3,45.3]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And it can be visualized like this (each column is a hidden state of an encoder time step):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVoAAAD4CAYAAACt8i4nAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAW00lEQVR4nO3df5xVdZ3H8ddnLqDMD5lBcCQg8UeFiZuGmUqZgRSZBptlRhEZMWVWmpmS2SqZRm6r9dh2W2dFI1bwt6trxa5rIPmD3ymCWBpqSPwQnEGGXzNz57N/3FtMLMydce73fo9n3s/H4zy495x7z/nMcc7bz3zP986YuyMiIuGUxS5ARCTtFLQiIoEpaEVEAlPQiogEpqAVEQmsVwmOoWkNItJZ1u09zLHOZ85E7/7xOqEUQUtTw8ZSHCbRKmsOzz2YU5L/rsk2MXcdNG1riFxIfJX9agBo2rouciXxVR46NHYJwZQkaEVESid5zYyCVkTSxZJ360lBKyIpo45WRCQwBa2ISFimoBURCSx5QZu8UWMRkZRRRysi6aJZByIiYXkXhg5KNcigoBWRdNHNMBGR0BS0IiKBKWhFRMLSzTARkdDU0YqIBKagFREJS7MORERCU9CKiASmoBURCSuBsw6SV5GISMqooxWRlNHQgYhIWJp1ICISmoJWRCSsBN4MU9CKSMqooxURCUxBKyISlP7CgohIaJp1ICISmoJWRCSsIs46MLOXgO1AFmh195PMrD9wJzAMeAk4z90bOtpPqoN2z549TL3w6zQ3t5DNZhkz+gN8eeoXYpdVUqN/fCQVB7VRZk6mDO6r+xPPbezD1b+sZWdzGYOrW/jRxzdSeVBb7FJLavq13+e3jz1O/5oa7rpjTuxyostms0z6wlcYOHAAP/nRdbHL6aaid7QfdPct7Z5PAx5x9xlmNi3//IqOdpDqoO3Tpw//9tObKC8vp6W1lSl1X2XUqe/l+BHHxS6tpGZNXkf/8r1B+p3/Opwrxr7KycN2cc/vDuGWx2u4ZPTWiBWW3jkf/SjnffITXH3N92KXkghz77qfYcPeyo4dO2OXUgTBhw7GA2fkH88CFlAgaJM3s7eIzIzy8nIAWltbaW1tJYnjN6X20tbevOeIXQCMOmon/7OmMnJFpffud59Iv0MOiV1GImza/CqPPbGYCeecFbuUkjOzOjNb1m6p2+clDvyPmS1vt63W3TfkH28Eagsdp2BHa2bDySX44Pyq9cCD7r6mU19JZNlsls9+vo51r6znvHMncPyId8YuqbQMpsweghl8auQ2PjVyG28b2Mwjv6/gzOE7mPdsJRte7x27Sonon378r1x80VR27ExDN0uXZh24ez1Q38FL3ufu683sMOBhM3tun/e7mXmh43TY0ZrZFcAd5NrAJfnFgLn5sYnEy2QyzJ09k18/eDernl3DC39cG7ukkpp7wTru/9Kf+PfPrOf2pdUsfbkv143fyJyl1Xy8/q3s2FNGn0zB7xNJqYWPL6Kmpppjh789dilFZF1YOubu6/P/bgbuB04GNpnZIID8v5sL7adQRzsFOM7dW/7myzC7EVgNzNjfm/Itdh3AzTffzMRPfqxQHcFVVVVx0sgTeWLREo45+qjY5ZRM7SGtABxakWXs8CZWrj+YKac1cOuk9QC8uLU3C57veUMHkvP0ylUsfOxJHn9yCc3NzTTt2MlV1/yA71/z7dilvXFFmnVgZhVAmbtvzz/+EPA94EFgMrn8mww8UGhfhYK2DXgL8PI+6wflt+3XPu24NzVsLFRHEA0NjfTqlaGqqordu/eweMkyJk+aGKWWGHY2G20OlQc5O5uNx/9Yzlc+sJWtOzIcWpGlzeFnCw/l/JMaY5cqkXztwi/ytQu/CMCyFU8xe87db+6QBYp4H6YWuN9yQxG9gDnuPs/MlgJ3mdkUctl4XqEdFQraS4BHzOx5YF1+3VuBY4CvvrHaS2fLlq1cfe31ZLNt
uDtnjjmD0993WuyySmbrjl5cdOdbAMi2wdkjtnP6MTuZtaiaOUurARh7bBPnnvB6xCrjuPKq77Js+QoaGxv5yNnn8KWpU5kwPv5PXlIMxQlad18LvGs/67cCY7pUkXvH43NmVkZuXKL9zbCl7p7t5DGidbRJUllzeO7BHM16YGLue65pW4dzvHuEyn41ADRtXVfglelXeehQKEJKtj00otM3HcrOXlWSC7LgrAN3bwMWlaAWEZEiSF4zk+oPLIhIT5S8jwcoaEUkXfTbu0REQlPQiogEpqAVEQnKEzh0kLxRYxGRlFFHKyIpk7z+UUErIumSwKEDBa2IpIyCVkQkMAWtiEhYGjoQEQlNQSsiElYR/9x4sShoRSRl1NGKiASmoBURCUxBKyISlmYdiIiEppthIiJhJa+hVdCKSNokL2kVtCKSMgpaEZGwdDNMRCQsV0crIhKaZh2IiISloQMRkdCSF7TJ67FFRLrFurB0Ym9mGTP7nZk9lH9+pJktNrMXzOxOM+tTaB8KWhFJF7POL51zMbCm3fMfAje5+zFAAzCl0A4UtCKSMsXraM1sCPBR4Jb8cwNGA/fkXzILmFBoPxqjFZGUKWr/+GPgcqAq//xQoNHdW/PPXwEGF9pJSYK2subwUhzmzWGix64gMSr71cQuITEqDx0au4T06MKsAzOrA+rarap39/r8trOBze6+3MzO6E5J6mhFpMfKh2r9ATaPAj5mZmcBBwOHAD8Bqs2sV76rHQKsL3SckgTtznW/LcVhEq186PsBaGrcErmS+CqrBwCQfXh05Eriy4z9DQBNDRsjVxJf8X7yLc70Lnf/NvBtgHxHe5m7f8bM7gY+AdwBTAYeKLQv3QwTkXQp/qyDfV0BXGpmL5Abs51Z6A0aOhCRlCl+/+juC4AF+cdrgZO78n4FrYikiz6CKyISmoJWRCQwBa2ISGAKWhGRsDRGKyISlidw1qqCVkTSRR2tiEhoCloRkcAUtCIigSloRUTC0hitiEhoCloRkbDU0YqIhKagFREJTEErIhKWhg5ERELTR3BFRAJTRysiEpaGDkREQlPQiogEpqAVEQlLQwciImG5OloRkdAUtCIiYWnoQEQkNAWtiEhgCloRkbBMH8EN7pp/vI2Fi1fSv7qKe275HgA33Xw3Cxc9Te9eGYa85TCmf+sCqirLI1daWhs3beIfrrmW115rwAz+fsJ4Jp5/XuyySi7b5nzyhlep7VfGzy4cwFW3N7D6T824w7DDenHdpBoqDkrehRrKnj17mHrh12lubiGbzTJm9Af48tQvxC6rm5LX0abuO+qcD4/iX35wyd+sO2XkO7n7lunc9e/TOWJILbfO/VWc4iLKZDJ84+Kvcc+dt/PzmfXcfc99rF37YuyySm72/CaOrt3bX0z7eD/u/3Yt/3llLYNqMsx5dEfE6kqvT58+/NtPb+KO/7iVObNn8sSTS3hm1erYZXWPWeeXDndjB5vZEjN72sxWm9n0/PojzWyxmb1gZneaWZ9CJaUuaEf+3dvpV1XxN+tOPek4emUyABx/7FFserUhRmlRDRwwgGOHvwOAiooKjhx2BJtffTVyVaW1sSHLo6v3cO5pe78/KvvmLgF3Z3dLIm9YB2VmlJfnfrprbW2ltbWVJHaEXWNdWDq0Bxjt7u8CTgDGmdkpwA+Bm9z9GKABmFJoR284aM3sgjf63pgemPcYo04eEbuMqP785w0894fnGXHccbFLKakZ9zZy2YRDKNvn+rpydgOnX7mRFze18JkPVOz/zSmWzWb59KQpjP3IBE45+SSOH/HO2CV1U3GC1nOa8k975xcHRgP35NfPAiYUqqg7He30A20wszozW2Zmy+rr67txiOK65faHyGQynDXmlNilRLNz506+Ne07XPaNr1NZ2XNCZcEzu+hfleG4t/7/n/Kun1TDgusO56jDe/Pr5bsiVBdXJpNh7uyZ/PrBu1n17Bpe+OPa2CV1U+eDtn1W5Ze6v9mTWcbM
ngI2Aw8DfwQa3b01/5JXgMGFKurwZpiZrezgK6k90PvcvR74S8L6znW/LVRHcA/+9+MsXLSSm//xm1hP+/kwr6W1lW9N+w4fGfchRn/wjNjllNSKtc3Mf2YXC1fvZk+Ls2O3c/ms17hhcn8AMmXGWSP7MvPh7Xz81J7zP6D2qqqqOGnkiTyxaAnHHH1U7HLeuC7MOtgnq/a3PQucYGbVwP3A8DdSUqFZB7XAh8mNQ7RnwBNv5IAxPL5kFT+/cx633Hg5fQ8+KHY5Ubg7137/Bxw57Ag+O/H82OWU3KXj+3Hp+H4ALPnDHm57ZDs//FwNL7/ayhEDe+Hu/Gblbo6s7R250tJqaGikV68MVVVV7N69h8VLljF50sTYZXVT8Rspd280s/nAqUC1mfXKd7VDgPWF3l8oaB8CKt39qX03mNmCrpcb3rTr6ln+9O9p3NbEh8//Fl+e/DFum/srmltaufCKG4HcDbGrLpkUudLSeurplfzy1/M45pij+fRnJwNw0YVf4n2jTotcWTzuufHZpl1tOPCOwb25+lPVscsqqS1btnL1tdeTzbbh7pw55gxOf9+b/HuiSDlrZgOBlnzI9gXGkrsRNh/4BHAHMBl4oOC+3L04VR1YIoYOYisf+n4Amhq3RK4kvsrqAQBkHx4duZL4MmN/A0BTw8bIlcRXWXM4FCEmm5df1elQ6zPy+wc8npn9HbmbXRly97PucvfvmdlR5EK2P/A74LPuvqej46TuAwsi0tMVp6V195XAiftZvxY4uSv7UtCKSMok72a3glZE0kW/60BEJCz9hQURkdASOE9eQSsiKaOgFREJTEErIhKWboaJiISmjlZEJDAFrYhIWJp1ICISWvKCNnmjxiIiKaOOVkTSRbMORERCS97QgYJWRFJGQSsiEpZmHYiIhKagFREJTEErIhKWZh2IiISmjlZEJCjXzTARkdAUtCIigSloRUTC0tCBiEhoCloRkcAUtCIiYWnoQEQktOQFbfI+QiEi0i3WhaWDvZgNNbP5Zvasma02s4vz6/ub2cNm9nz+35qCFbl7t76kTgh+ABFJjW63o7ufu63TmXPw8AsOeDwzGwQMcvcVZlYFLAcmAJ8HXnP3GWY2Dahx9ys6Oo46WhFJmeJ0tO6+wd1X5B9vB9YAg4HxwKz8y2aRC98OlWSMNjvvtFIcJtEy454AYMeWl+IWkgAVA4YBOhew91xk550at5AEyIx7skh7Kv4YrZkNA04EFgO17r4hv2kjUFvo/epoRSRdzDq9mFmdmS1rt9T9/91ZJXAvcIm7v95+m+fGXgsOVWjWgYikTOc7WnevB+oPuCez3uRC9nZ3vy+/epOZDXL3Dflx3M2FjqOOVkTSxco6v3S0GzMDZgJr3P3GdpseBCbnH08GHihUkjpaEUmZoo3RjgImAc+Y2VP5dVcCM4C7zGwK8DJwXqEdKWhFRPbD3R/jwKk9piv7UtCKSMok75NhCloRSRX9hQURkeAUtCIiYemv4IqIhKaOVkQkMAWtiEhYuhkmIhKaglZEJDAFrYhIWJp1ICISmjpaEZHAFLQiImElcNZB8gYzRERSRh2tiKRM8vpHBa2IpEsChw4UtCKSMgpaEZHAFLQiImFp6EBEJCxXRysiEpg+gisiEpo6WhGRwBS0IiKBKWhFRMLSrIPSybY5n/zRNmr7lfGzLx3Clbc3sfSFFir75v4jXD+xkmOHpPbL36+Pnvs5Ksr7UlZWRiaT4fZbfxq7pGh0Lv5yjbyev0aq8tdIa7trpOJNeo0oaEtm9qO7Obo2Q9Nu/+u6y8aX8+ETDopYVXw3//MN1FT3i11GIvT0c3Hga6RPxKqKIIEdbfLmQRTBxsYsj65u5txTD45dikgibWxs49HVLZx7ahobD+vCUhoFg9bMhpvZGDOr3Gf9uHBldc+M+3Zy2fgKyvY5jz/55U4mzGhkxn07aG71/b85xczgom9cycQvXMS9D/wqdjlR9fRzMeO+HVw2vvwA
18i2N/k1kryg7XDowMy+DlwErAFmmtnF7v5AfvP1wLzA9XXZglXN9K80jhvaiyXPt/x1/TfOLmfAIUZLFq6+Ywe3/O8uvjKuPGKlpXfrz27ksIEDeK2hkQsvmcawI4Yy8oTjY5cVRU8+F7lrpKwT18huvjKub8RK36jiBaiZ3QqcDWx29xH5df2BO4FhwEvAee7e0NF+CnW0U4GR7j4BOAP4rpld/JcaOiiuzsyWmdmy+vr6gl9MMa14sYX5q1o4c3oD35y1ncXPt3D5L7YzsF8ZZkafXsbfv/cgnnm5taR1JcFhAwcA0L+mmg+ePorVzz4XuaJ4evK5WPFiK/NXNXPm9Ea+Oaspf400pecaMev8UtjPgX1/ep8GPOLubwMeyT/vUKGbYWXu3gTg7i+Z2RnAPWZ2BB0ErbvXA39JWM/O+3mhOorm0nMquPScCgCWPN/Cbb/ZxQ2fq+LVbW0M7FeGu/PIM828bVCmZDUlwa5du2lra6Oiopxdu3azaMlypl7wmdhlRdHTz8Wl55Rz6Tm5n+Zy18hubvhcZYqukeLdenL3hWY2bJ/V48k1ngCzgAXAFR3tp1DQbjKzE9z9qfxBm8zsbOBW4E31c9bls7fzWpPjDsMHZ7j6U5WF35QiW19r4JtXTgcg25pl3Ic+yKhT3hO5qjh0Lvbv8tlN+1wjb9KhtS7MOjCzOqCu3ar6fKPYkVp335B/vBGoLXgc9wMPeJvZEKDV3TfuZ9sod3+80AEAz847rRMvS7fMuCcA2LHlpbiFJEDFgGGAzgXsPRfZeafGLSQBMuOehCIMsO7887JO38Urf8tJBY+X72gfajdG2+ju1e22N7h7TUf76LCjdfdXOtjWmZAVESmx4LMJNpnZIHffYGaDgM2F3pDKebQi0oMV92bY/jwITM4/ngw80MFrAQWtiKRO8ebRmtlc4EngHWb2iplNAWYAY83seeDM/PMOpfYjuCLSM3lxZx18+gCbxnRlPwpaEUkX/a4DEZGeRx2tiKRM8jpaBa2IpEsChw4UtCKSMgpaEZHAknfrSUErIumioQMRkdAUtCIigSloRUTC0tCBiEhouhkmIhKWOloRkdAUtCIigSUvaJM3mCEikjLqaEUkXTRGKyISVjF/8XexKGhFJF3U0YqIhKagFREJTEErIhKYglZEJCyN0YqIhKZZByIiYamjFREJTUErIhKYglZEJKwEDh2Yu4c+RvADiEhqdDslm7Y1dDpzKvvVlCSVSxG0iWBmde5eH7uOJNC52EvnYi+di3CSNw8inLrYBSSIzsVeOhd76VwE0pOCVkQkCgWtiEhgPSloNfa0l87FXjoXe+lcBNJjboaJiMTSkzpaEZEoFLQiIoGlPmjNbJyZ/d7MXjCzabHricnMbjWzzWa2KnYtMZnZUDObb2bPmtlqM7s4dk2xmNnBZrbEzJ7On4vpsWtKo1SP0ZpZBvgDMBZ4BVgKfNrdn41aWCRmdjrQBPzC3UfEricWMxsEDHL3FWZWBSwHJvTE7wszM6DC3ZvMrDfwGHCxuy+KXFqqpL2jPRl4wd3XunszcAcwPnJN0bj7QuC12HXE5u4b3H1F/vF2YA0wOG5VcXhOU/5p7/yS3u4rkrQH7WBgXbvnr9BDLyjZPzMbBpwILI5cSjRmljGzp4DNwMPu3mPPRShpD1qRAzKzSuBe4BJ3fz12PbG4e9bdTwCGACebWY8dVgol7UG7Hhja7vmQ/Drp4fLjkfcCt7v7fbHrSQJ3bwTmA+Mil5I6aQ/apcDbzOxIM+sDnA88GLkmiSx/A2gmsMbdb4xdT0xmNtDMqvOP+5K7cfxc1KJSKNVB6+6twFeB/yZ3w+Mud18dt6p4zGwu8CTwDjN7xcymxK4pklHAJGC0mT2VX86KXVQkg4D5ZraSXGPysLs/FLmm1En19C4RkSRIdUcrIpIECloRkcAUtCIigSloRUQCU9CKiASmoBURCUxBKyIS2P8BFXZJt2Hl094AAAAASUVO
RK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Let's visualize our annotation (each column is an annotation)\n",
    "ax = sns.heatmap(annotations, annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scoring All Annotations at Once\n",
    "Let's calculate the scores of all the annotations in one step using matrix multiplication. Let's continue to us the dot scoring method\n",
    "\n",
    "\n",
    "$$\\text{score}\\left(h_t,  \\overline{h}_s\\right) = \n",
    "\\begin{cases}\n",
    "    h^T_t\\overline{h}_s       &  \\quad \\text{dot}\\\\\n",
    "    h^T_t W_a\\overline{h}_s  & \\quad \\text{general } \\\\\n",
    "    v_a\\tanh\\left(W_a [h^T_t, \\overline{h}_s] \\right) & \\quad \\text{concat }\n",
    "  \\end{cases}$$\n",
    "\n",
    "\n",
    "\n",
    "To do that, we'll have to transpose `dec_hidden_state` and [matrix multiply](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html) it with `annotations`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([927., 397., 148., 929.])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def dot_attention_score(dec_hidden_state, annotations):\n",
    "    # TODO: return the product of dec_hidden_state transpose and enc_hidden_states\n",
    "    return np.matmul(np.transpose(dec_hidden_state), annotations)\n",
    "    \n",
    "attention_weights_raw = dot_attention_score(dec_hidden_state, annotations)\n",
    "attention_weights_raw"
   ]
  },
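  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check of the matrix product above, the four scores can be worked out by hand. This sketch assumes `dec_hidden_state` is $[5, 1, 20]$; that value is not shown in this part of the notebook, but it is consistent with the scores printed above.\n",
    "\n",
    "$$\\begin{bmatrix} 5 & 1 & 20 \\end{bmatrix} \\begin{bmatrix} 3 & 59 & 1 & 4 \\\\ 12 & 2 & 43 & 3 \\\\ 45 & 5 & 5 & 45.3 \\end{bmatrix} = \\begin{bmatrix} 927 & 397 & 148 & 929 \\end{bmatrix}$$\n",
    "\n",
    "For example, the first entry is $5 \\cdot 3 + 1 \\cdot 12 + 20 \\cdot 45 = 927$, the same value returned by `single_dot_attention_score` earlier."
   ]
  },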
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at these scores, can you guess which of the four vectors will get the most attention from the decoder at this time step?\n",
    "\n",
    "## Softmax\n",
    "Now that we have our scores, let's apply softmax:\n",
    "\n",
    "\n",
    "$$ \\begin{align} \\alpha_t(s) & = \\text{align}\\left(h_t,  \\overline{h}_s\\right)  \\\\  & = \\frac{\\text{score}(h_t,  \\overline{h}_s)}{\\sum_{s'} \\text{score}(h_t,  \\overline{h}_{s'})}  \\end{align}      $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1.19202922e-001, 7.94715151e-232, 5.76614420e-340, 8.80797078e-001],\n",
       "      dtype=float128)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def softmax(x):\n",
    "    x = np.array(x, dtype=np.float128)\n",
    "    e_x = np.exp(x)\n",
    "    return e_x / e_x.sum(axis=0) \n",
    "\n",
    "attention_weights = softmax(attention_weights_raw)\n",
    "attention_weights"
   ]
  },
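  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `softmax` above relies on `np.float128`, which is not available on every platform (Windows builds of NumPy, for instance, do not provide it). A minimal sketch of a common alternative, assuming plain `float64` is acceptable: subtract the largest score before exponentiating. Softmax depends only on the differences between scores, so the shift leaves the weights unchanged while keeping `np.exp` from overflowing. The name `stable_softmax` is hypothetical and not part of the original notebook.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def stable_softmax(scores):\n",
    "    # Hypothetical helper: softmax in float64 without overflow.\n",
    "    scores = np.asarray(scores, dtype=np.float64)\n",
    "    # Shifting by the maximum makes the largest exponent exp(0) = 1.\n",
    "    shifted = scores - np.max(scores)\n",
    "    exp_scores = np.exp(shifted)\n",
    "    return exp_scores / exp_scores.sum()\n",
    "\n",
    "# Same raw scores as in the cell above\n",
    "weights = stable_softmax([927.0, 397.0, 148.0, 929.0])\n",
    "print(weights)\n",
    "```\n",
    "\n",
    "The first and last weights should come out near 0.119 and 0.881, matching the `float128` result. The two middle weights are so small that they may underflow to zero in `float64`."
   ]
  },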
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even when knowing which annotation will get the most focus, it's interesting to see how drastic softmax makes the end score become. The first and last annotation had the respective scores of 927 and 929. But after softmax, the attention they'll get is 0.119 and 0.880 respectively.\n",
    "\n",
    "# Applying the scores back on the annotations\n",
    "Now that we have our scores, let's multiply each annotation by its score to proceed closer to the attention context vector. This is the multiplication part of this formula (we'll tackle the summation part in the latter cells)\n",
    "\n",
    "$$ c_i = \\sum_{j=1}^T \\alpha_{ij} h_j  $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3.57608766e-001, 4.68881939e-230, 5.76614420e-340,\n",
       "        3.52318831e+000],\n",
       "       [1.43043506e+000, 1.58943030e-231, 2.47944200e-338,\n",
       "        2.64239123e+000],\n",
       "       [5.36413149e+000, 3.97357575e-231, 2.88307210e-339,\n",
       "        3.99001076e+001]], dtype=float128)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def apply_attention_scores(attention_weights, annotations):\n",
    "    # TODO: Multiple the annotations by their weights\n",
    "    return attention_weights * annotations\n",
    "\n",
    "applied_attention = apply_attention_scores(attention_weights, annotations)\n",
    "applied_attention"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualize how the context vector looks now that we've applied the attention scores back on it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVoAAAD4CAYAAACt8i4nAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAibUlEQVR4nO3deXgUVdbH8e9JggoJBBgTjIIkIAqIioqIg4iCoOOGKOoICjKOUdlHwW3cEBfGAXFBFFDGZRRU1FGZcUFFEVEUGOQVQXFBEDEJw5YECaRz3z+6CYmEdJau7qb4fZ6nHrpruX26qJyc3LpVZc45RETEOwmxDkBExO+UaEVEPKZEKyLiMSVaERGPKdGKiHgsKQqfoWENIlJVVusWnreq55y+rvafVwXRSLQUbN4QjY+JaympjQEo2LwxxpHEXkpqI0D7Anbti8K8lTGOJPaS01rFOgTPRCXRiohET1SK1GpRohURf7H4O/WkRCsiPqOKVkTEY0q0IiLeMiVaERGPxV+ijb9eYxERn1FFKyL+olEHIiLectXoOohWJ4MSrYj4i06GiYh4TYlWRMRjSrQiIt7SyTAREa+pohUR8ZgSrYiItzTqQETEa0q0IiIeU6IVEfFWHI46iL+IRER8RhWtiPiMug5ERLwVh6MO1HUgIj5j1ZgqacXsADP7zMy+MLNlZjY6NP8pM/vBzJaEpvbhIlJFKyL+ErmTYUVAN+dcgZnVAeaZ2ZuhZaOcczOr2pASrYj4TGS6DpxzDigIva0TmlxN2lLXgYj4TNW7Dsws28wWlpmyy7VklmhmS4BcYLZzbkFo0T1mttTMJpjZ/uEiUkUrIr5SnScsOOemAFMqWR4A2ptZQ+BVM2sH3Az8AuwX2vZG4K7KPkcVrYj4i1nVpypyzm0C5gBnOufWuaAi4B9Ax3DbK9GKiM9EbNRBWqiSxczqAj2AFWaWEZpnwPnAl+EiUteBiPhL5EYdZABPm1kiwaL0RefcLDN738zSCGbqJcA14Rra6xPt/E8+Ydz4BwmUBDi/13kMHNC/3PKZL7/CizNfJjEhkbr16nLrzTfRokUWACtXfss99/2NwsJCLMF49qlp7L9/2H7tqAsEAlw+YCBpaWk8NGF8uWXjH3iQhYsWAbBt2zY2bNzIh++/W+W2//PWWzz9zLM4B8n16nHzjTdw+OGtKCoq4qqrr2X79u0EAgG6d+/GNdlXAbB27c/cfOutbN68hTatj2DM6DupU6dO5L5wFASPmwkESkoqPG78qKhoO38eciPbt+8gECih+2mdufbKfuXWef0/7/LgpGmkH/g7AC658Bx6n3tGLMKthYiNOlgKHFvB/G7VbWuvTrSBQICx949n0sSHaJKezuUD/kTXLl1KEynAmWecQZ8LLwDgw7kf8cCDDzHx4QcpLi7m1jvuZMydd3D44a3YtGkzSUnxuTumz3iBzMxMCgsLd1t2/XUjSl/PeOFFvv7mm2q1fcjBBzP18cdo0KABH8+fz9333ccz/5jGfvvtx+OTJlKvXj12FBdz5VXZdD7pJI46qh0PT3yUfpdeyhk9e3DvfX/jX6+9zkV9Lqzt14ya4HEzjkkTHw4dNwN3O278aL/96jD5oXupV69u8P/02hvofOLxHN2udbn1enbrwk3XXRujKCNBV4ZF1LJlX9GsaVOaHnIIderUoWfP0/lg7txy66SkJJe+/vXXX7FQB/inCz6j1WGHcfjhrQBo2DCVxMTE6AVfRTk5ucz7eD7n9zov7LpvvzObM3r2KH3/zLP/5PIBA7mkbz8enzK1wm2OOfpoGjRoAMBR7dqRm5sHgJlRr149AIqLiykuLgYD5xyfL1xI926nAXDO2WfxwYdzK2w7Xu1+3PTY7bjxo+D/aV0g9H8aCJT+PIi3wpZwZtYa6AUcEpq1FnjdObfcy8CqIjcvjyZN0kvfN0lP58tly3Zb78WXZvLP52dQvGMHj0+aCMDq1asx
MwYPHcHGTRs5o0cPBvS/LGqxV9X4CRMYPnQIhVt3r2bLWrduHWt//pkTOnQA4JNPF7B6zRqeeWoazjn+cv0oFi/+L8cdt9tfQqX+9fob/P6kTqXvA4EAl/W/gjU//cTFfS7kqHbt2LhpE/Xr1y+t/tObpJOXlxeBbxo9VT1u/CgQCNDvyhGsWbuOi3ufzVFHHrHbOu9/OJ/FXyyjebODuX7oVRzUJC0GkdZCHP7yqLSiNbMbgRkEa/HPQpMB083sJu/Di4yLL+rD66/OZOiQQTwx7R8AFAcCLFnyBXePuZMnp05mzgcf8tlnn8c40vLmfjSPRo0a0aZN67Drvv3ObE7vdlppVf7pggV8umABfS/rT7/LB7Dqxx9ZvWbNHrf/fOEiXnv9dYYNGVI6LzExkenPPcubs17ny6++4tvvvqv9l5KYSkxMZMZTj/DWK0+xbPk3fPv9qnLLT+nckVkvTePFpydyYodjuf2eCbEJtFYiM+ogksJVtFcCRzrndpSdaWYPAMuAsRVtFLq6Ihtg8uTJ9L2kTwRC3V16Who5Obml73Nyc0lL2/Nv3zN69uC+v/0dCFYxxx7bnkYNGwLQufNJrPj6azp2PMGTWGvii6VLmfvRR3w8fz7bi7ZTUFjIrbffwd13jd5t3Xdmv8uNN4wsfe+cY+CAAVx4Qe9y67340kxe/ddrADz84AOkpaWxcuVKxtxzL488OIGGDVN3a7t+/fp0OP545n/yKZf360t+fj7FxcUkJSWRm1P5Po9H1T1u/Kh+/RQ6HHc08z9dzGEtMkvnN0xtUPq697k9efixf8QgulraC2/8XQIcXMH8jNCyCjnnpjjnOjjnOmRnZ+9ptVpr27YNa9asYe3an9mxYwfvvPMuXbt0KbfO6tW7qrh5H3/Moc2aAXBSpxP59rvv+HXbNoqLi1m8+L9kZcXXyZChgwfx5qw3mPXav7j3njGc0KFDhUn2h1Wr2JK/haOPOqp03kmdOvHaG2+wdetWAHJzc9mwYQMXX9SH6c89y/TnniUtLY11v/zCyBtvZszoO2je/NDS7Tdu3Eh+fj4QHM2wYMFnZDZvjpnR4fjjee/9OQDM+vd/6Nq1/D6Pd7sfN7N3O278aOPGzeTnBy/d31ZUxKef/5fM5k3LrZO3fkPp6w/nLSCzebOoxhgZe19FOwJ4z8xWAjsz1qHAYcCQPW0ULUlJSdww6nqGDBtBoKSEXueeQ8uWLXhs8hTatmlD11O68MJLM/nss89JSkqifoP6jL7jNgAaNGjAZX0vpf+AP2FmdP79SXQ5uXOMv1HVBL9fa7qecgoA77wzm549epQ7sXFSpxP5YdUqrrgyOCSrXt26jLnrTho3blyuralPPMnmzZsZG6r0ExMT+eczT7F+/XruGD2GQEkAV+I4/fTunNLlZACGDR3MLX+9jUmPT+aIww/n/PPCn6iLJ8HjZiRDhg0vd9z4Xd7/NnDHPcEhba6khB7dunBK54489sQ/adu6FV1PPpEZM1/nw3mfkZiYQGqD+oz+64hYh10D8ddHa8Eb1FSyglkCwUvMyp4M+zx0DXBVuILNG8Kv5XMpqcEEV7B5Y4wjib2U1EaA9gXs2heFeStjHEnsJae1gghkyZJZ7ap8h62Ec76MSlYOO+rAOVcCfBqFWEREIiD+Ktr4HKEvIlJj8XcyTIlWRPwlDsfRKtGKiM8o0YqIeEyJVkTEUy4Ouw7ir9dYRMRnVNGKiM/EX/2oRCsi/hKHXQdKtCLiM/GXaOOvxhYRqZWIPZzxADP7zMy+MLNlZjY6ND/LzBaY2bdm9oKZ7RcuIiVaEfGXyD1uvAjo5pw7BmgPnGlmnYC/AROcc4cBGwneTrZSSrQi4jORqWhdUEHobZ3Q5IBuwMzQ/KcJPnK8Ukq0IuIvllDlycyyzWxhmancDbTNLNHMlgC5wGzgO2CTc644tMpP7Lqz4R7pZJiI+EzVT4Y556YAUypZHgDam1lD4FUg/HOlKqCKVkR8
JvJPWHDObQLmACcBDc1sZ5HalOA9uiulRCsiPhOxUQdpoUoWM6sL9ACWE0y4Ox+EOAB4LVxE6joQEX+J3AULGcDTZpZIsCh90Tk3y8y+AmaY2d3Af4EnwzWkRCsiPhOZP9Sdc0uBYyuY/z3Bx3tVmRKtiPhL/F0YpkQrIn4Tf5lWiVZEfEaJVkTEW7p7l4iIt5wqWhERr8Xf5QFKtCLiL+o6EBHxmhKtiIjHlGhFRLylrgMREa8p0YqIeGwfHXWQkto4Gh+zV0hJbRTrEOKG9sUuyWmtYh2Cf8Rh10H8pX4REZ+JSkVbsGFdND4mrqU0zgCgYPPGGEcSezsrWe2LXfuicP2q2AYSB5IPzIxQS/FX0aqPVkT8JQ67DpRoRcRn4q9HVIlWRPwlDiva+Ev9IiK1ErGHMzYzszlm9pWZLTOz4aH5d5rZWjNbEprOCheRKloR8ZmIVbTFwPXOucVmVh9YZGazQ8smOOfGVbUhJVoR8ZnIJFrn3DpgXeh1vpktBw6pSVvqOhARfzGr+lTlJi2T4BNxF4RmDTGzpWY2zczCXnmjRCsivuJIqPJkZtlmtrDMlP3b9swsBXgZGOGc2wI8BrQE2hOseMeHi0ldByLiL9WoVJ1zU4Ape27K6hBMss85514JbZNTZvlUYFa4z1FFKyI+E7FRBwY8CSx3zj1QZn5GmdV6A1+Gi0gVrYj4TMRGHXQGLgf+z8yWhObdAlxqZu0BB6wCrg7XkBKtiPhMxEYdzNtDY/+pbltKtCLiL3F4ZZgSrYj4jBKtiIi3VNGKiHhNiVZExGNKtCIi3lLXgYiI1+LvOiwlWhHxGVW0IiLeUteBiIjXlGhFRDymRCsi4i11HYiIeMupohUR8ZoSrYiIt9R1ICLiNSVaERGPKdGKiHjLdAmup0bf/Tc+mv8JjRs15MXnntrjesu+WsHA7EHce9ftnN7t1KjFV12jx9zNR/M+pnGjRrw44/kK11m4aBHjH3iQ4uJiGjZsyNTJj1W5/U8XLOCRRyexY0cxdeokMXzoUDqe0AGAIcNGsH79egKBAMe2b8+NN4wkMTGR2e++x5SpT/DDqlU8849ptG3bJiLfNdrmf/IJ48ZPIFBSwvm9zmPggP6xDslzv+TkcvuYv/O/jZsw4IJeZ9H34t67rbdw8ReMe+jx0DGVyhOPjot+sLWiitZT5559Jhdf1Js77rp3j+sEAgEenjSZTh1PiGJkNXPu2Wdz8UV9uOPOuypcnp+fz9j7/84jDz1IxkEHsWHDhmq137BhQx4cP460tDS+/e47hgwbwVv/fgOAsffeQ0pKMs45brjpZt59733O6NmDw1q24O/3j+Xe+8bW+vvFSiAQYOz945g08WGapKdz+YCBdO3ShRYtsmIdmqcSExP5y9Bs2hzRisLCrfS7cgidTjiOFlnNS9fJzy/gvvETmTj+HjIOSmfDxk2xC7imInQyzMyaAc8ATQg+iHGKc+4hM2sMvABkEnw448XOuY2VtRV/NXYtHHfsMaQ2qF/pOi+89ArdTz2FRo0aRieoWjjuuGNJbdBgj8vffPttup16KhkHHQRA48aNS5f958036X/Fn7i03+Xcc99YAoHAbtu3PuII0tLSAGjZogVFRUVs374dgJSUZACKAwF27NhReuxmZWWR2bz5bm3tTZYt+4pmTZvS9JBDqFOnDj179uCDuXNjHZbn0g78HW2OaAVAcnI9spo3Izdvfbl13pw9h25dO5NxUDoAjfeCn5PdReZx40AxcL1zri3QCRhsZm2Bm4D3nHOtgPdC7ytV40RrZgNrum2s5ObmMefDefS5oFesQ4mI1avXsCU/n+xrrqVf/wHM+nfw4Zw//PAD78x+lyefmML0554lISGBN996u9K23nt/Dq2POJz99tuvdN7gocPpccYfqFcvme7dunn6XaIpNy+PJk3SS983SU8nLy8vhhFF38/rfuHrld/R7sjW5eb/uPontuQXcNWQUfT902BmvTk7RhHW
RmQSrXNunXNuceh1PrAcOAToBTwdWu1p4PxwEdWm62A08I+KFphZNpANMHnyZPr2ObcWHxM54x6cyLDB2SQk+KOQDwQCLF+xgscfnci2oiIGXvlnjmrXjs8+X8jyFV/Tf0Dwd2FRURGNGzXaYzvfffc9D098lEcfeajc/EcfeYiioiJuvf0OPl+4kE4nnujp95Ho2Lr1V0b+dQzXD7uGlOTkcsuCx9RKJj/8N7YVFXHF1SM46sg2ND+0aYyirYmqdx2UzVUhU5xzUypYLxM4FlgANHHOrQst+oVg10KlKk20ZrZ0T4sqazwU6M5gXcGGdXtaNaqWr/iam28L9ndu2ryZjz9ZQGJiIqd17RLjyGomPT2d1NRU6tatS926dTmu/bF8s3IlzjnOOfsshg4eVG799+d8wNQnngTgtr/eQtu2bcjJyWXkDTdy152306zp7j9M+++/P11POYUP537km0SbnpZGTk5u6fuc3NzSLhS/21FczMi/juGsnt3ofurJuy1vkp5GamoD6tY9gLp1D+C49kfxzbff712JthqjDn6TqypuziwFeBkY4ZzbYmX6gJ1zzsxcuM8JF1EToD9wbgXT/8I1Hm/eeGUGs159gVmvvkD307py08gRe22SBTj1lC4sWfIFxcXF/LptG18uW0ZWViYdTziB995/v/Tk2ObNm1m3bh3dTjuV6c89y/TnnqVt2zbk5+cz/C/XMXTIINofc0xpu1u3biVvfbDvrri4mHkff7zX98uW1bZtG9asWcPatT+zY8cO3nlnNl277L3HQVU557jrvgfIat6My/54YYXrdO1yEkuWLqO4OBA6plaQlXlolCOtrYj10WJmdQgm2eecc6+EZueYWUZoeQaQu6ftdwrXdTALSHHOLakggA/CRhllt9x+FwsXL2HTps384bw+XP3ngRQXFwPslf2yt9x6GwsXLWbTpk384Zxzufqqq3Z9nwsvICsri9+f1Ik/9ruMBEvg/F7ncVjLlgAMuuZqBg8dTokrISkpiZtGjSIjI6Nc+y+8+BJrfvqJqU9MY+oT04Bgd4FzjuuuH8X2HdtxJY4Oxx/HhRcEhwG9P+cD/j5+PBs3bmL4dddxeKvDd+tyiHdJSUncMGokQ4YNJ1BSQq9zz6FlyxaxDstzS5Yu499vvcdhLbP444BrARhy9UB+CVX3fXqfQ4vMQ/n9iR24ZMA1JJhx/rlncliLzBhGXQMRGt1lwdL1SWC5c+6BMoteBwYAY0P/vha2LefCVr21FTddB7GU0jiY5Ao2VzoKZJ+QkhrsL9a+2LUvCtevim0gcSD5wEyIQJrcvujWKie1/Y6/e4+fZ2YnAx8B/weUhGbfQrCf9kXgUOBHgsO7Kh1b6atxtCIikSppnXPzKmmse3XaUqIVEZ/RlWEiIt7SvQ5ERLylJyyIiHhNN/4WEfGaEq2IiMeUaEVEvKWTYSIiXlNFKyLiMSVaERFvadSBiIjX4i/Rxl+vsYiIz6iiFRF/0agDERGvxV/XgRKtiPiMEq2IiLc06kBExGtKtCIiHou/RBt/p+dERGrDEqo+hWvKbJqZ5ZrZl2Xm3Wlma81sSWg6K1w7SrQi4jORe9w48BRwZgXzJzjn2oem/4RrRF0HIuIrLoInw5xzc80ss7btqKIVEZ+pekVrZtlmtrDMlF3FDxliZktDXQuNwq2sRCsiPlP1ROucm+Kc61BmmlKFD3gMaAm0B9YB48NtoK4DEfEXj8fROudydn2UTQVmhdtGFa2I+ExET4bt3rpZRpm3vYEv97TuTqpoRcRnIlfRmtl04FTgQDP7CbgDONXM2gMOWAVcHa4dJVoR8ZfIjjq4tILZT1a3HSVaEfGZ+LsyTIlWRHxmH020KY0zwq+0j0hJDTvkbp+hfbFL8oGZsQ7BP3TjbxERr+2jFW3huiXR+Ji4lpzRHoCCzRtjG0gc2FnJal+Uqeqfj7/kEHV9XYQair99qYpWRPxFN/4WEfGaEq2IiLd0MkxE
xGvxV9HGX+oXEfEZVbQi4jPxV9Eq0YqIr0TyCQuRokQrIj6jRCsi4i2NOhAR8ZoqWhERjynRioh4SyfDRES8Fn+JNv56jUVEaiVyD2c0s2lmlmtmX5aZ19jMZpvZytC/YW+srEQrIv5iCVWfwnsKOPM3824C3nPOtQLeC72vlBKtiPhM5Cpa59xcYMNvZvcCng69fho4P1w7SrQi4jNVT7Rmlm1mC8tM2VX4gCbOuXWh178ATcJtoJNhIuIv1Rh14JybAkyp6Uc555yZhX00hCpaEZHqyTGzDIDQv7nhNlCiFRGfSajGVCOvAwNCrwcAr1UlIhER/zCr+hS2KZsOfAIcYWY/mdmVwFigh5mtBE4Pva+U+mhFxGcid8GCc+7SPSzqXp12lGhFxGfi78owJVoR8Rfd60BExFtOFa2IiMd0428REa+pohUR8ZgSrYiIx5RoRUS8pVEH3jv7kiEk1zuAhIQEEhMTeW7KfRWut2zFt1wx6Dbuu304p5/aKcpR1l4gEODyAQNJS0vjoQnjq7TNLzk53H7naDZs2IBh9O59Pn3/eAkAkx6fzIdz55JgCTRq3IjRt99GWloaP6xaxei77mbF118z6Npr6H9ZPy+/VtTM/+QTxo2fQKCkhPN7ncfAAf1jHVLUBUrgwqmH0qR+MZP7/syajUlc93IGm7YmcuTBRdzfex37JcY6yppQoo2KyRNup1HDBntcHgiU8NDk5+l0wtFRjCqyps94gczMTAoLC6u8TWJiIn8ZPow2rVtTWFjIZf2voFPHjrRokUX/yy5j0DVXB9t+4QWmPjGNW26+kdQGDRg18jo++OBDr75K1AUCAcbeP45JEx+mSXo6lw8YSNcuXWjRIivWoUXVMwsa0vLA7RQUBc/Sj3s3jSs6beLsdvncPiudmYtT6XvC5hhHWQNxWNHG3ziIKJjxypt0P+VEGjdMjXUoNZKTk8u8j+dzfq/zSuctX76Cq66+ln79BzB46HDy1q/fbbu0Aw+kTevWACQnJ5OVlUluXvDGQykpyaXr/frrttKioHHjxhzZti1JSf75nbxs2Vc0a9qUpoccQp06dejZswcfzJ0b67Ci6pctSXywMoU+xwUTqXPw6Q/1OKNtPgC9j9nCe1+nxDLEWojcjb8jJWyiNbPWZtbdzFJ+M/+3j3eIC2YweNQ99M2+iZffeHe35bl5G5gz73Mu6tUjBtFFxvgJExg+dAgJCcEDZUdxMfePG8/9Y+/luWeeptd55zDpsccrbePnn39mxdff0O7IdqXzHp30GGedcx5vvfU2115dlfsf751y8/Jo0iS99H2T9HTy8vJiGFH03ftWGqNOzyN0CLHx1wQaHBAgKZQRDmpQTM6WvfWXa/wl2kr3pJkNAwYDy4EnzWy4c27nLcHuBd7yOL5qm/bIXaSnNWbDxs1cO/JuMg89mOOPaVu6fNzEpxiW3ZeEhL2zmJ/70TwaNWpEmzatWbhoEQA//vgj333/HYOGDAMgUFLCgQf+bo9tbN26lVE33czI60aUq2QHD7qWwYOuZdpTT/PCSzO5Jvsqb7+MxMScb5JpnByg3cFFLFhVN9bheCD+ug7C/cq6CjjeOVdgZpnATDPLdM49RCXfJvQ4iGyAyZMn0+/cjpGKN6z0tMYANG6Uymknd2TZ8u/KJdqvvv6em+96GIBNm7cwb8F/SUxM5LQuJ0Qtxtr4YulS5n70ER/Pn8/2ou0UFBYyecpUWmS14KlpT5Rb95ecHP5y3UgALrygN30uvIAdxcWMuvFm/nDGGXQ77bQKP+MPZ57B8BHX+TbRpqelkZOz617NObm5pKWlxTCi6Fq8ui7vf53M3JVZFBUbBUUJ3PNWOlu2JVJcAkkJwa6FJg2KYx1qzcRhH224RJvgnCsAcM6tMrNTCSbb5lSSaH/zeAhXuG5J7SOtgl9/3UaJcyTXq8uvv27j04VLuar/heXWmTVjYunrO+6bRJeTjttrkizA0MGDGDp4EAALFy3i
2X8+z713j6HPJZeydOn/cfTRR7GjuJjVP66mZcsWTH/u2dJtnXOMGXMPWVmZXNavb7l2V69ezaGHHgrAhx/OJTOzefS+VJS1bduGNWvWsHbtz6Snp/HOO7O5Z8xdsQ4raq4/fT3Xnx7sw1+wqi7T5jdi/AW/MOylDN7+qj5nt8vn1S8a0O2IghhHWlPx99dquESbY2btnXNLAEKV7TnANOAor4Orrv9t3Mz1t40DgiMLzuzemc4ntmfma7MB6LMX98tWpk6dOtw/9l7+Pu4BCgoKCAQCXHrpJbRs2aLceku++IJ/v/kmhx3Wkkv7XQ4EuwtO7vx7Hnl0Ej/+uBpLMDIOOohbbroRgPXr/8flV1xBYWEhZglMnzGDl2bMKNflsLdJSkrihlEjGTJsOIGSEnqde85u+2pfNOr09fxlZgYPvv872mQUcdGxW2IdUs3EYUVrzu35uWJm1hQods79UsGyzs65j6vwGVGraONZckZ7AAo2b4xtIHEgJbURoH0Bu/YFz8dfcoi6vg4i0MG69eeFYR+WuFO9gztEZcdXWtE6536qZFlVkqyISJTF3y+tvXX8hohIxSLYdWBmq4B8IEDwr/sONWlHiVZEfCbiFe1pzrndrwCqBiVaEfEVF4ejDuIvIhGR2ojg48YBB7xjZotC1wfUiCpaEdlnlb24KmRK6DqAnU52zq01s3RgtpmtcM5V+8YYSrQi4jNV76P9zcVVFS1fG/o318xeBToC1U606joQEX+JUNeBmSWbWf2dr4GewJc1CUkVrYj4TMRGHTQBXrVgQk4CnnfO1ehGWkq0IuIzkflD3Tn3PXBMJNpSohURf4nDex0o0YqIzyjRioh4TIlWRMRb6joQEfFa/I1aVaIVEX9RRSsi4jUlWhERj8Vfoo2/zgwREZ9RRSsi/qI+WhERb8Xjjb+VaEXEX1TRioh4TYlWRMRjSrQiIh5TohUR8Zb6aEVEvKZRByIi3lJFKyLitfhLtPFXY4uI1IpVYwrTktmZZva1mX1rZjfVNCIlWhHxl8g9bjwReBT4A9AWuNTM2tYkpKh0HSRntI/Gx+wVUlIbxTqEuKF9UUZfF+sIfCRiXQcdgW9DT8PFzGYAvYCvqttQNBJtXHSYmFm2c25KrOOIB9oXu2hf7OKXfZGS2qjKOcfMsoHsMrOmlNkHhwBryiz7CTixJjHtS10H2eFX2WdoX+yifbHLPrcvnHNTnHMdykye/KLZlxKtiEh1rAWalXnfNDSv2pRoRUQq9jnQysyyzGw/4I/A6zVpaF8aR7vX9z1FkPbFLtoXu2hflOGcKzazIcDbQCIwzTm3rCZtmXM62yki4iV1HYiIeEyJVkTEY75PtJG6hM4PzGyameWa2ZexjiWWzKyZmc0xs6/MbJmZDY91TLFiZgeY2Wdm9kVoX4yOdUx+5Os+2tAldN8APQgONv4cuNQ5V+0rO/zAzE4BCoBnnHPtYh1PrJhZBpDhnFtsZvWBRcD5++JxYWYGJDvnCsysDjAPGO6c+zTGofmK3yva0kvonHPbgZ2X0O2TnHNzgQ2xjiPWnHPrnHOLQ6/zgeUErwLa57iggtDbOqHJv9VXjPg90VZ0Cd0++QMlFTOzTOBYYEGMQ4kZM0s0syVALjDbObfP7guv+D3RiuyRmaUALwMjnHNbYh1PrDjnAs659gSvfOpoZvtst5JX/J5oI3YJnfhLqD/yZeA559wrsY4nHjjnNgFzgDNjHIrv+D3RRuwSOvGP0AmgJ4HlzrkHYh1PLJlZmpk1DL2uS/DE8YqYBuVDvk60zrliYOcldMuBF2t6CZ0fmNl04BPgCDP7ycyujHVMMdIZuBzoZmZLQtNZsQ4qRjKAOWa2lGBhMts5NyvGMfmOr4d3iYjEA19XtCIi8UCJVkTEY0q0IiIeU6IVEfGYEq2IiMeUaEVEPKZEKyLisf8HvCA8GpFxBlIAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Let's visualize our annotations after applying attention to them\n",
    "ax = sns.heatmap(applied_attention, annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Contrast this with the raw annotations visualized earlier in the notebook, and we can see that the second and third annotations (columns) have been nearly wiped out. The first annotation maintains some of its value, and the fourth annotation is the most pronounced.\n",
    "\n",
    "# Calculating the Attention Context Vector\n",
    "All that remains to produce our attention context vector now is to sum up the four columns to produce a single attention context vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3.88079708,  4.0728263 , 45.26423912], dtype=float128)"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def calculate_attention_vector(applied_attention):\n",
    "    return np.sum(applied_attention, axis=1)\n",
    "\n",
    "attention_vector = calculate_attention_vector(applied_attention)\n",
    "attention_vector"
   ]
  },
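  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check on the vector above, note that the second and third attention weights are effectively zero, so the context vector is approximately a weighted average of the first and fourth annotations alone (weights rounded to four decimals):\n",
    "\n",
    "$$ c \\approx 0.1192 \\begin{bmatrix} 3 \\\\ 12 \\\\ 45 \\end{bmatrix} + 0.8808 \\begin{bmatrix} 4 \\\\ 3 \\\\ 45.3 \\end{bmatrix} \\approx \\begin{bmatrix} 3.881 \\\\ 4.073 \\\\ 45.264 \\end{bmatrix} $$\n",
    "\n",
    "Because the fourth annotation carries most of the weight, the result sits much closer to it than to the first annotation."
   ]
  },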
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAIYAAAEWCAYAAACjaO9mAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAPQUlEQVR4nO3df4xVZX7H8fd3BpFSQJeyIsI2bLqTgkrEKMQu64+Cv7qA2LR1/YFsGiM2AcuWKmBT6k7QrU264jbNGigrbAwVN+ua1km3rYuwBE2RHRAU6K4ImoUghAS2CwjsDN/+cc/AneFhztzLnPOce+fzSk6YOXfuuU8un3mec+6c7/OYuyPSVUPsBkgxKRgSpGBIkIIhQQqGBCkYEqRg1CEzazSzrWbWkny/ysz2mtl7yTY+7Rj9Mm+lxDAP2AUMKdv3pLv/sKcHyKPHcG1nt07M8Eq3tDfbzEYBU4EVaT/bnVx6jF/96lgeL1Nol102KK+XegFYAAzusv9ZM/s7YC2wyN1PdXcQnWNEZFbNZrPN7Gdl2+xzx7NpwCF3b+3yUk8BY4AJwFBgYWrbcvhbiavHONtjWPm+xsb0oaGr9vbOxyhnZn8PPAy0AQMonWP8yN1nlv3MbcAT7j6tu9dRjxFRNT1Gd9z9KXcf5e6jgfuBt9x9ppmNKL2eGXAv8EFa23RVElFDfr+Wq83s85R6rPeAv0h7goaSnISGkksvrXwoOXXqwkNJb1KPEVHa0BCTghFRkYOhk08JUo8RUZF7DAUjIrNqTvzzSZOCEZF6DAlSMCRIwZAgBUOCFAwJUjAkSMGQIAVDghQMCVIwJEjBkCAFQ4IUDAlSMCRIwZCgIgdDt/ZF1Nt1JeeOe161+xfNbJOZ7TazV82sf9oxFIyIsgoG56rdO/wDsNTdvwQcAR5JO4CCEVEWweha7Z5Un00GOqZA+D6larRuKRgRmXkV24WLmhMvUKp2P5N8/zvAUXdvS77fB4xMa5tOPiOq5uTT3ZcDy8PHO1ftnhQvV03BiCiDq5JJwD1m9lXOVbt/B7jczPolvcYoYH/agTSURJRTtftDwDrgT5Mf+zrwb2ltUzAiyvCqpKuFwHwz203pnON7aU+om6Hk1KlTPPbYo5w+fZr29namTJnC7Nmdq/0PHDjAkiXNHD16hCFDLqO5eQnDhw+P1OJsP+By9/XA+uTrPcDESp5fN9MguDufffYZAwcOpK3tNzz66CPMn/8k48aNO/szixYt4CtfuZlp06azefO7tLS8QXPzkszbBuFpEJqazlT85n/4YUMun5fWzVBiZgwcOBCAtrY22trazvuN3Lt3LxMmTADgxhsnsGHDT/NuZic5DiUVSw2GmY0xs4Vm9k/JttDMxubRuEq1t7fz0EMPcNdddzBx4k1ce+24To83NTWxbt1bAKxfv47jx49z9OjRCC0tqdlgmNlCYA2lLvDdZDPgFTNblH3zKtPY2Mjq1a/Q0vJjdu78gI8+2t3p8Xnz/ootW7Ywc+aDbNnSyhVXXEFjY2Ok1hY7GN2eY5jZL4Br3P03Xfb3B3a4e9MFnjcbmA2wbNmyG772tQd7r8U9tGLFcgYMGMDMmbOCj584cYL77vsTWlp+nEt7QucYY8dWfo6xa1c+5xhpVyVngKuAT7rsH8G5j1zP0+XTuVxOPo8cOUK/fv0YPHgwJ0+eZNOmTcya9fVOP9NxNdLQ0MCqVSuZPv2ezNvVnSL/2T0tGN8A1prZh8Avk32/C3wJmJthuyp2+PBhmpuf5syZds6ccW6//XZuvvkWli17kbFjr+aWW26ltbWV7373nwHj+uuvZ8GCwo2GhZF6uWpmDZSugTv+8LIf2Ozu7T18Dc3aR3goueaayoeSHTuKMZTg7meA/8mhLX1OLQ8lkiEFQ4JynBm4YgpGROoxJEjBkCAFQ4Kqm+czHwpGROoxJEjBkCAFQ4IUDAkqcjAK/Nlb/evtG3XMbICZvWtm
28xsh5k1J/tXaQnvGpJBj3EKmOzux8zsEmCjmXXciVTREt4KRkS9HQwv3UPRcY/DJclW1YclGkrqTDI3xnvAIeBNd9+UPPSsmW03s6VmdmnacRSMiKo5x0irdnf3dncfT6lGdaKZXUsVS3hrKImot6vdu/zcUTNbB9zt7v+Y7D5lZiuBJ9Kerx4jogyuSj5vZpcnX/8WcAfwv1rCu8ZkcFUyAvi+mTVS+qX/gbu3mNlblS7hrWBElMFVyXbg+sD+yZUeS8GIqMiffCoYESkYEqRgSJDuEpcg3donQRpKJEjBkCAFQ4IUDAkqcjAKfMEkManHiKjIPYaCEZGCIUEKhgT1+WAkE5NJF30+GBLW54Nx9Kimc7z88vN7zT4fDAlTMCRIwZCgIgdDH4lHlGO1u5bwriUZrFfSUe1+HTAeuNvMbkJLeNeW3g6Gl4Sq3bWEdy1paPCKt7Si5q7V7sBHaAnv2pJFUXOyXMj4pIb1dUpV7hVTMCLK8qqkrNr9D9AS3n3bBardd1HFEt7qMSLKsdp9J7DGzJ4BttKXlvCuRTlWu1e8hLeCEVGRP/lUMCJSMCRIwZAgBUOCFAwJUjAkSMGQIAVDghQMCVIwJEjBkCAFQ4IUDAkqcjB0o44EqceISBPASpCmjJagIp9jKBgRKRgSVORgFHiUq38ZFDV/wczWmdnOpKh5XrL/m2a2v2wJ76+mtU09RkQZ9BhtwF+7+xYzGwy0mtmbyWNLy5bZTFV3PUZ7ezsPP/wg8+fPO++xrVu3MGvWg3z5yxNZu/YnEVrXWQZFzQfcfUvy9a8pFRul1qmG1F0wXn31FUaPHh18bPjwK1m8uJk777w730ZdQAbTIJQd20ZTqjHpWMJ7brKE90tm9rm059dVMA4ePMjbb29kxox7g49fddVVNDU10dBQjLO+aoKRVu1eOq4NAl4DvuHu/we8CPwepTkzDgDfTmtb1ecYZvbn7r6y2udnYenSbzN37jxOnDgeuyk9kkW1u5ldQikUq939R8lzDpY9/i9AS9rrXEyP0dxN486mevny1GXIe8XGjRsYOvRzjB07NpfX6w0ZXJUYpbrUXe7+fNn+EWU/9sdc7BLeZrb9Qg8Bwy/0vC6p9jzm+dy2bRsbNmzgnXfe5tSp0xw/foynn/5bmpufyfy1q5XBVckk4GHg/WTyFIC/AR4ws/GUZtf5GHgs7UBpQ8lw4C5K8zaVM+CdHjc3B3PmPM6cOY8D0Nr6M1avfrnQoYBMipo3Uvq/6eo/Kj1W2lDSAgxy90+6bB8D6yt9sRiWLXuRDRt+CsDOnTuYNu2PWLv2Jzz33Le4//4/i9q2LK9KLrpt7pn/6TeXoaTokimjO/3XLl58uuI3f8mS/rnEo64uV6X36CPxiIr8RzQFIyIFQ4IUDAnSPZ8SpB5DgnQzsASpx5AgBUOCFAwJUjAkSMGQIAVDghQMCVIwJKjIwSjwZ28Sk3qMiIrcYygYERU5GBpKIsqx2n2omb1pZh8m//atEsVak8Fd4h3V7lcDNwFzzOxqYBGw1t2bgLXJ991SMCLKsdp9BqWlu6GHS3jrHCOiLM8xulS7D3f3A8lDn9JNFWEH9RgR5VjtfpaXColS7ylUjxFRNfd8VlPtDhw0sxHufiApcD6U9jrqMSLKq9od+HdKS3eDlvAuvhyr3Z8DfmBmjwCfAPelHUjBiCjHaneAKZUcS8GISHeJS1CRPxJXMCJSMCRIwZCgIgejwKc/EpN6jIiK3GPkEoxk/inpos8HQ8L6fDCK/AbkJTQ5YpHfF/UYESkYEqRgSJCCIUEKhgQpGBKkYEiQ5vmUIPUYEqRgSJCCIUFFDobux4goi6WvkgV3D5nZB2X7vlnp2u4KRkRZBANYBYSWol7q7uOTLXXxPA0lEWUxlLj7hqSg+aKox4goq6LmC+i7a7vXmmqC4e7L3f3Gsq0nS2Hnt7a7XLy8rkryXttdLlJGJ5+B1+nltd0lW1n0GGb2CnAbMMzM9gFP
A7f19trukqGMrkoeCOz+XqXHUTAiKvInnwpGRAqGBCkYEqRgSJCCIUG6tU+C1GNIkIIhQUUOhv5WIkHqMSIqco+hYESkYEiQgiFBRZ4yusBNuzgNDbBlC7zxRun7lSthzx7YurW0XXdd3PZBfjfqVKNue4x582DXLhgy5Ny+J5+E116L16auijyU1GWPMXIkTJ0KK1bEbkn3itxjpAbDzMaY2ZRkna3y/aGilkJ44QVYsADOnOm8/9lnYds2eP556N8/StM6qdlgmNlfUlom6XHgAzObUfbwt7JsWLWmToVDh0rnF+WeegrGjIEJE2DoUFi4ME77ytVsMIBHgRvc/V5KN5gu7lj9Fy64kk6nopjly3tS9tB7Jk2Ce+6BvXthzRqYPBlefhk+/bT0+OnTpRPRiRNzbVZQkYNhHpqZ9GzDbYe7X1P2/SDgh8BOYLK7j+/Ba3isk6xbb4UnnoDp0+HKK8+FY+lSOHmy1IvkJXmbO70TmzefqPjv7hMmDOz23TSzl4BpwCF3vzbZNxR4FRhN6S7x+9z9SHfHSesxDia3nQPg7seSFx0GjEt5bqGsXg3bt8P778OwYfDMM7FblFmPsYrzi5orXsI7rccYBbS5+6eBxya5+9s9aGi0HqNIQj1Ga2vlPcYNN3TfYwAkRc0tZT3Gz4HbytZdXe/uv9/dMbr9HMPd93XzWE9CId3I8RdGS3jXkmqGkouodge0hHeN6P0lvC9AS3jXkhwvV7WEdy3JsahZS3jXkhyLmkFLeNeOIl/GKxgRKRgSpGBIkIIhQQqGBBX5ZmAFIyL1GBKkYEiQgiFBCoYEKRgSpGBImIIhIQXOhYIRk4YSCVIwJEjzfEqQegwJUjAkSMGQIAVDgjIqH/gY+DXQTqnu+MZqjqNgRJRhj/GH7n74Yg6gYERU5KGkwDeX1b+Mipod+G8za6204LlT27qbH6OXaH4MwvNjHD58rOI3f9iwQWkz6ox09/1mdgXwJvC4u2+o9HXUY0SURVGzu+9P/j0EvA5UNduYghFRQ0PlW3fM7LfNbHDH18Cd9GC57hCdfEaUwRA7HHjdSgfuB/yru/9nNQdSMOqIu+8BemWW9FyCkf35bW0q8kl5HlclhWBms5NpiqQH+tLJZ9XX9H1RXwqGVEDBkKC+FAydX1Sgz5x8SmX6Uo8hFaj7YJjZ3Wb2czPbbWaps+5LSV0PJWbWCPwCuAPYB2wGHnD3nVEbVgPqvceYCOx29z3ufhpYA8xIeY5Q/8EYCfyy7Pt9yT5JUe/BkCrVezD2A18o+35Usk9S1HswNgNNZvZFM+sP3E9piQZJUdf3Y7h7m5nNBf4LaARecvcdkZtVE+r6clWqV+9DiVRJwZAgBUOCFAwJUjAkSMGQIAVDghQMCfp/VJz5dwf+kpMAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 108x324 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Let's visualize the attention context vector\n",
    "plt.figure(figsize=(1.5, 4.5))\n",
    "sns.heatmap(np.transpose(np.matrix(attention_vector)), annot=True, cmap=sns.light_palette(\"Blue\", as_cmap=True), linewidths=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the context vector, we can concatinate it with the hidden state and pass it through a hidden layer to produce the the result of this decoding time step."
   ]
  },
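  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of this last step, the snippet below concatenates a context vector with a decoder hidden state and passes the result through a single hidden layer. The names `decoder_hidden_state`, `context_vector` and `W_c`, the vector size, the random values and the choice of a `tanh` activation are illustrative assumptions and are not taken from the cells above; in a trained model the weights would be learned.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "hidden_size = 4\n",
    "\n",
    "# Hypothetical decoder hidden state and attention context vector for one time step\n",
    "decoder_hidden_state = rng.normal(size=hidden_size)\n",
    "context_vector = rng.normal(size=hidden_size)\n",
    "\n",
    "# Concatenate the context vector with the hidden state\n",
    "concatenated = np.concatenate([context_vector, decoder_hidden_state])\n",
    "\n",
    "# Hypothetical hidden-layer weights (random here, learned in practice)\n",
    "W_c = rng.normal(size=(hidden_size, 2 * hidden_size))\n",
    "\n",
    "# Pass the concatenation through the hidden layer (tanh is an assumption)\n",
    "attentional_hidden_state = np.tanh(W_c @ concatenated)\n",
    "print(attentional_hidden_state.shape)\n",
    "```\n",
    "\n",
    "The resulting vector has the same size as the hidden state and would typically be fed to an output layer that scores the next token."
   ]
  },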
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part III : [Image Captioning](https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/)\n",
    "\n",
    "Caption generation is a challenging artificial intelligence problem where a textual description must be generated for a given photograph.\n",
    "\n",
    "It requires both methods from computer vision to understand the content of the image and a language model from the field of natural language processing to turn the understanding of the image into words in the right order. Recently, deep learning methods have achieved state-of-the-art results on examples of this problem.\n",
    "\n",
    "What is most impressive about these methods is a single end-to-end model can be defined to predict a caption, given a photo, instead of requiring sophisticated data preparation or a pipeline of specifically designed models.\n",
    "\n",
    "In this section, you will discover how to develop a photo captioning deep learning model from scratch.\n",
    "\n",
    "After completing this section, you will know:\n",
    "\n",
    "- How to prepare photo and text data for training a deep learning model.\n",
    "\n",
    "- How to design and train a deep learning caption generation model.\n",
    "\n",
    "- How to evaluate a trained caption generation model and use it to caption entirely new photographs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.2.4-tf\n"
     ]
    }
   ],
   "source": [
    "import tensorflow\n",
    "print(tensorflow.keras.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Photo and Caption Dataset\n",
    "\n",
    "A good dataset to use when getting started with image captioning is the __Flickr8K__ dataset.\n",
    "\n",
    "The reason is because it is realistic and relatively small so that you can download it and build models on your workstation using a CPU.\n",
    "\n",
    "The definitive description of the dataset is in the paper _“Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics”_ from 2013.\n",
    "\n",
    "The authors describe the dataset as follows:\n",
    "\n",
    "_We introduce a new benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events._\n",
    "\n",
    "The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations. Here are some direct download links:\n",
    "\n",
    "- [__Flickr8k_Dataset.zip__](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip) (1 Gigabyte) An archive of all photographs\n",
    "- [__Flickr8k_text.zip__](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip) (2.2 Megabytes) An archive of all text descriptions for photographs.\n",
    "\n",
    "\n",
    "Download the datasets and unzip them into your current working directory (directory of Jupyter Notebook). \n",
    "You will have two directories:\n",
    "\n",
    "- __Flickr8k_Dataset__: Contains 8092 photographs in JPEG format.\n",
    "- __Flickr8k_text__: Contains a number of files containing different sources of descriptions for the photographs.\n",
    "\n",
    "The dataset has a pre-defined training dataset (6,000 images), validation dataset (1,000 images), and test dataset (1,000 images).\n",
    "\n",
    "One measure that can be used to evaluate the skill of the model are BLEU scores. For reference, below are some ball-park BLEU scores for skillful models when evaluated on the test dataset (taken from the 2017 paper [“Where to put the Image in an Image Caption Generator“](https://arxiv.org/abs/1703.09137)):\n",
    "\n",
    "- BLEU-1: 0.401 to 0.578.\n",
    "- BLEU-2: 0.176 to 0.390.\n",
    "- BLEU-3: 0.099 to 0.260.\n",
    "- BLEU-4: 0.059 to 0.170.\n",
    "\n",
    "We describe the BLEU metric more later when we work on evaluating our model.\n",
    "\n",
    "Next, let’s look at how to load the images."
   ]
  },
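  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give a feeling for what the BLEU-n figures above measure, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is not the full metric: BLEU additionally combines the precisions of several n-gram orders and applies a brevity penalty. The helper names `ngrams` and `modified_precision` and the example sentences are made up for illustration.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def ngrams(tokens, n):\n",
    "    # all consecutive n-token windows of a token list\n",
    "    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]\n",
    "\n",
    "def modified_precision(references, candidate, n):\n",
    "    cand_counts = Counter(ngrams(candidate, n))\n",
    "    if not cand_counts:\n",
    "        return 0.0\n",
    "    # for every n-gram, the largest count seen in any single reference\n",
    "    max_ref_counts = Counter()\n",
    "    for ref in references:\n",
    "        for gram, count in Counter(ngrams(ref, n)).items():\n",
    "            max_ref_counts[gram] = max(max_ref_counts[gram], count)\n",
    "    # clip each candidate count by the reference count\n",
    "    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())\n",
    "    return clipped / sum(cand_counts.values())\n",
    "\n",
    "references = ['dog runs on the grass'.split(), 'a dog is running on grass'.split()]\n",
    "candidate = 'dog runs in grass'.split()\n",
    "\n",
    "p1 = modified_precision(references, candidate, 1)\n",
    "p2 = modified_precision(references, candidate, 2)\n",
    "print('unigram precision: %.3f' % p1)\n",
    "print('bigram precision: %.3f' % p2)\n",
    "```\n",
    "\n",
    "Three of the four candidate words appear in a reference, but only one of its three bigrams does, which is why scores drop quickly from BLEU-1 to BLEU-4."
   ]
  },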
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare Photo Data\n",
    "\n",
    "We will use a pre-trained model to interpret the content of the photos.\n",
    "\n",
    "There are many models to choose from. In this case, we will use the Oxford Visual Geometry Group, or VGG, model that won the ImageNet competition in 2014. Learn more about the model here:\n",
    "\n",
    "[Very Deep Convolutional Networks for Large-Scale Visual Recognition](http://www.robots.ox.ac.uk/~vgg/research/very_deep/)\n",
    "\n",
    "Keras provides this pre-trained model directly. Note, the first time you use this model, Keras will download the model weights from the Internet, which are about 500 Megabytes. This may take a few minutes depending on your internet connection.\n",
    "\n",
    "We could use this model as part of a broader image caption model. The problem is, it is a large model and running each photo through the network every time we want to test a new language model configuration (downstream) is redundant.\n",
    "\n",
    "Instead, we can pre-compute the “photo features” using the pre-trained model and save them to file. We can then load these features later and feed them into our model as the interpretation of a given photo in the dataset. It is no different to running the photo through the full VGG model; it is just we will have done it once in advance.\n",
    "\n",
    "This is an optimization that will make training our models faster and consume less memory.\n",
    "\n",
    "We can load the VGG model in Keras using the VGG class. We will remove the last layer from the loaded model, as this is the model used to predict a classification for a photo. We are not interested in classifying images, but we are interested in the internal representation of the photo right before a classification is made. These are the “features” that the model has extracted from the photo.\n",
    "\n",
    "Keras also provides tools for reshaping the loaded photo into the preferred size for the model (e.g. 3 channel 224 x 224 pixel image).\n",
    "\n",
    "Below is a function named `extract_features` that, given a directory name, will load each photo, prepare it for VGG, and collect the predicted features from the VGG model. The image features are a 1-dimensional 4,096 element vector.\n",
    "\n",
    "The function returns a dictionary of image identifier to image features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can call this function to prepare the photo data for testing our models, then save the resulting dictionary to a file named `‘features.pkl‘`.\n",
    "\n",
    "The complete example is listed below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5\n",
      "553467904/553467096 [==============================] - 16s 0us/step\n",
      "Model: \"model\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "input_1 (InputLayer)         [(None, 224, 224, 3)]     0         \n",
      "_________________________________________________________________\n",
      "block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      \n",
      "_________________________________________________________________\n",
      "block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     \n",
      "_________________________________________________________________\n",
      "block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         \n",
      "_________________________________________________________________\n",
      "block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     \n",
      "_________________________________________________________________\n",
      "block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    \n",
      "_________________________________________________________________\n",
      "block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         \n",
      "_________________________________________________________________\n",
      "block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    \n",
      "_________________________________________________________________\n",
      "block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    \n",
      "_________________________________________________________________\n",
      "block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    \n",
      "_________________________________________________________________\n",
      "block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         \n",
      "_________________________________________________________________\n",
      "block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   \n",
      "_________________________________________________________________\n",
      "block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   \n",
      "_________________________________________________________________\n",
      "block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   \n",
      "_________________________________________________________________\n",
      "block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         \n",
      "_________________________________________________________________\n",
      "block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   \n",
      "_________________________________________________________________\n",
      "block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   \n",
      "_________________________________________________________________\n",
      "block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   \n",
      "_________________________________________________________________\n",
      "block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         \n",
      "_________________________________________________________________\n",
      "flatten (Flatten)            (None, 25088)             0         \n",
      "_________________________________________________________________\n",
      "fc1 (Dense)                  (None, 4096)              102764544 \n",
      "_________________________________________________________________\n",
      "fc2 (Dense)                  (None, 4096)              16781312  \n",
      "=================================================================\n",
      "Total params: 134,260,544\n",
      "Trainable params: 134,260,544\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "None\n"
     ]
    },
    {
     "ename": "FileNotFoundError",
     "evalue": "[Errno 2] No such file or directory: 'Flicker8k_Dataset/'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mFileNotFoundError\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-60-9545493bef71>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     39\u001b[0m \u001b[0;31m# extract features from all images\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     40\u001b[0m \u001b[0mdirectory\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Flicker8k_Dataset/'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 41\u001b[0;31m \u001b[0mfeatures\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mextract_features\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdirectory\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     42\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Extracted Features: %d'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfeatures\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     43\u001b[0m \u001b[0;31m# save to file\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-60-9545493bef71>\u001b[0m in \u001b[0;36mextract_features\u001b[0;34m(directory)\u001b[0m\n\u001b[1;32m     18\u001b[0m     \u001b[0;31m# extract features from each photo\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     19\u001b[0m     \u001b[0mfeatures\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 20\u001b[0;31m     \u001b[0;32mfor\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mlistdir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdirectory\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     21\u001b[0m         \u001b[0;31m# load an image from file\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     22\u001b[0m         \u001b[0mfilename\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdirectory\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m'/'\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'Flicker8k_Dataset/'"
     ]
    }
   ],
   "source": [
    "from os import listdir\n",
    "from pickle import dump\n",
    "from tensorflow.keras.applications.vgg16 import VGG16\n",
    "from tensorflow.keras.preprocessing.image import load_img\n",
    "from tensorflow.keras.preprocessing.image import img_to_array\n",
    "from tensorflow.keras.applications.vgg16 import preprocess_input\n",
    "from tensorflow.keras.models import Model\n",
    " \n",
    "# extract features from each photo in the directory\n",
    "def extract_features(directory):\n",
    "    # load the model\n",
    "    model = VGG16()\n",
    "    # re-structure the model\n",
    "    model.layers.pop()\n",
    "    model = Model(inputs=model.inputs, outputs=model.layers[-2].output)\n",
    "    # summarize\n",
    "    print(model.summary())\n",
    "    # extract features from each photo\n",
    "    features = dict()\n",
    "    for name in listdir(directory):\n",
    "        # load an image from file\n",
    "        filename = directory + '/' + name\n",
    "        image = load_img(filename, target_size=(224, 224))\n",
    "        # convert the image pixels to a numpy array\n",
    "        image = img_to_array(image)\n",
    "        # reshape data for the model\n",
    "        image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n",
    "        # prepare the image for the VGG model\n",
    "        image = preprocess_input(image)\n",
    "        # get features\n",
    "        feature = model.predict(image, verbose=0)\n",
    "        # get image id\n",
    "        image_id = name.split('.')[0]\n",
    "        # store feature\n",
    "        features[image_id] = feature\n",
    "        #print('>%s' % name)\n",
    "    return features\n",
    " \n",
    "# extract features from all images\n",
    "directory = 'Flicker8k_Dataset/'\n",
    "features = extract_features(directory)\n",
    "print('Extracted Features: %d' % len(features))\n",
    "# save to file\n",
    "dump(features, open('./features.pkl', 'wb'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running this data preparation step may take a while depending on your hardware, perhaps one hour on the CPU with a modern workstation.\n",
    "\n",
    "At the end of the run, you will have the extracted features stored in `‘features.pkl‘` for later use. This file will be about 127 Megabytes in size. You can download it from Ilias."
   ]
  },
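  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The saved file is a pickled dictionary, so it can be read back with `pickle.load`. The sketch below shows the round trip on a small stand-in dictionary written to a temporary file, so that it runs without the real `features.pkl`. The image identifiers `photo_a` and `photo_b` and the file name `features_demo.pkl` are hypothetical; the `(1, 4096)` shape mirrors what the model returns for a single photo.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "from pickle import dump, load\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Stand-in for the real dictionary: hypothetical ids mapped to (1, 4096) arrays\n",
    "dummy_features = {\n",
    "    'photo_a': np.zeros((1, 4096), dtype='float32'),\n",
    "    'photo_b': np.ones((1, 4096), dtype='float32'),\n",
    "}\n",
    "\n",
    "# Write the dictionary to a temporary file\n",
    "path = os.path.join(tempfile.mkdtemp(), 'features_demo.pkl')\n",
    "with open(path, 'wb') as f:\n",
    "    dump(dummy_features, f)\n",
    "\n",
    "# Read it back and inspect what was stored\n",
    "with open(path, 'rb') as f:\n",
    "    loaded = load(f)\n",
    "\n",
    "print('Loaded features: %d' % len(loaded))\n",
    "for image_id, feature in loaded.items():\n",
    "    print(image_id, feature.shape)\n",
    "```\n",
    "\n",
    "With the real file, replacing `path` by `features.pkl` should give one entry per photograph."
   ]
  },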
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare Text Data\n",
    "\n",
    "The dataset contains multiple descriptions for each photograph and the text of the descriptions requires some minimal cleaning.\n",
    "\n",
    "First, we will load the file containing all of the descriptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file as read only\n",
    "    file = open(filename, 'r')\n",
    "    # read all text\n",
    "    text = file.read()\n",
    "    # close the file\n",
    "    file.close()\n",
    "    return text\n",
    "\n",
    "filename = './Flickr8k_text/Flickr8k.token.txt'\n",
    "# load descriptions\n",
    "doc = load_doc(filename)\n",
    "print(doc[:410])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each photo has a unique identifier. This identifier is used on the photo filename and in the text file of descriptions.\n",
    "\n",
    "Next, we will step through the list of photo descriptions. Below defines a function `load_descriptions()` that, given the loaded document text, will return a dictionary of photo identifiers to descriptions. Each photo identifier maps to a list of one or more textual descriptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# extract descriptions for images\n",
    "def load_descriptions(doc):\n",
    "    mapping = dict()\n",
    "    # process lines\n",
    "    for line in doc.split('\\n'):\n",
    "        # split line by white space\n",
    "        tokens = line.split()\n",
    "        if len(line) < 2:\n",
    "            continue\n",
    "        # take the first token as the image id, the rest as the description\n",
    "        image_id, image_desc = tokens[0], tokens[1:]\n",
    "        # remove filename from image id\n",
    "        image_id = image_id.split('.')[0]\n",
    "        # convert description tokens back to string\n",
    "        image_desc = ' '.join(image_desc)\n",
    "        # create the list if needed\n",
    "        if image_id not in mapping:\n",
    "            mapping[image_id] = list()\n",
    "        # store description\n",
    "        mapping[image_id].append(image_desc)\n",
    "    return mapping\n",
    " \n",
    "# parse descriptions\n",
    "descriptions = load_descriptions(doc)\n",
    "print(descriptions['1000268201_693b08cb0e'])\n",
    "print('Loaded: %d ' % len(descriptions))"
   ]
  },
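  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the parsing logic in isolation, the sketch below applies the same steps to three made-up lines. It assumes each line starts with an image file name followed by `#` and a caption index, and then the caption; the file names and captions are invented for illustration and are not taken from the dataset.\n",
    "\n",
    "```python\n",
    "# Hypothetical lines in the assumed token-file layout\n",
    "sample_lines = [\n",
    "    'example_photo_1.jpg#0 A dog runs across the grass .',\n",
    "    'example_photo_1.jpg#1 A brown dog is running outside .',\n",
    "    'example_photo_2.jpg#0 Two children play in the sand .',\n",
    "]\n",
    "\n",
    "mapping = {}\n",
    "for line in sample_lines:\n",
    "    tokens = line.split()\n",
    "    if len(tokens) < 2:\n",
    "        continue\n",
    "    # 'example_photo_1.jpg#0' -> 'example_photo_1'\n",
    "    image_id = tokens[0].split('.')[0]\n",
    "    # group all captions of the same photo under one key\n",
    "    mapping.setdefault(image_id, []).append(' '.join(tokens[1:]))\n",
    "\n",
    "for image_id, captions in mapping.items():\n",
    "    print(image_id, captions)\n",
    "```\n",
    "\n",
    "Splitting on the first `.` removes both the file extension and the caption index, so all captions of one photo end up in the same list."
   ]
  },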
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we need to clean the description text. The descriptions are already tokenized and easy to work with.\n",
    "\n",
    "We will clean the text in the following ways in order to reduce the size of the vocabulary of words we will need to work with:\n",
    "\n",
    "- Convert all words to lowercase\n",
    "- Remove all punctuation\n",
    "- Remove all words that are one character or less in length (e.g. ‘a’)\n",
    "- Remove all words with numbers in them\n",
    "\n",
    "Below defines the `clean_descriptions()` function that, given the dictionary of image identifiers to descriptions, steps through each description and cleans the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import string\n",
    "def clean_descriptions(descriptions):\n",
    "    # prepare translation table for removing punctuation\n",
    "    table = str.maketrans('', '', string.punctuation)\n",
    "    for key, desc_list in descriptions.items():\n",
    "        for i in range(len(desc_list)):\n",
    "            desc = desc_list[i]\n",
    "            # tokenize\n",
    "            desc = desc.split()\n",
    "            # convert to lower case\n",
    "            desc = [word.lower() for word in desc]\n",
    "            # remove punctuation from each token\n",
    "            desc = [w.translate(table) for w in desc]\n",
    "            # remove hanging 's' and 'a'\n",
    "            desc = [word for word in desc if len(word)>1]\n",
    "            # remove tokens with numbers in them\n",
    "            desc = [word for word in desc if word.isalpha()]\n",
    "            # store as string\n",
    "            desc_list[i] =  ' '.join(desc)\n",
    " \n",
    "# clean descriptions\n",
    "clean_descriptions(descriptions)\n",
    "print(descriptions['1000268201_693b08cb0e'])"
   ]
  },
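  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of the four cleaning rules is easiest to see on a single caption. The sketch below applies the same steps to one made-up sentence; the helper name `clean_caption` and the example text are illustrative only.\n",
    "\n",
    "```python\n",
    "import string\n",
    "\n",
    "def clean_caption(caption):\n",
    "    # translation table that deletes all punctuation characters\n",
    "    table = str.maketrans('', '', string.punctuation)\n",
    "    words = caption.split()\n",
    "    # convert to lower case\n",
    "    words = [w.lower() for w in words]\n",
    "    # remove punctuation from each token\n",
    "    words = [w.translate(table) for w in words]\n",
    "    # drop tokens of one character or less\n",
    "    words = [w for w in words if len(w) > 1]\n",
    "    # drop tokens that contain anything other than letters\n",
    "    words = [w for w in words if w.isalpha()]\n",
    "    return ' '.join(words)\n",
    "\n",
    "cleaned = clean_caption('A dog with 20 toys runs past a Red-Nosed clown .')\n",
    "print(cleaned)\n",
    "```\n",
    "\n",
    "The article `a`, the number `20` and the full stop disappear, and `Red-Nosed` collapses to `rednosed` because the hyphen is deleted rather than replaced by a space."
   ]
  },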
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once cleaned, we can summarize the size of the vocabulary.\n",
    "\n",
    "Ideally, we want a vocabulary that is both expressive and as small as possible. A smaller vocabulary will result in a smaller model that will train faster.\n",
    "\n",
    "For reference, we can transform the clean descriptions into a set and print its size to get an idea of the size of our dataset vocabulary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert the loaded descriptions into a vocabulary of words\n",
    "def to_vocabulary(descriptions):\n",
    "    # build a list of all description strings\n",
    "    all_desc = set()\n",
    "    for key in descriptions.keys():\n",
    "        [all_desc.update(d.split()) for d in descriptions[key]]\n",
    "    return all_desc\n",
    "\n",
    " \n",
    "# summarize vocabulary\n",
    "vocabulary = to_vocabulary(descriptions)\n",
    "print('Vocabulary Size: %d' % len(vocabulary))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can save the dictionary of image identifiers and descriptions to a new file named _descriptions.txt_, with one image identifier and description per line.\n",
    "\n",
    "Below defines the `save_descriptions()` function that, given a dictionary containing the mapping of identifiers to descriptions and a filename, saves the mapping to file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save descriptions to file, one per line\n",
    "def save_descriptions(descriptions, filename):\n",
    "    lines = list()\n",
    "    for key, desc_list in descriptions.items():\n",
    "        for desc in desc_list:\n",
    "            lines.append(key + ' ' + desc)\n",
    "    data = '\\n'.join(lines)\n",
    "    file = open(filename, 'w')\n",
    "    file.write(data)\n",
    "    file.close()\n",
    "\n",
    "# save descriptions\n",
    "save_descriptions(descriptions, 'descriptions.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Putting this all together, the complete listing is provided below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the example first prints the number of loaded photo descriptions (8,092) and the size of the clean vocabulary (8,763 words)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the clean descriptions are written to _descriptions.txt_.\n",
    "\n",
    "Taking a look at the file, we can see that the descriptions are ready for modeling. The order of descriptions in your file may vary."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "252123185_487f21e336 bunch of people are seated in stadium\n",
    "2252123185_487f21e336 crowded stadium is full of people watching an event\n",
    "2252123185_487f21e336 crowd of people fill up packed stadium\n",
    "2252123185_487f21e336 crowd sitting in an indoor stadium\n",
    "2252123185_487f21e336 stadium full of people watch game"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Develop Deep Learning Model\n",
    "\n",
    "In this section, we will define the deep learning model and fit it on the training dataset.\n",
    "\n",
    "This section is divided into the following parts:\n",
    "\n",
    "- Loading Data\n",
    "- Defining the Model\n",
    "- Fitting the Model\n",
    "- Complete Example\n",
    "\n",
    "### Loading Data\n",
    "\n",
    "First, we must load the prepared photo and text data so that we can use it to fit the model.\n",
    "\n",
    "We are going to train the data on all of the photos and captions in the training dataset. While training, we are going to monitor the performance of the model on the validation dataset and use that performance to decide when to save models to file.\n",
    "\n",
    "The train and validation dataset have been predefined in the `Flickr_8k.trainImages.txt` and `Flickr_8k.devImages.txt` files respectively, that both contain lists of photo file names. From these file names, we can extract the photo identifiers and use these identifiers to filter photos and descriptions for each set.\n",
    "\n",
    "The function `load_set()` below will load a pre-defined set of identifiers given the train or validation sets filename."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file as read only\n",
    "    file = open(filename, 'r')\n",
    "    # read all text\n",
    "    text = file.read()\n",
    "    # close the file\n",
    "    file.close()\n",
    "    return text\n",
    "\n",
    "# load a pre-defined list of photo identifiers\n",
    "def load_set(filename):\n",
    "    doc = load_doc(filename)\n",
    "    dataset = list()\n",
    "    # process line by line\n",
    "    for line in doc.split('\\n'):\n",
    "        # skip empty lines\n",
    "        if len(line) < 1:\n",
    "            continue\n",
    "        # get the image identifier\n",
    "        identifier = line.split('.')[0]\n",
    "        dataset.append(identifier)\n",
    "    return set(dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can load the photos and descriptions using the pre-defined set of training or validation identifiers.\n",
    "\n",
    "Below is the function `load_clean_descriptions()` that loads the cleaned text descriptions from _descriptions.txt_ for a given set of identifiers and returns a dictionary of identifiers to lists of text descriptions.\n",
    "\n",
    "The model we will develop will generate a caption given a photo, and the caption will be generated one word at a time. The sequence of previously generated words will be provided as input. Therefore, we will need a 'first word' to kick-off the generation process and a 'last word' to signal the end of the caption.\n",
    "\n",
    "We will use the strings `'startseq'` and `'endseq'` for this purpose. These tokens are added to the loaded descriptions as they are loaded. It is important to do this now before we encode the text so that the tokens are also encoded correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load clean descriptions into memory\n",
    "def load_clean_descriptions(filename, dataset):\n",
    "    # load document\n",
    "    doc = load_doc(filename)\n",
    "    descriptions = dict()\n",
    "    for line in doc.split('\\n'):\n",
    "        # split line by white space\n",
    "        tokens = line.split()\n",
    "        # split id from description\n",
    "        image_id, image_desc = tokens[0], tokens[1:]\n",
    "        # skip images not in the set\n",
    "        if image_id in dataset:\n",
    "            # create list\n",
    "            if image_id not in descriptions:\n",
    "                descriptions[image_id] = list()\n",
    "            # wrap description in tokens\n",
    "            desc = 'startseq ' + ' '.join(image_desc) + ' endseq'\n",
    "            # store\n",
    "            descriptions[image_id].append(desc)\n",
    "    return descriptions"
   ]
  },
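  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the role of `startseq` and `endseq` concrete, the sketch below shows how one wrapped caption could be unrolled into pairs of input words so far and next word to predict. This is only an illustration of the word-by-word idea described above, using a made-up caption; it is not the encoding code of this notebook.\n",
    "\n",
    "```python\n",
    "# Hypothetical cleaned caption, wrapped in the start and end tokens\n",
    "caption = 'startseq dog runs on grass endseq'\n",
    "words = caption.split()\n",
    "\n",
    "# Each prefix of the caption is an input, the word that follows is the target\n",
    "pairs = []\n",
    "for i in range(1, len(words)):\n",
    "    pairs.append((words[:i], words[i]))\n",
    "\n",
    "for prefix, target in pairs:\n",
    "    print(' '.join(prefix), '->', target)\n",
    "```\n",
    "\n",
    "The first pair starts from `startseq` alone, and the last pair has `endseq` as its target, which is how the model can learn when to stop generating."
   ]
  },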
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can load the photo features for a given dataset.\n",
    "\n",
    "Below defines a function named `load_photo_features()` that loads the entire set of photo descriptions, then returns the subset of interest for a given set of photo identifiers.\n",
    "\n",
    "This is not very efficient; nevertheless, this will get us up and running quickly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load photo features\n",
    "def load_photo_features(filename, dataset):\n",
    "    # load all features\n",
    "    all_features = load(open(filename, 'rb'))\n",
    "    # filter features\n",
    "    features = {k: all_features[k] for k in dataset}\n",
    "    return features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can pause here and test everything developed so far.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pickle import load\n",
    "# load training dataset (6K)\n",
    "filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'\n",
    "train = load_set(filename)\n",
    "print('Dataset: %d' % len(train))\n",
    "# descriptions\n",
    "train_descriptions = load_clean_descriptions('descriptions.txt', train)\n",
    "print('Descriptions: train=%d' % len(train_descriptions))\n",
    "# photo features\n",
    "train_features = load_photo_features('features.pkl', train)\n",
    "print('Photos: train=%d' % len(train_features))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running this example first loads the 6,000 photo identifiers in the test dataset. These features are then used to filter and load the cleaned description text and the pre-computed photo features.\n",
    "\n",
    "We are nearly there."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The description text will need to be encoded to numbers before it can be presented to the model as in input or compared to the model’s predictions.\n",
    "\n",
    "The first step in encoding the data is to create a consistent mapping from words to unique integer values. Keras provides the `Tokenizer` class that can learn this mapping from the loaded description data.\n",
    "\n",
    "Below defines the `to_lines()` to convert the dictionary of descriptions into a list of strings and the `create_tokenizer()` function that will fit a Tokenizer given the loaded photo description text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.preprocessing.text import Tokenizer\n",
    "# convert a dictionary of clean descriptions to a list of descriptions\n",
    "def to_lines(descriptions):\n",
    "    all_desc = list()\n",
    "    for key in descriptions.keys():\n",
    "        [all_desc.append(d) for d in descriptions[key]]\n",
    "    return all_desc\n",
    "\n",
    "# fit a tokenizer given caption descriptions\n",
    "def create_tokenizer(descriptions):\n",
    "    lines = to_lines(descriptions)\n",
    "    tokenizer = Tokenizer()\n",
    "    tokenizer.fit_on_texts(lines)\n",
    "    return tokenizer\n",
    "\n",
    "# prepare tokenizer\n",
    "tokenizer = create_tokenizer(train_descriptions)\n",
    "vocab_size = len(tokenizer.word_index) + 1\n",
    "print('Vocabulary Size: %d' % vocab_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now encode the text.\n",
    "\n",
    "Each description will be split into words. The model will be provided one word and the photo and generate the next word. Then the first two words of the description will be provided to the model as input with the image to generate the next word. This is how the model will be trained.\n",
    "\n",
    "For example, the input sequence “little girl running in field” would be split into 6 input-output pairs to train the model:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "X1,\t\tX2 (text sequence), \t\t\t\t\t\ty (word)\n",
    "photo\tstartseq, \t\t\t\t\t\t\t\t\tlittle\n",
    "photo\tstartseq, little,\t\t\t\t\t\t\tgirl\n",
    "photo\tstartseq, little, girl, \t\t\t\t\trunning\n",
    "photo\tstartseq, little, girl, running, \t\t\tin\n",
    "photo\tstartseq, little, girl, running, in, \t\tfield\n",
    "photo\tstartseq, little, girl, running, in, field, endseq"
   ]
  },
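  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The splitting rule above can be sketched in a few lines of plain Python. The helper name `caption_to_pairs()` is made up for this illustration, and it works on words rather than on the integer codes used later; the photo input is left out because it is the same for every pair.\n",
    "\n",
    "```python\n",
    "# Hypothetical helper: expand one wrapped caption into (input words, next word) pairs.\n",
    "def caption_to_pairs(caption):\n",
    "    words = caption.split()\n",
    "    pairs = []\n",
    "    # every prefix of length i is used to predict the word at position i\n",
    "    for i in range(1, len(words)):\n",
    "        pairs.append((words[:i], words[i]))\n",
    "    return pairs\n",
    "\n",
    "pairs = caption_to_pairs('startseq little girl running in field endseq')\n",
    "for in_words, out_word in pairs:\n",
    "    print(', '.join(in_words), '->', out_word)\n",
    "print(len(pairs))\n",
    "```\n",
    "\n",
    "A caption with seven tokens, counting `startseq` and `endseq`, therefore yields six training pairs."
   ]
  },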
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Later, when the model is used to generate descriptions, the generated words will be concatenated and recursively provided as input to generate a caption for an image.\n",
    "\n",
    "The function below named `create_sequences()`, given the tokenizer, a maximum sequence length, and the dictionary of all descriptions and photos, will transform the data into input-output pairs of data for training the model. There are two input arrays to the model: one for photo features and one for the encoded text. There is one output for the model which is the encoded next word in the text sequence.\n",
    "\n",
    "The input text is encoded as integers, which will be fed to a word embedding layer. The photo features will be fed directly to another part of the model. The model will output a prediction, which will be a probability distribution over all words in the vocabulary.\n",
    "\n",
    "The output data will therefore be a one-hot encoded version of each word, representing an idealized probability distribution with 0 values at all word positions except the actual word position, which has a value of 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create sequences of images, input sequences and output words for an image\n",
    "import numpy as np\n",
    "def create_sequences(tokenizer, max_length, descriptions, photos, vocab_size):\n",
    "    X1, X2, y = list(), list(), list()\n",
    "    # walk through each image identifier\n",
    "    for key, desc_list in descriptions.items():\n",
    "        # walk through each description for the image\n",
    "        for desc in desc_list:\n",
    "            # encode the sequence\n",
    "            seq = tokenizer.texts_to_sequences([desc])[0]\n",
    "            # split one sequence into multiple X,y pairs\n",
    "            for i in range(1, len(seq)):\n",
    "                # split into input and output pair\n",
    "                in_seq, out_seq = seq[:i], seq[i]\n",
    "                # pad input sequence\n",
    "                in_seq = pad_sequences([in_seq], maxlen=max_length)[0]\n",
    "                # encode output sequence\n",
    "                out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]\n",
    "                # store\n",
    "                X1.append(photos[key][0])\n",
    "                X2.append(in_seq)\n",
    "                y.append(out_seq)\n",
    "    return np.array(X1), np.array(X2), np.array(y)"
   ]
  },
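  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the shapes concrete, the sketch below mimics in plain Python what `pad_sequences()` (which pads on the left by default) and `to_categorical()` do for a single input-output pair. The helpers `left_pad()` and `one_hot()` and all numbers are illustrative assumptions, not Keras functions or values from the dataset.\n",
    "\n",
    "```python\n",
    "def left_pad(seq, max_length):\n",
    "    # prepend zeros; assumes len(seq) <= max_length, index 0 is reserved for padding\n",
    "    return [0] * (max_length - len(seq)) + list(seq)\n",
    "\n",
    "def one_hot(index, vocab_size):\n",
    "    # all zeros except a single 1.0 at the position of the target word\n",
    "    vec = [0.0] * vocab_size\n",
    "    vec[index] = 1.0\n",
    "    return vec\n",
    "\n",
    "# toy values: a 3-word prefix, a maximum length of 6 and a vocabulary of 8 entries\n",
    "in_seq = left_pad([2, 5, 7], 6)\n",
    "out_seq = one_hot(4, 8)\n",
    "print(in_seq)\n",
    "print(out_seq)\n",
    "```\n",
    "\n",
    "Every pair stores a full one-hot vector of length `vocab_size`, which is why the arrays returned by `create_sequences()` can become very large."
   ]
  },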
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will need to calculate the maximum number of words in the longest description. A short helper function named `max_length()` is defined below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# calculate the length of the description with the most words\n",
    "def max_length(descriptions):\n",
    "    lines = to_lines(descriptions)\n",
    "    return max(len(d.split()) for d in lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have enough to load the data for the training and development datasets and transform the loaded data into input-output pairs for fitting a deep learning model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.preprocessing.sequence import pad_sequences\n",
    "from keras.utils import to_categorical\n",
    "\n",
    "# train dataset\n",
    "\n",
    "# load training dataset (6K)\n",
    "filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'\n",
    "train = load_set(filename)\n",
    "print('Dataset: %d' % len(train))\n",
    "# descriptions\n",
    "train_descriptions = load_clean_descriptions('descriptions.txt', train)\n",
    "print('Descriptions: train=%d' % len(train_descriptions))\n",
    "# photo features\n",
    "train_features = load_photo_features('features.pkl', train)\n",
    "print('Photos: train=%d' % len(train_features))\n",
    "# prepare tokenizer\n",
    "tokenizer = create_tokenizer(train_descriptions)\n",
    "vocab_size = len(tokenizer.word_index) + 1\n",
    "print('Vocabulary Size: %d' % vocab_size)\n",
    "# determine the maximum sequence length\n",
    "max_length = max_length(train_descriptions)\n",
    "print('Description Length: %i' % max_length)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# prepare sequences\n",
    "X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features, vocab_size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dev dataset\n",
    "\n",
    "# load test set\n",
    "filename = 'Flickr8k_text/Flickr_8k.devImages.txt'\n",
    "test = load_set(filename)\n",
    "print('Dataset: %d' % len(test))\n",
    "# descriptions\n",
    "test_descriptions = load_clean_descriptions('descriptions.txt', test)\n",
    "print('Descriptions: test=%d' % len(test_descriptions))\n",
    "# photo features\n",
    "test_features = load_photo_features('features.pkl', test)\n",
    "print('Photos: test=%d' % len(test_features))\n",
    "# prepare sequences\n",
    "X1test, X2test, ytest = create_sequences(tokenizer, max_length, test_descriptions, test_features, vocab_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Defining the Model\n",
    "\n",
    "We will define a deep learning based on the “merge-model” described by Marc Tanti, et al. in their 2017 papers:\n",
    "\n",
    "- [Where to put the Image in an Image Caption Generator, 2017.](https://arxiv.org/abs/1703.09137)\n",
    "- [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?, 2017.](https://arxiv.org/abs/1708.02043)\n",
    "\n",
    "For a gentle introduction to this architecture, see the post:\n",
    "\n",
    "- [Caption Generation with the Inject and Merge Architectures for the Encoder-Decoder Model](https://machinelearningmastery.com/caption-generation-inject-merge-architectures-encoder-decoder-model/)\n",
    "\n",
    "The authors provide a nice schematic of the model, reproduced below.\n",
    "\n",
    "<img src=\"./Bilder/schematic_image_captioning.png\" width=\"600\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will describe the model in three parts:\n",
    "\n",
    "- __Photo Feature Extractor.__ This is a 16-layer VGG model pre-trained on the ImageNet dataset. We have pre-processed the photos with the VGG model (without the output layer) and will use the extracted features predicted by this model as input.\n",
    "- __Sequence Processor.__ This is a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) recurrent neural network layer.\n",
    "- __Decoder__ (for lack of a better name). Both the feature extractor and sequence processor output a fixed-length vector. These are merged together and processed by a Dense layer to make a final prediction.\n",
    "\n",
    "The Photo Feature Extractor model expects input photo features to be a vector of 4,096 elements. These are processed by a Dense layer to produce a 256 element representation of the photo.\n",
    "\n",
    "The Sequence Processor model expects input sequences with a pre-defined length (34 words) which are fed into an Embedding layer that uses a mask to ignore padded values. This is followed by an LSTM layer with 256 memory units.\n",
    "\n",
    "Both the input models produce a 256 element vector. Further, both input models use regularization in the form of 50% dropout. This is to reduce overfitting the training dataset, as this model configuration learns very fast.\n",
    "\n",
    "The Decoder model merges the vectors from both input models using an addition operation. This is then fed to a Dense 256 neuron layer and then to a final output Dense layer that makes a softmax prediction over the entire output vocabulary for the next word in the sequence.\n",
    "\n",
    "The function below named `define_model()` defines and returns the model ready to be fit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the captioning model\n",
    "from keras.utils import to_categorical\n",
    "from keras.utils import plot_model\n",
    "from keras.models import Model\n",
    "from keras.layers import Input\n",
    "from keras.layers import Dense\n",
    "from keras.layers import LSTM\n",
    "from keras.layers import Embedding\n",
    "from keras.layers import Dropout\n",
    "from keras.layers.merge import add\n",
    "from keras.callbacks import ModelCheckpoint\n",
    "def define_model(vocab_size, max_length):\n",
    "    # feature extractor model\n",
    "    inputs1 = Input(shape=(4096,))\n",
    "    fe1 = Dropout(0.5)(inputs1)\n",
    "    fe2 = Dense(256, activation='relu')(fe1)\n",
    "    # sequence model\n",
    "    inputs2 = Input(shape=(max_length,))\n",
    "    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)\n",
    "    se2 = Dropout(0.5)(se1)\n",
    "    se3 = LSTM(256)(se2)\n",
    "    # decoder model\n",
    "    decoder1 = add([fe2, se3])\n",
    "    decoder2 = Dense(256, activation='relu')(decoder1)\n",
    "    outputs = Dense(vocab_size, activation='softmax')(decoder2)\n",
    "    # tie it together [image, seq] [word]\n",
    "    model = Model(inputs=[inputs1, inputs2], outputs=outputs)\n",
    "    model.compile(loss='categorical_crossentropy', optimizer='adam')\n",
    "    # summarize model\n",
    "    print(model.summary())\n",
    "    plot_model(model, to_file='model.png', show_shapes=True)\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a sense for the structure of the model, specifically the shapes of the layers, see the summary listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the model\n",
    "model = define_model(vocab_size, max_length)"
   ]
  },
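  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameter counts reported in the summary can be checked by hand from the layer sizes described above. The sketch below does this arithmetic in plain Python; the helper names are made up for the illustration, and the vocabulary size of 5000 is an arbitrary placeholder, not the value for this dataset.\n",
    "\n",
    "```python\n",
    "def dense_params(n_in, n_out):\n",
    "    # one weight per input-output connection plus one bias per output\n",
    "    return n_in * n_out + n_out\n",
    "\n",
    "def lstm_params(n_in, units):\n",
    "    # four gates, each with input weights, recurrent weights and a bias\n",
    "    return 4 * ((n_in + units) * units + units)\n",
    "\n",
    "def caption_model_params(vocab_size):\n",
    "    return {\n",
    "        'dense_photo': dense_params(4096, 256),\n",
    "        'embedding': vocab_size * 256,\n",
    "        'lstm': lstm_params(256, 256),\n",
    "        'dense_decoder': dense_params(256, 256),\n",
    "        'dense_output': dense_params(256, vocab_size),\n",
    "    }\n",
    "\n",
    "# 5000 is a made-up vocabulary size; use the printed vocab_size instead\n",
    "counts = caption_model_params(vocab_size=5000)\n",
    "for name, n in counts.items():\n",
    "    print(name, n)\n",
    "print('total', sum(counts.values()))\n",
    "```\n",
    "\n",
    "The dropout and addition layers have no trainable parameters, so they do not appear in this count."
   ]
  },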
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the example first prints a summary of the loaded training and development datasets.\n",
    "\n",
    "\n",
    "\n",
    "We also create a plot to visualize the structure of the network that better helps understand the two streams of input.\n",
    "Plot of the Caption Generation Deep Learning Model\n",
    "\n",
    "<img src=\"./Bilder/model.png\" width=\"600\">\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fitting the Model\n",
    "\n",
    "Now that we know how to define the model, we can fit it on the training dataset.\n",
    "\n",
    "The model learns fast and quickly overfits the training dataset. For this reason, we will monitor the skill of the trained model on the holdout validation dataset. When the skill of the model on the validation dataset improves at the end of an epoch, we will save the whole model to file.\n",
    "\n",
    "At the end of the run, we can then use the saved model with the best skill on the training dataset as our final model.\n",
    "\n",
    "We can do this by defining a `ModelCheckpoint` in Keras and specifying it to monitor the minimum loss on the validation dataset and save the model to a file that has both the training and validation loss in the filename."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define checkpoint callback\n",
    "filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'\n",
    "checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')"
   ]
  },
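  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Keras fills the `{epoch}`, `{loss}` and `{val_loss}` placeholders in the filename with Python string formatting. The sketch below reproduces that formatting and the effect of `save_best_only=True` with `mode='min'` in plain Python. The loss values are made up, and the loop is only an illustration of the idea, not the Keras implementation.\n",
    "\n",
    "```python\n",
    "filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'\n",
    "\n",
    "# made-up (epoch, training loss, validation loss) values\n",
    "history = [(1, 4.10, 3.95), (2, 3.25, 3.61), (3, 2.90, 3.70)]\n",
    "\n",
    "best = float('inf')\n",
    "saved = []\n",
    "for epoch, loss, val_loss in history:\n",
    "    # keep a checkpoint only when the validation loss improves\n",
    "    if val_loss < best:\n",
    "        best = val_loss\n",
    "        saved.append(filepath.format(epoch=epoch, loss=loss, val_loss=val_loss))\n",
    "\n",
    "for name in saved:\n",
    "    print(name)\n",
    "```\n",
    "\n",
    "In this toy run the third epoch lowers the training loss but not the validation loss, so no file is written for it."
   ]
  },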
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then specify the checkpoint in the call to `fit()` via the callbacks argument. We must also specify the development dataset in `fit()` via the validation_data argument.\n",
    "\n",
    "We will only fit the model for 20 epochs, but given the amount of training data, each epoch may take 30 minutes on modern hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fit model\n",
    "model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "After the summary of the model, we can get an idea of the total number of training and validation (development) input-output pairs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train on 306,404 samples, validate on 50,903 samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model then runs, saving the best model to .h5 files along the way.\n",
    "\n",
    "On my run, the best validation results were saved to the file:\n",
    "\n",
    "- __model-ep002-loss3.245-val_loss3.612.h5__\n",
    "\n",
    "This model was saved at the end of epoch 2 with a loss of 3.245 on the training dataset and a loss of 3.612 on the validation dataset\n",
    "\n",
    "Your specific results will vary.\n",
    "\n",
    "\n",
    "Did you get an error like:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Memory Error"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then run it on Colab or run the following code:"
   ]
  },
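  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One common way around a memory error is progressive loading: a Python generator builds the input-output pairs for one photo at a time, so the full arrays never exist in memory. The sketch below is a minimal illustration of that idea using only NumPy. The names `pairs_for_photo()` and `data_generator()` are hypothetical, the captions are assumed to be integer-encoded already, and each photo is assumed to be a flat feature vector.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def pairs_for_photo(seqs, photo, max_length, vocab_size):\n",
    "    # build all (photo, padded prefix, one-hot next word) rows for one photo\n",
    "    X1, X2, y = [], [], []\n",
    "    for seq in seqs:\n",
    "        for i in range(1, len(seq)):\n",
    "            prefix = seq[:i][-max_length:]\n",
    "            # left-pad the prefix with zeros up to max_length\n",
    "            padded = np.zeros(max_length, dtype='int32')\n",
    "            padded[max_length - len(prefix):] = prefix\n",
    "            # one-hot encode the next word\n",
    "            target = np.zeros(vocab_size, dtype='float32')\n",
    "            target[seq[i]] = 1.0\n",
    "            X1.append(photo)\n",
    "            X2.append(padded)\n",
    "            y.append(target)\n",
    "    return np.array(X1), np.array(X2), np.array(y)\n",
    "\n",
    "def data_generator(encoded, photos, max_length, vocab_size):\n",
    "    # loop forever, yielding the pairs of one photo per step\n",
    "    while True:\n",
    "        for key, seqs in encoded.items():\n",
    "            X1, X2, y = pairs_for_photo(seqs, photos[key], max_length, vocab_size)\n",
    "            yield (X1, X2), y\n",
    "\n",
    "# toy data: one photo with a 4-element feature vector and one encoded caption\n",
    "encoded = {'img1': [[1, 3, 4, 2]]}\n",
    "photos = {'img1': np.ones(4, dtype='float32')}\n",
    "(X1, X2), y = next(data_generator(encoded, photos, max_length=5, vocab_size=6))\n",
    "print(X1.shape, X2.shape, y.shape)\n",
    "```\n",
    "\n",
    "A generator of this kind can usually be passed to `fit()` together with `steps_per_epoch` set to the number of photos. Whether the two inputs have to be yielded as a tuple or as a list depends on the Keras version, so treat that detail as something to verify."
   ]
  },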
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate Model\n",
    "\n",
    "Once the model is fit, we can evaluate the skill of its predictions on the holdout test dataset.\n",
    "\n",
    "We will evaluate a model by generating descriptions for all photos in the test dataset and evaluating those predictions with a standard cost function.\n",
    "\n",
    "First, we need to be able to generate a description for a photo using a trained model.\n",
    "\n",
    "This involves passing in the start description token `‘startseq‘`, generating one word, then calling the model recursively with generated words as input until the end of sequence token is reached ‘endseq‘ or the maximum description length is reached.\n",
    "\n",
    "The function below named `generate_desc()` implements this behavior and generates a textual description given a trained model, and a given prepared photo as input. It calls the function `word_for_id()` in order to map an integer prediction back to a word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# map an integer to a word\n",
    "def word_for_id(integer, tokenizer):\n",
    "    for word, index in tokenizer.word_index.items():\n",
    "        if index == integer:\n",
    "            return word\n",
    "    return None\n",
    "\n",
    "# generate a description for an image\n",
    "def generate_desc(model, tokenizer, photo, max_length):\n",
    "    # seed the generation process\n",
    "    in_text = 'startseq'\n",
    "    # iterate over the whole length of the sequence\n",
    "    for i in range(max_length):\n",
    "        # integer encode input sequence\n",
    "        sequence = tokenizer.texts_to_sequences([in_text])[0]\n",
    "        # pad input\n",
    "        sequence = pad_sequences([sequence], maxlen=max_length)\n",
    "        # predict next word\n",
    "        yhat = model.predict([photo, sequence], verbose=0)\n",
    "        # convert probability to integer\n",
    "        yhat = argmax(yhat)\n",
    "        # map integer to word\n",
    "        word = word_for_id(yhat, tokenizer)\n",
    "        # stop if we cannot map the word\n",
    "        if word is None:\n",
    "            break\n",
    "        # append as input for generating the next word\n",
    "        in_text += ' ' + word\n",
    "        # stop if we predict the end of the sequence\n",
    "        if word == 'endseq':\n",
    "            break\n",
    "    return in_text"
   ]
  },
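  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The decoding loop can be tried without a trained model. The sketch below replaces the network with a stub that follows a fixed script, and it inverts the word index once instead of scanning it for every word as `word_for_id()` does. The vocabulary, the stub `fake_next_id()` and the function `greedy_decode()` are all made up for this illustration.\n",
    "\n",
    "```python\n",
    "# toy stand-in for tokenizer.word_index\n",
    "word_index = {'startseq': 1, 'endseq': 2, 'dog': 3, 'runs': 4}\n",
    "# invert once so each lookup is a dictionary access instead of a scan\n",
    "index_word = {index: word for word, index in word_index.items()}\n",
    "\n",
    "def fake_next_id(ids):\n",
    "    # hypothetical stand-in for model.predict followed by argmax\n",
    "    script = {1: 3, 3: 4, 4: 2}\n",
    "    return script[ids[-1]]\n",
    "\n",
    "def greedy_decode(max_length):\n",
    "    ids = [word_index['startseq']]\n",
    "    for _ in range(max_length):\n",
    "        next_id = fake_next_id(ids)\n",
    "        word = index_word.get(next_id)\n",
    "        # stop if the predicted index is not in the vocabulary\n",
    "        if word is None:\n",
    "            break\n",
    "        ids.append(next_id)\n",
    "        # stop once the end token has been generated\n",
    "        if word == 'endseq':\n",
    "            break\n",
    "    return ' '.join(index_word[i] for i in ids)\n",
    "\n",
    "print(greedy_decode(max_length=10))\n",
    "print(greedy_decode(max_length=2))\n",
    "```\n",
    "\n",
    "The second call shows the other stopping condition: the loop ends after `max_length` steps even though `endseq` was never produced. The Keras `Tokenizer` should also expose a reverse mapping as `index_word`, though it is worth checking that for your version."
   ]
  },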
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will generate predictions for all photos in the test dataset and in the train dataset.\n",
    "\n",
    "The function below named `evaluate_model()` will evaluate a trained model against a given dataset of photo descriptions and photo features. The actual and predicted descriptions are collected and evaluated collectively using the corpus BLEU score that summarizes how close the generated text is to the expected text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# evaluate the skill of the model\n",
    "def evaluate_model(model, descriptions, photos, tokenizer, max_length):\n",
    "    actual, predicted = list(), list()\n",
    "    # step over the whole set\n",
    "    for key, desc_list in descriptions.items():\n",
    "        # generate description\n",
    "        yhat = generate_desc(model, tokenizer, photos[key], max_length)\n",
    "        # store actual and predicted\n",
    "        references = [d.split() for d in desc_list]\n",
    "        actual.append(references)\n",
    "        predicted.append(yhat.split())\n",
    "    # calculate BLEU score\n",
    "    print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))\n",
    "    print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))\n",
    "    print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))\n",
    "    print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "BLEU scores are used in text translation for evaluating translated text against one or more reference translations.\n",
    "\n",
    "Here, we compare each generated description against all of the reference descriptions for the photograph. We then calculate BLEU scores for 1, 2, 3 and 4 cumulative n-grams.\n",
    "\n",
    "You can learn more about the BLEU score here:\n",
    "\n",
    "- [A Gentle Introduction to Calculating the BLEU Score for Text in Python](https://machinelearningmastery.com/calculate-bleu-score-for-text-python/)\n",
    "\n",
    "The [NLTK Python library implements the BLEU score calculation](http://www.nltk.org/api/nltk.translate.html) in the `corpus_bleu()` function. A higher score close to 1.0 is better, a score closer to zero is worse.\n",
    "\n",
    "We can put all of this together with the functions from the previous section for loading the data. We first need to load the training dataset in order to prepare a Tokenizer so that we can encode generated words as input sequences for the model. It is critical that we encode the generated words using exactly the same encoding scheme as was used when training the model.\n",
    "\n",
    "We then use these functions for loading the test dataset.\n",
    "\n",
    "The complete example is listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy import argmax\n",
    "from pickle import load\n",
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.models import load_model\n",
    "from nltk.translate.bleu_score import corpus_bleu\n",
    "\n",
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file as read only\n",
    "    file = open(filename, 'r')\n",
    "    # read all text\n",
    "    text = file.read()\n",
    "    # close the file\n",
    "    file.close()\n",
    "    return text\n",
    "\n",
    "# load a pre-defined list of photo identifiers\n",
    "def load_set(filename):\n",
    "    doc = load_doc(filename)\n",
    "    dataset = list()\n",
    "    # process line by line\n",
    "    for line in doc.split('\\n'):\n",
    "        # skip empty lines\n",
    "        if len(line) < 1:\n",
    "            continue\n",
    "        # get the image identifier\n",
    "        identifier = line.split('.')[0]\n",
    "        dataset.append(identifier)\n",
    "    return set(dataset)\n",
    "\n",
    "# load clean descriptions into memory\n",
    "def load_clean_descriptions(filename, dataset):\n",
    "    # load document\n",
    "    doc = load_doc(filename)\n",
    "    descriptions = dict()\n",
    "    for line in doc.split('\\n'):\n",
    "        # split line by white space\n",
    "        tokens = line.split()\n",
    "        # split id from description\n",
    "        image_id, image_desc = tokens[0], tokens[1:]\n",
    "        # skip images not in the set\n",
    "        if image_id in dataset:\n",
    "            # create list\n",
    "            if image_id not in descriptions:\n",
    "                descriptions[image_id] = list()\n",
    "            # wrap description in tokens\n",
    "            desc = 'startseq ' + ' '.join(image_desc) + ' endseq'\n",
    "            # store\n",
    "            descriptions[image_id].append(desc)\n",
    "    return descriptions\n",
    "\n",
    "# load photo features\n",
    "def load_photo_features(filename, dataset):\n",
    "    # load all features\n",
    "    all_features = load(open(filename, 'rb'))\n",
    "    # filter features\n",
    "    features = {k: all_features[k] for k in dataset}\n",
    "    return features\n",
    "\n",
    "# covert a dictionary of clean descriptions to a list of descriptions\n",
    "def to_lines(descriptions):\n",
    "    all_desc = list()\n",
    "    for key in descriptions.keys():\n",
    "        [all_desc.append(d) for d in descriptions[key]]\n",
    "    return all_desc\n",
    "\n",
    "# fit a tokenizer given caption descriptions\n",
    "def create_tokenizer(descriptions):\n",
    "    lines = to_lines(descriptions)\n",
    "    tokenizer = Tokenizer()\n",
    "    tokenizer.fit_on_texts(lines)\n",
    "    return tokenizer\n",
    "\n",
    "# calculate the length of the description with the most words\n",
    "def max_length(descriptions):\n",
    "    lines = to_lines(descriptions)\n",
    "    return max(len(d.split()) for d in lines)\n",
    "\n",
    "# map an integer to a word\n",
    "def word_for_id(integer, tokenizer):\n",
    "    for word, index in tokenizer.word_index.items():\n",
    "        if index == integer:\n",
    "            return word\n",
    "    return None\n",
    "\n",
    "# generate a description for an image\n",
    "def generate_desc(model, tokenizer, photo, max_length):\n",
    "    # seed the generation process\n",
    "    in_text = 'startseq'\n",
    "    # iterate over the whole length of the sequence\n",
    "    for i in range(max_length):\n",
    "        # integer encode input sequence\n",
    "        sequence = tokenizer.texts_to_sequences([in_text])[0]\n",
    "        # pad input\n",
    "        sequence = pad_sequences([sequence], maxlen=max_length)\n",
    "        # predict next word\n",
    "        yhat = model.predict([photo,sequence], verbose=0)\n",
    "        # convert probability to integer\n",
    "        yhat = argmax(yhat)\n",
    "        # map integer to word\n",
    "        word = word_for_id(yhat, tokenizer)\n",
    "        # stop if we cannot map the word\n",
    "        if word is None:\n",
    "            break\n",
    "        # append as input for generating the next word\n",
    "        in_text += ' ' + word\n",
    "        # stop if we predict the end of the sequence\n",
    "        if word == 'endseq':\n",
    "            break\n",
    "    return in_text\n",
    "\n",
    "# evaluate the skill of the model\n",
    "def evaluate_model(model, descriptions, photos, tokenizer, max_length):\n",
    "    actual, predicted = list(), list()\n",
    "    # step over the whole set\n",
    "    for key, desc_list in descriptions.items():\n",
    "        # generate description\n",
    "        yhat = generate_desc(model, tokenizer, photos[key], max_length)\n",
    "        # store actual and predicted\n",
    "        references = [d.split() for d in desc_list]\n",
    "        actual.append(references)\n",
    "        predicted.append(yhat.split())\n",
    "    # calculate BLEU score\n",
    "    print('BLEU-1: %f' % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))\n",
    "    print('BLEU-2: %f' % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))\n",
    "    print('BLEU-3: %f' % corpus_bleu(actual, predicted, weights=(0.3, 0.3, 0.3, 0)))\n",
    "    print('BLEU-4: %f' % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))\n",
    "\n",
    "# prepare tokenizer on train set\n",
    "\n",
    "# load training dataset (6K)\n",
    "filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'\n",
    "train = load_set(filename)\n",
    "print('Dataset: %d' % len(train))\n",
    "# descriptions\n",
    "train_descriptions = load_clean_descriptions('./descriptions.txt', train)\n",
    "print('Descriptions: train=%d' % len(train_descriptions))\n",
    "# prepare tokenizer\n",
    "tokenizer = create_tokenizer(train_descriptions)\n",
    "vocab_size = len(tokenizer.word_index) + 1\n",
    "print('Vocabulary Size: %d' % vocab_size)\n",
    "# determine the maximum sequence length\n",
    "max_length = max_length(train_descriptions)\n",
    "print('Description Length: %d' % max_length)\n",
    "\n",
    "# prepare test set\n",
    "\n",
    "# load test set\n",
    "filename = 'Flickr8k_text/Flickr_8k.testImages.txt'\n",
    "test = load_set(filename)\n",
    "print('Dataset: %d' % len(test))\n",
    "# descriptions\n",
    "test_descriptions = load_clean_descriptions('./descriptions.txt', test)\n",
    "print('Descriptions: test=%d' % len(test_descriptions))\n",
    "# photo features\n",
    "test_features = load_photo_features('./features.pkl', test)\n",
    "print('Photos: test=%d' % len(test_features))\n",
    "\n",
    "# load the model; either the model with the smallest validiation loss\n",
    "# or the model which was obtained\n",
    "filename = 'model_19.h5'\n",
    "model = load_model(filename)\n",
    "# evaluate model\n",
    "evaluate_model(model, test_descriptions, test_features, tokenizer, max_length)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the example prints the BLEU scores.\n",
    "\n",
    "We can see that the scores fit within and close to the top of the expected range of a skillful model on the problem. The chosen model configuration is by no means optimized."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "BLEU-1: 0.537660\n",
    "BLEU-2: 0.284404\n",
    "BLEU-3: 0.190370\n",
    "BLEU-4: 0.087817"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate New Captions\n",
    "\n",
    "Now that we know how to develop and evaluate a caption generation model, how can we use it?\n",
    "\n",
    "Almost everything we need to generate captions for entirely new photographs is in the model file.\n",
    "\n",
    "We also need the Tokenizer for encoding generated words for the model while generating a sequence, and the maximum length of input sequences, used when we defined the model (e.g. 34).\n",
    "\n",
    "We can hard code the maximum sequence length. With the encoding of text, we can create the tokenizer and save it to a file so that we can load it quickly whenever we need it without needing the entire _Flickr8K_ dataset. An alternative would be to use our own vocabulary file and mapping to integers function during training.\n",
    "\n",
    "We can create the Tokenizer as before and save it as a pickle file _tokenizer.pkl_. The complete example is listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "from pickle import dump\n",
    "\n",
    "# load doc into memory\n",
    "def load_doc(filename):\n",
    "    # open the file as read only\n",
    "    file = open(filename, 'r')\n",
    "    # read all text\n",
    "    text = file.read()\n",
    "    # close the file\n",
    "    file.close()\n",
    "    return text\n",
    "\n",
    "# load a pre-defined list of photo identifiers\n",
    "def load_set(filename):\n",
    "    doc = load_doc(filename)\n",
    "    dataset = list()\n",
    "    # process line by line\n",
    "    for line in doc.split('\\n'):\n",
    "        # skip empty lines\n",
    "        if len(line) < 1:\n",
    "            continue\n",
    "        # get the image identifier\n",
    "        identifier = line.split('.')[0]\n",
    "        dataset.append(identifier)\n",
    "    return set(dataset)\n",
    "\n",
    "# load clean descriptions into memory\n",
    "def load_clean_descriptions(filename, dataset):\n",
    "    # load document\n",
    "    doc = load_doc(filename)\n",
    "    descriptions = dict()\n",
    "    for line in doc.split('\\n'):\n",
    "        # split line by white space\n",
    "        tokens = line.split()\n",
    "        # split id from description\n",
    "        image_id, image_desc = tokens[0], tokens[1:]\n",
    "        # skip images not in the set\n",
    "        if image_id in dataset:\n",
    "            # create list\n",
    "            if image_id not in descriptions:\n",
    "                descriptions[image_id] = list()\n",
    "            # wrap description in tokens\n",
    "            desc = 'startseq ' + ' '.join(image_desc) + ' endseq'\n",
    "            # store\n",
    "            descriptions[image_id].append(desc)\n",
    "    return descriptions\n",
    "\n",
    "# covert a dictionary of clean descriptions to a list of descriptions\n",
    "def to_lines(descriptions):\n",
    "    all_desc = list()\n",
    "    for key in descriptions.keys():\n",
    "        [all_desc.append(d) for d in descriptions[key]]\n",
    "    return all_desc\n",
    "\n",
    "# fit a tokenizer given caption descriptions\n",
    "def create_tokenizer(descriptions):\n",
    "    lines = to_lines(descriptions)\n",
    "    tokenizer = Tokenizer()\n",
    "    tokenizer.fit_on_texts(lines)\n",
    "    return tokenizer\n",
    "\n",
    "# load training dataset (6K)\n",
    "filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'\n",
    "train = load_set(filename)\n",
    "print('Dataset: %d' % len(train))\n",
    "# descriptions\n",
    "train_descriptions = load_clean_descriptions('./descriptions.txt', train)\n",
    "print('Descriptions: train=%d' % len(train_descriptions))\n",
    "# prepare tokenizer\n",
    "tokenizer = create_tokenizer(train_descriptions)\n",
    "# save the tokenizer\n",
    "dump(tokenizer, open('tokenizer_flickr.pkl', 'wb'))"
   ]
  },
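  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The alternative mentioned earlier, keeping our own vocabulary file and word-to-integer mapping instead of pickling a Tokenizer, could look roughly like the sketch below. It is an illustration under assumptions: the helpers `build_vocab` and `encode` and the file name `vocab.json` are hypothetical, ids are assigned in order of first appearance (the Keras Tokenizer orders by frequency, so its ids would differ), and unknown words are simply dropped.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "def build_vocab(lines):\n",
    "    # assign integer ids in order of first appearance; 0 stays free for padding\n",
    "    vocab = {}\n",
    "    for line in lines:\n",
    "        for word in line.split():\n",
    "            if word not in vocab:\n",
    "                vocab[word] = len(vocab) + 1\n",
    "    return vocab\n",
    "\n",
    "def encode(line, vocab):\n",
    "    # map known words to ids and drop words that are not in the vocabulary\n",
    "    return [vocab[w] for w in line.split() if w in vocab]\n",
    "\n",
    "lines = ['startseq dog runs endseq', 'startseq man climbs rock endseq']\n",
    "vocab = build_vocab(lines)\n",
    "\n",
    "# round-trip the mapping through a JSON file in a temporary directory\n",
    "path = os.path.join(tempfile.mkdtemp(), 'vocab.json')\n",
    "with open(path, 'w') as f:\n",
    "    json.dump(vocab, f)\n",
    "with open(path) as f:\n",
    "    restored = json.load(f)\n",
    "\n",
    "encoded = encode('startseq dog climbs endseq', restored)\n",
    "print(encoded)\n",
    "```\n",
    "\n",
    "A JSON file has the advantage of being readable without Keras installed; the pickle approach used above is shorter because the Tokenizer already bundles the mapping and the encoding logic."
   ]
  },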
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now load the tokenizer whenever we need it without having to load the entire training dataset of annotations.\n",
    "\n",
    "Now, let’s generate a description for a new photograph.\n",
    "\n",
    "Below is a new photograph from my photo album:\n",
    "\n",
    "<img src=\"./Bilder/example_10.jpg\" width=\"600\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will generate a description for it using our model.\n",
    "\n",
    "Download the photograph and save it to your local directory with the filename _example.jpg_.\n",
    "\n",
    "First, we must load the Tokenizer from `tokenizer_flickr.pk` and define the maximum length of the sequence to generate, needed for padding inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the tokenizer\n",
    "tokenizer = load(open('tokenizer_flickr.pkl', 'rb'))\n",
    "# pre-define the max sequence length (from training)\n",
    "max_length = 34"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we must load the model, as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the model\n",
    "model = load_model('model_19.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we must load the photo from which we want to describe and extract the features.\n",
    "\n",
    "We could do this by re-defining the model and adding the VGG-16 model to it, or we can use the VGG model to predict the features and use them as inputs to our existing model. We will do the latter and use a modified version of the `extract_features()` function used during data preparation, but adapted to work on a single photo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "from pickle import load\n",
    "from numpy import argmax\n",
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.applications.vgg16 import VGG16\n",
    "from tensorflow.keras.preprocessing.image import load_img\n",
    "from tensorflow.keras.preprocessing.image import img_to_array\n",
    "from tensorflow.keras.applications.vgg16 import preprocess_input\n",
    "from tensorflow.keras.models import Model\n",
    "from tensorflow.keras.models import load_model\n",
    "# extract features from each photo in the directory\n",
    "def extract_features(filename):\n",
    "    # load the model\n",
    "    model = VGG16()\n",
    "    # re-structure the model\n",
    "    model.layers.pop()\n",
    "    model = Model(inputs=model.inputs, outputs=model.layers[-1].output)\n",
    "    # load the photo\n",
    "    image = load_img(filename, target_size=(224, 224))\n",
    "    # convert the image pixels to a numpy array\n",
    "    image = img_to_array(image)\n",
    "    # reshape data for the model\n",
    "    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n",
    "    # prepare the image for the VGG model\n",
    "    image = preprocess_input(image)\n",
    "    # get features\n",
    "    feature = model.predict(image, verbose=0)\n",
    "    return feature\n",
    "\n",
    "# load and prepare the photograph\n",
    "photo = extract_features('./Bilder/example_10.jp2')\n",
    "print(photo.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then generate a description using the `generate_desc()` function defined when evaluating the model.\n",
    "\n",
    "The complete example for generating a description for an entirely new standalone photograph is listed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pickle import load\n",
    "from numpy import argmax\n",
    "from keras.preprocessing.sequence import pad_sequences\n",
    "from keras.applications.vgg16 import VGG16\n",
    "from keras.preprocessing.image import load_img\n",
    "from keras.preprocessing.image import img_to_array\n",
    "from keras.applications.vgg16 import preprocess_input\n",
    "from keras.models import Model\n",
    "from keras.models import load_model\n",
    "\n",
    "# extract features from each photo in the directory\n",
    "def extract_features(filename):\n",
    "    # load the model\n",
    "    model = VGG16()\n",
    "    # re-structure the model\n",
    "    model.layers.pop()\n",
    "    model = Model(inputs=model.inputs, outputs=model.layers[-2].output)\n",
    "    # load the photo\n",
    "    image = load_img(filename, target_size=(224, 224))\n",
    "    # convert the image pixels to a numpy array\n",
    "    image = img_to_array(image)\n",
    "    # reshape data for the model\n",
    "    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n",
    "    # prepare the image for the VGG model\n",
    "    image = preprocess_input(image)\n",
    "    # get features\n",
    "    feature = model.predict(image, verbose=0)\n",
    "    return feature\n",
    "\n",
    "# map an integer to a word\n",
    "def word_for_id(integer, tokenizer):\n",
    "    for word, index in tokenizer.word_index.items():\n",
    "        if index == integer:\n",
    "            return word\n",
    "    return None\n",
    "\n",
    "# generate a description for an image\n",
    "def generate_desc(model, tokenizer, photo, max_length):\n",
    "    # seed the generation process\n",
    "    in_text = 'startseq'\n",
    "    # iterate over the whole length of the sequence\n",
    "    for i in range(max_length):\n",
    "        # integer encode input sequence\n",
    "        sequence = tokenizer.texts_to_sequences([in_text])[0]\n",
    "        # pad input\n",
    "        sequence = pad_sequences([sequence], maxlen=max_length)\n",
    "        # predict next word\n",
    "        yhat = model.predict([photo,sequence], verbose=0)\n",
    "        # convert probability to integer\n",
    "        yhat = argmax(yhat)\n",
    "        # map integer to word\n",
    "        word = word_for_id(yhat, tokenizer)\n",
    "        # stop if we cannot map the word\n",
    "        if word is None:\n",
    "            break\n",
    "        # append as input for generating the next word\n",
    "        in_text += ' ' + word\n",
    "        # stop if we predict the end of the sequence\n",
    "        if word == 'endseq':\n",
    "            break\n",
    "    return in_text\n",
    "\n",
    "# load the tokenizer\n",
    "tokenizer = load(open('tokenizer_flickr.pkl', 'rb'))\n",
    "# pre-define the max sequence length (from training)\n",
    "max_length = 34\n",
    "# load the model\n",
    "model = load_model('model_19.h5')\n",
    "# load and prepare the photograph\n",
    "photo = extract_features('./Bilder/example_10.jp2')\n",
    "# generate description\n",
    "description = generate_desc(model, tokenizer, photo, max_length)\n",
    "print(description)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, the description generated was as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "startseq man climbs rock face endseq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You could remove the start and end tokens and you would have the basis for a nice automatic photo captioning model.\n",
    "\n"
   ]
  },
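  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Removing the start and end tokens takes only a few lines. The helper below is a minimal sketch; the name `clean_caption` is hypothetical, and it assumes the markers are the literal words `startseq` and `endseq` used throughout this part.\n",
    "\n",
    "```python\n",
    "def clean_caption(text, start='startseq', end='endseq'):\n",
    "    # drop the start/end markers and keep the words in between\n",
    "    words = [w for w in text.split() if w not in (start, end)]\n",
    "    return ' '.join(words)\n",
    "\n",
    "caption = clean_caption('startseq man climbs rock face endseq')\n",
    "print(caption)\n",
    "```\n",
    "\n",
    "For the description generated above this prints `man climbs rock face`."
   ]
  },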
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extensions for Your Course Project\n",
    "\n",
    "This section lists some ideas for extending the tutorial that you may wish to explore.\n",
    "\n",
    "- Alternate Pre-Trained Photo Models. A small 16-layer VGG model was used for feature extraction. Consider exploring larger models that offer better performance on the ImageNet dataset, such as Inception or EfficientNet.\n",
    "    \n",
    "- Smaller Vocabulary. A larger vocabulary of nearly eight thousand words was used in the development of the model. Many of the words supported may be misspellings or only used once in the entire dataset. Refine the vocabulary and reduce the size, perhaps by half.\n",
    "    \n",
    "- Pre-trained Word Vectors. The model learned the word vectors as part of fitting the model. Better performance may be achieved by using word vectors either pre-trained on the training dataset or trained on a much larger corpus of text, such as news articles or Wikipedia.]()\n",
    "    \n",
    "- Tune Model. The configuration of the model was not tuned on the problem. Explore alternate configurations and see if you can achieve better performance.\n",
    "\n",
    "- Get acquainted with the [Zalando fashion dataset](https://github.com/zalandoresearch/feidegger/) which is a multi-modal Corpus of fashion images and descriptions in german and train an image captioning model.\n",
    "\n",
    "\n",
    "This section provides more resources on the topic if you are looking go deeper.\n",
    "\n",
    "### Caption Generation Papers\n",
    "\n",
    "- [Show and Tell: A Neural Image Caption Generator, 2015.](https://arxiv.org/abs/1411.4555)\n",
    "- [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015.](https://arxiv.org/abs/1502.03044)\n",
    "- [Where to put the Image in an Image Caption Generator, 2017.](https://arxiv.org/abs/1703.09137)\n",
    "- [What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?, 2017.](https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/)\n",
    "- [Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures, 2016.](https://arxiv.org/abs/1601.03896)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part IV : Image Captioning with Visual Attention\n",
    "\n",
    "\n",
    "Given an image like the example below, our goal is to generate a caption such as \"a surfer riding on a wave\".\n",
    "\n",
    "![alt text](./surf.jpg \"Title\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To accomplish this, you'll use an attention-based model, which enables us to see what parts of the image the model focuses on as it generates a caption.\n",
    "\n",
    "\n",
    "![alt text](./imcap_prediction.png \"Title\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model architecture is similar to [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](https://arxiv.org/abs/1502.03044).\n",
    "\n",
    "This notebook is an end-to-end example. When you run the notebook, it downloads the [MS-COCO](http://cocodataset.org/#home) dataset, preprocesses and caches a subset of images using Inception V3, trains an encoder-decoder model, and generates captions on new images using the trained model.\n",
    "\n",
    "In this example, you will train a model on a relatively small amount of data—the first 30,000 captions for about 20,000 images (because there are multiple captions per image in the dataset)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# You'll generate plots of attention in order to see which parts of an image\n",
    "# our model focuses on during captioning\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Scikit-learn includes many helpful utilities\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.utils import shuffle\n",
    "\n",
    "import re\n",
    "import numpy as np\n",
    "import os\n",
    "import time\n",
    "import json\n",
    "from glob import glob\n",
    "from PIL import Image\n",
    "import pickle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download and prepare the MS-COCO dataset\n",
    "\n",
    "You will use the [MS-COCO dataset](http://cocodataset.org/#home) to train our model. The dataset contains over 82,000 images, each of which has at least 5 different caption annotations. The code below downloads and extracts the dataset automatically.\n",
    "\n",
    "__Caution: large download ahead__. You'll use the training set, which is a 13GB file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download caption annotation files\n",
    "annotation_folder = '/annotations/'\n",
    "if not os.path.exists(os.path.abspath('.') + annotation_folder):\n",
    "  annotation_zip = tf.keras.utils.get_file('captions.zip',\n",
    "                                          cache_subdir=os.path.abspath('.'),\n",
    "                                          origin = 'http://images.cocodataset.org/annotations/annotations_trainval2014.zip',\n",
    "                                          extract = True)\n",
    "  annotation_file = os.path.dirname(annotation_zip)+'/annotations/captions_train2014.json'\n",
    "  os.remove(annotation_zip)\n",
    "\n",
    "# Download image files\n",
    "image_folder = '/train2014/'\n",
    "if not os.path.exists(os.path.abspath('.') + image_folder):\n",
    "  image_zip = tf.keras.utils.get_file('train2014.zip',\n",
    "                                      cache_subdir=os.path.abspath('.'),\n",
    "                                      origin = 'http://images.cocodataset.org/zips/train2014.zip',\n",
    "                                      extract = True)\n",
    "  PATH = os.path.dirname(image_zip) + image_folder\n",
    "  os.remove(image_zip)\n",
    "else:\n",
    "  PATH = os.path.abspath('.') + image_folder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: limit the size of the training set\n",
    "\n",
    "To speed up training for this tutorial, you'll use a subset of 30,000 captions and their corresponding images to train our model. Choosing to use more data would result in improved captioning quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the json file\n",
    "annotation_file = './annotations/captions_train2014.json'\n",
    "\n",
    "with open(annotation_file, 'r') as f:\n",
    "    annotations = json.load(f)\n",
    "\n",
    "# Store captions and image names in vectors\n",
    "all_captions = []\n",
    "all_img_name_vector = []\n",
    "\n",
    "for annot in annotations['annotations']:\n",
    "    caption = '<start> ' + annot['caption'] + ' <end>'\n",
    "    image_id = annot['image_id']\n",
    "    full_coco_image_path = PATH + 'COCO_train2014_' + '%012d.jpg' % (image_id)\n",
    "\n",
    "    all_img_name_vector.append(full_coco_image_path)\n",
    "    all_captions.append(caption)\n",
    "\n",
    "# Shuffle captions and image_names together\n",
    "# Set a random state\n",
    "train_captions, img_name_vector = shuffle(all_captions,\n",
    "                                          all_img_name_vector,\n",
    "                                          random_state=1)\n",
    "\n",
    "# Select the first 30000 captions from the shuffled set\n",
    "num_examples = 30000\n",
    "train_captions = train_captions[:num_examples]\n",
    "img_name_vector = img_name_vector[:num_examples]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(train_captions), len(all_captions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preprocess the images using InceptionV3\n",
    "\n",
    "Next, you will use InceptionV3 (which is pretrained on Imagenet) to classify each image. You will extract features from the last convolutional layer.\n",
    "\n",
    "First, you will convert the images into InceptionV3's expected format by:\n",
    "\n",
    "- Resizing the image to 299px by 299px\n",
    "- [Preprocess the images](https://cloud.google.com/tpu/docs/inception-v3-advanced#preprocessing_stage) using the [preprocess_input](https://www.tensorflow.org/api_docs/python/tf/keras/applications/inception_v3/preprocess_input) method to normalize the image so that it contains pixels in the range of -1 to 1, which matches the format of the images used to train InceptionV3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_image(image_path):\n",
    "    img = tf.io.read_file(image_path)\n",
    "    img = tf.image.decode_jpeg(img, channels=3)\n",
    "    img = tf.image.resize(img, (299, 299))\n",
    "    img = tf.keras.applications.inception_v3.preprocess_input(img)\n",
    "    return img, image_path"
   ]
  },
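  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the normalization step does to individual pixel values, here is a NumPy sketch. It assumes the rescaling is `x / 127.5 - 1`, which maps the range 0 to 255 onto -1 to 1; the function name `scale_to_unit_range` is made up for this example and is not part of the Keras API.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def scale_to_unit_range(pixels):\n",
    "    # map pixel values in [0, 255] to floats in [-1, 1]\n",
    "    return pixels.astype('float32') / 127.5 - 1.0\n",
    "\n",
    "pixels = np.array([0, 127.5, 255])\n",
    "scaled = scale_to_unit_range(pixels)\n",
    "print(scaled)\n",
    "```\n",
    "\n",
    "Black, mid-gray and white end up at -1, 0 and 1 respectively."
   ]
  },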
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize InceptionV3 and load the pretrained Imagenet weights\n",
    "\n",
    "Now you'll create a tf.keras model where the output layer is the last convolutional layer in the InceptionV3 architecture. The shape of the output of this layer is 8x8x2048. You use the last convolutional layer because you are using attention in this example. You don't perform this initialization during training because it could become a bottleneck.\n",
    "\n",
    "- You forward each image through the network and store the resulting vector in a dictionary (image_name --> feature_vector).\n",
    "- After all the images are passed through the network, you pickle the dictionary and save it to disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_model = tf.keras.applications.InceptionV3(include_top=False,\n",
    "                                                weights='imagenet')\n",
    "new_input = image_model.input\n",
    "hidden_layer = image_model.layers[-1].output\n",
    "\n",
    "image_features_extract_model = tf.keras.Model(new_input, hidden_layer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Caching the features extracted from InceptionV3\n",
    "\n",
    "You will pre-process each image with InceptionV3 and cache the output to disk. Caching the output in RAM would be faster but also memory intensive, requiring 8 * 8 * 2048 floats per image. At the time of writing, this exceeds the memory limitations of Colab (currently 12GB of memory).\n",
    "\n",
    "Performance could be improved with a more sophisticated caching strategy (for example, by sharding the images to reduce random access disk I/O), but that would require more code.\n",
    "\n",
    "The caching will take about 10 minutes to run in Colab with a GPU. If you'd like to see a progress bar, you can:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. install tqdm:\n",
    "\n",
    "`!pip install -q tqdm`\n",
    "\n",
    "2. Import tqdm:\n",
    "\n",
    "`from tqdm import tqdm`\n",
    "\n",
    "3. Change the following line:\n",
    "\n",
    "`for img, path in image_dataset:`\n",
    "\n",
    "to:\n",
    "\n",
    "`for img, path in tqdm(image_dataset):`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get unique images\n",
    "encode_train = sorted(set(img_name_vector))\n",
    "\n",
    "# Feel free to change batch_size according to your system configuration\n",
    "image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)\n",
    "image_dataset = image_dataset.map(\n",
    "  load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)\n",
    "\n",
    "for img, path in image_dataset:\n",
    "  batch_features = image_features_extract_model(img)\n",
    "  batch_features = tf.reshape(batch_features,\n",
    "                              (batch_features.shape[0], -1, batch_features.shape[3]))\n",
    "\n",
    "  for bf, p in zip(batch_features, path):\n",
    "    path_of_feature = p.numpy().decode(\"utf-8\")\n",
    "    np.save(path_of_feature, bf.numpy())"
   ]
  },
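  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The caching loop above relies on two details that are easy to miss: the 8x8 grid is flattened into 64 locations, and `np.save` appends `.npy` to the image path, which is why the features are later loaded from `img_name + '.npy'`. The sketch below reproduces both with a random stand-in array in a temporary directory; the file name is only an example.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "import numpy as np\n",
    "\n",
    "# stand-in for one InceptionV3 feature map (random values, not real features)\n",
    "feature_map = np.random.rand(8, 8, 2048).astype('float32')\n",
    "\n",
    "# flatten the 8x8 grid into 64 locations, keeping the 2048 channels\n",
    "flat = feature_map.reshape(-1, feature_map.shape[2])\n",
    "\n",
    "# np.save appends '.npy' when the given name does not already end with it\n",
    "image_path = os.path.join(tempfile.mkdtemp(), 'COCO_train2014_000000000001.jpg')\n",
    "np.save(image_path, flat)\n",
    "\n",
    "restored = np.load(image_path + '.npy')\n",
    "print(flat.shape, os.path.exists(image_path + '.npy'))\n",
    "```\n",
    "\n",
    "Because the features sit next to the images under a predictable name, the training pipeline only needs the image path to find them."
   ]
  },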
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preprocess and tokenize the captions\n",
    "\n",
    "- First, you'll tokenize the captions (for example, by splitting on spaces). This gives us a vocabulary of all of the unique words in the data (for example, \"surfing\", \"football\", and so on).\n",
    "- Next, you'll limit the vocabulary size to the top 5,000 words (to save memory). You'll replace all other words with the token \"UNK\" (unknown).\n",
    "- You then create word-to-index and index-to-word mappings.\n",
    "- Finally, you pad all sequences to be the same length as the longest one."
   ]
  },
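  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The four steps above can be sketched in plain Python before handing the work to the Keras Tokenizer. This is an illustration under assumptions, not a re-implementation of the Tokenizer: the helpers `build_index` and `encode_and_pad` are hypothetical, punctuation filtering is omitted, and the exact ids will differ from those Keras assigns.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def build_index(captions, top_k):\n",
    "    # index 0 is padding, index 1 is the unknown token;\n",
    "    # the remaining ids go to the top_k most frequent words\n",
    "    counts = Counter(w for c in captions for w in c.lower().split())\n",
    "    word_index = {'<pad>': 0, '<unk>': 1}\n",
    "    for word, _ in counts.most_common(top_k):\n",
    "        word_index[word] = len(word_index)\n",
    "    return word_index\n",
    "\n",
    "def encode_and_pad(captions, word_index):\n",
    "    # replace out-of-vocabulary words by the unknown id\n",
    "    seqs = [[word_index.get(w, word_index['<unk>']) for w in c.lower().split()]\n",
    "            for c in captions]\n",
    "    max_len = max(len(s) for s in seqs)\n",
    "    # post-padding: append zeros until every sequence has max_len entries\n",
    "    return [s + [0] * (max_len - len(s)) for s in seqs]\n",
    "\n",
    "captions = ['<start> a surfer riding a wave <end>', '<start> a dog <end>']\n",
    "word_index = build_index(captions, top_k=3)\n",
    "index_word = {i: w for w, i in word_index.items()}\n",
    "padded = encode_and_pad(captions, word_index)\n",
    "print(padded)\n",
    "```\n",
    "\n",
    "With `top_k=3` only the three most frequent tokens keep their own id; rarer words such as `surfer` collapse into the unknown id, and the shorter caption is padded with zeros at the end."
   ]
  },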
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find the maximum length of any caption in our dataset\n",
    "def calc_max_length(tensor):\n",
    "    return max(len(t) for t in tensor)\n",
    "\n",
    "# Choose the top 5000 words from the vocabulary\n",
    "top_k = 5000\n",
    "tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=top_k,\n",
    "                                                  oov_token=\"<unk>\",\n",
    "                                                  filters='!\"#$%&()*+.,-/:;=?@[\\]^_`{|}~ ')\n",
    "tokenizer.fit_on_texts(train_captions)\n",
    "train_seqs = tokenizer.texts_to_sequences(train_captions)\n",
    "\n",
    "tokenizer.word_index['<pad>'] = 0\n",
    "tokenizer.index_word[0] = '<pad>'\n",
    "\n",
    "# Create the tokenized vectors\n",
    "train_seqs = tokenizer.texts_to_sequences(train_captions)\n",
    "\n",
    "# Pad each vector to the max_length of the captions\n",
    "# If you do not provide a max_length value, pad_sequences calculates it automatically\n",
    "cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding='post')\n",
    "\n",
    "# Calculates the max_length, which is used to store the attention weights\n",
    "max_length = calc_max_length(train_seqs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split the data into training and testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create training and validation sets using an 80-20 split\n",
    "img_name_train, img_name_val, cap_train, cap_val = train_test_split(img_name_vector,\n",
    "                                                                    cap_vector,\n",
    "                                                                    test_size=0.2,\n",
    "                                                                    random_state=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(img_name_train), len(cap_train), len(img_name_val), len(cap_val)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a tf.data dataset for training\n",
    "\n",
    "Our images and captions are ready! Next, let's create a `tf.data dataset` to use for training our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Feel free to change these parameters according to your system's configuration\n",
    "\n",
    "BATCH_SIZE = 64\n",
    "BUFFER_SIZE = 1000\n",
    "embedding_dim = 256\n",
    "units = 512\n",
    "vocab_size = top_k + 1\n",
    "num_steps = len(img_name_train) // BATCH_SIZE\n",
    "# Shape of the vector extracted from InceptionV3 is (64, 2048)\n",
    "# These two variables represent that vector shape\n",
    "features_shape = 2048\n",
    "attention_features_shape = 64\n",
    "\n",
    "# Load the numpy files\n",
    "def map_func(img_name, cap):\n",
    "  img_tensor = np.load(img_name.decode('utf-8')+'.npy')\n",
    "  return img_tensor, cap\n",
    "\n",
    "dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))\n",
    "\n",
    "# Use map to load the numpy files in parallel\n",
    "dataset = dataset.map(lambda item1, item2: tf.numpy_function(\n",
    "          map_func, [item1, item2], [tf.float32, tf.int32]),\n",
    "          num_parallel_calls=tf.data.experimental.AUTOTUNE)\n",
    "\n",
    "# Shuffle and batch\n",
    "dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)\n",
    "dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model\n",
    "Fun fact: the decoder below is identical to the one in the example for [Neural Machine Translation with Attention](https://www.tensorflow.org/tutorials/text/nmt_with_attention).\n",
    "\n",
    "The model architecture is inspired by the [Show, Attend and Tell paper](https://arxiv.org/pdf/1502.03044.pdf).\n",
    "\n",
    "- In this example, you extract the features from the lower convolutional layer of InceptionV3 giving us a vector of shape (8, 8, 2048).\n",
    "- You squash that to a shape of (64, 2048).\n",
    "- This vector is then passed through the CNN Encoder (which consists of a single Fully connected layer).\n",
    "- The RNN (here GRU) attends over the image to predict the next word."
   ]
  },
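  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The attention step in the last bullet can be traced with plain NumPy for a single image (no batch axis). The sketch below follows the additive (Bahdanau-style) formulation, with random matrices standing in for the trained dense layers and biases left out; all sizes and variable names are chosen for illustration only.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "locations, feat_dim, hidden_dim, units = 64, 256, 512, 512\n",
    "\n",
    "features = rng.normal(size=(locations, feat_dim))  # encoder output for one image\n",
    "hidden = rng.normal(size=(hidden_dim,))            # current decoder state\n",
    "\n",
    "# random stand-ins for the weights of the dense layers W1, W2 and V\n",
    "W1 = rng.normal(size=(feat_dim, units)) * 0.01\n",
    "W2 = rng.normal(size=(hidden_dim, units)) * 0.01\n",
    "V = rng.normal(size=(units, 1)) * 0.01\n",
    "\n",
    "# additive score per location, shape (64, units)\n",
    "score = np.tanh(features @ W1 + hidden @ W2)\n",
    "\n",
    "# one logit per location, then a softmax over the 64 locations\n",
    "logits = score @ V\n",
    "weights = np.exp(logits - logits.max())\n",
    "weights = weights / weights.sum()\n",
    "\n",
    "# weighted sum of the features, shape (feat_dim,)\n",
    "context = (weights * features).sum(axis=0)\n",
    "print(weights.shape, context.shape)\n",
    "```\n",
    "\n",
    "The weights form a probability distribution over the 64 image locations, which is what makes it possible to plot where the model looks for each generated word."
   ]
  },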
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class BahdanauAttention(tf.keras.Model):\n",
    "  def __init__(self, units):\n",
    "    super(BahdanauAttention, self).__init__()\n",
    "    self.W1 = tf.keras.layers.Dense(units)\n",
    "    self.W2 = tf.keras.layers.Dense(units)\n",
    "    self.V = tf.keras.layers.Dense(1)\n",
    "\n",
    "  def call(self, features, hidden):\n",
    "    # features(CNN_encoder output) shape == (batch_size, 64, embedding_dim)\n",
    "\n",
    "    # hidden shape == (batch_size, hidden_size)\n",
    "    # hidden_with_time_axis shape == (batch_size, 1, hidden_size)\n",
    "    hidden_with_time_axis = tf.expand_dims(hidden, 1)\n",
    "\n",
    "    # score shape == (batch_size, 64, hidden_size)\n",
    "    score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))\n",
    "\n",
    "    # attention_weights shape == (batch_size, 64, 1)\n",
    "    # you get 1 at the last axis because you are applying score to self.V\n",
    "    attention_weights = tf.nn.softmax(self.V(score), axis=1)\n",
    "\n",
    "    # context_vector shape after sum == (batch_size, hidden_size)\n",
    "    context_vector = attention_weights * features\n",
    "    context_vector = tf.reduce_sum(context_vector, axis=1)\n",
    "\n",
    "    return context_vector, attention_weights\n",
    "\n",
    "class CNN_Encoder(tf.keras.Model):\n",
    "    # Since you have already extracted the features and dumped it using pickle\n",
    "    # This encoder passes those features through a Fully connected layer\n",
    "    def __init__(self, embedding_dim):\n",
    "        super(CNN_Encoder, self).__init__()\n",
    "        # shape after fc == (batch_size, 64, embedding_dim)\n",
    "        self.fc = tf.keras.layers.Dense(embedding_dim)\n",
    "\n",
    "    def call(self, x):\n",
    "        x = self.fc(x)\n",
    "        x = tf.nn.relu(x)\n",
    "        return x\n",
    "    \n",
    "    \n",
    "class RNN_Decoder(tf.keras.Model):\n",
    "  def __init__(self, embedding_dim, units, vocab_size):\n",
    "    super(RNN_Decoder, self).__init__()\n",
    "    self.units = units\n",
    "\n",
    "    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)\n",
    "    self.gru = tf.keras.layers.GRU(self.units,\n",
    "                                   return_sequences=True,\n",
    "                                   return_state=True,\n",
    "                                   recurrent_initializer='glorot_uniform')\n",
    "    self.fc1 = tf.keras.layers.Dense(self.units)\n",
    "    self.fc2 = tf.keras.layers.Dense(vocab_size)\n",
    "\n",
    "    self.attention = BahdanauAttention(self.units)\n",
    "\n",
    "  def call(self, x, features, hidden):\n",
    "    # defining attention as a separate model\n",
    "    context_vector, attention_weights = self.attention(features, hidden)\n",
    "\n",
    "    # x shape after passing through embedding == (batch_size, 1, embedding_dim)\n",
    "    x = self.embedding(x)\n",
    "\n",
    "    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)\n",
    "    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)\n",
    "\n",
    "    # passing the concatenated vector to the GRU\n",
    "    output, state = self.gru(x)\n",
    "\n",
    "    # shape == (batch_size, max_length, hidden_size)\n",
    "    x = self.fc1(output)\n",
    "\n",
    "    # x shape == (batch_size * max_length, hidden_size)\n",
    "    x = tf.reshape(x, (-1, x.shape[2]))\n",
    "\n",
    "    # output shape == (batch_size * max_length, vocab)\n",
    "    x = self.fc2(x)\n",
    "\n",
    "    return x, state, attention_weights\n",
    "\n",
    "  def reset_state(self, batch_size):\n",
    "    return tf.zeros((batch_size, self.units))\n",
    "\n",
    "encoder = CNN_Encoder(embedding_dim)\n",
    "decoder = RNN_Decoder(embedding_dim, units, vocab_size)\n",
    "\n",
    "optimizer = tf.keras.optimizers.Adam()\n",
    "loss_object = tf.keras.losses.SparseCategoricalCrossentropy(\n",
    "    from_logits=True, reduction='none')\n",
    "\n",
    "def loss_function(real, pred):\n",
    "  mask = tf.math.logical_not(tf.math.equal(real, 0))\n",
    "  loss_ = loss_object(real, pred)\n",
    "\n",
    "  mask = tf.cast(mask, dtype=loss_.dtype)\n",
    "  loss_ *= mask\n",
    "\n",
    "  return tf.reduce_mean(loss_)"
   ]
  },
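  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `loss_function` above zeroes out the loss at padded positions (token id 0) before averaging. The following is a minimal NumPy sketch of that masking, not the notebook's TensorFlow code; the helper name `masked_mean_loss` and the toy numbers are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def masked_mean_loss(real, per_token_loss):\n",
    "    # Hypothetical helper: mirrors the masking logic with NumPy arrays.\n",
    "    # real: integer token ids, where 0 marks padding.\n",
    "    # per_token_loss: unreduced loss, one value per token.\n",
    "    mask = (real != 0).astype(per_token_loss.dtype)\n",
    "    # Padded positions contribute 0, but the mean still divides by\n",
    "    # the full number of positions, as tf.reduce_mean does in loss_function.\n",
    "    return np.mean(per_token_loss * mask)\n",
    "\n",
    "real = np.array([5, 0, 7, 0])\n",
    "per_token_loss = np.array([2.0, 9.0, 4.0, 9.0])\n",
    "print(masked_mean_loss(real, per_token_loss))  # (2 + 0 + 4 + 0) / 4 = 1.5\n",
    "```\n",
    "\n",
    "Because the mean is taken over all positions, batches with more padding report a smaller loss; dividing by `mask.sum()` instead would average over real tokens only."
   ]
  },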
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Checkpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "checkpoint_path = \"./checkpoints/train\"\n",
    "ckpt = tf.train.Checkpoint(encoder=encoder,\n",
    "                           decoder=decoder,\n",
    "                           optimizer = optimizer)\n",
    "ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)\n",
    "\n",
    "start_epoch = 0\n",
    "if ckpt_manager.latest_checkpoint:\n",
    "  start_epoch = int(ckpt_manager.latest_checkpoint.split('-')[-1])\n",
    "  # restoring the latest checkpoint in checkpoint_path\n",
    "  ckpt.restore(ckpt_manager.latest_checkpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "- You extract the features stored in the respective .npy files and then pass those features through the encoder.\n",
    "- The encoder output, hidden state(initialized to 0) and the decoder input (which is the start token) is passed to the decoder.\n",
    "- The decoder returns the predictions and the decoder hidden state.\n",
    "- The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n",
    "- Use teacher forcing to decide the next input to the decoder.\n",
    "- Teacher forcing is the technique where the target word is passed as the next input to the decoder.\n",
    "- The final step is to calculate the gradients and apply it to the optimizer and backpropagate."
   ]
  },
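  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Teacher forcing, as described in the list above, can be sketched in isolation. This is a toy illustration in plain Python and NumPy, not the training step used in this notebook; `decoder_inputs`, `predict_fn`, and the token ids are hypothetical names and values chosen for the example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def decoder_inputs(target, predict_fn, teacher_forcing=True):\n",
    "    # target: 1-D array of token ids; target[0] is the start token.\n",
    "    # predict_fn: hypothetical stand-in for the decoder; maps the\n",
    "    # current input token to a predicted token.\n",
    "    dec_input = target[0]\n",
    "    inputs = []\n",
    "    for i in range(1, len(target)):\n",
    "        inputs.append(int(dec_input))\n",
    "        prediction = predict_fn(dec_input)\n",
    "        # Teacher forcing feeds the ground-truth token target[i];\n",
    "        # otherwise the model's own prediction is fed back.\n",
    "        dec_input = target[i] if teacher_forcing else prediction\n",
    "    return inputs\n",
    "\n",
    "target = np.array([1, 4, 6, 2])   # toy ids: <start>=1, <end>=2\n",
    "always_nine = lambda token: 9     # a deliberately wrong toy model\n",
    "\n",
    "print(decoder_inputs(target, always_nine, teacher_forcing=True))   # [1, 4, 6]\n",
    "print(decoder_inputs(target, always_nine, teacher_forcing=False))  # [1, 9, 9]\n",
    "```\n",
    "\n",
    "With teacher forcing the decoder always sees the correct previous word, so an early mistake does not derail the rest of the sequence during training."
   ]
  },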
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# adding this in a separate cell because if you run the training cell\n",
    "# many times, the loss_plot array will be reset\n",
    "loss_plot = []\n",
    "\n",
    "@tf.function\n",
    "def train_step(img_tensor, target):\n",
    "  loss = 0\n",
    "\n",
    "  # initializing the hidden state for each batch\n",
    "  # because the captions are not related from image to image\n",
    "  hidden = decoder.reset_state(batch_size=target.shape[0])\n",
    "\n",
    "  dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * target.shape[0], 1)\n",
    "\n",
    "  with tf.GradientTape() as tape:\n",
    "      features = encoder(img_tensor)\n",
    "\n",
    "      for i in range(1, target.shape[1]):\n",
    "          # passing the features through the decoder\n",
    "          predictions, hidden, _ = decoder(dec_input, features, hidden)\n",
    "\n",
    "          loss += loss_function(target[:, i], predictions)\n",
    "\n",
    "          # using teacher forcing\n",
    "          dec_input = tf.expand_dims(target[:, i], 1)\n",
    "\n",
    "  total_loss = (loss / int(target.shape[1]))\n",
    "\n",
    "  trainable_variables = encoder.trainable_variables + decoder.trainable_variables\n",
    "\n",
    "  gradients = tape.gradient(loss, trainable_variables)\n",
    "\n",
    "  optimizer.apply_gradients(zip(gradients, trainable_variables))\n",
    "\n",
    "  return loss, total_loss\n",
    "\n",
    "EPOCHS = 20\n",
    "\n",
    "for epoch in range(start_epoch, EPOCHS):\n",
    "    start = time.time()\n",
    "    total_loss = 0\n",
    "\n",
    "    for (batch, (img_tensor, target)) in enumerate(dataset):\n",
    "        batch_loss, t_loss = train_step(img_tensor, target)\n",
    "        total_loss += t_loss\n",
    "\n",
    "        if batch % 100 == 0:\n",
    "            print ('Epoch {} Batch {} Loss {:.4f}'.format(\n",
    "              epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))\n",
    "    # storing the epoch end loss value to plot later\n",
    "    loss_plot.append(total_loss / num_steps)\n",
    "\n",
    "    if epoch % 5 == 0:\n",
    "      ckpt_manager.save()\n",
    "\n",
    "    print ('Epoch {} Loss {:.6f}'.format(epoch + 1,\n",
    "                                         total_loss/num_steps))\n",
    "    print ('Time taken for 1 epoch {} sec\\n'.format(time.time() - start))\n",
    "    \n",
    "    \n",
    "plt.plot(loss_plot)\n",
    "plt.xlabel('Epochs')\n",
    "plt.ylabel('Loss')\n",
    "plt.title('Loss Plot')\n",
    "plt.show()    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Caption!\n",
    "\n",
    "The evaluate function is similar to the training loop, except you don't use teacher forcing here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
    "Stop predicting when the model predicts the end token.\n",
    "And store the attention weights for every time step."
   ]
  },
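  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The decoding loop described above can be sketched without TensorFlow. This is a simplified illustration under stated assumptions: `generate`, `step_fn`, and `toy_step` are hypothetical names, and the toy model stands in for the real decoder. Note that the `evaluate` function in the next cell draws the next token with `tf.random.categorical`, which corresponds to the sampling branch here rather than the greedy one.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def generate(step_fn, start_id, end_id, max_length, greedy=True, seed=0):\n",
    "    # step_fn: hypothetical stand-in for the decoder; maps the previous\n",
    "    # token id to a vector of logits over the vocabulary.\n",
    "    rng = np.random.default_rng(seed)\n",
    "    token, result = start_id, []\n",
    "    for _ in range(max_length):\n",
    "        logits = step_fn(token)\n",
    "        if greedy:\n",
    "            token = int(np.argmax(logits))\n",
    "        else:\n",
    "            # Sample from softmax(logits), analogous to tf.random.categorical.\n",
    "            probs = np.exp(logits - np.max(logits))\n",
    "            probs /= probs.sum()\n",
    "            token = int(rng.choice(len(logits), p=probs))\n",
    "        result.append(token)\n",
    "        # Stop as soon as the end token is produced.\n",
    "        if token == end_id:\n",
    "            break\n",
    "    return result\n",
    "\n",
    "# Toy model over 4 tokens: strongly prefers token (previous + 1) % 4.\n",
    "def toy_step(prev):\n",
    "    logits = np.zeros(4)\n",
    "    logits[(prev + 1) % 4] = 10.0\n",
    "    return logits\n",
    "\n",
    "print(generate(toy_step, start_id=0, end_id=3, max_length=10))  # [1, 2, 3]\n",
    "```\n",
    "\n",
    "Sampling gives more varied captions from run to run, while greedy decoding is deterministic; which one reads better depends on the trained model."
   ]
  },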
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(image):\n",
    "    attention_plot = np.zeros((max_length, attention_features_shape))\n",
    "\n",
    "    hidden = decoder.reset_state(batch_size=1)\n",
    "\n",
    "    temp_input = tf.expand_dims(load_image(image)[0], 0)\n",
    "    img_tensor_val = image_features_extract_model(temp_input)\n",
    "    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))\n",
    "\n",
    "    features = encoder(img_tensor_val)\n",
    "\n",
    "    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)\n",
    "    result = []\n",
    "\n",
    "    for i in range(max_length):\n",
    "        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)\n",
    "\n",
    "        attention_plot[i] = tf.reshape(attention_weights, (-1, )).numpy()\n",
    "\n",
    "        predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()\n",
    "        result.append(tokenizer.index_word[predicted_id])\n",
    "\n",
    "        if tokenizer.index_word[predicted_id] == '<end>':\n",
    "            return result, attention_plot\n",
    "\n",
    "        dec_input = tf.expand_dims([predicted_id], 0)\n",
    "\n",
    "    attention_plot = attention_plot[:len(result), :]\n",
    "    return result, attention_plot\n",
    "\n",
    "\n",
    "def plot_attention(image, result, attention_plot):\n",
    "    temp_image = np.array(Image.open(image))\n",
    "\n",
    "    fig = plt.figure(figsize=(10, 10))\n",
    "\n",
    "    len_result = len(result)\n",
    "    for l in range(len_result):\n",
    "        temp_att = np.resize(attention_plot[l], (8, 8))\n",
    "        ax = fig.add_subplot(len_result//2, len_result//2, l+1)\n",
    "        ax.set_title(result[l])\n",
    "        img = ax.imshow(temp_image)\n",
    "        ax.imshow(temp_att, cmap='gray', alpha=0.6, extent=img.get_extent())\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "    \n",
    "# captions on the validation set\n",
    "rid = np.random.randint(0, len(img_name_val))\n",
    "image = img_name_val[rid]\n",
    "real_caption = ' '.join([tokenizer.index_word[i] for i in cap_val[rid] if i not in [0]])\n",
    "result, attention_plot = evaluate(image)\n",
    "\n",
    "print ('Real Caption:', real_caption)\n",
    "print ('Prediction Caption:', ' '.join(result))\n",
    "plot_attention(image, result, attention_plot)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Try it on your own images\n",
    "\n",
    "For fun, below we've provided a method you can use to caption your own images with the model we've just trained. Keep in mind, it was trained on a relatively small amount of data, and your images may be different from the training data (so be prepared for weird results!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_url = 'https://tensorflow.org/images/surf.jpg'\n",
    "image_extension = image_url[-4:]\n",
    "image_path = tf.keras.utils.get_file('image'+image_extension,\n",
    "                                     origin=image_url)\n",
    "\n",
    "result, attention_plot = evaluate(image_path)\n",
    "print ('Prediction Caption:', ' '.join(result))\n",
    "plot_attention(image_path, result, attention_plot)\n",
    "# opening the image\n",
    "Image.open(image_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image = './Bilder/example_10.jpg'\n",
    "result, attention_plot = evaluate(image)\n",
    "print ('Prediction Caption:', ' '.join(result))\n",
    "plot_attention(image, result, attention_plot)\n",
    "# opening the image\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}