
Solange Emmenegger
--- a/.renku/renku.ini
+++ b/.renku/renku.ini
@@ -2,6 +2,7 @@
 default_url = /lab
 mem_request = 8G
 lfs_auto_fetch = true
+disk_request = 4G

 [renku]
 autocommit_lfs = false

--- a/demos/Convolutional Neural Networks/feature_engineering_image_similarity.ipynb
+++ b/demos/Convolutional Neural Networks/feature_engineering_image_similarity.ipynb
 %% Cell type:markdown id: tags:

 # Image Feature Engineering with Pre-Trained Neural Networks

 Deep convolutional neural networks (CNNs) are pre-trained on public datasets with millions of images. For example, the well-known [ImageNet](http://www.image-net.org) database contains more than 14 million images. Such pre-trained networks (i.e. their architecture and weights) are made available for transfer learning (specialization to a specific domain by re-training, covered later in this course) or as feature extractors for other machine learning tasks. General-purpose CNNs are built so that they first learn a compact encoding, or representation, of an input image (e.g. a numeric vector with 2048 dimensions) before they map this encoding to the final category. A CNN can therefore be used for feature engineering: we feed it new images and extract their encodings from one of its inner layers.

 In this exercise, we use such a pre-trained CNN to encode the images of a separate dataset and use the resulting feature vectors to compute image similarity, much as an image retrieval application would do when no keywords or captions are available. The dataset is a small subset of the [Corel](https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval) image database. It consists of 10 concept groups with 100 images each and is publicly available on [Kaggle](https://www.kaggle.com/elkamel/corel-images).

 %% Cell type:code id: tags:

 ``` python
 import os
 import numpy as np
 import matplotlib.image
 import matplotlib.pyplot as plt

 from PIL import Image
 from pathlib import Path

 # Run pip install --upgrade tensorflow_hub if necessary
 import tensorflow as tf
-import tensorflow_hub as hub
+#import tensorflow_hub as hub

 from sklearn.neighbors import NearestNeighbors

 # Library for progress bars
 from tqdm.auto import tqdm

 # Interactive elements for Jupyter notebooks
 from ipywidgets import interact
-
-# Make sure you have the right tensorflow version installed
-assert tf.__version__ == '2.3.1'
 ```

 %% Cell type:markdown id: tags:

 ## Parameters

 %% Cell type:code id: tags:

 ``` python
 dataset_path = Path(os.getcwd()) / "dataset"

 # Where to load train and test images from
 train_path  = dataset_path / "training_set"
 test_path   = dataset_path / "test_set"

 # Where to save image embeddings
 embeds_path = dataset_path / "embeddings"

 # print("Default dataset path: {}".format(dataset_path))
 # print("Train path: {}".format(train_path))
 # print("Test path: {}".format(test_path))
 # print("Embeds path: {}".format(embeds_path))
 ```

 %% Cell type:markdown id: tags:

 ## Helper Methods

 %% Cell type:markdown id: tags:

 This function loads an image from a given path with PIL, resizes it to the model input size and scales its pixel values to the range [0, 1].

 %% Cell type:code id: tags:

 ``` python
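 def load_image(path):
     # Force three colour channels (grayscale or RGBA files would not match the model input)
     img = Image.open(path).convert("RGB")
     # PIL expects the target size as (width, height)
     img = img.resize((img_width, img_height))
     # Scale pixel values from [0, 255] to [0, 1]
     img = np.array(img) / 255
     return img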
 ```

 %% Cell type:markdown id: tags:

 This function plots nine randomly selected images in a 3x3 grid, each titled with its label.

 %% Cell type:code id: tags:

 ``` python
 def plot_nine_random_images(images, labels):
     plt.figure(figsize=(10, 10))
     # Draw nine randomly chosen images, titled with their labels, into a 3x3 grid
     for i in range(9):
         img_index = np.random.randint(0, len(images))
         plt.subplot(3, 3, i + 1)
         plt.imshow(images[img_index])
         plt.title(labels[img_index])
 ```

 %% Cell type:markdown id: tags:

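 This function recursively collects the paths of all `.jpg` files below a given directory, together with the name of the folder that contains each file (used as the image label).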

 %% Cell type:code id: tags:

 ``` python
 def return_image_file_names(path):
     return_files = []
     for root, _, files in os.walk(path):
         for name in files:
             if name.lower().endswith(".jpg"):
                 file_path = os.path.join(root, name)
                 # The containing folder's name is the label (works on any OS)
                 folder_name = os.path.basename(root)
                 return_files.append([file_path, folder_name])
     return return_files
 ```
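The same file listing can also be written with `pathlib`, which the notebook already imports. The sketch below is an alternative, not the notebook's own helper, and the name `list_jpg_files` is hypothetical. It takes the label from `Path.parent.name` instead of splitting on `/`, and it matches the extension case-insensitively (an assumption about how files might be named).

```python
import tempfile
from pathlib import Path


def list_jpg_files(root):
    """Return [file_path, folder_name] pairs for every .jpg file below root."""
    result = []
    # rglob("*") walks the tree recursively; sorting makes the order reproducible
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".jpg":
            # The name of the parent folder serves as the class label
            result.append([str(path), path.parent.name])
    return result


# Usage on a throw-away tree mimicking <set>/<label>/<file>.jpg (folder names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for label in ["class_a", "class_b"]:
        (Path(tmp) / label).mkdir()
        (Path(tmp) / label / "0.jpg").touch()
    (Path(tmp) / "notes.txt").touch()  # non-image file, should be ignored
    files = list_jpg_files(tmp)

labels = [label for _, label in files]
print(labels)  # ['class_a', 'class_b']
```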

 %% Cell type:markdown id: tags:

 # Part 1: Generate Image Feature Vectors

 To calculate the similarity between images, each image must first be encoded into a feature vector. There are many ways of doing this. The computer vision community, for example, proposed hand-crafted feature detectors and descriptors such as the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF) or Features from Accelerated Segment Test (FAST). More recent approaches, however, use pre-trained CNNs as feature extractors, because they learn to extract complex high-level features from images. For this purpose, we first load a pre-trained CNN from TensorFlow Hub.

 %% Cell type:markdown id: tags:

 ### Loading Pre-Trained CNN

 Various pre-trained CNNs are available on [TensorFlow Hub](https://tfhub.dev/s?module-type=image-feature-vector). For this exercise, we use a pre-trained Inception v3 model whose feature-vector variant returns the activations of the layer just before the final classification layer. If you want to try a different model, make sure that you pass the input shape that model expects. To load a pre-trained model, all you need is its TensorFlow Hub URL.

 %% Cell type:code id: tags:

 ``` python
 # Which model to load
 clf_name = "Inception_v3"
 embedder_model = "https://tfhub.dev/google/imagenet/inception_v3/feature_vector/4"

 # Shape of the model input
 img_height   = 299
 img_width    = 299
 img_channels = 3

 # Load model
 embedder = tf.keras.Sequential()
 embedder.add(hub.KerasLayer(embedder_model, trainable=False))

 # Specify input shape (batch, height, width, channels)
 embedder.build([None, img_height, img_width, img_channels])

 # Print model information
 embedder.summary()
 ```
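Where do the 2048 dimensions come from? In the standard Inception v3 architecture, a 299x299 input ends in an 8x8x2048 convolutional feature map, which is collapsed by global average pooling, i.e. one mean value per channel. The NumPy sketch below illustrates only that pooling step on random data standing in for real activations; it assumes the hub module ends this way and does not load the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the last convolutional feature map of a batch of 4 images,
# shape (batch, height, width, channels)
feature_map = rng.random((4, 8, 8, 2048), dtype=np.float32)

# Global average pooling: average each channel over all spatial positions
embeddings = feature_map.mean(axis=(1, 2))

print(embeddings.shape)  # (4, 2048)
```

The result has one 2048-dimensional vector per image, which matches the shape printed for `vec` further below.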

 %% Cell type:markdown id: tags:

 ### Load Image Dataset

 The selected Inception v3 model generates 2048-dimensional feature vectors. We now pass every image of our dataset through the model and extract its feature vector. As you can imagine, this takes some time, so we show a progress bar and save the extracted feature vectors to a file. For future experiments, you can simply load this file instead of re-running the complete encoding process. Our dataset is split into *training* and *test* images, which are loaded separately.

 %% Cell type:code id: tags:

 ``` python
 train_files  = return_image_file_names(train_path)
 train_labels = [label for _, label in train_files]
 print('{} training image paths found'.format(len(train_files)))
 ```

 %% Cell type:code id: tags:

 ``` python
 test_files = return_image_file_names(test_path)
 test_labels = [label for _, label in test_files]
 print('{} test image paths found'.format(len(test_files)))
 ```

 %% Cell type:code id: tags:

 ``` python
 train_images = np.array([load_image(str(f)) for f, _ in train_files])
 print("{} training images loaded".format(len(train_images)))
 ```

 %% Cell type:code id: tags:

 ``` python
 test_images = np.array([load_image(str(f)) for f, _ in test_files])
 print("{} test images loaded".format(len(test_images)))
 ```

 %% Cell type:code id: tags:

 ``` python
 print("Shape of the images: {}".format(train_images[0].shape))

 # Make sure that the shapes of the images match the model input
 assert train_images[0].shape == (img_height, img_width, img_channels)
 ```

 %% Cell type:markdown id: tags:

 Let us display nine randomly selected training images along with their labels.

 %% Cell type:code id: tags:

 ``` python
 plot_nine_random_images(train_images, train_labels)
 ```

 %% Cell type:markdown id: tags:

 Let us extract the feature vector for a single image, e.g. image 100, for illustration.

 %% Cell type:code id: tags:

 ``` python
 image = train_images[100]

 print('Dimensionality of a single image is:\t {}'.format(image.shape))

 # Add a batch dimension at the front to match the model's input format
 image = image[np.newaxis, ...]

 print('Dimensionality of the image is now:\t {}'.format(image.shape))

 # Extract feature vector from CNN
 vec = embedder.predict(image)

 print('Dimensionality of feature vector is:\t {}'.format(vec.shape))
 ```

 %% Cell type:markdown id: tags:

 Now we encode the entire dataset of training and test images.

 %% Cell type:code id: tags:

 ``` python
 train_embeds = []
 test_embeds  = []

 # Extract feature vector for training images
 for i in tqdm(range(len(train_images))):
    train_embeds.append(embedder.predict(train_images[i][np.newaxis, ...]).squeeze())

 # Extract feature vector for test images
 for i in tqdm(range(len(test_images))):
    test_embeds.append(embedder.predict(test_images[i][np.newaxis, ...]).squeeze())

 train_embeds = np.asarray(train_embeds)
 test_embeds  = np.asarray(test_embeds)

 print("Shape embeddings train:\t {}".format(train_embeds.shape))
 print("Shape embeddings test:\t {}".format(test_embeds.shape))
 ```
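Calling `predict` once per image, as above, is simple but slow, because every call carries a fixed overhead. Keras' `Model.predict` also accepts a whole array together with a `batch_size` argument (e.g. `embedder.predict(train_images, batch_size=32)`). The sketch below spells out the same chunking explicitly, which keeps it compatible with a `tqdm` progress bar over the batches; `embed_in_batches` is a hypothetical helper, and `fake_predict` only stands in for `embedder.predict`.

```python
import numpy as np


def embed_in_batches(embed_fn, images, batch_size=32):
    """Apply embed_fn to consecutive chunks of images and stack the results."""
    chunks = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]  # the last batch may be smaller
        chunks.append(embed_fn(batch))
    return np.concatenate(chunks, axis=0)


def fake_predict(batch):
    """Stand-in for embedder.predict: maps (n, h, w, c) images to (n, c) vectors."""
    return batch.mean(axis=(1, 2))


images = np.random.default_rng(1).random((70, 16, 16, 3))
batched = embed_in_batches(fake_predict, images, batch_size=32)

# Batching must give the same result as processing one image at a time
one_by_one = np.stack([fake_predict(img[np.newaxis, ...]).squeeze() for img in images])
print(batched.shape, np.allclose(batched, one_by_one))  # (70, 3) True
```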

 %% Cell type:markdown id: tags:

 As a final step, we save the embeddings of both datasets to the local drive with the `np.savez()` function.

 %% Cell type:code id: tags:

 ``` python
 train_embedding_file_name = "embeds_train_{}.npz".format(clf_name)
 # np.savez does not create missing folders
 embeds_path.mkdir(parents=True, exist_ok=True)
 train_persisted_embeds = embeds_path / train_embedding_file_name
 np.savez(train_persisted_embeds, embeds=train_embeds, info=train_files)
 ```

 %% Cell type:code id: tags:

 ``` python
 test_embedding_file_name = "embeds_test_{}.npz".format(clf_name)
 embeds_path.mkdir(parents=True, exist_ok=True)
 test_persisted_embeds = embeds_path / test_embedding_file_name
 np.savez(test_persisted_embeds, embeds=test_embeds, info=test_files)
 ```
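The workflow described above (compute the embeddings once, then just load the file in later runs) can be wrapped into a small cache helper. The sketch below is a hypothetical `load_or_compute` function tried out on made-up data in a temporary directory. It also shows that the `[path, label]` pairs, being plain strings, are stored as a 2-D Unicode array, so `np.load` reads them back without `allow_pickle`.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_or_compute(cache_file, compute_fn, info):
    """Load embeddings from cache_file if it exists, otherwise compute and save them."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with np.load(cache_file) as saved:
            return saved["embeds"], saved["info"]
    embeds = compute_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)  # np.savez needs the folder
    np.savez(cache_file, embeds=embeds, info=info)
    return embeds, np.asarray(info)


info = [["dataset/a/1.jpg", "a"], ["dataset/b/2.jpg", "b"]]  # made-up file list
calls = []


def compute():
    calls.append(1)  # count how often the expensive step runs
    return np.arange(8, dtype=np.float32).reshape(2, 4)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "embeddings" / "embeds_train_demo.npz"
    first, _ = load_or_compute(cache, compute, info)
    second, loaded_info = load_or_compute(cache, compute, info)

print(len(calls), np.array_equal(first, second), loaded_info[1, 1])  # 1 True b
```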

 %% Cell type:markdown id: tags:

 # Part 2: Image Similarity

 We now use the image feature vectors to compute the similarity between images. As a first step, we load the embeddings from the files again with the `np.load()` function. For your own projects, we recommend splitting part 1 and part 2 into separate notebooks, because for larger image sets the extraction of feature vectors takes a lot of time. Access to GPU resources speeds up this step considerably.

 %% Cell type:code id: tags:

 ``` python
 train_embeds_saved = np.load(embeds_path / train_embedding_file_name)
 train_embeds = train_embeds_saved["embeds"]
 train_info = train_embeds_saved["info"]
 train_names = [f for f, _ in train_info]

 test_embeds_saved = np.load(embeds_path / test_embedding_file_name)
 test_embeds = test_embeds_saved["embeds"]
 test_info = test_embeds_saved["info"]
 test_names = [f for f, _ in test_info]

 print("Train shape:\t {0}\nTest shape:\t {1}".format(train_embeds.shape, test_embeds.shape))
 ```

 %% Cell type:markdown id: tags:

 We use k-nearest neighbors (k-NN) along with the cosine distance to obtain the $k=3$ most similar images.

 %% Cell type:code id: tags:

 ``` python
 knn = NearestNeighbors(n_neighbors=3, metric='cosine')

 # Fit k-NN on the training set
 knn.fit(train_embeds)
 ```
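The cosine distance between two feature vectors $u$ and $v$ is $d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$: it is 0 for vectors pointing in the same direction and ignores their length. The NumPy sketch below performs a brute-force cosine search of the kind `NearestNeighbors(metric='cosine')` carries out, on a tiny hand-made example; `cosine_knn` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import numpy as np


def cosine_knn(train, queries, k=3):
    """Return (distances, indices) of the k nearest training rows for each query row."""
    # Normalise rows to unit length so that a dot product equals the cosine similarity
    train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
    query_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    dist = 1.0 - query_n @ train_n.T        # shape (n_queries, n_train)
    idx = np.argsort(dist, axis=1)[:, :k]   # k smallest distances per query
    return np.take_along_axis(dist, idx, axis=1), idx


train = np.array([[1.0, 0.0],    # points right
                  [0.0, 1.0],    # points up
                  [5.0, 5.0],    # diagonal, long vector
                  [-1.0, 0.0]])  # points left
query = np.array([[2.0, 2.1]])   # almost diagonal

distances, indices = cosine_knn(train, query, k=3)
print(indices)  # [[2 1 0]]
```

The long diagonal vector `[5, 5]` is the nearest neighbor even though it is far away in Euclidean terms, because only the direction counts.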

 %% Cell type:markdown id: tags:

 Finally, we query the fitted k-NN model with images from the unseen test set. For the selected test image, we plot the three most similar training images together with their cosine distance to the test image and their labels.

 %% Cell type:code id: tags:

 ``` python
 selection_names = [os.path.relpath(x, dataset_path) for x in test_names]

 @interact(image_name=selection_names)
 def show_nearest_neighbour(image_name):
    idx_test_image = selection_names.index(image_name)
    distances, indices = knn.kneighbors([test_embeds[idx_test_image]])

    plt.figure(figsize=(20, 10))
    plt.subplot(1,len(indices[0])+1,1)
    plt.imshow(load_image(dataset_path / image_name))
    plt.title("test image: {}".format(test_info[idx_test_image][1]))

    for i, idx in enumerate(indices[0]):
        plt.subplot(1,len(indices[0])+1,i+2)
        plt.imshow(load_image(train_names[idx]))
        plt.title("dist: {0:.2f}, lbl: {1}".format(distances[0][i], train_info[idx][1]))
 ```

 %% Cell type:markdown id: tags:

 The first image comes from the test set, which the k-NN model has never seen. The following three images are its three nearest neighbors from the training set, measured by the cosine distance between the feature vectors extracted by the CNN.

--- a/demos/Feature Engineering/feature_engineering.ipynb
+++ b/demos/Feature Engineering/feature_engineering.ipynb
--- a/demos/Generative Models/Generative Adversarial Network.ipynb
+++ b/demos/Generative Models/Generative Adversarial Network.ipynb
--- a/demos/Generative Models/PokeGAN.ipynb
+++ b/demos/Generative Models/PokeGAN.ipynb
--- a/demos/German Sentiment Analysis/german_sentiment_analysis.ipynb
+++ b/demos/German Sentiment Analysis/german_sentiment_analysis.ipynb
--- a/demos/Principal Component Analysis/PCA step-by-step.ipynb
+++ b/demos/Principal Component Analysis/PCA step-by-step.ipynb
--- a/demos/Recurrent Neural Networks/recurrent_neural_network.ipynb
+++ b/demos/Recurrent Neural Networks/recurrent_neural_network.ipynb
@@ -552,7 +552,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
@@ -566,7 +566,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.9.12"
  }
 },
 "nbformat": 4,

--- a/demos/demos_readme.md
+++ b/demos/demos_readme.md
+# Comments about the Demos
+
+The following demos can be run on Renku:
+- Convolutional Neural Networks
+- Feature Engineering
+- Principal Component Analysis
+- Recurrent Neural Networks
+
+If necessary, increase the memory of the container before starting a session.
+The *Generative Models* demo needs to be downloaded and run on Google Colab, because otherwise training the model is too slow.
+The *German Sentiment Analysis* demo is best run locally or on Google Colab, as it requires a large number of specialized packages.
--- a/notebooks/00 Python Tutorial/00a Python Self Study.ipynb
+++ b/notebooks/00 Python Tutorial/00a Python Self Study.ipynb
+%% Cell type:markdown id: tags:
+
+# Machine Learning (ML) Module - Python crash course
+
+%% Cell type:markdown id: tags:
+
+## Attribution
+
+%% Cell type:markdown id: tags:
+
+* A lot of the material in this course was inspired by the free book [A Whirlwind Tour of Python](https://www.oreilly.com/library/view/a-whirlwind-tour/9781492037859/) by Jake VanderPlas
+* And some, as usual, from [Wikipedia](https://en.wikipedia.org/wiki/Python_(programming_language)) and [Stack Overflow](https://stackoverflow.com/questions/tagged/python)
+
+%% Cell type:markdown id: tags:
+
+![book.jpg](attachment:book.jpg)
+
+%% Cell type:markdown id: tags:
+
+# Python
+
+%% Cell type:markdown id: tags:
+
+* An interpreted high-level programming language for general-purpose programming
+* Conceived in the late 1980s by Guido van Rossum
+* Simple and elegant, with a wide range of domain-specific libraries
+* As of February 2019, ranked 3rd on the [TIOBE index](https://www.tiobe.com/tiobe-index/)
+* Python has a design philosophy that emphasizes code readability
+* Uses significant whitespace (indentation delimits blocks). This is the most obvious difference from other languages, but probably also the least important one.
+* Python is typed *dynamically* (a variable can be rebound to objects of different types) and *strongly* (no implicit conversions between unrelated types, e.g. `'1' + 1` raises a `TypeError`)
+* Python has automatic memory management
+* Supports multiple programming paradigms, including object-oriented, functional and procedural
+* Has a large and comprehensive standard library
+* Users of Python are often referred to as Pythonists, Pythonistas, and Pythoneers
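Both typing properties from the list above can be checked directly: a name can be rebound to an object of a different type, but Python does not silently convert between unrelated types.

```python
x = 1
print(type(x).__name__)   # int
x = "one"                 # dynamic typing: the same name now refers to a str
print(type(x).__name__)   # str

try:
    result = "1" + 1      # strong typing: no implicit str/int conversion
except TypeError as err:
    result = None
    print("TypeError:", err)

print("1" + str(1), int("1") + 1)   # explicit conversions work: 11 2
```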
+
+%% Cell type:markdown id: tags:
+
+## and...
+
+%% Cell type:markdown id: tags:
+
+Python is arguably the most popular programming language for Data Science, with R and Julia being strong contenders.
+
+%% Cell type:markdown id: tags:
+
+# The Zen of Python
+
+%% Cell type:code id: tags:
+
+``` python
+import this
+```
+
+%% Cell type:markdown id: tags:
+
+# Python Syntax
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+
+# endpoint
+endpoint = 10
+
+# starting numbers
+a = 1; b = 1
+
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
+
+%% Cell type:markdown id: tags:
+
+### Comments
+
+%% Cell type:markdown id: tags:
+
+* Comments start with `#` and run to the end of the line
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+```
+
+%% Cell type:markdown id: tags:
+
+* There are no multi-line comments like `/* ... */` in C
+
+%% Cell type:markdown id: tags:
+
+### Multi-line Comments
+* But wait, I have seen multi-line comments using triple quotes:
+
+%% Cell type:code id: tags:
+
+``` python
+"""
+Is this a
+multiline
+comment?
+"""
+```
+
+%% Cell type:markdown id: tags:
+
+* Triple quotes are actually **multi-line strings**.
+
+%% Cell type:markdown id: tags:
+
+* If such a string is not assigned to a variable (or otherwise used), it is simply discarded and has no effect.
+
+%% Cell type:markdown id: tags:
+
+* Unlike `#` comments, they are not skipped by the parser: they are real string expressions, so they must, for example, be indented correctly.
+
+%% Cell type:markdown id: tags:
+
+* Multi-line strings are often used as **docstrings**, i.e. the documentation of modules, classes and functions
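A short example: a string literal placed as the first statement of a function body becomes its docstring. `help()` displays it, and it is stored in the function's `__doc__` attribute. The function below simply repackages the Fibonacci loop from the beginning of this notebook.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a list.

    The sequence starts with 1, 1, like the Fibonacci example in this notebook.
    """
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers


print(fibonacci.__doc__.splitlines()[0])  # Return the first n Fibonacci numbers as a list.
print(fibonacci(9))                       # [1, 1, 2, 3, 5, 8, 13, 21, 34]
```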
+
+%% Cell type:markdown id: tags:
+
+### Statement termination
+
+%% Cell type:markdown id: tags:
+
+* The end of a line terminates a statement
+
+%% Cell type:code id: tags:
+
+``` python
+# endpoint
+endpoint = 10
+```
+
+%% Cell type:markdown id: tags:
+
+* statements can also be separated by a semicolon
+
+%% Cell type:code id: tags:
+
+``` python
+# starting numbers
+a = 1; b = 1
+```
+
+%% Cell type:markdown id: tags:
+
+* but this is rarely used (PEP 8 discourages multiple statements on one line)
+
+%% Cell type:markdown id: tags:
+
+### Code blocks and indentation
+
+%% Cell type:markdown id: tags:
+
+* Code blocks are defined by their indentation; there are no braces
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
+
+%% Cell type:markdown id: tags:
+
+* Tabs and spaces are not the same, so don't mix them (Python 3 raises a `TabError` for inconsistent use)
+* It is best to set your editor to indent with 4 spaces, as recommended by PEP 8
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+
+# endpoint
+endpoint = 10
+
+# starting numbers
+a,b = 1,1
+
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
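The line `a,b = b,a+b` relies on tuple packing and unpacking: the right-hand side is evaluated completely into a tuple before anything is assigned, so the old value of `a` is still available when `a+b` is computed. Written with a temporary variable, the same update looks like this:

```python
a, b = 1, 1
a, b = b, a + b          # the right side (1, 2) is built first, then unpacked
print(a, b)              # 1 2

# Equivalent update with an explicit temporary variable
a, b = 1, 1
old_a = a
a = b
b = old_a + b
print(a, b)              # 1 2

# The same mechanism swaps two values without a temporary variable
x, y = "left", "right"
x, y = y, x
print(x, y)              # right left
```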
+
+%% Cell type:markdown id: tags:
+
+# Getting Help
+
+%% Cell type:markdown id: tags:
+
+* To get help on an object, use the built-in `help()` function
+
+%% Cell type:code id: tags:
+
+``` python
+help(print)
+```
+
+%% Cell type:markdown id: tags:
+
+* In IPython and Jupyter, you can also append a `?` to the object you want help about
+
+%% Cell type:code id: tags:
+
+``` python
+print?
+```
+
+%% Cell type:markdown id: tags:
+
+* In Jupyter, this opens the help text in a separate pager pane
+
+%% Cell type:markdown id: tags:
+
+# Python Semantics
+
+%% Cell type:markdown id: tags:
+
+### Variables and Objects
+
+%% Cell type:markdown id: tags:
+
+* Variables are references, not containers
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+b = a
+print(b)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+b.append(5)
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
+* Assignment simply binds a name to an object; it never copies the object
+
+%% Cell type:markdown id: tags:
+
+* So:
+
+%% Cell type:code id: tags:
+
+``` python
+b = 'something else'
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
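+* And there is no static type checking (and no declaration is needed)
+* Assignment works the same way for all objects, but the shared-reference effect above is only visible with mutable objects, which can be changed in place
+* Simple types such as numbers are immutable: `y += 1` creates a new object and binds `y` to it
+* So the following works as expected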
+
+%% Cell type:code id: tags:
+
+``` python
+x = 5
+y = x
+y += 1
+print(x, y)
+```
+
+%% Cell type:markdown id: tags:
+
+* Everything in Python is an object
+* So everything has a type
+
+%% Cell type:code id: tags:
+
+``` python
+x = "Hello"
+type(x)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+type(print)
+```
+
+%% Cell type:markdown id: tags:
+
+* But variable names are still just references; the type information is stored in the object
+* Even simple types have attributes (and methods)
+
+%% Cell type:code id: tags:
+
+``` python
+x = 42.0
+x.is_integer()
+```
+
+%% Cell type:markdown id: tags:
+
+### Simple types
+
+%% Cell type:markdown id: tags:
+
+| Scalar Type | Example        | Description                                                     |
+|-------------|----------------|-----------------------------------------------------------------|
+| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                                  |
+| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                     |
+| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part)    |
+| ``bool``    | ``x = True``   | Boolean: True/False values                                      |
+| ``str``     | ``x = 'abc'``  | String: characters or text, either with single or double quotes |
+| ``NoneType``| ``x = None``   | Special object indicating nulls                                 |
+
+%% Cell type:markdown id: tags:
+
+### Built-in Data Structures
+
+%% Cell type:markdown id: tags:
+
+There are four built-in compound data structures that act as containers
+
+%% Cell type:code id: tags:
+
+``` python
+a_list = [0,1,2,3,4] # mutable, ordered collection
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_tuple = (0,1,2,3,4) # immutable, ordered collection
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dictionary = {'a':0, 'b':1, 'c':2} # key/value mapping (keeps insertion order since Python 3.7)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_set = {0,1,2,3,3,4,4,4} # Unordered collection of unique values
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_set
+```
+
+%% Cell type:markdown id: tags:
+
+#### Lists
+
+%% Cell type:markdown id: tags:
+
+* Lists are the workhorse datatype of Python
+
+%% Cell type:markdown id: tags:
+
+* They are called lists, not arrays
+
+%% Cell type:code id: tags:
+
+``` python
+len(a_list) # they have a length
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list.append('a') # and many, many useful methods
+a_list
+```
+
+%% Cell type:markdown id: tags:
+
+#### List indexing and slicing
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[0] # zero-based indexing for accessing elements
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[-1] # access from the end of the list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2:4] # slice a list, 1st index is included, 2nd excluded
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[:2] # omitting the initial 0 or the end index (or both) is possible
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[::2] # a third, optional index indicates the step size
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[::-1] # reverse a list with a negative stepsize
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2:4] = [9,8] # indexing and slicing can be used on the left-hand-side too
+a_list
+```
+
+%% Cell type:markdown id: tags:
+
+#### Tuples
+
+%% Cell type:code id: tags:
+
+``` python
+# tuples behave like lists except that they are immutable
+a_tuple[0] = 6 # raises TypeError: 'tuple' object does not support item assignment
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# convert between the two with list() and tuple()
+another_list = list(a_tuple)
+another_list[0] = 6
+tuple(another_list)
+```
+
+%% Cell type:markdown id: tags:
+
+#### Dictionaries
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict = {'a': 1, 2: 27, 'large_num':2**64}
+a_dict
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict['large_num']
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict.keys(), a_dict.values(), a_dict.items()
+```
+
+%% Cell type:markdown id: tags:
+
+* Please note that since Python 3.7, dictionaries preserve insertion order (in older versions they are unordered)
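
A small sketch of insertion order in practice (assuming Python 3.7 or later; the dictionary `d` and its keys are just illustrative):

%% Cell type:code id: tags:

```python
# Keys come back in insertion order (guaranteed since Python 3.7)
d = {}
d['zebra'] = 1
d['apple'] = 2
d['mango'] = 3
print(list(d))  # ['zebra', 'apple', 'mango']

# Updating an existing key keeps its original position ...
d['zebra'] = 99
# ... while deleting and re-inserting a key moves it to the end
del d['apple']
d['apple'] = 2
print(list(d.items()))  # [('zebra', 99), ('mango', 3), ('apple', 2)]
```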
+
+%% Cell type:markdown id: tags:
+
+#### Sets
+
+%% Cell type:code id: tags:
+
+``` python
+a_set = {1,2,3,3,4,4,4,4}
+a_set
+```
+
+%% Cell type:markdown id: tags:
+
+* Sets have no duplicates and are unordered
+
+%% Cell type:code id: tags:
+
+``` python
+another_set = {2,3,5,6}
+a_set & another_set # or: a_set.intersection(another_set)
+```
+
+%% Cell type:markdown id: tags:
+
+* Other set operations: union `|`, difference `-`, symmetric difference `^` and more
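
The set operators in action, as a short sketch with two small illustrative sets (each operator also has a method equivalent such as `union()`, `difference()` and `symmetric_difference()`):

%% Cell type:code id: tags:

```python
a_set = {1, 2, 3, 4}
another_set = {2, 3, 5, 6}

union = a_set | another_set        # elements in either set: {1, 2, 3, 4, 5, 6}
difference = a_set - another_set   # in a_set but not in another_set: {1, 4}
sym_diff = a_set ^ another_set     # in exactly one of the two sets: {1, 4, 5, 6}

print(union, difference, sym_diff)
print(union == a_set.union(another_set))  # True
```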
+
+%% Cell type:markdown id: tags:
+
+### Control Flow
+
+%% Cell type:markdown id: tags:
+
+#### if elif else
+
+%% Cell type:code id: tags:
+
+``` python
+a = 10
+
+if a == 10:
+    print("ten")
+elif a > 0:
+    print("positive")
+elif a < 0:
+    print("negative")
+else:
+    print("must be zero then...")
+```
+
+%% Cell type:markdown id: tags:
+
+#### for loops
+
+%% Cell type:code id: tags:
+
+``` python
+# for value in iterable
+for elem in ['foo', 'bar', 'baz']:
+    print(elem, end=" ")
+```
+
+%% Cell type:markdown id: tags:
+
+* There is no need for a counter: we iterate directly over a so-called *iterable*
+
+%% Cell type:markdown id: tags:
+
+#### while loops
+
+%% Cell type:code id: tags:
+
+``` python
+condition = True
+while condition:
+    print("it's true only once")
+    condition = False
+```
+
+%% Cell type:markdown id: tags:
+
+#### break and continue
+
+%% Cell type:markdown id: tags:
+
+* `break` leaves the innermost loop immediately
+* `continue` skips the rest of the current iteration and moves on to the next one
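
A minimal sketch of both statements (the list names `odd_numbers` and `pairs` are just illustrative). The second loop shows that `break` only leaves the inner loop, while the outer loop keeps running:

%% Cell type:code id: tags:

```python
odd_numbers = []
for n in range(10):
    if n % 2 == 0:
        continue          # skip even numbers, jump to the next iteration
    if n > 7:
        break             # leave the loop entirely
    odd_numbers.append(n)
print(odd_numbers)        # [1, 3, 5, 7]

pairs = []
for i in range(3):
    for j in range(3):
        if j > i:
            break         # leaves only the inner loop; the outer loop continues
        pairs.append((i, j))
print(pairs)              # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```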
+
+%% Cell type:markdown id: tags:
+
+#### Ternary expression
+
+%% Cell type:code id: tags:
+
+``` python
+# expr if cond else expr
+a = 1
+b = 2
+"True" if a>b else "False"
+```
+
+%% Cell type:markdown id: tags:
+
+### Functions
+
+%% Cell type:code id: tags:
+
+``` python
+def simple_fib(length=10):
+    result = []
+
+    # starting numbers
+    a = 1; b = 1
+
+    while len(result) < length:
+        result.append(a)
+        a,b = b,a+b
+    return result
+
+simple_fib(10)
+```
+
+%% Cell type:markdown id: tags:
+
+* functions can have defaults for arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def simple_fib(length=10):
+    ...
+```
+
+%% Cell type:markdown id: tags:
+
+* every function returns a value, either the one given in the return statement or *None*
+* Arguments and return values can be any Python object
+
+%% Cell type:code id: tags:
+
+``` python
+def real_and_imag(a_complex_num):
+    return a_complex_num.real, a_complex_num.imag
+
+r, i = real_and_imag(3 + 4j)
+r, i
+```
+
+%% Cell type:markdown id: tags:
+
+* Function calls can have positional arguments and/or keyword arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def real_and_imag(a_complex_num, nothing):
+    return a_complex_num.real, a_complex_num.imag
+
+real_and_imag(3, None) # order matters
+real_and_imag(nothing=None, a_complex_num=3) # order does not matter
+```
+
+%% Cell type:markdown id: tags:
+
+* When mixing them, positional arguments must come before keyword arguments
+
+%% Cell type:code id: tags:
+
+``` python
+print(1, 2, 3, sep='--')
+#print(1, sep='--', 2 , 3) # raises SyntaxError: positional argument follows keyword argument
+```
+
+%% Cell type:markdown id: tags:
+
+* An arbitrary number of arguments is supported with `*args` and `**kwargs`
+
+%% Cell type:code id: tags:
+
+``` python
+def catch_all(*args, **kwargs):
+    print("positional args =", args)
+    print("keyword args =", kwargs)
+catch_all(1, 2, 3, a=4, b=5)
+```
+
+%% Cell type:markdown id: tags:
+
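+* The names `args` and `kwargs` are just a convention; what matters are the `*` and `**`
+* In a function definition, `*args` collects extra positional arguments into a tuple and `**kwargs` collects extra keyword arguments into a dictionary
+* In a function call, `*` unpacks a sequence into positional arguments and `**` unpacks a dictionary into keyword arguments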
+
+%% Cell type:code id: tags:
+
+``` python
+inputs = (1, 2, 3)
+keywords = {'pi': 3.14}
+
+catch_all(*inputs, **keywords)
+catch_all(1, 2, 3, pi=3.14) # equal to the previous line
+```
+
+%% Cell type:markdown id: tags:
+
+* Functions are objects too (everything in Python is an object)
+* They can be passed around, assigned and have members
+
+%% Cell type:code id: tags:
+
+``` python
+def what_type(f):
+    print(type(f))
+what_type(print)
+```
+
+%% Cell type:markdown id: tags:
+
+#### Lambda functions
+
+%% Cell type:markdown id: tags:
+
+* Anonymous, short-lived functions
+* Often used with data manipulation
+
+%% Cell type:code id: tags:
+
+``` python
+add = lambda x, y: x + y
+add(1, 2)
+```
+
+%% Cell type:markdown id: tags:
+
+* In the following example, a lambda function is used to filter a list and only return the even values
+* `filter(func, iterable)` applies the function to each element of an iterable and keeps only the elements for which it returns a true value
+
+%% Cell type:code id: tags:
+
+``` python
+even = filter(lambda x: x % 2 == 0,
+              [1, 2, 3, 4, 5, 6, 7, 8, 9])
+list(even)
+```
+
+%% Cell type:markdown id: tags:
+
+* Or sort a list with an alternate key
+
+%% Cell type:code id: tags:
+
+``` python
+sorted([1, 2, 3, 4, 5, 6, 7, 8, 9], key=lambda x: abs(5-x))
+```
+
+%% Cell type:markdown id: tags:
+
+### Namespaces and Scope
+
+%% Cell type:markdown id: tags:
+
+* Functions can access the global scope, and create their own local scope
+* Functions have their local namespace, which is created and initialized with the parameters when the function is called
+* Variables assigned within the function are added to this namespace, and destroyed when the function ends
+* Functions can access variables that were defined in the outer scope by reference
+
+%% Cell type:code id: tags:
+
+``` python
+def f():
+    a.append(4)
+
+a = [1,2,3]
+f()
+a
+```
+
+%% Cell type:markdown id: tags:
+
+* But assigning to that name inside the function creates a new local variable; the global one is unchanged
+
+%% Cell type:code id: tags:
+
+``` python
+def f():
+    a = [4,5,6]
+    print("inside f: ", a)
+
+a = [1,2,3]
+f()
+print("outside f after calling f(): ", a)
+```
+
+%% Cell type:markdown id: tags:
+
+* There are the two keywords `global` and `nonlocal` that change these rules, but we will not look at them in detail
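
For reference only, a minimal sketch of what the two keywords do (the names `counter`, `increment` and `make_accumulator` are made up for this example): `global` makes an assignment target the module-level name, `nonlocal` makes it target the variable of the enclosing function.

%% Cell type:code id: tags:

```python
counter = 0

def increment():
    global counter     # assign to the module-level name instead of creating a local one
    counter += 1

def make_accumulator():
    total = 0
    def add(x):
        nonlocal total  # rebind the variable of the enclosing function
        total += x
        return total
    return add

increment()
increment()
acc = make_accumulator()
acc(5)
print(counter, acc(10))  # 2 15
```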
+
+%% Cell type:markdown id: tags:
+
+### Iterators
+
+%% Cell type:markdown id: tags:
+
+We have seen a for loop previously:
+
+%% Cell type:code id: tags:
+
+``` python
+# for value in iterable
+for elem in ['foo', 'bar', 'baz']:
+    print(elem, end=" ")
+```
+
+%% Cell type:markdown id: tags:
+
+* The object we loop over must be *iterable*: calling `iter()` on it must return an iterator
+
+%% Cell type:code id: tags:
+
+``` python
+iter(['foo', 'bar', 'baz'])
+```
+
+%% Cell type:code id: tags:
+
+``` python
+iter(1) # raises TypeError: 'int' object is not iterable
+```
+
+%% Cell type:code id: tags:
+
+``` python
+iter(range(10))
+```
+
+%% Cell type:markdown id: tags:
+
+To get the next element of an iterator, the built-in function *next()* is used
+
+%% Cell type:code id: tags:
+
+``` python
+an_iterable = iter(['foo', 'bar'])
+next(an_iterable)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(an_iterable)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(an_iterable)
+```
+
+%% Cell type:markdown id: tags:
+
+* When an iterator is exhausted, `StopIteration` is raised
+* The for loop implicitly calls iter() and next() and catches the final StopIteration
+* Iterators are a central concept in Python
+* But why are they useful?
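
To make the for-loop mechanics concrete, this is roughly what `for elem in ['foo', 'bar', 'baz']` does behind the scenes (a simplified sketch, not the interpreter's actual implementation):

%% Cell type:code id: tags:

```python
iterator = iter(['foo', 'bar', 'baz'])  # 1. get an iterator from the iterable
collected = []
while True:
    try:
        elem = next(iterator)           # 2. ask for the next element
    except StopIteration:               # 3. the iterator is exhausted: leave the loop
        break
    collected.append(elem)              # 4. the body of the for loop
print(collected)  # ['foo', 'bar', 'baz']
```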
+
+%% Cell type:markdown id: tags:
+
+1. Many things can be treated like lists and iterated over (they just need to implement the iterator interface)
+2. The whole list is never explicitly created, which allows for memory-efficient programming
+
+%% Cell type:code id: tags:
+
+``` python
+N = 10 ** 12 # this would need terabytes of memory if completely instantiated as a list...
+for i in range(N):
+    if i >= 10: break # ... which would be a waste since we only need the first 10 elements
+    print(i, end=' ')
+```
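
To illustrate point 1 above, a minimal sketch of a class that implements the iterator interface itself, via the `__iter__` and `__next__` methods (`Countdown` is a hypothetical example class):

%% Cell type:code id: tags:

```python
class Countdown:
    """Counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
for n in Countdown(2):
    print(n, end=' ')      # 2 1
```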
+
+%% Cell type:markdown id: tags:
+
+* If the whole list needs to be generated, wrap the iterable with *list()*
+
+%% Cell type:code id: tags:
+
+``` python
+range(10) # a lazy range object, does not show the values
+```
+
+%% Cell type:code id: tags:
+
+``` python
+list(range(10)) # a list
+```
+
+%% Cell type:markdown id: tags:
+
+* Some useful iterators
+
+%% Cell type:markdown id: tags:
+
+* range() produces a sequence of integers
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(10):
+    print(i, end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(10, 0, -1):
+    print(i, end=' ')
+```
+
+%% Cell type:markdown id: tags:
+
+* `enumerate(an_iterable)` creates an iterator of 2-tuples `(index, element)`, counting from 0
+
+%% Cell type:code id: tags:
+
+``` python
+for i, val in enumerate(['Federer', 'Nadal', 'Djokovic']):
+    print(i+1, val)
+```
+
+%% Cell type:markdown id: tags:
+
+* Please note the *unpacking* that is happening here. enumerate() generates 2-tuples (0, 'Federer'), (1, 'Nadal'), ... which are *unpacked* into the two separate variables i and val (hence the `i+1` to count from 1).
+
+%% Cell type:markdown id: tags:
+
+* zip() can be used to iterate over multiple lists simultaneously; it returns an iterator of tuples: (first elements of all lists), (second elements of all lists), ...
+
+%% Cell type:code id: tags:
+
+``` python
+first_names = ['Roger', 'Rafael', 'Novak']
+last_names =['Federer', 'Nadal', 'Djokovic']
+for first, last in zip(first_names, last_names):
+    print(first, last)
+```
+
+%% Cell type:markdown id: tags:
+
+* the number of lists can be greater than two
+
+%% Cell type:markdown id: tags:
+
+* If the lengths are not equal, zip() stops at the shortest list; the extra elements of the longer ones are ignored
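
A short sketch with three lists of unequal length (the extra entry 'Andy' is only there to show the truncation). If the missing values should be padded instead, `itertools.zip_longest` can be used:

%% Cell type:code id: tags:

```python
from itertools import zip_longest

first_names = ['Roger', 'Rafael', 'Novak', 'Andy']  # one element longer than the others
last_names = ['Federer', 'Nadal', 'Djokovic']
countries = ['SUI', 'ESP', 'SRB']

# zip() with three lists: stops at the shortest one, so 'Andy' is silently dropped
players = list(zip(first_names, last_names, countries))
print(players)

# zip_longest() keeps going and pads the missing values with `fillvalue`
padded = list(zip_longest(first_names, last_names, fillvalue='?'))
print(padded[-1])  # ('Andy', '?')
```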
+
+%% Cell type:markdown id: tags:
+
+### List comprehensions
+
+%% Cell type:markdown id: tags:
+
+* Python is famous for its list comprehension construct
+
+%% Cell type:code id: tags:
+
+``` python
+[i**2 for i in range(10)]
+```
+
+%% Cell type:markdown id: tags:
+
+* Takes an iterable, does something to every element and returns a list of the results
+
+%% Cell type:markdown id: tags:
+
+* The basic syntax is `[expression for variable in iterable]`
+
+%% Cell type:markdown id: tags:
+
+There is an extended syntax including a filter
+
+%% Cell type:code id: tags:
+
+``` python
+[i for i in range(10) if i%2]
+```
+
+%% Cell type:markdown id: tags:
+
+* `[expression for variable in iterable if condition]`
+
+%% Cell type:markdown id: tags:
+
+* List comprehensions can of course also be written as a multiline for loop
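
For example, the filtered comprehension from above and its multiline equivalent produce the same list (a short sketch):

%% Cell type:code id: tags:

```python
# list comprehension with a filter ...
odds = [i for i in range(10) if i % 2]

# ... and the equivalent multiline for loop
odds_loop = []
for i in range(10):
    if i % 2:
        odds_loop.append(i)

print(odds == odds_loop)  # True
```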
+
+%% Cell type:markdown id: tags:
+
+* List comprehensions can also be nested
+
+%% Cell type:code id: tags:
+
+``` python
+[(i, j) for i in range(2) for j in range(3)]
+```
+
+%% Cell type:markdown id: tags:
+
+* This is equivalent to a nested for loop with `i` varying in the outer loop and `j` in the inner one
+
+%% Cell type:markdown id: tags:
+
+Set comprehension
+
+%% Cell type:markdown id: tags:
+
+* With the same syntax but curly braces (not parentheses: those create a generator), a set is created and returned
+
+%% Cell type:code id: tags:
+
+``` python
+{-n for n in [1,1,2,3,4,4,4,5]}
+```
+
+%% Cell type:markdown id: tags:
+
+* Note that duplicates are eliminated
+
+%% Cell type:markdown id: tags:
+
+Dict comprehension
+
+%% Cell type:markdown id: tags:
+
+* With nearly the same syntax (plus a colon separating key and value), a dictionary is returned
+
+%% Cell type:code id: tags:
+
+``` python
+{n:n**2 for n in range(6)}
+```
+
+%% Cell type:markdown id: tags:
+
+### Generators
+
+%% Cell type:markdown id: tags:
+
+* In the above examples, the list, set or dict is fully created. These structures are constrained by available memory.
+* When instantiating a list, for example with a list comprehension, we are building a collection of values.
+* When instantiating a generator, we are building a *recipe* to create a collection of values.
+* Both expose the same iterator interface, but with a generator, the collection is not precomputed. Elements are generated on demand.
+* Lists can be used multiple times
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3]
+print(a)
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
+* Generators are single-use (once-used values are gone)
+
+%% Cell type:code id: tags:
+
+``` python
+g = (n for n in range(1, 4))
+print(list(g))
+print(list(g))
+```
+
+%% Cell type:markdown id: tags:
+
+* Generators cannot be sliced (or concatenated or multiplied or ...)
+
+%% Cell type:code id: tags:
+
+``` python
+a_list = [1,2,3]
+a_gen = (n for n in [1,2,3]) # a generator expression
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2]
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_gen[2] # raises TypeError: the object is not subscriptable
+```
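
%% Cell type:markdown id: tags:

If a slice of a generator is needed, `itertools.islice` provides one lazily, without building a list first. A short sketch; note that it consumes the generator, so the items it passed over are gone afterwards:

%% Cell type:code id: tags:

```python
from itertools import islice

squares = (n**2 for n in range(10))
# islice(iterable, start, stop) lazily yields the items that [start:stop] would select
sliced = list(islice(squares, 2, 5))
print(sliced)  # [4, 9, 16]

# islice consumed the first five items, so the generator continues after them
after = next(squares)
print(after)   # 25
```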
+
+%% Cell type:markdown id: tags:
+
+Building generators with *generator expressions*
+
+%% Cell type:code id: tags:
+
+``` python
+gen = (n**2 for n in [1,2,3])
+gen
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(gen)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(gen)
+```
+
+%% Cell type:markdown id: tags:
+
+Building generators with the *yield* statement
+
+%% Cell type:markdown id: tags:
+
+* For more complex logic, it is better to write a generator function with a regular for loop than to construct an overly complex generator expression (or list comprehension)
+* The *yield* statement returns a value and pauses the function until the next value is requested
+
+%% Cell type:code id: tags:
+
+``` python
+def gen():
+    for n in [1,2,3]:
+        yield n ** 2
+```
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+# 1st use of next():
+# run the code up to the yield statement, return 1st value and pause
+next(g)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# continue after the yield statement and execute code
+# until the next yield statement is encountered
+next(g)
+```
+
+%% Cell type:markdown id: tags:
+
+### Exceptions
+
+%% Cell type:markdown id: tags:
+
+Runtime errors raise exceptions of different types
+
+%% Cell type:code id: tags:
+
+``` python
+print(2/0)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+l = [1,2,3]
+print(l[42])
+```
+
+%% Cell type:code id: tags:
+
+``` python
+1 + 'a'
+```
+
+%% Cell type:markdown id: tags:
+
+* A try/except block with a bare `except:` catches all types of exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("Let's try something.")
+    #x = 1 / 0 # ZeroDivisionError
+except:
+    print("Something bad happened!")
+```
+
+%% Cell type:markdown id: tags:
+
+* It is better to catch only anticipated exceptions and let everything unforeseen still raise an error, so we don't mask problems in our code.
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("Let's try something.")
+    #x = 1 / 0 # ZeroDivisionError
+except ZeroDivisionError:
+    print("Something bad happened!")
+```
+
+%% Cell type:markdown id: tags:
+
+Access error message
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    x = 1 / 0
+except ZeroDivisionError as err:
+    print("Error class is:  ", type(err))
+    print("Error message is:", err)
+```
+
+%% Cell type:markdown id: tags:
+
+Catch multiple exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    pass # do something that may fail
+except SomeException:
+    pass # do something for this kind of exception
+except AnotherException:
+    pass # do something else for another kind of exception
+```
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    pass # do something that may fail
+except (SomeException, AnotherException) as e: # parentheses are mandatory, 'as e' is optional
+    pass # do the same thing for all exceptions
+```
+
+%% Cell type:markdown id: tags:
+
+Raise or re-raise an exception
+
+%% Cell type:code id: tags:
+
+``` python
+raise RuntimeError("my error message")
+```
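
%% Cell type:markdown id: tags:

The cell above raises a new exception. To *re-raise* the exception currently being handled (for example after logging it), a bare `raise` inside an `except` block can be used. A minimal sketch with a hypothetical helper `parse_int`:

%% Cell type:code id: tags:

```python
def parse_int(text):
    try:
        return int(text)
    except ValueError:
        print(f"could not parse {text!r}, re-raising")
        raise  # bare raise: re-raises the ValueError currently being handled

try:
    parse_int("abc")
except ValueError as err:
    print("caught again by the caller:", err)

print(parse_int("42") + 1)  # 43
```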
+
+%% Cell type:markdown id: tags:
+
+Complete block
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("try something here (always executed)")
+except:
+    print("this happens only if it fails")
+else:
+    print("this happens only if it succeeds")
+finally:
+    print("this happens no matter what (used for cleanup)")
+```
+
+%% Cell type:markdown id: tags:
+
+Suppress exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+from contextlib import suppress
+
+with suppress(DeprecationWarning, FutureWarning):
+    raise FutureWarning # suppressed: the cell finishes without an error
+```
+
+%% Cell type:markdown id: tags:
+
+### Modules
+
+%% Cell type:markdown id: tags:
+
+* Additional functionality can be used by loading modules
+
+%% Cell type:code id: tags:
+
+``` python
+import math # Explicit import of whole module
+math.cos(math.pi)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy as np # Explicit import by alias
+np.random.randint(10)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+from os import getuid, getgid # Explicit import of module parts (getuid/getgid exist on Unix only)
+getuid(), getgid()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+from pathlib import *  # Wildcard import of all public names, use with care
+data = Path("data")
+```
+
+%% Cell type:markdown id: tags:
+
+Sometimes, you see a statement like this at the end of a Python script
+
+%% Cell type:code id: tags:
+
+``` python
+def main():
+    print("running as a standalone script")
+
+if __name__ == '__main__':
+    # only true when the file is executed directly, not when it is imported
+    main()
+```
+
+%% Cell type:markdown id: tags:
+
+* This is used to make a standalone executable script which is also importable as a module
+* The code is executed only when being called standalone, and not when imported as a module
+
+%% Cell type:markdown id: tags:
+
+### Pitfalls in Python
+
+%% Cell type:markdown id: tags:
+
+* Although Python tries to follow the [Principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment), there are some pitfalls one must be aware of
+* Let's discuss them, in (my subjective) order of importance
+* Don't mix spaces and tabs
+* Know what version of Python you use, Python 2 and 3 are different in some important points
+* Mutable default argument
+
+%% Cell type:code id: tags:
+
+``` python
+def foo(a=[]):
+    a.append(5)
+    return a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:markdown id: tags:
+
+* What?!
+
+%% Cell type:markdown id: tags:
+
+Why?
+
+%% Cell type:markdown id: tags:
+
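+* Functions in Python are first-class objects
+* Default argument values are evaluated only once, when the `def` statement is executed, not at every call
+* The default list is therefore stored on the function object and shared between calls, so its state may change from one call to the next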
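
The shared default can be inspected directly through the function's `__defaults__` attribute, a tuple holding the default values (a short sketch):

%% Cell type:code id: tags:

```python
def foo(a=[]):
    a.append(5)
    return a

print(foo.__defaults__)  # ([],)  the default list lives on the function object
foo()
foo()
print(foo.__defaults__)  # ([5, 5],)  both calls mutated the same list
```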
+
+%% Cell type:markdown id: tags:
+
+What to do instead?
+
+%% Cell type:markdown id: tags:
+
+* Never use mutable default arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def foo(a=None):
+    if a is None:
+        a = []
+    a.append(5)
+    return a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:markdown id: tags:
+
+* Assignment binds a name to an object by reference; it never copies. With immutable objects this is harmless because they cannot be changed in place.
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+b = a
+b.append(5)
+print(a)
+```
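
%% Cell type:markdown id: tags:

If an independent list is needed, copy it explicitly. A short sketch using `list.copy()` (equivalent to `list(a)` or `a[:]`); note that this is a *shallow* copy, so nested mutable objects would still be shared (`copy.deepcopy()` handles that case):

%% Cell type:code id: tags:

```python
a = [1, 2, 3, 4]
b = a            # a second name for the same list
c = a.copy()     # a new list containing the same elements

print(b is a, c is a)  # True False
c.append(99)
print(a)               # [1, 2, 3, 4]  unaffected by changes to the copy
```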
+
+%% Cell type:markdown id: tags:
+
+Proper check for None:
+
+%% Cell type:code id: tags:
+
+``` python
+# correct (compares object's identity)
+if a is not None:
+    pass
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# wrong (tests truthiness, not identity: also false for 0, '', [] and other falsy values)
+if a:
+    pass
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# from https://stackoverflow.com/a/14247383/2315949
+class Negator(object):
+    def __eq__(self,other):
+        return not other
+thing = Negator()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+thing == None
+```
+
+%% Cell type:code id: tags:
+
+``` python
+thing is None
+```
+
+%% Cell type:markdown id: tags:
+
+* Be careful when using "==" to check against True or False
+
+%% Cell type:code id: tags:
+
+``` python
+var = 1  # try other values too: True, 1.0, 0, "", [], None
+
+if var == True: pass   # executes if var is True or 1 (or 1.0)
+if var != True: pass   # executes if var is neither True nor 1 (nor 1.0)
+if var == False: pass  # executes if var is False or 0 (or 0.0, 0j)
+if var == None: pass   # executes if var compares equal to None (a custom __eq__ can fake this)
+if var: pass           # executes if var is truthy: non-empty string/list/dict/tuple, non-zero, etc.
+if not var: pass       # executes if var is falsy: "", {}, [], (), 0, None, etc.
+if var is True: pass   # executes only if var is the boolean True, not 1
+if var is False: pass  # executes only if var is the boolean False, not 0
+if var is None: pass   # executes only if var is None itself (preferred over var == None)
+```
+
+%% Cell type:markdown id: tags:
+
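+* Since Python 3.7, dicts preserve insertion order; in older versions they are unordered, even if they often come out in the order they were filled.
+* Use *OrderedDict* from the *collections* module if you need guaranteed ordering in older versions (or its extra methods such as `move_to_end()`)
+* An empty set must be created with `set()` (or `set(iterable)`), because `{}` creates an empty dict.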
+
+%% Cell type:code id: tags:
+
+``` python
+set([1,2,3])
+```
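
%% Cell type:markdown id: tags:

A quick check of the difference between `{}` and `set()` (a short sketch):

%% Cell type:code id: tags:

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict), type(empty_set))  # <class 'dict'> <class 'set'>

# braces with values (and no colons) do create a set
non_empty_set = {1, 2, 3}
print(type(non_empty_set))  # <class 'set'>
```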
+
+%% Cell type:markdown id: tags:
+
+* `list.sort()` sorts in place and returns `None`, `sorted()` returns a new list
+* `++n` and `--n` do not work as people with a C or Java background would expect
+
+%% Cell type:code id: tags:
+
+``` python
+n = 1
+++n # unary plus applied twice, +(+n), which is simply n (n is not incremented)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+--n # unary minus applied twice, -(-n), which is simply n (n is not decremented)
+```
+
+%% Cell type:markdown id: tags:
+
+Multiprocessing
+
+%% Cell type:markdown id: tags:
+
+* The GIL (Global Interpreter Lock) ensures that only one thread at a time can execute Python bytecode (in the standard CPython interpreter)
+* So when running pure Python code, threads don't give you parallel execution most of the time. In other words, threads in Python are not like threads in Java or C++.
+* There are many cases in which things do run in parallel, e.g. during I/O or when using C extension libraries such as numpy that release the GIL
+* Use the *multiprocessing* module if you want to parallelize your own CPU-bound code
+
+%% Cell type:markdown id: tags:
+
+### Pythonic code (just a few examples)
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+l = ['a','b','c']
+for i in range(len(l)):
+    print(l[i], end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead:
+for elem in l:
+    print(elem, end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Or if you do need the counter:
+for i, elem in enumerate(l):
+    print("{}:{} ".format(i, elem), end=' ')
+```
+
+%% Cell type:markdown id: tags:
+
+*It's easier to ask for forgiveness than permission* (EAFP) instead of *Look before you leap* (LBYL)
+
+%% Cell type:code id: tags:
+
+``` python
+# from https://stackoverflow.com/a/11360880/2315949
+
+my_dict = {'other_key': 1} # example data
+
+# Don't (LBYL):
+if 'key' in my_dict:
+    x = my_dict['key']
+else:
+    x = None # handle missing key
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead (EAFP):
+try:
+    x = my_dict['key']
+except KeyError:
+    x = None # handle missing key
+```
+
+%% Cell type:markdown id: tags:
+
+Duck typing
+
+%% Cell type:markdown id: tags:
+
+* Don't check if it is a duck, it's enough if it walks like a duck and quacks like a duck
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+def foo(name):
+    if isinstance(name, str):
+        print(name.lower())
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead. It is enough if the object has a string representation (many objects do)
+def foo(name):
+    print(str(name).lower())
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+def bar(listing):
+    if isinstance(listing, list):
+        listing.extend((1, 2, 3))
+        return ", ".join(str(x) for x in listing) # join() needs strings
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead. It is enough if the object is iterable (many objects are)
+def bar(listing):
+    l = list(listing)
+    l.extend((1, 2, 3))
+    return ", ".join(str(x) for x in l) # join() needs strings
+```
+
+%% Cell type:markdown id: tags:
+
+Don't write long and unreadable one-liners just because you can
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't (illustrative only, this line is not meant to run)
+l = [m for a, b in zip(['a', 'b', 'c'], [1,2,3]) if b.method(a) != b for m in b if not m.method(a, b) and reduce(lambda x, y: a + y.method(), (m, a, b))]
+```
+
+%% Cell type:markdown id: tags:
+
+Python with its handy list comprehensions is somewhat more prone to this than other languages
+
+%% Cell type:markdown id: tags:
+
+And again
+
+%% Cell type:code id: tags:
+
+``` python
+import this
+```
+
+%% Cell type:markdown id: tags:
+
+And also
+
+%% Cell type:markdown id: tags:
+
+[PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
+
+(PEP stands for Python Enhancement Proposals)
+
+%% Cell type:markdown id: tags:
+
+### Some things left out
+
+%% Cell type:markdown id: tags:
+
+* [Object-oriented programming](https://www.programiz.com/python-programming/object-oriented-programming)
+
+%% Cell type:markdown id: tags:
+
+* [@Properties](https://www.programiz.com/python-programming/property)
+
+%% Cell type:markdown id: tags:
+
+* [Decorators](https://www.programiz.com/python-programming/decorator)
+
+%% Cell type:markdown id: tags:
+
+* [Closures](https://www.programiz.com/python-programming/closure)
+
+%% Cell type:markdown id: tags:
+
+* [Shallow and deep copying](https://www.programiz.com/python-programming/shallow-deep-copy)
+%% Cell type:markdown id: tags:
+
+# Machine Learning (ML) Module - Python crash course
+
+%% Cell type:markdown id: tags:
+
+## Attribution
+
+%% Cell type:markdown id: tags:
+
+* A lot of the material in this course was inspired by the free book [A Whirlwind Tour of Python](https://www.oreilly.com/library/view/a-whirlwind-tour/9781492037859/) by Jake VanderPlas
+* And some, as usual, from [Wikipedia](https://en.wikipedia.org/wiki/Python_(programming_language)) and [Stackoverflow](https://stackoverflow.com/questions/tagged/python)
+
+%% Cell type:markdown id: tags:
+
+![book.jpg](attachment:book.jpg)
+
+%% Cell type:markdown id: tags:
+
+# Python
+
+%% Cell type:markdown id: tags:
+
+* An interpreted high-level programming language for general-purpose programming
+* Conceived in the late 1980s by Guido van Rossum
+* Simple, beautiful with a wide range of domain-specific libraries
+* As of February 2019, ranked 3rd on the [TIOBE index](https://www.tiobe.com/tiobe-index/)
+* Python has a design philosophy that emphasizes code readability
+* Uses significant whitespace, the most obvious difference to other languages. Also probably the least important.
+* Python is typed *dynamically* (variables may change type) and *strongly* (no unobvious implicit casts)
+* Python has automatic memory management
+* Supports multiple programming paradigms, including object-oriented, functional and procedural
+* Has a large and comprehensive standard library
+* Users of Python are often referred to as Pythonists, Pythonistas, and Pythoneers
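
A short sketch of what *dynamically* and *strongly* typed means in practice (the variable names are just illustrative):

%% Cell type:code id: tags:

```python
x = 42           # x refers to an int ...
x = "forty-two"  # ... and now to a str: dynamic typing, no declaration needed

try:
    result = "1" + 1   # strong typing: str and int are not implicitly converted
except TypeError as err:
    print("TypeError:", err)

converted = int("1") + 1  # an explicit conversion works
print(converted)          # 2
```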
+
+%% Cell type:markdown id: tags:
+
+## and...
+
+%% Cell type:markdown id: tags:
+
+Python is arguably the most popular programming language for Data Science, with R and Julia being strong contenders
+
+%% Cell type:markdown id: tags:
+
+# The ZEN of Python
+
+%% Cell type:code id: tags:
+
+``` python
+import this
+```
+
+%% Cell type:markdown id: tags:
+
+# Python Syntax
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+
+# endpoint
+endpoint = 10
+
+# starting numbers
+a = 1; b = 1
+
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
+
+%% Cell type:markdown id: tags:
+
+### Comments
+
+%% Cell type:markdown id: tags:
+
+* comments are marked by #
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+```
+
+%% Cell type:markdown id: tags:
+
+* there are no multi-line comments like `/* ... */` in C
+
+%% Cell type:markdown id: tags:
+
+### Multi-line Comments
+* But wait, I have seen multi-line comments using triple quotes:
+
+%% Cell type:code id: tags:
+
+``` python
+"""
+Is this a
+multiline
+comment?
+"""
+```
+
+%% Cell type:markdown id: tags:
+
+* Triple quotes are actually **multi-line strings**.
+
+%% Cell type:markdown id: tags:
+
+* If they are not assigned to a variable, they are evaluated and immediately discarded.
+
+%% Cell type:markdown id: tags:
+
+* They are not ignored by the interpreter in the same way as # comments.
+
+%% Cell type:markdown id: tags:
+
+* These multi-line strings are often used as **docstrings**
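
%% Cell type:markdown id: tags:

Here is a minimal sketch of a docstring. The function `area` is just a hypothetical example. A string literal placed as the first statement of a function body is stored in the function's `__doc__` attribute, and `help()` displays that same text.

%% Cell type:code id: tags:

```python
def area(width, height):
    """Return the area of a width x height rectangle."""
    return width * height

# The docstring is kept on the function object;
# help(area) would display the same text.
print(area.__doc__)
```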
+
+%% Cell type:markdown id: tags:
+
+### Statement termination
+
+%% Cell type:markdown id: tags:
+
+* end-of-line terminates a statement
+
+%% Cell type:code id: tags:
+
+``` python
+# endpoint
+endpoint = 10
+```
+
+%% Cell type:markdown id: tags:
+
+* statements can also be separated by a semicolon
+
+%% Cell type:code id: tags:
+
+``` python
+# starting numbers
+a = 1; b = 1
+```
+
+%% Cell type:markdown id: tags:
+
+* but this is not often used
+
+%% Cell type:markdown id: tags:
+
+### Code blocks and indentation
+
+%% Cell type:markdown id: tags:
+
+* Code blocks are indented by whitespace
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
+
+%% Cell type:markdown id: tags:
+
+* tabs and spaces are not the same, don't mix them
+* best to set your editor to indent with 4 spaces
+
+%% Cell type:code id: tags:
+
+``` python
+# Simple, iterative Fibonacci sequence
+
+# endpoint
+endpoint = 10
+
+# starting numbers
+a,b = 1,1
+
+for i in range(endpoint-1):
+    print(a, end=" ")
+    a,b = b,a+b
+```
+
+%% Cell type:markdown id: tags:
+
+# Getting Help
+
+%% Cell type:markdown id: tags:
+
+* To get help on an object, you can use the internal help
+
+%% Cell type:code id: tags:
+
+``` python
+help(print)
+```
+
+%% Cell type:markdown id: tags:
+
+* When using IPython (e.g. in Jupyter), you can also append a ? to the object you want help about
+
+%% Cell type:code id: tags:
+
+``` python
+print?
+```
+
+%% Cell type:markdown id: tags:
+
+* This will open a separate window with the help information
+
+%% Cell type:markdown id: tags:
+
+# Python Semantics
+
+%% Cell type:markdown id: tags:
+
+### Variables and Objects
+
+%% Cell type:markdown id: tags:
+
+* Variables are references, not containers
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+b = a
+print(b)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+b.append(5)
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
+* The assignment operator does just that: it assigns a reference
+
+%% Cell type:markdown id: tags:
+
+* So:
+
+%% Cell type:code id: tags:
+
+``` python
+b = 'something else'
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
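+* And there is no type safety for variables (and no declaration is needed): a name can be rebound to an object of any type
+* Changes made through one reference are visible through all others, but only mutable objects (such as lists) can be changed in place
+* Simple types such as numbers are immutable, so `y += 1` creates a new object and rebinds only `y`
+* So the following works as expected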
+
+%% Cell type:code id: tags:
+
+``` python
+x = 5
+y = x
+y += 1
+print(x, y)
+```
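
%% Cell type:markdown id: tags:

The `is` operator checks whether two names refer to the very same object. That makes the difference between mutable and immutable values visible. Here is a short sketch with a list and an int:

%% Cell type:code id: tags:

```python
# A list is mutable: += extends it in place, so both names see the change
a = [1, 2]
b = a
b += [3]
print(a, b, a is b)   # [1, 2, 3] [1, 2, 3] True

# An int is immutable: += creates a new int object and rebinds only y
x = 5
y = x
y += 1
print(x, y, x is y)   # 5 6 False
```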
+
+%% Cell type:markdown id: tags:
+
+* Everything in Python is an object
+* So everything has a type
+
+%% Cell type:code id: tags:
+
+``` python
+x = "Hello"
+type(x)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+type(print)
+```
+
+%% Cell type:markdown id: tags:
+
+* But variable names are still just references; the type information is stored in the object
+* Even simple types have attributes and methods
+
+%% Cell type:code id: tags:
+
+``` python
+x = 42.0
+x.is_integer()
+```
+
+%% Cell type:markdown id: tags:
+
+### Simple types
+
+%% Cell type:markdown id: tags:
+
+| Scalar Type | Example        | Description                                                     |
+|-------------|----------------|-----------------------------------------------------------------|
+| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                                  |
+| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                     |
+| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part)    |
+| ``bool``    | ``x = True``   | Boolean: True/False values                                      |
+| ``str``     | ``x = 'abc'``  | String: characters or text, either with single or double quotes |
+| ``NoneType``| ``x = None``   | Special object indicating nulls                                 |
+
+%% Cell type:markdown id: tags:
+
+### Built-in Data Structures
+
+%% Cell type:markdown id: tags:
+
+There are four built-in compound data structures that act as containers
+
+%% Cell type:code id: tags:
+
+``` python
+a_list = [0,1,2,3,4] # mutable, ordered collection
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_tuple = (0,1,2,3,4) # immutable, ordered collection
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dictionary = {'a':0, 'b':1, 'c':2} # key/value mapping (insertion-ordered since Python 3.7)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_set = {0,1,2,3,3,4,4,4} # Unordered collection of unique values
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_set
+```
+
+%% Cell type:markdown id: tags:
+
+#### Lists
+
+%% Cell type:markdown id: tags:
+
+* Lists are the workhorse datatype of Python
+
+%% Cell type:markdown id: tags:
+
+* They are called lists, not arrays
+
+%% Cell type:code id: tags:
+
+``` python
+len(a_list) # they have a length
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list.append('a') # and many, many useful methods
+a_list
+```
+
+%% Cell type:markdown id: tags:
+
+#### List indexing and slicing
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[0] # zero-based indexing for accessing elements
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[-1] # access from the end of the list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2:4] # slice a list, 1st index is included, 2nd excluded
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[:2] # omitting the initial 0 or the end index (or both) is possible
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[::2] # an optional third index indicates the step size
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[::-1] # reverse a list with a negative stepsize
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2:4] = [9,8] # indexing and slicing can be used on the left-hand-side too
+a_list
+```
+
+%% Cell type:markdown id: tags:
+
+#### Tuples
+
+%% Cell type:code id: tags:
+
+``` python
+# tuples behave like lists except that they are immutable, so this raises a TypeError
+a_tuple[0] = 6
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# convert between the two with list() and tuple()
+another_list = list(a_tuple)
+another_list[0] = 6
+tuple(another_list)
+```
+
+%% Cell type:markdown id: tags:
+
+#### Dictionaries
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict = {'a': 1, 2: 27, 'large_num':2**64}
+a_dict
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict['large_num']
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_dict.keys(), a_dict.values(), a_dict.items()
+```
+
+%% Cell type:markdown id: tags:
+
+* Please note that since Python 3.7, dictionaries preserve insertion order; in older versions they were unordered
+
+%% Cell type:markdown id: tags:
+
+#### Sets
+
+%% Cell type:code id: tags:
+
+``` python
+a_set = {1,2,3,3,4,4,4,4}
+a_set
+```
+
+%% Cell type:markdown id: tags:
+
+* Sets have no duplicates and are unordered
+
+%% Cell type:code id: tags:
+
+``` python
+another_set = {2,3,5,6}
+a_set & another_set # or: a_set.intersection(another_set)
+```
+
+%% Cell type:markdown id: tags:
+
+* set operations: union `|`, difference `-`, symmetric difference `^` and more
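
%% Cell type:markdown id: tags:

The other operators work just like `&`. The sketch below repeats the two sets from above so the cell runs on its own. Each operator also has a method equivalent, for example `union()`, `difference()` and `symmetric_difference()`.

%% Cell type:code id: tags:

```python
a_set = {1, 2, 3, 4}
another_set = {2, 3, 5, 6}

print(a_set | another_set)  # union: {1, 2, 3, 4, 5, 6}
print(a_set - another_set)  # difference: {1, 4}
print(a_set ^ another_set)  # symmetric difference (in exactly one of the sets): {1, 4, 5, 6}
```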
+
+%% Cell type:markdown id: tags:
+
+### Control Flow
+
+%% Cell type:markdown id: tags:
+
+#### if elif else
+
+%% Cell type:code id: tags:
+
+``` python
+a = 10
+
+if a == 10:
+    print("ten")
+elif a > 0:
+    print("positive")
+elif a < 0:
+    print("negative")
+else:
+    print("must be something else then...")
+```
+
+%% Cell type:markdown id: tags:
+
+#### for loops
+
+%% Cell type:code id: tags:
+
+``` python
+# for value in iterable
+for elem in ['foo', 'bar', 'baz']:
+    print(elem, end=" ")
+```
+
+%% Cell type:markdown id: tags:
+
+* There is no need for a counter: we iterate directly over the elements of a so-called *iterable*
+
+%% Cell type:markdown id: tags:
+
+#### while loops
+
+%% Cell type:code id: tags:
+
+``` python
+condition = True
+while condition:
+    print("it's true only once")
+    condition = False
+```
+
+%% Cell type:markdown id: tags:
+
+#### break and continue
+
+%% Cell type:markdown id: tags:
+
+* break leaves the innermost loop
+* continue moves to the next iteration of the current loop
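
%% Cell type:markdown id: tags:

Here is a small example of both. The list name `odd_below_8` is only illustrative. `continue` skips the even numbers, and `break` stops the loop once a number is larger than 7.

%% Cell type:code id: tags:

```python
odd_below_8 = []
for n in range(10):
    if n % 2 == 0:
        continue   # skip even numbers: jump straight to the next iteration
    if n > 7:
        break      # leave the loop entirely
    odd_below_8.append(n)
print(odd_below_8)  # [1, 3, 5, 7]
```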
+
+%% Cell type:markdown id: tags:
+
+#### Ternary expression
+
+%% Cell type:code id: tags:
+
+``` python
+# expr if cond else expr
+a = 1
+b = 2
+"True" if a>b else "False"
+```
+
+%% Cell type:markdown id: tags:
+
+### Functions
+
+%% Cell type:code id: tags:
+
+``` python
+def simple_fib(length=10):
+    result = []
+
+    # starting numbers
+    a = 1; b = 1
+
+    while len(result) < length:
+        result.append(a)
+        a,b = b,a+b
+    return result
+
+simple_fib(10)
+```
+
+%% Cell type:markdown id: tags:
+
+* functions can have defaults for arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def simple_fib(length=10):
+    ...
+```
+
+%% Cell type:markdown id: tags:
+
+* every function returns a value, either the one given in the return statement or *None*
+* Arguments and return values can be any Python object
+
+%% Cell type:code id: tags:
+
+``` python
+def real_and_imag(a_complex_num):
+    return a_complex_num.real, a_complex_num.imag
+
+r, i = real_and_imag(3 + 4j)
+r, i
+```
+
+%% Cell type:markdown id: tags:
+
+* Function calls can have positional arguments and/or keyword arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def real_and_imag(a_complex_num, nothing):
+    return a_complex_num.real, a_complex_num.imag
+
+real_and_imag(3, None) # order matters
+real_and_imag(nothing=None, a_complex_num=3) # order does not matter
+```
+
+%% Cell type:markdown id: tags:
+
+* When mixing, positional arguments must come before keyword arguments
+
+%% Cell type:code id: tags:
+
+``` python
+print(1, 2, 3, sep='--')
+#print(1, sep='--', 2 , 3) # throws an error
+```
+
+%% Cell type:markdown id: tags:
+
+* An arbitrary number of arguments is supported with `*args` and `**kwargs`
+
+%% Cell type:code id: tags:
+
+``` python
+def catch_all(*args, **kwargs):
+    print("positional args =", args)
+    print("keyword args =", kwargs)
+catch_all(1, 2, 3, a=4, b=5)
+```
+
+%% Cell type:markdown id: tags:
+
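+* The names args and kwargs are just a convention; what matters are the `*` and `**`
+* In a function definition, `*args` collects extra positional arguments into a tuple and `**kwargs` collects extra keyword arguments into a dictionary. In a function call, `*` unpacks a sequence into positional arguments and `**` unpacks a dictionary into keyword arguments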
+
+%% Cell type:code id: tags:
+
+``` python
+inputs = (1, 2, 3)
+keywords = {'pi': 3.14}
+
+catch_all(*inputs, **keywords)
+catch_all(1, 2, 3, pi=3.14) # equal to the previous line
+```
+
+%% Cell type:markdown id: tags:
+
+* Functions are objects too (everything in Python is an object)
+* They can be passed around, assigned to other names and have attributes
+
+%% Cell type:code id: tags:
+
+``` python
+def what_type(f):
+    print(type(f))
+what_type(print)
+```
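
%% Cell type:markdown id: tags:

Here is a small sketch of the other two points, using the hypothetical functions `shout` and `apply_twice`. A function can be bound to another name and passed as an argument. It also carries attributes such as `__name__`.

%% Cell type:code id: tags:

```python
def shout(text):
    return text.upper() + "!"

# Assign a function to another name (no call, so no parentheses)
yell = shout
print(yell("hello"))             # HELLO!

# Pass a function as an argument to another function
def apply_twice(func, value):
    return func(func(value))

print(apply_twice(shout, "hi"))  # HI!!

# Functions have attributes, e.g. their name
print(yell.__name__)             # shout
```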
+
+%% Cell type:markdown id: tags:
+
+#### Lambda functions
+
+%% Cell type:markdown id: tags:
+
+* Anonymous, short-lived functions
+* Often used with data manipulation
+
+%% Cell type:code id: tags:
+
+``` python
+add = lambda x, y: x + y
+add(1, 2)
+```
+
+%% Cell type:markdown id: tags:
+
+* In the following example, a lambda function is used to filter a list and only return the even values
+* filter(func, iterable) takes a function and applies it to each element of an iterable
+
+%% Cell type:code id: tags:
+
+``` python
+even = filter(lambda x: x % 2 == 0,
+              [1, 2, 3, 4, 5, 6, 7, 8, 9])
+list(even)
+```
+
+%% Cell type:markdown id: tags:
+
+* Or sort a list with an alternate key
+
+%% Cell type:code id: tags:
+
+``` python
+sorted([1, 2, 3, 4, 5, 6, 7, 8, 9], key=lambda x: abs(5-x))
+```
+
+%% Cell type:markdown id: tags:
+
+### Namespaces and Scope
+
+%% Cell type:markdown id: tags:
+
+* Functions can access the global scope, and create their own local scope
+* Functions have their local namespace, which is created and initialized with the parameters when the function is called
+* Variables assigned within the function are added to this namespace, and destroyed when the function ends
+* Functions can access variables that were defined in the outer scope by reference
+
+%% Cell type:code id: tags:
+
+``` python
+def f():
+    a.append(4)
+
+a = [1,2,3]
+f()
+a
+```
+
+%% Cell type:markdown id: tags:
+
+* But using an assignment on that reference does not change it globally
+
+%% Cell type:code id: tags:
+
+``` python
+def f():
+    a = [4,5,6]
+    print("inside f: ", a)
+
+a = [1,2,3]
+f()
+print("outside f after calling f(): ", a)
+```
+
+%% Cell type:markdown id: tags:
+
+* There are the two keywords `global` and `nonlocal` that change these rules, but we will not look at them in detail
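
%% Cell type:markdown id: tags:

We will not look at them in detail, but a minimal sketch shows what they change. All names here are hypothetical. `global` makes an assignment refer to the module-level variable. `nonlocal` makes it refer to a variable of an enclosing function.

%% Cell type:code id: tags:

```python
counter = 0

def increment():
    global counter      # assign to the module-level name instead of creating a local one
    counter += 1

def make_counter():
    count = 0
    def step():
        nonlocal count  # assign to the variable of the enclosing function
        count += 1
        return count
    return step

increment()
increment()
print(counter)          # 2

step = make_counter()
step()
print(step())           # 2
```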
+
+%% Cell type:markdown id: tags:
+
+### Iterators
+
+%% Cell type:markdown id: tags:
+
+We have seen a for loop previously:
+
+%% Cell type:code id: tags:
+
+``` python
+# for value in iterable
+for elem in ['foo', 'bar', 'baz']:
+    print(elem, end=" ")
+```
+
+%% Cell type:markdown id: tags:
+
+* The object being looped over must be *iterable*, i.e. `iter()` must be able to return an *iterator* for it (the *iterator interface*)
+
+%% Cell type:code id: tags:
+
+``` python
+iter(['foo', 'bar', 'baz'])
+```
+
+%% Cell type:code id: tags:
+
+``` python
+iter(1)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+iter(range(10))
+```
+
+%% Cell type:markdown id: tags:
+
+To get the next element of an iterator, the built-in *next()* function is used
+
+%% Cell type:code id: tags:
+
+``` python
+an_iterable = iter(['foo', 'bar'])
+next(an_iterable)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(an_iterable)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(an_iterable)
+```
+
+%% Cell type:markdown id: tags:
+
+* When an iterator is exhausted, a StopIteration exception is raised
+* The for-in loop implicitly calls iter() and next() and catches the final StopIteration
+* Iterators are a central concept in Python
+* But why are they useful?
+
+%% Cell type:markdown id: tags:
+
+1. Many things can be treated as lists and iterated over (they just need to implement the iterator interface)
+2. The whole list is never explicitly created, which allows for memory-efficient programming
+
+%% Cell type:code id: tags:
+
+``` python
+N = 10 ** 12 # this would need terabytes of memory if completely instantiated as a list...
+for i in range(N):
+    if i >= 10: break # ... which would be a waste since we only need the first 10 elements
+    print(i, end=' ')
+```
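
%% Cell type:markdown id: tags:

This sketch illustrates point 1 and the role of iter() and next(). The hypothetical `Countdown` class implements the iterator interface itself through `__iter__()` and `__next__()`. The while loop after it does roughly what a for loop does behind the scenes.

%% Cell type:code id: tags:

```python
class Countdown:
    """Hypothetical example: an iterator counting down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

# A for loop works with it just like with a list ...
for n in Countdown(3):
    print(n, end=" ")               # 3 2 1
print()

# ... because it does roughly the following behind the scenes:
iterator = iter(Countdown(3))
while True:
    try:
        n = next(iterator)
    except StopIteration:
        break
    print(n, end=" ")
print()
```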
+
+%% Cell type:markdown id: tags:
+
+* If the whole list needs to be generated, wrap the iterable with *list()*
+
+%% Cell type:code id: tags:
+
+``` python
+range(10) # a lazy range object, does not print the values
+```
+
+%% Cell type:code id: tags:
+
+``` python
+list(range(10)) # a list
+```
+
+%% Cell type:markdown id: tags:
+
+* Some useful built-ins for iteration
+
+%% Cell type:markdown id: tags:
+
+* range() produces a sequence of integers
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(10):
+    print(i, end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+for i in range(10, 0, -1):
+    print(i, end=' ')
+```
+
+%% Cell type:markdown id: tags:
+
+* enumerate(an_iterable) creates an iterator of (index, element) 2-tuples
+
+%% Cell type:code id: tags:
+
+``` python
+for i, val in enumerate(['Federer', 'Nadal', 'Djokovic']):
+    print(i+1, val)
+```
+
+%% Cell type:markdown id: tags:
+
+* Please note the *unpacking* that is happening here. enumerate() generates 2-tuples (0, 'Federer'), (1, 'Nadal'), ... (counting starts at 0, hence the `i+1`), which are *unpacked* into the two separate variables i and val.
+
+%% Cell type:markdown id: tags:
+
+* zip() can be used to iterate over multiple lists simultaneously and returns an iterator of tuples: (first elements of all lists), (second elements of all lists), ...
+
+%% Cell type:code id: tags:
+
+``` python
+first_names = ['Roger', 'Rafael', 'Novak']
+last_names =['Federer', 'Nadal', 'Djokovic']
+for first, last in zip(first_names, last_names):
+    print(first, last)
+```
+
+%% Cell type:markdown id: tags:
+
+* the number of lists can be greater than two
+
+%% Cell type:markdown id: tags:
+
+* If lengths are not equal, zip() stops at the shortest list; the remaining elements of the longer ones are ignored
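
%% Cell type:markdown id: tags:

Here is a sketch with three lists of unequal length. The extra name and the country codes are only example data. If the longer lists should not be truncated, `itertools.zip_longest()` fills the gaps with a `fillvalue` instead.

%% Cell type:code id: tags:

```python
from itertools import zip_longest

first_names = ['Roger', 'Rafael', 'Novak']
last_names = ['Federer', 'Nadal', 'Djokovic', 'Murray']  # one element more
countries = ['SUI', 'ESP', 'SRB']

# zip() accepts any number of iterables and stops at the shortest one,
# so 'Murray' is silently dropped here
for first, last, country in zip(first_names, last_names, countries):
    print(first, last, country)

# zip_longest() runs until the longest iterable is exhausted and fills the gaps
print(list(zip_longest(first_names, last_names, fillvalue='?')))
```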
+
+%% Cell type:markdown id: tags:
+
+### List comprehensions
+
+%% Cell type:markdown id: tags:
+
+* Python is famous for its list comprehension construct
+
+%% Cell type:code id: tags:
+
+``` python
+[i**2 for i in range(10)]
+```
+
+%% Cell type:markdown id: tags:
+
+* Takes an iterable, does something to every element and returns a list of the results
+
+%% Cell type:markdown id: tags:
+
+* The basic syntax is `[expression for variable in iterable]`
+
+%% Cell type:markdown id: tags:
+
+There is an extended syntax including a filter
+
+%% Cell type:code id: tags:
+
+``` python
+[i for i in range(10) if i%2]
+```
+
+%% Cell type:markdown id: tags:
+
+* `[expression for variable in iterable if condition]`
+
+%% Cell type:markdown id: tags:
+
+* List comprehensions can of course also be written as a multiline for loop
+
+%% Cell type:markdown id: tags:
+
+* List comprehensions can also be nested
+
+%% Cell type:code id: tags:
+
+``` python
+[(i, j) for i in range(2) for j in range(3)]
+```
+
+%% Cell type:markdown id: tags:
+
+* This is equivalent to a nested for loop with `i` varying in the outer loop and `j` in the inner
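
%% Cell type:markdown id: tags:

For comparison, here are the filtered and the nested comprehension from above written as explicit for loops:

%% Cell type:code id: tags:

```python
# [i for i in range(10) if i%2] written as a loop
odds = []
for i in range(10):
    if i % 2:
        odds.append(i)

# [(i, j) for i in range(2) for j in range(3)] written as nested loops
pairs = []
for i in range(2):          # outer loop: first 'for' clause
    for j in range(3):      # inner loop: second 'for' clause
        pairs.append((i, j))

print(odds)   # [1, 3, 5, 7, 9]
print(pairs)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```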
+
+%% Cell type:markdown id: tags:
+
+Set comprehension
+
+%% Cell type:markdown id: tags:
+
+* With the same syntax but curly braces (not parentheses, which create a generator!), a set is created and returned
+
+%% Cell type:code id: tags:
+
+``` python
+{-n for n in [1,1,2,3,4,4,4,5]}
+```
+
+%% Cell type:markdown id: tags:
+
+* Note that duplicates are eliminated
+
+%% Cell type:markdown id: tags:
+
+Dict comprehension
+
+%% Cell type:markdown id: tags:
+
+* With almost the same syntax (plus a colon separating key and value), a dictionary is returned
+
+%% Cell type:code id: tags:
+
+``` python
+{n:n**2 for n in range(6)}
+```
+
+%% Cell type:markdown id: tags:
+
+### Generators
+
+%% Cell type:markdown id: tags:
+
+* In the above examples, the list, set or dict is fully created. These structures are constrained by available memory.
+* When instantiating a list, for example with a list comprehension, we are building a collection of values.
+* When instantiating a generator, we are building a *recipe* to create a collection of values
+* Both expose the same iterator interface, but with a generator, the collection is not precomputed. Elements are generated on demand.
+* Lists can be used multiple times
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3]
+print(a)
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
+* Generators are single-use (once-used values are gone)
+
+%% Cell type:code id: tags:
+
+``` python
+g = (n for n in range(1, 4))
+print(list(g))
+print(list(g))
+```
+
+%% Cell type:markdown id: tags:
+
+* Generators cannot be sliced (or concatenated or multiplied or ...)
+
+%% Cell type:code id: tags:
+
+``` python
+a_list = [1,2,3]
+a_gen = (n for n in [1,2,3])
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_list[2]
+```
+
+%% Cell type:code id: tags:
+
+``` python
+a_gen[2]
+```
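
%% Cell type:markdown id: tags:

If you really need the element at a certain position, `itertools.islice()` can help. It takes start, stop (and optionally step) arguments like a slice and returns a new iterator. Note that it consumes the generator up to that position. A short sketch:

%% Cell type:code id: tags:

```python
from itertools import islice

a_gen = (n for n in [1, 2, 3])

# islice(iterable, start, stop) yields the elements at positions start..stop-1
third = next(islice(a_gen, 2, 3))
print(third)  # 3
```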
+
+%% Cell type:markdown id: tags:
+
+Building generators with *generator expressions*
+
+%% Cell type:code id: tags:
+
+``` python
+gen = (n**2 for n in [1,2,3])
+gen
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(gen)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(gen)
+```
+
+%% Cell type:markdown id: tags:
+
+Building generators with the *yield* statement
+
+%% Cell type:markdown id: tags:
+
+* For more complex operations, it is better to write a generator function with a for loop than to construct an overly complex generator expression (or list comprehension)
+* We use the *yield* statement to return a value and pause the execution until the next value is requested
+
+%% Cell type:code id: tags:
+
+``` python
+def gen():
+    for n in [1,2,3]:
+        yield n ** 2
+```
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+# 1st use of next():
+# run the code up to the yield statement, return 1st value and pause
+next(g)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# continue after the yield statement and execute code
+# until the next yield statement is encountered
+next(g)
+```
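
%% Cell type:markdown id: tags:

Values are only computed on request, so a generator can even be infinite. Here is a sketch of a hypothetical `fibonacci()` generator function, similar to the Fibonacci loop at the beginning. It is combined with `itertools.islice()` to take only the first 10 values.

%% Cell type:code id: tags:

```python
from itertools import islice

def fibonacci():
    """Hypothetical example: an infinite generator of Fibonacci numbers."""
    a, b = 1, 1
    while True:          # never ends on its own ...
        yield a
        a, b = b, a + b

# ... but only the requested values are ever computed
print(list(islice(fibonacci(), 10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```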
+
+%% Cell type:markdown id: tags:
+
+### Exceptions
+
+%% Cell type:markdown id: tags:
+
+Runtime errors throw exceptions of different types
+
+%% Cell type:code id: tags:
+
+``` python
+print(2/0)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+l = [1,2,3]
+print(l[42])
+```
+
+%% Cell type:code id: tags:
+
+``` python
+1 + 'a'
+```
+
+%% Cell type:markdown id: tags:
+
+* A try/except block with a bare `except:` catches all types of exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("Let's try something.")
+    #x = 1 / 0 # ZeroDivisionError
+except:
+    print("Something bad happened!")
+```
+
+%% Cell type:markdown id: tags:
+
+* It is better to catch only anticipated exceptions and let everything unforeseen still raise an error, so we don't mask problems in our code.
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("Let's try something.")
+    #x = 1 / 0 # ZeroDivisionError
+except ZeroDivisionError:
+    print("Something bad happened!")
+```
+
+%% Cell type:markdown id: tags:
+
+Access error message
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    x = 1 / 0
+except ZeroDivisionError as err:
+    print("Error class is:  ", type(err))
+    print("Error message is:", err)
+```
+
+%% Cell type:markdown id: tags:
+
+Catch multiple exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    int("not a number") # do something that may fail (raises ValueError here)
+except ValueError:
+    print("handle a ValueError") # do something for this kind of exception
+except TypeError:
+    print("handle a TypeError") # do something else for another kind of exception
+```
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    int(None) # raises TypeError here
+except (ValueError, TypeError) as e: # parentheses are mandatory, 'as e' is optional
+    print("handled:", type(e).__name__) # do the same thing for all listed exceptions
+```
+
+%% Cell type:markdown id: tags:
+
+Raise or re-raise an exception
+
+%% Cell type:code id: tags:
+
+``` python
+raise RuntimeError("my error message")
+```
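
%% Cell type:markdown id: tags:

Inside an `except` block, a bare `raise` re-raises the exception currently being handled. `raise ... from err` raises a different exception and records the original one as its `__cause__`. Here is a sketch with two hypothetical helper functions:

%% Cell type:code id: tags:

```python
def parse_age(text):
    try:
        return int(text)
    except ValueError:
        print("could not parse", repr(text))
        raise  # re-raise the ValueError that is currently being handled

def parse_age_strict(text):
    try:
        return int(text)
    except ValueError as err:
        # raise a different exception type, keeping the original as its cause
        raise RuntimeError(f"invalid age: {text!r}") from err

try:
    parse_age_strict("forty-two")
except RuntimeError as err:
    print(err)                  # invalid age: 'forty-two'
    print(repr(err.__cause__))  # the original ValueError
```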
+
+%% Cell type:markdown id: tags:
+
+Complete block
+
+%% Cell type:code id: tags:
+
+``` python
+try:
+    print("try something here (always executed)")
+except:
+    print("this happens only if it fails")
+else:
+    print("this happens only if it succeeds")
+finally:
+    print("this happens no matter what (used for cleanup)")
+```
+
+%% Cell type:markdown id: tags:
+
+Suppress exceptions
+
+%% Cell type:code id: tags:
+
+``` python
+from contextlib import suppress
+
+with suppress(DeprecationWarning, FutureWarning):
+    raise FutureWarning("this warning is suppressed")
+```
+
+%% Cell type:markdown id: tags:
+
+### Modules
+
+%% Cell type:markdown id: tags:
+
+* Additional functionality can be used by loading modules
+
+%% Cell type:code id: tags:
+
+``` python
+import math # Explicit import of whole module
+math.cos(math.pi)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy as np # Explicit import by alias
+np.random.randint(10)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+from os import getuid, getgid # Explicit import of module parts (Unix only)
+getuid(), getgid()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+from pathlib import *  # Wildcard import of all public names, use with care
+data = Path("data")
+```
+
+%% Cell type:markdown id: tags:
+
+Sometimes, you see a statement like this at the end of a Python script
+
+%% Cell type:code id: tags:
+
+``` python
+def main():
+    print("doing the actual work")
+
+if __name__ == '__main__':
+    # script was executed standalone, not imported
+    main()
+```
+
+%% Cell type:markdown id: tags:
+
+* This is used to make a standalone executable script which is also importable as a module
+* The code is executed only when being called standalone, and not when imported as a module
+
+%% Cell type:markdown id: tags:
+
+### Pitfalls in Python
+
+%% Cell type:markdown id: tags:
+
+* Although Python tries to follow the [Principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment), there are some pitfalls one must be aware of
+* Let's discuss them, in (my subjective) order of importance
+* Don't mix spaces and tabs
+* Know what version of Python you use, Python 2 and 3 are different in some important points
+* Mutable default argument
+
+%% Cell type:code id: tags:
+
+``` python
+def foo(a=[]):
+    a.append(5)
+    return a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:markdown id: tags:
+
+* What?!
+
+%% Cell type:markdown id: tags:
+
+Why?
+
+%% Cell type:markdown id: tags:
+
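+* Functions in Python are first-class objects
+* Default argument values are evaluated only once, when the `def` statement is executed, not on every call
+* They are stored on the function object (in its `__defaults__` attribute), so a mutable default is shared between calls and its state may change from one call to the other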
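
%% Cell type:markdown id: tags:

You can inspect the default values through the function's `__defaults__` attribute, which makes the effect visible. Here is a sketch using a hypothetical copy of `foo()` called `append_five()`:

%% Cell type:code id: tags:

```python
def append_five(a=[]):   # hypothetical copy of foo() with the mutable default
    a.append(5)
    return a

print(append_five.__defaults__)  # ([],)  the default list, created once at definition
append_five()
append_five()
print(append_five.__defaults__)  # ([5, 5],)  the same list, modified by both calls
```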
+
+%% Cell type:markdown id: tags:
+
+What to do instead?
+
+%% Cell type:markdown id: tags:
+
+* Never use mutable default arguments
+
+%% Cell type:code id: tags:
+
+``` python
+def foo(a=None):
+    if a is None:
+        a = []
+    a.append(5)
+    return a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+foo()
+```
+
+%% Cell type:markdown id: tags:
+
+* Assignment is by reference, never by copy. With immutable objects this does no harm, because they cannot be changed in place.
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+b = a
+b.append(5)
+print(a)
+```
+
+%% Cell type:markdown id: tags:
+
+Proper check for None:
+
+%% Cell type:code id: tags:
+
+``` python
+# correct (compares object's identity)
+if a is not None:
+    pass
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# wrong (tests truthiness via bool(a), which is also False for 0, '', [] etc.)
+if a:
+    pass
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# from https://stackoverflow.com/a/14247383/2315949
+class Negator(object):
+    def __eq__(self,other):
+        return not other
+thing = Negator()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+thing == None
+```
+
+%% Cell type:code id: tags:
+
+``` python
+thing is None
+```
+
+%% Cell type:markdown id: tags:
+
+* Be careful when using "==" to check against True or False
+
+%% Cell type:code id: tags:
+
+``` python
+var = 1  # try other values: True, 1.0, 0, "", [], None
+
+if var == True:   # executes if var is True or 1 or 1.0 (since True == 1)
+    pass
+if var != True:   # executes if var is neither True nor equal to 1
+    pass
+if var == False:  # executes if var is False or 0 (or 0.0, 0j)
+    pass
+if var == None:   # usually only if var is None, but __eq__ can be overridden (see Negator above)
+    pass
+if var:           # executes if var is a non-empty string/list/dictionary/tuple, non-0, etc.
+    pass
+if not var:       # executes if var is "", {}, [], (), 0, None, etc.
+    pass
+if var is True:   # only executes if var is boolean True, not 1
+    pass
+if var is False:  # only executes if var is boolean False, not 0
+    pass
+if var is None:   # the reliable check for None
+    pass
+```
+
+%% Cell type:markdown id: tags:
+
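+* Since Python 3.7, dicts preserve insertion order. In older versions they are unordered, even if they often come along in the order they were filled.
+* Use *OrderedDict* from the *collections* module if you need an ordered dict in older versions, or its extra features such as `move_to_end()`
+* Sets are initialised with `{1, 2, 3}` or `set(iterable)`, but an empty set must be created with `set()`, because `{}` initialises an empty dict.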
+
+%% Cell type:code id: tags:
+
+``` python
+set([1,2,3])
+```
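
%% Cell type:markdown id: tags:

Here is a short check of the brace pitfall and of dict ordering. Insertion order is guaranteed since Python 3.7.

%% Cell type:code id: tags:

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict), type(empty_set))  # <class 'dict'> <class 'set'>

# Non-empty sets can also be written with braces
print(type({1, 2, 3}))                     # <class 'set'>

# Dicts remember the order in which keys were inserted
d = {}
d['b'] = 1
d['a'] = 2
print(list(d))                             # ['b', 'a']
```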
+
+%% Cell type:markdown id: tags:
+
+* `list.sort()` sorts in place and returns None, `sorted()` returns a new sorted list
+* ++n and --n do not work as people with a C or Java background would expect
+
+%% Cell type:code id: tags:
+
+``` python
+n = 1
+++n # unary plus applied twice, +(+n), which is simply n (n is not incremented)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+--n # unary minus applied twice, -(-n), which is simply n (use n += 1 to increment)
+```
+
+%% Cell type:markdown id: tags:
+
+Multiprocessing
+
+%% Cell type:markdown id: tags:
+
+* In CPython, the GIL (Global Interpreter Lock) ensures that only one thread can execute Python bytecode at any one time
+* So when running pure Python code, threads do not give you parallel execution most of the time. In other words, threads in Python are not like threads in Java or C++.
+* There are many instances in which things do run in parallel, like when using libraries that are essentially C extensions (numpy for example)
+* Use the *multiprocessing* module if you want to parallelize your own code
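
%% Cell type:markdown id: tags:

Here is a minimal sketch with `multiprocessing.Pool`. It assumes the work can be split into independent function calls, and the helper `parallel_factorials` is hypothetical. The built-in `math.factorial` serves as the worker function because it can always be sent to worker processes. Functions defined in a notebook may fail to do so on platforms that start workers with the 'spawn' method (Windows, macOS). Such code is therefore usually better placed in a script or module.

%% Cell type:code id: tags:

```python
import math
from multiprocessing import Pool

def parallel_factorials(numbers, processes=2):
    # Pool.map() splits the inputs among the worker processes and
    # returns the results in the same order as the inputs
    with Pool(processes=processes) as pool:
        return pool.map(math.factorial, numbers)

# The guard keeps worker processes that re-import this file
# (with the 'spawn' start method) from starting new pools themselves
if __name__ == "__main__":
    print(parallel_factorials(range(10)))
```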
+
+%% Cell type:markdown id: tags:
+
+### Pythonic code (just a few examples)
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+l = ['a','b','c']
+for i in range(len(l)):
+    print(l[i], end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead:
+for elem in l:
+    print(elem, end=' ')
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Or if you do need the counter:
+for i, elem in enumerate(l):
+    print("{}:{} ".format(i, elem), end=' ')
+```
+
+%% Cell type:markdown id: tags:
+
+*It's easier to ask for forgiveness than permission* (EAFP) instead of *Look before you leap* (LBYL)
+
+%% Cell type:code id: tags:
+
+``` python
+# from https://stackoverflow.com/a/11360880/2315949
+my_dict = {'other_key': 0}
+
+# Don't (LBYL):
+if 'key' in my_dict:
+    x = my_dict['key']
+else:
+    x = None # handle missing key
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead (EAFP):
+try:
+    x = my_dict['key']
+except KeyError:
+    x = None # handle missing key
+```
+
+%% Cell type:markdown id: tags:
+
+Duck typing
+
+%% Cell type:markdown id: tags:
+
+* Don't check if it is a duck, it's enough if it walks like a duck and quacks like a duck
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+def foo(name):
+    if isinstance(name, str):
+        print(name.lower())
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead. It is enough if the object has a string representation (many objects do)
+def foo(name):
+    print(str(name).lower())
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Don't
+def bar(listing):
+    if isinstance(listing, list):
+        listing.extend((1, 2, 3)) # also modifies the caller's list
+        return ", ".join(str(x) for x in listing)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# Do this instead. It is enough if the object is iterable (many objects are)
+def bar(listing):
+    l = list(listing)
+    l.extend((1, 2, 3))
+    return ", ".join(str(x) for x in l)
+```
+
+%% Cell type:markdown id: tags:
+
+Don't write long and unreadable one-liners just because you can
+
+%% Cell type:code id: tags:
+
+``` python
+l = [m for a, b in zip(['a', 'b', 'c'], [1,2,3]) if b.method(a) != b for m in b if not m.method(a, b) and reduce(lambda x, y: a + y.method(), (m, a, b))]
+```
+
+%% Cell type:markdown id: tags:
+
+Python with its handy list comprehensions is somewhat more prone to this than other languages
+
+%% Cell type:markdown id: tags:
+
+And again
+
+%% Cell type:code id: tags:
+
+``` python
+import this
+```
+
+%% Cell type:markdown id: tags:
+
+And also
+
+%% Cell type:markdown id: tags:
+
+[PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
+
+(PEP stands for Python Enhancement Proposal)
+
+%% Cell type:markdown id: tags:
+
+### Some things left out
+
+%% Cell type:markdown id: tags:
+
+* [Object-oriented programming](https://www.programiz.com/python-programming/object-oriented-programming)
+
+%% Cell type:markdown id: tags:
+
+* [@Properties](https://www.programiz.com/python-programming/property)
+
+%% Cell type:markdown id: tags:
+
+* [Decorators](https://www.programiz.com/python-programming/decorator)
+
+%% Cell type:markdown id: tags:
+
+* [Closures](https://www.programiz.com/python-programming/closure)
+
+%% Cell type:markdown id: tags:
+
+* [Shallow and deep copying](https://www.programiz.com/python-programming/shallow-deep-copy)
--- a/notebooks/00 Python Tutorial/00b Python Exercises.ipynb
+++ b/notebooks/00 Python Tutorial/00b Python Exercises.ipynb
+%% Cell type:markdown id: tags:
+
+# Python Exercises
+
+%% Cell type:markdown id: tags:
+
+Let's start with the obligatory Hello World. Print the string "Hello World!"
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+print("Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Now make a function called hello that prints the same string and execute it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello():
+    print("Hello World!")
+hello()
+```
+
+%% Cell type:markdown id: tags:
+
+What does that function return?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+return_value = hello()
+print(return_value)
+```
+
+%% Cell type:markdown id: tags:
+
+Every function returns a value. If none is returned explicitly, the function returns `None`.
+Instead of printing directly, make your function return the string, then print the return value.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello():
+    return "Hello World!"
+print(hello())
+```
+
+%% Cell type:markdown id: tags:
+
+Now make the function take an argument and have it print it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello(output):
+    print(output)
+hello("Hello World!") # or hello(output="Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Change the function to take "Hello World!" as the default for its argument and call it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello(output="Hello World!"):
+    print(output)
+hello()
+```
+
+%% Cell type:markdown id: tags:
+
+Now print "Hello World!" 5 times with a for loop
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+# the _ variable is used by convention as a placeholder for a throwaway variable
+# we can use it here since we don't need the numbers 0 to 4 that we loop over
+for _ in range(5):
+    print("Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Next, loop over the following list and, in each iteration, print the iteration number (starting with 1) followed by the current element of the list.
+
+%% Cell type:code id: tags:
+
+``` python
+data = ['print', 'this', 'element', 'by', 'element']
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+for i, elem in enumerate(data):
+    print(i+1, elem)
+    # more advanced: use string formatting, like so: print("{} {}".format(i+1, elem))
+```
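A small variation on the solution, offered as a sketch: `enumerate()` accepts a `start` argument, which removes the need for `i+1`, and an f-string (Python 3.6+) is a more compact alternative to `str.format()`. The list is redefined here so the example runs on its own.

```python
data = ['print', 'this', 'element', 'by', 'element']

lines = []
for i, elem in enumerate(data, start=1):  # counting starts at 1 instead of 0
    line = f"{i} {elem}"                   # same text as "{} {}".format(i, elem)
    lines.append(line)
    print(line)
```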
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Now let's look at lists. Define a list called even_ints that contains all even integers from 2 up to and including 100.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints = list(range(2,102,2)) # note that range returns a lazy range object (not a list), which we turn into a list with list()
+```
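To see that `range()` returns a lazy sequence rather than a list or an iterator, you can inspect the object directly. This sketch only uses built-ins:

```python
r = range(2, 102, 2)

print(type(r).__name__)     # range
print(len(r), r[0], r[-1])  # 50 2 100: supports len() and indexing like a sequence
print(100 in r)             # True: membership test without building a list

# A range is iterable but not itself an iterator, so next(r) raises TypeError.
it = iter(r)
print(next(it), next(it))   # 2 4

even_ints = list(r)         # materialise all 50 values
print(even_ints[:3])        # [2, 4, 6]
```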
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print the first 5 elements
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[0:5]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice it to print the elements at index 5 to 15 (not including 15)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[5:15]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print the last 5 elements
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[-5:]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print all multiples of 4
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[1::2]
+```
+
+%% Cell type:markdown id: tags:
+
+Reverse the list.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[::-1]
+```
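All the slices above follow the pattern `seq[start:stop:step]`: `start` is included, `stop` is excluded, and omitted values default to the beginning, the end and a step of 1 (with a negative step, the defaults for `start` and `stop` flip to the end and the beginning). A compact check on the same list:

```python
even_ints = list(range(2, 102, 2))

first_five = even_ints[:5]   # same as even_ints[0:5]
last_five = even_ints[-5:]   # negative indices count from the end
fours = even_ints[1::2]      # start at index 1 (value 4), take every 2nd element
backwards = even_ints[::-1]  # a negative step walks from the end to the start

print(first_five)                # [2, 4, 6, 8, 10]
print(last_five)                 # [92, 94, 96, 98, 100]
print(fours[:3], backwards[:3])  # [4, 8, 12] [100, 98, 96]
print(backwards is even_ints)    # False: slicing a list creates a new list
```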
+
+%% Cell type:markdown id: tags:
+
+Reverse the string "Hello World!"
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+"Hello World!"[::-1] # strings can be sliced just like lists because both are sequences
+```
+
+%% Cell type:markdown id: tags:
+
+Append a new element 5 to the list a
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a.append(5)
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Extend the list a with a new list `[6,7,8]`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a.extend([6,7,8])
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Add the list `[9,10]` to the end of a using the `+` operator
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = a + [9,10]
+a
+```
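One difference between the last two approaches is worth knowing: `a + [9,10]` builds a new list and rebinds the name `a`, whereas `extend()` (and `+=` on a list) modifies the existing list in place. The difference becomes visible when a second name refers to the same list, as in this sketch:

```python
a = [1, 2, 3]
alias = a            # both names refer to the same list object

a = a + [4]          # creates a new list; alias still refers to the old one
print(alias)         # [1, 2, 3]

b = [1, 2, 3]
alias_b = b
b.extend([4])        # modifies the shared list in place
b += [5]             # += on a list also extends in place
print(alias_b)       # [1, 2, 3, 4, 5]
print(alias_b is b)  # True
```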
+
+%% Cell type:markdown id: tags:
+
+Check whether the elements 5 and 11 are in the list (use two separate checks)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+5 in a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+11 in a
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension to multiply each element of a by 2
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[elem*2 for elem in a]
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension, the `str()` function and the modulo operator `%` to create a list of strings of the odd numbers in a.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[str(i) for i in a if i%2==1]
+```
+
+%% Cell type:markdown id: tags:
+
+Let's quickly look at each of the other built-in container datatypes. Create a tuple named a with the elements 1,2,3
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = (1,2,3)
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Try to change an element of a. This shows that tuples, although otherwise similar to lists, are immutable.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a[0] = 0  # raises TypeError: 'tuple' object does not support item assignment
+```
+
+%% Cell type:markdown id: tags:
+
+Now create a set by initialising it from the tuple `(1,1,2,2,3,4,5)`.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = set((1,1,2,2,3,4,5))
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Note that a set is displayed with curly braces (which is a bit of an inconsistency, since curly braces are also used for dictionaries) and that the duplicate elements were removed.
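Because curly braces are shared with dictionaries, `{}` creates an empty dictionary, not an empty set; an empty set has to be written as `set()`. A quick check:

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Non-empty braces without key: value pairs do create a set
s = {1, 1, 2, 2, 3}
print(s)                            # {1, 2, 3}
```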
+
+%% Cell type:markdown id: tags:
+
+Finally, create a dictionary with the keys `'a'`, `2` and `16` and the corresponding values `1`, `'2'` and `[3,4,5]`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d = {'a':1, 2:'2', 16:[3,4,5]}
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Get the value belonging to the key `2`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d[2]
+```
+
+%% Cell type:markdown id: tags:
+
+Delete the key 16 using the `del` keyword.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+del d[16]
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Add a new element with key `42` and value `42`.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d[42] = 42
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension to iterate over the keys of d to create a list of strings of the values of d (using the `str()` function again)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[str(d[i]) for i in d.keys()]
+```
+
+%% Cell type:markdown id: tags:
+
+Swap the keys and values in d.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+{v:k for k, v in d.items()}
+
+# This is the same as the following code:
+#d_reversed = {}
+#for k, v in d.items():
+#    d_reversed[v] = k
+#d_reversed
+```
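Swapping keys and values only works cleanly when the values are hashable and unique. This sketch uses made-up dictionaries to show both pitfalls: duplicate values collapse into a single key (the last one wins), and unhashable values such as lists (like the `[3,4,5]` that was in `d` before key 16 was deleted) raise a `TypeError`:

```python
d2 = {'x': 1, 'y': 1, 'z': 2}
swapped = {v: k for k, v in d2.items()}
print(swapped)  # {1: 'y', 2: 'z'}: the entry for 'x' was overwritten by 'y'

d3 = {'a': [3, 4, 5]}
try:
    swapped3 = {v: k for k, v in d3.items()}
except TypeError as err:
    swapped3 = None
    print("cannot swap:", err)  # lists are unhashable and cannot be dict keys
```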
+
+%% Cell type:markdown id: tags:
+
+Import the permutations function from itertools
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+from itertools import permutations
+```
+
+%% Cell type:markdown id: tags:
+
+Now from within Python, access the help of the permutations function
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+help(permutations) # or permutations? (the latter only works in IPython/Jupyter)
+```
+
+%% Cell type:markdown id: tags:
+
+Where does that help text come from? Everything in Python is an object, and most functions and classes have a `__doc__` attribute that stores their docstring. Print it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+print(permutations.__doc__)
+```
+
+%% Cell type:markdown id: tags:
+
+You can see that `help()` prints additional information about the `itertools` module and the `permutations` class, but that the docstring is included.
+
+%% Cell type:markdown id: tags:
+
+Now that you know how `permutations` works, list all permutations of the numbers 1, 2, 3, 4.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+list(permutations([1,2,3,4]))
+```
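`permutations()` also takes an optional second argument `r`, the length of each permutation (by default, the full length of the input). The number of results grows factorially, which `math.factorial` and `math.perm` (the latter requires Python 3.8+) can confirm:

```python
from itertools import permutations
from math import factorial, perm

all_perms = list(permutations([1, 2, 3, 4]))
print(len(all_perms), factorial(4))       # 24 24

pairs = list(permutations([1, 2, 3], 2))  # ordered pairs of distinct elements
print(pairs)                # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(len(pairs), perm(3, 2))             # 6 6
```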
+
+%% Cell type:markdown id: tags:
+
+Use `filter()` with a lambda function to select only the multiples of 3 (including 0) from the following range `a`.
+
+%% Cell type:code id: tags:
+
+``` python
+a = range(20)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+list(filter(lambda x: x%3==0, a))
+```
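For comparison, the same selection can be written as a list comprehension, which many Python programmers find more readable than `filter()` combined with a `lambda`:

```python
a = range(20)
with_filter = list(filter(lambda x: x % 3 == 0, a))
with_comprehension = [x for x in a if x % 3 == 0]
print(with_filter)                        # [0, 3, 6, 9, 12, 15, 18]
print(with_filter == with_comprehension)  # True
```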
+
+%% Cell type:markdown id: tags:
+
+Unpacking is most commonly used in for loops: `for a,b,c in iterator:` unpacks the tuples that iterator yields into the three separate variables a,b,c.
+
+Write a function called divide that takes two integer arguments called dividend and divisor, divides them and returns two values: the quotient and the remainder. Omit any checks of values.
+Call the function and assign the return values to a and b. Here you'll make use of unpacking again.
+
+%% Cell type:code id: tags:
+
+``` python
+from math import floor
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def divide(dividend, divisor):
+    return floor(dividend/divisor), dividend%divisor
+a, b = divide(5,2)
+a,b
+```
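Python also has a floor-division operator `//` and a built-in `divmod()` that returns the quotient and remainder as a tuple, so `divide` could equally be written without `math.floor`. Integer floor division also avoids the detour through a float, which can lose precision for very large integers. A sketch:

```python
from math import floor

def divide(dividend, divisor):
    # divmod returns (dividend // divisor, dividend % divisor)
    return divmod(dividend, divisor)

a, b = divide(5, 2)
print(a, b)                   # 2 1

big = 10**20 + 1
print(big // 1 == big)        # True: exact integer arithmetic
print(floor(big / 1) == big)  # False: big / 1 is a float and loses the last digit
```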
+
+%% Cell type:markdown id: tags:
+
+Switch the variables a and b. Try to use unpacking and only use a single statement.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+b,a = a,b
+a,b
+```
+
+%% Cell type:markdown id: tags:
+
+You can also unpack an unknown number of values by assigning some to given variables and the remainder to a separate variable.
+
+%% Cell type:code id: tags:
+
+``` python
+one, two, *rest = [1,2,3,4,5]
+one, two, rest
+```
+
+%% Cell type:code id: tags:
+
+``` python
+*rest, four, five = [1,2,3,4,5]
+rest, four, five
+```
+
+%% Cell type:markdown id: tags:
+
+We have seen that zipped = zip(list1, list2, list3, ...) returns an iterator that yields a tuple with all elements with index 0, then a tuple with all elements with index 1 and so on. This is called zipping.
+
+There is no unzip function to do the reverse. The reverse operation is achieved with `zip(*zipped)`: the `*` operator again unpacks. If you can explain why this works, you have come a long way in understanding Python. Try it!
+
+%% Cell type:markdown id: tags:
+
+Think of an explanation.
+
+%% Cell type:markdown id: tags:
+
+Let `l1 = (1,2)` and `l2 = ('a','b')`. Then `zipped = zip(l1, l2)` is an iterator that first yields `(1,'a')` and then `(2,'b')`.
+
+So `l1, l2 = zip(*zipped)` works like this: `*zipped` unpacks the iterator and passes its two items `(1,'a')` and `(2,'b')` as separate arguments to `zip`. `zip` then does its usual work and creates the two tuples `(1,2)` and `('a','b')`. These are then assigned to `l1` and `l2` by unpacking.
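The explanation can be checked directly. Note that `zip()` returns an iterator, so `zipped` can only be consumed once, and that `zip()` always produces tuples:

```python
l1 = (1, 2)
l2 = ('a', 'b')

zipped = zip(l1, l2)          # iterator yielding (1, 'a'), then (2, 'b')
u1, u2 = zip(*zipped)         # same as zip((1, 'a'), (2, 'b'))
print(u1, u2)                 # (1, 2) ('a', 'b')
print(u1 == l1 and u2 == l2)  # True

leftover = list(zipped)
print(leftover)               # []: the iterator was already exhausted
```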
+
+%% Cell type:markdown id: tags:
+
+Back to hello world: Make a generator that yields the characters of "Hello World!" one by one. Do not use the generator yet.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def gen():
+    for n in "Hello World!":
+        yield n
+g = gen()
+```
+
+%% Cell type:markdown id: tags:
+
+Exhaust the generator manually by calling `next()` on it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(g) # run this cell repeatedly until you get the StopIteration exception.
+```
+
+%% Cell type:markdown id: tags:
+
+Exhaust the generator with a for loop.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()  # the generator above was already exhausted, so create a fresh one
+for c in g:
+    print(c, end='')
+```
+
+%% Cell type:markdown id: tags:
+
+Rewrite the for loop above yourself, using a while loop, `next()`, and try/except to catch the StopIteration exception.
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+while True:
+    ...
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+while True:
+    try:
+        c = next(g)
+        print(c, end='')
+    except StopIteration:
+        break
+```
+
+%% Cell type:markdown id: tags:
+
+Write a function `fib` that returns a list of the first n Fibonacci numbers. Reject negative arguments: upon receiving one, raise a ValueError exception.
+
+%% Cell type:code id: tags:
+
+``` python
+def fib(n):
+    #...
+    l = []
+    #...
+    return l
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def fib(n):
+    if n < 0:
+        raise ValueError("n must be non-negative")
+    l = []
+    a, b = 0, 1
+    while len(l) < n:
+        a, b = b, a + b
+        l.append(a)
+    return l
+```
+
+%% Cell type:code id: tags:
+
+``` python
+fib(10)
+```
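Since generators were introduced above, here is an alternative sketch (not part of the exercise; `fib_gen` is a hypothetical name) that yields Fibonacci numbers lazily and uses `itertools.islice` to take the first `n` of them:

```python
from itertools import islice

def fib_gen():
    """Yield the Fibonacci numbers 1, 1, 2, 3, 5, ... indefinitely."""
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

def fib(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    # islice stops the infinite generator after n items
    return list(islice(fib_gen(), n))

print(fib(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The generator keeps only two numbers in memory, so callers that just iterate over the sequence never need to build the full list.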
+
+%% Cell type:markdown id: tags:
+
+### Well done!
+Now continue with the exercise **00c Pandas Exercise**.
+%% Cell type:markdown id: tags:
+
+# Python Exercises
+
+%% Cell type:markdown id: tags:
+
+Let's start with the obligatory Hello World. Print the string "Hello World!"
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+print("Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Now make a function called hello that prints the same string and execute it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello():
+    print("Hello World!")
+hello()
+```
+
+%% Cell type:markdown id: tags:
+
+What does that function return?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+return_value = hello()
+print(return_value)
+```
+
+%% Cell type:markdown id: tags:
+
+Every function returns a value. If none is given explicitely, then `None` is returned.
+Instead of printing directly, make your function return the string, then print the return value.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello():
+    return "Hello World!"
+print(hello())
+```
+
+%% Cell type:markdown id: tags:
+
+Now make the function take an argument and have it print it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello(output):
+    print(output)
+hello("Hello World!") # or hello(output="Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Change the funtion to take "Hello World!" as the default for its argument and call it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def hello(output="Hello World!"):
+    print(output)
+hello()
+```
+
+%% Cell type:markdown id: tags:
+
+Now print "Hello World!" 5 times with a for loop
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+# the _ variable is used by convention as a placeholder for a throwaway variable
+# we can use it here since we don't need the numbers 0 to 4 that we loop over
+for _ in range(5):
+    print("Hello World!")
+```
+
+%% Cell type:markdown id: tags:
+
+Next, define loop over the following list, and in each iteration, print the number of the iteration starting with 1 and the content of the list.
+
+%% Cell type:code id: tags:
+
+``` python
+data = ['print', 'this', 'element', 'by', 'ellement']
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+for i, elem in enumerate(data):
+    print(i+1, elem)
+    # more advanced: use string formating, like so: print("{} {}".format(i+1, elem))
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Now lets look at lists. Define a list called even_ints that contains all even integers starting at 2 up to and including 100.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints = list(range(2,102,2)) # note that range returns an iterator, which we turn into a list with list()
+```
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print the first 5 elements
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[0:5]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice it to print the elements 5 to 15 (not including 15)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[5:15]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print the last 5 elements
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[-5:]
+```
+
+%% Cell type:markdown id: tags:
+
+Slice the list to print all multiples of 4
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[1::2]
+```
+
+%% Cell type:markdown id: tags:
+
+Reverse the list.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+even_ints[::-1]
+```
+
+%% Cell type:markdown id: tags:
+
+Reverse the string "Hello World!"
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+"Hello World!"[::-1] # strings can be treated as lists because they implement the iterator inteface
+```
+
+%% Cell type:markdown id: tags:
+
+Append a new element 5 to the list a
+
+%% Cell type:code id: tags:
+
+``` python
+a = [1,2,3,4]
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a.append(5)
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Extend the list a with a new list `[6,7,8]`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a.extend([6,7,8])
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Add the list `[9,10]` to the end of a using the `+` operator
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = a + [9,10]
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Check if the elements 5 and then 11 are in the list (use two separate checks)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+5 in a
+```
+
+%% Cell type:code id: tags:
+
+``` python
+11 in a
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension to multiply each element of a by 2
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[elem*2 for elem in a]
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension, the `str()` function and the modulo operator `%` to create a list of strings of the odd numbers in a.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[str(i) for i in a if i%2==0]
+```
+
+%% Cell type:markdown id: tags:
+
+Let's quickly look at each of the other built-in container datatypes. Create a tuple named a with the elements 1,2,3
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = (1,2,3)
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Try to change an element of a to show that tuples are immutable and otherwise similar to lists.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a[0] = 0
+```
+
+%% Cell type:markdown id: tags:
+
+Now create a set by initialising it from the tuple `(1,1,2,2,3,4,5)`.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+a = set((1,1,2,2,3,4,5))
+a
+```
+
+%% Cell type:markdown id: tags:
+
+Note that a set is indicated by curly braces (which is in fact a bit of an inconsistency) and that the duplicate elements were removed.
+
+%% Cell type:markdown id: tags:
+
+Finally, create a dictionary with the keys `'a'`, `2` and `16` and the elements `1`, `'2'` and `[3,4,5]`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d = {'a':1, 2:'2', 16:[3,4,5]}
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Get the value belonging to the key `2`
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d[2]
+```
+
+%% Cell type:markdown id: tags:
+
+Delete the key 16 using the `del` keyword.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+del d[16]
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Add a new element with key `42` and value `42`.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+d[42] = 42
+d
+```
+
+%% Cell type:markdown id: tags:
+
+Use a list comprehension to iterate over the keys of d to create a list of strings of the values of d (using the `str()` function again)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+[str(d[i]) for i in d.keys()]
+```
+
+%% Cell type:markdown id: tags:
+
+Swap the keys and values in d.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+{v:k for k, v in d.items()}
+
+# This is the same as the following code:
+#d_reversed = {}
+#for k, v in d.items():
+#    d_reversed[v] = k
+#d_reversed
+```
+
+%% Cell type:markdown id: tags:
+
+Import the permutations function from itertools
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+from itertools import permutations
+```
+
+%% Cell type:markdown id: tags:
+
+Now from within Python, access the help of the permutations function
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+help(permutations) # or permutations? (the latter only works when using ipython)
+```
+
+%% Cell type:markdown id: tags:
+
+Where does that help come from? As everything in Python is an object. And most functions and classes have a __doc__ attribute that stores this docstring. Print it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+print(permutations.__doc__)
+```
+
+%% Cell type:markdown id: tags:
+
+You can see that the help() function prints other imformation about the module itertools and the class permutations, but that the docstring is there.
+
+%% Cell type:markdown id: tags:
+
+Now that you know how poermutations work, list all permutations of the numbers 1,2,3,4.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+list(permutations([1,2,3,4]))
+```
+
+%% Cell type:markdown id: tags:
+
+Use filter() with a a lambda function to only print multiples of 3 (including 0) out of the following list a.
+
+%% Cell type:code id: tags:
+
+``` python
+a = range(20)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+list(filter(lambda x: x%3==0, a))
+```
+
+%% Cell type:markdown id: tags:
+
+Unpacking is most commonly used in for loops: `for a,b,c in iterator:` unpacks the tuples that iterator yields into the three separate variables a,b,c.
+
+Write a function called divide that takes two integer arguments called dividend and divisor, divides them and returns two values: the quotient and the remainder. Omit any checks of values.
+Call the function and assign the return values to a and b. Here you'll make use of unpacking again.
+
+%% Cell type:code id: tags:
+
+``` python
+from math import floor
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def divide(dividend, divisor):
+    return floor(dividend/divisor), dividend%divisor
+a, b = divide(5,2)
+a,b
+```
+
+%% Cell type:markdown id: tags:
+
+Switch the variables a and b. Try to use unpacking and only use a single statement.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to disply the solution
+
+%% Cell type:code id: tags:
+
+``` python
+b,a = a,b
+a,b
+```
+
+%% Cell type:markdown id: tags:
+
+You can also unpack an unknown number of values by assigning some to given variables and the remainder to a separate variable.
+
+%% Cell type:code id: tags:
+
+``` python
+one, two, *rest = [1,2,3,4,5]
+one, two, rest
+```
+
+%% Cell type:code id: tags:
+
+``` python
+*rest, four, five = [1,2,3,4,5]
+rest, four, five
+```
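%% Cell type:markdown id: tags:

The starred name can also sit in the middle, and it always receives a list, even when nothing is left over for it. The variable names below are only illustrative:

%% Cell type:code id: tags:

```python
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)  # 1 [2, 3, 4] 5

# the starred name gets an empty list if there is nothing left
x, *others = (1,)
print(x, others)  # 1 []
```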
+
+%% Cell type:markdown id: tags:
+
+We have seen that zipped = zip(list1, list2, list3, ...) returns an iterator that yields a tuple with all elements with index 0, then a tuple with all elements with index 1 and so on. This is called zipping.
+
+There is no unzip function to do the reverse. Instead, the reverse operation is achieved with zip(*zipped), where the * operator again unpacks. If you can explain why this works, you have come a long way in understanding Python. Try it!
+
+%% Cell type:markdown id: tags:
+
+Think of an explanation.
+
+%% Cell type:markdown id: tags:
+
+Let l1= (1,2) and l2 = ('a', 'b'). Then zipped = zip(l1, l2) is an iterator that first yields (1,'a') and then (2,'b').
+
+So l1, l2 = zip(*zipped) works like this: *zipped passes the two tuples (1,'a') and (2,'b') as separate arguments to the zip function. zip then pairs up their elements and creates the two tuples (1,2) and ('a','b'). Through unpacking, these are assigned to l1 and l2.
+
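%% Cell type:markdown id: tags:

The round trip can be checked directly. Note that `zip()` returns an iterator, so `zipped` can be consumed only once, and the unzipped parts come back as tuples, not lists. The names `pairs`, `u1` and `u2` are only illustrative:

%% Cell type:code id: tags:

```python
l1 = (1, 2)
l2 = ('a', 'b')

zipped = zip(l1, l2)   # yields (1, 'a'), then (2, 'b')
pairs = list(zipped)
print(pairs)           # [(1, 'a'), (2, 'b')]

# zip(*pairs) is the same call as zip((1, 'a'), (2, 'b'))
u1, u2 = zip(*pairs)
print(u1, u2)          # (1, 2) ('a', 'b')

print(list(zipped))    # []: the iterator is already exhausted
```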
+%% Cell type:markdown id: tags:
+
+Back to hello world: Make a generator that yields the characters of "Hello World!" one by one. Do not use the generator yet.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def gen():
+    for n in "Hello World!":
+        yield n
+g = gen()
+```
+
+%% Cell type:markdown id: tags:
+
+Deplete the generator manually.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:code id: tags:
+
+``` python
+next(g) # call this a couple of times until you get the StopIteration exception.
+```
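%% Cell type:markdown id: tags:

`next()` also accepts a default value as its second argument: once the generator is exhausted, it returns that default instead of raising `StopIteration`. A small sketch with a separate generator expression over the shorter string "Hi!" (the name `letters` is only illustrative, so `g` is left untouched):

%% Cell type:code id: tags:

```python
# generator expression that yields the characters one by one, like gen()
letters = (c for c in "Hi!")

print(next(letters))        # H
print(next(letters))        # i
print(next(letters))        # !
print(next(letters, None))  # None instead of a StopIteration exception
```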
+
+%% Cell type:markdown id: tags:
+
+Deplete the generator with a for loop.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()  # create a fresh generator; the one above is already exhausted
+for c in g:
+    print(c, end='')
+```
+
+%% Cell type:markdown id: tags:
+
+Rewrite the for loop above yourself, using a while loop, try/except, next() and the StopIteration exception.
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+while True:
+    ...
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+g = gen()
+while True:
+    try:
+        c = next(g)
+        print(c, end='')
+    except StopIteration:
+        break
+```
+
+%% Cell type:markdown id: tags:
+
+Write a function fib(n) that returns a list of the first n Fibonacci numbers. Reject negative arguments: if n is negative, raise a ValueError exception.
+
+%% Cell type:code id: tags:
+
+``` python
+def fib(n):
+    #...
+    l = []
+    #...
+    return l
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+def fib(n):
+    if n < 0:
+        raise ValueError("n must be non-negative")
+    l = []
+    a, b = 0, 1
+    while len(l) < n:
+        a, b = b, a + b
+        l.append(a)
+    return l
+```
+
+%% Cell type:code id: tags:
+
+``` python
+fib(10)
+```
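%% Cell type:markdown id: tags:

Since generators were just covered, here is a generator variant of the same sequence: it yields the numbers lazily instead of building the whole list up front, and `itertools.islice` takes as many as needed. A sketch; the name `fib_gen` is hypothetical:

%% Cell type:code id: tags:

```python
from itertools import islice

def fib_gen():
    """Yield the Fibonacci numbers 1, 1, 2, 3, 5, ... without end."""
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

# take only the first 10 values from the infinite generator
print(list(islice(fib_gen(), 10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```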
+
+%% Cell type:markdown id: tags:
+
+### Well done!
+Now continue with the exercise **00c Pandas Exercises**.
--- a/notebooks/00 Python Tutorial/00c Pandas Exercises.ipynb
+++ b/notebooks/00 Python Tutorial/00c Pandas Exercises.ipynb
+%% Cell type:markdown id: tags:
+
+# Pandas Exercises
+
+%% Cell type:markdown id: tags:
+
+We will be working with a dataset from the [Internet Movie Database](https://www.imdb.com). The dataset comes from [Kaggle](https://www.kaggle.com/PromptCloudHQ/imdb-data/data). Since you need an account there to download it, the data file is provided in ILIAS for your convenience. The copy was slightly altered to make this exercise more interesting.
+
+%% Cell type:markdown id: tags:
+
+Import NumPy and pandas with the usual aliases (np and pd).
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+import numpy as np
+import pandas as pd
+```
+
+%% Cell type:markdown id: tags:
+
+Load the file IMDB-Movie-Data.csv into a DataFrame named df.
+
+%% Cell type:code id: tags:
+
+``` python
+# df = ...
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df = pd.read_csv('IMDB-Movie-Data.csv')
+```
+
+%% Cell type:markdown id: tags:
+
+Show the dimensions (the shape) of the dataframe
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.shape
+```
+
+%% Cell type:markdown id: tags:
+
+Now display the first 10 rows of the DataFrame.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.head(n=10)
+```
+
+%% Cell type:markdown id: tags:
+
+Also check the last 5 entries.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.tail()
+```
+
+%% Cell type:markdown id: tags:
+
+That last line does not contain any data. Remove it.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+# any of the following works (but execute only one of them)
+df = df.iloc[:-1, :] # select all rows except the last by position
+#df = df.loc[:999, :] # select all rows except the last by index
+#df.drop(1000, axis='index', inplace=True) # drop last row inplace
+#df = df.drop(1000) # drop last row and assign resulting dataframe to df
+```
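%% Cell type:markdown id: tags:

If you do not know in advance where empty rows are, `dropna(how='all')` removes every row in which all values are missing. The sketch below uses a small made-up DataFrame (`toy`, `cleaned` are illustrative names); whether it catches the empty row in the IMDB file depends on that row really being empty in every column.

%% Cell type:code id: tags:

```python
import numpy as np
import pandas as pd

# toy data: the last row has no values at all
toy = pd.DataFrame({
    'Title': ['Movie A', 'Movie B', np.nan],
    'Year': [2012, 2016, np.nan],
})

# drop only the rows where every column is NaN
cleaned = toy.dropna(how='all')
print(cleaned)
```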
+
+%% Cell type:markdown id: tags:
+
+List all the columns of the DataFrame
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.columns
+```
+
+%% Cell type:markdown id: tags:
+
+And now list the index
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.index
+```
+
+%% Cell type:markdown id: tags:
+
+List only the Years
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.Year # or df['Year'] or df.loc[:, 'Year']
+```
+
+%% Cell type:markdown id: tags:
+
+What time span does our data cover?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.Year.min(), df.Year.max()
+```
+
+%% Cell type:markdown id: tags:
+
+Now check if the datatypes make sense (list them)
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.dtypes
+```
+
+%% Cell type:markdown id: tags:
+
+In fact, we can drop the Rank column, as it is simply a kind of ID that we don't need.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.drop('Rank', axis='columns', inplace=True)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+df.head()
+```
+
+%% Cell type:markdown id: tags:
+
+Ok. The columns Genre and Actors contain multiple values per cell. Just like with databases, we want our DataFrame to adhere to the [1st Normal Form](https://en.wikipedia.org/wiki/First_normal_form) and thus need to move these columns into their own DataFrames or Series.
+
+%% Cell type:markdown id: tags:
+
+First use the .str accessor and its split() method to create a Series where each element is a list of genres.
+
+%% Cell type:code id: tags:
+
+``` python
+# genres = ...
+# genres.head()
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+genres = df.Genre.str.split(',')
+genres.head()
+```
+
+%% Cell type:markdown id: tags:
+
+You now have a Series of lists. The following code generates a mapping table that has one entry per movie/genre combination. Try to understand the statement by taking it apart, executing the commands one-by-one and reading their documentation.
+
+%% Cell type:code id: tags:
+
+``` python
+genres = genres.apply(pd.Series).stack().reset_index().iloc[:, [0,2]]
+genres.columns = ['movie_id', 'genre']
+genres.head()
+```
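%% Cell type:markdown id: tags:

Since pandas 0.25, `Series.explode()` offers a shorter route to the same movie/genre mapping table: it turns each list element into its own row and repeats the original index. A sketch on two made-up rows (`toy`, `mapping` are illustrative names):

%% Cell type:code id: tags:

```python
import pandas as pd

toy = pd.DataFrame({'Genre': ['Action,Adventure,Sci-Fi', 'Drama']})

# one row per movie/genre pair; the movie's index label is repeated
mapping = toy.Genre.str.split(',').explode().reset_index()
mapping.columns = ['movie_id', 'genre']
print(mapping)
```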
+
+%% Cell type:markdown id: tags:
+
+Now do the same for the actors.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+actors = df.Actors.str.split(',').apply(pd.Series).stack().reset_index().iloc[:, [0,2]]
+actors.columns = ['movie_id', 'actor']
+actors["actor"] = actors.actor.str.strip() # remove surrounding whitespace; otherwise ' Tom Hanks' and 'Tom Hanks' count as different actors
+actors.head()
+```
+
+%% Cell type:markdown id: tags:
+
+We don't need the original columns Genre and Actors in the df DataFrame anymore. Drop them.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.drop(['Genre', 'Actors'], axis=1, inplace=True)
+```
+
+%% Cell type:markdown id: tags:
+
+Ok, so now our three DataFrames are ready for some analysis!
+
+%% Cell type:code id: tags:
+
+``` python
+df.head(n=3)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+actors.head(n=3)
+```
+
+%% Cell type:code id: tags:
+
+``` python
+genres.head(n=3)
+```
+
+%% Cell type:markdown id: tags:
+
+Which are the movies with the longest and shortest runtime?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+# shortest runtime, with boolean indexing (can also be written in one line):
+shortest_runtime = df['Runtime (Minutes)'].min()
+df[df['Runtime (Minutes)'] == shortest_runtime]
+```
+
+%% Cell type:code id: tags:
+
+``` python
+# longest runtime, with sorting:
+df.sort_values(by='Runtime (Minutes)', ascending=False).head(n=1)
+```
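%% Cell type:markdown id: tags:

Two further options: `idxmin()`/`idxmax()` return the index label of the extreme value, and `nsmallest()`/`nlargest()` return the top rows directly (with `keep='all'` they also keep ties). A sketch on made-up data; only the column name matches the IMDB file:

%% Cell type:code id: tags:

```python
import pandas as pd

toy = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'Runtime (Minutes)': [66, 191, 120],
})

# row with the smallest runtime, looked up via its index label
shortest = toy.loc[toy['Runtime (Minutes)'].idxmin()]
# DataFrame with the single row that has the largest runtime
longest = toy.nlargest(1, 'Runtime (Minutes)')
print(shortest.Title, longest.Title.iloc[0])  # A B
```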
+
+%% Cell type:markdown id: tags:
+
+Which is the best movie, measured by the metascore?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.sort_values(by='Metascore', ascending=False).head(n=1)
+```
+
+%% Cell type:markdown id: tags:
+
+Which is the longest description?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+index_of_longest = df.Description.str.len().sort_values(ascending=False).head(n=1)
+df.loc[index_of_longest.index, 'Description'].values
+
+# Alternative solution
+# df.loc[df.Description.str.len().idxmax()].Description
+```
+
+%% Cell type:markdown id: tags:
+
+What is the average metascore?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.Metascore.mean()
+```
+
+%% Cell type:markdown id: tags:
+
+Sort the genres by popularity.
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+genres.genre.value_counts()
+# or manually: genres.groupby('genre').count().sort_values('movie_id', ascending=False)
+```
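%% Cell type:markdown id: tags:

`value_counts()` sorts by count in descending order; with `normalize=True` it returns shares instead of absolute counts. A toy example (`toy_genres` is an illustrative name):

%% Cell type:code id: tags:

```python
import pandas as pd

toy_genres = pd.Series(['Drama', 'Action', 'Drama', 'Comedy', 'Drama', 'Action'])

counts = toy_genres.value_counts()                # Drama 3, Action 2, Comedy 1
shares = toy_genres.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(shares)
```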
+
+%% Cell type:markdown id: tags:
+
+Which director is the most productive by number of movies?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.Director.value_counts().head()
+```
+
+%% Cell type:markdown id: tags:
+
+Which director is the most productive by revenue?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+df.groupby('Director')['Revenue (Millions)'].sum().sort_values(ascending=False).head()
+```
+
+%% Cell type:markdown id: tags:
+
+Which actors have acted most often in the year 2012?
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+Click on the three dots below to display the solution
+
+%% Cell type:code id: tags:
+
+``` python
+joined = actors.join(df, on='movie_id')
+joined.loc[joined.Year==2012, 'actor'].value_counts().head()
+```
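%% Cell type:markdown id: tags:

`actors.join(df, on='movie_id')` matches the `movie_id` column of `actors` against the *index* of `df`. The same left join can be written with `merge()`, which makes the join keys explicit. A sketch on made-up frames (`movies`, `cast`, `toy_joined`, `toy_merged` are illustrative names):

%% Cell type:code id: tags:

```python
import pandas as pd

movies = pd.DataFrame({'Title': ['Movie A', 'Movie B'], 'Year': [2012, 2014]})
cast = pd.DataFrame({'movie_id': [0, 0, 1], 'actor': ['X', 'Y', 'X']})

# join: cast.movie_id is matched against the index of movies (left join)
toy_joined = cast.join(movies, on='movie_id')

# equivalent merge with explicit keys
toy_merged = cast.merge(movies, left_on='movie_id', right_index=True, how='left')

print(toy_joined.loc[toy_joined.Year == 2012, 'actor'].value_counts())
```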
+
+%% Cell type:markdown id: tags:
+
+Make sure you remember the value_counts() method; you'll use it a lot!
+
+%% Cell type:markdown id: tags:
+
+Now, explore the dataset a bit more and think of some questions you can answer.
+
+%% Cell type:code id: tags:
+
+``` python
+# ...
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:code id: tags:
+
+``` python
+```
+
+%% Cell type:markdown id: tags:
+
+### Well done!
+If you understood most of the concepts, you are ready for the Machine Learning course.
+
+%% Cell type:markdown id: tags:
+
+Below is some additional material if you want to keep going:
+* [Pythonchallenge](http://www.pythonchallenge.com/): Learn Python by solving increasingly hard riddles. Fun!
+* [Project Euler](https://projecteuler.net/): Learn Python by solving increasingly hard programming problems.
+* [More Pandas Exercises](http://pandas.pydata.org/pandas-docs/stable/tutorials.html) (see the Cookbook exercises)
+* [More Python Exercises](https://github.com/jerry-git/learn-python3) (these start out easy but cover a lot of the language)
--- a/notebooks/00 Python Tutorial/IMDB-Movie-Data.csv
+++ b/notebooks/00 Python Tutorial/IMDB-Movie-Data.csv
--- a/requirements.txt
+++ b/requirements.txt
-ipywidgets
-numpy
-pandas 
-pandas-profiling 
-scikit-learn 
-matplotlib 
-seaborn
-mlxtend 
-nltk 
-scikit-image 
-tqdm
-plotly 
-scipy
-tensorflow
-wordcloud
+ipywidgets == 8.0.2
+numpy == 1.23.3
+pandas == 1.4.4 
+pandas-profiling == 3.3.0
+scikit-learn == 1.1.2
+matplotlib == 3.5.3
+seaborn == 0.11.2
+mlxtend == 0.21.0
+nltk == 3.7 
+scikit-image == 0.19.3 
+tqdm == 4.64.0
+plotly == 5.10.0
+scipy == 1.9.1
+tensorflow == 2.10.0
+wordcloud == 1.8.2.2
+tensorflow_hub == 0.12.0
+yfinance == 0.1.74
+fasttext == 0.9.2
+SoMaJo == 2.2.2