import React from 'react';
import Navigation from './Navigation';
import ae_image from './autoencoder_schema.jpeg';
import img1 from './importmnistdata.png';
import img2 from './Encoder.png';
import img3 from './decoder.png';
import img4 from './Autoencoder.png';
import img5 from './train_ae.png';
import img6 from './latent_space_visualization.png';
import img7 from './latent_space_AE.png';
import img8 from './VAE.png';
import img9 from './vae-fit.png';
import img10 from './vae-visualize-30epochs.png';
import img11 from './vae-visual-210epochs.png';


const Vae = () => {
    return (
        <div class='bg-gray-300 min-h-screen pb-10'>
            <Navigation/>
            <div class='relative top-5 flex justify-center font-serif'>
                <div class='bg-white min-w-screen md:w-2/3 min-h-screen rounded-lg shadow-lg space-y-4 p-5'>
                    <p class='text-left text-2xl'> An Introduction to Variational Autoencoders</p>
                    <p class='text-left text-lg text-gray-700'>
                        Variational Autoencoders are essentially the same as normal Autoencoders with the difference that, instead of mapping input data to points in the model’s latent space,
                        the model maps input data to a distribution over points in the latent space.
                        This quality makes VAEs more suited to generative modelling tasks, which involve sampling from the model’s latent distribution. 
                        In this post I’ll be implementing a VAE from scratch in TensorFlow and training it on the MNIST dataset, to generate images of hand-written digits. 
                    </p>

                    <p class='text-left text-lg text-gray-700'>
                        We'll start by setting up a regular convolutional Autoencoder, and slowly refactor it into a VAE. To jog your memory, the architecture of an Autoencoder 
                        typically resembles the following: 
                    </p>

                    <img class='mx-auto' src={ae_image} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        First let's import the MNIST dataset (as a tf.data.Dataset object) and apply feature normalization.
                    </p>

                    <img src={img1} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        Next, let's define the Encoder and Decoder layers:
                    </p>

                    <img src={img2} alt=''></img>
                    <img src={img3} alt=''></img>
                    
                    <p class='text-left text-lg text-gray-700'>
                        The Encoder layer chains two convolutional layers (with max pooling in between), flattens the output, and then feeds it to a sequence of densely-connected layers,
                        i.e., a multi-layer perceptron (MLP). The Decoder layer takes as input a 'latent' representation of the original data, passes it through an MLP, reshapes the resulting vector,
                        and passes it through a series of upsampling and transposed-convolution operations to produce the reconstructed image. Let's put them together into a Keras model:
                    </p>

                    <img class='scale-2' src={img4} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        Et voilà, we have our convolutional autoencoder! Unfortunately, if our goal is to generate images by sampling vectors from the model's latent space
                        (i.e., from the compressed representation space depicted in the first figure), then this is not good enough. To demonstrate why,
                        let's train the autoencoder using Keras' fit API, and see what kind of images we get by decoding samples from the latent space.</p>

                    <img class='object-scale-down h-72 mx-auto' src={img5} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        Here we use Keras' fit API to train the autoencoder for 30 epochs (full passes over the dataset) with a binary cross-entropy loss. Below is the helper function
                        for visualizing the model's latent space, which I borrowed from Reza Kalantar's blog post, along with its output for the trained autoencoder:</p>

                    <figure class='mx-auto'>
                        <img src={img6} alt=''></img>
                        <div class='flex justify-center'>
                            <a class='text-blue-500 underline' href='https://medium.com/@rekalantar/variational-auto-encoder-vae-pytorch-tutorial-dce2d2fe0f5f' target='_blank'>source</a>
                        </div>
                    </figure>
                    
                    <img src={img7} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        As you can see, if you were to sample points from this space, the large majority of results would not come close to resembling hand-written digits.
                        In fact, every point to the left of x=0 appears pretty much useless, and even on the right side only a handful are truly recognizable.</p>
                    <p class='text-left text-lg text-gray-700'>
                        To turn our autoencoder into a variational autoencoder we need two additions: a sampling step, in which the latent vector is drawn from a distribution whose probability density function we can compute
                        (kept differentiable by the reparameterization trick), and a KL divergence regularization term in the loss:</p>

                    <img src={img8} alt=''></img>
                    <p class='text-left text-lg text-gray-700'>
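                        In this new VAE class, the encoder no longer maps an input directly to a point in the latent space. Instead it outputs the mean and variance parameters of a multi-dimensional normal distribution,
                        and a latent vector "z" is sampled from that distribution. The sample is computed as z = mean + (standard deviation) * epsilon, where epsilon is noise drawn from a standard normal distribution.
                        Writing the sample this way is called the "reparameterization trick": it keeps z differentiable with respect to the mean and variance, so the model can still be trained with backpropagation. </p>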
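
                    <p class='text-left text-lg text-gray-700'>
                        As a minimal sketch of that sampling step (not necessarily the exact code pictured above): the function name sample_latent is hypothetical, and I am assuming the encoder outputs the
                        log of the variance rather than the variance itself, which is a common choice for numerical stability.
                    </p>
                    <pre class='overflow-x-auto bg-gray-100 rounded p-4 text-sm font-mono'><code>{`import tensorflow as tf

def sample_latent(mean, log_var):
    # Hypothetical helper; assumes the encoder outputs log-variance.
    # All of the randomness lives in eps, drawn from a standard normal.
    eps = tf.random.normal(shape=tf.shape(mean))
    # Convert log-variance to standard deviation: sigma = exp(0.5 * log_var).
    sigma = tf.exp(0.5 * log_var)
    # z is a deterministic, differentiable function of mean and log_var.
    return mean + sigma * eps

# Example: a batch of 4 inputs mapped to a 2-dimensional latent space.
z = sample_latent(tf.zeros((4, 2)), tf.zeros((4, 2)))
print(z.shape)
`}</code></pre>
                    <p class='text-left text-lg text-gray-700'>
                        Because the noise is generated outside of the learned parameters, gradients can flow through both the mean and the log-variance during training.
                    </p>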

                    <p class='text-left text-lg text-gray-700'>
                        The 'KLD' term stands for 'Kullback-Leibler divergence', a regularization term that measures how much one probability distribution differs from another (it is not a true distance, since it is not symmetric). In our case,
                        we are minimizing the KL divergence between the latent distribution specified by our model and the standard normal distribution N(0,1). This discourages the model
                        from mapping inputs to distributions centered too far away from the origin, and from shrinking their variance to 0 (which would be the same as mapping to a single point rather than to a distribution of points).
                        {/* TODO: insert image of KL formula for integral over distributions P and Q? and maybe do math for substituting PDF of Gaussian? */}
                    </p>
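
                    <p class='text-left text-lg text-gray-700'>
                        For a normal distribution with independent dimensions, this KL divergence has a well-known closed form: KL = -0.5 * sum(1 + log_var - mean^2 - exp(log_var)), summed over the latent dimensions.
                        Here is a rough sketch of how it could be computed; the function name kl_to_standard_normal is hypothetical, and I am again assuming the encoder outputs a log-variance.
                    </p>
                    <pre class='overflow-x-auto bg-gray-100 rounded p-4 text-sm font-mono'><code>{`import tensorflow as tf

def kl_to_standard_normal(mean, log_var):
    # Hypothetical helper; assumes a diagonal Gaussian given by mean and log-variance.
    # Closed-form KL(N(mean, exp(log_var)) || N(0, 1)) for each latent dimension.
    per_dim = 1.0 + log_var - tf.square(mean) - tf.exp(log_var)
    # Sum over the latent dimensions to get one value per example.
    return -0.5 * tf.reduce_sum(per_dim, axis=-1)

# The penalty is zero when the distribution already equals N(0, 1) ...
print(kl_to_standard_normal(tf.zeros((1, 2)), tf.zeros((1, 2))))
# ... and grows as the mean moves away from the origin.
print(kl_to_standard_normal(tf.fill((1, 2), 3.0), tf.zeros((1, 2))))
`}</code></pre>
                    <p class='text-left text-lg text-gray-700'>
                        The same expression also grows without bound as the variance shrinks towards 0 (log_var becoming very negative), which is what stops the model from collapsing each input to a single point.
                    </p>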

                    <p class='text-left text-lg text-gray-700'>
                        Let's see the difference:</p>
                    <img src={img9} alt=''></img>
                    <img src={img10} alt=''></img>

                    <p class='text-left text-lg text-gray-700'>
                        Not bad, definitely an improvement over the normal autoencoder. A much larger range of latent values can now be sampled to generate reasonable-looking images.
                        It is also interesting to note the transitions between some digits, and the localization of certain digits to regions of the space
                        (for example, the lower left quadrant seems to consist entirely of different-looking zeros). Let's train it some more and see if we can make it any better:</p>

                    <img src={img11} alt=''></img>
                    <p class='text-left text-lg text-gray-700'>
                        Here's how the latent space looks after 210 epochs. Pretty nice! Every digit is visible, and some variation in writing styles
                        for digits like the 2's and 7's is also captured.
                        If you want to try this yourself, you can access my code on Colab <a class='underline' href='https://colab.research.google.com/drive/1Ybhj1Xf4gv5T62iUWngwlUETXw418XvV?usp=sharing' target='_blank'>here</a>.
                        </p>
                
                </div>
            </div>
        </div>
    );
};

export default Vae;