{"id":329274,"date":"2023-10-02T13:00:00","date_gmt":"2023-10-02T13:00:00","guid":{"rendered":"https:\/\/pyimagesearch.com\/?p=41130"},"modified":"2023-10-02T13:00:00","modified_gmt":"2023-10-02T13:00:00","slug":"a-deep-dive-into-variational-autoencoders-with-pytorch","status":"publish","type":"post","link":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/","title":{"rendered":"A Deep Dive into Variational Autoencoders with PyTorch"},"content":{"rendered":"<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"TOC\"\/>\n<h2><strong>Table of Contents<\/strong><\/h2>\n<div class=\"toc\">\n<ul>\n<li 
id=\"TOC-h2BPTitle\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h2BPTitle\">A Deep Dive into Variational Autoencoder with PyTorch<\/a><\/li>\n<ul>\n<li id=\"TOC-h3Introduction\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3Introduction\">Introduction<\/a><\/li>\n<ul>\n<li id=\"TOC-h4Comparison\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Comparison\">Comparison with Convolutional Autoencoder<\/a><\/li>\n<ul>\n<li id=\"TOC-h5Architecture\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h5Architecture\">Architecture<\/a><\/li>\n<li id=\"TOC-h5LatentSpace\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h5LatentSpace\">Latent Space<\/a><\/li>\n<li id=\"TOC-h5LossFunction\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h5LossFunction\">Loss Function<\/a><\/li>\n<\/ul>\n<li id=\"TOC-h4StandOut\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4StandOut\">Why Does VAE Stand Out?<\/a><\/li>\n<li id=\"TOC-h4Gaussian\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Gaussian\">Why Does the Encoder of a VAE Follow a Gaussian Distribution?<\/a><\/li>\n<li id=\"TOC-h4ObjectiveFunctions\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4ObjectiveFunctions\">Objective Functions of VAE<\/a><\/li>\n<li id=\"TOC-h4Reparameterization\"><a 
rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Reparameterization\">Reparameterization Trick<\/a><\/li>\n<\/ul>\n<li id=\"TOC-h3DevelopmentEnvironment\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3DevelopmentEnvironment\">Configuring Your Development Environment<\/a><\/li>\n<li id=\"TOC-h3NeedHelp\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3NeedHelp\">Need Help Configuring Your Development Environment?<\/a><\/li>\n<li id=\"TOC-h3ProjectStructure\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3ProjectStructure\">Project Structure<\/a><\/li>\n<li id=\"TOC-h3AboutDataset\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3AboutDataset\">About the Dataset<\/a><\/li>\n<ul>\n<li id=\"TOC-h4Overview\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Overview\">Overview<\/a><\/li>\n<li id=\"TOC-h4Distribution\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Distribution\">Class Distribution<\/a><\/li>\n<li id=\"TOC-h4Preprocessing\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Preprocessing\">Data Preprocessing<\/a><\/li>\n<li id=\"TOC-h4DataSplit\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4DataSplit\">Data Split<\/a><\/li>\n<\/ul>\n<li id=\"TOC-h3Prerequisites\"><a rel=\"noopener\"  
href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3Prerequisites\">Configuring the Prerequisites<\/a>\n            <\/li>\n<li id=\"TOC-h3DefiningUtilities\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3DefiningUtilities\">Defining the Utilities<\/a><\/li>\n<li id=\"TOC-h3DefiningNetwork\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3DefiningNetwork\">Defining the Network<\/a><\/li>\n<li id=\"TOC-h3Training\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3Training\">Training the Variational Autoencoder<\/a><\/li>\n<li id=\"TOC-h3PostTraining\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3PostTraining\">Post-Training Analysis of Variational Autoencoder<\/a><\/li>\n<ul>\n<li id=\"TOC-h4Reconstruction\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Reconstruction\">Reconstruction by Variational Autoencoder After Training<\/a><\/li>\n<li id=\"TOC-h4Visualize\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Visualize\">Visualize the Distribution of the Latent Space of Trained Convolutional Autoencoder vs. 
Variational Autoencoder<\/a><\/li>\n<li id=\"TOC-h4Latent\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Latent\">Latent Space Plot of Trained Variational Autoencoder<\/a><\/li>\n<li id=\"TOC-h4Linearly\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Linearly\">Linearly Separated Images (Grid) on Embeddings of Trained Variational Autoencoder<\/a><\/li>\n<li id=\"TOC-h4Reconstructions\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h4Reconstructions\">Reconstructions by the Trained Decoder of Variational Autoencoder Using the Points Sampled from Normal Distribution<\/a><\/li>\n<\/ul>\n<\/ul>\n<li id=\"TOC-h2Summary\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h2Summary\">Summary<\/a><\/li>\n<ul>\n<li id=\"TOC-h3Citation\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#h3Citation\">Citation Information<\/a><\/li>\n<\/ul>\n<\/ul>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h2BPTitle\"\/>\n<h2><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h2BPTitle\"><strong>A Deep Dive into Variational Autoencoder with PyTorch<\/strong><\/a><\/h2>\n<p>In this tutorial, we dive deep into the fascinating world of Variational Autoencoders (VAEs). We&#8217;ll start by unraveling the foundational concepts, exploring the roles of the encoder and decoder, and drawing comparisons between the traditional Convolutional Autoencoder (CAE) and the VAE. 
We place special emphasis on the pivotal role of the Gaussian distribution in VAEs and on the balance between the reconstruction loss and the Kullback-Leibler (KL) divergence.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><a href=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2023\/10\/vae-3-featured.png\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"\" class=\"wp-image-41452\" width=\"603\" height=\"500\" srcset=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured.png?size=126x104&amp;lossy=2&amp;strip=1&amp;webp=1 126w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured-300x249.png?lossy=2&amp;strip=1&amp;webp=1 300w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured.png?size=378x313&amp;lossy=2&amp;strip=1&amp;webp=1 378w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured.png?size=504x418&amp;lossy=2&amp;strip=1&amp;webp=1 504w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured-768x637.png?lossy=2&amp;strip=1&amp;webp=1 768w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/10\/vae-3-featured.png?lossy=2&amp;strip=1&amp;webp=1 940w\" sizes=\"(max-width: 603px) 100vw, 603px\" \/><\/a><\/figure>\n<\/div>\n<p>Using the renowned <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset, we&#8217;ll walk you through its overview, class distribution, preprocessing, and data split. As the tutorial progresses, you&#8217;ll set up the prerequisites, define the utilities, and design the VAE network. 
The highlight is training the VAE on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> data, followed by a detailed post-training analysis with experiments ranging from latent space visualization to image generation. By the end, you&#8217;ll have a solid understanding of how VAEs work, what they can do for image generation, and the characteristics of the dataset.<\/p>\n<p>So, are you ready to delve into the captivating realm of VAEs with PyTorch? Let&#8217;s get started!<\/p>\n<p>This lesson is the 3rd in a 5-part series on Autoencoders:<\/p>\n<ol>\n<li><a href=\"https:\/\/pyimg.co\/ehnlf\"  rel=\"noreferrer noopener\"><em>Introduction to Autoencoders<\/em><\/a><\/li>\n<li><a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\"><em>Implementing a Convolutional Autoencoder with PyTorch<\/em><\/a><\/li>\n<li><a href=\"https:\/\/pyimg.co\/7e4if\"  rel=\"noreferrer noopener\"><strong><em>A Deep Dive into Variational Autoencoders with PyTorch<\/em><\/strong><\/a><strong> (this tutorial)<\/strong><\/li>\n<li><em>Lesson 4<\/em><\/li>\n<li><em>Lesson 5<\/em><\/li>\n<\/ol>\n<p><strong>To learn the theory behind Variational Autoencoders and how to train one on the Fashion-MNIST dataset in PyTorch, with numerous exciting experiments along the way, <\/strong><strong><em>just keep reading<\/em><\/strong><strong>.<\/strong><\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Introduction\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3Introduction\"><strong>Introduction<\/strong><\/a><\/h3>\n<p>Deep learning has achieved remarkable success in supervised tasks, especially in image recognition. 
However, in the realm of unsupervised learning, generative models like Generative Adversarial Networks (GANs) have gained prominence for their ability to produce synthetic yet realistic images. Before the rise of GANs, there were other foundational neural network architectures for generative modeling. One such model that predates the GAN era is the Variational Autoencoder (VAE).<\/p>\n<p>In our <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">previous tutorial on autoencoders<\/a>, we learned that they are not inherently generative. Although they reconstruct input data effectively, they struggle to generate new samples from the latent space unless specific points are chosen by hand. This limitation was evident in our experiments on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset.<\/p>\n<p>VAEs were introduced in 2013 by Diederik P. Kingma and Max Welling in their paper <a href=\"https:\/\/arxiv.org\/abs\/1312.6114\"  rel=\"noreferrer noopener\">Auto-Encoding Variational Bayes<\/a>. They extended the autoencoder idea so that the model learns a probability distribution over the data rather than only a compressed code. Rooted in Bayesian inference, VAEs model the underlying probability distribution of the data, which makes it possible to generate new samples from that distribution.<\/p>\n<p>The key distinction between VAEs and traditional autoencoders is the design of their latent spaces. VAEs encourage a continuous latent space in which random sampling and interpolation produce meaningful outputs, which makes them valuable for generative modeling.<\/p>\n<p>In a standard autoencoder, every image corresponds to a single point in the latent space. 
Conversely, as shown in <strong>Figure 1<\/strong>, in a variational autoencoder, each image is associated with a multivariate normal distribution centered around a specific point in the latent space.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh5.googleusercontent.com\/ZrB_cooYvAWJ25e4gprHyODxcJtVLIXQXcMbJQ19obGM2iZ9aEwNQ6Nv-xrGot5ITkTjmUf-GeHN-4AcX-V5c_MuYNAutiB5yW08_wujSH9JdQGvXbHKSPxcP_eD9MmDP7IV151f-abqmtJJTqFeyEs\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/ZrB_cooYvAWJ25e4gprHyODxcJtVLIXQXcMbJQ19obGM2iZ9aEwNQ6Nv-xrGot5ITkTjmUf-GeHN-4AcX-V5c_MuYNAutiB5yW08_wujSH9JdQGvXbHKSPxcP_eD9MmDP7IV151f-abqmtJJTqFeyEs\" alt=\"\" width=\"700\" height=\"354\"\/><\/a><figcaption><strong>Figure 1:<\/strong> Encoding of Autoencoder vs. Variational Autoencoder (source: <a href=\"https:\/\/vitalflux.com\/autoencoder-vs-variational-autoencoder-vae-difference\/\"  rel=\"noreferrer noopener\">https:\/\/vitalflux.com\/autoencoder-vs-variational-autoencoder-vae-difference\/<\/a>).<\/figcaption><\/figure>\n<\/div>\n<p>VAEs are a type of autoencoder designed to learn efficient representations (codings) of the input data. However, unlike traditional autoencoders, which learn deterministic encodings, VAEs take a probabilistic approach. The encoder in a VAE doesn&#8217;t produce a fixed point in the latent space. 
Instead, it outputs the parameters (typically the mean and variance) of a probability distribution, from which we sample to obtain the latent representation.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Comparison\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Comparison\"><strong>Comparison with Convolutional Autoencoder<\/strong><\/a><\/h4>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h5Architecture\"\/>\n<h5><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h5Architecture\"><strong>Architecture<\/strong><\/a><\/h5>\n<ul>\n<li><strong>Convolutional Autoencoder (CAE):<\/strong> A CAE typically consists of an encoder and a decoder. The encoder uses convolutional layers to compress the input into a compact latent representation, and the decoder uses transposed convolutional layers to reconstruct the input from this representation.<\/li>\n<li><strong>VAE:<\/strong> Similar to a CAE, a VAE also has an encoder and a decoder. 
However, the encoder in a VAE produces parameters of a probability distribution (typically Gaussian) in the latent space rather than a deterministic point, as shown in <strong>Figure 2<\/strong>.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh6.googleusercontent.com\/dqHH81HNI-B60vDS3u2M0jsUVo0nsUIlMoRT4GlG4w8fDTfJ5-Li0vZ08XWtuEHLW2jFR4jlwxCz8O2WLTDX5u09uOp6WEE87XmStaspZgcBbHaRB47S3tdXdkf4TzIaZsDFh-YXLl945ebwzlWnJek\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/dqHH81HNI-B60vDS3u2M0jsUVo0nsUIlMoRT4GlG4w8fDTfJ5-Li0vZ08XWtuEHLW2jFR4jlwxCz8O2WLTDX5u09uOp6WEE87XmStaspZgcBbHaRB47S3tdXdkf4TzIaZsDFh-YXLl945ebwzlWnJek\" alt=\"\" width=\"700\" height=\"385\"\/><\/a><figcaption><strong>Figure 2:<\/strong> Variational Autoencoder Architecture Diagram (source: image created by the author for the <a href=\"https:\/\/learnopencv.com\/wp-content\/uploads\/2020\/11\/vae-diagram-1-2048x1126.jpg\"  rel=\"noreferrer noopener\">LearnOpenCV Blog<\/a>).<\/figcaption><\/figure>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h5LatentSpace\"\/>\n<h5><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h5LatentSpace\"><strong>Latent Space<\/strong><\/a><\/h5>\n<ul>\n<li><strong>CAE:<\/strong> The latent space in a CAE is deterministic. Given the same input, the encoder will always produce the same point in the latent space.<\/li>\n<li><strong>VAE:<\/strong> The latent space in a VAE is probabilistic. 
The encoder produces a distribution\u2019s parameters (mean and variance), and the actual latent representation is sampled from this distribution.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h5LossFunction\"\/>\n<h5><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h5LossFunction\"><strong>Loss Function<\/strong><\/a><\/h5>\n<ul>\n<li><strong>CAE:<\/strong> The loss function of a CAE typically focuses on the reconstruction error, which measures the difference between the original input and its reconstruction.<\/li>\n<li><strong>VAE:<\/strong> The VAE loss function has two components:\n<ul>\n<li><strong>Reconstruction Loss:<\/strong> As in the CAE, this measures the fidelity of the reconstructed input.<\/li>\n<li><strong>Kullback-Leibler (KL) Divergence:<\/strong> This term encourages the learned distribution in the latent space to stay close to a prior distribution, usually a standard Gaussian. It acts as a regularizer, discouraging the model from encoding too much information in the latent space and encouraging a smooth latent space.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4StandOut\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4StandOut\"><strong>Why Does VAE Stand Out?<\/strong><\/a><\/h4>\n<p>VAEs have garnered attention due to their ability to learn smooth and continuous latent spaces. This continuity means that small changes in the latent space tend to produce coherent changes in the generated data, making VAEs well suited to tasks like interpolation between data points. 
Additionally, the probabilistic nature of VAEs introduces a level of randomness that can benefit generative tasks, allowing the model to produce diverse outputs.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Gaussian\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Gaussian\"><strong>Why Does the Encoder of a VAE Follow a Gaussian Distribution?<\/strong><\/a><\/h4>\n<ul>\n<li><strong>Regularization and Continuity:<\/strong> Pulling the encoder&#8217;s distributions toward a standard Gaussian prior acts as a regularizer that encourages a continuous latent space. This continuity allows smooth interpolation between data points, making it possible to generate new, similar data by sampling from regions between the encodings of known data points.<\/li>\n<li><strong>Simplicity and Tractability:<\/strong> The Gaussian distribution is mathematically tractable: it is fully described by its mean and variance, it is easy to sample from, and the KL divergence between two Gaussians has a closed-form expression. By constraining the latent variables to follow this distribution, VAEs can exploit these properties for efficient training.<\/li>\n<li><strong>Reparameterization Trick:<\/strong> The Gaussian distribution makes the reparameterization trick, a crucial component of VAEs, straightforward to apply. This trick allows gradients to be backpropagated through the stochastic sampling step, enabling end-to-end training of the model.<\/li>\n<li><strong>Balanced Latent Space:<\/strong> By pushing the encoder&#8217;s outputs toward a standard Gaussian distribution, VAEs discourage the model from assigning too much importance to any particular region of the latent space. 
This ensures a balanced representation where different regions of the space can be effectively utilized for data generation.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4ObjectiveFunctions\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4ObjectiveFunctions\"><strong>Objective Functions of VAE<\/strong><\/a><\/h4>\n<p>The VAE objective combines two loss terms:<\/p>\n<p><strong>Reconstruction Loss:<\/strong> This loss encourages the decoder&#8217;s reconstructions to closely resemble the input images. It&#8217;s commonly computed using the Mean Squared Error (MSE) between the original and reconstructed images; binary cross-entropy is another common choice when pixel values are scaled to [0, 1].<\/p>\n<p class=\"has-text-align-center\"><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/67d\/67d7e6c31558343cdc040c66ad345fdf-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='L_\\text{MSE}(\\theta,\\phi) = \\displaystyle\\frac{1}{N}\\sum_{i=1}^{N}{\\left(x_i -f_\\theta (g_\\phi (x_i))\\right)^2}' title='L_\\text{MSE}(\\theta,\\phi) = \\displaystyle\\frac{1}{N}\\sum_{i=1}^{N}{\\left(x_i -f_\\theta (g_\\phi (x_i))\\right)^2}' class='latex' srcset='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/67d\/67d7e6c31558343cdc040c66ad345fdf-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1 266w,https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/67d\/67d7e6c31558343cdc040c66ad345fdf-ffffff-000000-0.png?size=126x23&#038;lossy=2&#038;strip=1&#038;webp=1 126w' sizes='(max-width: 266px) 100vw, 266px' \/><\/p>\n<ul>\n<li><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/818\/818f98a590dbf69596d717dbbc5149b8-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='L_\\text{MSE}(\\theta ,\\phi)' title='L_\\text{MSE}(\\theta ,\\phi)' class='latex' \/> represents the reconstruction loss.<\/li>\n<li><img 
src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/255\/2554a2bb846cffd697389e5dc8912759-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\theta' title='\\theta' class='latex' \/> and <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/1ed\/1ed346930917426bc46d41e22cc525ec-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\phi' title='\\phi' class='latex' \/> are the parameters of the decoder and encoder, respectively.<\/li>\n<li><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/8d9\/8d9c307cb7f3c4a32822a51922d1ceaa-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='N' title='N' class='latex' \/> is the number of samples.<\/li>\n<li><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/1ba\/1ba8aaab47179b3d3e24b0ccea9f4e30-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='x_i' title='x_i' class='latex' \/> is the original input image.<\/li>\n<li> <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/f95\/f95dbc5e6ca7517c76671b7b2f72474f-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='f_\\theta' title='f_\\theta' class='latex' \/> is the decoder function, and <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/1a6\/1a6317a7865929248bc6b1e3e0469bde-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='g_\\phi' title='g_\\phi' class='latex' \/> is the encoder function.<\/li>\n<li>The formula calculates the squared difference between each original image <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/1ba\/1ba8aaab47179b3d3e24b0ccea9f4e30-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='x_i' title='x_i' class='latex' \/> and its corresponding reconstructed image <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/7cf\/7cf5d85e6a6f949b63da203c387a4fd9-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='f_\\theta (g_\\phi(x_i))' title='f_\\theta 
(g_\\phi(x_i))' class='latex' \/>, then averages these squared differences over all samples.<\/li>\n<\/ul>\n<p><strong>KL Divergence:<\/strong> This measures how much the encoder&#8217;s distribution diverges from a standard normal distribution. It acts as a regularizer, pulling the latent variables toward a standard normal distribution, which encourages a structured and continuous latent space that is particularly beneficial for generative tasks.<\/p>\n<p class=\"has-text-align-center has-white-background-color has-background\"><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/4e0\/4e060015dd5a90a210fac12cd6c9157b-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='L_\\text{KL}[G(Z_{\\mu}, Z_{\\sigma})  |  \\mathcal{N}(0, 1)] = -0.5 * \\sum_{i=1}^{N}{1 + \\log(Z_{\\sigma_{i}^2}) - Z_{\\mu_{i}}^2 -  Z_{\\sigma_{i}}^2}' title='L_\\text{KL}[G(Z_{\\mu}, Z_{\\sigma})  |  \\mathcal{N}(0, 1)] = -0.5 * \\sum_{i=1}^{N}{1 + \\log(Z_{\\sigma_{i}^2}) - Z_{\\mu_{i}}^2 -  Z_{\\sigma_{i}}^2}' class='latex' srcset='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/4e0\/4e060015dd5a90a210fac12cd6c9157b-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1 448w,https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/4e0\/4e060015dd5a90a210fac12cd6c9157b-ffffff-000000-0.png?size=126x7&#038;lossy=2&#038;strip=1&#038;webp=1 126w,https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/4e0\/4e060015dd5a90a210fac12cd6c9157b-ffffff-000000-0.png?size=252x14&#038;lossy=2&#038;strip=1&#038;webp=1 252w,https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/4e0\/4e060015dd5a90a210fac12cd6c9157b-ffffff-000000-0.png?size=378x20&#038;lossy=2&#038;strip=1&#038;webp=1 378w' sizes='(max-width: 448px) 100vw, 448px' \/><\/p>\n<ul>\n<li> <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/1f1\/1f150fbeb1728057a59d67d1aa0f9176-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='L_\\text{KL}' 
title='L_\\text{KL}' class='latex' \/> represents the KL divergence loss.<\/li>\n<li><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/f18\/f189de5c0abefcdb20d60add2b74420e-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='G(Z_{\\mu}, Z_{\\sigma})' title='G(Z_{\\mu}, Z_{\\sigma})' class='latex' \/> is the Gaussian distribution defined by the encoder&#8217;s outputs <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/3c3\/3c3992a836dc10f16083f526c13b7f79-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='Z_{\\mu}' title='Z_{\\mu}' class='latex' \/> (mean) and <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/2e1\/2e100224cae81152659641f4b9082080-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='Z_{\\sigma}' title='Z_{\\sigma}' class='latex' \/> (standard deviation).<\/li>\n<li> <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/3b3\/3b3f99a5588e1cd0c235b04edc563f25-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\mathcal{N}(0, 1)' title='\\mathcal{N}(0, 1)' class='latex' \/> is the standard normal distribution.<\/li>\n<li>Because both distributions are Gaussian, the divergence has a closed form: for each sample <em>i<\/em>, the bracketed expression is evaluated for every latent dimension and summed, and these per-sample values are then summed over all <em>N<\/em> samples.<\/li>\n<\/ul>\n<p>The combined VAE loss is the sum of the reconstruction loss (such as the MSE loss defined above) and the KL divergence loss. In practice, a weighting factor is sometimes applied to the KL term to trade off reconstruction quality against latent-space regularity:<\/p>\n<p class=\"has-text-align-center\"><img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/331\/3318c9bd2bc57d6b6777d1343bfaafb2-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\mathcal{L}_\\text{VAE} = \\mathcal{L}_\\text{recon} + \\mathcal{L}_\\text{KL}' title='\\mathcal{L}_\\text{VAE} = \\mathcal{L}_\\text{recon} + \\mathcal{L}_\\text{KL}' class='latex' \/><\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Reparameterization\"\/>\n<h4><a 
href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Reparameterization\"><strong>Reparameterization Trick<\/strong><\/a><\/h4>\n<p>In a Variational Autoencoder, one of the central challenges is incorporating randomness into the latent space. This stochasticity is essential for the VAE&#8217;s generative capabilities, but it poses a significant hurdle during training: drawing a latent sample directly from the encoder&#8217;s distribution is not a differentiable operation, so gradients cannot flow back through the sampling step to the encoder&#8217;s mean and variance outputs, and backpropagation cannot train the encoder.<\/p>\n<p>This is where the reparameterization trick comes into play. Instead of sampling the latent vector directly, it draws an auxiliary random variable from a fixed standard normal distribution and expresses the latent vector as a deterministic, differentiable function of the encoder&#8217;s outputs and this noise. Because the randomness now enters as an input rather than as an operation inside the network, gradients can propagate through the mean and standard deviation to the rest of the network (<strong>Figure 3<\/strong>), enabling effective training.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh6.googleusercontent.com\/JPzde4_np6R-lSQaZdHztnHdu0KRa9n3VELwWPeNqSvU8idzpDZ7xufgNZEPMbvEpLczUA70LNsRlKMyPlQqoX3Xml3uGrammE93gLfQ6Tx5raQin6WArH3XgbaQvTpkyOmeCtj5bJZJTRdq9Mb7S5Y\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/JPzde4_np6R-lSQaZdHztnHdu0KRa9n3VELwWPeNqSvU8idzpDZ7xufgNZEPMbvEpLczUA70LNsRlKMyPlQqoX3Xml3uGrammE93gLfQ6Tx5raQin6WArH3XgbaQvTpkyOmeCtj5bJZJTRdq9Mb7S5Y\" alt=\"\" width=\"700\" height=\"328\"\/><\/a><figcaption><strong>Figure 3:<\/strong> The reparameterization trick transforms a stochastic node into a deterministic one, facilitating gradient flow (source: image designed by the author for the <a href=\"https:\/\/learnopencv.com\/wp-content\/uploads\/2020\/11\/reparam-vae-2048x959.jpg\"  rel=\"noreferrer noopener\">LearnOpenCV 
Blog<\/a>).<\/figcaption><\/figure>\n<\/div>\n<p>Mathematically, this can be represented as:<\/p>\n<p class=\"has-text-align-center\"><em>Z<\/em> = <em>Z<\/em><sub>&mu;<\/sub> + <em>Z<\/em><sub>&sigma;<\/sub> &odot; &epsilon;<\/p>\n<p>Here, <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/f8b\/f8b1c5a729a09649c275fca88976d8dd-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\varepsilon' title='\\varepsilon' class='latex' \/> is sampled from a standard normal distribution, that is, <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/be3\/be3f6fb4d31886a209ebafdc65802c1c-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\varepsilon \\sim \\mathcal{N}(0, 1)' title='\\varepsilon \\sim \\mathcal{N}(0, 1)' class='latex' \/>. The symbol <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/319\/319d584a4a5166ee6c51f4b8348856ea-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='\\odot' title='\\odot' class='latex' \/> stands for element-wise multiplication. Note that the noise is scaled by the standard deviation <img src='https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/latex\/2e1\/2e100224cae81152659641f4b9082080-ffffff-000000-0.png?lossy=2&#038;strip=1&#038;webp=1' alt='Z_{\\sigma}' title='Z_{\\sigma}' class='latex' \/>, not by the variance. Since our encoder predicts the log variance, the standard deviation is recovered as exp(0.5 &middot; log variance).<\/p>\n<p>With this reformulation, the VAE keeps the randomness it needs in the latent space while remaining trainable end-to-end with standard backpropagation. This balance is crucial for the VAE&#8217;s dual objectives of accurate reconstruction and effective generation.<\/p>\n<p>Having delved deep into the theoretical underpinnings of VAE, it&#8217;s time to bring that knowledge to life. We&#8217;ll embark on a comprehensive code walkthrough in the next segment, demystifying each component step-by-step. Following that, we&#8217;ll dive into some exciting experiments, showcasing the prowess of our trained VAE in action. 
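<\/p>
<p>Before moving to the code walkthrough, the KL formula and the reparameterization formula above can be checked numerically. The following is a minimal sketch in plain Python (standard library only, no PyTorch) for a single latent dimension; the helpers <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">reparameterize<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">reparam_gradients<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">kl_closed_form<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">kl_monte_carlo<\/code> are hypothetical names for illustration, not part of this tutorial&#8217;s code. It draws the noise ε from N(0, 1), forms the latent sample as μ + σ · ε with σ = exp(0.5 · log σ²), and uses those samples to estimate the KL divergence, which should land close to the closed-form value.<\/p>

```python
import math
import random


def reparameterize(mu, logvar, eps):
    # z = mu + sigma * eps, where sigma = exp(0.5 * logvar) is the
    # standard deviation (the noise is scaled by sigma, not the variance)
    sigma = math.exp(0.5 * logvar)
    return mu + sigma * eps


def reparam_gradients(mu, logvar, eps):
    # once eps is fixed, z is an ordinary differentiable function of
    # mu and logvar: dz/dmu = 1 and dz/dlogvar = 0.5 * sigma * eps
    sigma = math.exp(0.5 * logvar)
    return 1.0, 0.5 * sigma * eps


def kl_closed_form(mu, logvar):
    # one latent dimension of the KL formula:
    # -0.5 * (1 + log(sigma^2) - mu^2 - sigma^2)
    return -0.5 * (1.0 + logvar - mu ** 2 - math.exp(logvar))


def kl_monte_carlo(mu, logvar, n_samples=100_000, seed=0):
    # Monte Carlo estimate of E_q[log q(z) - log p(z)] using
    # reparameterized samples z = mu + sigma * eps
    rng = random.Random(seed)
    sigma = math.exp(0.5 * logvar)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # the only random quantity
        z = reparameterize(mu, logvar, eps)
        # log-densities of q = N(mu, sigma^2) and p = N(0, 1); the shared
        # -0.5 * log(2 * pi) constant cancels in the difference
        log_q = -math.log(sigma) - (z - mu) ** 2 / (2.0 * sigma ** 2)
        log_p = -(z ** 2) / 2.0
        total += log_q - log_p
    return total / n_samples


mu, logvar = 0.5, math.log(0.64)  # sigma = 0.8
print(kl_closed_form(mu, logvar))  # about 0.1681
print(kl_monte_carlo(mu, logvar))  # should be close to the closed form
```

<p>Because ε is an input rather than something sampled inside the computation, the partial derivatives returned by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">reparam_gradients<\/code> exist for every sample; this is what lets backpropagation reach the encoder&#8217;s mean and log-variance outputs, and in PyTorch, autograd computes them automatically.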
Let&#8217;s transition from theory to hands-on practice!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3DevelopmentEnvironment\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3DevelopmentEnvironment\"><strong>Configuring Your Development Environment<\/strong><\/a><\/h3>\n<p>To follow this guide, you need to have the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torchvision<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">matplotlib<\/code> libraries installed on your system.<\/p>\n<p>Luckily, all these libraries are pip-installable:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"1\">$ pip install \"torch>=2.0.1\"\n$ pip install \"torchvision>=0.15.2\"\n$ pip install matplotlib==3.7.2<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3NeedHelp\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3NeedHelp\"><strong>Need Help Configuring Your Development Environment?<\/strong><\/a><\/h3>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"334\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"Figure 4: Need help configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? 
Be sure to join PyImageSearch University \u2014 you\u2019ll be up and running with this tutorial in minutes.\" class=\"wp-image-19836\" srcset=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?size=126x84&amp;lossy=2&amp;strip=1&amp;webp=1 126w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter-300x200.png?lossy=2&amp;strip=1&amp;webp=1 300w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?size=378x253&amp;lossy=2&amp;strip=1&amp;webp=1 378w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?lossy=2&amp;strip=1&amp;webp=1 500w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><\/a><figcaption><strong>Figure 4: <\/strong>Need help configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join <a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\">PyImageSearch University<\/a> \u2014 you\u2019ll be up and running with this tutorial in minutes.<\/figcaption><\/figure>\n<\/div>\n<p>All that said, are you:<\/p>\n<ul>\n<li>Short on time?<\/li>\n<li>Learning on your employer\u2019s administratively locked system?<\/li>\n<li>Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?<\/li>\n<li><strong>Ready to run the code <\/strong><strong><em>immediately<\/em><\/strong><strong> on your Windows, macOS, or Linux system?<\/strong><\/li>\n<\/ul>\n<p>Then join <a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\">PyImageSearch University<\/a> today!<\/p>\n<p><strong>Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides <\/strong><strong><em>pre-configured<\/em><\/strong><strong> to run on Google Colab\u2019s ecosystem right in your web 
browser!<\/strong> No installation required.<\/p>\n<p>And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3ProjectStructure\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3ProjectStructure\"><strong>Project Structure<\/strong><\/a><\/h3>\n<p>We first need to review our project directory structure.<\/p>\n<p>Start by accessing this tutorial\u2019s <strong><em>\u201cDownloads\u201d<\/em><\/strong> section to retrieve the source code and example images.<\/p>\n<p>From there, take a look at the directory structure:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"2\">$ tree -L 2\n.\n\u251c\u2500\u2500 output\n\u2502   \u251c\u2500\u2500 embedding_visualize.png\n\u2502   \u251c\u2500\u2500 image_grid_on_embeddings.png\n\u2502   \u251c\u2500\u2500 latent_distribution.png\n\u2502   \u251c\u2500\u2500 linearly_sampled_reconstructions.png\n\u2502   \u251c\u2500\u2500 model_weights\n\u2502   \u251c\u2500\u2500 normally_sampled_reconstructions.png\n\u2502   \u251c\u2500\u2500 real_test_images_after_train.png\n\u2502   \u251c\u2500\u2500 real_test_images_before_train.png\n\u2502   \u251c\u2500\u2500 reconstruct_after_train.png\n\u2502   \u251c\u2500\u2500 reconstruct_before_train.png\n\u2502   \u2514\u2500\u2500 training_progress\n\u251c\u2500\u2500 pyimagesearch\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 config.py\n\u2502   \u251c\u2500\u2500 network.py\n\u2502   \u2514\u2500\u2500 utils.py\n\u251c\u2500\u2500 test.py\n\u2514\u2500\u2500 train.py\n\n5 directories, 15 files<\/pre>\n<p>In the <code class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">pyimagesearch<\/code> directory, we have the following files:<\/p>\n<ul>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.py<\/code>: Configuration settings (device, hyperparameters, and output paths) for training the variational autoencoder.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils.py<\/code>: Utilities such as the VAE loss function, post-training analysis helpers, and a validation function for evaluating the VAE during training.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">network.py<\/code>: Contains the VAE architecture implementation in PyTorch.<\/li>\n<\/ul>\n<p>In the project&#8217;s root directory, we have the following:<\/p>\n<ul>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">train.py<\/code>: The script for training the VAE on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">test.py<\/code>: The script for evaluating the trained VAE on the test dataset and conducting post-training analysis.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">output<\/code>: This folder holds the model weights, the reconstructions saved after each training epoch, the test-set evaluation images, and the post-training analysis plots of the VAE.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3AboutDataset\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3AboutDataset\"><strong>About the Dataset<\/strong><\/a><\/h3>\n<p>In this tutorial, we employ the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset for training our variational autoencoder model. 
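<\/p>
<p>The preprocessing described in this section (scaling 28x28 pixel values from the range 0 to 255 down to 0 to 1) can be sketched in plain Python as follows. Note that <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.py<\/code> sets <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">IMAGE_SIZE = 32<\/code>, but this section does not state how the 28x28 images reach that size, so the symmetric zero padding of 2 pixels per side below is an assumption for illustration, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">preprocess<\/code> is a hypothetical helper rather than part of the tutorial&#8217;s code.<\/p>

```python
def preprocess(image, target_size=32):
    # image: square 2D list of uint8 pixel values in [0, 255] (e.g., 28x28)
    # step 1: scale every pixel to [0, 1] by dividing by 255
    scaled = [[pixel / 255.0 for pixel in row] for row in image]

    # step 2 (assumption): zero-pad symmetrically to target_size x target_size;
    # assumes (target_size - height) is even, e.g., 32 - 28 = 4 -> 2 per side
    pad = (target_size - len(scaled)) // 2
    width = len(scaled[0]) + 2 * pad
    top = [[0.0] * width for _ in range(pad)]
    middle = [[0.0] * pad + row + [0.0] * pad for row in scaled]
    bottom = [[0.0] * width for _ in range(pad)]
    return top + middle + bottom


# a synthetic 28x28 image with a bright 4x4 square in the middle
image = [[0] * 28 for _ in range(28)]
for r in range(12, 16):
    for c in range(12, 16):
        image[r][c] = 255

out = preprocess(image)
print(len(out), len(out[0]))  # 32 32
print(max(max(row) for row in out))  # 1.0
```

<p>With torchvision, a comparable pipeline would likely be something like <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">transforms.Compose([transforms.Pad(2), transforms.ToTensor()])<\/code>, since <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ToTensor<\/code> already divides 8-bit pixel values by 255; check the tutorial&#8217;s <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">train.py<\/code> for the exact transform it uses.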
<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Overview\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Overview\"><strong>Overview<\/strong><\/a><\/h4>\n<p><a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> is a dataset of Zalando&#8217;s article images consisting of:<\/p>\n<ul>\n<li>a training set of 60,000 examples<\/li>\n<li>a test set of 10,000 examples<\/li>\n<\/ul>\n<p>Each sample is a <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">28x28<\/code> grayscale image associated with a label from 10 classes (<strong>Figure 5<\/strong>). Because it shares the image size and training/test split sizes of the original MNIST dataset, it serves as a direct drop-in replacement for MNIST when benchmarking machine learning algorithms, while being more challenging and more representative of real-world computer vision tasks.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh5.googleusercontent.com\/BEJZinMLehq-vV_4pkSYA9xzxpf77VktG38X8Zq9TwavWp19XGQeJpvfv3sJ4amI3qhaVCQvv6Ob9kwvQQ4u8dbAkfM1P_uzlDnq32Y_e8vbKv3VUlLTajyX-6-0_hUdWzo0yf4ptG8XyOkSwTHLiBI\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/BEJZinMLehq-vV_4pkSYA9xzxpf77VktG38X8Zq9TwavWp19XGQeJpvfv3sJ4amI3qhaVCQvv6Ob9kwvQQ4u8dbAkfM1P_uzlDnq32Y_e8vbKv3VUlLTajyX-6-0_hUdWzo0yf4ptG8XyOkSwTHLiBI\" alt=\"\" width=\"700\" height=\"422\"\/><\/a><figcaption><strong>Figure 5: <\/strong>Sample images from the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset (source: image by the 
author).<\/figcaption><\/figure>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Distribution\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Distribution\"><strong>Class Distribution<\/strong><\/a><\/h4>\n<p>The <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset is balanced, which means it has an equal number of samples from each class. The 10 classes are <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">T-shirt\/top<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Trouser<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Pullover<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Dress<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Coat<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Sandal<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Shirt<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Sneaker<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Bag<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Ankle boot<\/code>. Each class has 6,000 images in the training set and 1,000 in the test set.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Preprocessing\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Preprocessing\"><strong>Data Preprocessing<\/strong><\/a><\/h4>\n<p>Before training the autoencoder, the images from the dataset are preprocessed. Each image in the dataset is a <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">28x28<\/code> grayscale image. 
The pixel values range from <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code> to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">255<\/code>. As a preprocessing step, they are scaled to the range <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code> to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code> by dividing each pixel value by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">255<\/code>. This normalization helps training converge faster and more stably, and it also puts the targets in the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">[0, 1]<\/code> range that the binary cross-entropy reconstruction loss expects.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4DataSplit\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4DataSplit\"><strong>Data Split<\/strong><\/a><\/h4>\n<p>The dataset is split into two parts: a training set and a test set. The training set, which contains 60,000 images, is used to train the autoencoder, and the test set, which includes 10,000 images, is used to evaluate the model&#8217;s performance. It is essential to separate the data used for training from the data used for testing to get an unbiased measure of the model&#8217;s performance.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Prerequisites\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3Prerequisites\"><strong>Configuring the Prerequisites<\/strong><\/a><\/h3>\n<p>Before we start our implementation, let\u2019s review our project\u2019s configuration. 
For that, we will move on to the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.py<\/code> script located in the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pyimagesearch<\/code> directory.<\/p>\n<p>The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.py<\/code> script sets up the autoencoder model hyperparameters and creates an output directory for storing training progress metadata, model weights, and post-training analysis plots. It also defines a dictionary that maps integer class labels to human-readable names.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"3\"># import the necessary packages\nimport os\n\nimport torch\n\n# set device to 'cuda' if CUDA is available, 'mps' if MPS is available,\n# or 'cpu' otherwise for model training and testing\nDEVICE = (\n    \"cuda\"\n    if torch.cuda.is_available()\n    else \"mps\"\n    if torch.backends.mps.is_available()\n    else \"cpu\"\n)\n\n# define model hyperparameters\nLR = 0.001\nPATIENCE = 2\nIMAGE_SIZE = 32\nCHANNELS = 1\nBATCH_SIZE = 64\nEMBEDDING_DIM = 2\nEPOCHS = 100\nSHAPE_BEFORE_FLATTENING = (128, IMAGE_SIZE \/\/ 8, IMAGE_SIZE \/\/ 8)\n\n# create output directory\noutput_dir = \"output\"\nos.makedirs(output_dir, exist_ok=True)<\/pre>\n<p><strong>Lines 2-4<\/strong> import the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">os<\/code> module, which provides operating system utilities such as directory creation, and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch<\/code> module, which we use to determine the available computational device (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CUDA GPU<\/code>, <code 
class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">MPS<\/code>, or <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CPU<\/code>) for model training and inference.<\/p>\n<p>On <strong>Lines 8-14<\/strong>, we set the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">DEVICE<\/code> variable based on the available hardware. If <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CUDA<\/code> (used for NVIDIA GPUs) is available, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">DEVICE<\/code> is set to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">cuda<\/code>. If <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CUDA<\/code> isn&#8217;t available but <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">MPS<\/code> (Metal Performance Shaders, used for Apple devices) is available, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">DEVICE<\/code> is set to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">mps<\/code>. 
If neither <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CUDA<\/code> nor <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">MPS<\/code> is available, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">DEVICE<\/code> defaults to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">cpu<\/code>.<\/p>\n<p>Then, on <strong>Lines 17-24<\/strong>, we define various hyperparameters and settings for the model:<\/p>\n<ul>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">LR<\/code>: Learning rate for the optimizer.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">PATIENCE<\/code>: Number of epochs without improvement to wait before the learning rate is reduced.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">IMAGE_SIZE<\/code>: The size (height and width) of the input images.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CHANNELS<\/code>: Number of channels in the input image (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code> for grayscale, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">3<\/code> for RGB).<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">BATCH_SIZE<\/code>: Number of samples per mini-batch; the model weights are updated once per batch.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">EMBEDDING_DIM<\/code>: Dimensionality of the VAE&#8217;s latent space. A value of <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">2<\/code> makes the latent space easy to plot directly.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">EPOCHS<\/code>: Total number of training epochs.<\/li>\n<li><code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">SHAPE_BEFORE_FLATTENING<\/code>: The (channels, height, width) shape of the encoder&#8217;s feature map just before it is flattened. The decoder uses this shape to reshape the vector derived from the latent code back into a 3D tensor.<\/li>\n<\/ul>\n<p>On <strong>Lines 
27 and 28<\/strong>, an output directory is created where the results from the model (e.g., saved model weights or performance plots) are stored.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"30\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"4\"># create the training_progress directory inside the output directory\ntraining_progress_dir = os.path.join(output_dir, \"training_progress\")\nos.makedirs(training_progress_dir, exist_ok=True)\n\n# create the model_weights directory inside the output directory\n# for storing variational autoencoder weights\nmodel_weights_dir = os.path.join(output_dir, \"model_weights\")\nos.makedirs(model_weights_dir, exist_ok=True)\n\n# define model_weights, reconstruction &amp; real before training images paths\nMODEL_WEIGHTS_PATH = os.path.join(model_weights_dir, \"best_vae.pt\")<\/pre>\n<p>On <strong>Lines 31 and 32<\/strong>, we create <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">training_progress_dir<\/code>, which stores the variational autoencoder&#8217;s reconstruction outputs at each training epoch.<\/p>\n<p>Next, on <strong>Lines 36-40<\/strong>, we create <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">model_weights_dir<\/code> to hold the variational autoencoder weights and define <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">MODEL_WEIGHTS_PATH<\/code>, the file where the best weights are saved.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"41\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"5\">FILE_RECON_BEFORE_TRAINING = os.path.join(\n    output_dir, \"reconstruct_before_train.png\"\n)\nFILE_REAL_BEFORE_TRAINING = os.path.join(\n    output_dir, \"real_test_images_before_train.png\"\n)\n\n# 
define reconstruction &amp; real after training images paths\nFILE_RECON_AFTER_TRAINING = os.path.join(\n    output_dir, \"reconstruct_after_train.png\"\n)\nFILE_REAL_AFTER_TRAINING = os.path.join(\n    output_dir, \"real_test_images_after_train.png\"\n)\n\n# define latent space and image grid embeddings plot paths\nLATENT_SPACE_PLOT = os.path.join(output_dir, \"embedding_visualize.png\")\nIMAGE_GRID_EMBEDDINGS_PLOT = os.path.join(\n    output_dir, \"image_grid_on_embeddings.png\"\n)\n\n# define linearly and normally sampled latent space reconstructions plot paths\nLINEARLY_SAMPLED_RECONSTRUCTIONS_PLOT = os.path.join(\n    output_dir, \"linearly_sampled_reconstructions.png\"\n)\nNORMALLY_SAMPLED_RECONSTRUCTIONS_PLOT = os.path.join(\n    output_dir, \"normally_sampled_reconstructions.png\"\n)\n\n# define class labels dictionary\nCLASS_LABELS = {\n    0: \"T-shirt\/top\",\n    1: \"Trouser\",\n    2: \"Pullover\",\n    3: \"Dress\",\n    4: \"Coat\",\n    5: \"Sandal\",\n    6: \"Shirt\",\n    7: \"Sneaker\",\n    8: \"Bag\",\n    9: \"Ankle boot\",\n}<\/pre>\n<p><strong>Lines 41-54<\/strong> define paths where images will be saved before and after training the model. 
These images are reconstructions produced by the untrained and the trained model, saved alongside the corresponding real test images.<\/p>\n<p>Next, on <strong>Lines 57-68<\/strong>, we define the paths for the post-training analysis plots: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">LATENT_SPACE_PLOT<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">IMAGE_GRID_EMBEDDINGS_PLOT<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">LINEARLY_SAMPLED_RECONSTRUCTIONS_PLOT<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">NORMALLY_SAMPLED_RECONSTRUCTIONS_PLOT<\/code>.<\/p>\n<p>Lastly, we establish a <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">CLASS_LABELS<\/code> dictionary on <strong>Lines 71-82<\/strong>. It maps the integer labels of the test images to human-readable class names, which helps us assess each reconstruction in the context of the class its test image belongs to.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3DefiningUtilities\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3DefiningUtilities\"><strong>Defining the Utilities<\/strong><\/a><\/h3>\n<p>With the configuration in place, we can now define the utilities for validating the autoencoder during training and for generating the post-training analysis plots.<\/p>\n<p>If you&#8217;ve followed along with <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">the previous lesson in this series<\/a>, you may recall our deep dive into various utilities designed to aid in the training and analysis of deep learning models. 
In the context of this autoencoder tutorial, I&#8217;ve prepared a <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">utils.py<\/code> script that arms you with functions for tasks such as random image extraction, image visualization, and latent space plotting, to name a few. <\/p>\n<p>Here&#8217;s a brief rundown:<\/p>\n<ul>\n<li><strong>Image Visualization:<\/strong> <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">extract_random_images<\/code>, <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">display_images<\/code>, and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">display_random_images<\/code> collectively handle the extraction and display of random images from our dataset.<\/li>\n<li><strong>Autoencoder Validation:<\/strong> The <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">validate<\/code> function is instrumental in monitoring our autoencoder&#8217;s performance at the end of each training epoch.<\/li>\n<li><strong>Latent Space Analysis:<\/strong> The real magic of autoencoders lies in their ability to represent complex data in a reduced-dimensional space. 
Functions like <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">get_test_embeddings<\/code>, <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">plot_latent_space<\/code>, <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">get_random_test_images_embeddings<\/code>, and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">plot_image_grid_on_embeddings<\/code> are designed specifically for this purpose, allowing you to visualize and interpret the latent embeddings of your trained model.<\/li>\n<\/ul>\n<p>While we won&#8217;t delve into the specifics here (as they were thoroughly explored in <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">the previous lesson of this series<\/a>), rest assured these utilities are essential for fine-tuning, validating, and analyzing our autoencoder model.<\/p>\n<p>However, in this lesson, we do focus on the VAE loss function, a critical component of our utilities. This loss, which combines the KL divergence and the reconstruction loss, is the objective we minimize when training the VAE.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"6\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"6\">import torch\nimport torch.nn as nn<\/pre>\n<p>We start by importing <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn<\/code> on <strong>Lines 6 and 7<\/strong>. 
The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn<\/code> module provides PyTorch&#8217;s neural network layers, loss functions, and related utilities.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"382\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"66\">def vae_gaussian_kl_loss(mu, logvar):\n    # see Appendix B from VAE paper:\n    # Kingma and Welling. Auto-Encoding Variational Bayes. ICLR, 2014\n    # https:\/\/arxiv.org\/abs\/1312.6114\n    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)\n    return KLD.mean()\n\n\ndef reconstruction_loss(x_reconstructed, x):\n    bce_loss = nn.BCELoss()\n    return bce_loss(x_reconstructed, x)\n\n\ndef vae_loss(y_pred, y_true):\n    mu, logvar, recon_x = y_pred\n    recon_loss = reconstruction_loss(recon_x, y_true)\n    kld_loss = vae_gaussian_kl_loss(mu, logvar)\n    return 500 * recon_loss + kld_loss<\/pre>\n<p><strong>Lines 382-387<\/strong> compute the Kullback-Leibler Divergence (KLD) between the learned latent distribution and a standard normal distribution, using the closed-form expression from Appendix B of the <a href=\"https:\/\/arxiv.org\/pdf\/1312.6114v10.pdf\"  rel=\"noreferrer noopener\">VAE paper<\/a>. <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">mu<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">logvar<\/code> are the mean and log variance output by the encoder part of the VAE. The divergence is summed over the latent dimensions (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">dim=1<\/code>) and then averaged over the batch.<\/p>\n<p>Next, on <strong>Lines 390-392<\/strong>, we compute the Binary Cross-Entropy (BCE) loss between the original input <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x<\/code> and its reconstruction <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x_reconstructed<\/code>. 
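<\/p>
<p>To make the reconstruction term concrete, here is a minimal plain-Python sketch of binary cross-entropy with mean reduction, which is what <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.BCELoss()<\/code> computes by default. The helper name <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">bce_mean<\/code> and its <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">eps<\/code> clamp are illustrative assumptions: PyTorch instead clamps the log terms at -100, so the two agree except for predictions extremely close to 0 or 1.<\/p>

```python
import math


def bce_mean(x_reconstructed, x, eps=1e-12):
    # binary cross-entropy averaged over all elements (pixels), like the
    # default reduction='mean' of nn.BCELoss
    # x_reconstructed: predicted pixel probabilities in [0, 1]
    # x: target pixel intensities in [0, 1]
    total = 0.0
    for p, t in zip(x_reconstructed, x):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(x)


target = [1.0, 0.0, 1.0, 0.0]  # flattened toy image
good = [0.9, 0.1, 0.9, 0.1]    # close reconstruction
bad = [0.1, 0.9, 0.1, 0.9]     # poor reconstruction
print(bce_mean(good, target))  # about 0.1054, i.e., -log(0.9)
print(bce_mean(bad, target))   # about 2.3026, i.e., -log(0.1)
```

<p>Because the mean is taken over every pixel of every image in the batch, this term stays on a per-pixel scale regardless of image size. 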
The BCE term measures how well the VAE reconstructs its input.<\/p>\n<p>Finally, on <strong>Lines 395-399<\/strong>, we compute the total training loss by combining the reconstruction loss and the KL divergence loss. The reconstruction loss is scaled by a factor of <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">500<\/code> to balance the two components: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.BCELoss<\/code> averages over every pixel by default, whereas the KL term is summed over the latent dimensions, so without a weight the two terms sit on different scales. Adjusting this factor shifts the trade-off between reconstruction fidelity (larger values) and a more regular latent space (smaller values, which give the KL term more influence).<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3DefiningNetwork\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3DefiningNetwork\"><strong>Defining the Network<\/strong><\/a><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"7\"># import the necessary libraries\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.distributions.normal import Normal<\/pre>\n<p>On <strong>Lines 1-5<\/strong>, we import the required packages: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn.functional<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.distributions.normal<\/code>.<\/p>\n<p>The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn.functional<\/code> module contains functions that operate on tensors and are used in building neural networks. 
Unlike <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn<\/code>, which provides classes, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.nn.functional<\/code> provides functions. This is convenient for operations that have no learnable parameters, such as activation functions and certain loss functions; in our case, we use it for the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ReLU<\/code> activation.<\/p>\n<p>We import the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Normal<\/code> class from <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.distributions<\/code>, which represents a normal (Gaussian) distribution and lets us draw samples from it. We use it in the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Sampling<\/code> class to sample a tensor from a standard normal distribution.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"8\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"8\"># define a class for sampling\n# this class will be used in the encoder for sampling in the latent space\nclass Sampling(nn.Module):\n    def forward(self, z_mean, z_log_var):\n        # get the shape of the tensor for the mean and log variance\n        batch, dim = z_mean.shape\n        # generate a normal random tensor (epsilon) with the same shape as z_mean\n        # this tensor will be used for reparameterization trick\n        epsilon = Normal(0, 1).sample((batch, dim)).to(z_mean.device)\n        # apply the reparameterization trick to generate the samples in the\n        # latent space\n        return z_mean + torch.exp(0.5 * z_log_var) * epsilon<\/pre>\n<p>Next, we define a <code class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">Sampling<\/code> class that samples from the latent space of a VAE using the reparameterization trick, which keeps the sampling step differentiable so the model can be trained with gradient-based optimization.<\/p>\n<p>On <strong>Line 10<\/strong>, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Sampling<\/code> class as a subclass of <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.Module<\/code>, which lets us use it as a layer inside a larger neural network model.<\/p>\n<p>Then, on <strong>Line 11<\/strong>, the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">forward<\/code> method defines the module&#8217;s forward pass. In PyTorch, calling an <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">nn.Module<\/code> instance invokes its <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">forward<\/code> method internally. Here, the method takes two arguments: <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">z_mean<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">z_log_var<\/code>, the mean and log variance of the latent variable&#8217;s distribution, respectively.<\/p>\n<p>On <strong>Line 13<\/strong>, we unpack the shape of the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> tensor to get the batch size and the dimensionality of the latent space. On <strong>Line 16<\/strong>, a random tensor epsilon is sampled from a standard normal distribution (mean <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code> and variance <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code>) with the same shape as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code>; this tensor is the source of randomness for the reparameterization trick. 
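<\/p>\n<p>To see why this makes the sampling step trainable, consider the following minimal sketch with hypothetical tensor sizes. Given epsilon, the sample <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean + torch.exp(0.5 * z_log_var) * epsilon<\/code> is a deterministic, differentiable function of <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>, so backpropagation reaches both. The sketch draws epsilon with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.randn_like<\/code>, which creates a standard-normal tensor with the same shape, dtype, and device as its argument; this is an alternative to the lesson&#8217;s <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Normal(0, 1).sample((batch, dim))<\/code> call, not what the lesson uses.<\/p>

```python
import torch

torch.manual_seed(0)
# hypothetical encoder outputs: a batch of 4 samples in a 2-D latent space
z_mean = torch.zeros(4, 2, requires_grad=True)
z_log_var = torch.zeros(4, 2, requires_grad=True)

# epsilon ~ N(0, I) with the same shape, dtype, and device as z_mean;
# it does not require gradients, so it acts as a constant during backpropagation
epsilon = torch.randn_like(z_mean)

# reparameterized sample: deterministic in (z_mean, z_log_var) once epsilon is fixed
z = z_mean + torch.exp(0.5 * z_log_var) * epsilon

# backpropagate a simple scalar to confirm that gradients reach both parameters
z.sum().backward()
print(z_mean.grad)     # d(sum z) / d z_mean is 1 for every element
print(z_log_var.grad)  # 0.5 * exp(0.5 * z_log_var) * epsilon, i.e., 0.5 * epsilon here
```

<p>The gradient with respect to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> is all ones, and the gradient with respect to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code> reduces to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0.5 * epsilon<\/code> because <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code> is zero; no gradient is ever needed for epsilon itself.<\/p>\n<p>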
The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">.to(z_mean.device)<\/code> call ensures the epsilon tensor is on the same device (CPU or GPU) as the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> tensor.<\/p>\n<p>Finally, on <strong>Line 19<\/strong>, we apply the reparameterization trick and return the resulting sample. Instead of sampling directly from the distribution parameterized by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>, which would block gradients, the trick draws the randomness from an auxiliary variable epsilon and then applies a deterministic transformation, so gradients can flow back to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>.<\/p>\n<ul>\n<li>The term <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.exp(0.5 * z_log_var)<\/code> computes the standard deviation, since exponentiating half the log variance yields the standard deviation.<\/li>\n<li>The sampled latent variable is then the mean plus the standard deviation multiplied by the random tensor epsilon.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"22\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"9\"># define the encoder\nclass Encoder(nn.Module):\n    def __init__(self, image_size, embedding_dim):\n        super(Encoder, self).__init__()\n        # define the convolutional layers for downsampling and feature\n        # extraction\n        self.conv1 = nn.Conv2d(1, 32, 3, stride=2, padding=1)\n        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)\n        self.conv3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)\n        # define a flatten layer to flatten the tensor before feeding it into\n        # the fully connected layer\n        self.flatten = nn.Flatten()\n        # define fully connected layers to transform the 
tensor into the desired\n        # embedding dimensions\n        self.fc_mean = nn.Linear(\n            128 * (image_size \/\/ 8) * (image_size \/\/ 8), embedding_dim\n        )\n        self.fc_log_var = nn.Linear(\n            128 * (image_size \/\/ 8) * (image_size \/\/ 8), embedding_dim\n        )\n        # initialize the sampling layer\n        self.sampling = Sampling()\n\n    def forward(self, x):\n        # apply convolutional layers with relu activation function\n        x = F.relu(self.conv1(x))\n        x = F.relu(self.conv2(x))\n        x = F.relu(self.conv3(x))\n        # flatten the tensor\n        x = self.flatten(x)\n        # get the mean and log variance of the latent space distribution\n        z_mean = self.fc_mean(x)\n        z_log_var = self.fc_log_var(x)\n        # sample a latent vector using the reparameterization trick\n        z = self.sampling(z_mean, z_log_var)\n        return z_mean, z_log_var, z<\/pre>\n<p>Next, we define the encoder of the VAE. It consists of convolutional layers for feature extraction, fully connected layers that map the extracted features to the parameters of the latent distribution, and a sampling layer that draws latent vectors.<\/p>\n<p>On <strong>Line 23<\/strong>, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Encoder<\/code> class as a subclass of PyTorch&#8217;s <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.Module<\/code>.<\/p>\n<p>On <strong>Lines 24-43<\/strong>, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">__init__<\/code> method sets up the encoder&#8217;s layers. It takes two parameters: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">image_size<\/code>, the height and width of the (square) input images, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">embedding_dim<\/code>, the dimensionality of the latent space. 
The encoder consists of the following layers:<\/p>\n<ul>\n<li><strong>Convolutional Layers:<\/strong> Three convolutional layers handle downsampling and feature extraction. Each uses a stride of 2, halving the spatial dimensions, while the number of channels grows from 1 to 32, 64, and 128.<\/li>\n<li><strong>Flatten Layer:<\/strong> A flatten layer converts each sample&#8217;s 3D feature map (channels, height, width) from the convolutional layers into a 1D vector.<\/li>\n<li><strong>Fully Connected Layers:<\/strong> Two fully connected layers map the flattened vector to the mean and log variance of the latent space distribution.<\/li>\n<li><strong>Sampling Layer:<\/strong> An instance of the previously defined <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Sampling<\/code> class handles the reparameterization trick.<\/li>\n<\/ul>\n<p>On <strong>Lines 45-57<\/strong>, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">forward<\/code> method defines the forward pass of the encoder. It takes an input tensor <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x<\/code> (a batch of images) and returns three outputs: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z<\/code>. 
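<\/p>\n<p>The spatial arithmetic behind <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">128 * (image_size \/\/ 8) * (image_size \/\/ 8)<\/code> can be checked with a short sketch. It rebuilds only the encoder&#8217;s convolutional stack as a stand-in (not the lesson&#8217;s <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Encoder<\/code> class) and assumes 32&#215;32 inputs, since the training script pads each 28&#215;28 Fashion-MNIST image by 2 pixels on every side.<\/p>

```python
import torch
import torch.nn as nn

# assumption: 32x32 inputs (28x28 Fashion-MNIST images padded by 2 pixels per side)
image_size = 32

# stand-in mirroring the encoder's three convolutional layers (not the lesson's Encoder)
conv_stack = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
)

x = torch.zeros(16, 1, image_size, image_size)  # dummy batch of 16 grayscale images
features = conv_stack(x)
flat = nn.Flatten()(features)

print(features.shape)  # torch.Size([16, 128, 4, 4])
print(flat.shape)      # torch.Size([16, 2048])
# in_features of fc_mean and fc_log_var in the lesson's Encoder
print(128 * (image_size // 8) * (image_size // 8))  # 2048
```

<p>Each stride-2 convolution with a 3&#215;3 kernel and padding 1 halves the spatial size (32, 16, 8, then 4), so the flattened vector has 128 &#215; 4 &#215; 4 = 2048 elements. Under this assumption, the shape before flattening is (128, 4, 4), which is presumably what <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.SHAPE_BEFORE_FLATTENING<\/code> holds.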
<\/p>\n<p>The forward pass applies the following layers and activation functions:<\/p>\n<ul>\n<li><strong>Convolutional Layers with ReLU Activation:<\/strong> The input is passed through the three convolutional layers, each followed by a ReLU activation function.<\/li>\n<li><strong>Flattening:<\/strong> Each sample&#8217;s 3D feature map is flattened into a 1D vector.<\/li>\n<li><strong>Mean and Log Variance:<\/strong> The flattened tensor is passed through the fully connected layers to obtain the mean (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code>) and log variance (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>) of the latent space distribution.<\/li>\n<li><strong>Sampling:<\/strong> The sampling layer is called with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code> to draw a latent vector <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z<\/code> using the reparameterization trick.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"60\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"10\"># define the decoder\nclass Decoder(nn.Module):\n    def __init__(self, embedding_dim, shape_before_flattening):\n        super(Decoder, self).__init__()\n        # define a fully connected layer to transform the latent vector back to\n        # the shape before flattening\n        self.fc = nn.Linear(\n            embedding_dim,\n            shape_before_flattening[0]\n            * shape_before_flattening[1]\n            * shape_before_flattening[2],\n        )\n        # define a reshape function to reshape the tensor back to its original\n        # shape\n        self.reshape 
= lambda x: x.view(-1, *shape_before_flattening)\n        # define the transposed convolutional layers for the decoder to upsample\n        # and generate the reconstructed image\n        self.deconv1 = nn.ConvTranspose2d(\n            128, 64, 3, stride=2, padding=1, output_padding=1\n        )\n        self.deconv2 = nn.ConvTranspose2d(\n            64, 32, 3, stride=2, padding=1, output_padding=1\n        )\n        self.deconv3 = nn.ConvTranspose2d(\n            32, 1, 3, stride=2, padding=1, output_padding=1\n        )\n\n    def forward(self, x):\n        # pass the latent vector through the fully connected layer\n        x = self.fc(x)\n        # reshape the tensor\n        x = self.reshape(x)\n        # apply transposed convolutional layers with relu activation function\n        x = F.relu(self.deconv1(x))\n        x = F.relu(self.deconv2(x))\n        # apply the final transposed convolutional layer with a sigmoid\n        # activation to generate the final output\n        x = torch.sigmoid(self.deconv3(x))\n        return x<\/pre>\n<p>On <strong>Line 61<\/strong>, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Decoder<\/code> class, which, like <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Encoder<\/code>, inherits from <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.Module<\/code>. The decoder reverses the encoder&#8217;s mapping: it takes a latent vector as input and produces an image as output.<\/p>\n<p>On <strong>Lines 62-85<\/strong>, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">__init__<\/code> method sets up the decoder&#8217;s layers. 
It takes two parameters: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">embedding_dim<\/code> (the dimensionality of the latent space) and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">shape_before_flattening<\/code> (the shape of the encoder&#8217;s feature map just before it was flattened). The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">__init__<\/code> method defines the following layers and helpers:<\/p>\n<ul>\n<li><strong>Fully Connected Layer:<\/strong> The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">self.fc<\/code> layer maps the latent vector to a flat vector with as many elements as the encoder&#8217;s feature map had before flattening.<\/li>\n<li><strong>Reshape Function:<\/strong> The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">self.reshape<\/code> lambda function reshapes that flat vector back into the 3D feature-map shape.<\/li>\n<li><strong>Transposed Convolutional Layers:<\/strong> Three transposed convolutional layers (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv1<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv2<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv3<\/code>) upsample the tensor and generate the reconstructed image. Each has a stride of 2, doubling the spatial dimensions, while the number of channels shrinks from 128 to 64, 32, and finally 1.<\/li>\n<\/ul>\n<p>On <strong>Lines 87-98<\/strong>, the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">forward<\/code> method defines the forward pass of the decoder. It takes a latent vector <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x<\/code> as input and returns the reconstructed image. 
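<\/p>\n<p>The transposed convolutions undo the encoder&#8217;s downsampling. Using the output-size formula from the PyTorch <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.ConvTranspose2d<\/code> documentation, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">out = (in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1<\/code>, each layer here maps a spatial size of n to 2n. The sketch below checks this, assuming a (128, 4, 4) feature map after reshaping (the shape implied by 32&#215;32 inputs); it mirrors the decoder&#8217;s layers as a stand-in rather than using the lesson&#8217;s <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Decoder<\/code> class.<\/p>

```python
import torch
import torch.nn as nn


def conv_transpose_out(size, kernel_size=3, stride=2, padding=1, output_padding=1, dilation=1):
    # output size along one spatial dimension, per the nn.ConvTranspose2d docs
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


# follow the spatial size through the decoder's three transposed convolutions
sizes = [4]
for _ in range(3):
    sizes.append(conv_transpose_out(sizes[-1]))
print(sizes)  # [4, 8, 16, 32]

# stand-in mirroring the decoder's transposed convolutions (not the lesson's Decoder)
deconv_stack = nn.Sequential(
    nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
)
out = deconv_stack(torch.zeros(2, 128, 4, 4))  # assumed (128, 4, 4) map after reshaping
print(out.shape)  # torch.Size([2, 1, 32, 32])
```

<p>Without <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">output_padding=1<\/code>, each layer would produce 2n - 1 instead of 2n (for example, 7 instead of 8), and the output would no longer line up with the 32&#215;32 input.<\/p>\n<p>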
During the forward pass, the decoder applies the following layers and functions:<\/p>\n<ul>\n<li><strong>Fully Connected Layer:<\/strong> The latent vector is passed through the fully connected layer to expand it to the size of the pre-flattening feature map.<\/li>\n<li><strong>Reshaping:<\/strong> The tensor is reshaped to its 3D feature-map shape using the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">self.reshape<\/code> function.<\/li>\n<li><strong>Transposed Convolutional Layers with ReLU Activation:<\/strong> The tensor is passed through two transposed convolutional layers (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv1<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv2<\/code>), each followed by a ReLU activation function.<\/li>\n<li><strong>Final Transposed Convolutional Layer with Sigmoid Activation:<\/strong> The tensor is passed through the final transposed convolutional layer (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">deconv3<\/code>) and then through a sigmoid activation function. 
The sigmoid squashes the outputs into the range between <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">0<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">1<\/code>, matching the normalized pixel values and the input range that <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">nn.BCELoss<\/code> expects.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"101\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"11\"># define the vae class\nclass VAE(nn.Module):\n    def __init__(self, encoder, decoder):\n        super(VAE, self).__init__()\n        # initialize the encoder and decoder\n        self.encoder = encoder\n        self.decoder = decoder\n\n    def forward(self, x):\n        # pass the input through the encoder to get the latent vector\n        z_mean, z_log_var, z = self.encoder(x)\n        # pass the latent vector through the decoder to get the reconstructed\n        # image\n        reconstruction = self.decoder(z)\n        # return the mean, log variance and the reconstructed image\n        return z_mean, z_log_var, reconstruction<\/pre>\n<p>Finally, with the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Encoder<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Decoder<\/code> classes defined, we combine them into a unified <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">VAE<\/code> class.<\/p>\n<p>On <strong>Lines 103-107<\/strong>, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">__init__<\/code> method initializes the VAE with the provided encoder and decoder, which are passed as arguments when creating an instance of the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">VAE<\/code> class. 
The provided encoder and decoder are assigned to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">self.encoder<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">self.decoder<\/code>, respectively, and are used in the forward pass. Because they are assigned as attributes, PyTorch registers them as submodules, so calls such as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vae.parameters()<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vae.to()<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vae.state_dict()<\/code> cover both networks.<\/p>\n<p>On <strong>Lines 109-116<\/strong>, the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">forward<\/code> method defines the forward pass of the VAE. It takes an input tensor <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x<\/code> (representing a batch of images) and returns three outputs: <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">z_mean<\/code>, <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">z_log_var<\/code>, and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">reconstruction<\/code>.<\/p>\n<ul>\n<li><strong>Encoder:<\/strong> The input <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x<\/code> is passed through the encoder, which returns the mean (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_mean<\/code>), log variance (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z_log_var<\/code>), and a sample (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z<\/code>) from the latent space distribution.<\/li>\n<li><strong>Decoder:<\/strong> The sampled latent vector <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">z<\/code> is then passed through the decoder to produce the reconstructed image (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">reconstruction<\/code>).<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Training\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3Training\"><strong>Training the Variational Autoencoder<\/strong><\/a><\/h3>\n<p>In this section, we set up 
and train a VAE on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset. We first preprocess the data and initialize the model, optimizer, and scheduler; we then train the model for a specified number of epochs, saving the weights that achieve the lowest validation loss.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"12\"># USAGE\n# python train.py\n\n# import the necessary packages\nfrom pyimagesearch import config, network, utils\nfrom torchvision import datasets, transforms\nimport torch.optim as optim\nimport torch\nimport os\n\nimport matplotlib\n\n# change the backend based on the non-gui backend available\nmatplotlib.use(\"agg\")<\/pre>\n<p>On <strong>Lines 5-11<\/strong>, we import the necessary modules: <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">network<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils<\/code> from the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pyimagesearch<\/code> package, along with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torchvision<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.optim<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">os<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">matplotlib<\/code>.<\/p>\n<p><strong>Line 14<\/strong> sets Matplotlib&#8217;s backend to &#8220;agg&#8221;, a non-GUI backend suitable for scripts and headless servers.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"17\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"13\"># define the transformation to be applied to the data\ntransform = 
transforms.Compose(\n    [transforms.Pad(padding=2), transforms.ToTensor()]\n)\n\n# load the FashionMNIST training data and create a dataloader\ntrainset = datasets.FashionMNIST(\n    \"data\", train=True, download=True, transform=transform\n)\ntrain_loader = torch.utils.data.DataLoader(\n    trainset, batch_size=config.BATCH_SIZE, shuffle=True\n)\n\n# load the FashionMNIST test data and create a dataloader\ntestset = datasets.FashionMNIST(\n    \"data\", train=False, download=True, transform=transform\n)\ntest_loader = torch.utils.data.DataLoader(\n    testset, batch_size=config.BATCH_SIZE, shuffle=True\n)<\/pre>\n<p><strong>Lines 18-20<\/strong> define a transformation pipeline to preprocess the images. Each image is padded by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">2<\/code> pixels on every side, turning the 28&#215;28 Fashion-MNIST images into 32&#215;32 images (a size divisible by 8, as the encoder&#8217;s three stride-2 convolutions assume), and is then converted to a tensor with pixel values scaled to [0, 1].<\/p>\n<p>Then, on <strong>Lines 23-36<\/strong>:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> training and test datasets are loaded (and downloaded on first use) with <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">datasets.FashionMNIST<\/code>.<\/li>\n<li>Each image is preprocessed with the transformation defined above.<\/li>\n<li>DataLoaders for both training and test datasets are created. 
These will be used to iterate over the datasets in batches.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"38\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"14\"># instantiate the encoder and decoder models\nencoder = network.Encoder(config.IMAGE_SIZE, config.EMBEDDING_DIM).to(\n    config.DEVICE\n)\ndecoder = network.Decoder(\n    config.EMBEDDING_DIM, config.SHAPE_BEFORE_FLATTENING\n).to(config.DEVICE)\n# pass the encoder and decoder to VAE class\nvae = network.VAE(encoder, decoder)<\/pre>\n<p>On <strong>Lines 39-46<\/strong>:<\/p>\n<ul>\n<li>Instances of the encoder and decoder are created with settings from the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config<\/code> module and moved to the configured device.<\/li>\n<li>These instances are then passed to the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">VAE<\/code> class to create the VAE model.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"48\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"15\"># instantiate optimizer and scheduler\noptimizer = optim.Adam(\n    list(encoder.parameters()) + list(decoder.parameters()), lr=config.LR\n)\nscheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(\n    optimizer, mode=\"min\", factor=0.1, patience=config.PATIENCE, verbose=True\n)<\/pre>\n<p>On <strong>Lines 49-51<\/strong>, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">Adam<\/code> optimizer is initialized with the parameters of both the encoder and decoder.<\/p>\n<p>A learning rate scheduler is also initialized to reduce the learning rate when the validation loss plateaus 
(<strong>Lines 52-54<\/strong>): with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">factor=0.1<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ReduceLROnPlateau<\/code> multiplies the learning rate by 0.1 once the validation loss has stopped improving for more than <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.PATIENCE<\/code> epochs.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"68\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"16\"># initialize the best validation loss as infinity\nbest_val_loss = float(\"inf\")\n\n# start training by looping over the number of epochs\nfor epoch in range(config.EPOCHS):\n    # set the vae model to train mode\n    # and move it to CPU\/GPU\n    vae.train()\n    vae.to(config.DEVICE)\n\n    running_loss = 0.0\n    # loop over the batches of the training dataset\n    for batch_idx, (data, _) in enumerate(train_loader):\n        data = data.to(config.DEVICE)\n        optimizer.zero_grad()\n\n        # forward pass through the VAE\n        pred = vae(data)\n\n        # compute the VAE loss\n        loss = utils.vae_loss(pred, data)\n\n        # backward pass and optimizer step\n        loss.backward()\n        optimizer.step()\n\n        running_loss += loss.item()<\/pre>\n<p>Next comes the training loop, which runs for <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.EPOCHS<\/code> epochs.<\/p>\n<p>Within each epoch:<\/p>\n<ul>\n<li>The VAE model is set to training mode and moved to the appropriate device (CPU\/GPU).<\/li>\n<li>The training dataset is iterated over in batches; the labels are ignored, since each input image serves as its own reconstruction target.<\/li>\n<li>For each batch, the data is passed through the VAE, and the loss is computed using the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils.vae_loss<\/code> function.<\/li>\n<li>Gradients are backpropagated, and the optimizer updates the model parameters.<\/li>\n<li>The training loss is accumulated.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" 
data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"96\" data-enlighter-title=\"A Deep Dive into Variational Autoencoders with PyTorch\" data-enlighter-group=\"17\">    # compute average loss for the epoch\n    train_loss = running_loss \/ len(train_loader)\n    # compute validation loss for the epoch\n    val_loss = utils.validate(vae, test_loader)\n    # print training and validation loss at every 20 epochs\n    if epoch % 20 == 0 or (epoch+1) == config.EPOCHS:\n        print(\n            f\"Epoch {epoch} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f}\"\n        )\n\n    # save best vae model weights based on validation loss\n    if val_loss &lt; best_val_loss:\n        best_val_loss = val_loss\n        torch.save(\n            {\"vae\": vae.state_dict()},\n            config.MODEL_WEIGHTS_PATH,\n        )\n    # adjust learning rate based on the validation loss\n    scheduler.step(val_loss)<\/pre>\n<p>Continuing the training loop:<\/p>\n<ul>\n<li>After all batches are processed, the average training loss for the epoch is computed.<\/li>\n<li>The model is then evaluated with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils.validate<\/code> on the test set, which serves as the validation set here.<\/li>\n<li>Every <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">20<\/code> epochs, and on the last epoch, the training and validation losses are printed.<\/li>\n<li>If the current epoch&#8217;s validation loss is the best so far, the model&#8217;s weights are saved to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.MODEL_WEIGHTS_PATH<\/code>.<\/li>\n<li>The learning rate scheduler adjusts the learning rate based on the validation loss.<\/li>\n<\/ul>\n<p>This completes the training of a variational autoencoder on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset. 
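<\/p>\n<p>Before running any post-training analysis, the best checkpoint would typically be loaded back into the model. The following minimal sketch shows that round trip for the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">{\"vae\": vae.state_dict()}<\/code> format used above; it uses a tiny hypothetical stand-in network and a temporary file in place of the real VAE and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.MODEL_WEIGHTS_PATH<\/code>.<\/p>

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # hypothetical stand-in for the trained VAE; any nn.Module saves and loads the same way
    return nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 4))


torch.manual_seed(0)
trained = build_model()

# a temporary file stands in for config.MODEL_WEIGHTS_PATH
path = os.path.join(tempfile.mkdtemp(), 'vae.pt')

# save with the same layout as the training script: the state_dict stored under 'vae'
torch.save({'vae': trained.state_dict()}, path)

# later, e.g., in an evaluation script: rebuild the architecture, then restore the weights
restored = build_model()
checkpoint = torch.load(path, map_location='cpu')
restored.load_state_dict(checkpoint['vae'])
restored.eval()  # inference mode (matters for layers such as dropout or batch norm)

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.equal(trained(x), restored(x))
print(same_output)
```

<p>With the real model, the same pattern applies: rebuild the encoder, decoder, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">VAE<\/code> with the same configuration, call <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">load_state_dict<\/code> on the saved entry, and switch to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">eval()<\/code> mode before evaluating. Passing <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">map_location<\/code> lets weights saved on a GPU be loaded on a CPU-only machine.<\/p>\n<p>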
In the following section, we&#8217;ll evaluate the trained variational autoencoder in several testing scenarios.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3PostTraining\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3PostTraining\"><strong>Post-Training Analysis of Variational Autoencoder<\/strong><\/a><\/h3>\n<p>After training our Variational Autoencoder (VAE) on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset, we examine what it has actually learned. These experiments show how well the VAE encodes images, how its latent space is organized, and whether it can produce outputs that are new yet still resemble the training data.<\/p>\n<p>Beyond measuring the model&#8217;s quality, these evaluations also illustrate the VAE&#8217;s practical applications. 
These range from refining existing images to generating entirely new ones.<\/p>\n<p>Here are the experiments we conducted:<\/p>\n<ul>\n<li>Evaluating the VAE&#8217;s image reconstructions after training.<\/li>\n<li>Comparing the latent space distribution of a Convolutional Autoencoder from a <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">previous blog post<\/a> with that of our VAE.<\/li>\n<li>Visualizing the latent space of the trained VAE in detail.<\/li>\n<li>Generating a series of images linearly spaced on the VAE&#8217;s embeddings.<\/li>\n<li>Visualizing reconstructions from points linearly sampled within the latent space.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Reconstruction\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Reconstruction\"><strong>Reconstruction by Variational Autoencoder After Training<\/strong><\/a><\/h4>\n<p>Our first check after training is how well the VAE reconstructs its inputs. By feeding it a set of validation images and comparing the outputs with the originals, we can see how effectively the VAE has captured the structure of the dataset. 
As shown in <strong>Figure 6<\/strong>, the reconstructed images appear quite realistic and closely resemble the originals, showing that the VAE reproduces the main visual patterns of each item with little loss of detail.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh5.googleusercontent.com\/edKqYfw2GJyMG--hq_FiMoZ6-FtnwHVf09z02RYSBuaDVROLu-8kIjTpFxPcRpYSIliD3nD75jKsLU87fHnEibSaumY6lJTicAkFBnfXhqeFn4ZCfSMMDfSHRaJfFHy9MQZoa0Vk9Xyq4xU0Ij2MU4U\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/edKqYfw2GJyMG--hq_FiMoZ6-FtnwHVf09z02RYSBuaDVROLu-8kIjTpFxPcRpYSIliD3nD75jKsLU87fHnEibSaumY6lJTicAkFBnfXhqeFn4ZCfSMMDfSHRaJfFHy9MQZoa0Vk9Xyq4xU0Ij2MU4U\" alt=\"\" width=\"700\" height=\"422\"\/><\/a><figcaption><strong>Figure 6:<\/strong> <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> images reconstructed by the trained Variational Autoencoder (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>Furthermore, the reconstructions preserve the visual differences between classes, such as between sneakers and ankle boots. This diversity suggests that our VAE does not suffer from mode collapse, a common failure in which a generative model produces similar outputs for inputs from different classes.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Visualize\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Visualize\"><strong>Visualize the Distribution of the Latent Space of Trained Convolutional Autoencoder vs. Variational Autoencoder<\/strong><\/a><\/h4>\n<p>In a <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">previous blog post<\/a>, we explored the Convolutional Autoencoder (CAE).
When we compare its latent space with that of our VAE, clear differences emerge. As shown in <strong>Figure 7<\/strong>, the CAE&#8217;s latent space does not closely follow a normal distribution, whereas the VAE&#8217;s latent space (<strong>Figure 8<\/strong>) is close to a standard normal distribution centered around <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">0<\/code>.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><a href=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2023\/09\/image.png\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image-1024x509.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"\" class=\"wp-image-41412\" width=\"700\" height=\"348\" srcset=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image.png?size=126x63&amp;lossy=2&amp;strip=1&amp;webp=1 126w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image-300x149.png?lossy=2&amp;strip=1&amp;webp=1 300w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image.png?size=378x188&amp;lossy=2&amp;strip=1&amp;webp=1 378w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image.png?size=504x251&amp;lossy=2&amp;strip=1&amp;webp=1 504w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image.png?size=630x313&amp;lossy=2&amp;strip=1&amp;webp=1 630w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image-768x382.png?lossy=2&amp;strip=1&amp;webp=1 768w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image-1024x509.png?lossy=2&amp;strip=1&amp;webp=1 1024w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image.png?lossy=2&amp;strip=1&amp;webp=1 1080w, 
https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/image-1536x763.png?lossy=2&amp;strip=1&amp;webp=1 1536w\" sizes=\"(max-width: 630px) 100vw, 630px\" \/><\/a><figcaption><strong>Figure 7:<\/strong> Latent Space Distribution of Convolutional Autoencoder trained on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh3.googleusercontent.com\/YkrDhAV8bx5HCRkOYGErMVaE0ksPSP1DLdj1rsxXYlyojGS7W0--5STp0M8FT2JaeNGlwVoTuYFSSuJpopeRro8emj5geyyFOVm6cubZhgjNHThjrkikBR1tw7u2Lyjwz-yqAvIvB8iwN5iao-ydpDA\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/YkrDhAV8bx5HCRkOYGErMVaE0ksPSP1DLdj1rsxXYlyojGS7W0--5STp0M8FT2JaeNGlwVoTuYFSSuJpopeRro8emj5geyyFOVm6cubZhgjNHThjrkikBR1tw7u2Lyjwz-yqAvIvB8iwN5iao-ydpDA\" alt=\"\" width=\"700\" height=\"349\"\/><\/a><figcaption><strong>Figure 8:<\/strong> Latent Space Distribution of Variational Autoencoder trained on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>Because the VAE&#8217;s latent codes approximately follow a normal distribution, its latent space is more continuous and densely populated, which typically leads to better and more consistent image generation. Both models learn a compressed representation of the data&#8217;s underlying structure, but they organize that representation in very different ways.
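The contrast between Figures 7 and 8 can also be summarized numerically: compute the mean and standard deviation of each latent dimension and check how far they are from 0 and 1. The following minimal sketch assumes the encoder outputs have already been collected as a list of 2D latent vectors; here they are replaced by synthetic data drawn to mimic a VAE-like and a CAE-like latent space, so the printed values are illustrative only, and the function names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def latent_summary(points):
    """Return (mean, std) for each dimension of a list of equal-length latent vectors."""
    dims = zip(*points)  # transpose: one sequence of values per latent dimension
    return [(fmean(d), pstdev(d)) for d in dims]


def max_deviation_from_standard_normal(points):
    """Largest |mean - 0| or |std - 1| across all latent dimensions."""
    return max(max(abs(m), abs(s - 1.0)) for m, s in latent_summary(points))


rng = random.Random(0)
# Synthetic stand-ins (NOT real model outputs): VAE-like codes drawn from N(0, I),
# CAE-like codes with an arbitrary offset and scale per dimension.
vae_like = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10_000)]
cae_like = [(rng.gauss(4.0, 3.0), rng.gauss(-2.0, 0.5)) for _ in range(10_000)]

for name, codes in [("VAE-like", vae_like), ("CAE-like", cae_like)]:
    stats = ", ".join(f"mean={m:+.2f} std={s:.2f}" for m, s in latent_summary(codes))
    print(f"{name}: {stats} | max deviation from N(0, 1): "
          f"{max_deviation_from_standard_normal(codes):.2f}")
```

Matching the first two moments does not prove a distribution is normal, but a large deviation, as in the CAE-like case, is a quick sign that it is not a standard normal.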
This comparison highlights the distinct characteristics and strengths of each model.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Latent\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Latent\"><strong>Latent Space Plot of Trained Variational Autoencoder<\/strong><\/a><\/h4>\n<p>Visualizing the latent space of our VAE reveals a great deal about what it has learned. By plotting this space with each point colored by clothing type, as shown in <strong>Figure 9<\/strong>, we can identify clusters, patterns, and potential relationships between classes. Each point represents the compressed encoding of one image, and its location reflects that image&#8217;s features.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh3.googleusercontent.com\/yzZ2-KXIPRb3OsdQaVHfJuNOZtd4FaNV13qIYv-ETPcvEd8w3M6lTDxLPubdAHlTOI-kO1twpyMXiQh7_iBJhb10u2ISFDKWOTr-mCvwmedV-W2Oyg0DMT8kZ45rAeCMwVuE0qp0AQQ9thD1mTPl1HU\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/yzZ2-KXIPRb3OsdQaVHfJuNOZtd4FaNV13qIYv-ETPcvEd8w3M6lTDxLPubdAHlTOI-kO1twpyMXiQh7_iBJhb10u2ISFDKWOTr-mCvwmedV-W2Oyg0DMT8kZ45rAeCMwVuE0qp0AQQ9thD1mTPl1HU\" alt=\"\" width=\"484\" height=\"500\"\/><\/a><figcaption><strong>Figure 9:<\/strong> Latent Space Plot of Variational Autoencoder trained on the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>Images with the same class label tend to form clusters, as we observed with the Convolutional Autoencoder. This clustering persists even though the VAE&#8217;s loss adds a KL divergence term to the reconstruction loss.
The KL term pushes the distribution of latent codes toward a standard normal distribution centered around <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code>. As a result, most latent values fall roughly between <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">-3<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">3<\/code>, since about 99.7% of a standard normal distribution&#8217;s probability mass lies within three standard deviations of the mean.<\/p>\n<p>Notably, the labels were not used during training; the VAE learned to group the clothing types on its own while minimizing its training loss. Exploring this space offers insight into how the VAE works internally and which relationships among the dataset images it has inferred.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Linearly\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Linearly\"><strong>Linearly Separated Images (Grid) on Embeddings of Trained Variational Autoencoder<\/strong><\/a><\/h4>\n<p>In our <a href=\"https:\/\/pyimg.co\/t0noi\"  rel=\"noreferrer noopener\">previous tutorial<\/a> on Convolutional Autoencoders (CAEs), we observed certain limitations in the latent space. Notably, some regions contained few or no encoded images, leaving voids. These voids posed challenges when generating new images, as points sampled from these areas often produced poorly formed or unrecognizable outputs. Additionally, the CAE imposes no particular distribution on its latent space, which made it difficult to decide where to sample from.<\/p>\n<p>In our experiments with the Variational Autoencoder (VAE), the latent space looks markedly different. One of the defining characteristics of VAEs is that they encourage a continuous latent space, primarily through the KL divergence term in their loss function.
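For an encoder that outputs a mean μ and a log-variance log σ² for each latent dimension, the KL divergence from the diagonal Gaussian posterior to the standard normal prior has the closed form KL = ½ Σ (μ² + σ² - 1 - log σ²), summed over latent dimensions. It is zero only when every μ is 0 and every σ is 1, which is what pulls the codes toward the origin. The sketch below evaluates this expression in plain Python and also computes the probability mass of a standard normal within ±k, which for k = 3 explains why latent values mostly fall between -3 and 3. It assumes the encoder predicts log-variances (a common convention that may differ from the tutorial's code), and the function names are illustrative.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def standard_normal_mass_within(k):
    """P(-k <= Z <= k) for Z ~ N(0, 1)."""
    return math.erf(k / math.sqrt(2.0))


print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))   # posterior equals the prior
print(kl_to_standard_normal([2.0, -1.0], [0.0, 0.0]))  # means away from 0 are penalized
print(kl_to_standard_normal([0.0], [math.log(4.0)]))   # variance != 1 is penalized too
print(f"{standard_normal_mass_within(3.0):.4f}")
```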
This continuity means that points sampled from the region where the latent codes are concentrated usually decode into meaningful images.<\/p>\n<p>Compared with the CAE, the VAE&#8217;s latent space shows a significant reduction in voids or &#8220;empty spaces,&#8221; as shown in <strong>Figure 10<\/strong>. Images decoded from the VAE&#8217;s latent space are more coherent and of higher quality. When we linearly sample points between two images, the VAE produces a smooth transition through plausible intermediate images. With the CAE, by contrast, linear sampling could lead to abrupt or unrecognizable transitions.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh4.googleusercontent.com\/slbFy8j3HKQJ3681sUTZw8rlh5LZfzIaWh-leYjmBUPxUBjPdiL-2FSRxEsg3Ba0CLLTTrJzMWj2CgWqfacyHJfqD9u2NPTrjU2iHuUsVARmA0xAd9z_OYt3hiLEUEvOC9Kf_kC_AAspD8PAwiH0VTE\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/slbFy8j3HKQJ3681sUTZw8rlh5LZfzIaWh-leYjmBUPxUBjPdiL-2FSRxEsg3Ba0CLLTTrJzMWj2CgWqfacyHJfqD9u2NPTrjU2iHuUsVARmA0xAd9z_OYt3hiLEUEvOC9Kf_kC_AAspD8PAwiH0VTE\" alt=\"\" width=\"593\" height=\"500\"\/><\/a><figcaption><strong>Figure 10:<\/strong> A grid of images decoded by the trained Variational Autoencoder&#8217;s decoder, superimposed on the embeddings of the original dataset images, with each class shown in a different color (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>Furthermore, the VAE&#8217;s latent space is centered around zero and approximately follows a standard normal distribution, which provides a clear guideline for where to sample.
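Because most of the probability mass lies within a few standard deviations of the origin, a grid for a 2D latent space like the one in Figure 10 can be built from linearly spaced values over a fixed range, and a transition between two images can be produced by linearly interpolating between their latent codes. The sketch below shows both steps in plain Python. The range [-3, 3], the grid size, and the helper names are assumptions for illustration; each resulting point would then be passed to the trained decoder (not shown) to obtain an image.

```python
def linspace(start, stop, num):
    """num evenly spaced values from start to stop, both inclusive (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def latent_grid(low=-3.0, high=3.0, n=5):
    """Row-major list of (z1, z2) points on an n x n grid covering [low, high]^2.

    Rows run from high to low in z2 so the grid matches an image laid out top-down.
    """
    xs = linspace(low, high, n)
    ys = linspace(high, low, n)
    return [(x, y) for y in ys for x in xs]


def interpolate(z_a, z_b, steps):
    """Points on the straight line from latent code z_a to z_b, endpoints included."""
    return [
        tuple(a + t * (b - a) for a, b in zip(z_a, z_b))
        for t in linspace(0.0, 1.0, steps)
    ]


grid = latent_grid(n=5)
print(len(grid), grid[0], grid[-1])
print(interpolate((-2.0, 1.0), (2.0, -1.0), steps=5))
```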
This structure resolves a difficulty we faced with the CAE, where any point on the 2D plane was technically a valid choice but came with no guarantee of a meaningful output.<\/p>\n<p>In essence, the VAE addresses many of the limitations we identified with the CAE. Its continuous, structured latent space makes it well suited to generating diverse, high-quality images, bridging the gaps we observed in our earlier experiments with the CAE.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Reconstructions\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h4Reconstructions\"><strong>Reconstructions by the Trained Decoder of Variational Autoencoder Using the Points Sampled from Normal Distribution<\/strong><\/a><\/h4>\n<p>A hallmark of the VAE is its ability to generate novel images by sampling points from a standard normal distribution and decoding them. The trained decoder produces images that do not appear in the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset but are built from the features and patterns it has learned. From t-shirts and trousers to sandals and ankle boots, the VAE generates a wide range of fashion items.<\/p>\n<p><strong>Figure 11<\/strong> shows a set of images decoded from points sampled from a standard normal distribution.
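Sampling new images therefore takes only two steps: draw latent vectors z from N(0, I) and pass them through the trained decoder. The minimal sketch below shows this in plain Python; `toy_decoder` is a hypothetical stand-in for the tutorial's trained PyTorch decoder, and the latent dimension of 2 is an assumption chosen to match the 2D latent plots. In PyTorch, the latent batch would typically be created with `torch.randn(n, latent_dim)` and decoded under `torch.no_grad()`.

```python
import math
import random


def sample_latents(n, latent_dim, seed=None):
    """Draw n latent vectors z ~ N(0, I) of dimension latent_dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n)]


def toy_decoder(z):
    """Hypothetical stand-in for the trained decoder: maps a 2D code to 4 'pixels'."""
    weights = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (-1.0, 1.0)]
    # Sigmoid keeps each pixel in (0, 1), like a typical image decoder's output layer.
    return [1.0 / (1.0 + math.exp(-(w1 * z[0] + w2 * z[1]))) for w1, w2 in weights]


latents = sample_latents(n=8, latent_dim=2, seed=42)
images = [toy_decoder(z) for z in latents]
print(len(images), len(images[0]))
```

Fixing the seed makes a batch of samples reproducible, which is convenient when comparing decoders across training runs.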
The generated images are diverse and largely realistic, and they appear to span the clothing categories of the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh3.googleusercontent.com\/61DkMmmVZhumaBt1enhOVMw_RcaXc6_fUXPDJgPXZHwc27p49w-b3sdcMGo42ku9Km_6THoB61BRTxYHGj6Nmms3jyjXmn146k1aSyoYguKgyjC-MrLVU725lQ-VDcaoD3KankNcdu06xtr26zsNbBM\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/61DkMmmVZhumaBt1enhOVMw_RcaXc6_fUXPDJgPXZHwc27p49w-b3sdcMGo42ku9Km_6THoB61BRTxYHGj6Nmms3jyjXmn146k1aSyoYguKgyjC-MrLVU725lQ-VDcaoD3KankNcdu06xtr26zsNbBM\" alt=\"\" width=\"508\" height=\"500\"\/><\/a><figcaption><strong>Figure 11: <\/strong>Images decoded by the trained Variational Autoencoder from points sampled from a standard normal distribution (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h2Summary\"\/>\n<h2><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h2Summary\"><strong>Summary<\/strong><\/a><\/h2>\n<p>This tutorial offers a deep dive into Variational Autoencoders (VAEs), beginning with their structure, including the roles of the encoder and decoder. We contrast the traditional Convolutional Autoencoder (CAE) with the VAE, emphasizing the significance of the Gaussian distribution in the latter. The tutorial then examines the VAE&#8217;s objective function, highlighting the balance between the reconstruction loss and the KL divergence, and introduces the reparameterization trick, which makes the sampling step differentiable so that the VAE can be trained with backpropagation.<\/p>\n<p>Our dataset of choice is <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a>, a popular collection of fashion item images. We explore its structure, its class distribution, and the preprocessing steps needed to make it suitable for training, and we detail how it is partitioned into training and validation subsets.<\/p>\n<p>For the implementation, we discuss the configuration prerequisites, the essential utility functions, and the architecture of the VAE network.<\/p>\n<p>The core of this tutorial is the VAE training process, which we walk through step by step.
Once the model is trained, we move on to a post-training analysis with a series of experiments, from evaluating the VAE&#8217;s image reconstructions to comparing the latent space distributions of a previously trained Convolutional Autoencoder and our VAE. We also visualize the latent space, decode a grid of linearly spaced latent points, and generate new images from points sampled from a standard normal distribution.<\/p>\n<p>By the end of the tutorial, readers should have a solid understanding of VAEs, including their capabilities in image generation and reconstruction and the practicalities of working with the <a href=\"https:\/\/universe.roboflow.com\/popular-benchmarks\/fashion-mnist-ztryt\/?ref=pyimagesearch\"  rel=\"noreferrer noopener\">Fashion-MNIST<\/a> dataset.<\/p>\n<p>In an upcoming tutorial, we&#8217;ll apply Variational Autoencoders (VAEs) to the CelebA dataset, covering the model architecture, training details, and post-training experiments, including image reconstruction, latent space arithmetic, and other generative capabilities of VAEs. Stay tuned for a deeper dive into VAEs and their applications.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Citation\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/#TOC-h3Citation\"><strong>Citation Information<\/strong><\/a><\/h3>\n<p><strong>Sharma, A.<\/strong> \u201cA Deep Dive into Variational Autoencoders with PyTorch,\u201d <em>PyImageSearch<\/em>, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, and R.
Raha, eds., 2023, <a href=\"https:\/\/pyimg.co\/7e4if\"  rel=\"noreferrer noopener\">https:\/\/pyimg.co\/7e4if<\/a> <\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"classic\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"false\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">@incollection{Sharma_2023_VAE,\n  author = {Aditya Sharma},\n  title = {A Deep Dive into Variational Autoencoders with {PyTorch}},\n  booktitle = {PyImageSearch},\n  editor = {Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha},\n  year = {2023},\n  url = {https:\/\/pyimg.co\/7e4if},\n}<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/\">A Deep Dive into Variational Autoencoders with PyTorch<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/\">PyImageSearch<\/a>.<\/p>\n\n<p class=\"syndicated-attribution\"><figure class= \\\"wp-block-image alignnone \\\"><img src= \\\"http:\/\/itteacheritfreelance.hk\/test\/wordpress\/wp-content\/uploads\/2016\/05\/logo2-2.png\\\" alt=\\\"IT\u96fb\u8166\u88dc\u7fd2 java\u88dc\u7fd2 \u70ba\u5927\u5bb6\u914d\u5c0d\u96fb\u8166\u88dc\u7fd2,IT freelance, \u79c1\u4eba\u8001\u5e2b, PHP\u88dc\u7fd2,CSS\u88dc\u7fd2,XML,Java\u88dc\u7fd2,MySQL\u88dc\u7fd2,graphic design\u88dc\u7fd2,\u4e2d\u5c0f\u5b78ICT\u88dc\u7fd2,\u4e00\u5c0d\u4e00\u79c1\u4eba\u88dc\u7fd2\u548cFreelance\u81ea\u7531\u5de5\u4f5c\u914d\u5c0d\u3002\\\"\/><figcaption>\u7acb\u523b\u8a3b\u518a\u53ca\u5831\u540d\u96fb\u8166\u88dc\u7fd2\u8ab2\u7a0b\u5427!<\/figcaption><\/figure>\r\n<\/br>Find A Teacher Form:\r\n<\/br>https:\/\/docs.google.com\/forms\/d\/1vREBnX5n262umf4wU5U2pyTwvk9O-JrAgblA-wH9GFQ\/viewform?edit_requested=true#responses\r\n<\/br><\/br>Email:\r\n<\/br>public1989two@gmail.com<br><br><br><br><br><br><br>\r\n<a href=www.itsec.hk style=color:#FFFFFF;>www.itsec.hk<\/a><br>\r\n<a href=\\\"www.itsec.vip\\\" 
style=color:#FFFFFF;>www.itsec.vip<\/a><br>\r\n<a href=\\\"www.itseceu.uk\\\" style=color:#FFFFFF;>www.itseceu.uk<\/a><br><\/p>","protected":false},"excerpt":{"rendered":"<div class=\"mh-excerpt\"><p>Table of Contents A Deep Dive into Variational Autoencoder with PyTorch Introduction Comparison with Convolutional Autoencoder Architecture Latent Space Loss Function Why Does VAE Stand Out? Why Does the Encoder of a VAE Follow a Gaussian Distribution? Objective Functions of\u2026<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/2023\/10\/02\/a-deep-dive-into-variational-autoencoders-with-pytorch\/\">A Deep Dive into Variational Autoencoders with PyTorch<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/\">PyImageSearch<\/a>.<\/p>\n<\/div>","protected":false},"author":2019,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"slim_seo":{"title":"A Deep Dive into Variational Autoencoders with PyTorch - ITTeacherITFreelance.hk","description":"Table of Contents A Deep Dive into Variational Autoencoder with PyTorch Introduction Comparison with Convolutional Autoencoder Architecture Latent Space Loss 
Fu"},"footnotes":""},"categories":[10700],"tags":[10717,10719,10718,10720,10721],"_links":{"self":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329274"}],"collection":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/users\/2019"}],"replies":[{"embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/comments?post=329274"}],"version-history":[{"count":1,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329274\/revisions"}],"predecessor-version":[{"id":329275,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329274\/revisions\/329275"}],"wp:attachment":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/media?parent=329274"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/categories?post=329274"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/tags?post=329274"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}