The end to all blurry pictures

An overview and explanation of Image Super-Resolution using GANs.

Aryan Misra

Can you count how many penguins are in this picture?

How about now?

Hmm, I made it bigger but it’s still blurry…

Well, how about now?

Ah! Now we can count the penguins, and it's a perfect example of image super-resolution in action!

Image super-resolution is the task of creating a better-looking, higher-resolution image from a low-resolution picture. Anyone who has put images in a PowerPoint presentation knows that enlarging an image actually makes it look worse. This is because bicubic interpolation is used to upscale the image: a technique that fills in the gaps created when the image is upscaled with weighted averages of the surrounding pixels.

How interpolation-based resizing/enlargement works. Source.

The problem with this is that no new data is being created. Your image isn't gaining any resolution, it's only getting bigger.
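Here's a minimal sketch of that kind of upscaling with Pillow (the file names are made up for illustration). Notice that nothing in it can invent new detail:

```python
from PIL import Image

# Hypothetical file name, just for illustration.
low_res = Image.open("penguins_small.png")

# Enlarge 4x using bicubic interpolation: every new pixel is a weighted
# average of its neighbourhood in the original image, so the result is
# bigger but no sharper.
width, height = low_res.size
upscaled = low_res.resize((width * 4, height * 4), resample=Image.BICUBIC)
upscaled.save("penguins_bicubic.png")
```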

Whether you're just creating a PowerPoint, analyzing mammograms, or doing long-distance facial recognition, super-resolution is super important.

So how do we do super-resolution well? We turn to artificial intelligence! We've seen a ton of recent developments in image super-resolution using Convolutional Neural Networks (check out my article on them here, and check this out if you wanna find out how they're being used for super-resolution). But there are still some issues with this approach: images don't look as crisp and detailed as we'd like. To solve this, SRGANs (Super-Resolution Generative Adversarial Networks; try saying that in one breath) were made.

Comparison of different SR methods. Zoom in to notice how crisp SRGAN is!

Before getting into the super-resolution aspect, here's a quick overview of GANs.

Overview of GANs (Generative Adversarial Networks)

“Given an input dataset, can we make new data that looks like it should be in that dataset?”

Think of a counterfeiter and a museum curator, whose job is to distinguish real artworks from fakes. When both of them are just starting out, the counterfeiter is gonna make tons of mistakes, and the curator will also be bad at telling fakes from real works.

Over time, the counterfeiter tries different techniques and gets better at creating fakes, while the curator finds new strategies to spot them; they're both improving through each other.

Simultaneously, the curator is getting constant feedback from external sources and improving through the use of expert data.

The goal of the counterfeiter is to create artwork that looks real, and the goal of the curator is to be able to always spot the fake paintings.

These two characters compete against each other and represent the two networks in a GAN: the generative network, which creates new images (the counterfeiter), and the discriminative network, which evaluates whether the image from the generator looks real or not (the curator).

Outline of a traditional GAN being trained to generate handwritten digits. Source.

In a traditional GAN, the generator is fed random noise, which it turns into a fake output. In the case of super-resolution, the generator instead takes a low-resolution image as its input and turns it into a higher-resolution one.

The discriminator is trained on real images from the training set alongside the generator's outputs, learning to differentiate between the two.
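To make the counterfeiter/curator picture concrete, here's a rough sketch of one SRGAN-style training step in TensorFlow 2. The `generator` and `discriminator` are assumed to be Keras models you've already built (in SRGAN, a deep residual CNN and a VGG-style classifier, respectively), and the discriminator is assumed to output probabilities:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(generator, discriminator, lr_images, hr_images):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_hr = generator(lr_images, training=True)        # the counterfeiter
        real_pred = discriminator(hr_images, training=True)  # curator on real images
        fake_pred = discriminator(fake_hr, training=True)    # curator on fakes

        # Discriminator: real images should score 1, generated images 0.
        d_loss = (bce(tf.ones_like(real_pred), real_pred) +
                  bce(tf.zeros_like(fake_pred), fake_pred))

        # Generator: fool the discriminator into scoring its output as real.
        # (In full SRGAN the generator would minimize the perceptual loss
        # described later, not just this adversarial term.)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)

    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                                 discriminator.trainable_variables))
    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    return g_loss, d_loss
```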

So why use a GAN?

There have been tons of other methods for doing super-resolution, such as SRResNet and SRCNN. However, these methods all share one problem: poor visual quality, even when the network seems to be performing well numerically.

The culprit is the traditional loss function. A pixel-wise loss like mean squared error measures how mathematically close (in Euclidean distance) the generated image is to the real image, as opposed to how visually close it is. Because many plausible textures sit at roughly the same distance from the ground truth, the network hedges its bets and outputs smooth averages of color in each area, as you can see in the SRResNet image below.

To counteract this, a perceptual loss function was created to measure visual quality instead. This loss is the sum of two different losses: content loss and adversarial loss.
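In code, the combination is about as simple as it sounds. This is only a sketch: the two inputs are the loss values described in the next two sections, and the 1e-3 weighting on the adversarial term is the one used in the SRGAN paper.

```python
# Perceptual loss = content loss + (small) adversarial loss.
def perceptual_loss(content, adversarial):
    return content + 1e-3 * adversarial
```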

Adversarial Loss

One of the huge benefits of using a GAN is that the adversarial loss pushes outputs to look natural. This works because of the fundamental nature of GANs: the discriminator is constantly learning to spot unnatural-looking images, so the generator is penalized whenever it produces one.
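As a rough sketch (again assuming the discriminator outputs probabilities), the generator-side adversarial loss can be written as:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

# Adversarial loss for the generator: push the discriminator's score on
# generated images, D(G(lr)), towards 1 ("looks real"). fake_pred is the
# discriminator's output on a batch of generated images.
def adversarial_loss(fake_pred):
    return bce(tf.ones_like(fake_pred), fake_pred)
```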

Now we've covered our two bases with these two loss functions: getting the reconstructed images to contain detail, and getting them to actually contain the relevant image data.

Content Loss

Content loss compares the fine detail in images by passing the generated and original images through the feature maps of a pretrained CNN and calculating the loss on those outputs.

Let’s break this down.

When we train a convolutional neural network, its convolutional layers perform feature extraction, which is a fancy way of saying that they find patterns and shapes in images. As we go deeper and deeper into the network, we find features of increasing complexity.

Visualization of feature maps.

Okay, cool. Let's analyze what exactly is going on in these feature maps. In the Conv1 layers, a ton of the original information from the image is kept. This is because the initial convolutional layers in a CNN generally act as edge detectors.

Later on in the network, higher-level information is encoded, and we see in Conv 2–4 that the image is starting to get way more abstract. Even though it might look like deeper layers are encoding less information than the initial layers (since they look blurry), they're actually just changing the type of information they contain: from geometric to semantic information.

To make a bit more sense of this, here are some visualizations of what the actual filters in a VGG16 network are looking for (more specifically, an image on which each filter activates the most, code to do this; there's also a rough sketch of the idea a little further below).

Layer 7

woah.

Besides looking absolutely mesmerizing, these give an intuition of what the filters are looking for as we move deeper into the network. In the final row of images, we can easily identify arches, birds, and chains as the objects each filter is looking for.
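For the curious, here's roughly how images like these can be produced: start from a noisy input and repeatedly nudge it in the direction that increases one filter's activation (gradient ascent). This is only a sketch; the layer and filter indices are arbitrary examples, and real visualizations usually add extra regularization to look this clean.

```python
import numpy as np
import tensorflow as tf

# Pretrained VGG16, cut off at an arbitrary intermediate conv layer.
vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
feature_model = tf.keras.Model(vgg.input, vgg.get_layer("block4_conv1").output)

# Start from a gray-ish noisy image (note: no VGG preprocessing here,
# which is fine for a sketch but not for polished results).
image = tf.Variable(tf.random.uniform((1, 224, 224, 3)) * 0.25 + 0.4)

for _ in range(100):
    with tf.GradientTape() as tape:
        activation = feature_model(image)
        loss = tf.reduce_mean(activation[..., 7])  # filter 7, chosen arbitrarily
    grads = tape.gradient(loss, image)
    grads = grads / (tf.norm(grads) + 1e-8)        # normalize the step size
    image.assign_add(grads)                        # gradient-ascent update

result = np.clip(image.numpy()[0] * 255, 0, 255).astype("uint8")
```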

Bringing this back to the content loss: we pass the image reconstructed by the generator and the reference (original) image through these feature maps and compare the fine textural differences between the two, penalizing generated images whose texture looks smoothed out.
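Here's what that could look like as a minimal sketch: a frozen VGG19 supplies the feature maps, and the loss is the mean squared difference between the features of the two images. The choice of `block5_conv4` mirrors the deep VGG features used in the SRGAN paper, but any sufficiently deep layer illustrates the idea.

```python
import tensorflow as tf

# Frozen, pretrained VGG19 used purely as a feature extractor.
vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
feature_extractor = tf.keras.Model(vgg.input,
                                   vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False

def content_loss(hr_images, fake_hr):
    # Images are assumed to be in [0, 1]; VGG expects its own preprocessing.
    hr_feat = feature_extractor(
        tf.keras.applications.vgg19.preprocess_input(hr_images * 255.0))
    sr_feat = feature_extractor(
        tf.keras.applications.vgg19.preprocess_input(fake_hr * 255.0))
    # Mean squared error in feature space rather than pixel space.
    return tf.reduce_mean(tf.square(hr_feat - sr_feat))
```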

The concept of perceptual loss is also used in neural style transfer, which you can learn more about in my article on it!

…and that’s it! That’s the basic idea of a perceptual loss function.

As this technology continues to improve, we can go from cool results like this:

To this:

Key Takeaways:

  • SRGAN images look better than SRCNN images because they contain more fine detail.
  • GANs have two neural networks competing against each other.
  • Perceptual loss measures the visual similarity of two images.

If you enjoyed my article or learned something new, make sure to:

  • Connect with me on LinkedIn.
  • Send me some feedback and comments (aryanmisra4@gmail.com).
  • Check out the SRGAN paper.
  • Also, go check out the code for a Tensorflow 2.0 implementation of SRGAN — which won two prizes in the #PoweredByTF2.0 devpost challenge! (github, devpost)