
DeblurGAN (CVPR 2018)

[GAN] Conditional Generative Adversarial Nets (cGAN) (2014)

Official PyTorch implementation: DeblurGAN (KupynOrest)

Abstract

End-to-end learned method for motion deblurring

Training (learning) is based on a conditional GAN and a content loss

Uses a Wasserstein GAN as the discriminator (critic)

multi-component loss function

Generator architecture + Loss (perceptual loss + WGAN-GP loss) + Critic network architecture + Wasserstein distance

Introduction

Applications of GANs to super-resolution and in-painting

GANs are known to preserve texture details and to produce solutions that lie close to the real image manifold and look perceptually convincing

Inspired by GAN-based SR and image-to-image translation

⇒ Treats deblurring as a special case of image-to-image translation

“An approach to the deblurring task based on a cGAN and a multi-component loss ⇒ DeblurGAN”

Also uses WGAN with a gradient penalty and a perceptual loss (this recovers finer texture details than using MSE or MAE as the optimization target).

[SRGAN] Perceptual Loss (1)

Contributions

We propose a loss and an architecture that obtain state-of-the-art results while being 5 times faster than the fastest competitor.

We present a method based on random trajectories for automatically generating a motion deblurring training dataset from a set of sharp images.

Finally, we present a new dataset and method for evaluating deblurring algorithms based on how much they improve object detection results.

Related Work

Image Deblurring

$I_B = k(M) * I_S + N$

• $I_B$: blurred image

• $k(M)$: unknown blur kernel determined by the motion field $M$

• $I_S$: sharp latent image

• $*$: convolution

• $N$: additive noise
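A minimal NumPy sketch of this degradation model, assuming a single spatially uniform kernel (real motion blur can vary across the image, which is why the kernel depends on the motion field $M$), zero padding at the borders, and Gaussian noise for $N$. The function name `blur` and the noise level are illustrative choices, not part of the paper.

```python
import numpy as np


def blur(sharp, kernel, noise_std=0.01, rng=None):
    """Synthesize I_B = k * I_S + N for a grayscale image and one uniform kernel."""
    rng = np.random.default_rng(0) if rng is None else rng
    kh, kw = kernel.shape
    # Zero-pad so the output keeps the input size ("same" convolution).
    padded = np.pad(sharp, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))  # (H, W, kh, kw)
    # Flipping the kernel makes this a true convolution rather than a correlation.
    blurred = np.einsum("hwij,ij->hw", windows, kernel[::-1, ::-1])
    return blurred + noise_std * rng.standard_normal(sharp.shape)  # additive noise N


# Example: a 1x5 box kernel approximates uniform horizontal camera motion.
sharp = np.zeros((7, 7))
sharp[3, 3] = 1.0
kernel = np.full((1, 5), 1 / 5)
blurred = blur(sharp, kernel, noise_std=0.0)
print(blurred[3])  # the single bright pixel is smeared over five columns
```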

Types of deblurring problems

Blind: the blur kernel $k(M)$ is unknown

Non-blind: assumes the blur kernel $k(M)$ is known

Early work focused mostly on non-blind deblurring. Most of these methods perform deconvolution and rely on the classical Lucy-Richardson algorithm, or on Wiener or Tikhonov filters, to obtain an estimate of $I_S$.
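To make the non-blind setting concrete, here is a sketch of Wiener deconvolution when the kernel is known. It assumes circular (periodic) boundary conditions, so the convolution becomes a pointwise product under the FFT, and a constant noise-to-signal ratio `nsr`; the helper names `psf2otf` and `wiener_deconvolve` are hypothetical.

```python
import numpy as np


def psf2otf(kernel, shape):
    """Zero-pad the kernel to `shape`, move its center to (0, 0), and take the 2D FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deconvolve(blurred, kernel, nsr=1e-3):
    """Non-blind Wiener deconvolution: X = conj(K) * Y / (|K|^2 + nsr)."""
    K = psf2otf(kernel, blurred.shape)
    Y = np.fft.fft2(blurred)
    X = np.conj(K) * Y / (np.abs(K) ** 2 + nsr)
    return np.real(np.fft.ifft2(X))


# Example: blur with a known kernel (circularly), then invert it.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
kernel = np.array([[0.7, 0.3]])
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * psf2otf(kernel, sharp.shape)))
restored = wiener_deconvolve(blurred, kernel, nsr=1e-8)
print(np.abs(restored - sharp).max())  # close to zero: noise-free and the kernel is known
```

With a constant `nsr`, this filter coincides with Tikhonov-regularized deconvolution using an identity regularizer; `nsr = 0` reduces it to plain inverse filtering, which amplifies noise wherever the kernel's frequency response is near zero.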

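In practice, however, $k(M)$ is usually unknown.

Finding a blur function for each pixel is an ill-posed problem.

Most existing algorithms rely on heuristics, image statistics, and assumptions about the sources of the blur.

These methods typically address camera shake, assuming the blur is uniform across the whole image.

First, the camera motion is estimated in terms of the induced blur kernel; then its effect is reversed by a deconvolution operation.

... various successful approaches and further progress ... (omitted)

However, running time and the stopping criterion have always been a problem!

Other methods assume local linearity of the blur function and use simple heuristics to estimate the unknown kernel quickly → fast, but they work well only on a small subset of images

... and the progression to CNN-based methods ... (omitted)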

GAN

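Explanation of GANs and the GAN loss

Mentions the mode collapse and vanishing gradient problems of the vanilla GAN

Discusses the difficulties with the JS divergence and explains WGAN's EM (Earth Mover's) distance and the WGAN loss (Wasserstein loss)

Mentions WGAN's weight clipping to [-c, c] and introduces the gradient penalty term (4) that [11] proposed to use instead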

an alternative way to enforce the Lipschitz constraint
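For reference, the critic loss with this penalty in the notation of the WGAN-GP paper [11]: $\mathbb{P}_r$ and $\mathbb{P}_g$ are the real and generated distributions, and $\hat{x}$ is sampled uniformly on straight lines between pairs of real and generated samples. This $\lambda$ is the penalty weight (10 in the WGAN-GP paper) and is unrelated to the content-loss weight $\lambda = 100$ used by DeblurGAN.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```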

cGAN

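The architecture also known as pix2pix

Unlike the vanilla GAN, which maps a random noise vector z to y, a cGAN learns a mapping from an observed image x and a random noise vector z to y. It conditions the discriminator on x, uses a U-Net for G, and uses a Markovian discriminator; it achieved perceptually superior results on many tasks, such as synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images.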
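The pix2pix objective makes the conditioning explicit: the discriminator scores the pair $(x, y)$ instead of $y$ alone, and the generator is additionally trained with an L1 term. Shown for comparison, in the pix2pix notation:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}\big[\log D(x, y)\big]
    + \mathbb{E}_{x, z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
\\
G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```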

Proposed Method

Goal: recover the sharp image $I_S$ given only the blurred image $I_B$ as input (no information about the blur kernel is provided)

3.1 Loss Function

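The total loss combines the adversarial loss and the content loss, $\mathcal{L} = \mathcal{L}_{GAN} + \lambda \cdot \mathcal{L}_X$, with $\lambda = 100$ in all experiments.

Unlike [16], the discriminator is not conditioned on the input image (D's loss has no condition term).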

[Adversarial Loss]

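Most cGAN papers use the vanilla GAN objective, but least-squares GAN has recently been proposed as an alternative → more stable training and higher-quality results

The WGAN-GP loss from [11] is used as the critic function: it is robust to the choice of generator architecture

Preliminary experiments with different architectures confirmed this and showed that an architecture much lighter than ResNet-152 can be used

The resulting adversarial loss is:

$\mathcal{L}_{GAN} = \sum_{n=1}^{N} -D_{\theta_D}\big(G_{\theta_G}(I_B)\big)$

Without the GAN component DeblurGAN still converges, but it produces smooth, blurry images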
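A NumPy sketch of the two adversarial terms. To avoid an autodiff framework, it assumes a hypothetical toy critic $D(x) = \langle w, x \rangle + \tfrac{c}{2}\lVert x \rVert^2$ whose input gradient $w + c\,x$ is known in closed form; a real critic (a PatchGAN CNN) would get that gradient from autodiff. `fake` stands for $G(I_B)$ and `real` for sharp images $I_S$; the critic minimizes `critic_loss`, the generator minimizes `generator_adv_loss`, and batch means replace the paper's sum.

```python
import numpy as np


def critic(x, w, c):
    """Hypothetical toy critic D(x) = <w, x> + (c / 2) * ||x||^2 on flattened samples."""
    flat = x.reshape(len(x), -1)
    return flat @ w + 0.5 * c * np.sum(flat ** 2, axis=1)


def critic_input_grad(x, w, c):
    """Analytic gradient dD/dx of the toy critic; a real critic would use autodiff."""
    return w + c * x.reshape(len(x), -1)


def critic_loss(real, fake, w, c, gp_weight=10.0, rng=None):
    """WGAN-GP critic loss: E[D(fake)] - E[D(real)] + gp_weight * E[(||grad D(x_hat)|| - 1)^2]."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.uniform(size=(len(real),) + (1,) * (real.ndim - 1))
    x_hat = eps * real + (1.0 - eps) * fake  # random points between real and fake samples
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, w, c), axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    return critic(fake, w, c).mean() - critic(real, w, c).mean() + gp_weight * penalty


def generator_adv_loss(fake, w, c):
    """L_GAN = -D(G(I_B)), averaged over the batch instead of summed."""
    return -critic(fake, w, c).mean()
```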

[Content Loss]

Classic choices for the content loss are the L1 (MAE) or L2 (MSE) loss on raw pixels

→ their drawbacks are explained in [GAN Loss] Perceptual Loss

Instead, DeblurGAN adopts the perceptual loss. The perceptual loss is a simple L2 loss, but it is computed on the difference between the CNN feature maps of the generated image and the target image rather than on pixels.

See [GAN Loss] Perceptual Loss

Thus, with the perceptual loss as the content term of the total loss above, the content loss focuses on restoring general content, while the adversarial loss focuses on restoring texture details.

Without the perceptual loss, or with a simple pixel-wise MSE in its place, training does not converge to a meaningful state
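The content term and its combination with the adversarial term can be sketched as follows. DeblurGAN compares VGG19 feature maps (conv3_3 activations of an ImageNet-pretrained network); here a single fixed 3x3 convolution + ReLU is a hypothetical stand-in so the example stays self-contained, and the mean over all feature elements replaces the paper's per-map normalization, which only changes a constant factor.

```python
import numpy as np


def toy_features(img, filters):
    """Hypothetical stand-in for VGG19 conv3_3: one fixed 'valid' 3x3 conv layer + ReLU."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (3, 3))  # (H-2, W-2, 3, 3)
    return np.maximum(np.einsum("hwij,fij->fhw", windows, filters), 0.0)


def perceptual_loss(restored, sharp, filters):
    """Content loss L_X: mean squared difference of feature maps, not of raw pixels."""
    diff = toy_features(restored, filters) - toy_features(sharp, filters)
    return np.mean(diff ** 2)


def total_loss(l_gan, l_x, lam=100.0):
    """L = L_GAN + lambda * L_X with lambda = 100, as in all experiments."""
    return l_gan + lam * l_x
```

The large weight on $\mathcal{L}_X$ keeps the restored content anchored to the target, while the adversarial term is left to sharpen textures.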

3.2 Network Architecture

[Figure: DeblurGAN generator architecture]

Similar to the architecture proposed for style transfer in [17]

• The generator contains two strided convolution blocks (stride 1/2, i.e. each halves the spatial resolution), nine residual blocks (ResBlocks), and two transposed convolution blocks.

• Each ResBlock consists of a convolution layer, an instance normalization layer, and a ReLU activation.

• Dropout regularization with probability 0.5 is added after the first convolution layer in each ResBlock.

• We introduce a global skip connection, called ResOut.
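A NumPy sketch of one ResBlock forward pass for a single image. It assumes the common two-convolution layout (conv → instance norm → ReLU → dropout → conv → instance norm, plus the identity skip) and reflection padding; the notes do not spell out the second convolution or the padding mode, so treat both as assumptions. Weights are random placeholders.

```python
import numpy as np


def conv3x3(x, w):
    """3x3 convolution with reflection padding; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="reflect")
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3), axis=(1, 2))  # (C_in, H, W, 3, 3)
    return np.einsum("chwij,ocij->ohw", win, w)


def instance_norm(x, eps=1e-5):
    """Normalize each channel of a single image over its spatial dimensions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def res_block(x, w1, w2, p_drop=0.5, train=True, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    h = np.maximum(instance_norm(conv3x3(x, w1)), 0.0)  # conv -> IN -> ReLU
    if train:                                            # dropout after the first conv stage
        keep = rng.random(h.shape) >= p_drop
        h = h * keep / (1.0 - p_drop)                    # inverted dropout keeps the expected value
    h = instance_norm(conv3x3(h, w2))                    # second conv -> IN (no ReLU)
    return x + h                                         # local residual connection


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = 0.1 * rng.standard_normal((8, 8, 3, 3))
w2 = 0.1 * rng.standard_normal((8, 8, 3, 3))
y = res_block(x, w1, w2, train=False)
print(y.shape)  # (8, 16, 16): same shape as the input, so nine blocks can be stacked
```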

The CNN learns a residual correction $I_R$ to the blurred image $I_B$, so that

$I_S = I_B + I_R$

This formulation makes training faster, and the resulting model generalizes better.
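ResOut only works if the generator's output has exactly the input's spatial size. The sketch below tracks that size through the trunk, assuming 3x3 stride-2 convolutions with padding 1 for downsampling and 3x3 stride-2 transposed convolutions with padding 1 and output padding 1 for upsampling (hypothetical layer settings; size-preserving input/output convolutions are omitted). It also assumes images scaled to [-1, 1], so the sum is clipped to that range.

```python
import numpy as np


def conv_out(size, k, s, p):
    return (size + 2 * p - k) // s + 1


def tconv_out(size, k, s, p, output_padding):
    return (size - 1) * s - 2 * p + k + output_padding


def generator_sizes(h, n_down=2, n_res=9):
    """Spatial size after each stage of the generator trunk (square inputs)."""
    sizes = [h]
    for _ in range(n_down):       # strided convs, assumed k=3, s=2, p=1: halve the size
        sizes.append(conv_out(sizes[-1], 3, 2, 1))
    sizes += [sizes[-1]] * n_res  # ResBlocks keep the size
    for _ in range(n_down):       # transposed convs, assumed k=3, s=2, p=1, output_padding=1: double it
        sizes.append(tconv_out(sizes[-1], 3, 2, 1, 1))
    return sizes


def res_out(blurred, residual):
    """ResOut: I_S = I_B + I_R, clipped to an assumed [-1, 1] image range."""
    return np.clip(blurred + residual, -1.0, 1.0)


print(generator_sizes(256)[-1])  # 256: output matches input, so I_B + I_R is well defined
print(generator_sizes(250)[-1])  # 252: mismatch, so input sides should be multiples of 4
```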

The critic network, defined only for the training phase, is a Wasserstein GAN with gradient penalty (WGAN-GP)

The critic network architecture is identical to PatchGAN

Every convolution layer except the last is followed by instance normalization and a Leaky ReLU with $\alpha = 0.2$
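PatchGAN scores overlapping image patches instead of the whole image, and the patch size is the receptive field of one output unit. The layer list below assumes the standard 70x70 PatchGAN configuration from pix2pix (4x4 kernels, padding 1); the notes do not give DeblurGAN's exact kernel sizes, strides, or padding. Normalization and Leaky ReLU do not affect these sizes.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit for a stack of (kernel, stride) convs."""
    r = 1
    for k, s in reversed(layers):
        r = (r - 1) * s + k
    return r


def output_size(size, layers, pad=1):
    """Spatial size of the critic's score map: one score per overlapping patch."""
    for k, s in layers:
        size = (size + 2 * pad - k) // s + 1
    return size


# Assumed 70x70 PatchGAN layout: 4x4 convs with strides 2, 2, 2, 1,
# then a final 4x4 stride-1 conv producing one channel of patch scores.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(PATCHGAN_70))   # 70
print(output_size(256, PATCHGAN_70))  # 30, i.e. a 30x30 grid of patch scores
```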

Motion blur generation

A method for obtaining sharp-blurred image pairs

Follows the random trajectory generation idea described by Boracchi and Foi [4].

Kernels are generated by applying subpixel interpolation to the trajectory vector.

Each trajectory vector is a complex-valued vector whose entries correspond to the discrete positions of an object following 2D random motion in a continuous domain. Trajectory generation is performed by a Markov process (summarized in Algorithm 1). The position of the next point of the trajectory is generated randomly based on the previous point's velocity and position, a Gaussian perturbation, an impulse perturbation, and a deterministic inertial component.
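A simplified sketch in the spirit of this Markov process, not a reproduction of the paper's Algorithm 1: the parameter names and defaults (`n_steps`, `max_len`, `p_impulse`, `gauss_scale`, `inertia`) are illustrative assumptions, and bilinear splatting serves as the subpixel interpolation. Convolving a sharp image with the resulting kernel yields the blurred half of a training pair.

```python
import numpy as np


def random_trajectory(n_steps=200, max_len=30.0, p_impulse=0.005,
                      gauss_scale=0.5, inertia=0.3, rng=None):
    """Markov-chain camera path; positions are complex numbers (x + iy) in pixels."""
    rng = np.random.default_rng() if rng is None else rng
    step = max_len / (n_steps - 1)  # constant speed, so the path length is max_len
    v = step * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))  # initial velocity, random direction
    x = np.zeros(n_steps, dtype=complex)
    for t in range(n_steps - 1):
        # Gaussian perturbation of the velocity.
        dv = gauss_scale * step * complex(rng.standard_normal(), rng.standard_normal())
        # Deterministic inertial component pulling the path back toward the origin.
        dv -= inertia * step * x[t] / max_len
        # Rare impulsive shake: an abrupt, roughly reversed jerk.
        if rng.random() < p_impulse:
            dv += 2.0 * v * np.exp(1j * (np.pi + rng.uniform(-0.5, 0.5)))
        v = v + dv
        v = step * v / abs(v)       # keep the speed constant
        x[t + 1] = x[t] + v         # next point depends only on the current position and velocity
    return x


def trajectory_to_kernel(x, size=31):
    """Rasterize the path with bilinear (subpixel) weights and normalize to sum 1."""
    c = (size - 1) / 2
    pts = x - x.mean()              # center the path in the kernel window
    kernel = np.zeros((size, size))
    for p in pts:
        col, row = p.real + c, p.imag + c
        c0, r0 = int(np.floor(col)), int(np.floor(row))
        fc, fr = col - c0, row - r0
        for dr, wr in ((0, 1.0 - fr), (1, fr)):
            for dc, wc in ((0, 1.0 - fc), (1, fc)):
                if 0 <= r0 + dr < size and 0 <= c0 + dc < size:
                    kernel[r0 + dr, c0 + dc] += wr * wc
    return kernel / kernel.sum()


kernel = trajectory_to_kernel(random_trajectory(rng=np.random.default_rng(0)))
print(kernel.shape, kernel.sum())  # a 31x31 non-negative kernel summing to 1
```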