torch.nn.ConvTranspose2d

[An earlier post covering transposed convolution]

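While studying GANs in depth, I ran into transposed convolution in DCGAN, so I took the opportunity to summarize the various convolution operations :) Without padding, the kernel moves by the specified stride while performing the convolution, so the width and height decrease. The padded case can be seen as strided convolution with padding added: padding is applied around the image border, sized according to the kernel, so that the output size can match the input size.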

https://chang-aistory.tistory.com/48?category=933534

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros', device=None, dtype=None)

https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

Applies a 2D transposed convolution over an input image composed of several input planes.

It can be seen as the gradient of Conv2d with respect to its input.
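To make the "gradient of Conv2d" statement concrete, here is a minimal sketch that compares the input gradient computed by autograd with a transposed convolution that reuses the same weight. The tensor sizes, the seed, and the variable names are illustrative assumptions, not anything taken from the PyTorch docs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes (assumptions): batch 1, 3 input channels, 5 output channels.
x = torch.randn(1, 3, 8, 8, dtype=torch.float64, requires_grad=True)
w = torch.randn(5, 3, 3, 3, dtype=torch.float64)  # Conv2d weight layout: (out, in, kH, kW)

y = F.conv2d(x, w, stride=1, padding=1)
g = torch.randn_like(y)  # an arbitrary upstream gradient with the shape of y

# Gradient of the convolution with respect to its input, computed by autograd.
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=g)

# The same quantity, computed as a transposed convolution of g with the same weight.
grad_x_via_transpose = F.conv_transpose2d(g, w, stride=1, padding=1)

matches = torch.allclose(grad_x, grad_x_via_transpose)
print(matches)
```

With stride 1 the shapes line up without any output_padding; for larger strides the transposed call may also need output_padding to recover the original input size.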

It is most commonly used when implementing a Deep Convolutional GAN (DCGAN).
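As a rough illustration of that use case, the sketch below doubles the spatial size of a feature map with kernel_size=4, stride=2, padding=1, a combination often seen in DCGAN-style generators. The channel counts and batch size are assumptions chosen for the example, not values from any particular implementation.

```python
import torch
import torch.nn as nn

# One DCGAN-style upsampling step; channel counts and sizes are illustrative assumptions.
upsample = nn.ConvTranspose2d(
    in_channels=128, out_channels=64,
    kernel_size=4, stride=2, padding=1, bias=False,
)

x = torch.randn(16, 128, 8, 8)   # (N, C_in, H_in, W_in)
y = upsample(x)

# H_out = (8 - 1) * 2 - 2 * 1 + 1 * (4 - 1) + 0 + 1 = 16
print(y.shape)  # torch.Size([16, 64, 16, 16])
```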

• stride controls the stride for the cross-correlation.

• padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points.

• output_padding controls the additional size added to one side of the output shape.

• dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the PyTorch documentation links to a nice visualization of what dilation does.

• groups controls the connections between inputs and outputs.

• in_channels and out_channels must both be divisible by groups. For example,
    ◦ At groups=1, all inputs are convolved to all outputs.
    ◦ At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
    ◦ At groups=in_channels, each input channel is convolved with its own set of filters (of size out_channels / in_channels).
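The groups behaviour described above can be checked with a small sketch. It prints the weight shapes for the three cases and rebuilds the groups=2 result by hand from two half-sized transposed convolutions. The layer sizes and variable names are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 5, 5)

full = nn.ConvTranspose2d(8, 4, kernel_size=3, groups=1)
paired = nn.ConvTranspose2d(8, 4, kernel_size=3, groups=2)
per_channel = nn.ConvTranspose2d(8, 8, kernel_size=3, groups=8)

# ConvTranspose2d weight layout: (in_channels, out_channels // groups, kH, kW)
print(full.weight.shape)         # torch.Size([8, 4, 3, 3])
print(paired.weight.shape)       # torch.Size([8, 2, 3, 3])
print(per_channel.weight.shape)  # torch.Size([8, 1, 3, 3])

# groups=2 by hand: each half of the input channels produces half of the output channels.
left = F.conv_transpose2d(x[:, :4], paired.weight[:4], paired.bias[:2])
right = F.conv_transpose2d(x[:, 4:], paired.weight[4:], paired.bias[2:])
manual = torch.cat([left, right], dim=1)

same = torch.allclose(paired(x), manual, atol=1e-5)
print(same)
```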

The parameters kernel_size, stride, padding, output_padding can either be:

a single int – the same value is used for both the height and width dimensions

a tuple of two ints – the first value is used for the height dimension and the second for the width dimension
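A short sketch of the difference between the two forms, with sizes picked purely for illustration: the first layer uses a single int for every parameter, the second uses tuples so that height and width are treated differently.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 10, 10)

# Single ints: the same value applies to height and width.
square = nn.ConvTranspose2d(3, 6, kernel_size=3, stride=2)

# Tuples: (height value, width value).
rect = nn.ConvTranspose2d(3, 6, kernel_size=(3, 5), stride=(2, 1), padding=(1, 2))

out_square = square(x)
out_rect = rect(x)

# square: H = W = (10 - 1) * 2 + (3 - 1) + 1 = 21
print(out_square.shape)  # torch.Size([1, 6, 21, 21])

# rect: H = (10 - 1) * 2 - 2 * 1 + (3 - 1) + 1 = 19, W = (10 - 1) * 1 - 2 * 2 + (5 - 1) + 1 = 10
print(out_rect.shape)    # torch.Size([1, 6, 19, 10])
```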

parameters

• in_channels (int) – Number of channels in the input image

• out_channels (int) – Number of channels produced by the convolution

• kernel_size (int or tuple) – Size of the convolving kernel

• stride (int or tuple, optional) – Stride of the convolution. Default: 1

• padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0

• output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0

• groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

• dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
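The output_padding parameter is easier to understand with a concrete case. With stride 2, a Conv2d maps both a 7x7 and an 8x8 input to 4x4, so the transposed layer cannot know which size to restore; output_padding (or the output_size argument of the forward call) picks one. The sketch below assumes single-channel layers and these specific sizes only for illustration.

```python
import torch
import torch.nn as nn

down = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

x7 = torch.randn(1, 1, 7, 7)
x8 = torch.randn(1, 1, 8, 8)
z7, z8 = down(x7), down(x8)
print(z7.shape, z8.shape)  # both are torch.Size([1, 1, 4, 4])

up_default = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
up_padded = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)

y_default = up_default(z8)   # (4 - 1) * 2 - 2 + 2 + 0 + 1 = 7
y_padded = up_padded(z8)     # (4 - 1) * 2 - 2 + 2 + 1 + 1 = 8

# Alternatively, request the target size at call time and let the layer
# work out the required output padding.
y_sized = up_default(z8, output_size=x8.size())

print(y_default.shape)  # torch.Size([1, 1, 7, 7])
print(y_padded.shape)   # torch.Size([1, 1, 8, 8])
print(y_sized.shape)    # torch.Size([1, 1, 8, 8])
```

Note that output_padding only changes the computed output shape by adding size on one side; it does not pad the output with zeros.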

shape

input

(N, C_{in}, H_{in}, W_{in})

output

(N, C_{out}, H_{out}, W_{out})

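H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1

W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1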
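The two formulas have the same form, so a single helper covers both dimensions. The function below is a hypothetical convenience written for these notes (it is not part of PyTorch) and simply evaluates the expression above for one dimension.

```python
def conv_transpose2d_output_size(in_size, kernel_size, stride=1, padding=0,
                                 output_padding=0, dilation=1):
    """Output size along one dimension (height or width) of a transposed convolution.

    Hypothetical helper: evaluates the H_out / W_out formula for a single dimension.
    """
    return ((in_size - 1) * stride
            - 2 * padding
            + dilation * (kernel_size - 1)
            + output_padding
            + 1)


# Default stride and padding: the output grows by kernel_size - 1.
print(conv_transpose2d_output_size(5, kernel_size=3))                        # 7

# kernel 4, stride 2, padding 1 doubles the size.
print(conv_transpose2d_output_size(8, kernel_size=4, stride=2, padding=1))   # 16

# output_padding adds one extra row or column on one side.
print(conv_transpose2d_output_size(4, kernel_size=3, stride=2, padding=1,
                                   output_padding=1))                        # 8

# dilation widens the effective kernel.
print(conv_transpose2d_output_size(5, kernel_size=3, dilation=2))            # 9
```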