Paper link: Generative Adversarial Nets

Generative Adversarial Nets (GAN) were proposed by Ian J. Goodfellow in 2014.
GAN is arguably one of the most groundbreaking ideas in deep learning.

How GAN Works

The idea behind GAN is actually quite simple.
It consists of a generator network G and a discriminator network D.

G takes as input a random noise vector z drawn from a prior distribution and outputs an image G(z).
D takes an image x as input and outputs a probability: the probability that x is real rather than generated by G.

G and D are trained through an adversarial process:
G tries to generate images that can "fool" D, while D tries to tell real images from generated ones.

Mathematically, this process is the minimax game

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_{z}(z)}[\log (1-D(G(z)))]$$

where $p_{\mathrm{data}}$ and $p_{z}$ denote the distributions of the real data and of the input noise z respectively, and $\mathbb{E}$ is the expectation.

$D(x)$ is the probability that x is real data, and the $\log$ comes from the log loss, i.e. the binary cross-entropy loss.

Feeding a real input x into D gives $\mathrm{Loss}=-\log D(x)$.
Since D's task is to classify as accurately as possible, D must minimize this loss, i.e. maximize $\log D(x)$.
Similarly, feeding the generated input G(z) into D gives $\mathrm{Loss}=-\log (1-D(G(z)))$, so D must also maximize $\log (1-D(G(z)))$.

Conversely, G needs to minimize $\log (1-D(G(z)))$.
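To make the link with binary cross-entropy concrete, here is a minimal PyTorch check (the values of d_out_real and d_out_fake are made-up examples, not from the paper): BCELoss with target 1 reduces to $-\log D(x)$, and with target 0 to $-\log (1-D(G(z)))$.

import torch
import torch.nn as nn

bce = nn.BCELoss()

# Stand-ins for D's output on a real image and on a generated image
d_out_real = torch.tensor([0.9])   # D(x)
d_out_fake = torch.tensor([0.2])   # D(G(z))

# BCE with target 1 is -log D(x); with target 0 it is -log(1 - D(G(z)))
loss_real = bce(d_out_real, torch.ones(1))
loss_fake = bce(d_out_fake, torch.zeros(1))

print(torch.isclose(loss_real, -torch.log(d_out_real)))      # tensor([True])
print(torch.isclose(loss_fake, -torch.log(1 - d_out_fake)))  # tensor([True])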

Theoretical Analysis of GAN

The Optimal Discriminator D

For a fixed G, the optimal D satisfies $D^{*}(x)=\frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x)+p_{g}(x)}$

Proof:

For a given G, training D amounts to maximizing

$$V(G,D)=\int_{x} \left[ p_{\mathrm{data}}(x)\log D(x)+p_{g}(x)\log (1-D(x)) \right] dx$$

where $p_{g}$ is the data distribution represented by G.

Taking the derivative shows that this expression reaches its maximum at $D^{*}(x)=\frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x)+p_{g}(x)}$,
and when $p_{\mathrm{data}}(x)=p_{g}(x)$, we get $D^{*}(x)=\frac{1}{2}$.
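Spelling out the "taking the derivative" step: with $a=p_{\mathrm{data}}(x)$ and $b=p_{g}(x)$ held fixed, the integrand has the form $f(y)=a\log y+b\log (1-y)$, and

$$f'(y)=\frac{a}{y}-\frac{b}{1-y}=0 \;\Longrightarrow\; y^{*}=\frac{a}{a+b}, \qquad f''(y)=-\frac{a}{y^{2}}-\frac{b}{(1-y)^{2}}<0$$

so $y^{*}$ is indeed a maximum, and substituting back gives $D^{*}(x)$.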

The Optimal Generator G

For the current optimal D, the generator G needs to minimize $C(G)=\max_{D}V(G,D)=V(G,D^{*})$. The paper gives the following theorem:

C(G) reaches its global minimum if and only if $p_{\mathrm{data}}(x)=p_{g}(x)$, at which point $C(G)=-\log 4$.

Proof:

Sufficiency:

If $p_{\mathrm{data}}(x)=p_{g}(x)$, then $D^{*}(x)=\frac{1}{2}$, so $C(G)=\mathbb{E}_{x\sim p_{\mathrm{data}}}[-\log 2]+\mathbb{E}_{z\sim p_{z}}[-\log 2]=-\log 4$.

Necessity:

Substituting $D^{*}$ into $C(G)$ and rewriting:

$$\begin{aligned} C(G) &= \mathbb{E}_{x\sim p_{\mathrm{data}}}\left[\log \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x)+p_{g}(x)}\right]+\mathbb{E}_{x\sim p_{g}}\left[\log \frac{p_{g}(x)}{p_{\mathrm{data}}(x)+p_{g}(x)}\right] \\ &= -\log 4+KL\!\left(p_{\mathrm{data}}\,\middle\|\,\frac{p_{\mathrm{data}}+p_{g}}{2}\right)+KL\!\left(p_{g}\,\middle\|\,\frac{p_{\mathrm{data}}+p_{g}}{2}\right) \\ &= -\log 4+2\cdot JSD(p_{\mathrm{data}}\,\|\,p_{g}) \end{aligned}$$

where KL and JSD denote the Kullback-Leibler divergence and the Jensen-Shannon divergence.
Since $JSD(p_{\mathrm{data}}\,\|\,p_{g})$ takes its minimum value 0 if and only if $p_{\mathrm{data}}(x)=p_{g}(x)$,
$C(G)$ reaches its global minimum $C(G)=-\log 4$ exactly when $p_{\mathrm{data}}(x)=p_{g}(x)$.

QED

Implementing GAN

Walking Through the Paper's Pseudocode

First, the pseudocode from the paper:

[Figure: GAN_algorithm, the minibatch training procedure from the paper]

The pseudocode is very clear: in each training iteration, the discriminator D is first trained for k steps, and then G is trained for one step (the authors used k=1 in their experiments).

Training D performs gradient ascent on $\frac{1}{m}\sum [\log D(x)+\log (1-D(G(z)))]$,
which corresponds to minimizing the sum of the two binary cross-entropy losses.

Training G performs gradient descent on $\frac{1}{m}\sum \log (1-D(G(z)))$,
which corresponds to maximizing D's binary cross-entropy loss on the generated samples; a minimal sketch of the whole loop follows.
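Putting the two updates together, here is a minimal sketch of one outer iteration, assuming hypothetical helpers sample_real_batch and sample_noise plus already-built models G and D (these names are mine, not from the paper; the sketch detaches the fake batch during the D step, a common alternative to the retain_graph approach used in the full implementation below):

import torch

def train_iteration(G, D, optim_G, optim_D, sample_real_batch, sample_noise, k=1):
    bce = torch.nn.BCELoss()

    # k discriminator updates per generator update (the paper uses k=1)
    for _ in range(k):
        x = sample_real_batch()            # minibatch of real samples
        fake = G(sample_noise()).detach()  # block gradients from flowing into G
        pred_real, pred_fake = D(x), D(fake)
        # Ascending on log D(x) + log(1 - D(G(z))) == descending on the two BCE terms
        d_loss = (bce(pred_real, torch.ones_like(pred_real))
                  + bce(pred_fake, torch.zeros_like(pred_fake)))
        optim_D.zero_grad()
        d_loss.backward()
        optim_D.step()

    # One generator update: descend directly on log(1 - D(G(z)))
    g_loss = torch.log(1.0 - D(G(sample_noise()))).mean()
    optim_G.zero_grad()
    g_loss.backward()
    optim_G.step()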

The authors also mention in the paper that
having G maximize $\log (D(G(z)))$ instead of minimizing $\log (1-D(G(z)))$
provides G with sufficient gradients early in training, when G is still weak and D can reject its samples with high confidence; a sketch of this variant is given below.
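In code, this variant (often called the non-saturating loss) simply trains G against the "real" label. A sketch, reusing the hypothetical names from the snippet above:

import torch

def generator_step_nonsaturating(G, D, optim_G, sample_noise):
    bce = torch.nn.BCELoss()
    pred = D(G(sample_noise()))
    # BCE against the label 1 equals -mean(log D(G(z))),
    # so minimizing it maximizes log D(G(z))
    g_loss = bce(pred, torch.ones_like(pred))
    optim_G.zero_grad()
    g_loss.backward()
    optim_G.step()

Note that this is exactly what both implementations below do: G is always trained against the valid (all-ones) label.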

PyTorch Implementation

import torch
import torch.nn as nn
from torch.optim import Adam
import torch.utils.data as Data
import torchvision
from torchvision import transforms
import numpy as np

'''Define the generator'''
class Generator(nn.Module):
    def __init__(self, input_shape, output_shape):
        super().__init__()

        self.input_shape = input_shape
        self.output_shape = output_shape

        self.layer1 = nn.Sequential(
            nn.Linear(self.input_shape, 256),
            nn.BatchNorm1d(256, momentum=0.8),
            nn.ReLU(),
        )

        self.layer2 = nn.Sequential(
            nn.Linear(256, 512),
            nn.BatchNorm1d(512, momentum=0.8),
            nn.ReLU(),
        )

        self.layer3 = nn.Sequential(
            nn.Linear(512, 1024),
            nn.BatchNorm1d(1024, momentum=0.8),
            nn.ReLU(),
        )

        self.layer4 = nn.Sequential(
            nn.Linear(1024, int(np.prod(self.output_shape))),
            nn.Sigmoid()
        )

    def forward(self, tensor_input):
        x = self.layer1(tensor_input)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        output = x.reshape(-1, *self.output_shape)

        return output

'''Define the discriminator'''
class Discriminator(nn.Module):
    def __init__(self, input_shape):
        super().__init__()

        self.input_shape = input_shape

        self.model = nn.Sequential(
            nn.Flatten(),

            nn.Linear(int(np.prod(self.input_shape)), 512),
            nn.LeakyReLU(negative_slope=0.2),

            nn.Linear(512, 256),
            nn.LeakyReLU(negative_slope=0.2),

            nn.Linear(256, 64),
            nn.LeakyReLU(negative_slope=0.2),

            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, img):
        output = self.model(img)

        return output

class GAN():
    def __init__(self):
        self.cuda_on = torch.cuda.is_available()

        self.input_shape = 100
        self.img_shape = (1, 28, 28)

        self.generator = Generator(self.input_shape, self.img_shape)
        self.discriminator = Discriminator(self.img_shape)

        self.optim_G = Adam(self.generator.parameters(), lr=2e-4)
        self.optim_D = Adam(self.discriminator.parameters(), lr=2e-4)
        self.loss_adver = nn.BCELoss()

        if self.cuda_on:
            self.generator.cuda()
            self.discriminator.cuda()
            self.loss_adver.cuda()

    def getDataloader(self, batch_size):
        mnist = torchvision.datasets.MNIST(
            root='./data/', train=True,
            transform=transforms.Compose([
                transforms.ToTensor(),
            ])
        )
        loader = Data.DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True)
        return loader

    def train(self, epochs=1, batch_size=32):
        loader = self.getDataloader(batch_size)

        for epoch in range(epochs):
            for step, (img_real, _) in enumerate(loader):
                num = img_real.shape[0]

                # Label vectors for training: valid is all ones, fake is all zeros
                valid = torch.ones((num, 1), dtype=torch.float32)
                fake = torch.zeros((num, 1), dtype=torch.float32)

                # Sample standard normal noise as the generator's input
                z = torch.randn(num, self.input_shape)

                if self.cuda_on:
                    valid = valid.cuda()
                    fake = fake.cuda()
                    z = z.cuda()
                    img_real = img_real.cuda()

                # The generator produces fake images from the input random noise
                img_gen = self.generator(z)

                '''Train the discriminator D'''
                D_loss_real = self.loss_adver(self.discriminator(img_real), valid)
                D_loss_fake = self.loss_adver(self.discriminator(img_gen), fake)
                D_loss = (D_loss_real + D_loss_fake) / 2

                self.optim_D.zero_grad()
                D_loss.backward(retain_graph=True)  # retain_graph keeps the computation graph; otherwise G could not backprop later
                self.optim_D.step()

                '''Train the generator G'''
                G_loss = self.loss_adver(self.discriminator(img_gen), valid)

                self.optim_G.zero_grad()
                G_loss.backward()
                self.optim_G.step()

                print('Epoch:', epoch+1, ' Step:', step, ' D_loss:', D_loss.item(), ' G_loss:', G_loss.item())

                if (step+1) % 400 == 0:
                    torchvision.utils.save_image(
                        img_gen.data[:9], 'gen\\{}_{}.png'.format(epoch, step), nrow=3)

if __name__ == '__main__':
    gan = GAN()
    gan.train(epochs=12, batch_size=64)

Keras Implementation

from keras.datasets import mnist
from keras.models import Model, Sequential
from keras.layers import Dense, Flatten, BatchNormalization, Reshape, Input, Activation
from keras.layers import LeakyReLU
from keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt

class GAN:
    def __init__(self):
        self.img_shape = (28, 28, 1)
        self.latent_dim = 100

        self.generator = self.buildGenerator()
        self.discriminator = self.buildDiscriminator()

        input = Input(shape=(self.latent_dim,))
        img = self.generator(input)

        # Setting trainable to False after the discriminator has been compiled:
        # train_on_batch can still train the discriminator itself, while its
        # weights stay frozen when the generator (the combined model) is trained
        self.discriminator.trainable = False

        validity = self.discriminator(img)

        self.combined = Model(input, validity)
        self.combined.compile(
            loss='binary_crossentropy',
            optimizer=Adam(2e-4)
        )

    def buildGenerator(self):
        model = Sequential()

        model.add(Dense(input_dim=self.latent_dim, units=256))
        model.add(BatchNormalization(momentum=0.9))
        model.add(Activation('relu'))

        model.add(Dense(512))
        model.add(BatchNormalization(momentum=0.9))
        model.add(Activation('relu'))

        model.add(Dense(1024))
        model.add(BatchNormalization(momentum=0.9))
        model.add(Activation('relu'))

        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def buildDiscriminator(self):
        model = Sequential()

        model.add(Flatten(input_shape=self.img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(0.2))

        model.add(Dense(256))
        model.add(LeakyReLU(0.2))

        model.add(Dense(64))
        model.add(LeakyReLU(0.2))

        model.add(Dense(1, activation='sigmoid'))

        img = Input(shape=self.img_shape)
        validity = model(img)

        discriminator = Model(img, validity)
        discriminator.compile(
            optimizer=Adam(2e-4),
            loss='binary_crossentropy',
        )
        return discriminator

    def trainModel(self, epochs, batch_size=64):
        (X_train, Y_train), (X_test, Y_test) = mnist.load_data()

        # Normalize to [-1, 1]
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Label vectors for training: valid is all ones, fake is all zeros
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):
            epoch += 1

            # Pick a random batch of real images: randint(low, high, num)
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            orgImg = X_train[idx]

            # Sample standard normal noise as the generator's input
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # The generator produces fake images from the input random noise
            genImg = self.generator.predict(noise)

            '''Train the discriminator D'''
            D_loss_real = self.discriminator.train_on_batch(orgImg, valid)
            D_loss_fake = self.discriminator.train_on_batch(genImg, fake)
            D_loss = 0.5 * np.add(D_loss_real, D_loss_fake)

            '''Train the generator G'''
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            G_loss = self.combined.train_on_batch(noise, valid)

            print("{} --- D loss: {:.4f} , G loss: {:.4f}".format(epoch, D_loss, G_loss))

            if epoch % 200 == 0:
                self.saveImage(epoch)

    def saveImage(self, epoch):
        r, c = 3, 3
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        genImgs = self.generator.predict(noise)

        # Rescale images to [0, 1]
        genImgs = 0.5 * genImgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i, j].imshow(genImgs[cnt, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
                cnt += 1
        fig.savefig('generated\\%d.png' % epoch)
        plt.close()

def main():
    gan = GAN()
    gan.trainModel(epochs=8000, batch_size=64)

if __name__ == '__main__':
    main()