Skip to content

Commit

Permalink
add psgan and fom tutorials doc (PaddlePaddle#49)
Browse files Browse the repository at this point in the history
* add psgan and fom tutorial doc
  • Loading branch information
lijianshe02 authored Oct 29, 2020
1 parent 9020cf3 commit b2cc92a
Show file tree
Hide file tree
Showing 9 changed files with 215 additions and 7 deletions.
4 changes: 2 additions & 2 deletions README_en.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,9 +67,9 @@ Please refer [get started](./docs/get_started.md) for the basic usage of PaddleG

## Model tutorial
* [Pixel2Pixel and CycleGAN](./docs/tutorials/pix2pix_cyclegan.md)
* [PSGAN](./docs/tutorials/psgan.md)
* [PSGAN](./docs/tutorials/psgan_en.md)
* [Video restore](./docs/tutorails/video_restore.md)
* [Motion driving](./docs/tutorials/motion_driving.md)
* [Motion driving](./docs/tutorials/motion_driving_en.md)

## License
PaddleGAN is released under the [Apache 2.0 license](LICENSE).
Expand Down
Binary file added docs/imgs/fom_demo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/imgs/psgan_arc.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
28 changes: 27 additions & 1 deletion docs/tutorials/motion_driving.md
Original file line number Diff line number Diff line change
@@ -1 +1,27 @@
## to be added
# First order motion model

## 1. First order motion model原理

First order motion model的任务是image animation,给定一张源图片,给定一个驱动视频,生成一段视频,其中主角是源图片,动作是驱动视频中的动作。如下图所示,源图像通常包含一个主体,驱动视频包含一系列动作。

![](../imgs/fom_demo.png)

以左上角的人脸表情迁移为例,给定一个源人物,给定一个驱动视频,可以生成一个视频,其中主体是源人物,视频中源人物的表情是由驱动视频中的表情所确定的。通常情况下,我们需要对源人物进行人脸关键点标注、进行表情迁移的模型训练。

但是这篇文章提出的方法只需要在同类别物体的数据集上进行训练即可,比如实现太极动作迁移就用太极视频数据集进行训练,想要达到表情迁移的效果就使用人脸视频数据集voxceleb进行训练。训练好后,我们使用对应的预训练模型就可以达到前言中实时image animation的操作。

## 2. 使用方法

用户可以上传自己准备的视频和图片,并在如下命令中的source_image参数和driving_video参数分别换成自己的图片和视频路径,然后运行如下命令,就可以完成动作表情迁移,程序运行成功后,会在ouput文件夹生成名为result.mp4的视频文件,该文件即为动作迁移后的视频。本项目中提供了原始图片和驱动视频供展示使用。运行的命令如下所示:

`python -u tools/first-order-demo.py --driving_video ./ravel_10.mp4 --source_image ./sudaqiang.png --relative --adapt_scale`

**参数说明:**
- driving_video: 驱动视频,视频中人物的表情动作作为待迁移的对象
- source_image: 原始图片,视频中人物的表情动作将迁移到该原始图片中的人物上
- relative: 指示程序中使用视频和图片中人物关键点的相对坐标还是绝对坐标,建议使用相对坐标,若使用绝对坐标,会导致迁移后人物扭曲变形
- adapt_scale: 根据关键点凸包自适应运动尺度


## 3. 生成结果展示
![](../imgs/first_order.gif)
21 changes: 21 additions & 0 deletions docs/tutorials/motion_driving_en.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Fist order motion model

## 1. First order motion model introduction

First order motion model is to complete the Image animation task, which consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. The first order motion framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), this method can be applied to any object of this class. To achieve this, the innovative method decouple appearance and motion information using a self-supervised formulation. In addition, to support complex motions, it use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models occlusions arising during target motions and combines the appearance extracted from the source image and the motion derived from the driving video.
![](../imgs/fom_demo.png)

## How to use

Users can upload the prepared source image and driving video, then substitute the path of source image and driving video for the `source_image` and `driving_video` parameter in the following running command. It will geneate a video file named `result.mp4` in the `output` folder, which is the animated video file.

`python -u tools/first-order-demo.py --driving_video ./ravel_10.mp4 --source_image ./sudaqiang.png --relative --adapt_scale`

**params:**
- driving_video: driving video, the motion of the driving video is to be migrated.
- source_image: source_image, the image will be animated according to the motion of the driving video.
- relative: indicate whether the relative or absolute coordinates of the key points in the video are used in the program. It is recommended to use relative coordinates. If absolute coordinates are used, the characters will be distorted after animation.
- adapt_scale: adapt movement scale based on convex hull of keypoints.

## 3. Animation results
![](../imgs/first_order.gif)
76 changes: 75 additions & 1 deletion docs/tutorials/psgan.md
Original file line number Diff line number Diff line change
@@ -1 +1,75 @@
## to be added
# PSGAN

## 1. PSGAN原理
PSGAN模型的任务是妆容迁移, 即将任意参照图像上的妆容迁移到不带妆容的源图像上。很多人像美化应用都需要这种技术。近来的一些妆容迁移方法大都基于生成对抗网络(GAN)。它们通常采用 CycleGAN 的框架,并在两个数据集上进行训练,即无妆容图像和有妆容图像。但是,现有的方法存在一个局限性:只在正面人脸图像上表现良好,没有为处理源图像和参照图像之间的姿态和表情差异专门设计模块。PSGAN是一种全新的姿态稳健可感知空间的生生成对抗网络。PSGAN 主要分为三部分:妆容提炼网络(MDNet)、注意式妆容变形(AMM)模块和卸妆-再化妆网络(DRNet)。这三种新提出的模块能让 PSGAN 具备上述的完美妆容迁移模型所应具备的能力。
![](../imgs/psgan_arc.png)

## 2. 使用方法
### 2.1 测试
运行如下命令,就可以完成妆容迁移,程序运行成功后,会在当前文件夹生成妆容迁移后的图片文件。本项目中提供了原始图片和参考供展示使用,具体命令如下所示:

```
cd applications/
python tools/ps_demo.py \
--config-file configs/makeup.yaml \
--model_path /your/model/path \
--source_path /your/source/image/path \
--reference_dir /your/ref/image/path
```
**参数说明:**
- config-file: PSGAN网络到参数配置文件,格式为yaml
- model_path: 训练完成保存下来网络权重文件的路径
- source_path: 未化妆的原始图片文件全路径,包含图片文件名字
- reference_dir: 化妆的参考图片文件路径,不包含图片文件名字

### 2.2 训练
1. 从百度网盘下载原始换妆数据[data](https://pan.baidu.com/s/1ZF-DN9PvbBteOSfQodWnyw)(密码:rtdd)到PaddleGAN文件夹, 并解压
2. 下载landmarks数据[lmks](https://paddlegan.bj.bcebos.com/landmarks.tar),并解压
3. 运行如下命令进行文件夹及文件替换:
```
mv landmarks/makeup MT-Dataset/landmarks/makeup
mv landmarks/non-makeup MT-Dataset/landmarks/non-makeup
mv landmarks/train_makeup.txt MT-Dataset/makeup.txt
mv tlandmarks/train_non-makeup.txt MT-Dataset/non-makeup.txt
```

最后数据集目录如下所示:
```
data
├── images
│   ├── makeup
│   └── non-makeup
├── landmarks
│   ├── makeup
│   └── non-makeup
├── train_makeup.txt
├── train_non-makeup.txt
├── segs
│   ├── makeup
│   └── non-makeup
```

4. `python tools/main.py --config-file configs/makeup.yaml` ,训练参数设置参考makeup.yaml.
单卡batch_size=1训练部分log如下所示:
```
[10/29 05:39:40] ppgan.engine.trainer INFO: Epoch: 0, iters: 0 lr: 0.000200 D_A: 0.448 G_A: 0.973 rec: 1.258 idt: 0.624 D_B: 0.436 G_B: 0.889 G_A_his: 0.402 G_B_his: 0.472 G_bg_consis: 0.030 A_vgg: 0.027 B_vgg: 0.040 reader cost: 2.45463s batch cost: 4.20075s
[10/29 05:40:00] ppgan.engine.trainer INFO: Epoch: 0, iters: 10 lr: 0.000200 D_A: 0.200 G_A: 0.488 rec: 0.954 idt: 0.539 D_B: 0.179 G_B: 0.767 G_A_his: 0.224 G_B_his: 0.266 G_bg_consis: 0.033 A_vgg: 0.019 B_vgg: 0.026 reader cost: 0.55506s batch cost: 1.95968s
[10/29 05:40:22] ppgan.engine.trainer INFO: Epoch: 0, iters: 20 lr: 0.000200 D_A: 0.340 G_A: 0.339 rec: 1.293 idt: 0.698 D_B: 0.124 G_B: 0.174 G_A_his: 0.302 G_B_his: 0.233 G_bg_consis: 0.061 A_vgg: 0.032 B_vgg: 0.045 reader cost: 0.74937s batch cost: 2.13529s
[10/29 05:40:42] ppgan.engine.trainer INFO: Epoch: 0, iters: 30 lr: 0.000200 D_A: 0.238 G_A: 0.276 rec: 0.907 idt: 0.449 D_B: 0.324 G_B: 0.292 G_A_his: 0.263 G_B_his: 0.380 G_bg_consis: 0.029 A_vgg: 0.040 B_vgg: 0.049 reader cost: 0.69248s batch cost: 2.06999s
[10/29 05:41:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 40 lr: 0.000200 D_A: 0.236 G_A: 0.111 rec: 0.865 idt: 0.470 D_B: 0.237 G_B: 0.465 G_A_his: 0.289 G_B_his: 0.211 G_bg_consis: 0.021 A_vgg: 0.042 B_vgg: 0.049 reader cost: 0.65904s batch cost: 2.07197s
[10/29 05:41:23] ppgan.engine.trainer INFO: Epoch: 0, iters: 50 lr: 0.000200 D_A: 0.341 G_A: 0.073 rec: 0.698 idt: 0.424 D_B: 0.153 G_B: 0.731 G_A_his: 0.198 G_B_his: 0.180 G_bg_consis: 0.019 A_vgg: 0.032 B_vgg: 0.047 reader cost: 0.52772s batch cost: 1.92949s
[10/29 05:41:43] ppgan.engine.trainer INFO: Epoch: 0, iters: 60 lr: 0.000200 D_A: 0.267 G_A: 0.475 rec: 0.843 idt: 0.462 D_B: 0.266 G_B: 0.534 G_A_his: 0.259 G_B_his: 0.219 G_bg_consis: 0.024 A_vgg: 0.031 B_vgg: 0.041 reader cost: 0.58212s batch cost: 2.02212s
[10/29 05:42:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 70 lr: 0.000200 D_A: 0.116 G_A: 0.298 rec: 0.983 idt: 0.543 D_B: 0.097 G_B: 0.233 G_A_his: 0.210 G_B_his: 0.169 G_bg_consis: 0.046 A_vgg: 0.028 B_vgg: 0.034 reader cost: 0.56367s batch cost: 1.97049s
[10/29 05:42:23] ppgan.engine.trainer INFO: Epoch: 0, iters: 80 lr: 0.000200 D_A: 0.325 G_A: 0.339 rec: 0.744 idt: 0.417 D_B: 0.292 G_B: 0.310 G_A_his: 0.189 G_B_his: 0.206 G_bg_consis: 0.016 A_vgg: 0.029 B_vgg: 0.034 reader cost: 0.60760s batch cost: 2.04126s
[10/29 05:42:43] ppgan.engine.trainer INFO: Epoch: 0, iters: 90 lr: 0.000200 D_A: 0.177 G_A: 0.308 rec: 0.970 idt: 0.494 D_B: 0.199 G_B: 0.813 G_A_his: 0.116 G_B_his: 0.153 G_bg_consis: 0.036 A_vgg: 0.019 B_vgg: 0.042 reader cost: 0.62142s batch cost: 1.96606s
[10/29 05:43:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 100 lr: 0.000200 D_A: 0.178 G_A: 0.382 rec: 1.358 idt: 0.607 D_B: 0.265 G_B: 0.405 G_A_his: 0.086 G_B_his: 0.161 G_bg_consis: 0.060 A_vgg: 0.025 B_vgg: 0.047 reader cost: 0.63939s batch cost: 2.00111s
```
注意:训练时makeup.yaml文件中`isTrain`参数值为`True`, 测试时修改该参数值为`False` .

### 2.3 模型
Model|Dataset|BatchSize|Inference speed|Download
---|:--:|:--:|:--:|:--:
PSGAN|MT-Dataset| 1 | 1.9s(GPU:P40) | [model]()

## 3. 妆容迁移结果展示
![](../imgs/makeup_shifter.png)
76 changes: 76 additions & 0 deletions docs/tutorials/psgan_en.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# PSGAN
## 1. PSGAN introduction
This paper is to address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image. Existing methods have achieved promising progress in constrained scenarios, but transferring between images with large pose and expression differences is still challenging. To address these issues, we propose Pose and expression robust Spatial-aware GAN (PSGAN). It first utilizes Makeup Distill Network to disentangle the makeup of the reference image as two spatial-aware makeup matrices. Then, Attentive Makeup Morphing module is introduced to specify how the makeup of a pixel in the source image is morphed from the reference image. With the makeup matrices and the source image, Makeup Apply Network is used to perform makeup transfer.
![](../imgs/psgan_arc.png)

## 2. How to use
### 2.1 Test
Running the following command to complete the makeup transfer task. It will geneate the transfered image in the current path when the program running sucessfully.

```
cd applications
python tools/ps_demo.py \
--config-file configs/makeup.yaml \
--model_path /your/model/path \
--source_path /your/source/image/path \
--reference_dir /your/ref/image/path
```
**params:**
- config-file: PSGAN network configuration file, yaml format
- model_path: Saved model weight path
- source_path: Full path of the non-makeup image file, including the image file name
- reference_dir: Path of the make_up iamge file, don't including the image file name

### 2.2 Training
1. Downloading the original makeup transfer [data](https://pan.baidu.com/s/1ZF-DN9PvbBteOSfQodWnyw)(Password:rtdd) to the PaddleGAN folder, and uncompress it.
2. Downloading the landmarks [data](https://paddlegan.bj.bcebos.com/landmarks.tar), and uncompress it
3. Runnint the following command to substitute files:
```
mv landmarks/makeup MT-Dataset/landmarks/makeup
mv landmarks/non-makeup MT-Dataset/landmarks/non-makeup
mv landmarks/train_makeup.txt MT-Dataset/makeup.txt
mv tlandmarks/train_non-makeup.txt MT-Dataset/non-makeup.txt
```

The final data directory should be looked like:

```
data
├── images
│ ├── makeup
│ └── non-makeup
├── landmarks
│ ├── makeup
│ └── non-makeup
├── train_makeup.txt
├── train_non-makeup.txt
├── segs
│ ├── makeup
│ └── non-makeup
```

2. `python tools/main.py --config-file configs/makeup.yaml`

The training log looks like:
```
[10/29 05:39:40] ppgan.engine.trainer INFO: Epoch: 0, iters: 0 lr: 0.000200 D_A: 0.448 G_A: 0.973 rec: 1.258 idt: 0.624 D_B: 0.436 G_B: 0.889 G_A_his: 0.402 G_B_his: 0.472 G_bg_consis: 0.030 A_vgg: 0.027 B_vgg: 0.040 reader cost: 2.45463s batch cost: 4.20075s
[10/29 05:40:00] ppgan.engine.trainer INFO: Epoch: 0, iters: 10 lr: 0.000200 D_A: 0.200 G_A: 0.488 rec: 0.954 idt: 0.539 D_B: 0.179 G_B: 0.767 G_A_his: 0.224 G_B_his: 0.266 G_bg_consis: 0.033 A_vgg: 0.019 B_vgg: 0.026 reader cost: 0.55506s batch cost: 1.95968s
[10/29 05:40:22] ppgan.engine.trainer INFO: Epoch: 0, iters: 20 lr: 0.000200 D_A: 0.340 G_A: 0.339 rec: 1.293 idt: 0.698 D_B: 0.124 G_B: 0.174 G_A_his: 0.302 G_B_his: 0.233 G_bg_consis: 0.061 A_vgg: 0.032 B_vgg: 0.045 reader cost: 0.74937s batch cost: 2.13529s
[10/29 05:40:42] ppgan.engine.trainer INFO: Epoch: 0, iters: 30 lr: 0.000200 D_A: 0.238 G_A: 0.276 rec: 0.907 idt: 0.449 D_B: 0.324 G_B: 0.292 G_A_his: 0.263 G_B_his: 0.380 G_bg_consis: 0.029 A_vgg: 0.040 B_vgg: 0.049 reader cost: 0.69248s batch cost: 2.06999s
[10/29 05:41:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 40 lr: 0.000200 D_A: 0.236 G_A: 0.111 rec: 0.865 idt: 0.470 D_B: 0.237 G_B: 0.465 G_A_his: 0.289 G_B_his: 0.211 G_bg_consis: 0.021 A_vgg: 0.042 B_vgg: 0.049 reader cost: 0.65904s batch cost: 2.07197s
[10/29 05:41:23] ppgan.engine.trainer INFO: Epoch: 0, iters: 50 lr: 0.000200 D_A: 0.341 G_A: 0.073 rec: 0.698 idt: 0.424 D_B: 0.153 G_B: 0.731 G_A_his: 0.198 G_B_his: 0.180 G_bg_consis: 0.019 A_vgg: 0.032 B_vgg: 0.047 reader cost: 0.52772s batch cost: 1.92949s
[10/29 05:41:43] ppgan.engine.trainer INFO: Epoch: 0, iters: 60 lr: 0.000200 D_A: 0.267 G_A: 0.475 rec: 0.843 idt: 0.462 D_B: 0.266 G_B: 0.534 G_A_his: 0.259 G_B_his: 0.219 G_bg_consis: 0.024 A_vgg: 0.031 B_vgg: 0.041 reader cost: 0.58212s batch cost: 2.02212s
[10/29 05:42:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 70 lr: 0.000200 D_A: 0.116 G_A: 0.298 rec: 0.983 idt: 0.543 D_B: 0.097 G_B: 0.233 G_A_his: 0.210 G_B_his: 0.169 G_bg_consis: 0.046 A_vgg: 0.028 B_vgg: 0.034 reader cost: 0.56367s batch cost: 1.97049s
[10/29 05:42:23] ppgan.engine.trainer INFO: Epoch: 0, iters: 80 lr: 0.000200 D_A: 0.325 G_A: 0.339 rec: 0.744 idt: 0.417 D_B: 0.292 G_B: 0.310 G_A_his: 0.189 G_B_his: 0.206 G_bg_consis: 0.016 A_vgg: 0.029 B_vgg: 0.034 reader cost: 0.60760s batch cost: 2.04126s
[10/29 05:42:43] ppgan.engine.trainer INFO: Epoch: 0, iters: 90 lr: 0.000200 D_A: 0.177 G_A: 0.308 rec: 0.970 idt: 0.494 D_B: 0.199 G_B: 0.813 G_A_his: 0.116 G_B_his: 0.153 G_bg_consis: 0.036 A_vgg: 0.019 B_vgg: 0.042 reader cost: 0.62142s batch cost: 1.96606s
[10/29 05:43:03] ppgan.engine.trainer INFO: Epoch: 0, iters: 100 lr: 0.000200 D_A: 0.178 G_A: 0.382 rec: 1.358 idt: 0.607 D_B: 0.265 G_B: 0.405 G_A_his: 0.086 G_B_his: 0.161 G_bg_consis: 0.060 A_vgg: 0.025 B_vgg: 0.047 reader cost: 0.63939s batch cost: 2.00111s
```

Notation: In train phase, the `isTrain` value in makeup.yaml file is `True`, but in test phase, its value should be modified as `False`.

### 2.3 Model
Model|Dataset|BatchSize|Inference speed|Download
---|:--:|:--:|:--:|:--:
PSGAN|MT-Dataset| 1 | 1.9s(GPU:P40) | [model]()
## 3. Result
![](../imgs/makeup_shifter.png)
7 changes: 5 additions & 2 deletions ppgan/faceutils/mask/face_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,12 @@
from PIL import Image
import paddle
import paddle.vision.transforms as T
from paddle.utils.download import get_path_from_url
import pickle
from .model import BiSeNet

BISENET_WEIGHT_URL = 'https://paddlegan.bj.bcebos.com/bisnet.pdparams'


class FaceParser:
def __init__(self, device="cpu"):
Expand All @@ -47,8 +50,8 @@ def __init__(self, device="cpu"):
18: 0
}
#self.dict = paddle.to_tensor(mapper)
self.save_pth = osp.split(
osp.realpath(__file__))[0] + '/resnet.pdparams'
self.save_pth = get_path_from_url(BISENET_WEIGHT_URL,
osp.split(osp.realpath(__file__))[0])

self.net = BiSeNet(n_classes=19)

Expand Down
10 changes: 9 additions & 1 deletion ppgan/models/makeup_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.vision.models import vgg16
from paddle.utils.download import get_path_from_url
from .base_model import BaseModel

from .builder import MODELS
Expand All @@ -29,6 +30,8 @@
from ..utils.preprocess import *
from ..datasets.makeup_dataset import MakeupDataset

VGGFACE_WEIGHT_URL = 'https://paddlegan.bj.bcebos.com/vggface.pdparams'


@MODELS.register()
class MakeupModel(BaseModel):
Expand All @@ -50,8 +53,13 @@ def __init__(self, cfg):
init_weights(self.nets['netG'], init_type='xavier', init_gain=1.0)

if self.is_train: # define discriminators
vgg = vgg16(pretrained=True)
vgg = vgg16(pretrained=False)
self.vgg = vgg.features
cur_path = os.path.abspath(os.path.dirname(__file__))
vgg_weight_path = get_path_from_url(VGGFACE_WEIGHT_URL, cur_path)
param = paddle.load(vgg_weight_path)
vgg.load_dict(param)

self.nets['netD_A'] = build_discriminator(cfg.model.discriminator)
self.nets['netD_B'] = build_discriminator(cfg.model.discriminator)
init_weights(self.nets['netD_A'], init_type='xavier', init_gain=1.0)
Expand Down

0 comments on commit b2cc92a

Please sign in to comment.