Skip to content

Commit

Permalink
add distillation quick-start (PaddlePaddle#91)
Browse files Browse the repository at this point in the history
  • Loading branch information
baiyfbupt authored Feb 7, 2020
1 parent d80ed89 commit 9ce81dc
Show file tree
Hide file tree
Showing 2 changed files with 319 additions and 0 deletions.
206 changes: 206 additions & 0 deletions demo/distillation/image_classification_distillation_tutorial.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,206 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PaddleSlim Distillation知识蒸馏简介与实验\n",
"\n",
"一般情况下,模型参数量越多,结构越复杂,其性能越好,但参数也越冗余,运算量和资源消耗也越大。**知识蒸馏**就是一种将大模型学习到的有用信息(Dark Knowledge)压缩进更小更快的模型,而获得可以匹敌大模型结果的方法。\n",
"\n",
"在本文中性能强劲的大模型被称为teacher, 性能稍逊但体积较小的模型被称为student。示例包含以下步骤:\n",
"\n",
"1. 导入依赖\n",
"2. 定义student_program和teacher_program\n",
"3. 选择特征图\n",
"4. 合并program (merge)并添加蒸馏loss\n",
"5. 模型训练\n",
"\n",
"\n",
"## 1. 导入依赖\n",
"PaddleSlim依赖Paddle1.7版本,请确认已正确安装Paddle,然后按以下方式导入Paddle、PaddleSlim以及其他依赖:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import paddle\n",
"import paddle.fluid as fluid\n",
"import paddleslim as slim\n",
"import sys\n",
"sys.path.append(\"../\")\n",
"import models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. 定义student_program和teacher_program\n",
"\n",
"本教程在MNIST数据集上进行知识蒸馏的训练和验证,输入图片尺寸为`[1, 28, 28]`,输出类别数为10。\n",
"选择`ResNet50`作为teacher对`MobileNet`结构的student进行蒸馏训练。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"model = models.__dict__['MobileNet']()\n",
"student_program = fluid.Program()\n",
"student_startup = fluid.Program()\n",
"with fluid.program_guard(student_program, student_startup):\n",
" image = fluid.data(\n",
" name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
" label = fluid.data(name='label', shape=[None, 1], dtype='int64')\n",
" out = model.net(input=image, class_dim=10)\n",
" cost = fluid.layers.cross_entropy(input=out, label=label)\n",
" avg_cost = fluid.layers.mean(x=cost)\n",
" acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)\n",
" acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"teacher_model = models.__dict__['ResNet50']()\n",
"teacher_program = fluid.Program()\n",
"teacher_startup = fluid.Program()\n",
"with fluid.program_guard(teacher_program, teacher_startup):\n",
" with fluid.unique_name.guard():\n",
" image = fluid.data(\n",
" name='image', shape=[None] + [1, 28, 28], dtype='float32')\n",
" predict = teacher_model.net(image, class_dim=10)\n",
"exe = fluid.Executor(fluid.CPUPlace())\n",
"exe.run(teacher_startup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 选择特征图\n",
"我们可以用student_的list_vars方法来观察其中全部的Variables,从中选出一个或多个变量(Variable)来拟合teacher相应的变量。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get all student variables\n",
"student_vars = []\n",
"for v in student_program.list_vars():\n",
" student_vars.append((v.name, v.shape))\n",
"#uncomment the following lines to observe student's variables for distillation\n",
"#print(\"=\"*50+\"student_model_vars\"+\"=\"*50)\n",
"#print(student_vars)\n",
"\n",
"# get all teacher variables\n",
"teacher_vars = []\n",
"for v in teacher_program.list_vars():\n",
" teacher_vars.append((v.name, v.shape))\n",
"#uncomment the following lines to observe teacher's variables for distillation\n",
"#print(\"=\"*50+\"teacher_model_vars\"+\"=\"*50)\n",
"#print(teacher_vars)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"经过筛选我们可以看到,teacher_program中的'bn5c_branch2b.output.1.tmp_3'和student_program的'depthwise_conv2d_11.tmp_0'尺寸一致,可以组成蒸馏损失函数。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. 合并program (merge)并添加蒸馏loss\n",
"merge操作将student_program和teacher_program中的所有Variables和Op都将被添加到同一个Program中,同时为了避免两个program中有同名变量会引起命名冲突,merge也会为teacher_program中的Variables添加一个同一的命名前缀name_prefix,其默认值是'teacher_'\n",
"\n",
"为了确保teacher网络和student网络输入的数据是一样的,merge操作也会对两个program的输入数据层进行合并操作,所以需要指定一个数据层名称的映射关系data_name_map,key是teacher的输入数据名称,value是student的"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_name_map = {'image': 'image'}\n",
"main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())\n",
"with fluid.program_guard(student_program, student_startup):\n",
" l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)\n",
" loss = l2_loss + avg_cost\n",
" opt = fluid.optimizer.Momentum(0.01, 0.9)\n",
" opt.minimize(loss)\n",
"exe.run(student_startup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. 模型训练\n",
"\n",
"为了快速执行该示例,我们选取简单的MNIST数据,Paddle框架的`paddle.dataset.mnist`包定义了MNIST数据的下载和读取。\n",
"代码如下:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_reader = paddle.batch(\n",
" paddle.dataset.mnist.train(), batch_size=128, drop_last=True)\n",
"train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for data in train_reader():\n",
" acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])\n",
" print(\"Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}\".format(acc1.mean(), acc5.mean(), loss_np.mean()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
113 changes: 113 additions & 0 deletions docs/zh_cn/quick_start/distillation_tutorial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# 图像分类模型知识蒸馏-快速开始

该教程以图像分类模型MobileNetV1为例,说明如何快速使用[PaddleSlim的知识蒸馏接口](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
该示例包含以下步骤:

1. 导入依赖
2. 定义student_program和teacher_program
3. 选择特征图
4. 合并program(merge)并添加蒸馏loss
5. 模型训练

以下章节依次介绍每个步骤的内容。

## 1. 导入依赖

PaddleSlim依赖Paddle1.7版本,请确认已正确安装Paddle,然后按以下方式导入Paddle和PaddleSlim:

```
import paddle
import paddle.fluid as fluid
import paddleslim as slim
```

## 2. 定义student_program和teacher_program

本教程在MNIST数据集上进行知识蒸馏的训练和验证,输入图片尺寸为`[1, 28, 28]`,输出类别数为10。
选择`ResNet50`作为teacher对`MobileNet`结构的student进行蒸馏训练。

```python
model = models.__dict__['MobileNet']()
student_program = fluid.Program()
student_startup = fluid.Program()
with fluid.program_guard(student_program, student_startup):
image = fluid.data(
name='image', shape=[None] + [1, 28, 28], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
out = model.net(input=image, class_dim=10)
cost = fluid.layers.cross_entropy(input=out, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
```



```python
teacher_model = models.__dict__['ResNet50']()
teacher_program = fluid.Program()
teacher_startup = fluid.Program()
with fluid.program_guard(teacher_program, teacher_startup):
with fluid.unique_name.guard():
image = fluid.data(
name='image', shape=[None] + [1, 28, 28], dtype='float32')
predict = teacher_model.net(image, class_dim=10)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(teacher_startup)
```

## 3. 选择特征图

我们可以用student_的list_vars方法来观察其中全部的Variables,从中选出一个或多个变量(Variable)来拟合teacher相应的变量。

```python
# get all student variables
student_vars = []
for v in student_program.list_vars():
student_vars.append((v.name, v.shape))
#uncomment the following lines to observe student's variables for distillation
#print("="*50+"student_model_vars"+"="*50)
#print(student_vars)

# get all teacher variables
teacher_vars = []
for v in teacher_program.list_vars():
teacher_vars.append((v.name, v.shape))
#uncomment the following lines to observe teacher's variables for distillation
#print("="*50+"teacher_model_vars"+"="*50)
#print(teacher_vars)
```

经过筛选我们可以看到,teacher_program中的'bn5c_branch2b.output.1.tmp_3'和student_program的'depthwise_conv2d_11.tmp_0'尺寸一致,可以组成蒸馏损失函数。

## 4. 合并program (merge)并添加蒸馏loss
merge操作将student_program和teacher_program中的所有Variables和Op都将被添加到同一个Program中,同时为了避免两个program中有同名变量会引起命名冲突,merge也会为teacher_program中的Variables添加一个同一的命名前缀name_prefix,其默认值是'teacher_'

为了确保teacher网络和student网络输入的数据是一样的,merge操作也会对两个program的输入数据层进行合并操作,所以需要指定一个数据层名称的映射关系data_name_map,key是teacher的输入数据名称,value是student的

```python
data_name_map = {'image': 'image'}
main = slim.dist.merge(teacher_program, student_program, data_name_map, fluid.CPUPlace())
with fluid.program_guard(student_program, student_startup):
l2_loss = slim.dist.l2_loss('teacher_bn5c_branch2b.output.1.tmp_3', 'depthwise_conv2d_11.tmp_0', student_program)
loss = l2_loss + avg_cost
opt = fluid.optimizer.Momentum(0.01, 0.9)
opt.minimize(loss)
exe.run(student_startup)
```

## 5. 模型训练

为了快速执行该示例,我们选取简单的MNIST数据,Paddle框架的`paddle.dataset.mnist`包定义了MNIST数据的下载和读取。 代码如下:

```python
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=128, drop_last=True)
train_feeder = fluid.DataFeeder(['image', 'label'], fluid.CPUPlace(), student_program)
```

```python
for data in train_reader():
acc1, acc5, loss_np = exe.run(student_program, feed=train_feeder.feed(data), fetch_list=[acc_top1.name, acc_top5.name, loss.name])
print("Acc1: {:.6f}, Acc5: {:.6f}, Loss: {:.6f}".format(acc1.mean(), acc5.mean(), loss_np.mean()))
```

0 comments on commit 9ce81dc

Please sign in to comment.