The Student-Teacher Paradigm


Excerpted from the paper "FractalNet: Ultra-Deep Neural Networks without Residuals".

Exploration of the student-teacher paradigm [1] illuminates the potential for interplay between networks of different depth. In the model compression scenario, a deeper network (previously trained) guides and improves the learning of a shallower and faster student network [1, 34]. This is accomplished by feeding unlabeled data through the teacher and having the student mimic the teacher’s soft output predictions. FitNets [27] explicitly couple students and teachers, forcing mimic behavior across several intermediate points in the network. Our fractal networks capture yet another alternative, in the form of implicit coupling, with the potential for bidirectional information flow between shallow and deep subnetworks.
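The soft-target mimicry described above can be illustrated with a minimal numpy sketch. This is not the paper's method, just a common formulation of the student-teacher objective: the student minimizes the cross-entropy between its own temperature-softened predictions and the teacher's; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's softened output and the student's.
    # Note that no ground-truth labels appear: unlabeled data suffices.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy logits: a student aligned with the teacher incurs a lower loss
# than one whose predictions disagree.
teacher = np.array([[5.0, 1.0, 0.5]])
aligned = np.array([[4.0, 0.8, 0.4]])
misaligned = np.array([[0.4, 0.8, 4.0]])
assert distillation_loss(aligned, teacher) < distillation_loss(misaligned, teacher)
```

Minimizing this loss over a dataset pushes the shallower student toward reproducing the deeper teacher's soft output distribution, which is the mechanism the paragraph above contrasts with FractalNet's implicit coupling.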


The student-teacher paradigm
Unlabeled data is fed through the teacher, and the student mimics the teacher's soft output predictions.
FitNets explicitly pair students with teachers, forcing mimic behavior at several intermediate points in the network.


[1] J. Ba and R. Caruana. Do deep nets really need to be deep? In NIPS, pages 2654–2662, 2014.

[27] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. Fitnets: Hints for thin deep nets. ICLR, 2015.

[34] G. Urban, K. J. Geras, S. Ebrahimi Kahou, O. Aslan, S. Wang, R. Caruana, A. Mohamed, M. Philipose, and M. Richardson. Do deep convolutional nets really need to be deep (or even convolutional)? arXiv preprint arXiv:1603.05691, 2016.


