In this talk, I will discuss two interesting phenomena that can reveal underlying mechanisms of deep learning. First, the Frequency Principle shows that deep neural networks tend to fit data from low to high frequency, which explains why deep learning is strong at learning low-frequency components but weak at learning high-frequency ones. Second, with small initialization, neurons in the same layer tend to behave similarly, which leads networks to fit data with as little complexity as possible. In addition, we show that small initialization can effectively improve the inference ability of language models.
Zhi-Qin John XU (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University), https://ins.sjtu.edu.cn/people/xuzhiqin/index.html
Zhi-Qin John Xu is an associate professor at Shanghai Jiao Tong University (SJTU). Zhi-Qin obtained a B.S. in Physics (2012) and a Ph.D. in Mathematics (2016) from SJTU. Before joining SJTU, Zhi-Qin worked as a postdoc at NYUAD and the Courant Institute from 2016 to 2019. He has published papers in TPAMI, JMLR, AAAI, NeurIPS, JCP, CiCP, SIMODS, PRL, CPL, etc. He is a managing editor of the Journal of Machine Learning.
Join Tencent Meeting: https://meeting.tencent.com/dm/mpg1T1aVPyDG
Meeting ID: 383672350