Deep learning models are often so complex that they achieve vanishing classification error on the training set. Despite their huge complexity, the same architectures achieve small generalization error. This phenomenon has been rationalized in terms of a so-called double descent curve. As the model complexity increases, the generalization error follows the usual U-shaped curve at the beginning, first decreasing and then peaking around the interpolation threshold (when the model achieves vanishing training error). However, it descends again as model complexity exceeds this threshold.
I will focus on the case of a fully-connected two-layers neural network, and consider its linearization around a random initial condition. I will show that many intersting phenomena can be demonstrated and mathematically understood in this simple setting. I will then describe a few open problems and directions for future research.
Andrea Montanari received a Laurea degree in Physics in 1997, and a Ph. D. in Theoretical Physics in 2001 (both from Scuola Normale Superiore in Pisa, Italy). He has been post-doctoral fellow at Laboratoire de Physique Théorique de l'Ecole Normale Supérieure (LPTENS), Paris, France, and the Mathematical Sciences Research Institute, Berkeley, USA. From 2002 to 2010 he has been Chargé de Recherche (with Centre National de la Recherche Scientifique, CNRS) at LPTENS. In September 2006 he joined Stanford University as a faculty, and since 2015 he is Full Professor in the Departments of Electrical Engineering and Statistics.
He was co-awarded the ACM SIGMETRICS best paper award in 2008. He received the CNRS bronze medal for theoretical physics in 2006, the National Science Foundation CAREER award in 2008, the Okawa Foundation Research Grant in 2013, and the Applied Probability Society Best Publication Award in 2015. He was an Information Theory Society distinguished lecturer for 2015-2016. In 2016 he received the James L. Massey Research & Teaching Award of the Information Theory Society for young scholars, and in 2017 was elevated to IEEE Fellow. In 2018 he was an invited sectional speaker at the International Congress of Mathematicians. He received the 2020 Le Cam prize of the French Statistical Society, and is an invited IMS Medallion lecturer for the 2020 Bernoulli-IMS World Congress.
Distributed computing enables parallel execution of tasks that make up a large computing job. In large scale systems, even small random fluctuations in service times (inherent to computing environments) often cause a non-negligible number of straggling tasks with long completion times. Redundancy, in the form of simple task replication and erasure coding, has emerged as a potentially powerful way to curtail the variability in service time, as it provides diversity that allows a job to be completed when only a subset of redundant tasks gets executed. Thus both redundancy and parallelism reduce the execution time, but compete for resources of the system. In situations of constrained resources (e.g., fixed number of parallel servers), increasing redundancy reduces the available level of parallelism. This talk will present the diversity vs. parallelism trade off for some common models of task size dependent execution times, and show that different models operate optimally at different levels of redundancy, and thus require very different code rates.
[Joint work with Pei Peng and Phil Whiting]
Emina Soljanin is a professor of Electrical and Compute Engineering at Rutgers. Before moving to Rutgers in January 2016, she was a (Distinguished) Member of Technical Staff for 21 years in various incarnations of the Mathematical Sciences Research Center of Bell Labs. Her interests and expertise are wide, currently ranging from distributed computing to quantum information science. She is an IEEE Fellow, a 2017 outstanding alumnus of the Texas A&M School of Engineering, the 2011 Padovani Lecturer, a 2016/17 Distinguished Lecturer, and 2019 President for the IEEE Information Theory Society.
Networks are often modeled by random processes in which nodes are added one-by-one, according to some simple random rule. Uniform and preferential attachment trees are among the simplest examples of such dynamically growing networks. The statistical problems we address in this talk regard discovering the past of the tree when a present-day snapshot is observed. We present results that show that, even in gigantic networks, a lot of information is preserved from the early days. In particular, we discuss the problem of finding the root and the broadcasting problem.
Gábor Lugosi is an ICREA research professor at the Department of Economics, Pompeu Fabra University, Barcelona. His main research interests include the theory of machine learning, combinatorial statistics, random graphs and random structures, and information theory.
Coded caching has emerged as a powerful and elegant idea for content distribution over communication networks. Since the initial work of Maddah-Ali and Niesen, a vast set of theoretical results have been developed in the network coding and information theory community. These results range from solving more and more complicated theoretical "puzzles" (i.e., highly involved, but somehow practically irrelevant problems) to addressing more concrete problems of practical relevance for applications. Yet, questions still remain about whether such schemes will ever be used in the real world on a vast scale. This talk provides an account of some recent exciting results including the real-world implementation of coded caching on actual wireless networks, addressing some of the residual skepticism about the feasibility and actual gains achievable by these schemes.
Giuseppe Caire (S '92 -- M '94 -- SM '03 -- F '05) was born in Torino in 1965. He received the B.Sc. in Electrical Engineering from Politecnico di Torino in 1990, the M.Sc. in Electrical Engineering from Princeton University in 1992, and the Ph.D. from Politecnico di Torino in 1994. He has been a post-doctoral research fellow with the European Space Agency (ESTEC, Noordwijk, The Netherlands) in 1994-1995, Assistant Professor in Telecommunications at the Politecnico di Torino, Associate Professor at the University of Parma, Italy, Professor with the Department of Mobile Communications at the Eurecom Institute, Sophia-Antipolis, France, a Professor of Electrical Engineering with the Viterbi School of Engineering, University of Southern California, Los Angeles, and he is currently an Alexander von Humboldt Professor with the Faculty of Electrical Engineering and Computer Science at the Technical University of Berlin, Germany.
He received the Jack Neubauer Best System Paper Award from the IEEE Vehicular Technology Society in 2003, the IEEE Communications Society & Information Theory Society Joint Paper Award in 2004 and in 2011, the Leonard G. Abraham Prize for best IEEE JSAC paper in 2019, the Okawa Research Award in 2006, the Alexander von Humboldt Professorship in 2014, the Vodafone Innovation Prize in 2015, and an ERC Advanced Grant in 2018. Giuseppe Caire is a Fellow of IEEE since 2005. He has served in the Board of Governors of the IEEE Information Theory Society from 2004 to 2007, and as officer from 2008 to 2013. He was President of the IEEE Information Theory Society in 2011. His main research interests are in the field of communications theory, information theory, channel and source coding with particular focus on wireless communications.
MARIA FLORINA BALCAN
Carnegie Mellon University
Data driven algorithm design for combinatorial problems is an important aspect of modern data science. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm's behavior, so the algorithm's performance is a discontinuous function of its parameters.
In this talk, I will present new work that helps put data driven combinatorial algorithm selection on firm foundations. This includes strong computational and statistical performance guarantees, both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively. I will describe both specific examples (for clustering, partitioning, and subset selection problems) and general principles that emerge in this context (including general techniques for sample complexity guarantees in the batch setting and no-regret guarantees in the online settings).
Maria Florina Balcan is the Cadence Design Systems Professor of Computer Science in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, artificial intelligence, theory of computing, and algorithmic game theory. She is a Sloan Fellow, a Microsoft Research New Faculty Fellow, a Kavli Fellow, and a recipient of an NSF CAREER award, the ACM Grace Murray Hopper Award, and several best paper awards. She currently serves as a general chair for the International Conference on Machine Learning 2021 and as a board member of the International Machine Learning Society. She previously served as a program committee co-chair of the Conference on Learning Theory in 2014, the International Conference on Machine Learning in 2016, and the Neural Information Processing Systems 2020 conference.
ETH Zürich, Switzerland
The exploration—exploitation dilemma is a central challenge when making decisions under uncertainty. Most common approaches explore by favouring actions with uncertain outcomes. However, aleatoric uncertainty in the outcomes is different from epistemic uncertainty in the estimation task, thus the resulting observations may not necessarily be informative. In this talk, I will present approaches towards efficient information-directed exploration in stochastic multi-armed bandits, Bayesian optimization, reinforcement learning and a rich family of sequential decision problems called partial monitoring. These approaches use information measures for guiding exploration, and their submodularity allows to establish sublinear regret even in non-parametric settings. I will present the theoretical background, as well as empirical demonstrations on deep reinforcement learning tasks.
[Based primarily on joint work with Johannes Kirschner, Tor Lattimore, Nikolay Nikolov and Felix Berkenkamp]
Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center and Chair of the ETH AI Center. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow, and received ERC Starting Investigator and ERC Consolidator grants, the Deutscher Mustererkennungspreis, an NSF CAREER award as well as the ETH Golden Owl teaching award. His research has received awards at several premier conferences and journals, including Test of Time awards at KDD 2019 and ICML 2020. Andreas Krause served as Program Co-Chair for ICML 2018 and serves as Action Editor for the Journal of Machine Learning Research.