In the past decade, we have seen a dramatic evolution of machine learning from convex to non-convex methods and from shallow to deep models. Lying at the heart of machine learning, mathematical optimization has played an indispensable role in solving many different learning problems. However, there is still a significant gap between the practice of the deep learning community and the existing theory. In this talk, I will focus on a learning paradigm called stagewise learning, which differs from conventional learning methods based on stochastic gradient descent with a continuously decreasing step size. I will show that the proposed stagewise learning algorithms achieve significant improvements, in both theory and practice, over the standard stochastic gradient method for solving many machine learning problems, including support vector machines, AUC optimization, deep neural networks, and generative adversarial networks.
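The contrast between the two paradigms can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the speaker's actual algorithm: it holds the step size constant within each stage, then geometrically shrinks it and warm-starts the next stage from the previous stage's averaged iterate, whereas classic SGD would decay the step size at every iteration (e.g. eta_t = eta0 / t). The function names, stage counts, and the toy objective are all hypothetical.

```python
import random

def stagewise_sgd(grad, x0, eta0=0.2, stages=5, steps_per_stage=300, decay=0.5):
    """Stagewise SGD sketch: the step size is held CONSTANT within each
    stage; between stages it is geometrically decreased and the next
    stage restarts from the averaged iterate of the previous stage."""
    x, eta = x0, eta0
    for _ in range(stages):
        total = 0.0
        for _ in range(steps_per_stage):
            x -= eta * grad(x)       # stochastic gradient step, fixed eta
            total += x
        x = total / steps_per_stage  # stage averaging (warm start)
        eta *= decay                 # step size shrinks only between stages
    return x

# Toy problem: minimize E[(x - 3)^2] given noisy gradient estimates.
rng = random.Random(0)

def grad(x):
    return 2.0 * (x - 3.0) + 0.1 * rng.gauss(0.0, 1.0)

x_hat = stagewise_sgd(grad, x0=0.0)
print(round(x_hat, 2))
```

On this strongly convex toy objective the iterate lands close to the minimizer x = 3; the practical appeal of the stagewise schedule is that each stage makes fast progress with a large constant step size before the noise floor forces a decrease.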