Rapid advances in science and technology have recently produced extraordinary amounts of data that cannot be analyzed with traditional statistical or machine learning approaches and algorithms. These advances bring unprecedented opportunities, as well as challenges, for tackling much larger and more complicated data in academia and industry. To overcome these difficulties, large-scale computing frameworks such as MapReduce and Spark are becoming increasingly important. However, the statistical challenges that arise when implementing these frameworks have received little attention. We have recently proposed using sufficient statistics in place of the whole data in the analysis, and we have investigated the concept of sufficient statistics within a variety of statistical approaches, including linear regression and generalized linear models. This talk will focus on linear regression problems and will briefly mention how the idea extends to generalized linear models.
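
As a rough illustration of the sufficient-statistics idea for linear regression, the sketch below fits a simple linear model y = a + b x from per-chunk sums rather than from the raw data. It is not taken from the talk: the function names (`chunk_stats`, `merge`, `fit`), the simulated data, and the chunk size are all assumptions made for illustration, and plain Python stands in for a MapReduce or Spark job.

```python
import random
from functools import reduce

def chunk_stats(xs, ys):
    """'Map' step: reduce one chunk to (n, sum x, sum y, sum x^2, sum x*y)."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )

def merge(a, b):
    """'Reduce' step: the statistics are additive across chunks."""
    return tuple(u + v for u, v in zip(a, b))

def fit(stats):
    """Least-squares intercept and slope computed from the sums alone."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Simulated data (assumed for illustration): y = 1 + 2x + noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(10_000)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.1) for x in xs]

# Split into 8 chunks, as if each lived on a different worker.
size = 1_250
chunks = [(xs[i:i + size], ys[i:i + size]) for i in range(0, len(xs), size)]
combined = reduce(merge, (chunk_stats(cx, cy) for cx, cy in chunks))

intercept, slope = fit(combined)                       # from merged sums
full_intercept, full_slope = fit(chunk_stats(xs, ys))  # single pass, no chunking
```

Because the five sums are additive, each worker only needs to return five numbers regardless of chunk size, and the merged estimate agrees with the single-pass fit up to floating-point rounding.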