Presented By: Advanced Research Computing (ARC)
Data Engineering on Spark
Shuo Xiang, Data Scientist, Pinterest
Abstract: We now collect and process vast amounts of data, and we have witnessed how data-driven R&D can fundamentally change various aspects of our lives. In this talk, Shuo Xiang will first introduce common data engineering tasks and the associated challenges faced by both industrial companies and academic researchers. Using Apache Spark, an emerging data processing engine, he will show how to deliver and scale up data engineering services such as data preprocessing, machine learning, and real-time analytics. Finally, he will give examples of parallelizing specific machine learning algorithms on top of Spark.
Bio: Shuo Xiang is a Data Scientist at Pinterest, where he works on data pipelines, machine learning services, and visualization. He obtained his PhD from Arizona State University in 2014 for research on feature selection modeling and optimization algorithms. He is a contributor to multiple open-source projects, including Apache Spark.
About Pinterest: Pinterest is a visual bookmarking tool that helps you discover and save creative ideas. Our mission is to help people discover the things they love, and inspire them to go do those things in their daily lives. We're building the world's first and biggest discovery engine. Ben Silbermann, Evan Sharp and Paul Sciarra co-founded our site in March 2010. Since then, we’ve helped millions of people pick up new hobbies, find their style and plan life’s important projects. (from about.pinterest.com)
This event is co-sponsored by the Michigan Institute for Data Science and the Department of Electrical and Computer Engineering