All occurrences of this event have passed.
This listing is displayed for historical purposes.

Presented By: Summer Institute in Survey Research Techniques

An Introduction to Big Data and Machine Learning for Survey Researchers and Social Scientists

Trent D. Buskirk

The amount of data generated as a by-product of activity in society is growing fast. These data come from satellites, sensors, transactions, social media and smartphones, to name just a few sources. Such data are often referred to as "big data" and can be used to create value in areas as diverse as health, crime prevention, commerce and fraud detection. An emerging practice in many fields is to append or link big data sources to more specific, smaller-scale sources that often contain much more limited information. Survey researchers have used this practice for some time when constructing sampling frames. They append auxiliary information that is not directly available on the frame but can be obtained from an external source. Big data have the potential to reach beyond the sampling phase of surveys and, more broadly, to influence the social sciences in general. Public opinion researchers and agencies that produce statistics are interested in big data as alternative sources that could reduce costs, improve estimates or produce estimates in a more timely fashion. However, big data pose several new and interesting challenges to survey researchers, social scientists and others who want to extract information from data. As Robert Groves (2012) pointedly commented, the era is “appropriately called Big Data and not Big Information”. Analysts have a lot of work to do before information can be gained from “auxiliary traces of some process that is going on in society.”

This course offers participants a broad overview of big data sources, opportunities and examples motivated within survey and social science contexts, including the use of social media data, paradata and other such sources. It also offers a detailed, practical introduction to four common machine learning methods that can be applied to big and small data alike. These methods are useful at various stages of a study’s lifecycle: design, nonresponse adjustment, propensity score matching, weighting, evaluation and analysis. The machine learning methods will be demonstrated in R. We will provide several examples of using these methods, along with multiple R packages that implement them.

If you wish to take this course for academic credit, you must also enroll in A Virtually Syntax Free Practical Introduction to Web Scraping for Survey and Social Science Researchers.

Prerequisite: Basic proficiency in R (i.e., knowing how to install and load a package, plus basic R syntax)