Presented By: Summer Institute in Survey Research Techniques
Course presented by Trent Buskirk
A Virtually Syntax Free Practical Introduction to Web Scraping for Survey and Social Science Researchers
Course open for registration!
July 18-19, 2022
1:00pm-5:00pm
M/T
This short course offers a practical introduction to data gathering geared toward social scientists and survey researchers. It begins with an overview of web scraping that covers basic technical jargon, types of web data, and various methods for scraping. The course also discusses and illustrates the use of Application Programming Interfaces (APIs) for gathering web data when they are available. Some websites are designed to be easily accessible to web crawlers or scraping algorithms, others require much more advanced, custom programming, and some offer their data through an API provided by the website. We will illustrate how participants can discern these differences and present several motivating examples of the ways web-scraped data can be used throughout a study’s lifecycle, from design to calibration to analysis. We provide an extensive introduction to a suite of freeware programs that offer virtually syntax-free, but customizable, web scraping capabilities. We contrast data gathered this way with data accessed through APIs for websites such as Zillow or Twitter, and discuss the pros and cons of using web scraping or APIs to gather this type of web data. The course concludes with a specific focus on the import.io tool: we demonstrate its capabilities and provide several hands-on, practical examples so that participants can begin scraping websites of increasing complexity. As time permits, we will also illustrate API calls in R for Zillow, the Census, and others.
Not for academic credit.
Instructor: Trent Buskirk
All 2022 courses will be held in an alternative remote format.
Cost
- Fees are based upon the total number of course hours, or webinars, selected.