mebauer/sodapy-tutorial-nyc-opendata

Socrata Open Data API Tutorial with Python and NYC Open Data

Author: Mark Bauer

cover photo

Introduction

Before analyzing a dataset, the first step is acquiring the data. While platforms like Kaggle and data.gov provide a wealth of datasets, one of the most popular platforms for local government data is Socrata's open data platform. Many government open data portals, including NYC Open Data, are powered by Socrata, making it a crucial resource for accessing public datasets. Fortunately, Socrata provides a robust and user-friendly API called the Socrata Open Data API (or Socrata API for short), which allows you to extract and interact with these datasets, including their metadata. Using the Socrata API makes data workflows more efficient, scalable, and reproducible.

This project is designed to introduce both beginners and experienced users to the capabilities of the Socrata Open Data API. It focuses on how to locate, extract, and query datasets, which is key to performing data analysis. While smaller datasets can be loaded directly into a pandas DataFrame from a URL (often in CSV format), larger datasets, such as NYC's 311 dataset, which contains nearly 40 million rows, require more efficient methods of data retrieval. The Socrata Open Data API is ideal for this purpose because it lets you request only the rows and columns you need.
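As a rough illustration of retrieving a large dataset in pieces rather than all at once, the sketch below builds paged request URLs using the API's `$limit` and `$offset` parameters. It uses only the standard library and makes no network calls. The helper names `build_soda_url` and `page_offsets` are hypothetical, and the dataset identifier `erm2-nwe9` is assumed to be the 311 Service Requests dataset on `data.cityofnewyork.us`; check the portal before relying on it.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, fmt="csv", **soql):
    """Return a resource URL; keyword args become $-prefixed query parameters.

    Hypothetical helper for illustration, not part of sodapy.
    """
    base = f"https://{domain}/resource/{dataset_id}.{fmt}"
    if not soql:
        return base
    params = {f"${key}": value for key, value in soql.items()}
    # Keep the leading "$" readable instead of percent-encoding it.
    return f"{base}?{urlencode(params, safe='$')}"


def page_offsets(total_rows, page_size):
    """Yield the starting offset of each page needed to cover total_rows."""
    yield from range(0, total_rows, page_size)


# Assumed values: the NYC Open Data domain and the 311 dataset identifier.
DOMAIN = "data.cityofnewyork.us"
DATASET_ID = "erm2-nwe9"

# Build the URLs for the first 2,500 rows in pages of 1,000.
urls = [
    build_soda_url(DOMAIN, DATASET_ID, limit=1000, offset=offset)
    for offset in page_offsets(2500, 1000)
]
for url in urls:
    print(url)
```

Each URL can then be passed to `pandas.read_csv`, one page at a time. The tutorials in this project use Sodapy for the same job, which handles the request details for you.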

For a more comprehensive understanding of the Socrata API, read the official documentation.

This tutorial is the one I wish I had when I first started my data science journey, and I hope it helps you make the most of the Socrata API's powerful capabilities.

Quick Note: The inspiration for this project came from the Sodapy GitHub page, and much of the knowledge I gained about working with Sodapy and the Socrata API was based on the contributions from these developers. I highly recommend reviewing the official Sodapy documentation for a more comprehensive understanding of installation, requirements, available methods, and basic SoQL queries. We will use Sodapy, the Python client for interacting with the Socrata API, throughout this tutorial. Ultimately, this project is intended to complement, not replace, the official Sodapy docs.

Tutorials

  • Socrata API Basics: socrata-api-basics.ipynb Get started with the Socrata Open Data API and the sodapy Python client. This tutorial introduces you to the basics of connecting to Socrata, retrieving data, and working with the API.
  • The Socrata Query Language (SoQL): socrata-query-language.ipynb Learn how to craft powerful queries using the Socrata Query Language (SoQL). This guide covers various methods and techniques for querying data effectively through the Socrata API.
  • A sample analysis notebook: sample-analysis.ipynb Explore a sample analysis that highlights popular NYC Open Data datasets. This notebook also includes a quick exploratory data analysis (EDA) of NYC 311 Street Flooding Complaints.
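To give a feel for what a SoQL query looks like before opening the notebooks, here is a minimal sketch that assembles the common clauses (`$select`, `$where`, `$group`, `$order`, `$limit`) into a URL query string. The function name `soql_query_string` is hypothetical. The column names `borough` and `descriptor` and the value `Street Flooding (SJ)` are assumptions about the 311 dataset's schema, so confirm them against the dataset's column list.

```python
from urllib.parse import urlencode


def soql_query_string(select=None, where=None, group=None, order=None, limit=None):
    """Assemble SoQL clauses into a URL-encoded query string.

    Hypothetical helper for illustration; clauses left as None are omitted.
    """
    clauses = {
        "$select": select,
        "$where": where,
        "$group": group,
        "$order": order,
        "$limit": limit,
    }
    present = {key: value for key, value in clauses.items() if value is not None}
    # Keep the leading "$" readable instead of percent-encoding it.
    return urlencode(present, safe="$")


# Count street flooding complaints per borough (column names are assumed).
query = soql_query_string(
    select="borough, count(*) AS complaints",
    where="descriptor = 'Street Flooding (SJ)'",
    group="borough",
    order="complaints DESC",
    limit=10,
)
print(query)
```

With Sodapy, the same clauses are usually passed as keyword arguments (for example `select`, `where`, `group`, `order`, and `limit`) to the client's `get` method, so you rarely need to build the query string by hand. The SoQL notebook covers those calls in detail.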

Data

  • 311 Service Requests from 2010 to Present: All 311 Service Requests from 2010 to present. This information is automatically updated daily.
  • DEP Green Infrastructure: NYC Green Infrastructure Program initiatives. Green infrastructure (GI) collects stormwater from streets, sidewalks, and other hard surfaces before it can enter the sewer system or cause local flooding. The GI practice data contained in this dataset includes the location, program area, status, and type of GI. Please visit nyc.gov/dep/gimap to view the DEP Green Infrastructure Map.

Additional Resources

Socrata Open Data API

Sodapy

Other Projects

Here are some additional projects I've developed that make use of the Socrata API:

Say Hello!

Feel free to reach out for further discussions.