The purpose of this scraper is to restructure data made publicly available by the Chicago History Museum on its City of Chicago Maps page. The restructured data will be used by the University of Chicago Library's Preservation Department to help process and digitize the maps for public use.
To streamline the use of this scraper, I used the Python package Poetry to create a virtual environment. This approach lets users work more flexibly across machines, without having to worry about installing all the packages needed to run this program.
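Assuming the standard Poetry workflow, the environment can typically be set up with `poetry install` and the scraper run with `poetry run python scraper.py` (the exact entry point may differ from this example).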
scraper.py is the core module of this program and is composed of three main functions:
- page_to_dict(): This function takes the URL and restructures the data into a dictionary of dictionaries, parsing by type of text.
- parse_dimensions(): This function takes a row_dict object created by page_to_dict() and parses the dimension values (width and height) stored within the scale key/value pair.
- get_restructured_data(): This function takes no inputs and calls the two functions above using the City of Chicago Maps URL. Its core purpose is to convert the maps dictionary into a dataframe and export the dataframe as an xlsx file. The output is saved as "city_of_chicago_maps.xlsx" within the restructured_data folder. A sketch of these last two steps follows this list.
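As a concrete illustration of the post-processing steps, here is a minimal, self-contained sketch. The format of the scale string, the sample records, and the mapping of the first number to width and the second to height are assumptions made for the example; the real functions operate on data scraped from the live page, and pandas (with openpyxl for xlsx output) is assumed to be among the Poetry-managed dependencies.

```python
import os
import re

import pandas as pd


def parse_dimensions(row_dict):
    # Pull a pair of numbers such as "44 x 56" out of the scale field.
    # Treating the first number as width and the second as height is an
    # assumption; catalog records sometimes list height first.
    match = re.search(r"(\d+(?:\.\d+)?)\s*[xX]\s*(\d+(?:\.\d+)?)",
                      row_dict.get("scale", ""))
    if match:
        row_dict["width"], row_dict["height"] = map(float, match.groups())
    return row_dict


# A stand-in for the dictionary of dictionaries that page_to_dict() builds.
maps_dict = {
    "map_001": {"title": "Map of Chicago", "scale": "Scale 1:8,000 ; 44 x 56 cm"},
    "map_002": {"title": "Street Guide", "scale": "Scale 1:12,000 ; 30 x 40 cm"},
}

for row in maps_dict.values():
    parse_dimensions(row)

# Convert to a dataframe and export, mirroring get_restructured_data().
os.makedirs("restructured_data", exist_ok=True)
df = pd.DataFrame.from_dict(maps_dict, orient="index")
df.to_excel("restructured_data/city_of_chicago_maps.xlsx", index=False)
```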
The main content is built into a single table body, with each tr element forming a new "line" of the webpage content. Pulling this content is challenging because each logical record spans multiple trs, so treating a record as one text block (especially when the number of lines per record varies) required additional logic. The sketch below illustrates the grouping problem.
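This is a rough illustration of the grouping step, not the repository's actual code: consecutive trs are accumulated until a marker signals that a new record begins. The sample markup and the `title` class used as the record boundary are invented for the example, and BeautifulSoup is assumed to be among the dependencies.

```python
from bs4 import BeautifulSoup

html = """
<table><tbody>
  <tr><td class="title">Map of Chicago, 1871</td></tr>
  <tr><td>Scale 1:8,000 ; 44 x 56 cm</td></tr>
  <tr><td class="title">Chicago Street Guide, 1905</td></tr>
  <tr><td>Scale 1:12,000 ; 30 x 40 cm</td></tr>
  <tr><td>Hand colored</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
records, current = [], []
for tr in soup.find("tbody").find_all("tr"):
    # A title cell marks the start of a new record; flush the previous one.
    if tr.find("td", class_="title") and current:
        records.append(current)
        current = []
    current.append(tr.get_text(strip=True))
if current:
    records.append(current)

print(records)  # two records, each spanning a different number of trs
```

Accumulating lines until the next boundary marker is one way to cope with records of varying length; the real page_to_dict() may use a different boundary signal depending on the page's markup.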