The processed data files live in the src/phl_budget_data/data/processed
folder.
There are three folders:
collections/
: This folder includes:
city-collections.csv
: The city's monthly collections, parsed from public Revenue Dept. reports; includes tax, non-tax, other govt. collections.
city-tax-collections.csv
: The city's monthly tax collections, parsed from public Revenue Dept. reports; includes only tax collections.
school-collections.csv
: The school district's monthly collections, parsed from public Revenue Dept. reports
rtt-collections-by-sector.csv
: A breakdown of Realty Transfer Tax collections by sector, parsed from public Revenue Dept. reports
sales-collections-by-sector.csv
: A breakdown of Sales Tax collections by sector, parsed from public Revenue Dept. reports
wage-collection-by-sector.csv
: A breakdown of Wage Tax collections by sector, parsed from public Revenue Dept. reports
qcmr/
: This folder includes data parsed from the Quarterly City Manager's Report (QCMR):
cash-reports-*.csv
: Data parsed from different parts of the Cash Report in the back of the QCMR
department-obligations.csv
: Data parsed from the Departmental Obligations table in the QCMR
fulltime-positions.csv
: Data parsed from the Fulltime Positions Report table in the QCMR
personal-services-summary.csv
: Data parsed from the Personal Services Summary table in the QCMR
spending/
: This folder includes data parsed from City Budget-in-Brief documents:
actual-department-spending.csv
: Historical actual spending by department
budgeted-department-spending-adopted.csv
: Budgeted spending by department from the adopted budget
budgeted-department-spending-proposed.csv
: Budgeted spending by department from the proposed budget
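As a quick way to see which of these files are present in a local checkout, the sketch below groups the processed CSV files by folder. It assumes it is run from the repository root; the helper name list_processed_files is made up for this example and is not part of phl-budget-data.

```python
from pathlib import Path

# Path taken from the text above; assumed to be relative to the repository root.
PROCESSED_DIR = Path("src/phl_budget_data/data/processed")


def list_processed_files(root=PROCESSED_DIR):
    """Return {folder name: sorted CSV file names} for each subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to list, e.g. when run outside the repository.
        return {}
    return {
        folder.name: sorted(path.name for path in folder.glob("*.csv"))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }


if __name__ == "__main__":
    for folder, names in list_processed_files().items():
        print(f"{folder}/")
        for name in names:
            print(f"  {name}")
```

The function only reads directory listings, so it is safe to run at any point; it returns an empty dictionary if the folder cannot be found.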
First, clone the repository:
git clone https://github.com/PhiladelphiaController/phl-budget-data.git
Then, install the Python dependencies with poetry:
cd phl-budget-data
poetry install
Finally, display the help message for the main command:
poetry run phl-budget-data --help
You will need AWS credentials to run the parsing scripts. Create a .env
file in the root of the project, modeled on .env.example, and fill in the
values. To get the AWS credentials, go to the "Credentials/" folder on the
FPD SharePoint.
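To check that a .env file defines every variable named in .env.example, something like the following sketch could be used. It assumes both files use plain KEY=VALUE lines, which is the usual convention but is not confirmed here; the function names are made up for this example, and no actual variable names from the project are assumed.

```python
from pathlib import Path


def read_env_keys(path):
    """Return the variable names defined in a KEY=VALUE style file."""
    path = Path(path)
    if not path.exists():
        return set()
    keys = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Tolerate an optional leading "export ".
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_env_keys(env_path=".env", example_path=".env.example"):
    """Return names present in the example file but absent from the .env file."""
    return sorted(read_env_keys(example_path) - read_env_keys(env_path))


if __name__ == "__main__":
    missing = missing_env_keys()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

This only checks that each name is defined; it does not verify that a value has been filled in or that the credentials are valid.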
In general, the process for adding new data is:
- Add the raw PDF files to the appropriate folder in src/phl_budget_data/data/etl/raw. Look at past PDF files to make sure you are adding the correct table to the correct folder. Add a PDF that contains only the pages with the table information.
- Run the appropriate ETL command for the data you are parsing. To see the available commands, run:
    poetry run phl-budget-data etl --help
  For example, to parse the cash report data, run:
    poetry run phl-budget-data etl CashReport
  This will create a new CSV file in the appropriate folder in src/phl_budget_data/etl/data/processed.
- Update the files in the processed data folder src/phl_budget_data/data/processed by saving new versions:
    poetry run phl-budget-data save
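The ETL and save steps above can be chained from a small Python wrapper, sketched here. The commands themselves come from the steps above; the wrapper functions (build_update_commands, run_update) are hypothetical and not part of the package. By default the sketch only prints the commands.

```python
import shlex
import subprocess


def build_update_commands(etl_name, extra_args=()):
    """Return the ETL and save commands from the steps above as argument lists."""
    etl = ["poetry", "run", "phl-budget-data", "etl", etl_name, *extra_args]
    save = ["poetry", "run", "phl-budget-data", "save"]
    return [etl, save]


def run_update(etl_name, extra_args=(), dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in build_update_commands(etl_name, extra_args):
        print(shlex.join(command))
        if not dry_run:
            # check=True raises if the ETL step fails,
            # so "save" never runs after a failed parse.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_update("CashReport")  # dry run: prints the two commands only
```

Passing dry_run=False executes the commands in order, which assumes poetry is installed and the raw PDF is already in place.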
For example, to add a new QCMR cash report:
- Extract the two-page cash report PDF from the latest QCMR and save it to src/phl_budget_data/data/etl/raw/qcmr/cash/.
- Run the ETL parsing command. For example, for FY23 Q4 you would run:
    poetry run phl-budget-data etl CashReport --fiscal-year 2023 --quarter 4
- Update the main processed data files:
    poetry run phl-budget-data save
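If it is unclear which values to pass for --fiscal-year and --quarter, the sketch below derives them from a date inside the quarter that the report covers. It rests on an assumption not stated in this document: that the fiscal year runs from July 1 to June 30 and is named for the calendar year in which it ends. Both function names are hypothetical.

```python
import datetime


def fiscal_year_and_quarter(day):
    """Map a calendar date to (fiscal_year, quarter).

    ASSUMPTION: the fiscal year runs July 1 to June 30 and is named for the
    calendar year in which it ends, so April to June 2023 is FY23 Q4.
    """
    if day.month >= 7:
        # July to December belong to the next fiscal year (Q1 and Q2).
        return day.year + 1, (day.month - 7) // 3 + 1
    # January to June close out the current fiscal year (Q3 and Q4).
    return day.year, (day.month - 1) // 3 + 3


def cash_report_command(day):
    """Build the CashReport ETL command for the quarter containing day."""
    fiscal_year, quarter = fiscal_year_and_quarter(day)
    return [
        "poetry", "run", "phl-budget-data", "etl", "CashReport",
        "--fiscal-year", str(fiscal_year),
        "--quarter", str(quarter),
    ]


if __name__ == "__main__":
    print(" ".join(cash_report_command(datetime.date(2023, 5, 15))))
```

Under that assumption, a date in May 2023 yields the same FY23 Q4 command shown in the steps above.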
A GitHub action in this repository runs daily and checks the City's website for newly uploaded monthly collection reports. These reports are added to the City's revenue reports with a delay of about one month. If the script finds a new report, it parses the data and saves it to the repository.