-
Objective:
- Analyze COVID-19 mortality data in Chile from 2020 to 2024, providing critical insights into the pandemic's impact on the population.
-
Project Division:
- First Part: Work with Google Sheets for initial data exploration and analysis.
- Second Part: Use PostgreSQL for advanced analysis and data storage.
-
Methodology:
-
Google Sheets:
- Use Python for data processing and analysis.
- Collaboratively explore data using Google Sheets.
-
PostgreSQL:
- Migrate data to a PostgreSQL database for more detailed analysis and advanced queries.
-
-
Key Objectives:
- Data Cleaning and Organization: Prepare the mortality dataset for meaningful analysis.
- Exploration of Trends and Patterns: Investigate mortality rates by region, age, and time.
- Data Visualization: Create effective visualizations to communicate key findings.
The project leverages Python for data processing and analysis and Google Sheets for collaborative data exploration.
- Cleaning and organizing the mortality dataset for meaningful analysis.
- Exploring trends and patterns, such as mortality rates by region, age, or time.
- Visualizing data to communicate insights effectively.
- Building reproducible workflows for data analysis and reporting.
Nº | Variable Name | Description | Data Type | Values |
---|---|---|---|---|
1 | AÑO | Year of the death date | Number | N/A |
2 | FECHA_DEF | Date of the death date | Date | N/A |
3 | SEXO | Gloss identifying biological sex | Number | 1: Male, 2: Female, 9: Other |
4 | EDAD_TIPO | Unit of age measurement | Number | 1: Years, 2: Months, 3: Days |
5 | EDAD_CANT | Numerical record of the patient's age at admission | Number | N/A |
6 | COD_COMUNA | Code of the commune of residence of the deceased, according to the political-administrative division updated in 2019 | Number | N/A |
7 | COMUNA | Gloss of the commune of residence of the deceased, according to the political-administrative division updated in 2019 | Text | N/A |
8 | NOMBRE_REGION | Gloss of the region of residence | Text | N/A |
9 | DIAG1 | Basic cause of death | Text | N/A |
10 | CAPITULO_DIAG1 | ICD-10 chapter according to the cause of death | Text | N/A |
11 | GLOSA_CAPITULO_DIAG1 | Gloss of the ICD-10 chapter according to the cause of death | Text | N/A |
12 | CODIGO_GRUPO_DIAG1 | Group code of the cause of death according to ICD-10 | Text | N/A |
13 | GLOSA_GRUPO_DIAG1 | Gloss of the group code of the cause of death according to ICD-10 | Text | N/A |
14 | CODIGO_CATEGORIA_DIAG1 | Category code of the cause of death according to ICD-10 | Text | N/A |
15 | GLOSA_CATEGORIA_DIAG1 | Gloss code of the cause of death according to ICD-10 | Text | N/A |
16 | CODIGO_SUBCATEGORIA_DIAG1 | Category code of the cause of death according to ICD-10 | Text | N/A |
17 | GLOSA_SUBCATEGORIA_DIAG1 | Gloss of the category code of the cause of death according to ICD-10 | Text | N/A |
18 | DIAG2 | External cause of death | Text | N/A |
19 | CAPITULO_DIAG2 | ICD-10 chapter according to the cause of death | Text | N/A |
20 | GLOSA_CAPITULO_DIAG2 | Gloss of the ICD-10 chapter according to the cause of death | Text | N/A |
21 | CODIGO_GRUPO_DIAG2 | Group code of the cause of death according to ICD-10 | Text | N/A |
22 | GLOSA_GRUPO_DIAG2 | Gloss of the group code of the cause of death according to ICD-10 | Text | N/A |
23 | CODIGO_CATEGORIA_DIAG2 | Category code of the cause of death according to ICD-10 | Text | N/A |
24 | GLOSA_CATEGORIA_DIAG2 | Gloss of the category code of the cause of death according to ICD-10 | Text | N/A |
25 | CODIGO_SUBCATEGORIA_DIAG2 | Category code of the cause of death according to ICD-10 | Text | N/A |
26 | GLOSA_SUBCATEGORIA_DIAG2 | Gloss of the category code of the cause of death according to ICD-10 | Text | N/A |
27 | LUGAR_DEFUNCION | Describes the place where the death occurs | Text | N/A |
Nº | Variable Name | Description | Data Type | Values |
---|---|---|---|---|
1 | AÑO | Year of the death date | Number | N/A |
2 | FECHA_DEF | Date of the death date | Date | N/A |
3 | SEXO_NOMBRE | Gloss identifying biological sex | Number | N/A |
4 | EDAD_CANT | Numerical record of the patient's age at admission | Number | N/A |
5 | COMUNA | Gloss of the commune of residence of the deceased, according to the political-administrative division updated in 2019 | Text | N/A |
6 | NOMBRE_REGION | Gloss of the region of residence | Text | N/A |
7 | LUGAR_DEFUNCION | Describes the place where the death occurs | Text | N/A |
8 | PERIODO | Period of the death date | Date | N/A |
9 | RANGO_ETARIO | Age range of the deceased | Text | N/A |
The project leverages Python for data processing and analysis and Postgres for collaborative data exploration.
Ensure you have the following installed:
- PostgreSQL (Download)
- Python 3.x (Download)
- DBeaver (optional, for database visualization: Download)
- Required Python Libraries:
pip install pandas scikit-learn psycopg2 matplotlib seaborn numpy reportlab fpdf googleapiclient google
- In this case, we are going to use DBeaver, so the first thing we need to do is create a new connection.
- As shown in the picture, we need to configure the following settings:
- Host: localhost
- Database: postgres
- Port: 5432
- Username: postgres
- Password: YOUR PASSWORD
- After setting the parameters, we should test the connection to ensure it works before applying the configuration.
- AÑO (INT) - Year of the death date | Number
- FECHA_DEF (DATE) - Date of the death date
- SEXO_NOMBRE (VARCHAR(10)) - Gloss identifying biological sex
- EDAD_CANT (INT) - Numerical record of the patient's age at admission
- COMUNA (VARCHAR(100)) - Gloss of the commune of residence of the deceased, according to the political-administrative division updated in 2019
- NOMBRE_REGION (VARCHAR(100)) - Gloss of the region of residence
- LUGAR_DEFUNCION (VARCHAR(100)) - Describes the place where the death occurs
- PERIODO (VARCHAR(7)) - Period of the death date
- RANGO_ETARIO (VARCHAR(20)) - Age range of the deceased
- Since the data we have is quite large, we need to insert it through the terminal to avoid issues when using the editor.
psql -U postgres -d mi_base_de_datos -h localhost -p 5432 -f /home/usuario/Chile-COVID-Mortality-2020-2024/postgres/tables/add_into_tables.sql
select
cd."AÑO" as "year",
cd."NOMBRE_REGION" as "region",
count(cd."EDAD_CANT") as "quantity",
sum(case when cd."SEXO_NOMBRE" = 'Mujer' then 1 else 0 end) as "Women",
sum(case when cd."SEXO_NOMBRE" = 'Hombre' then 1 else 0 end) as "Men"
from covid_chile.covid_data cd
group by "year", "region"
order by "year";
The data used in this project, including COVID-19 mortality records in Chile from 2020 to 2024, is provided by the Chilean government’s open data platform:
Special thanks to Pamela Suarez (deis@minsal.cl) for creating and curating the dataset. I acknowledge and appreciate her work in making this information publicly accessible.
Thanks to the Ministerio de Salud de Chile and related governmental agencies for making this data publicly available.