Skip to content

Latest commit

 

History

History
353 lines (213 loc) · 9.42 KB

README.md

File metadata and controls

353 lines (213 loc) · 9.42 KB

Integrated Project for Business



$$\Huge {\textbf{\color{DodgerBlue} GOOD DECISIONS = GOOD RESULTS}}$$



Sponsor Quantum Software Development


t describes the process of scaling or normalizing data within a specific range, typically to a standardized scale, for example, from 0 to 1. This is a common technique in data analysis and machine learning.



$$X_{normalized} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$


$$X_{\max} - X_{\min} = \text{Amplitude}$$


Is the amplitude, a way to represent the range of data values before normalization.


To calculate the standardization of the variables salario, n_filhos, and idade using both the Z-Score and Range methods, and to evaluate the mean, standard deviation, maximum, and minimum before and after standardization, we can follow these steps:

Compute the mean, standard deviation, maximum, and minimum for each of the variables (n_filhos, salario, idade).

We standardize the variables using the Z-Score method, which is computed as:

$Z$ = $\frac{X - \mu}{\sigma}$

Z = \frac{X - \mu}{\sigma}

Where:

  • $( \mu )$ is the mean,

  • $( \sigma )$ is the standard deviation.


We scale the data using the Min-Max method, which scales the values to a [0, 1] range using:

$X'$ = $\frac{X - \min(X)}{\max(X) - \min(X)}$

X' = \frac{X - \min(X)}{\max(X) - \min(X)}

Where:

  • X is the original value,
  • min(X) is the minimum value,
  • max(X) is the maximum value.

Compute the mean, standard deviation, maximum, and minimum of the standardized data for both Z-Score and Range methods.

The output will provide the descriptive statistics before and after each standardization method, allowing you to compare the effects of Z-Score and Range standardization on the dataset.


Practical Example for Calculating this Normalized Value in Python:

Use this dataset

The code demonstrates how to apply Z-Score and Range (Min-Max) standardization to the variables salario, n_filhos, and idade in a dataset. It also evaluates and compares the mean, standard deviation, minimum, and maximum values before and after the standardization methods are applied.


# Importing the necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Load the data from the Excel file
# df = pd.read_excel('use-your-own-dataset.xlsx') - optional
df = pd.read_excel('cadastro_funcionarios.xlsx')
df.head()  # Displaying the first few rows of the dataset to understand its structure

# Step 1: Evaluate the mean, std, max, and min before standardization
before_std_stats = {
    'mean_n_filhos': df['n_filhos'].mean(),
    'std_n_filhos': df['n_filhos'].std(),
    'min_n_filhos': df['n_filhos'].min(),
    'max_n_filhos': df['n_filhos'].max(),
    
    'mean_salario': df['salario'].mean(),
    'std_salario': df['salario'].std(),
    'min_salario': df['salario'].min(),
    'max_salario': df['salario'].max(),
    
    'mean_idade': df['idade'].mean(),
    'std_idade': df['idade'].std(),
    'min_idade': df['idade'].min(),
    'max_idade': df['idade'].max(),
}

# Display the statistics before standardization
before_std_stats

Cell 4: Apply Z-Score standardization

# Step 2: Z-Score Standardization
df_zscore = df[['n_filhos', 'salario', 'idade']].apply(lambda x: (x - x.mean()) / x.std())

# Display the standardized data
df_zscore.head()

Cell 5: Evaluate the statistics after Z-Score standardization

# Step 3: Evaluate the mean, std, max, and min after Z-Score standardization
after_zscore_stats = {
    'mean_n_filhos_zscore': df_zscore['n_filhos'].mean(),
    'std_n_filhos_zscore': df_zscore['n_filhos'].std(),
    'min_n_filhos_zscore': df_zscore['n_filhos'].min(),
    'max_n_filhos_zscore': df_zscore['n_filhos'].max(),
    
    'mean_salario_zscore': df_zscore['salario'].mean(),
    'std_salario_zscore': df_zscore['salario'].std(),
    'min_salario_zscore': df_zscore['salario'].min(),
    'max_salario_zscore': df_zscore['salario'].max(),
    
    'mean_idade_zscore': df_zscore['idade'].mean(),
    'std_idade_zscore': df_zscore['idade'].std(),
    'min_idade_zscore': df_zscore['idade'].min(),
    'max_idade_zscore': df_zscore['idade'].max(),
}

# Display the statistics after Z-Score standardization
after_zscore_stats

Cell 6: Apply Range Standardization (Min-Max Scaling)

# Step 4: Range Standardization (Min-Max Scaling)
scaler = MinMaxScaler()
df_range = pd.DataFrame(scaler.fit_transform(df[['n_filhos', 'salario', 'idade']]), columns=['n_filhos', 'salario', 'idade'])

# Display the scaled data
df_range.head()

Cell 7: Evaluate the statistics after Range standardization

# Step 5: Evaluate the mean, std, max, and min after Range standardization
after_range_stats = {
    'mean_n_filhos_range': df_range['n_filhos'].mean(),
    'std_n_filhos_range': df_range['n_filhos'].std(),
    'min_n_filhos_range': df_range['n_filhos'].min(),
    'max_n_filhos_range': df_range['n_filhos'].max(),
    
    'mean_salario_range': df_range['salario'].mean(),
    'std_salario_range': df_range['salario'].std(),
    'min_salario_range': df_range['salario'].min(),
    'max_salario_range': df_range['salario'].max(),
    
    'mean_idade_range': df_range['idade'].mean(),
    'std_idade_range': df_range['idade'].std(),
    'min_idade_range': df_range['idade'].min(),
    'max_idade_range': df_range['idade'].max(),
}

# Display the statistics after Range standardization
after_range_stats


Pratical Example for Calculating this Normalized Value in Excel

Use this dataset

To standardize the variables (salary, number of children, and age) in Excel using the Z-Score and Range methods, you can follow these steps:


Steps for Z-Score in Excel:

Use the AVERAGE function to calculate the mean of the column. For example, to find the mean of the salary (column E), use:

=AVERAGE(E2:E351)

Use the STDEV.P function to calculate the standard deviation of the column. For example, to find the standard deviation of the salary (column E), use:

=STDEV.P(E2:E351)

For each value in the column, apply the Z-Score formula. In the first row of the new column, use:

=(E2 - AVERAGE(E$2:E$351)) / STDEV.P(E$2:E$351)

Example for Salary:

In cell H2 (new column for standardized salary), write

=(E2 - AVERAGE(E$2:E$351)) / STDEV.P(E$2:E$351)

Then, drag it down to the rest of the rows.

Repeat the same steps for the variables n_filhos (column D) and idade (column F).


Steps for Range Standardization in Excel:

Use the MIN and MAX functions to find the minimum and maximum values of the column. For example, to find the min and max of salary (column E), use:

=MIN(E2:E351)
=MAX(E2:E351)

For each value in the column, apply the range formula. In the first row of the new column, use:

=(E2 - MIN(E$2:E$351)) / (MAX(E$2:E$351) - MIN(E$2:E$351))

Example for Salary:

In cell I2 (new column for range standardized salary), write:

=(E2 - MIN(E$2:E$351)) / (MAX(E$2:E$351) - MIN(E$2:E$351))

Then, drag it down to the rest of the rows. Repeat the same steps for the variables n_filhos (column D) and idade (column F).


Summary of the Process

Z-Score Standardization centers the data around zero and scales it based on the standard deviation.

Range Standardization (Min-Max Scaling) rescals the data to a [0, 1] range.

Both techniques were applied (given dataset) to the columns n_filhos, salario, and idade, and the statistics (mean, std, min, max) were calculated before and after the standardization methods.



Copyright 2025 Quantum Software Development. Code released under the MIT License license.