This project performs a comprehensive analysis of the SuperStore Sales dataset, which includes sales, profit, customer segments, product categories, and geographic data. The main goal of the analysis is to uncover insights related to customer segmentation, product category distributions, geographic patterns, and other business trends.
The analysis is done using the R programming language, and various data processing, cleaning, and visualization techniques are applied to derive insights from the dataset.
The dataset used in this project is a sample SuperStore sales dataset that includes details on product sales, profits, customers, and geographic data. This dataset can be found at:
- Load and Inspect Data:
The dataset is loaded into R, and the variables are inspected for inconsistencies in data types. The variables are then converted to their appropriate data types.
Summary of the Data Frame before Data Preprocessing:
Figure 1: Summary of the data frame before data preprocessing
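The load-and-inspect step might look roughly like the sketch below. The file contents, the column names (`Order.Date`, `Segment`, `Category`, `Sales`, `Profit`), and the chosen conversions are assumptions made for illustration; the real dataset's columns may differ.

```r
# Hypothetical stand-in for the real file: write a tiny CSV to a temp path
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Order.Date,Segment,Category,Sales,Profit",
  "2014-01-03,Consumer,Technology,221.98,62.15",
  "2014-01-04,Corporate,Furniture,731.94,-12.30",
  "2014-01-05,Home Office,Office Supplies,14.62,6.87"
), csv_path)

# Load the data; keep text columns as character so types can be set explicitly
store <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect column types and summary statistics before any conversion
str(store)
summary(store)

# Convert columns to more appropriate types (assumed choices)
store$Order.Date <- as.Date(store$Order.Date, format = "%Y-%m-%d")
store$Segment    <- factor(store$Segment)
store$Category   <- factor(store$Category)

# Confirm the conversions took effect
str(store)
```

Calling `str()` before and after the conversion makes it easy to see which columns changed type.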
- Data Cleaning:
Unnecessary or irrelevant variables are removed, and missing or inconsistent data points are handled to ensure accurate analysis.
Updated Names and Data Types for Columns in Data Frame:
Figure 2: Updated names and data types for columns in data frame
Summary of the Data Frame after Data Preprocessing:
Figure 3: Summary of the data frame after data preprocessing
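A minimal sketch of this kind of cleaning follows. The text does not say which variables were removed or how missing values were handled, so the dropped columns (`Row.ID`, `Postal.Code`), the rename, and the choice to drop incomplete rows are all assumptions used only to illustrate the idea.

```r
# Toy data frame with hypothetical column names and some missing values
store <- data.frame(
  Row.ID       = 1:4,
  Postal.Code  = c(NA, 10024, NA, NA),
  Sub.Category = c("Binders", "Phones", "Chairs", "Paper"),
  Sales        = c(12.5, 300, NA, 8.2),
  stringsAsFactors = FALSE
)

# Drop columns assumed to be irrelevant to the analysis
store <- store[, !(names(store) %in% c("Row.ID", "Postal.Code")), drop = FALSE]

# Rename a column to a tidier name (assumed naming choice)
names(store)[names(store) == "Sub.Category"] <- "SubCategory"

# Count missing values per column, then keep only complete rows
colSums(is.na(store))
store_clean <- store[complete.cases(store), , drop = FALSE]

summary(store_clean)
```

Dropping incomplete rows is only one option; imputing values may be preferable when many rows would otherwise be lost.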
- Top 1000 Instances:
After data cleaning, only the top 1000 records are retained for further analysis, ensuring that the dataset remains manageable and focused.
Summary of Top 1000 Instances of The Data Frame:
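As an illustration, the subsetting could be done with `head()`. This sketch assumes that "top 1000" means the first 1000 rows in file order; the data frame below is randomly generated placeholder data, not the real dataset.

```r
# Placeholder data: 1500 rows with hypothetical columns
set.seed(1)
store <- data.frame(
  Order = seq_len(1500),
  Sales = round(runif(1500, min = 5, max = 500), 2)
)

# Keep the first 1000 rows (assumed meaning of "top 1000")
store_top <- head(store, 1000)

nrow(store_top)
summary(store_top)
```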
- Customer Segments & Product Categories:
Customers are grouped by segment, and the distribution of product categories within each segment is visualized.
Bar Chart of The Category Distribution for Each of The Customer Segments:
Figure 5: Bar chart of the category distribution for each of the customer segments
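One way to produce such a chart in base R is to cross-tabulate category against segment and pass the table to `barplot()`. The rows below are made-up toy data, and the column names `Segment` and `Category` are assumptions.

```r
# Toy records with hypothetical column names
store <- data.frame(
  Segment  = c("Consumer", "Consumer", "Corporate",
               "Home Office", "Corporate", "Consumer"),
  Category = c("Technology", "Furniture", "Office Supplies",
               "Office Supplies", "Technology", "Office Supplies"),
  stringsAsFactors = FALSE
)

# Rows are categories, columns are segments
seg_cat <- table(store$Category, store$Segment)
print(seg_cat)

# Grouped bar chart: one cluster of bars per segment, one bar per category
barplot(
  seg_cat,
  beside      = TRUE,
  legend.text = rownames(seg_cat),
  xlab        = "Customer segment",
  ylab        = "Number of records",
  main        = "Category distribution by customer segment"
)
```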
- Top 10 Product Categories:
The 10 most frequently sold product sub-categories are ranked by how often they appear in the data. The top 10 sub-categories are binders, storage, art, phones, chairs, paper, furnishings, labels, fasteners, and machines.
Bar Chart of The Frequency of Different Sub-Categories:
Figure 6: Bar chart of the frequency of different sub-categories
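A frequency ranking like this can be sketched with `table()` and `sort()`. The vector below is invented toy data with only five sub-categories, so `head(..., 10)` simply returns all of them.

```r
# Toy vector of sub-category labels (invented counts)
sub_cat <- c(rep("Binders", 5), rep("Storage", 4), rep("Art", 3),
             rep("Phones", 2), "Chairs")

# Count occurrences and sort from most to least frequent
freq <- sort(table(sub_cat), decreasing = TRUE)

# Keep at most the 10 most frequent sub-categories
top10 <- head(freq, 10)
print(top10)
```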
- Subcategory Insights:
For each product category, the 3 most frequently bought sub-categories are identified and visualized. For furniture, they are bookcases, chairs, and furnishings. For office supplies, they are binders, art, and storage. For technology, they are phones and machines, with accessories and copiers tied for third place.
Bar Chart of The Frequency of Sub-Categories in All Categories:
Figure 7: Bar chart of the frequency of sub-categories in all categories
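The per-category ranking can be sketched by splitting the sub-category column by category and ranking each group. The data and column names below are toy assumptions. Note that `head(..., 3)` cuts ties arbitrarily, so a tie for third place (such as the one reported for technology) needs to be checked separately.

```r
# Toy records with hypothetical column names
store <- data.frame(
  Category     = c(rep("Furniture", 6), rep("Technology", 7)),
  Sub.Category = c("Chairs", "Chairs", "Chairs", "Bookcases", "Bookcases", "Tables",
                   "Phones", "Phones", "Phones", "Machines", "Machines",
                   "Copiers", "Accessories"),
  stringsAsFactors = FALSE
)

# For each category, count sub-categories and keep the 3 most frequent
top3 <- lapply(
  split(store$Sub.Category, store$Category),
  function(x) head(sort(table(x), decreasing = TRUE), 3)
)
print(top3)

# Check for a tie at the cut-off: how many sub-categories share the 3rd-place count?
tech_counts <- sort(table(store$Sub.Category[store$Category == "Technology"]),
                    decreasing = TRUE)
tied_for_third <- names(tech_counts)[tech_counts == tech_counts[3]]
print(tied_for_third)
```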
- Geographic Distribution:
A geographic analysis highlights the top 10 countries or regions that contribute the most to sales and the customer base. The top 10 countries are the United States, France, Australia, Germany, Nigeria, China, Italy, Nicaragua, Spain, and Mexico.
Bar Chart of The Frequency of Countries:
- Additional Trends and Insights:
a. Bar Chart of The Frequency of Different Markets:
Figure 9: Bar chart of the frequency of different markets
b. Bar Chart of The Average Shipping Cost in Different Markets:
Figure 10: Bar chart of the average shipping cost in different markets
c. Bar Chart of The Frequency of Different Categories in All Markets:
Figure 11: Bar chart of the frequency of different categories in all markets
d. Bar Chart of The Average Profit for All Markets:
According to the chart above, LATAM has the highest average profit among all markets, followed by APAC. Notably, the EU market has the lowest average profit, although its sales are the highest among all markets (according to Figure 11).
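Per-market averages such as those in charts b and d could be computed with `aggregate()`. The sketch below uses invented toy numbers and assumed column names (`Market`, `Shipping.Cost`, `Profit`); it does not reproduce the real results.

```r
# Toy records with hypothetical column names and invented values
store <- data.frame(
  Market        = c("LATAM", "LATAM", "APAC", "APAC", "EU", "EU"),
  Shipping.Cost = c(10, 20, 30, 50, 5, 15),
  Profit        = c(40, 60, 30, 50, 10, 20),
  stringsAsFactors = FALSE
)

# Mean shipping cost and mean profit per market
avg_by_market <- aggregate(cbind(Shipping.Cost, Profit) ~ Market,
                           data = store, FUN = mean)

# Order markets from highest to lowest average profit
avg_by_market <- avg_by_market[order(avg_by_market$Profit, decreasing = TRUE), ]
print(avg_by_market)

# Bar chart of average profit per market
barplot(
  avg_by_market$Profit,
  names.arg = avg_by_market$Market,
  xlab      = "Market",
  ylab      = "Average profit",
  main      = "Average profit by market"
)
```

Comparing the average profit with a count or total of sales per market is what reveals cases where a high-volume market has a low average profit.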
This project provides valuable insights into retail sales, customer segments, product trends, and geographic patterns. The analysis can be useful for business decision-making, such as product planning, sales forecasting, and market expansion strategies.