This project analyzes the Airbnb dataset for New York City to understand factors affecting pricing, identify trends across neighborhoods, and find the most cost-effective accommodations. Using statistical and geospatial methods, this project provides actionable insights for Airbnb hosts, travelers, and policymakers.
Objectives:
- Pricing by Neighborhood: Understand average pricing trends across neighborhoods, focusing on Manhattan.
- Distribution Analysis: Compare the price distributions of specific neighborhoods to the overall NYC distribution.
- Cost-Effectiveness Analysis: Identify the 1,000 most cost-effective listings in NYC using a custom scoring function.
- Statistical Confidence: Use bootstrap methods to estimate average prices with confidence intervals.
- Variable Correlations: Examine the relationship between price and other factors, such as reviews and availability.
Data Sources:
- Listings dataset: Airbnb_Open_Data.csv
- Geospatial data: NY.geojson
Data Cleaning:
- Converted the price and service fee columns to numeric values.
- Filtered out outliers and irrelevant listings.
- Restricted the analysis to listings with at least 10 reviews.
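The following is a minimal base-R sketch of the price conversion and the review threshold. It makes several assumptions:
- The price and service fee columns hold currency strings such as "$1,060".
- read.csv() has renamed the headers to price, service.fee and number.of.reviews.

These names, the helper functions and the toy rows are all illustrative assumptions. The project's outlier filter is not reproduced because its rule is not stated.

```r
# Assumed columns (read.csv() turns "service fee" into service.fee, etc.):
# price, service.fee, number.of.reviews. All names here are hypothetical.

# Strip "$", commas and spaces, then convert; blank strings become NA.
parse_currency <- function(x) {
  as.numeric(gsub("[$, ]", "", as.character(x)))
}

clean_listings <- function(df, min_reviews = 10) {
  df$price <- parse_currency(df$price)
  df$service.fee <- parse_currency(df$service.fee)
  # Drop rows with no usable price or review count
  keep <- !is.na(df$price) & !is.na(df$number.of.reviews)
  df <- df[keep, ]
  # Keep listings with at least `min_reviews` reviews (10 in this project)
  df[df$number.of.reviews >= min_reviews, ]
}

# Hypothetical toy rows
listings <- data.frame(
  price = c("$1,060 ", "$193", "", "$75"),
  service.fee = c("$212", "$39", "$10", "$15"),
  number.of.reviews = c(12, 3, 40, 25)
)
cleaned <- clean_listings(listings)
print(cleaned)
```

In the toy data, the second row is dropped for having too few reviews and the third for having a blank price.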
Geospatial Analysis:
- Used NY.geojson with leaflet in R to visualize neighborhood-level price trends.
- Created an interactive heatmap showing average prices by neighborhood.
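The heatmap colors each neighborhood by its mean listing price. Below is a minimal base-R sketch of that aggregation step. It assumes columns named neighbourhood and price. The function name and the toy rows, which use placeholder area names, are hypothetical.

```r
# Mean price per neighbourhood, highest first.
# Assumed columns: neighbourhood (character) and price (numeric).
avg_price_by_neighbourhood <- function(df) {
  # aggregate() drops rows with NA price by default (na.action = na.omit)
  out <- aggregate(price ~ neighbourhood, data = df, FUN = mean)
  names(out)[names(out) == "price"] <- "avg_price"
  out <- out[order(-out$avg_price), ]
  rownames(out) <- NULL
  out
}

# Hypothetical toy rows with placeholder area names
listings <- data.frame(
  neighbourhood = c("Area A", "Area A", "Area B", "Area B", "Area C"),
  price = c(300, 500, 150, 250, 450)
)
avg <- avg_price_by_neighbourhood(listings)
print(avg)
```

To draw the map, this table would be joined to the polygons in NY.geojson and then colored with leaflet. One way to load the polygons is sf::st_read(), joining on the neighborhood name. Which GeoJSON property holds the neighborhood name is an assumption to check against the file.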
Statistical Insights:
- Bootstrapped average price estimates with 95% confidence intervals.
- Analyzed linear correlations between price and variables like number of reviews and availability.
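A percentile bootstrap works in three steps:
- Resample the listings with replacement.
- Recompute the mean for each resample.
- Read the 95% interval off the 2.5th and 97.5th percentiles of those resampled means.

A minimal base-R sketch follows. The function name, the number of resamples (2,000), the seed and the toy prices are assumptions, not the project's actual settings.

```r
# Percentile bootstrap for a mean (base R only).
bootstrap_mean_ci <- function(x, n_boot = 2000, level = 0.95, seed = 1) {
  x <- x[!is.na(x)]
  set.seed(seed)  # make resamples reproducible
  # Resample indices (not values) so a length-1 x is handled correctly
  boot_means <- replicate(n_boot, mean(x[sample.int(length(x), replace = TRUE)]))
  alpha <- 1 - level
  list(
    estimate = mean(x),
    se = sd(boot_means),  # bootstrap standard error of the mean
    ci = unname(quantile(boot_means, c(alpha / 2, 1 - alpha / 2)))
  )
}

# Hypothetical toy prices
prices <- c(120, 250, 300, 410, 520, 610, 700, 850, 980, 1100)
res <- bootstrap_mean_ci(prices)
print(res)
```

The boot package offers the same computation, with more interval types, through boot() and boot.ci(). As a sanity check on the figures reported below: a standard error of 1.94 implies a normal-approximation 95% interval of about 524.27 ± 1.96 × 1.94 ≈ 524.27 ± 3.80, or roughly (520.47, 528.07). A percentile bootstrap interval should land close to this.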
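The correlation analysis reduces to Pearson's r between price and each candidate variable. Below is a minimal sketch. The column names (number.of.reviews, availability.365), the helper name and the toy rows are assumptions.

```r
# Pearson correlation between price and each named column.
# Assumed columns: price plus the numeric variables listed in `vars`.
price_correlations <- function(df, vars) {
  # use = "complete.obs" ignores rows where either value is NA
  sapply(vars, function(v) cor(df$price, df[[v]], use = "complete.obs"))
}

# Hypothetical toy rows
listings <- data.frame(
  price = c(100, 200, 300, 400, 500),
  number.of.reviews = c(5, 3, 8, 1, 4),
  availability.365 = c(10, 20, 30, 40, 50)
)
r <- price_correlations(listings, c("number.of.reviews", "availability.365"))
print(round(r, 3))
```

Pearson's r only measures linear association, so a weak value does not rule out a non-linear relationship. Passing method = "spearman" to cor() gives a rank-based alternative. cor.test() adds a p-value.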
Cost-Effectiveness Scoring:
- Designed a custom scoring algorithm based on price, reviews, and features like cancellation policy and host verification.
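The project's exact scoring formula and weights are not given here. The following hypothetical base-R sketch shows one way to build such a score:
- Min-max scale price, where lower is better.
- Min-max scale review count, where higher is better.
- Add bonuses for a flexible cancellation policy and a verified host.
- Keep the highest-scoring listings.

The column names, category labels ("flexible", "verified") and weights are all assumptions.

```r
# Hypothetical cost-effectiveness score; weights and labels are assumptions.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))  # constant column: no signal
  (x - rng[1]) / diff(rng)
}

cost_effectiveness_score <- function(df,
                                     w = c(price = 0.4, reviews = 0.3,
                                           cancel = 0.15, verified = 0.15)) {
  w[["price"]] * (1 - rescale01(df$price)) +           # cheaper is better
    w[["reviews"]] * rescale01(df$number.of.reviews) +  # more reviews is better
    w[["cancel"]] * (df$cancellation_policy == "flexible") +
    w[["verified"]] * (df$host_identity_verified == "verified")
}

# Return the n highest-scoring listings
top_n_listings <- function(df, n = 1000) {
  df$score <- cost_effectiveness_score(df)
  head(df[order(-df$score), ], n)
}

# Hypothetical toy rows
listings <- data.frame(
  id = 1:4,
  price = c(100, 400, 250, 100),
  number.of.reviews = c(50, 10, 30, 10),
  cancellation_policy = c("flexible", "strict", "moderate", "flexible"),
  host_identity_verified = c("verified", "unconfirmed", "verified", "unconfirmed")
)
top <- top_n_listings(listings, n = 2)
print(top)
```

On real data, top_n_listings(listings, n = 1000) would return the 1,000 highest-scoring rows. In this sketch, price carries the largest weight so that the score stays focused on cost, while review count acts as a rough proxy for reliability.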
Results:
- A heatmap built with leaflet in R displays average prices across NYC neighborhoods.
- Key finding: Central NYC neighborhoods had similar pricing trends, likely due to high competition and demand.
- Neighborhood-specific price distributions were compared to the overall NYC distribution.
- Key insight: Price distributions varied significantly across neighborhoods, except for Midtown, whose distribution closely matched NYC's overall pattern.
- Identified the top 1,000 cost-effective listings using a scoring algorithm.
- Observation: Listings with high scores often shared attributes like flexible cancellation policies and verified hosts.
- Estimated the average price across NYC listings with a 95% bootstrap confidence interval.
- Result: average price ≈ 524.27 (standard error ≈ 1.94).
- Analyzed the linear correlation between price and variables such as reviews, availability, and rating.
- Finding: Correlations were weak. None of these variables explains price well on its own, which suggests that pricing depends on a more complex mix of factors.
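The write-up does not name the method used to compare each neighborhood's price distribution with NYC's overall distribution. One standard option is a two-sample Kolmogorov-Smirnov test. Its statistic D is the largest gap between two empirical CDFs, so a small D means the neighborhood's prices look like the rest of the city's.

Below is a minimal sketch using stats::ks.test(). It compares one neighborhood against all other listings so that the two samples do not overlap. The column names, the helper name and the toy prices are assumptions.

```r
# Two-sample Kolmogorov-Smirnov comparison of one neighbourhood's prices
# against all other listings. Assumed columns: neighbourhood, price.
compare_to_rest <- function(df, hood) {
  in_hood <- df$neighbourhood %in% hood  # NA neighbourhoods count as "rest"
  ks.test(df$price[in_hood], df$price[!in_hood])
}

# Hypothetical toy rows: "Area A" prices sit entirely below the rest
listings <- data.frame(
  neighbourhood = c(rep("Area A", 11), rep("Elsewhere", 21)),
  price = c(seq(100, 200, by = 10), seq(500, 900, by = 20))
)
res <- compare_to_rest(listings, "Area A")
print(res)
```

Real price data contain many ties, in which case ks.test() may warn that its p-value is approximate. Overlaying density curves, for example with ggplot2's geom_density(), is a useful visual complement.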
Tools and Technologies:
- Programming: R
- Geospatial Analysis: leaflet, sf
- Data Cleaning: tidyverse
- Statistical Analysis: Bootstrapping, correlation analysis
- Visualization: heatmaps (leaflet), ggplot2
How to Run:
- Clone this repository:
git clone https://github.com/your-username/NY-Airbnb-Analysis.git
- Install the required R packages (from an R session):
install.packages(c("leaflet", "sf", "tidyverse", "ggplot2"))
- Run the analysis script:
source("r (1).R")
📧 Contact: For questions or collaboration opportunities, please contact Segev Cohen at segev777701@gmail.com.