A Statistical Comparison of Distances for an Optimal Trip Allocation
At our mobility company, the ride-matching algorithm assigns each trip to the closest available driver. We currently measure closeness with the Haversine distance, the great-circle ("straight-line") distance between two coordinates. This ignores road networks, traffic conditions, and travel time, which may result in suboptimal assignments. The Engineering team proposes switching to an external real-time maps API to compute road distance, aiming to improve ride efficiency. While this approach is expected to improve trip allocation, it introduces API query costs and additional system complexity. To determine whether the transition is justified, the Data Science team designed an A/B test across multiple cities. This project evaluates the impact of road distance on operational efficiency, customer experience, and financial feasibility.
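As a point of reference, the baseline matching rule described above can be sketched as follows. This is a minimal Python illustration, not the project's code (the analysis itself was done in R). The function names `haversine_km` and `closest_driver`, the driver dictionary layout, and the mean Earth radius of 6371 km are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_driver(rider, drivers):
    """Return the id of the driver with the smallest Haversine distance to the rider.

    rider   -- (lat, lon) tuple
    drivers -- hypothetical layout: {driver_id: (lat, lon)}
    """
    return min(
        drivers,
        key=lambda d: haversine_km(rider[0], rider[1], drivers[d][0], drivers[d][1]),
    )


# Illustrative coordinates only, not project data.
nearest = closest_driver((0.0, 0.0), {"a": (0.0, 1.0), "b": (0.0, 0.5)})  # "b"
```

The road-distance variant would replace `haversine_km` with a call to the maps API, which is where the per-query cost comes from.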
- Evaluate the impact of switching to road distance on ride assignment efficiency.
- Estimate the maximum feasible cost per API query to justify the investment.
- Assess experiment design improvements and propose enhancements.
- Exploratory Data Analysis (EDA): After data preprocessing, explored the trip data for anomalies and analyzed the distribution of key features, such as distance type, along with city-level patterns.
- Hypothesis Testing: Performed non-parametric normality tests, visualized metrics with a dumbbell plot comparing mean trip duration for each city and distance type, and applied Welch's two-sample t-test to compare mean trip durations between distance types in each city.
- Data Strategy & Experimental Design: Suggested additional data to collect and improvements to the experimental design.
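The Welch comparison mentioned above can be sketched by hand to show what the test computes. This is a Python illustration under stated assumptions, not the project's code: the function name `welch_t` and the duration values are made up for the example. In R, `t.test()` applies the Welch variant by default (`var.equal = FALSE`) and also reports the p-value, which this sketch omits.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical trip durations in minutes for one city (NOT project data).
haversine_durations = [10.0, 12.0, 14.0]
road_durations = [8.0, 9.0, 10.0]
t_stat, dof = welch_t(haversine_durations, road_durations)
```

Running the comparison separately per city keeps a city-specific effect from being averaged away by the others.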
- Data Manipulation: `dplyr`, `tidyr`, `lubridate`, and `stringr` for cleaning and transforming data.
- Hypothesis Testing: `sm` for non-parametric methods and `t.test` for means comparison.
- Data Visualization: `ggplot2` for visual analysis, `reactable` for interactive tables, and `showtext` for custom fonts.
- Reporting: Quarto for creating interactive reports with custom branded visualizations.
- City-Specific Insights: Astra had longer trips on average (up to 12.5 minutes) than Vera and Mina, which suggests a larger city. Vera had shorter and more consistent trip durations, suggesting better mobility.
- Efficiency Gains: For Vera and Astra, there was no significant improvement in trip duration when using road distance instead of Haversine distance. In Mina, switching to road distance reduced trip durations by 3%, indicating a clear benefit in this city.
- Cost vs. Benefit: The estimated break-even API query cost is $0.0075 per request. If costs exceed this, the switch is not financially viable.
- Further Improvements: 1) run the test over a longer period, 2) assign distance type by time period rather than by trip ID, and 3) collect additional data such as user satisfaction insights, cancellation rates, or CO2 emissions.
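One way to read the second improvement, assigning distance type by time period, is an alternating time-window design in which every trip starting in the same window uses the same distance type. The sketch below is a Python illustration of that idea only. The 30-minute window, the arm labels, and the function name `assign_distance_type` are hypothetical choices, not details from the experiment.

```python
from datetime import datetime, timedelta, timezone

ARMS = ("haversine", "road")  # illustrative arm labels
WINDOW_MINUTES = 30           # hypothetical window length


def assign_distance_type(trip_start, window_minutes=WINDOW_MINUTES):
    """Pick the arm from the time window containing the trip start.

    trip_start must be a timezone-aware datetime so the result does not
    depend on the machine's local timezone.
    """
    minutes_since_epoch = int(trip_start.timestamp() // 60)
    window_index = minutes_since_epoch // window_minutes
    # Consecutive windows alternate between the two arms.
    return ARMS[window_index % 2]


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
same_window = assign_distance_type(start + timedelta(minutes=10))
next_window = assign_distance_type(start + timedelta(minutes=30))
```

Because all drivers and riders in a window are matched under one rule, the two arms do not compete for the same drivers at the same moment. A strict alternation is the simplest variant; randomizing the arm per window is a possible refinement.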