PaulaLC/optimizing-ride-matching

Optimizing Ride Matching

A Statistical Comparison of Distances for an Optimal Trip Allocation

Project Overview

At our mobility company, the ride-matching algorithm assigns each trip to the closest available driver. Currently, we measure closeness with the Haversine distance (the great-circle, "as the crow flies" distance between two coordinates). This ignores road networks, traffic conditions, and travel time, which may result in suboptimal assignments. The Engineering team proposes switching to an external real-time maps API to compute road distance, aiming to improve ride efficiency. While this approach is expected to improve trip allocation, it introduces API query costs and additional system complexity. To determine whether the transition is justified, the Data Science team designed an A/B test across multiple cities. This project evaluates the impact of road distance on operational efficiency, customer experience, and financial feasibility.
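For readers unfamiliar with the current baseline, the Haversine-based "closest driver" rule can be sketched in base R as follows. This is a minimal illustration, not the production matching code: the function name `haversine_km`, the Earth radius of 6371 km, and the rider and driver coordinates are all assumptions made for the example.

```r
# Great-circle (Haversine) distance in kilometres between two points.
# Inputs are decimal degrees; radius_km is an assumed mean Earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical rider and available drivers, for illustration only.
rider <- list(lat = 40.4168, lon = -3.7038)
drivers <- data.frame(
  id  = c("d1", "d2", "d3"),
  lat = c(40.4300, 40.4170, 40.3900),
  lon = c(-3.7000, -3.7100, -3.7500)
)

# Distance from the rider to every driver, then pick the nearest one.
drivers$dist_km <- haversine_km(rider$lat, rider$lon, drivers$lat, drivers$lon)
closest <- drivers$id[which.min(drivers$dist_km)]
```

The road-distance variant tested in the experiment would replace `haversine_km` with a call to the maps API while keeping the same "pick the minimum" step.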

Objectives

  1. Evaluate the impact of switching to road distance on ride assignment efficiency.
  2. Estimate the maximum feasible cost per API query to justify the investment.
  3. Assess experiment design improvements and propose enhancements.

Methodology

  • Exploratory Data Analysis (EDA): After preprocessing the data, explored trip data for anomalies and analyzed the distribution of key features such as distance type, along with city-level insights.
  • Hypothesis Testing: Performed non-parametric normality tests, used a dumbbell plot to compare mean trip duration for each city and distance type, and applied Welch's two-sample t-test to compare the two distance types in each city.
  • Data Strategy & Experimental Design: Provided suggestions for gathering additional data and improving the experimental design.
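The hypothesis-testing step can be sketched with base R's `t.test`, which runs Welch's test when `var.equal = FALSE` (its default). The data below are simulated purely so the example runs, and the column names `city`, `distance_type`, and `duration_min` are assumptions, not the experiment's actual schema or results.

```r
set.seed(42)  # simulated data only; these are not the experiment's results

# Hypothetical trip-level data: two cities, two distance types, 100 trips each.
trips <- data.frame(
  city          = rep(c("Vera", "Mina"), each = 200),
  distance_type = rep(rep(c("linear", "road"), each = 100), times = 2),
  duration_min  = c(
    rnorm(100, mean = 9.0,  sd = 2.0), rnorm(100, mean = 9.0, sd = 2.5),
    rnorm(100, mean = 10.0, sd = 2.0), rnorm(100, mean = 9.5, sd = 3.0)
  )
)

# One Welch two-sample t-test per city, comparing duration by distance type.
welch_by_city <- lapply(split(trips, trips$city), function(d) {
  t.test(duration_min ~ distance_type, data = d, var.equal = FALSE)
})

# Collect the p-values into a named vector, one entry per city.
p_values <- sapply(welch_by_city, function(res) res$p.value)
```

Welch's test is a reasonable choice here because it does not assume that the two distance types have equal variance in trip duration.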

Tech Stack

  • Data Manipulation: dplyr, tidyr, lubridate, stringr for cleaning and transforming data.
  • Hypothesis Testing: sm for non-parametric methods and t.test for comparing means.
  • Data Visualization: ggplot2 for visual analysis, reactable for interactive tables, and showtext for custom fonts.
  • Reporting: Quarto for creating interactive reports with custom branded visualizations.

Findings & Recommendations

  • City-Specific Insights: Astra had longer trips on average (up to 12.5 minutes) than Vera and Mina, which may reflect a larger city. Vera had shorter and more consistent trip durations, suggesting better mobility.
  • Efficiency Gains: For Vera and Astra, there was no significant improvement in trip duration when using road distance instead of linear (Haversine) distance. In Mina, switching to road distance reduced trip durations by 3%, indicating a clear benefit in this city.
  • Cost vs. Benefit: The estimated break-even API query cost is $0.0075 per request. If costs exceed this, the switch is not financially viable.
  • Further Improvements: 1) Test over a longer period, 2) assign distance type by time period instead of by start trip id, 3) consider adding data such as user satisfaction insights, cancellation rates, or CO2 emissions.
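The cost-versus-benefit reasoning above can be expressed as a simple decision rule. The sketch below is an assumption about how such a check might look, not the analysis used in this project: `break_even_cost` and `is_viable` are hypothetical helpers, and the inputs passed to `break_even_cost` are made-up numbers. Only the $0.0075 threshold comes from the findings.

```r
# Break-even price per API query: the value gained per trip spread over
# the number of queries each trip needs. Inputs are hypothetical.
break_even_cost <- function(value_gain_per_trip, queries_per_trip) {
  stopifnot(queries_per_trip > 0)
  value_gain_per_trip / queries_per_trip
}

# Illustrative call with made-up inputs (USD gained per trip, queries per trip).
example_break_even <- break_even_cost(value_gain_per_trip = 0.06,
                                      queries_per_trip = 5)

# Decision rule against the break-even reported in the findings.
reported_break_even <- 0.0075  # USD per request, from this README

is_viable <- function(quoted_price, threshold = reported_break_even) {
  # A quote is viable when it does not exceed the break-even threshold.
  quoted_price <= threshold
}

# Check three hypothetical vendor quotes against the threshold.
viability <- is_viable(c(0.005, 0.0075, 0.01))
```

Under this rule a quote exactly at the threshold counts as viable, since at that price the expected gain just covers the query cost.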
