Optimizing Ride Matching

A Statistical Comparison of Distances for an Optimal Trip Allocation

Project Overview

At our mobility company, the ride-matching algorithm assigns each trip to the closest available driver. Currently, we measure closeness with the Haversine distance (the straight-line distance between two coordinates). However, this ignores road networks, traffic conditions, and travel time, which may result in suboptimal assignments. The Engineering team proposes switching to an external real-time maps API to compute road distance, aiming to improve ride efficiency. While this approach is expected to improve trip allocation, it introduces API query costs and additional system complexity. To determine whether the transition is justified, the Data Science team designed an A/B test across multiple cities. This project evaluates the impact of road distance on operational efficiency, customer experience, and financial feasibility.
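As a point of reference for the baseline, the sketch below shows how a Haversine-based "closest driver" lookup could work. It is written in Python for illustration only (the project itself uses R), and the names `haversine_km`, `closest_driver`, and the `(driver_id, lat, lon)` tuple layout are assumptions, not the company's actual matching code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_driver(rider, drivers):
    """Return the (driver_id, lat, lon) tuple nearest to rider = (lat, lon).

    Hypothetical helper: 'drivers' is assumed to be a non-empty list of
    available drivers; ties go to the first one listed.
    """
    return min(drivers, key=lambda d: haversine_km(rider[0], rider[1], d[1], d[2]))
```

A road-distance variant would keep `closest_driver` unchanged and swap the key function for a call to the maps API, which is where the per-query cost enters.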

Objectives

Evaluate the impact of switching to road distance on ride assignment efficiency.
Estimate the maximum feasible cost per API query to justify the investment.
Assess the experiment design and propose improvements.

Methodology

Exploratory Data Analysis (EDA): After preprocessing the data, explored the trip data for anomalies and analyzed the distribution of key features, such as distance type, along with city-level insights.
Hypothesis Testing: Performed non-parametric normality tests, used a dumbbell plot to compare mean trip duration for each city and distance type, and applied Welch's two-sample t-test to compare the distance types across cities.
Data Strategy & Experimental Design: Suggested additional data to collect and improvements to the experimental design.
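To make the hypothesis-testing step concrete, here is a minimal sketch of the Welch statistic, which does not assume equal variances in the two groups. It is a Python illustration only: the analysis itself used R's `t.test`, whose default (`var.equal = FALSE`) is the Welch test. The function name `welch_t` is an assumption, and the sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom without a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    'a' and 'b' are sequences of observations (for example, trip durations
    under each distance type); each needs at least two values.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

In practice the function would be applied once per city, with `a` and `b` holding the trip durations for the linear and road arms of the A/B test.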

Tech Stack

Data Manipulation: dplyr, tidyr, lubridate, stringr for cleaning and transforming data.
Hypothesis Testing: sm for non-parametric methods and t.test for comparing means.
Data Visualization: ggplot2 for visual analysis, reactable for interactive tables, and showtext for custom fonts.
Reporting: Quarto for creating interactive reports with custom branded visualizations.

Findings & Recommendations

City-Specific Insights: Astra had longer trips on average (up to 12.5 minutes) than Vera and Mina, which may reflect a larger city. Vera had shorter and more consistent trip durations, suggesting better mobility.
Efficiency Gains: For Vera and Astra, using road distance instead of linear (Haversine) distance produced no statistically significant improvement in trip duration. In Mina, switching to road distance reduced trip durations by 3%, indicating a clear benefit in this city.
Cost vs. Benefit: The estimated break-even API query cost is $0.0075 per request. If costs exceed this, the switch is not financially viable.
Further Improvements: 1) Run the test over a longer period, 2) assign the distance type by time period instead of by start trip ID, 3) consider collecting additional data such as user satisfaction, cancellation rates, or CO2 emissions.
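One way the time-period assignment in improvement 2 could look is sketched below. This is a Python illustration under stated assumptions, not the experiment's actual allocation logic: the function name `assign_distance_type`, the arm labels, and the 60-minute default window are all hypothetical. Every trip that starts in the same city and time window gets the same distance type, and hashing the window (rather than simply alternating) avoids tying one arm to the same hours every day.

```python
import hashlib
from datetime import datetime, timezone

ARMS = ("linear", "road")  # hypothetical labels for the two distance types


def assign_distance_type(city, trip_start, window_minutes=60):
    """Pick the distance type for a trip from its city and start-time window.

    'trip_start' should be a timezone-aware datetime. The choice is
    deterministic, so it can be recomputed later during analysis.
    """
    window_index = int(trip_start.timestamp() // (window_minutes * 60))
    key = f"{city}:{window_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The first byte of the hash decides the arm: even -> ARMS[0], odd -> ARMS[1].
    return ARMS[digest[0] % 2]
```

For example, `assign_distance_type("Mina", datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc))` returns the same arm for every trip starting in Mina between 10:00 and 11:00 UTC that day.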
