A UK whitepages alternative which leverages the BT Phonebook and ripgrep to quickly parse through PDFs to find data on people, phone numbers, businesses and addresses

maxmoodycyber/BT-Phonebook-Lookup

BT Phonebook Lookup

A UK whitepages alternative which leverages the BT Phonebook and ripgrep to quickly parse through PDFs to find data on people, phone numbers, businesses, and addresses.

Project Overview

This project downloads the BT Phonebook PDFs, extracts and indexes the text they contain, and lets you search the resulting records (names, addresses, or phone numbers) with ripgrep for fast lookups.

Project Structure

  • scraper.py
    Scrapes a predefined website for PDF links (sourced from the BT Phonebook) and downloads them into the pdfs/ directory.

  • search_pdfs.py
    Checks whether the pdfs/ directory contains any PDFs and, if not, runs scraper.py to download them. It then indexes the PDFs by extracting their text and splitting it into records (using a phone number pattern as the delimiter), and provides an interactive search prompt powered by ripgrep.

  • requirements.txt
    Lists the Python packages required for this project.
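
The link-collection step described for scraper.py can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names PdfLinkCollector and find_pdf_links are hypothetical, and the real script may rely on third-party packages from requirements.txt instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collects the targets of <a> tags whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the URL of the page.
            self.links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return absolute URLs for every PDF link found in the HTML text."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Given the page source and its URL, find_pdf_links returns a list of absolute PDF URLs that a downloader could then save into the pdfs/ directory.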

Requirements

  • Python 3.6+

Python Packages

  • Listed in requirements.txt; install them with pip.

External Tools

  • pdftotext (optional but recommended for fast PDF text extraction)
    Refer to the pdftotext documentation or install via your package manager.

  • ripgrep (rg) (for fast, memory-efficient searching)
    Visit ripgrep on GitHub for installation instructions and ensure it is in your system's PATH.
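
A quick way to confirm both external tools are reachable on the PATH is shutil.which from the standard library. The helper name missing_tools is hypothetical and only meant as a sketch; it assumes the executables are named rg and pdftotext.

```python
import shutil


def missing_tools(names=("rg", "pdftotext")):
    """Return the tool names that cannot be found on the system PATH."""
    return [name for name in names if shutil.which(name) is None]


# Example: report anything that still needs installing.
for tool in missing_tools():
    print(f"{tool} was not found on PATH")
```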

Installation

  1. Clone the repository, or download scraper.py, parser.py, and requirements.txt into the same directory.

  2. Install Python dependencies using pip:

    pip install -r requirements.txt
  3. Ensure External Tools Are Installed:

    • Install pdftotext.
    • Install ripgrep and ensure it is available in your system's PATH.

Usage

  1. Run the Search Script:

    python parser.py
  2. Script Behavior:

    • The script checks if the pdfs directory exists and contains any PDF files.
    • If no PDFs are found, it automatically runs scraper.py to download PDFs from the BT Phonebook.
    • Once the PDFs are available, the script indexes them (creating records_index.txt) and then prompts you for a search query.
    • Enter a search term (e.g., a name, address fragment, or phone number) to see matching records, which are retrieved using ripgrep.
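
The behavior listed above can be approximated as follows. This is a sketch under assumptions rather than the script's real implementation: the function names are hypothetical, and the ripgrep flags (case-insensitive, literal matching) are one reasonable choice, not necessarily the ones the project uses.

```python
import subprocess
from pathlib import Path


def has_pdfs(pdf_dir="pdfs"):
    """True if pdf_dir exists and holds at least one .pdf file."""
    path = Path(pdf_dir)
    return path.is_dir() and any(
        p.suffix.lower() == ".pdf" for p in path.iterdir()
    )


def build_rg_command(query, index_file="records_index.txt"):
    """Build a ripgrep call that treats the query as a literal string."""
    return ["rg", "--ignore-case", "--fixed-strings", "-e", query, index_file]


def search(query, index_file="records_index.txt"):
    """Run ripgrep over the index and return the matching lines."""
    result = subprocess.run(
        build_rg_command(query, index_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # ripgrep exits with 0 on a match, 1 on no match, and 2 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Passing the query with -e and --fixed-strings keeps characters such as parentheses or plus signs in a phone number from being interpreted as regular expression syntax.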

Customization

  • Change the Scraping URL:
    To modify the URL from which PDFs are scraped, edit the base_url variable in scraper.py.

  • Modify Record Parsing:
    Adjust the regular expression in the parse_records function in parser.py if your PDF data format changes.
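
To make the idea of a phone number delimiter concrete, here is a minimal sketch of splitting extracted text into records. The pattern PHONE_RE and the function split_records are hypothetical stand-ins, not the project's parse_records; the regular expression is a rough approximation of UK number layouts and would need tuning against the real PDF text.

```python
import re

# Assumed pattern: a leading 0, a 2-4 digit area code (optionally in
# parentheses), then two groups of 3-4 digits with optional spaces.
PHONE_RE = re.compile(r"\(?0\d{2,4}\)?[ ]?\d{3,4}[ ]?\d{3,4}")


def split_records(text):
    """Split text into records, each ending at a phone number match."""
    records = []
    start = 0
    for match in PHONE_RE.finditer(text):
        # Collapse line breaks and repeated spaces inside the record.
        record = " ".join(text[start:match.end()].split())
        if record:
            records.append(record)
        start = match.end()
    # Text after the last phone number has no delimiter and is dropped.
    return records


sample = (
    "Smith J, 12 High St, Leeds 0113 496 0000\n"
    "Jones A, 4 Mill Lane, York 01904 555123"
)
for record in split_records(sample):
    print(record)
```

Writing one record per line to an index file is what allows a line-oriented tool such as ripgrep to return a whole record for each match.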

Acknowledgements
