BT Phonebook Lookup

A UK white pages alternative that extracts text from BT Phonebook PDFs and searches it with ripgrep to quickly find people, phone numbers, businesses, and addresses.

Project Overview

This project downloads the BT Phonebook PDFs, indexes the records they contain, and then lets you search those records (names, addresses, or phone numbers) quickly with ripgrep.

Project Structure

scraper.py
Scrapes a predefined website for PDF links (sourced from the BT Phonebook) and downloads them into the pdfs/ directory.
parser.py
Checks whether the pdfs/ directory contains any PDFs and, if not, runs scraper.py to download them. It then indexes the PDFs by extracting their text and splitting it into records (using a phone number pattern as the delimiter), and provides an interactive search prompt powered by ripgrep.
requirements.txt
Lists the Python packages required for this project.
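
The link-collection step in scraper.py could look roughly like the stdlib-only sketch below, assuming the source page links to its PDFs with ordinary `<a href>` tags. It swaps in `html.parser` and `urllib` for the requests/BeautifulSoup stack the project credits, and the names `find_pdf_links` and `download_pdfs` are hypothetical, not the actual contents of scraper.py.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a> tags whose href points at a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the extension (case-insensitive).
        if href and href.split("?")[0].lower().endswith(".pdf"):
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:  # skip duplicates, keep page order
                self.links.append(url)


def find_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def download_pdfs(base_url, out_dir="pdfs"):
    """Fetch base_url and save every linked PDF into out_dir, skipping existing files."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with urlopen(base_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for url in find_pdf_links(html, base_url):
        target = out / url.split("?")[0].rsplit("/", 1)[-1]
        if target.exists():
            continue
        with urlopen(url) as resp, open(target, "wb") as fh:
            fh.write(resp.read())
```

Separating link extraction from downloading keeps the HTML handling testable without network access.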

Requirements

Python 3.6+

Python Packages

Listed in requirements.txt; see Installation.

External Tools

pdftotext (optional but recommended for fast PDF text extraction)
Part of Poppler; install it via your package manager (for example, the poppler-utils package on Debian/Ubuntu) or refer to the pdftotext documentation.
ripgrep (rg) (for fast, memory-efficient searching)
See ripgrep on GitHub for installation instructions, and make sure rg is on your system's PATH.
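
Because pdftotext is optional, one plausible arrangement is to try it first and fall back to PyPDF2 (credited under Acknowledgements) when it is absent. The sketch below assumes that behavior; it is not a description of parser.py, and `extract_text` and `pdftotext_command` are hypothetical names.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    # Default pdftotext mode emits text in reading order (column by column);
    # "-" as the output file sends the text to stdout instead of a .txt file.
    return ["pdftotext", str(pdf_path), "-"]


def extract_text(pdf_path):
    """Return the text of pdf_path, preferring pdftotext and falling back to PyPDF2."""
    if shutil.which("pdftotext"):
        result = subprocess.run(
            pdftotext_command(pdf_path),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            encoding="utf-8",
            errors="replace",
            check=True,
        )
        return result.stdout
    # Fallback: pure-Python extraction; slower, and text order may differ.
    from PyPDF2 import PdfReader  # assumes a PyPDF2 version that provides PdfReader

    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Importing PyPDF2 inside the fallback branch means the package is only needed when pdftotext is missing.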

Installation

Clone the repository or download the scripts (scraper.py, parser.py, and requirements.txt) into the same directory.
Install Python dependencies using pip:
```
pip install -r requirements.txt
```
Ensure External Tools are Installed:
- Install pdftotext (optional but recommended).
- Install ripgrep and ensure it is available in your system's PATH.
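
To confirm the tools are reachable from Python, a quick check with the standard library's `shutil.which` can help. `missing_tools` is a hypothetical helper, not part of this project; remember that pdftotext is optional, so only a missing rg should block searching.

```python
import shutil


def missing_tools(tools=("rg", "pdftotext")):
    """Return the names of command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```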

Usage

Run the Search Script:
```
python parser.py
```
Script Behavior:
- The script checks whether the pdfs/ directory exists and contains any PDF files.
- If no PDFs are found, it automatically runs scraper.py to download PDFs from the BT Phonebook.
- Once the PDFs are available, the script indexes them (writing records_index.txt) and then prompts you for a search query.
- Enter a search term (e.g., a name, address fragment, or phone number) to see matching records, which ripgrep retrieves from the index.
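
The behavior described above might be sketched as follows. The function names are hypothetical, and the ripgrep flags shown (`--ignore-case`, `--fixed-strings`) are one reasonable choice rather than necessarily what parser.py passes.

```python
import subprocess
import sys
from pathlib import Path


def has_pdfs(pdf_dir="pdfs"):
    """True if pdf_dir exists and contains at least one .pdf file."""
    path = Path(pdf_dir)
    return path.is_dir() and any(
        p.is_file() and p.suffix.lower() == ".pdf" for p in path.iterdir()
    )


def ensure_pdfs(pdf_dir="pdfs"):
    """Run scraper.py with the current interpreter only if no PDFs are present."""
    if not has_pdfs(pdf_dir):
        subprocess.run([sys.executable, "scraper.py"], check=True)


def rg_command(query, index_path="records_index.txt"):
    # --fixed-strings matches the query literally (no regex); "--" stops a query
    # that starts with "-" from being parsed as a flag.
    return ["rg", "--ignore-case", "--fixed-strings", "--", query, str(index_path)]


def search_index(query, index_path="records_index.txt"):
    """Return matching index lines. ripgrep exits 0 on a match, 1 on no match, 2 on error."""
    result = subprocess.run(
        rg_command(query, index_path),
        stdout=subprocess.PIPE,
        encoding="utf-8",
        errors="replace",
    )
    if result.returncode == 1:
        return []
    if result.returncode != 0:
        raise RuntimeError("ripgrep failed with exit code {}".format(result.returncode))
    return result.stdout.splitlines()
```

A driver would call `ensure_pdfs()`, build records_index.txt, then loop on `input()` and print each line of `search_index(query)`. Treating exit code 1 as "no results" rather than an error matters, because ripgrep uses it for an unsuccessful search.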

Customization

Change the Scraping URL:
To modify the URL from which PDFs are scraped, edit the base_url variable in scraper.py.
Modify Record Parsing:
Adjust the regular expression in the parse_records function in parser.py if your PDF data format changes.
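
A minimal sketch of phone-number-delimited parsing, assuming each listing ends with its phone number. `PHONE_RE` is a hypothetical UK-style pattern, not the expression parser.py actually uses, and `write_index` is likewise illustrative.

```python
import re

# Hypothetical pattern for numbers like "01632 960123" or "020 7946 0018".
PHONE_RE = re.compile(r"\b0\d{2,4} ?\d{3,4} ?\d{3,4}\b")


def parse_records(text, phone_re=PHONE_RE):
    """Split extracted text into records, treating each phone number as the end of one entry."""
    records = []
    start = 0
    for match in phone_re.finditer(text):
        # Everything since the previous number, up to and including this one, is one record.
        chunk = text[start:match.end()]
        record = " ".join(chunk.split())  # collapse newlines and runs of spaces
        if record:
            records.append(record)
        start = match.end()
    # Any trailing text after the last phone number is discarded.
    return records


def write_index(records, index_path="records_index.txt"):
    # One record per line, so each ripgrep hit is a complete listing.
    with open(index_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(record + "\n")
```

Collapsing whitespace matters because a listing wrapped across several PDF lines would otherwise be split, and a line-oriented search would return only a fragment of it.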

Acknowledgements

PyPDF2 for PDF text extraction.
requests and BeautifulSoup for web scraping.
ripgrep for fast and efficient searching.
pdftotext for efficient PDF text extraction.
AI for filling in my knowledge gaps.

Repository: maxmoodycyber/BT-Phonebook-Lookup
