pyDivar - The best Divar crawler ever.
Welcome to pyDivar, a Python-based crawler that extracts data from Divar, the largest classified ads website in Iran. It aims to make gathering information from Divar efficient and robust.
- Efficient data extraction from Divar.
- Easy-to-use interface.
- Customizable crawling settings.
To install pyDivar, follow these steps:
- Clone the repository:
git clone https://github.com/hadif1999/pyDivar.git
- Navigate to the project directory:
cd pyDivar
- Install the required dependencies:
pip install -r requirements.txt
To use pyDivar, follow these steps:
- Adjust the config to your needs:
1.1. Log in to Divar in your browser.
1.2. Open the browser's inspect tool and go to the Network section.
1.3. Copy the value of the "Authorization" header from the request headers of any request that includes it.
1.4. Add this value to config.json as the general.AUTH_TOKEN field.
1.5. Change general.category to your desired category (copy it from the Divar URL after selecting a category).
- Run the following command:
python3 main.py
- The result is saved as an XLSX file at the path specified in general.output_path of the config.
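The configuration steps above can also be scripted. The following is a minimal sketch, not part of pyDivar itself: it assumes that config.json nests the three fields named above under a top-level "general" object (inferred from the dotted names), and the helper name update_config is hypothetical.

```python
import json
from pathlib import Path


def update_config(path, auth_token, category, output_path):
    """Set the three `general` fields, keeping any other keys in the file."""
    config_file = Path(path)
    # Start from the existing file if there is one, so unrelated settings survive.
    if config_file.exists():
        config = json.loads(config_file.read_text(encoding="utf-8"))
    else:
        config = {}
    # Assumption: the fields live under a top-level "general" object.
    general = config.setdefault("general", {})
    general["AUTH_TOKEN"] = auth_token
    general["category"] = category
    general["output_path"] = output_path
    config_file.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

For example, update_config("config.json", "<value copied in step 1.3>", "<category from the Divar URL>", "output.xlsx") would fill in the fields before running python3 main.py. The placeholder values are illustrative only.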
Note: If crawling stops due to an error, open Divar, select a post, and pass the CAPTCHA by clicking "اطلاعات تماس" ("Contact information"), then try again.
Note 2: If you are not in Iran, your IP will be banned after retrieving approximately 1 page (around 24 phone numbers). If you are in Iran, you can retrieve around 8 pages before your IP gets banned. After that, you will need to change your IP. The ban duration on Divar is approximately 24 hours.
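To plan a crawl around these limits, the arithmetic can be sketched as follows. This is a rough estimate only: it takes the approximate figures from the note above (about 24 phone numbers per page, about 8 pages per IP inside Iran and 1 outside), and the function name ip_rotations_needed is hypothetical.

```python
import math

NUMBERS_PER_PAGE = 24  # approximate figure from the note above


def ip_rotations_needed(target_numbers, pages_per_ip):
    """Estimate how many distinct IPs (or 24-hour waits) a crawl needs."""
    pages_needed = math.ceil(target_numbers / NUMBERS_PER_PAGE)
    return math.ceil(pages_needed / pages_per_ip)


print(ip_rotations_needed(500, pages_per_ip=8))  # inside Iran: prints 3
print(ip_rotations_needed(500, pages_per_ip=1))  # outside Iran: prints 21
```

Collecting 500 phone numbers takes about 21 pages, so roughly 3 IPs inside Iran or 21 outside, given the approximate limits stated above.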
We welcome contributions to pyDivar! If you have any suggestions, bug fixes, or new features, please feel free to submit a pull request. Follow these steps to contribute:
- Fork the repository.
- Create a new branch:
git checkout -b feature-branch-name
- Make your changes and commit them:
git commit -m 'Add some feature'
- Push to the branch:
git push origin feature-branch-name
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.