This project, Twitter Analysis with PageRank Algorithm, is built using Object-Oriented Programming (OOP) principles. The system leverages Selenium and the Twitter API to collect data on Key Opinion Leaders (KOLs) on Twitter, stores the data in a PostgreSQL database, and applies the PageRank Algorithm to analyze the influence of nodes within the graph.
- Data Collection: Managed by the `scraper` package.
- Data Processing: Imports data from the `output` directory into the database and computes PageRank via the `processor` package.
- Results: PageRank scores are exported to the `target` directory.
- Ensure Java, PostgreSQL, and Selenium WebDriver are installed on your system.
- Clone the repository:
git clone https://github.com/username/TwitterAnalysis-with-Pagerank.git
cd TwitterAnalysis-with-Pagerank
- Create an `application.properties` file in the root directory of the project with the following structure:
spring.datasource.url=jdbc:postgresql://localhost:5432/DatabaseName
spring.datasource.username=YourDatabaseUsername
spring.datasource.password=YourDatabasePassword
initialize_databasePath=src/main/java/processor/dataprocessing/sql/schema.sql
queriesPath=src/main/java/processor/dataprocessing/sql/queries.sql
directedSimpleGraphAdjListPath=output/AdjList/directedSimpleGraph.json
1-wayDirectedSimpleGraphAdjListPath=output/AdjList/1-wayDirectedSimpleGraph.json
PageRankOutputPath=output/PageRankPoints/pageRankPoints.json
IncrementalPageRankOutputPath=output/PageRankPoints/IncrementalPageRankPoints.json
twitter.username=YourTwitterUsername
twitter.password=YourTwitterPassword
oauth.ConsumerKey=YourConsumerKey
oauth.Consumer_Key_Secret=YourConsumerKeySecret
oauth.Access_Token=YourAccessToken
oauth.Access_Token_Secret=YourAccessTokenSecret
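As a rough illustration of how keys like those above could be read, here is a minimal sketch using `java.util.Properties`. The class `ConfigSketch` and its methods are hypothetical and not part of the repository. Since the project lists Spring Boot as its framework, the real code may well bind these values through Spring instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not taken from the repository.
class ConfigSketch {

    // Parses text written in .properties syntax into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the value for a key, failing loudly if it is missing or blank.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Sample text reusing two keys from the structure above.
        String sample = String.join("\n",
            "spring.datasource.url=jdbc:postgresql://localhost:5432/DatabaseName",
            "PageRankOutputPath=output/PageRankPoints/pageRankPoints.json");
        Properties props = parse(sample);
        System.out.println(require(props, "spring.datasource.url"));
        System.out.println(require(props, "PageRankOutputPath"));
    }
}
```

Failing fast on a missing key tends to surface a misconfigured file before the scraper or the database import starts.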
- Navigate to the `scraper` directory:
cd src/main/java/scraper
- Run the `Main.java` file to start data collection:
javac Main.java
java Main
- The collected data will be saved in the `output` directory.
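The configuration keys refer to a directed simple graph stored as an adjacency list. The sketch below shows one way such a structure could be built from raw edges, dropping self-loops and duplicate edges so the graph stays simple. The class `AdjListSketch`, the `{source, target}` edge representation, and the edge direction are assumptions made for illustration; the README does not describe the scraper's actual output format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not taken from the repository.
class AdjListSketch {

    // Builds a directed simple graph from edges given as {source, target} pairs.
    static Map<String, List<String>> build(List<String[]> edges) {
        Map<String, Set<String>> unique = new TreeMap<>();
        for (String[] edge : edges) {
            String from = edge[0];
            String to = edge[1];
            // Register both endpoints so nodes without outgoing edges still appear.
            unique.computeIfAbsent(from, k -> new LinkedHashSet<>());
            unique.computeIfAbsent(to, k -> new LinkedHashSet<>());
            // Skip self-loops; the set silently ignores duplicate edges.
            if (!from.equals(to)) {
                unique.get(from).add(to);
            }
        }
        Map<String, List<String>> adjacency = new TreeMap<>();
        unique.forEach((node, targets) -> adjacency.put(node, new ArrayList<>(targets)));
        return adjacency;
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
            new String[] {"alice", "bob"},
            new String[] {"alice", "bob"},   // duplicate edge
            new String[] {"alice", "alice"}, // self-loop
            new String[] {"bob", "carol"});
        System.out.println(build(edges));
    }
}
```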
- Navigate to the `processor` directory:
cd src/main/java/processor
- Run the `Main.java` file to process data and calculate PageRank:
javac Main.java
java Main
- The `Main.java` file performs the following tasks:
  - Initializes the database using `schema.sql`.
  - Imports data from the `output` directory into the PostgreSQL database.
  - Computes PageRank scores and exports the results to `output/PageRankPoints`.
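For readers unfamiliar with the PageRank step, the following is a minimal sketch of the standard power-iteration formulation over an adjacency list. It is not the repository's implementation. The class `PageRankSketch`, the damping factor of 0.85, the fixed iteration count, and the choice to spread the rank of nodes without outgoing edges evenly over all nodes are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of textbook PageRank; not taken from the repository.
class PageRankSketch {

    // adjacency maps each node to the nodes it links to (assumed to be a simple graph).
    static Map<String, Double> compute(Map<String, List<String>> adjacency,
                                       double damping, int iterations) {
        // Collect every node, including ones that only appear as targets.
        Set<String> nodes = new TreeSet<>(adjacency.keySet());
        for (List<String> targets : adjacency.values()) {
            nodes.addAll(targets);
        }
        int n = nodes.size();
        Map<String, Double> rank = new HashMap<>();
        if (n == 0) {
            return rank;
        }
        for (String node : nodes) {
            rank.put(node, 1.0 / n);
        }
        for (int i = 0; i < iterations; i++) {
            // Rank held by nodes with no outgoing edges is redistributed uniformly.
            double dangling = 0.0;
            for (String node : nodes) {
                if (adjacency.getOrDefault(node, List.of()).isEmpty()) {
                    dangling += rank.get(node);
                }
            }
            double base = (1.0 - damping) / n + damping * dangling / n;
            Map<String, Double> next = new HashMap<>();
            for (String node : nodes) {
                next.put(node, base);
            }
            // Each node passes an equal share of its damped rank to every target.
            for (String node : nodes) {
                List<String> targets = adjacency.getOrDefault(node, List.of());
                if (targets.isEmpty()) {
                    continue;
                }
                double share = damping * rank.get(node) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adjacency = Map.of(
            "alice", List.of("bob"),
            "bob", List.of("carol"),
            "carol", List.of("bob"));
        new TreeMap<>(compute(adjacency, 0.85, 50))
            .forEach((node, score) -> System.out.println(node + " " + score));
    }
}
```

In this sketch the scores always sum to 1, so they can be read as each node's share of total influence in the graph.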
- PageRank scores of graph nodes are saved in the `output/PageRankPoints` directory:
  - `pageRankPoints.json`: General PageRank results.
  - `IncrementalPageRankPoints.json`: Incremental PageRank results.
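Once the scores are available, a common next step is to list the most influential accounts. The sketch below ranks an in-memory map of scores. The class `TopKolSketch` is hypothetical, and because the README does not show the layout of the JSON files, the sketch assumes the scores have already been parsed into a `Map<String, Double>`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not taken from the repository.
class TopKolSketch {

    // Returns the k node names with the highest scores; ties are broken by name.
    static List<String> topK(Map<String, Double> scores, int k) {
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Double>comparingByKey()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up scores used only to exercise the method.
        Map<String, Double> scores = Map.of("alice", 0.1, "bob", 0.5, "carol", 0.4);
        System.out.println(topK(scores, 2));
    }
}
```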
Detailed package design and the overall process are documented in the report file located at `report/OOP_Report.pdf`.
- Programming Language: Java
- Framework: Spring Boot
- Library: Selenium WebDriver
- Database: PostgreSQL
- Algorithm: PageRank
If you’d like to contribute to this project:
- Fork the repository.
- Create a feature branch:
git checkout -b feature-branch
- Commit your changes:
git commit -m "Add new feature"
- Push the branch:
git push origin feature-branch
- Open a Pull Request.
For any additional queries, feel free to open an issue in the repository. 😊