VK User Analysis

VK public news pages have recently become a political and cultural battleground. Heated arguments break out in the comments, especially when commenters find that others do not share their point of view. But are the participants real people? This project explores whether bot accounts can be detected in these discussions.

For this project, we wrote a parsing script based on the VK API and VKScript to collect post data from some of the biggest and most active VK news pages (general and political):

1 канал, Лентач, РБК, Роскомсвобода, Дождь, Вести, Топор, Медуза, РенТВ, Плохие Новости, Life, РИА, Mash
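As a rough illustration of how VKScript can batch requests, the sketch below builds the `code` string for VK's `execute` method so that one request fetches several pages of a wall via `wall.get`. This is not the project's actual parser: the function name `build_wall_script` and the owner ID `-1` are placeholders, and the limits (25 API calls per `execute`, 100 posts per `wall.get` call) are taken from VK's public documentation as we understand it. No network request is made here.

```python
import json

# Assumed limits from VK's public API documentation.
MAX_CALLS_PER_EXECUTE = 25  # API calls allowed inside one execute request
PAGE_SIZE = 100             # maximum posts returned by a single wall.get call


def build_wall_script(owner_id: int, start_offset: int = 0,
                      calls: int = MAX_CALLS_PER_EXECUTE) -> str:
    """Build VKScript that fetches `calls` consecutive pages of a wall.

    `owner_id` is negative for communities (public pages).
    The returned string is meant to be sent as the `code` parameter
    of the `execute` method.
    """
    if not 1 <= calls <= MAX_CALLS_PER_EXECUTE:
        raise ValueError("calls must be between 1 and 25")

    requests = []
    for i in range(calls):
        params = {
            "owner_id": owner_id,
            "count": PAGE_SIZE,
            "offset": start_offset + i * PAGE_SIZE,
        }
        # VKScript accepts JSON-style object literals as call arguments.
        requests.append(f"API.wall.get({json.dumps(params)})")

    # The script returns an array with one wall.get response per page.
    return "return [" + ", ".join(requests) + "];"


# Hypothetical community ID, used only for illustration.
script = build_wall_script(owner_id=-1, calls=2)
print(script)
```

Batching like this reduces the number of HTTP round trips, which matters when walking through the full history of a large page. Comments could be collected the same way with `wall.getComments`.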

We also collected the comments and the profiles of the users who wrote them (1.5M+ accounts in total), which amounted to 15+ GB of Parquet data stored on a cluster via HDFS.

Using Spark, we devised a series of criteria based on profile and activity data and, most intriguingly, on sentiment analysis of the comments. The sentiment analysis was performed with Dostoevsky, a library for analyzing Russian text (which I adore for its speed, accuracy and ease of use). We then combined the gathered criteria into a percentile score, which gave us the final verdict.
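The README does not spell out the criteria or the scoring formula, so the following is only a minimal sketch of one way a percentile score could work: each account gets a percentile rank per criterion, the ranks are averaged, and accounts above a cutoff are flagged. The criteria names (`comments_per_day`, `negative_share`), the sample values, and the threshold of 70 are all hypothetical. Plain Python is used instead of Spark to keep the example self-contained.

```python
from bisect import bisect_left
from statistics import mean


def percentile_ranks(values):
    """For each value, return the percentage of values strictly below it."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, v) / n for v in values]


def bot_scores(accounts, criteria):
    """Average the per-criterion percentile ranks for every account."""
    per_criterion = [
        percentile_ranks([account[c] for account in accounts])
        for c in criteria
    ]
    # zip(*...) regroups the ranks so each tuple belongs to one account.
    return [mean(ranks) for ranks in zip(*per_criterion)]


# Made-up accounts; negative_share stands for the fraction of an account's
# comments that a sentiment model labeled as negative.
accounts = [
    {"id": "a", "comments_per_day": 2, "negative_share": 0.10},
    {"id": "b", "comments_per_day": 5, "negative_share": 0.30},
    {"id": "c", "comments_per_day": 40, "negative_share": 0.90},
    {"id": "d", "comments_per_day": 3, "negative_share": 0.20},
]
CRITERIA = ["comments_per_day", "negative_share"]
THRESHOLD = 70.0  # hypothetical cutoff for a "suspicious" verdict

scores = bot_scores(accounts, CRITERIA)
flagged = [a["id"] for a, s in zip(accounts, scores) if s >= THRESHOLD]
print(scores)   # [0.0, 50.0, 75.0, 25.0]
print(flagged)  # ['c']
```

Percentile ranks put criteria with very different scales (counts, ratios, account age) on a common 0 to 100 scale, which is why they are convenient for combining heterogeneous signals.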

The chosen groups differ in how negative their audiences are.

Below is the rating of groups based on spam comment ratio
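A rating of this kind can be computed by dividing the number of spam comments in each group by the group's total comment count and sorting the result. The sketch below assumes every comment already carries a boolean spam flag; how that flag is assigned is not described here, and the sample data and the function name `spam_rating` are invented for illustration.

```python
from collections import Counter


def spam_rating(comments):
    """Rank groups by the share of their comments flagged as spam.

    `comments` is an iterable of (group_name, is_spam) pairs.
    Returns (group_name, ratio) pairs, highest ratio first.
    """
    total = Counter()
    spam = Counter()
    for group, is_spam in comments:
        total[group] += 1
        spam[group] += int(is_spam)

    ratios = {group: spam[group] / total[group] for group in total}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Invented sample: the flags do not reflect the project's real results.
sample = [
    ("Лентач", True), ("Лентач", False),
    ("РБК", False), ("РБК", False), ("РБК", True), ("РБК", False),
]
for group, ratio in spam_rating(sample):
    print(f"{group}: {ratio:.0%}")
```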

Want to take a deep dive into the pool of chaos? Here are the top 5 posts that caused the most toxic, hateful and controversial discussions between people, trolls and bots:

№1: "Плохие Новости", topic: Scandalous behavior in social media

№2: "1 Канал", topic: Religious insults

This post is something of an outlier: it dates back to 2012, and a larger share of the commenting accounts have been deleted.

№3: "РИА", topic: Event involving injury and awarding of a police officer

№4: "Плохие Новости", topic: Questionable race-based statements

№5: "Дождь"*, topic: Alexei Navalny's return to Russia

*Banned in Russia; accessible only through a VPN

Made with Python as a course project for the first year of a master's program, FDT ITMO


stas1f1/VK-User-Analysis

Folders and files

Latest commit

History

Repository files navigation

VK User Analysis

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages