
A study of user comment data parsed from VK, performed on a computing cluster. Course project for the 1st year of the master's program, FDT ITMO.


stas1f1/VK-User-Analysis


VK User Analysis

VK public news pages have recently become a political and cultural battleground. People readily engage in heated discussions in the comments, especially when others do not share their point of view. The question is: are they all real people? This project explores the possibility of detecting bot accounts in such discussions.

For this project, a parsing script based on the VK API and VKScript was created to collect post data from some of the biggest and most active VK news pages (general and political):

1 канал, Лентач, РБК, Роскомсвобода, Дождь, Вести, Топор, Медуза, РенТВ, Плохие Новости, Life, РИА, Mash
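The parsing script itself is not included in this README. As a rough illustration of why VKScript is useful here: VK's `execute` method runs a VKScript program on VK's side and lets a single request bundle up to 25 API calls, which cuts the number of round trips when paging through a wall. The sketch below assumes posts are fetched with `wall.get` (at most 100 posts per call); the helper names and the API version string are our own illustrative choices, not the project's.

```python
import json

# Hypothetical helpers; the project's real parser is not shown in the README.
EXECUTE_URL = "https://api.vk.com/method/execute"
MAX_CALLS_PER_EXECUTE = 25  # VK limit on API calls inside one execute request
PAGE_SIZE = 100             # wall.get returns at most 100 posts per call


def build_wall_get_script(owner_id, start_offset, n_calls):
    """Return VKScript code that fetches n_calls consecutive pages of a wall.

    Community walls use a negative owner_id (e.g. -1 for community id 1).
    The script returns an array with one wall.get response per page.
    """
    if not 1 <= n_calls <= MAX_CALLS_PER_EXECUTE:
        raise ValueError("n_calls must be between 1 and 25")
    calls = []
    for i in range(n_calls):
        params = json.dumps({
            "owner_id": owner_id,
            "offset": start_offset + i * PAGE_SIZE,
            "count": PAGE_SIZE,
        })
        calls.append(f"API.wall.get({params})")
    return "return [" + ", ".join(calls) + "];"


def execute_request_params(code, access_token, api_version="5.131"):
    """Form fields for a POST to the execute endpoint (version is an assumption)."""
    return {"code": code, "access_token": access_token, "v": api_version}
```

The resulting parameters could then be sent with, for example, `requests.post(EXECUTE_URL, data=params)`; each element of the returned `response` array corresponds to one `wall.get` page.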

We also collected the comments and the corresponding user profiles (1.5M+ accounts in total), amounting to 15+ GB of Parquet data stored on the cluster in HDFS.
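The README does not list which profile fields were parsed. Assuming the profiles came from the VK `users.get` method, a few of its response fields map naturally onto bot-detection signals: `deactivated` (present for deleted or banned accounts), `has_photo`, `followers_count` and `is_closed`. The feature names below are hypothetical, and in the project such columns would be derived with Spark over the Parquet data rather than in plain Python.

```python
# Hypothetical per-profile features built from VK users.get fields.
# `has_photo` and `followers_count` are only returned when requested via
# the `fields` parameter; missing values fall back to defaults here.
def profile_features(profile):
    return {
        # VK adds "deactivated" ("deleted" or "banned") to disabled accounts
        "is_deactivated": int("deactivated" in profile),
        # accounts without an avatar are a common weak bot signal
        "no_photo": int(profile.get("has_photo", 0) == 0),
        "followers": int(profile.get("followers_count", 0)),
        "is_closed": int(bool(profile.get("is_closed", False))),
    }


print(profile_features({"id": 1, "deactivated": "deleted"}))
```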

Using Spark, we devised a series of criteria based on profile and activity data and, most intriguingly, on comment sentiment analysis performed with Dostoevsky, a library for analyzing Russian-language text (which I adore for its speed, accuracy and ease of use). The gathered criteria were then combined into a percentile score that gave the final verdict.
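The exact formula of the percentile score is not given here. One plausible reading, sketched below under that assumption, is to convert each criterion into a percentile rank across all accounts, average the ranks into one score, and flag accounts above a threshold. The criterion names and the 0.9 threshold are illustrative only.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Fraction of all values that are <= each value, in (0, 1]; ties share the top rank."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def bot_scores(criteria):
    """Average the per-criterion percentile ranks of every account.

    `criteria` maps a (hypothetical) criterion name to per-account values, all in
    the same account order. Each criterion must be oriented so that higher means
    more bot-like; negate ones where lower is more suspicious (e.g. follower count).
    """
    ranked = [percentile_ranks(vals) for vals in criteria.values()]
    n_accounts = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_accounts)]


def flag_bots(scores, threshold=0.9):
    """Mark accounts whose combined score reaches the (illustrative) threshold."""
    return [s >= threshold for s in scores]


criteria = {
    "negative_comment_share": [0.1, 0.9, 0.5, 0.8],
    "comments_per_day": [1, 40, 30, 2],
}
scores = bot_scores(criteria)
print(scores)             # [0.25, 1.0, 0.625, 0.625]
print(flag_bots(scores))  # [False, True, False, False]
```

Percentile ranks put criteria on very different scales (shares, counts per day) onto a common 0 to 1 scale, so no single criterion dominates the average.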

The chosen groups differ in how negative their audiences are.

Below is the rating of the groups by spam comment ratio.
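The rating chart is not reproduced in this text. Assuming a comment counts as spam when its author was flagged by the bot score, the ratio per group is simply flagged comments divided by all comments. A plain-Python sketch of that aggregation, with a hypothetical `(group, is_spam)` input format:

```python
from collections import Counter


def spam_ratio_by_group(comments):
    """Rank groups by spam ratio.

    `comments` is an iterable of (group_name, is_spam) pairs, where is_spam is
    assumed to mean "written by an account flagged as a bot".
    """
    total, spam = Counter(), Counter()
    for group, is_spam in comments:
        total[group] += 1
        spam[group] += int(is_spam)
    ratios = {g: spam[g] / total[g] for g in total}
    # highest spam ratio first
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


print(spam_ratio_by_group([("A", True), ("A", False), ("B", True), ("C", False)]))
```

On Spark, the same aggregation would be a `groupBy` on the group column followed by `avg` (from `pyspark.sql.functions`) over a 0/1 spam column.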

Want to take a deep dive into the pool of chaos? Here are the top 5 posts that sparked the most toxic, hateful and controversial discussions among people, trolls and bots:


№1: "Плохие Новости", topic: Scandalous behavior in social media

№2: "1 Канал", topic: Religious insults

This post is something of an outlier: it dates back to 2012, and a larger share of its commenters' accounts have since been deleted.

№3: "РИА", topic: Event involving the injury and awarding of a police officer

№4: "Плохие Новости", topic: Questionable race-based statements

№5: "Дождь"*, topic: Alexei Navalny's return to Russia

*Banned in Russia; accessible only via VPN
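How the five posts above were selected is not described in detail. One simple proxy, shown here purely as an assumption, is the share of a post's comments whose dominant Dostoevsky label is `negative`. Dostoevsky's `FastTextSocialNetworkModel.predict` returns one dict of label scores per message (labels such as `positive`, `negative`, `neutral`, `skip`, `speech`), which is the input format assumed below; the function names are hypothetical.

```python
import heapq
from collections import defaultdict


def dominant_label(scores):
    """Label with the highest score in a Dostoevsky-style {label: score} dict."""
    return max(scores, key=scores.get)


def most_negative_posts(comments, k=5, min_comments=1):
    """Return up to k (negative_share, post_id) pairs, most negative first.

    `comments` is an iterable of (post_id, sentiment_scores) pairs.
    Posts with fewer than `min_comments` comments are ignored.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for post_id, scores in comments:
        totals[post_id] += 1
        if dominant_label(scores) == "negative":
            negatives[post_id] += 1
    shares = ((negatives[p] / n, p) for p, n in totals.items() if n >= min_comments)
    return heapq.nlargest(k, shares)


comments = [
    ("p1", {"negative": 0.8, "neutral": 0.2}),
    ("p1", {"neutral": 0.7, "negative": 0.3}),
    ("p2", {"negative": 0.9, "skip": 0.1}),
    ("p3", {"positive": 0.6, "neutral": 0.4}),
]
print(most_negative_posts(comments, k=2))  # [(1.0, 'p2'), (0.5, 'p1')]
```

The `min_comments` cutoff keeps posts with only a handful of comments from topping the list with a 100% negative share.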


Made with Python as a course project for the 1st year of the master's program, FDT ITMO
