Final Project DTI - Indonesian Tweet Racism Detection

This repository is the final project to complete the Telkom Digital Talent Incubator 2020 program. This research project was created as our contribution as a Telkom DTI 2020 Data Scientist to the UN Sustainable Development Goals 2030 program in the social justice and humanities fields to realize racial equality in communities worldwide.

Dataset

The dataset used is a collection of racist tweets in Indonesian. This dataset contains 511 rows labeled Non-Racism and 175 rows labeled Racism. You can see the dataset in the following link:

https://raw.githubusercontent.com/asthala/racism-detection/master/datasetfix.csv

This how the dataset looks like in wordcloud :

Research Flowchart

Text Preprocessing

Text data needs to be cleaned and encoded to numerical values before giving them to machine learning models. This process of cleaning and encoding is called text preprocessing. The following are the preprocessing stages carried out in this project:

Data Cleaning: aims to remove HTML tags, username, URL, and other unnecessary symbols.
Case Folding: aims to change the capital letter to a lowercase
Tokenization: aims to separate sentences into word pieces called tokens for later analysis
Stopword Removal: aims to remove any word tokens that contain stopword words (unnecessary words)
Stemming: aims to transform tokens into basic words by removing all word affixes
TF-IDF: aims to give weight to the terms of the dataset
SMOTE Oversampling: aims to balance data classes

Model

The model is build using the Multi Layer Perceptron or MLP architecture. The activation function that will be used for hidden neurons is the Rectified Linear Unit (ReLU).

How to use

Make sure you already have python and pip installed on your computer
Install all requirements pip install -r requirements.txt
Run app python app.py
Go to your browser and run 127.0.0.1:5000

Created By

Muhammad Alfhi Saputra
Tigas Adrian Wahyuindrajati
Crisnandra Rahmita Mardiantien
La Ode Muhamad Sofyan Syarafa

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
static/css		static/css
templates		templates
Procfile		Procfile
Project_DTI_Machine_Learning_Method_MLP_Method.ipynb		Project_DTI_Machine_Learning_Method_MLP_Method.ipynb
README.md		README.md
app.py		app.py
data_clean.csv		data_clean.csv
flowchart.png		flowchart.png
model.py		model.py
model_mlp.pkl		model_mlp.pkl
nltk.txt		nltk.txt
poster fix.jpg		poster fix.jpg
preprocessing.py		preprocessing.py
racismwordcloud.png		racismwordcloud.png
request.py		request.py
requirements.txt		requirements.txt
testmodel.py		testmodel.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Final Project DTI - Indonesian Tweet Racism Detection

Dataset

Research Flowchart

Text Preprocessing

Model

How to use

Created By

About

Releases

Packages

Contributors 3

Languages

ds-dti/DS01_07_racism-detection

Folders and files

Latest commit

History

Repository files navigation

Final Project DTI - Indonesian Tweet Racism Detection

Dataset

Research Flowchart

Text Preprocessing

Model

How to use

Created By

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages