Kaggle_LLMClassificationFinetuning

Overview

This repo is the humble result of my work on a Kaggle competition: https://www.kaggle.com/competitions/llm-classification-finetuning/overview

The idea is to predict which response users will prefer in a head-to-head battle between chatbots powered by LLMs. The dataset is composed of a prompt and two responses coming from two different LLMs in the Chatbot Arena.

The only data accessible in the test set are [prompt], [response_a] and [response_b].

It is a multiclass classification task evaluated on the log loss of the probabilities predicted for each class.
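For reference, a minimal sketch of the metric using scikit-learn (the sample values below are made up for illustration):

```python
# Multiclass log loss over the three classes, as used by the competition.
import numpy as np
from sklearn.metrics import log_loss

y_true = [0, 2, 1]  # 0 = response_a preferred, 1 = response_b preferred, 2 = tie
y_pred = np.array([
    [0.70, 0.20, 0.10],
    [0.20, 0.30, 0.50],
    [0.10, 0.80, 0.10],
])  # one probability row per sample, each row summing to 1

print(log_loss(y_true, y_pred, labels=[0, 1, 2]))  # lower is better
```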

I did this mainly to improve my knowledge of NLP and LLM finetuning.

Evolution

I started with a notebook provided by Kaggle, working with TensorFlow and WSL. I had many issues with that combination (tensor incompatibilities between TensorFlow and Transformers, for instance), so I quickly recreated the notebook using PyTorch, which worked like a charm.

First I tried a solution using RoBERTa and a siamese network, tokenizing the prompt paired with each response separately. This achieved a modest result, but good enough to start with.
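Roughly, the setup looked like this; a minimal sketch rather than the exact notebook code (class and variable names are illustrative, and "roberta-base" stands in for the checkpoint actually used):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseClassifier(nn.Module):
    """Shared encoder for (prompt, response_a) and (prompt, response_b);
    the two [CLS] embeddings are concatenated and classified."""
    def __init__(self, backbone="roberta-base", n_classes=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        self.head = nn.Linear(2 * self.encoder.config.hidden_size, n_classes)

    def encode(self, batch):
        return self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding

    def forward(self, pair_a, pair_b):
        emb_a = self.encode(pair_a)  # prompt paired with response_a
        emb_b = self.encode(pair_b)  # prompt paired with response_b
        return self.head(torch.cat([emb_a, emb_b], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
kwargs = dict(return_tensors="pt", truncation=True, max_length=256)
pair_a = tokenizer("the prompt", "first response", **kwargs)
pair_b = tokenizer("the prompt", "second response", **kwargs)
logits = SiameseClassifier()(pair_a, pair_b)  # shape: (1, 3)
```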

Then I played a bit with some basic feature engineering (length, similarity, keyword overlap and lexical diversity), which improved my results a little. For that I created a model using RoBERTa, got both embeddings from it, concatenated them with a vector containing all my features, and added a classification head on top.
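A sketch of that feature-augmented head (the feature list, layer sizes and dropout rate below are illustrative, not the actual values):

```python
import torch
import torch.nn as nn

class FeatureAugmentedHead(nn.Module):
    """Classify the concatenation [emb_a ; emb_b ; handcrafted features]."""
    def __init__(self, hidden=768, n_features=4, n_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + n_features, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, emb_a, emb_b, features):
        # features: e.g. [length ratio, similarity, keyword overlap, lexical diversity]
        return self.head(torch.cat([emb_a, emb_b, features], dim=-1))

head = FeatureAugmentedHead()
emb_a, emb_b = torch.randn(8, 768), torch.randn(8, 768)  # from the encoder
features = torch.randn(8, 4)                             # handcrafted features
print(head(emb_a, emb_b, features).shape)                # torch.Size([8, 3])
```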

Then I switched the backbone to mDeBERTa ("microsoft/mdeberta-v3-base" on HuggingFace), a multilingual DeBERTa-v3 model, which is meant to handle multilingual embeddings. This helped improve results as well.
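Since everything goes through the transformers Auto classes, swapping the backbone is a small change:

```python
from transformers import AutoModel, AutoTokenizer

# Same code path as the RoBERTa version; only the checkpoint name changes.
tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
encoder = AutoModel.from_pretrained("microsoft/mdeberta-v3-base")
```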

Finally, a good enhancement came from adding a warm-up/decay scheduler (originally present in the TensorFlow starter notebook), along with different starting learning rates for the finetuned backbone and the classification layer. This drastically improved my results. I did not take the time to search for optimal hyperparameters, since I had already spent enough time on this project and wanted to start something else, so there are still possible improvements to be made on this part.
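A sketch of that setup (the learning rates, warm-up fraction and step count below are illustrative, not the actual values):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, get_linear_schedule_with_warmup

encoder = AutoModel.from_pretrained("microsoft/mdeberta-v3-base")
head = nn.Linear(encoder.config.hidden_size, 3)

# Two parameter groups with different starting learning rates.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 1e-5},  # gentle finetuning of the backbone
    {"params": head.parameters(), "lr": 1e-3},     # faster training of the new head
])

num_training_steps = 1_000  # epochs * batches per epoch
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warm-up, then linear decay
    num_training_steps=num_training_steps,
)

# In the training loop, step the scheduler once per optimizer step:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```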

Results

The competition scores submissions using the log loss between predictions and test set labels, with probabilities distributed between [response_a preferred], [response_b preferred] and [tie]. I scored a loss of 1.19, while the best entries on the leaderboard are close to 0.83. This is 'ok' but not a particularly good result.

But there is plenty of room for improvement, and I now have a good backbone to start another interesting competition based on almost the same setup.

Possible improvements

  • Create a pipeline with less data to be able to test different ideas/feature engineering/models, so I can iterate faster and compare more strategies.
  • Better feature engineering: I already have better ideas on how to handle similarity.
  • Try bigger and better models: I saw very good results from people using Gemma 2, and I recently learned about the existence of a multilingual Gemma 2 (https://huggingface.co/BAAI/bge-multilingual-gemma2) that I would like to test.
  • Grid search to optimize hyperparameters.
  • Get the most out of Kaggle's GPU T4 x2 accelerator by using multi-GPU training.
  • Increase the sequence length, currently at 256, which is not ideal.
  • Change the model to create only one embedding containing the prompt, resp_a and resp_b (see the sketch after this list). Currently the model uses too much memory by storing the prompt twice, and I am stuck with a poor sequence length (256).
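A sketch of the single-sequence idea from the last bullet (the separator format and field labels are an assumption; formats vary by tokenizer):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

# One sequence per sample, so the prompt is only stored once and the
# token budget is shared between the prompt and both responses.
text = (f"prompt: the prompt {tokenizer.sep_token} "
        f"response_a: first answer {tokenizer.sep_token} "
        f"response_b: second answer")
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# A single encoder pass then yields one embedding for the classification head.
```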

Next Step

Using this work as a baseline for another, similar (timed) competition: WSDM Cup - Multilingual Chatbot Arena. It is almost the same task, but with only a binary classification (no tie) and more emphasis on supporting multilingual prompts.

Links

Training on Kaggle: https://www.kaggle.com/code/ohmatheus/llm-classification-supervisedlearning
Prediction on Kaggle: https://www.kaggle.com/code/ohmatheus/llm-classification-predict
