KonoSuba Data

Text data of KonoSuba: God's Blessing on This Wonderful World! Light Novel Volume 1 to 17 + short stories (English fan translation).

Note:
Most unrelated metadata and translator's notes (TL notes) have been removed.
This cleanup may have accidentally removed some lines of the light novel itself, but the damage should be minimal.
Feel free to create an issue if you find lines that were removed by mistake.

Usage

Download the files below.

| File | Lines | Size | Description |
| --- | --- | --- | --- |
| konosuba.txt | 47573 | 4.5MB | 17 volumes of the KonoSuba light novel condensed into 1 file. Both dialogue and monologue are included. |
| konosuba-dialogue.txt | 18689 | 2.3MB | Contains only dialogue between quotes (“”). Monologue is excluded. |
| konosuba-dataset.json | 18688 | 5.8MB | JSON array of dialogues between user and assistant to fine-tune LLMs. Data from konosuba-dialogue.txt. Example: { [user: 1st line, assistant: 2nd line], [user: 2nd line, assistant: 3rd line], [user: 3rd line, assistant: 4th line], ... } |
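
The pairing shown in the example for konosuba-dataset.json can be sketched as a sliding window over the lines of konosuba-dialogue.txt: each line is the user turn and the line after it is the assistant turn. This is only an illustration, not the code in dataset.py; the function name make_pairs and the exact "user"/"assistant" key names are assumptions based on the example, and the sample lines are placeholders.

```python
import json


def make_pairs(lines):
    """Pair each dialogue line with the line that follows it."""
    # Drop blank lines so they never become a prompt or a reply.
    cleaned = [line.strip() for line in lines if line.strip()]
    # zip stops at the shorter input, so n lines give n - 1 pairs.
    return [
        {"user": prompt, "assistant": reply}
        for prompt, reply in zip(cleaned, cleaned[1:])
    ]


# Placeholder lines standing in for konosuba-dialogue.txt.
sample = ["1st line", "2nd line", "3rd line", "4th line"]
print(json.dumps(make_pairs(sample), ensure_ascii=False, indent=2))
```

Because n lines produce n - 1 pairs, the 18689 lines of konosuba-dialogue.txt correspond to the 18688 entries listed for konosuba-dataset.json.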

Shameless self-plug:

  • Wanna make a Markov chain random sentence generator? Check out aqua.
  • Wanna make an AI chatbot? Check out kazuma.

Context

KonoSuba: God's Blessing on This Wonderful World!, often referred to simply as KonoSuba, is a Japanese light novel series written by Natsume Akatsuki. The series follows Kazuma Satou, a boy who is sent to a fantasy world with MMORPG elements following his death, where he forms a dysfunctional adventuring party with a goddess (Aqua), an archwizard (Megumin), and a crusader (Darkness/Lalatina Dustiness Ford).

Premise

Following an untimely and embarrassing death, Kazuma Satou, a Japanese teenage shut-in NEET, meets a goddess named Aqua, who offers to reincarnate him in a parallel world with MMORPG elements, where he can go on adventures and battle monsters. Despite being offered a superpowered item or ability to use in this new world, Kazuma, following some provocation, chooses Aqua herself to accompany him to the town of Axel, quickly finding her absent-mindedness to be less than beneficial. With Aqua unable to return to the afterlife until the Devil King is defeated, the two form a party and recruit two other members: an explosion-obsessed magician named Megumin and a masochistic crusader named Darkness. Due to the party's dysfunctional abilities, Kazuma quickly gives up on the idea of defeating the Devil King and tries to live a comfortable lifestyle, only to find the circumstances of his daily life are forcing him and his party to encounter and battle the Devil King's generals.

Source: https://en.wikipedia.org/wiki/KonoSuba

I wanna DIY

If you want to generate the data yourself, I recommend using a proxy/VPN before running the webscraper.

Clone the project.

git clone https://github.com/MarsRon/konosuba-data

Create a Python virtual environment.

python3 -m venv venv
source venv/bin/activate

Install libraries.

pip install -r requirements.txt

Run the webscraper.

python scrape.py

This creates a ./data directory that temporarily stores each chapter of Volumes 1 to 17 as text.

The script then merges all the chapters into konosuba.txt and generates konosuba-dialogue.txt from the quoted speech only.
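
As a rough sketch of how dialogue could be separated from monologue, the snippet below collects every span enclosed in curly quotes (“”). It is not taken from scrape.py: the name extract_dialogue is hypothetical, the sample text is a placeholder, and whether the real konosuba-dialogue.txt keeps or strips the quote characters is an assumption.

```python
import re

# Matches the text between a left (“) and a right (”) curly double quote.
# The negated class also matches newlines, so a quote may span several lines.
QUOTED = re.compile(r"“([^”]+)”")


def extract_dialogue(text):
    """Return every curly-quoted span in text, in order, without the quotes."""
    return [match.strip() for match in QUOTED.findall(text)]


# Placeholder text mixing narration and quoted speech.
sample = "Narration here. “1st spoken line.” More narration, “2nd spoken line.”"
for line in extract_dialogue(sample):
    print(line)
```

Text outside the quotes is simply ignored, which is how monologue ends up excluded in this sketch.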

Dataset creation

Run the dataset creation script.

python dataset.py

This creates konosuba-dataset.json, which can then be used to fine-tune LLMs such as Llama-3.2 3B.

Edit the bottom of the script to choose the output format: compact JSON (default), pretty-printed JSON, or CSV.
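
The three output formats differ only in how the same list of pairs is serialized. The following is a minimal sketch using the Python standard library, not the contents of dataset.py; the function names and the "user"/"assistant" field names are assumptions made for illustration.

```python
import csv
import io
import json


def to_compact_json(pairs):
    """Serialize without extra whitespace to keep the file small."""
    return json.dumps(pairs, ensure_ascii=False, separators=(",", ":"))


def to_pretty_json(pairs):
    """Serialize with indentation so the file is easy to read and diff."""
    return json.dumps(pairs, ensure_ascii=False, indent=2)


def to_csv(pairs):
    """Serialize as CSV with one row per pair and a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["user", "assistant"])
    writer.writeheader()
    writer.writerows(pairs)
    return buffer.getvalue()


# Placeholder pair; real entries would come from konosuba-dialogue.txt.
pairs = [{"user": "1st line", "assistant": "2nd line"}]
print(to_compact_json(pairs))
print(to_pretty_json(pairs))
print(to_csv(pairs))
```

Passing ensure_ascii=False keeps characters such as the curly quotes readable in the JSON output instead of escaping them.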

Acknowledgements

The data is scraped from cgtranslations.me and crimsonmagic.me.

License

Distributed under the MIT License. See LICENSE.md for more information.

Contact

MarsRon - marsron204@gmail.com - marsron.name.my
