Fine-tunes FLAN-T5 with PPO and PEFT to generate less toxic text summaries. The notebook applies RLHF, using Meta AI's hate speech model as the reward model.
nlp toxic-comment-classification hate-speech-detection toxicity-analysis ppo-pytorch dialogue-summarization generative-ai detoxification reward-model
Updated Jan 4, 2025 - Jupyter Notebook