
Evaluating German T5 Models on GermEval 2014 (NER)

This repository presents an ongoing evaluation of German T5 models on the GermEval 2014 NER downstream task.

Changelog

  • 03.02.2023: Initial version.

Fine-Tuning T5

A few approaches exist for fine-tuning T5 models for token classification tasks:

These approaches cast token classification as a sequence-to-sequence problem.

However, it is also possible to use only the encoder of a T5 model for downstream tasks, as presented in:

The proposed "EncT5" architecture was not evaluated on token classification tasks.

This repository uses the Flair library to perform encoder-only fine-tuning on the GermEval 2014 NER dataset. The recently released T5 models for German are used as LM backbones; a minimal fine-tuning sketch is shown below.
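The snippet below is a minimal sketch of how such an encoder-only fine-tuning run could look with Flair's `TransformerWordEmbeddings` and `SequenceTagger`. The backbone checkpoint name, hidden size, and dataset loader shown here are assumptions for illustration and are not necessarily what `flair-fine-tuner.py` does.

```python
# Minimal sketch of encoder-only fine-tuning with Flair (assumptions noted inline).
from flair.datasets import NER_GERMAN_GERMEVAL
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. Load the GermEval 2014 NER corpus via Flair's dataset loader
#    (Flair may ask you to download the raw data manually first).
corpus = NER_GERMAN_GERMEVAL()
label_dict = corpus.make_label_dictionary(label_type="ner")

# 2. Wrap a German T5 checkpoint as word-level embeddings and allow fine-tuning.
#    The checkpoint name below is an assumed example, not the exact backbone used here.
embeddings = TransformerWordEmbeddings(
    model="GermanT5/t5-efficient-gc4-all-german-small-el32",
    layers="-1",
    fine_tune=True,
)

# 3. Linear tagging head on top of the encoder representations
#    (no CRF, no RNN, no reprojection).
tagger = SequenceTagger(
    hidden_size=256,  # assumed value
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# 4. Fine-tune with the bs16-e10-lr0.00011 configuration from the results table.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/germeval-t5-small",
    learning_rate=0.00011,
    mini_batch_size=16,
    max_epochs=10,
)
```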

Results

We perform a basic hyper-parameter search and report the micro F1-score, averaged over 5 runs (with different seeds). The score in brackets is on the development split; the second score is on the test split.

| Model Size | Configuration | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|------------|--------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| Small | bs16-e10-lr0.00011 | (87.24) / 85.53 | (86.40) / 85.63 | (86.50) / 85.47 | (86.32) / 85.57 | (86.77) / 85.38 | (86.65) / 85.52 |
| Large | bs16-e10-lr0.00011 | (87.16) / 86.46 | (87.07) / 85.76 | (87.46) / 85.57 | (87.05) / 86.91 | (87.15) / 86.11 | (87.18) / 86.16 |

For the hyper-parameter search, the script flair-fine-tuner.py is used in combination with a configuration file (passed as an argument). All configuration files used for the experiments are located under ./configs; a hypothetical example is sketched after this paragraph.
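The following is a hypothetical illustration of how such a configuration could be loaded and passed to a fine-tuning run; the actual format of the files under ./configs and the CLI of flair-fine-tuner.py may differ.

```python
# Hypothetical sketch only: field names and file format are assumptions.
import json
import sys


def load_config(config_path: str) -> dict:
    # A configuration such as "bs16-e10-lr0.00011" could be stored as JSON, e.g.:
    # {"batch_size": 16, "epochs": 10, "learning_rate": 0.00011, "seeds": [1, 2, 3, 4, 5]}
    with open(config_path) as f:
        return json.load(f)


if __name__ == "__main__":
    config = load_config(sys.argv[1])
    print(
        f"Fine-tuning with batch size {config['batch_size']}, "
        f"{config['epochs']} epochs, learning rate {config['learning_rate']}, "
        f"averaged over seeds {config['seeds']}"
    )
```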

Baselines:

Hardware/Requirements

The latest Flair version (commit 6da65a4) is used for all experiments.

All models are fine-tuned on A10 (24GB) instances from Lambda Cloud.