
Reproducing the flan_v2 results of T5-xl #80

Open
@danczs

Description

First, thanks for this excellent work. However, I ran into some problems while reproducing the results of T5-xl.

My setting is:

Pretrained model and optimizer:
I used the T5-v1_1-xl pretrained model and followed the training settings in "Scaling Instruction-Finetuned Language Models": batch size 64, dropout 0.05, LR 5e-4, 38K steps, Adafactor optimizer.
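For reference, here is a minimal sketch of my setup, assuming the HuggingFace `transformers` implementations of T5 and Adafactor; anything beyond the hyperparameters listed above is an assumption on my part:

```python
# Minimal sketch of the fine-tuning setup described in this issue.
# Assumes HuggingFace transformers; details beyond the listed
# hyperparameters (LR 5e-4, dropout 0.05, 38K steps, bs 64) are assumptions.
from transformers import T5ForConditionalGeneration
from transformers.optimization import Adafactor

# Passing dropout_rate=0.05 overrides the config so all dropout
# layers are built with p=0.05.
model = T5ForConditionalGeneration.from_pretrained(
    "google/t5-v1_1-xl", dropout_rate=0.05
)

# Constant LR 5e-4: disable Adafactor's relative-step schedule and
# parameter scaling so the given learning rate is used directly.
optimizer = Adafactor(
    model.parameters(),
    lr=5e-4,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
# Fine-tune for 38K steps at an effective batch size of 64.
```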

Data:
For the data, I first used the training data provided by SirNeural and evaluated the model on MMLU. When I sampled the 5 datasets equally (i.e. cot, flanv2, t0, dialog, niv2), I got 45% 5-shot accuracy on MMLU, which is similar to the "w/o mixture balancing" result in the paper. However, after I mixed the data with the suggested rates here, the accuracy did not improve (44%). I implemented the mixing roughly as in the sketch below.
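This is how I applied the rates, treating them as sampling weights over the five submixtures. The rate values below are placeholders, not the official ones; I substituted the suggested rates in my runs:

```python
# Sketch of proportional mixture sampling over the five submixtures.
# The rate values are placeholders, not the official suggested rates.
import random

mixture_rates = {
    "flanv2": 0.40,  # placeholder
    "t0": 0.32,      # placeholder
    "niv2": 0.20,    # placeholder
    "cot": 0.05,     # placeholder
    "dialog": 0.03,  # placeholder
}

def sample_batch(datasets, rates, batch_size, rng=random.Random(0)):
    """Pick a submixture with probability proportional to its rate,
    then draw a uniformly random example from it."""
    names = list(rates)
    weights = [rates[n] for n in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=weights, k=1)[0]
        batch.append(rng.choice(datasets[name]))
    return batch
```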

Afterwards, I tried the data provided by Enrico Shippole and mixed it following the suggested rates, but the accuracy became worse (42% on MMLU). I also tried a larger batch size (128, to account for batch packing) and deduplicated the data; neither helped noticeably.

Are there any suggestions for reproducing the MMLU result of the released Flan-T5-xl model (49%), or even the result in the paper (52%)? Thanks a lot.
