Add missing tags to XTREME (#3322)
* Add missing tags to XTREME

* Add licenses

* Update XNLI license

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

mariosasko and lhoestq authored Nov 29, 2021
1 parent 09b7bb0 commit e6f1352
Showing 1 changed file with 460 additions and 1 deletion.

1 comment on commit e6f1352

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.113896 / 0.011353 (0.102543) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.007363 / 0.011008 (-0.003645) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.039676 / 0.038508 (0.001168) |
| read_batch_unformated after write_array2d | 0.059504 / 0.023109 (0.036395) |
| read_batch_unformated after write_flattened_sequence | 0.471405 / 0.275898 (0.195507) |
| read_batch_unformated after write_nested_sequence | 0.507690 / 0.323480 (0.184210) |
| read_col_formatted_as_numpy after write_array2d | 0.145619 / 0.007986 (0.137633) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005021 / 0.004328 (0.000692) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.013619 / 0.004250 (0.009369) |
| read_col_unformated after write_array2d | 0.062841 / 0.037052 (0.025788) |
| read_col_unformated after write_flattened_sequence | 0.468609 / 0.258489 (0.210120) |
| read_col_unformated after write_nested_sequence | 0.478410 / 0.293841 (0.184569) |
| read_formatted_as_numpy after write_array2d | 0.125745 / 0.128546 (-0.002801) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009995 / 0.075646 (-0.065651) |
| read_formatted_as_numpy after write_nested_sequence | 0.410229 / 0.419271 (-0.009042) |
| read_unformated after write_array2d | 0.051595 / 0.043533 (0.008062) |
| read_unformated after write_flattened_sequence | 0.493469 / 0.255139 (0.238330) |
| read_unformated after write_nested_sequence | 0.441463 / 0.283200 (0.158263) |
| write_array2d | 0.115916 / 0.141683 (-0.025766) |
| write_flattened_sequence | 2.595217 / 1.452155 (1.143063) |
| write_nested_sequence | 2.760663 / 1.492716 (1.267946) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.361011 / 0.018006 (0.343005) |
| get_batch_of_1024_rows | 0.679190 / 0.000490 (0.678700) |
| get_first_row | 0.008239 / 0.000200 (0.008039) |
| get_last_row | 0.000182 / 0.000054 (0.000128) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.041488 / 0.037411 (0.004076) |
| shard | 0.023006 / 0.014526 (0.008480) |
| shuffle | 0.032619 / 0.176557 (-0.143938) |
| sort | 0.235469 / 0.737135 (-0.501667) |
| train_test_split | 0.033806 / 0.296338 (-0.262532) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.702115 / 0.215209 (0.486906) |
| read 50000 | 7.318789 / 2.077655 (5.241134) |
| read_batch 50000 10 | 3.616772 / 1.504120 (2.112652) |
| read_batch 50000 100 | 2.826692 / 1.541195 (1.285498) |
| read_batch 50000 1000 | 2.847268 / 1.468490 (1.378778) |
| read_formatted numpy 5000 | 0.783095 / 4.584777 (-3.801682) |
| read_formatted pandas 5000 | 9.043615 / 3.745712 (5.297903) |
| read_formatted tensorflow 5000 | 3.587846 / 5.269862 (-1.682016) |
| read_formatted torch 5000 | 1.864630 / 4.565676 (-2.701047) |
| read_formatted_batch numpy 5000 10 | 0.080864 / 0.424275 (-0.343411) |
| read_formatted_batch numpy 5000 1000 | 0.020312 / 0.007607 (0.012705) |
| shuffled read 5000 | 0.838168 / 0.226044 (0.612123) |
| shuffled read 50000 | 9.176907 / 2.268929 (6.907979) |
| shuffled read_batch 50000 10 | 3.793175 / 55.444624 (-51.651450) |
| shuffled read_batch 50000 100 | 3.062272 / 6.876477 (-3.814204) |
| shuffled read_batch 50000 1000 | 3.025520 / 2.142072 (0.883447) |
| shuffled read_formatted numpy 5000 | 0.853790 / 4.805227 (-3.951437) |
| shuffled read_formatted_batch numpy 5000 10 | 0.181046 / 6.500664 (-6.319618) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.081682 / 0.075469 (0.006212) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 2.452304 / 1.841788 (0.610516) |
| map fast-tokenizer batched | 20.361484 / 8.074308 (12.287176) |
| map identity | 59.383051 / 10.191392 (49.191659) |
| map identity batched | 1.363830 / 0.680424 (0.683407) |
| map no-op batched | 1.123688 / 0.534201 (0.589487) |
| map no-op batched numpy | 0.837807 / 0.579283 (0.258523) |
| map no-op batched pandas | 0.960699 / 0.434364 (0.526335) |
| map no-op batched pytorch | 0.569726 / 0.540337 (0.029389) |
| map no-op batched tensorflow | 0.481808 / 1.386936 (-0.905128) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.069295 / 0.011353 (0.057942) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005213 / 0.011008 (-0.005795) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.042635 / 0.038508 (0.004127) |
| read_batch_unformated after write_array2d | 0.044651 / 0.023109 (0.021542) |
| read_batch_unformated after write_flattened_sequence | 0.402484 / 0.275898 (0.126586) |
| read_batch_unformated after write_nested_sequence | 0.506266 / 0.323480 (0.182786) |
| read_col_formatted_as_numpy after write_array2d | 0.095710 / 0.007986 (0.087724) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003532 / 0.004328 (-0.000796) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009558 / 0.004250 (0.005307) |
| read_col_unformated after write_array2d | 0.043394 / 0.037052 (0.006341) |
| read_col_unformated after write_flattened_sequence | 0.443029 / 0.258489 (0.184540) |
| read_col_unformated after write_nested_sequence | 0.522637 / 0.293841 (0.228796) |
| read_formatted_as_numpy after write_array2d | 0.095395 / 0.128546 (-0.033151) |
| read_formatted_as_numpy after write_flattened_sequence | 0.017880 / 0.075646 (-0.057766) |
| read_formatted_as_numpy after write_nested_sequence | 0.420248 / 0.419271 (0.000976) |
| read_unformated after write_array2d | 0.050484 / 0.043533 (0.006951) |
| read_unformated after write_flattened_sequence | 0.507817 / 0.255139 (0.252678) |
| read_unformated after write_nested_sequence | 0.444599 / 0.283200 (0.161400) |
| write_array2d | 0.083524 / 0.141683 (-0.058159) |
| write_flattened_sequence | 2.665403 / 1.452155 (1.213248) |
| write_nested_sequence | 2.181263 / 1.492716 (0.688547) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.424943 / 0.018006 (0.406936) |
| get_batch_of_1024_rows | 0.675796 / 0.000490 (0.675307) |
| get_first_row | 0.003228 / 0.000200 (0.003028) |
| get_last_row | 0.000172 / 0.000054 (0.000117) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.060401 / 0.037411 (0.022990) |
| shard | 0.037431 / 0.014526 (0.022905) |
| shuffle | 0.054560 / 0.176557 (-0.121996) |
| sort | 0.407914 / 0.737135 (-0.329222) |
| train_test_split | 0.058748 / 0.296338 (-0.237591) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.906239 / 0.215209 (0.691030) |
| read 50000 | 8.279687 / 2.077655 (6.202032) |
| read_batch 50000 10 | 3.635965 / 1.504120 (2.131845) |
| read_batch 50000 100 | 3.105940 / 1.541195 (1.564745) |
| read_batch 50000 1000 | 2.856435 / 1.468490 (1.387945) |
| read_formatted numpy 5000 | 0.702542 / 4.584777 (-3.882235) |
| read_formatted pandas 5000 | 8.218060 / 3.745712 (4.472347) |
| read_formatted tensorflow 5000 | 2.994886 / 5.269862 (-2.274975) |
| read_formatted torch 5000 | 1.433826 / 4.565676 (-3.131851) |
| read_formatted_batch numpy 5000 10 | 0.055051 / 0.424275 (-0.369224) |
| read_formatted_batch numpy 5000 1000 | 0.012617 / 0.007607 (0.005010) |
| shuffled read 5000 | 0.830759 / 0.226044 (0.604715) |
| shuffled read 50000 | 8.649946 / 2.268929 (6.381017) |
| shuffled read_batch 50000 10 | 3.694842 / 55.444624 (-51.749782) |
| shuffled read_batch 50000 100 | 2.728852 / 6.876477 (-4.147625) |
| shuffled read_batch 50000 1000 | 2.981227 / 2.142072 (0.839155) |
| shuffled read_formatted numpy 5000 | 0.784664 / 4.805227 (-4.020564) |
| shuffled read_formatted_batch numpy 5000 10 | 0.156272 / 6.500664 (-6.344392) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.084612 / 0.075469 (0.009143) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 2.666752 / 1.841788 (0.824964) |
| map fast-tokenizer batched | 20.462058 / 8.074308 (12.387750) |
| map identity | 40.962439 / 10.191392 (30.771047) |
| map identity batched | 1.214644 / 0.680424 (0.534220) |
| map no-op batched | 0.715012 / 0.534201 (0.180811) |
| map no-op batched numpy | 0.529379 / 0.579283 (-0.049904) |
| map no-op batched pandas | 0.732012 / 0.434364 (0.297648) |
| map no-op batched pytorch | 0.345210 / 0.540337 (-0.195127) |
| map no-op batched tensorflow | 0.426966 / 1.386936 (-0.959970) |
