Release: 1.17.0
lhoestq committed Dec 21, 2021
1 parent 1aa09c9 commit dff6c92
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -25,7 +25,7 @@
# The short X.Y version
version = ""
# The full version, including alpha/beta/rc tags
-release = "1.16.1"
+release = "1.17.0"


# -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -223,7 +223,7 @@

setup(
name="datasets",
-version="1.16.2.dev0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
+version="1.17.0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
description="HuggingFace community-driven open-source library of datasets",
long_description=open("README.md", "r", encoding="utf-8").read(),
long_description_content_type="text/markdown",
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
# pylint: enable=line-too-long
# pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.16.2.dev0"
+__version__ = "1.17.0"

import pyarrow
from packaging import version as _version
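
The comment in setup.py describes the accepted version shapes (x.y.z.dev0, x.y.z.rc1 or x.y.z, with dots and no dashes). A minimal sketch of how such a check could look is given here; the regular expression and both helper names are hypothetical illustrations and are not part of the `datasets` code base.

```python
import re

# Hypothetical pattern mirroring the setup.py comment:
# x.y.z, x.y.z.devN or x.y.z.rcN, separated by dots only (no dashes).
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(\.(dev|rc)\d+)?")


def is_expected_format(version: str) -> bool:
    """Return True if the whole string matches one of the accepted shapes."""
    return VERSION_PATTERN.fullmatch(version) is not None


def versions_in_sync(versions: dict) -> bool:
    """Return True if every file declares the same version string."""
    return len(set(versions.values())) == 1


if __name__ == "__main__":
    # Values taken from the three files touched by this commit.
    declared = {
        "docs/source/conf.py": "1.17.0",
        "setup.py": "1.17.0",
        "src/datasets/__init__.py": "1.17.0",
    }
    print(all(is_expected_format(v) for v in declared.values()))
    print(versions_in_sync(declared))
```

A check of this kind would have flagged the pre-release state, where docs/source/conf.py declared 1.16.1 while the other two files declared 1.16.2.dev0.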

2 comments on commit dff6c92

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010086 / 0.011353 (-0.001267) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004211 / 0.011008 (-0.006797) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.032577 / 0.038508 (-0.005931) |
| read_batch_unformated after write_array2d | 0.035311 / 0.023109 (0.012202) |
| read_batch_unformated after write_flattened_sequence | 0.335833 / 0.275898 (0.059935) |
| read_batch_unformated after write_nested_sequence | 0.340901 / 0.323480 (0.017421) |
| read_col_formatted_as_numpy after write_array2d | 0.008519 / 0.007986 (0.000533) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003778 / 0.004328 (-0.000550) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009327 / 0.004250 (0.005077) |
| read_col_unformated after write_array2d | 0.037703 / 0.037052 (0.000650) |
| read_col_unformated after write_flattened_sequence | 0.308610 / 0.258489 (0.050120) |
| read_col_unformated after write_nested_sequence | 0.368830 / 0.293841 (0.074989) |
| read_formatted_as_numpy after write_array2d | 0.031260 / 0.128546 (-0.097287) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009545 / 0.075646 (-0.066101) |
| read_formatted_as_numpy after write_nested_sequence | 0.270432 / 0.419271 (-0.148839) |
| read_unformated after write_array2d | 0.049855 / 0.043533 (0.006322) |
| read_unformated after write_flattened_sequence | 0.303966 / 0.255139 (0.048827) |
| read_unformated after write_nested_sequence | 0.340947 / 0.283200 (0.057748) |
| write_array2d | 0.083901 / 0.141683 (-0.057781) |
| write_flattened_sequence | 1.836848 / 1.452155 (0.384693) |
| write_nested_sequence | 1.846474 / 1.492716 (0.353758) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.260531 / 0.018006 (0.242524) |
| get_batch_of_1024_rows | 0.461912 / 0.000490 (0.461423) |
| get_first_row | 0.008198 / 0.000200 (0.007998) |
| get_last_row | 0.000303 / 0.000054 (0.000248) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.036730 / 0.037411 (-0.000682) |
| shard | 0.022072 / 0.014526 (0.007546) |
| shuffle | 0.030879 / 0.176557 (-0.145678) |
| sort | 0.075500 / 0.737135 (-0.661635) |
| train_test_split | 0.029271 / 0.296338 (-0.267068) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.420705 / 0.215209 (0.205496) |
| read 50000 | 4.318378 / 2.077655 (2.240723) |
| read_batch 50000 10 | 1.871340 / 1.504120 (0.367220) |
| read_batch 50000 100 | 1.683840 / 1.541195 (0.142646) |
| read_batch 50000 1000 | 1.688950 / 1.468490 (0.220460) |
| read_formatted numpy 5000 | 0.438374 / 4.584777 (-4.146403) |
| read_formatted pandas 5000 | 5.165187 / 3.745712 (1.419475) |
| read_formatted tensorflow 5000 | 2.489180 / 5.269862 (-2.780682) |
| read_formatted torch 5000 | 0.973344 / 4.565676 (-3.592333) |
| read_formatted_batch numpy 5000 10 | 0.054830 / 0.424275 (-0.369445) |
| read_formatted_batch numpy 5000 1000 | 0.012928 / 0.007607 (0.005321) |
| shuffled read 5000 | 0.562786 / 0.226044 (0.336741) |
| shuffled read 50000 | 5.462995 / 2.268929 (3.194067) |
| shuffled read_batch 50000 10 | 2.395050 / 55.444624 (-53.049574) |
| shuffled read_batch 50000 100 | 1.938599 / 6.876477 (-4.937878) |
| shuffled read_batch 50000 1000 | 2.013801 / 2.142072 (-0.128272) |
| shuffled read_formatted numpy 5000 | 0.643012 / 4.805227 (-4.162215) |
| shuffled read_formatted_batch numpy 5000 10 | 0.140401 / 6.500664 (-6.360263) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.066686 / 0.075469 (-0.008783) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.628723 / 1.841788 (-0.213065) |
| map fast-tokenizer batched | 12.933597 / 8.074308 (4.859289) |
| map identity | 27.309217 / 10.191392 (17.117825) |
| map identity batched | 0.826336 / 0.680424 (0.145913) |
| map no-op batched | 0.525924 / 0.534201 (-0.008277) |
| map no-op batched numpy | 0.515267 / 0.579283 (-0.064016) |
| map no-op batched pandas | 0.538767 / 0.434364 (0.104403) |
| map no-op batched pytorch | 0.341499 / 0.540337 (-0.198839) |
| map no-op batched tensorflow | 0.381106 / 1.386936 (-1.005830) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009229 / 0.011353 (-0.002124) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003872 / 0.011008 (-0.007136) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.029574 / 0.038508 (-0.008934) |
| read_batch_unformated after write_array2d | 0.037597 / 0.023109 (0.014488) |
| read_batch_unformated after write_flattened_sequence | 0.320153 / 0.275898 (0.044255) |
| read_batch_unformated after write_nested_sequence | 0.355616 / 0.323480 (0.032136) |
| read_col_formatted_as_numpy after write_array2d | 0.007207 / 0.007986 (-0.000778) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005040 / 0.004328 (0.000711) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007976 / 0.004250 (0.003725) |
| read_col_unformated after write_array2d | 0.047146 / 0.037052 (0.010094) |
| read_col_unformated after write_flattened_sequence | 0.316235 / 0.258489 (0.057746) |
| read_col_unformated after write_nested_sequence | 0.343660 / 0.293841 (0.049819) |
| read_formatted_as_numpy after write_array2d | 0.033741 / 0.128546 (-0.094805) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009575 / 0.075646 (-0.066072) |
| read_formatted_as_numpy after write_nested_sequence | 0.281408 / 0.419271 (-0.137863) |
| read_unformated after write_array2d | 0.058485 / 0.043533 (0.014952) |
| read_unformated after write_flattened_sequence | 0.313017 / 0.255139 (0.057878) |
| read_unformated after write_nested_sequence | 0.343327 / 0.283200 (0.060127) |
| write_array2d | 0.094667 / 0.141683 (-0.047016) |
| write_flattened_sequence | 1.778412 / 1.452155 (0.326257) |
| write_nested_sequence | 1.897404 / 1.492716 (0.404687) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.265172 / 0.018006 (0.247166) |
| get_batch_of_1024_rows | 0.459025 / 0.000490 (0.458535) |
| get_first_row | 0.005952 / 0.000200 (0.005752) |
| get_last_row | 0.000130 / 0.000054 (0.000075) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.034366 / 0.037411 (-0.003046) |
| shard | 0.021260 / 0.014526 (0.006734) |
| shuffle | 0.033003 / 0.176557 (-0.143554) |
| sort | 0.086824 / 0.737135 (-0.650311) |
| train_test_split | 0.033527 / 0.296338 (-0.262811) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.451837 / 0.215209 (0.236628) |
| read 50000 | 4.482547 / 2.077655 (2.404893) |
| read_batch 50000 10 | 2.014855 / 1.504120 (0.510735) |
| read_batch 50000 100 | 1.811757 / 1.541195 (0.270562) |
| read_batch 50000 1000 | 1.892663 / 1.468490 (0.424173) |
| read_formatted numpy 5000 | 0.456436 / 4.584777 (-4.128341) |
| read_formatted pandas 5000 | 5.123177 / 3.745712 (1.377464) |
| read_formatted tensorflow 5000 | 2.256495 / 5.269862 (-3.013367) |
| read_formatted torch 5000 | 0.953350 / 4.565676 (-3.612326) |
| read_formatted_batch numpy 5000 10 | 0.053479 / 0.424275 (-0.370796) |
| read_formatted_batch numpy 5000 1000 | 0.012568 / 0.007607 (0.004961) |
| shuffled read 5000 | 0.580107 / 0.226044 (0.354062) |
| shuffled read 50000 | 5.705365 / 2.268929 (3.436436) |
| shuffled read_batch 50000 10 | 2.664848 / 55.444624 (-52.779776) |
| shuffled read_batch 50000 100 | 2.184059 / 6.876477 (-4.692418) |
| shuffled read_batch 50000 1000 | 2.188588 / 2.142072 (0.046515) |
| shuffled read_formatted numpy 5000 | 0.565839 / 4.805227 (-4.239388) |
| shuffled read_formatted_batch numpy 5000 10 | 0.120384 / 6.500664 (-6.380280) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061625 / 0.075469 (-0.013844) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.599487 / 1.841788 (-0.242301) |
| map fast-tokenizer batched | 12.674811 / 8.074308 (4.600503) |
| map identity | 28.342319 / 10.191392 (18.150927) |
| map identity batched | 0.834605 / 0.680424 (0.154182) |
| map no-op batched | 0.637712 / 0.534201 (0.103511) |
| map no-op batched numpy | 0.508002 / 0.579283 (-0.071281) |
| map no-op batched pandas | 0.566951 / 0.434364 (0.132587) |
| map no-op batched pytorch | 0.339901 / 0.540337 (-0.200437) |
| map no-op batched tensorflow | 0.372115 / 1.386936 (-1.014821) |


@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.011417 / 0.011353 (0.000064) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004378 / 0.011008 (-0.006630) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.036990 / 0.038508 (-0.001518) |
| read_batch_unformated after write_array2d | 0.041079 / 0.023109 (0.017970) |
| read_batch_unformated after write_flattened_sequence | 0.387691 / 0.275898 (0.111793) |
| read_batch_unformated after write_nested_sequence | 0.409106 / 0.323480 (0.085626) |
| read_col_formatted_as_numpy after write_array2d | 0.009130 / 0.007986 (0.001145) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005270 / 0.004328 (0.000941) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010540 / 0.004250 (0.006290) |
| read_col_unformated after write_array2d | 0.043334 / 0.037052 (0.006282) |
| read_col_unformated after write_flattened_sequence | 0.383849 / 0.258489 (0.125360) |
| read_col_unformated after write_nested_sequence | 0.428802 / 0.293841 (0.134961) |
| read_formatted_as_numpy after write_array2d | 0.035469 / 0.128546 (-0.093077) |
| read_formatted_as_numpy after write_flattened_sequence | 0.010128 / 0.075646 (-0.065519) |
| read_formatted_as_numpy after write_nested_sequence | 0.303502 / 0.419271 (-0.115770) |
| read_unformated after write_array2d | 0.056915 / 0.043533 (0.013383) |
| read_unformated after write_flattened_sequence | 0.362624 / 0.255139 (0.107485) |
| read_unformated after write_nested_sequence | 0.395269 / 0.283200 (0.112070) |
| write_array2d | 0.093107 / 0.141683 (-0.048576) |
| write_flattened_sequence | 2.089808 / 1.452155 (0.637653) |
| write_nested_sequence | 2.174345 / 1.492716 (0.681629) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.262944 / 0.018006 (0.244938) |
| get_batch_of_1024_rows | 0.485246 / 0.000490 (0.484757) |
| get_first_row | 0.007225 / 0.000200 (0.007025) |
| get_last_row | 0.000118 / 0.000054 (0.000063) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.041992 / 0.037411 (0.004581) |
| shard | 0.024321 / 0.014526 (0.009795) |
| shuffle | 0.046780 / 0.176557 (-0.129776) |
| sort | 0.079927 / 0.737135 (-0.657208) |
| train_test_split | 0.042750 / 0.296338 (-0.253588) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.515910 / 0.215209 (0.300701) |
| read 50000 | 5.150692 / 2.077655 (3.073037) |
| read_batch 50000 10 | 2.370466 / 1.504120 (0.866346) |
| read_batch 50000 100 | 2.147454 / 1.541195 (0.606260) |
| read_batch 50000 1000 | 2.247144 / 1.468490 (0.778654) |
| read_formatted numpy 5000 | 0.514713 / 4.584777 (-4.070064) |
| read_formatted pandas 5000 | 5.746593 / 3.745712 (2.000881) |
| read_formatted tensorflow 5000 | 4.867108 / 5.269862 (-0.402753) |
| read_formatted torch 5000 | 1.165733 / 4.565676 (-3.399944) |
| read_formatted_batch numpy 5000 10 | 0.066332 / 0.424275 (-0.357943) |
| read_formatted_batch numpy 5000 1000 | 0.015244 / 0.007607 (0.007637) |
| shuffled read 5000 | 0.676166 / 0.226044 (0.450122) |
| shuffled read 50000 | 6.657317 / 2.268929 (4.388389) |
| shuffled read_batch 50000 10 | 2.928550 / 55.444624 (-52.516074) |
| shuffled read_batch 50000 100 | 2.353802 / 6.876477 (-4.522674) |
| shuffled read_batch 50000 1000 | 2.368457 / 2.142072 (0.226384) |
| shuffled read_formatted numpy 5000 | 0.693800 / 4.805227 (-4.111427) |
| shuffled read_formatted_batch numpy 5000 10 | 0.149947 / 6.500664 (-6.350717) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.074227 / 0.075469 (-0.001242) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.811939 / 1.841788 (-0.029849) |
| map fast-tokenizer batched | 14.303395 / 8.074308 (6.229087) |
| map identity | 32.602843 / 10.191392 (22.411451) |
| map identity batched | 0.924215 / 0.680424 (0.243791) |
| map no-op batched | 0.622222 / 0.534201 (0.088021) |
| map no-op batched numpy | 0.581335 / 0.579283 (0.002052) |
| map no-op batched pandas | 0.628252 / 0.434364 (0.193889) |
| map no-op batched pytorch | 0.375403 / 0.540337 (-0.164935) |
| map no-op batched tensorflow | 0.392205 / 1.386936 (-0.994731) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009736 / 0.011353 (-0.001617) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004358 / 0.011008 (-0.006650) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.035228 / 0.038508 (-0.003280) |
| read_batch_unformated after write_array2d | 0.039515 / 0.023109 (0.016406) |
| read_batch_unformated after write_flattened_sequence | 0.370670 / 0.275898 (0.094772) |
| read_batch_unformated after write_nested_sequence | 0.382357 / 0.323480 (0.058877) |
| read_col_formatted_as_numpy after write_array2d | 0.007099 / 0.007986 (-0.000887) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005350 / 0.004328 (0.001021) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008722 / 0.004250 (0.004472) |
| read_col_unformated after write_array2d | 0.047405 / 0.037052 (0.010352) |
| read_col_unformated after write_flattened_sequence | 0.356592 / 0.258489 (0.098103) |
| read_col_unformated after write_nested_sequence | 0.383256 / 0.293841 (0.089415) |
| read_formatted_as_numpy after write_array2d | 0.035835 / 0.128546 (-0.092712) |
| read_formatted_as_numpy after write_flattened_sequence | 0.010176 / 0.075646 (-0.065471) |
| read_formatted_as_numpy after write_nested_sequence | 0.298121 / 0.419271 (-0.121150) |
| read_unformated after write_array2d | 0.061243 / 0.043533 (0.017710) |
| read_unformated after write_flattened_sequence | 0.349625 / 0.255139 (0.094486) |
| read_unformated after write_nested_sequence | 0.369935 / 0.283200 (0.086736) |
| write_array2d | 0.098188 / 0.141683 (-0.043495) |
| write_flattened_sequence | 2.098590 / 1.452155 (0.646435) |
| write_nested_sequence | 2.191290 / 1.492716 (0.698574) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.264911 / 0.018006 (0.246905) |
| get_batch_of_1024_rows | 0.482678 / 0.000490 (0.482188) |
| get_first_row | 0.006207 / 0.000200 (0.006007) |
| get_last_row | 0.000097 / 0.000054 (0.000043) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.039964 / 0.037411 (0.002552) |
| shard | 0.024210 / 0.014526 (0.009684) |
| shuffle | 0.035115 / 0.176557 (-0.141442) |
| sort | 0.076697 / 0.737135 (-0.660438) |
| train_test_split | 0.034432 / 0.296338 (-0.261907) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.487500 / 0.215209 (0.272291) |
| read 50000 | 4.880131 / 2.077655 (2.802476) |
| read_batch 50000 10 | 2.101425 / 1.504120 (0.597305) |
| read_batch 50000 100 | 1.868325 / 1.541195 (0.327130) |
| read_batch 50000 1000 | 1.941042 / 1.468490 (0.472552) |
| read_formatted numpy 5000 | 0.514101 / 4.584777 (-4.070676) |
| read_formatted pandas 5000 | 5.987052 / 3.745712 (2.241340) |
| read_formatted tensorflow 5000 | 4.826440 / 5.269862 (-0.443421) |
| read_formatted torch 5000 | 1.206555 / 4.565676 (-3.359121) |
| read_formatted_batch numpy 5000 10 | 0.064124 / 0.424275 (-0.360151) |
| read_formatted_batch numpy 5000 1000 | 0.014714 / 0.007607 (0.007107) |
| shuffled read 5000 | 0.637919 / 0.226044 (0.411875) |
| shuffled read 50000 | 6.384584 / 2.268929 (4.115655) |
| shuffled read_batch 50000 10 | 2.733524 / 55.444624 (-52.711100) |
| shuffled read_batch 50000 100 | 2.294219 / 6.876477 (-4.582257) |
| shuffled read_batch 50000 1000 | 2.326375 / 2.142072 (0.184302) |
| shuffled read_formatted numpy 5000 | 0.660703 / 4.805227 (-4.144524) |
| shuffled read_formatted_batch numpy 5000 10 | 0.144243 / 6.500664 (-6.356421) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.074052 / 0.075469 (-0.001418) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.885025 / 1.841788 (0.043238) |
| map fast-tokenizer batched | 14.144030 / 8.074308 (6.069722) |
| map identity | 33.100645 / 10.191392 (22.909253) |
| map identity batched | 0.872963 / 0.680424 (0.192539) |
| map no-op batched | 0.656236 / 0.534201 (0.122035) |
| map no-op batched numpy | 0.578208 / 0.579283 (-0.001076) |
| map no-op batched pandas | 0.625011 / 0.434364 (0.190647) |
| map no-op batched pytorch | 0.394763 / 0.540337 (-0.145575) |
| map no-op batched tensorflow | 0.408481 / 1.386936 (-0.978455) |
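
Each benchmark cell above has the shape `new / old (diff)`, where the diff appears to be new minus old. A minimal sketch for parsing one cell and checking that relation is given here; the function names and the tolerance are assumptions made for illustration and do not come from the benchmark tooling. Because the printed values are rounded to six decimals, the recomputed difference can be off by one unit in the last digit, hence the tolerance.

```python
import re

# Matches a cell such as "0.010086 / 0.011353 (-0.001267)".
CELL_PATTERN = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell: str):
    """Split a 'new / old (diff)' cell into three floats."""
    match = CELL_PATTERN.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def diff_is_consistent(cell: str, tol: float = 2e-6) -> bool:
    """Check that diff equals new - old up to rounding of the printed values."""
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol


if __name__ == "__main__":
    # Cells copied from the benchmark comments on this commit.
    for cell in ("0.010086 / 0.011353 (-0.001267)",
                 "0.260531 / 0.018006 (0.242524)"):
        print(cell, diff_is_consistent(cell))
```

A positive diff means the new timing is larger than the reference value, and a negative diff means it is smaller.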

