Commit
Fix typo (#3023)
qqaatw authored Oct 5, 2021
1 parent 9622fed commit 93255c5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/datasets/fingerprint.py
@@ -164,7 +164,7 @@ def proxy(func):


 class Hasher:
-    """Hasher that accepts python objets as inputs."""
+    """Hasher that accepts python objects as inputs."""

     dispatch: Dict = {}


1 comment on commit 93255c5

@github-actions

PyArrow==3.0.0


Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010948 / 0.011353 (-0.000405) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004402 / 0.011008 (-0.006606) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.044688 / 0.038508 (0.006180) |
| read_batch_unformated after write_array2d | 0.038517 / 0.023109 (0.015407) |
| read_batch_unformated after write_flattened_sequence | 0.358082 / 0.275898 (0.082184) |
| read_batch_unformated after write_nested_sequence | 0.426076 / 0.323480 (0.102596) |
| read_col_formatted_as_numpy after write_array2d | 0.010020 / 0.007986 (0.002035) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004367 / 0.004328 (0.000039) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010623 / 0.004250 (0.006372) |
| read_col_unformated after write_array2d | 0.049954 / 0.037052 (0.012901) |
| read_col_unformated after write_flattened_sequence | 0.379715 / 0.258489 (0.121226) |
| read_col_unformated after write_nested_sequence | 0.401254 / 0.293841 (0.107413) |
| read_formatted_as_numpy after write_array2d | 0.035503 / 0.128546 (-0.093044) |
| read_formatted_as_numpy after write_flattened_sequence | 0.015793 / 0.075646 (-0.059854) |
| read_formatted_as_numpy after write_nested_sequence | 0.318753 / 0.419271 (-0.100519) |
| read_unformated after write_array2d | 0.061082 / 0.043533 (0.017549) |
| read_unformated after write_flattened_sequence | 0.387040 / 0.255139 (0.131901) |
| read_unformated after write_nested_sequence | 0.427072 / 0.283200 (0.143872) |
| write_array2d | 0.101227 / 0.141683 (-0.040455) |
| write_flattened_sequence | 2.259382 / 1.452155 (0.807227) |
| write_nested_sequence | 2.352905 / 1.492716 (0.860188) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.214283 / 0.018006 (0.196276) |
| get_batch_of_1024_rows | 0.527729 / 0.000490 (0.527239) |
| get_first_row | 0.004747 / 0.000200 (0.004547) |
| get_last_row | 0.000366 / 0.000054 (0.000312) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.044776 / 0.037411 (0.007365) |
| shard | 0.031123 / 0.014526 (0.016598) |
| shuffle | 0.035974 / 0.176557 (-0.140583) |
| sort | 0.149200 / 0.737135 (-0.587936) |
| train_test_split | 0.038610 / 0.296338 (-0.257729) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.562506 / 0.215209 (0.347297) |
| read 50000 | 5.324810 / 2.077655 (3.247155) |
| read_batch 50000 10 | 2.523547 / 1.504120 (1.019427) |
| read_batch 50000 100 | 2.123458 / 1.541195 (0.582264) |
| read_batch 50000 1000 | 2.111816 / 1.468490 (0.643326) |
| read_formatted numpy 5000 | 0.591973 / 4.584777 (-3.992803) |
| read_formatted pandas 5000 | 6.682393 / 3.745712 (2.936680) |
| read_formatted tensorflow 5000 | 1.556889 / 5.269862 (-3.712973) |
| read_formatted torch 5000 | 1.472742 / 4.565676 (-3.092934) |
| read_formatted_batch numpy 5000 10 | 0.061862 / 0.424275 (-0.362413) |
| read_formatted_batch numpy 5000 1000 | 0.005863 / 0.007607 (-0.001745) |
| shuffled read 5000 | 0.752916 / 0.226044 (0.526872) |
| shuffled read 50000 | 7.329745 / 2.268929 (5.060816) |
| shuffled read_batch 50000 10 | 3.211455 / 55.444624 (-52.233169) |
| shuffled read_batch 50000 100 | 2.499532 / 6.876477 (-4.376945) |
| shuffled read_batch 50000 1000 | 2.581832 / 2.142072 (0.439760) |
| shuffled read_formatted numpy 5000 | 0.758038 / 4.805227 (-4.047190) |
| shuffled read_formatted_batch numpy 5000 10 | 0.149484 / 6.500664 (-6.351181) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061491 / 0.075469 (-0.013978) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.191009 / 1.841788 (-0.650779) |
| map fast-tokenizer batched | 16.233397 / 8.074308 (8.159089) |
| map identity | 37.915170 / 10.191392 (27.723778) |
| map identity batched | 1.014378 / 0.680424 (0.333954) |
| map no-op batched | 0.692736 / 0.534201 (0.158535) |
| map no-op batched numpy | 0.306755 / 0.579283 (-0.272528) |
| map no-op batched pandas | 0.713328 / 0.434364 (0.278964) |
| map no-op batched pytorch | 0.259575 / 0.540337 (-0.280763) |
| map no-op batched tensorflow | 0.252355 / 1.386936 (-1.134581) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010930 / 0.011353 (-0.000423) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005848 / 0.011008 (-0.005160) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.044579 / 0.038508 (0.006071) |
| read_batch_unformated after write_array2d | 0.038066 / 0.023109 (0.014956) |
| read_batch_unformated after write_flattened_sequence | 0.416307 / 0.275898 (0.140409) |
| read_batch_unformated after write_nested_sequence | 0.435341 / 0.323480 (0.111861) |
| read_col_formatted_as_numpy after write_array2d | 0.008890 / 0.007986 (0.000905) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004131 / 0.004328 (-0.000197) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010474 / 0.004250 (0.006223) |
| read_col_unformated after write_array2d | 0.043036 / 0.037052 (0.005984) |
| read_col_unformated after write_flattened_sequence | 0.402243 / 0.258489 (0.143754) |
| read_col_unformated after write_nested_sequence | 0.413210 / 0.293841 (0.119369) |
| read_formatted_as_numpy after write_array2d | 0.036459 / 0.128546 (-0.092087) |
| read_formatted_as_numpy after write_flattened_sequence | 0.013468 / 0.075646 (-0.062178) |
| read_formatted_as_numpy after write_nested_sequence | 0.313252 / 0.419271 (-0.106019) |
| read_unformated after write_array2d | 0.058682 / 0.043533 (0.015149) |
| read_unformated after write_flattened_sequence | 0.398932 / 0.255139 (0.143793) |
| read_unformated after write_nested_sequence | 0.402603 / 0.283200 (0.119403) |
| write_array2d | 0.088927 / 0.141683 (-0.052756) |
| write_flattened_sequence | 2.110727 / 1.452155 (0.658572) |
| write_nested_sequence | 2.256000 / 1.492716 (0.763284) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.377050 / 0.018006 (0.359044) |
| get_batch_of_1024_rows | 0.498652 / 0.000490 (0.498162) |
| get_first_row | 0.060206 / 0.000200 (0.060006) |
| get_last_row | 0.000539 / 0.000054 (0.000485) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.040933 / 0.037411 (0.003522) |
| shard | 0.025708 / 0.014526 (0.011183) |
| shuffle | 0.031309 / 0.176557 (-0.145248) |
| sort | 0.147959 / 0.737135 (-0.589176) |
| train_test_split | 0.031768 / 0.296338 (-0.264570) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.570741 / 0.215209 (0.355532) |
| read 50000 | 5.714587 / 2.077655 (3.636933) |
| read_batch 50000 10 | 2.583510 / 1.504120 (1.079390) |
| read_batch 50000 100 | 2.107889 / 1.541195 (0.566694) |
| read_batch 50000 1000 | 2.135830 / 1.468490 (0.667340) |
| read_formatted numpy 5000 | 0.604460 / 4.584777 (-3.980317) |
| read_formatted pandas 5000 | 6.498056 / 3.745712 (2.752344) |
| read_formatted tensorflow 5000 | 1.521847 / 5.269862 (-3.748015) |
| read_formatted torch 5000 | 1.407909 / 4.565676 (-3.157767) |
| read_formatted_batch numpy 5000 10 | 0.063829 / 0.424275 (-0.360446) |
| read_formatted_batch numpy 5000 1000 | 0.005599 / 0.007607 (-0.002008) |
| shuffled read 5000 | 0.689784 / 0.226044 (0.463740) |
| shuffled read 50000 | 6.846343 / 2.268929 (4.577414) |
| shuffled read_batch 50000 10 | 3.126581 / 55.444624 (-52.318043) |
| shuffled read_batch 50000 100 | 2.447277 / 6.876477 (-4.429200) |
| shuffled read_batch 50000 1000 | 2.496294 / 2.142072 (0.354221) |
| shuffled read_formatted numpy 5000 | 0.732998 / 4.805227 (-4.072229) |
| shuffled read_formatted_batch numpy 5000 10 | 0.152076 / 6.500664 (-6.348588) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.062273 / 0.075469 (-0.013196) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.179699 / 1.841788 (-0.662088) |
| map fast-tokenizer batched | 16.158197 / 8.074308 (8.083889) |
| map identity | 37.526849 / 10.191392 (27.335457) |
| map identity batched | 0.998929 / 0.680424 (0.318505) |
| map no-op batched | 0.708026 / 0.534201 (0.173825) |
| map no-op batched numpy | 0.290615 / 0.579283 (-0.288668) |
| map no-op batched pandas | 0.693175 / 0.434364 (0.258811) |
| map no-op batched pytorch | 0.254574 / 0.540337 (-0.285763) |
| map no-op batched tensorflow | 0.284281 / 1.386936 (-1.102655) |
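
Each benchmark cell has the form `new / old (diff)`, where the parenthesized value appears to be `new - old`. The sketch below is not part of the benchmark tooling; `parse_cell` and `is_consistent` are hypothetical helper names used only to illustrate how a cell can be read and sanity-checked. The tolerance of `2e-6` is an assumption: the printed values are rounded to six decimals, so the recomputed difference can be off by a unit in the last place.

```python
import re

# Matches a cell such as "0.010948 / 0.011353 (-0.000405)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Split a 'new / old (diff)' cell into three floats (hypothetical helper)."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized benchmark cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(cell, tol=2e-6):
    """Check that the printed diff equals new - old, up to rounding (assumed tolerance)."""
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol


# Two cells copied from the benchmark_array_xd.json table for PyArrow==3.0.0.
for cell in ["0.010948 / 0.011353 (-0.000405)", "0.010020 / 0.007986 (0.002035)"]:
    new, old, diff = parse_cell(cell)
    print(cell, "->", "new is lower" if diff < 0 else "new is higher", is_consistent(cell))
```

A negative diff means the new measurement is lower than the old one. The comment does not state units, so whether lower is better depends on what each benchmark records.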
