finalize ci config
lhoestq committed Sep 25, 2020
1 parent cfde6f2 commit 8eaadf6
Showing 1 changed file with 25 additions and 4 deletions.
29 changes: 25 additions & 4 deletions .circleci/config.yml
@@ -30,6 +30,24 @@ jobs:
             - run: pip install pyarrow==1.0.0
             - run: HF_SCRIPTS_VERSION=master python -m pytest -sv ./tests/

+
+    run_dataset_script_tests_pyarrow_0p17_WIN:
+        working_directory: ~/datasets
+        executor:
+            name: win/default
+            shell: powershell
+        steps:
+            - checkout
+            - run: conda install python=3.6 --yes
+            - run: conda install pytorch --yes
+            - run: pip install virtualenv
+            - run: python -m virtualenv venv --system-site-packages
+            - run: "& venv/Scripts/activate.ps1"
+            - run: pip install .[tests]
+            - run: pip install pyarrow==0.17.1
+            - run: $env:HF_SCRIPTS_VERSION="master"
+            - run: python -m pytest -sv ./tests/
+
     run_dataset_script_tests_pyarrow_1_WIN:
         working_directory: ~/datasets
         executor:
@@ -59,6 +77,7 @@ jobs:
             - run: black --check --line-length 119 --target-version py36 tests src benchmarks datasets metrics
             - run: isort --check-only tests src benchmarks datasets metrics
             - run: flake8 tests src benchmarks datasets metrics
+
     build_doc:
         working_directory: ~/datasets
         docker:
@@ -69,6 +88,7 @@ jobs:
             - run: cd docs && make html SPHINXOPTS="-W"
             - store_artifacts:
                 path: ./docs/_build
+
     deploy_doc:
         working_directory: ~/datasets
         docker:
@@ -92,8 +112,9 @@ workflows:
     build_and_test:
         jobs:
             - check_code_quality
-            # - run_dataset_script_tests_pyarrow_0p17
-            # - run_dataset_script_tests_pyarrow_1
+            - run_dataset_script_tests_pyarrow_0p17
+            - run_dataset_script_tests_pyarrow_1
+            - run_dataset_script_tests_pyarrow_0p17_WIN
             - run_dataset_script_tests_pyarrow_1_WIN
-            # - build_doc
-            # - deploy_doc: *workflow_filters
+            - build_doc
+            - deploy_doc: *workflow_filters
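
The last hunk re-enables several jobs in the `build_and_test` workflow and adds the new Windows job to it. Every name listed there needs a matching entry under `jobs`. The following is a minimal sketch of that consistency check, assuming the config has already been loaded into plain dictionaries. The `config` literal is a hand-written stand-in that mirrors only the job names visible in this diff; the job bodies and the contents of `*workflow_filters` are placeholders, and the helper names `workflow_job_names` and `undefined_jobs` are hypothetical.

```python
def workflow_job_names(workflow):
    """Yield the job names referenced by one workflow."""
    for entry in workflow["jobs"]:
        if isinstance(entry, str):
            yield entry
        else:
            # Single-key mapping form, e.g. `- deploy_doc: *workflow_filters`.
            # Iterating a dict yields its keys, i.e. the job name.
            yield from entry


def undefined_jobs(config):
    """Return referenced job names that are missing from `jobs`, in order."""
    defined = set(config["jobs"])
    missing = []
    for workflow in config["workflows"].values():
        if not isinstance(workflow, dict):
            continue  # ignore scalar entries, if any
        for name in workflow_job_names(workflow):
            if name not in defined:
                missing.append(name)
    return missing


# Stand-in for the parsed config: job bodies are left empty on purpose,
# and the filter mapping is a placeholder (its real content is not in the diff).
config = {
    "jobs": {
        "check_code_quality": {},
        "run_dataset_script_tests_pyarrow_0p17": {},
        "run_dataset_script_tests_pyarrow_1": {},
        "run_dataset_script_tests_pyarrow_0p17_WIN": {},
        "run_dataset_script_tests_pyarrow_1_WIN": {},
        "build_doc": {},
        "deploy_doc": {},
    },
    "workflows": {
        "build_and_test": {
            "jobs": [
                "check_code_quality",
                "run_dataset_script_tests_pyarrow_0p17",
                "run_dataset_script_tests_pyarrow_1",
                "run_dataset_script_tests_pyarrow_0p17_WIN",
                "run_dataset_script_tests_pyarrow_1_WIN",
                "build_doc",
                {"deploy_doc": {}},
            ]
        }
    },
}

print(undefined_jobs(config))
```

An empty list means every workflow entry resolves to a defined job. Removing a key from the stand-in `jobs` mapping makes its name appear in the result.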

1 comment on commit 8eaadf6

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.018820 / 0.011353 (0.007467) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016715 / 0.011008 (0.005707) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.048389 / 0.038508 (0.009881) |
| read_batch_unformated after write_array2d | 0.029683 / 0.023109 (0.006574) |
| read_batch_unformated after write_flattened_sequence | 0.214300 / 0.275898 (-0.061598) |
| read_batch_unformated after write_nested_sequence | 0.245896 / 0.323480 (-0.077584) |
| read_col_formatted_as_numpy after write_array2d | 0.009195 / 0.007986 (0.001209) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004781 / 0.004328 (0.000453) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006906 / 0.004250 (0.002655) |
| read_col_unformated after write_array2d | 0.049826 / 0.037052 (0.012773) |
| read_col_unformated after write_flattened_sequence | 0.224114 / 0.258489 (-0.034375) |
| read_col_unformated after write_nested_sequence | 0.241681 / 0.293841 (-0.052160) |
| read_formatted_as_numpy after write_array2d | 0.158385 / 0.128546 (0.029839) |
| read_formatted_as_numpy after write_flattened_sequence | 0.135486 / 0.075646 (0.059839) |
| read_formatted_as_numpy after write_nested_sequence | 0.480072 / 0.419271 (0.060801) |
| read_unformated after write_array2d | 0.554780 / 0.043533 (0.511248) |
| read_unformated after write_flattened_sequence | 0.220266 / 0.255139 (-0.034873) |
| read_unformated after write_nested_sequence | 0.233673 / 0.283200 (-0.049526) |
| write_array2d | 0.086313 / 0.141683 (-0.055370) |
| write_flattened_sequence | 1.904113 / 1.452155 (0.451958) |
| write_nested_sequence | 1.883970 / 1.492716 (0.391253) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.038889 / 0.037411 (0.001478) |
| shard | 0.021956 / 0.014526 (0.007430) |
| shuffle | 0.126041 / 0.176557 (-0.050516) |
| sort | 0.427762 / 0.737135 (-0.309373) |
| train_test_split | 0.158465 / 0.296338 (-0.137873) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.249529 / 0.215209 (0.034320) |
| read 50000 | 2.295045 / 2.077655 (0.217390) |
| read_batch 50000 10 | 1.364731 / 1.504120 (-0.139389) |
| read_batch 50000 100 | 1.256987 / 1.541195 (-0.284208) |
| read_batch 50000 1000 | 1.288568 / 1.468490 (-0.179922) |
| read_formatted numpy 5000 | 6.995493 / 4.584777 (2.410716) |
| read_formatted pandas 5000 | 5.800872 / 3.745712 (2.055160) |
| read_formatted tensorflow 5000 | 8.188064 / 5.269862 (2.918203) |
| read_formatted torch 5000 | 7.222549 / 4.565676 (2.656872) |
| read_formatted_batch numpy 5000 10 | 0.692264 / 0.424275 (0.267989) |
| read_formatted_batch numpy 5000 1000 | 0.013850 / 0.007607 (0.006243) |
| shuffled read 5000 | 0.257475 / 0.226044 (0.031431) |
| shuffled read 50000 | 2.715677 / 2.268929 (0.446749) |
| shuffled read_batch 50000 10 | 1.842797 / 55.444624 (-53.601827) |
| shuffled read_batch 50000 100 | 1.726804 / 6.876477 (-5.149672) |
| shuffled read_batch 50000 1000 | 1.708241 / 2.142072 (-0.433832) |
| shuffled read_formatted numpy 5000 | 6.881079 / 4.805227 (2.075852) |
| shuffled read_formatted_batch numpy 5000 10 | 7.317707 / 6.500664 (0.817043) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.648887 / 0.075469 (6.573418) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 10.849643 / 1.841788 (9.007855) |
| map fast-tokenizer batched | 27.581897 / 8.074308 (19.507589) |
| map identity | 14.786738 / 10.191392 (4.595346) |
| map identity batched | 0.919799 / 0.680424 (0.239375) |
| map no-op batched | 0.278299 / 0.534201 (-0.255902) |
| map no-op batched numpy | 0.787278 / 0.579283 (0.207995) |
| map no-op batched pandas | 0.604062 / 0.434364 (0.169698) |
| map no-op batched pytorch | 0.750193 / 0.540337 (0.209856) |
| map no-op batched tensorflow | 1.588407 / 1.386936 (0.201470) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.019496 / 0.011353 (0.008143) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.018879 / 0.011008 (0.007871) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.045336 / 0.038508 (0.006828) |
| read_batch_unformated after write_array2d | 0.031668 / 0.023109 (0.008558) |
| read_batch_unformated after write_flattened_sequence | 0.333794 / 0.275898 (0.057896) |
| read_batch_unformated after write_nested_sequence | 0.362464 / 0.323480 (0.038984) |
| read_col_formatted_as_numpy after write_array2d | 0.009704 / 0.007986 (0.001718) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004877 / 0.004328 (0.000549) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006624 / 0.004250 (0.002373) |
| read_col_unformated after write_array2d | 0.050490 / 0.037052 (0.013438) |
| read_col_unformated after write_flattened_sequence | 0.346226 / 0.258489 (0.087737) |
| read_col_unformated after write_nested_sequence | 0.370265 / 0.293841 (0.076424) |
| read_formatted_as_numpy after write_array2d | 0.159540 / 0.128546 (0.030994) |
| read_formatted_as_numpy after write_flattened_sequence | 0.133324 / 0.075646 (0.057677) |
| read_formatted_as_numpy after write_nested_sequence | 0.415932 / 0.419271 (-0.003339) |
| read_unformated after write_array2d | 0.426606 / 0.043533 (0.383073) |
| read_unformated after write_flattened_sequence | 0.343591 / 0.255139 (0.088452) |
| read_unformated after write_nested_sequence | 0.346067 / 0.283200 (0.062868) |
| write_array2d | 0.093886 / 0.141683 (-0.047797) |
| write_flattened_sequence | 1.735208 / 1.452155 (0.283053) |
| write_nested_sequence | 1.778771 / 1.492716 (0.286055) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.045349 / 0.037411 (0.007938) |
| shard | 0.021222 / 0.014526 (0.006696) |
| shuffle | 0.025205 / 0.176557 (-0.151351) |
| sort | 0.082736 / 0.737135 (-0.654400) |
| train_test_split | 0.037116 / 0.296338 (-0.259223) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.275235 / 0.215209 (0.060026) |
| read 50000 | 2.653258 / 2.077655 (0.575604) |
| read_batch 50000 10 | 1.846463 / 1.504120 (0.342343) |
| read_batch 50000 100 | 1.737160 / 1.541195 (0.195966) |
| read_batch 50000 1000 | 1.791497 / 1.468490 (0.323007) |
| read_formatted numpy 5000 | 7.012551 / 4.584777 (2.427774) |
| read_formatted pandas 5000 | 5.856700 / 3.745712 (2.110987) |
| read_formatted tensorflow 5000 | 8.333760 / 5.269862 (3.063899) |
| read_formatted torch 5000 | 7.240354 / 4.565676 (2.674677) |
| read_formatted_batch numpy 5000 10 | 0.680077 / 0.424275 (0.255802) |
| read_formatted_batch numpy 5000 1000 | 0.010938 / 0.007607 (0.003331) |
| shuffled read 5000 | 0.307171 / 0.226044 (0.081127) |
| shuffled read 50000 | 3.109586 / 2.268929 (0.840657) |
| shuffled read_batch 50000 10 | 2.252044 / 55.444624 (-53.192580) |
| shuffled read_batch 50000 100 | 2.078505 / 6.876477 (-4.797972) |
| shuffled read_batch 50000 1000 | 2.115314 / 2.142072 (-0.026758) |
| shuffled read_formatted numpy 5000 | 7.148662 / 4.805227 (2.343435) |
| shuffled read_formatted_batch numpy 5000 10 | 4.735743 / 6.500664 (-1.764921) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.750934 / 0.075469 (6.675465) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 11.121989 / 1.841788 (9.280201) |
| map fast-tokenizer batched | 13.611170 / 8.074308 (5.536861) |
| map identity | 15.031259 / 10.191392 (4.839867) |
| map identity batched | 0.789512 / 0.680424 (0.109088) |
| map no-op batched | 0.568762 / 0.534201 (0.034561) |
| map no-op batched numpy | 0.817309 / 0.579283 (0.238026) |
| map no-op batched pandas | 0.620067 / 0.434364 (0.185703) |
| map no-op batched pytorch | 0.793937 / 0.540337 (0.253600) |
| map no-op batched tensorflow | 1.601567 / 1.386936 (0.214631) |
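
Each benchmark cell above has the form `new / old (diff)`, where the value in parentheses is `new - old`. The following is a minimal sketch of how such cells could be parsed and sanity-checked, using the `benchmark_indices_mapping.json` values from the PyArrow==0.17.1 run. The helper names `parse_cells` and `relative_change` are hypothetical and not part of the benchmark tooling, and the tolerance is an assumption chosen to absorb rounding in the sixth decimal place.

```python
import re

# One cell looks like "0.038889 / 0.037411 (0.001478)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row):
    """Return a (new, old, diff) float triple for every cell found in `row`."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


def relative_change(new, old):
    """Fractional change of `new` relative to `old` (positive = larger)."""
    return (new - old) / old


# Values copied from the indices_mapping benchmark of the PyArrow==0.17.1 run.
row = (
    "0.038889 / 0.037411 (0.001478) 0.021956 / 0.014526 (0.007430) "
    "0.126041 / 0.176557 (-0.050516) 0.427762 / 0.737135 (-0.309373) "
    "0.158465 / 0.296338 (-0.137873)"
)
metrics = ["select", "shard", "shuffle", "sort", "train_test_split"]

cells = parse_cells(row)
for name, (new, old, diff) in zip(metrics, cells):
    # The reported diff should match new - old up to last-digit rounding.
    assert abs((new - old) - diff) < 2e-6, name
    print(f"{name}: {relative_change(new, old):+.1%}")
```

Reporting the relative change next to the absolute diff makes rows with very different magnitudes easier to compare.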
