Shrinker improvements #235

Closed

@jmid jmid commented Apr 3, 2022

This PR combines QCheck(1) shrinker improvements to list, string, functions, and int.
The improvements arose from studying the benchmark outputs from #177 across a range of equivalent tests.
Here it became clear that both the QCheck(1) and QCheck2 shrinkers each had some sore spots.
As a first step, I set out to get QCheck(1) up to speed with QCheck2's performance on the tests where its shrinkers were lagging behind.

I don't expect to merge the PR in the current form, partly because it includes an int shrinker along the lines of the somewhat controversial #173, but

  • I wanted to convey the bigger picture
  • I was curious to hear feedback from potential users.

@vch9 : if you (or others) still have QCheck(1) tests (needing shrinking) I'd be curious to hear if you observe a difference
when pinning https://github.com/jmid/qcheck/tree/shrinker-improvements

Whereas these improvements have been obtained over artificial benchmark programs, I've been bitten in the past by shrinker performance in bigger developments, e.g.,

  • in the shrinker for our Wasm generator where it made a difference to reduce integers to zero quickly (in the common case most integer values are irrelevant).
  • in the shrinker for multicoretests, where the non-deterministic behavior is extra challenging. Here reproducing a failing run O(n log n) times has a much lower probability than reproducing it O(log n) times, which means that an algorithmically faster shrinker performing fewer, well-chosen reduction attempts will produce smaller counterexamples than a "more exhaustive" one.
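To make the probability argument concrete, here is a small worked sketch. It assumes, purely for illustration, that each rerun reproduces the failure independently with the same probability p; the function name `all_reproduce` and the values 0.9, 6, and 53 are hypothetical and not taken from multicoretests.

```ocaml
(* Probability that k reruns all reproduce a failure, under the
   simplifying assumption that each rerun reproduces it independently
   with probability p. *)
let all_reproduce p k = p ** float_of_int k

let () =
  let p = 0.9 in
  (* a handful of attempts versus several dozen attempts *)
  Printf.printf "6 reruns:  %.4f\n" (all_reproduce p 6);
  Printf.printf "53 reruns: %.4f\n" (all_reproduce p 53)
```

Under these assumptions the chance of surviving 6 reruns is a bit over one half, while surviving 53 reruns is well under one percent.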

The core is an improved list_spine shrinker. As an added bonus this implicitly improves function shrinking.
Secondly, we can reuse the improved list shrinker for strings - by breaking them up in char lists - and then use the char-shrinker to reduce the individual chars. Overall, these three were identified as QCheck(1) sore spots in #177.

Despite the reduced complexity we are able to arrive at the same counterexamples, or simpler ones(!), as should be clear from the expect-test logs.

Here's the new and simpler shrinker of list spines using recursion:

  let rec list_spine l yield =
    let rec split l len acc = match len,l with
      | _,[]
      | 0,_ -> List.rev acc, l
      | _,x::xs -> split xs (len-1) (x::acc) in
    match l with
    | [] -> ()
    | [_] -> yield []
    | [x;y] -> yield []; yield [x]; yield [y]
    | _::_ ->
      let len = List.length l in
      let xs,ys = split l ((1 + len) / 2) [] in
      yield xs;
      list_spine xs (fun xs' -> yield (xs'@ys))

The base cases with 0,1, or 2 elements are not surprising.
In the inductive case, the yield xs represents dropping the last half of the list (akin to bisection). The yield from the recursive call in the last line combines a reduced front half with an untouched second half ys.

So how well does this fare? It turns out to work reasonably well. Here's an example:

utop # Shrink.list_spine [1;2;3;4;5;6;7;8] (fun xs -> Printf.printf "%s\n" Print.(list int xs));;
[1; 2; 3; 4]
[1; 2; 5; 6; 7; 8]
[3; 4; 5; 6; 7; 8]
[1; 3; 4; 5; 6; 7; 8]
[2; 3; 4; 5; 6; 7; 8]

As an added bonus, there are both children containing and excluding the list head. This is useful, since sometimes it is central to keep it in a counterexample and sometimes it is not.

The list of children does not grow much longer on an input list twice as long:

utop # Shrink.list_spine [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16] (fun xs -> Printf.printf "%s\n" Print.(list int xs));;
[1; 2; 3; 4; 5; 6; 7; 8]
[1; 2; 3; 4; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
- : unit = ()
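To see how slowly the number of children grows, the following sketch repeats the `list_spine` definition from above and counts the yielded candidates for a few list lengths. The helper `children` is a hypothetical name introduced here for illustration only.

```ocaml
(* The list_spine shrinker as given in this PR description. *)
let rec list_spine l yield =
  let rec split l len acc = match len,l with
    | _,[]
    | 0,_ -> List.rev acc, l
    | _,x::xs -> split xs (len-1) (x::acc) in
  match l with
  | [] -> ()
  | [_] -> yield []
  | [x;y] -> yield []; yield [x]; yield [y]
  | _::_ ->
    let len = List.length l in
    let xs,ys = split l ((1 + len) / 2) [] in
    yield xs;
    list_spine xs (fun xs' -> yield (xs'@ys))

(* Hypothetical helper: collect everything a shrinker yields, in order. *)
let children shrink x =
  let acc = ref [] in
  shrink x (fun c -> acc := c :: !acc);
  List.rev !acc

let () =
  List.iter (fun n ->
      let l = List.init n (fun i -> i + 1) in
      Printf.printf "length %4d: %d children\n"
        n (List.length (children list_spine l)))
    [8; 16; 32; 1024]
```

For lengths 8 and 16 this gives 5 and 6 children, matching the transcripts above; each doubling of the input length adds one more child.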

In comparison, here's the output from the same lines run with the current list_spine shrinker:

# Shrink.list_spine [1;2;3;4;5;6;7;8] (fun xs -> Printf.printf "%s\n" Print.(list int xs));;
[5; 6; 7; 8]
[1; 6; 7; 8]
[1; 2; 7; 8]
[1; 2; 3; 8]
[1; 2; 3; 4]
[3; 4; 5; 6; 7; 8]
[1; 4; 5; 6; 7; 8]
[1; 2; 5; 6; 7; 8]
[1; 2; 3; 6; 7; 8]
[1; 2; 3; 4; 7; 8]
[1; 2; 3; 4; 5; 8]
[1; 2; 3; 4; 5; 6]
[2; 3; 4; 5; 6; 7; 8]
[1; 3; 4; 5; 6; 7; 8]
[1; 2; 4; 5; 6; 7; 8]
[1; 2; 3; 5; 6; 7; 8]
[1; 2; 3; 4; 6; 7; 8]
[1; 2; 3; 4; 5; 7; 8]
[1; 2; 3; 4; 5; 6; 8]
[1; 2; 3; 4; 5; 6; 7]
- : unit = ()
# Shrink.list_spine [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16] (fun xs -> Printf.printf "%s\n" Print.(list int xs));;
[9; 10; 11; 12; 13; 14; 15; 16]
[1; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 16]
[1; 2; 3; 4; 5; 6; 7; 8]
[5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12]
[3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14]
[2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 8; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 9; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 10; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 11; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 12; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 13; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 14; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 15; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 16]
[1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15]
- : unit = ()

Because of the repeated chunk reduction (each of the O(log n) chunk sizes is attempted at a sequence of positions), the number of candidates produced by the current list_spine is something like O(n log n), whereas the new one produces O(log n).
This is just for one run though. After a successful reduction, each shrinker is restarted.
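The candidate count of the current shrinker can be checked with a small reconstruction. Note that `chunk_removal_spine` below is a hypothetical function inferred only from the printed traces above (remove a chunk of size n/2, n/4, ..., 1 at every start position); it is not the actual QCheck source, and `count` is likewise just an illustrative helper.

```ocaml
(* Hypothetical reconstruction, inferred from the printed trace only:
   for chunk sizes n/2, n/4, ..., 1, remove the chunk starting at each
   position 0 .. n - size. *)
let chunk_removal_spine l yield =
  let n = List.length l in
  let size = ref (n / 2) in
  while !size > 0 do
    for pos = 0 to n - !size do
      (* keep only the elements outside [pos, pos + size) *)
      yield (List.filteri (fun i _ -> i < pos || i >= pos + !size) l)
    done;
    size := !size / 2
  done

(* Count the candidates yielded for the list [1; ...; n]. *)
let count n =
  let c = ref 0 in
  chunk_removal_spine (List.init n (fun i -> i + 1)) (fun _ -> incr c);
  !c

let () =
  List.iter (fun n -> Printf.printf "length %2d: %d children\n" n (count n))
    [8; 16]
```

This yields 20 and 53 candidates for lengths 8 and 16, the same counts as the two transcripts of the current shrinker, compared with 5 and 6 for the new one.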


jmid commented Apr 19, 2022

Below follows the output from a revamped shrinker performance benchmark run of #177.

In total QCheck goes from 67.071s to 8.064s - not a bad improvement.
This is primarily achieved by the improvements to the list and function shrinkers, leaving only one benchmark (fold_left test, fun first) to take more than 1 second. The latter is further improved by combining the present PR with #240.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
big bound issue59                                - skipped as generator is stateful, making it non-repeatable
long_shrink                                       0.006   89/292     0.922 3039/3099    0.002   84/276     0.615 3068/3127    0.001   82/268     0.675 3063/3124    0.009  2.212
ints arent 0 mod 3                                0.000    1/1       0.000    2/2       0.000    1/1       0.000    1/1       0.000    1/1       0.000   88/305     0.000  0.000
ints are 0                                        0.000   61/123     0.000   61/123     0.000   60/121     0.000   61/122     0.000   61/123     0.000   61/123     0.000  0.000
ints < 209609                                    - skipped as generator is stateful, making it non-repeatable
nat < 5001                                        0.000    6/66      0.000    7/77      0.000    6/57      0.000    7/69      0.000    7/73      0.000    8/85      0.000  0.000
char never produces 'abcdef'                      0.000    0/0       0.000    1/1       0.000    0/0       0.000    0/0       0.000    0/0       0.000    0/0       0.000  0.000
strings are empty                                 0.000  145/284     0.000    8/16      0.000  169/327     0.001   13/26      0.000   49/99      0.000    1/2       0.000  0.001
string never has a \000 char                      0.000    8/16      0.002   22/167     0.001   17/28      0.002   56/254     0.000    6/12      0.000   15/48      0.001  0.004
string never has a \255 char                      0.000   14/28      0.001   59/318     0.000   15/24      0.003   97/529     0.001   21/38      0.003   41/194     0.001  0.006
strings have unique chars                         0.000   13/31      0.000   18/30      0.002   15/30      0.002   24/52      0.000    5/14      0.000   15/20      0.002  0.002
pairs have different components                   0.000    0/6       0.000    0/6       0.000    0/6       0.000    0/6       0.000    0/10      0.000    0/10      0.000  0.000
pairs have same components                        0.000   62/124     0.000   63/125     0.000   61/122     0.000   62/123     0.000   62/124     0.000   63/125     0.000  0.000
pairs have a zero component                       0.000  122/307     0.000  122/306     0.000  121/304     0.000  122/306     0.000  116/295     0.002  123/308     0.000  0.002
pairs are (0,0)                                   0.000   62/124     0.000   63/125     0.000   61/122     0.000   62/123     0.000   62/124     0.000   63/125     0.000  0.000
pairs are ordered                                 0.000   93/1162    0.000   94/1217    0.000   87/985     0.000   85/865     0.000   91/1149    0.000   94/1326    0.000  0.000
pairs are ordered reversely                       0.000   62/124     0.000   62/124     0.000   62/124     0.000   62/124     0.000   62/124     0.000   62/124     0.000  0.000
pairs sum to less than 128                        0.000   56/126     0.000   56/126     0.000   58/131     0.000   59/138     0.000   57/131     0.000   57/130     0.000  0.000
pairs lists rev concat                            0.002   84/283     0.010   83/168     0.001   78/280     0.005   75/152     0.000   68/253     0.000   67/136     0.003  0.015
pairs lists no overlap                            0.000   24/44      0.003   27/60      0.000   17/37      0.002   18/41      0.000    8/22      0.000   11/28      0.001  0.004
triples have pair-wise different components       0.000    3/15      0.000    3/15      0.000    3/3       0.000    3/3       0.000    3/3       0.000    3/3       0.000  0.000
triples have same components                      0.000   63/126     0.000   64/127     0.000   63/126     0.000   64/128     0.000   57/114     0.000   62/122     0.000  0.000
triples are ordered                               0.000   63/126     0.000    3/4       0.000   62/123     0.000    3/4       0.000   63/126     0.000   91/1021    0.000  0.000
triples are ordered reversely                     0.000   63/125     0.000   64/126     0.000   63/126     0.000  124/247     0.000   63/125     0.000   65/127     0.000  0.000
quadruples have pair-wise different components    0.000    4/4       0.000    4/4       0.000    4/4       0.000    4/4       0.000    4/11      0.000    4/11      0.000  0.000
quadruples have same components                   0.000  124/310     0.000  126/313     0.000  113/287     0.000  115/292     0.000  124/310     0.000  123/307     0.000  0.000
quadruples are ordered                            0.000   64/127     0.000    5/6       0.000   63/124     0.000    4/5       0.000   58/115     0.000    5/6       0.000  0.000
quadruples are ordered reversely                  0.000   64/126     0.000   66/128     0.000   64/127     0.000  126/250     0.000   64/126     0.000   66/128     0.000  0.000
forall (a, b) in nat: a < b                       0.000    6/16      0.000    6/16      0.000    6/15      0.000    6/15      0.000    4/7       0.000    4/7       0.000  0.000
forall (a, b, c) in nat: a < b < c                0.000    3/7       0.000    3/7       0.000    6/22      0.000    7/28      0.000    3/3       0.000    3/3       0.000  0.000
forall (a, b, c, d) in nat: a < b < c < d         0.000    4/4       0.000    4/4       0.000    4/4       0.000    4/4       0.000    4/4       0.000    4/4       0.000  0.000
forall (a, b, c, d, e) in nat: a < b < c < d < e  0.000    5/5       0.000    5/5       0.000    5/5       0.000    5/5       0.000    5/5       0.000    5/5       0.000  0.000
forall (a, b, c, d, e, f) in nat: a < b < c < d   0.000    6/6       0.000    6/6       0.000    6/6       0.000    6/6       0.000    6/6       0.000    6/6       0.000  0.000
forall (a, b, c, d, e, f, g) in nat: a < b < c <  0.000    7/7       0.000    7/7       0.000    7/7       0.000    7/7       0.000    7/7       0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h) in nat: a < b <   0.000    8/8       0.000    8/8       0.000    8/8       0.000    8/8       0.000    7/7       0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h, i) in nat: a < b  0.000    9/9       0.000    9/9       0.000    9/9       0.000    9/9       0.000    8/8       0.000    8/8       0.000  0.000
bind ordered pairs                                0.000    2/2       0.000    1/1       0.000    2/2       0.000    1/1       0.000    2/2       0.000    1/1       0.000  0.000
bind list_size constant                           0.000   11/31      0.000   12/26      0.000   13/35      0.000   12/25      0.000   12/33      0.000   11/21      0.000  0.000
lists are empty                                   0.001    9/12      0.000    8/16      0.002   14/17      0.003   13/26      0.000    1/3       0.000    1/2       0.002  0.004
lists shorter than 10                             0.000   16/81      0.000   16/30      0.000   21/91      0.004   21/42      0.000   14/81      0.000   15/29      0.000  0.004
lists shorter than 432                            0.102  417/4881    1.049  412/457     0.139  404/4728    0.662  405/450     0.027  407/4793    0.067  419/447     0.269  1.778
lists shorter than 4332                           0.063   13/67      3.616 4022/4087    0.075   10/35      3.637 4020/4067    0.014    8/55      2.415 4013/4055    0.151  9.668
lists equal to duplication                        0.152   20/23      0.420    4/7       0.000    3/6       0.000    3/6       0.011   18/21      0.122   17/35      0.163  0.542
lists have unique elems                           0.000    8/18      0.000   11/22      0.004   12/25      0.008   17/30      0.000    8/22      0.000   10/20      0.004  0.008
tree contains only 42                             0.000    2/2       0.000    2/2       0.000    1/1       0.000    2/2       0.000    2/2       0.000    2/2       0.000  0.000
fail_pred_map_commute                             0.000  134/779     0.000   16/59      0.000  109/621     0.000   14/65      0.000  115/603     0.001  117/373     0.001  0.001
fail_pred_strings                                 0.000    1/4       0.000    1/4       0.000    1/4       0.000    1/4       0.000    1/3       0.000    2/5       0.000  0.000
fold_left fold_right                              0.000   17/63      0.000   22/73      0.001   31/91      0.006   58/139     0.000   22/80      0.000   39/95      0.001  0.006
fold_left fold_right uncurried                    0.002   31/136     0.043  325/984     0.022   34/86      9.235 4811/8969    0.000    3/17      0.000    2/13      0.025  9.279
fold_left fold_right uncurried fun last           0.000   12/42      0.000   25/86      0.001   30/86      0.002   54/176     0.000   22/81      0.001   40/93      0.001  0.003
fold_left test, fun first                         0.017  272/534     0.001   15/28      0.760  165/506     4.041   45/11947   6.652  548/20645   1.074  168/27748   7.430  5.115
                                                                                                                                                                    8.064 28.657

jmid force-pushed the shrinker-improvements branch from 56f99f1 to 96b08d7 on April 20, 2022 07:26

jmid commented Apr 20, 2022

OK, using the merged shrinker performance benchmark I've now cherry-picked each of the four shrinker improvements and measured each of them.

  • 5a67e37 - simpler list shrinker with better complexity
  • d731af8 - use improved list shrinker for strings
  • 43b3962 - improved function shrinker
  • c19ccd0 - improved int shrinker

Executive summary:

  • the improved list, string, and function shrinkers collectively reduce the runtime of the benchmark from 78.402s to 5.456s on my laptop
  • adding the int shrinker improvement on top reduces the runtime from an average of 5.448s to 5.3289s across 10 runs

Whereas the latter represents a statistically significant improvement (measured with ministat) it is not as drastic as I thought before measuring.
The duplicate 0 testing of the improved int shrinker is unsatisfying, but so is the duplicate output of the current int shrinker:

  let test_int () =
    List.iter (alco_check Alcotest.int (trace_false Shrink.int) "on repeated failure")
      [ ("int 100", 100, [50; 75; 88; 94; 97; 99; 99]); (*WTF?*)
        ("int 1000", 1000, [500; 750; 875; 938; 969; 985; 993; 997; 999; 999]); (*WTF?*)
        ("int (-26)", -26, [-13; -20; -23; -25; -25]) ]; (*WTF?*)

We should address this separately. I therefore move to merge the first three commits.

Measurement details

To keep the tables focused below I've deleted benchmark entries with a total QCheck1 shrinker time below 0.1 seconds.

Here's first the performance of current master. Note how the focused tables highlight string, list, and function benchmarks.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string never has a \255 char                      0.001  249/316     0.001   59/318     0.092 4466/4520    0.002   97/529     0.377 9260/9365    0.002   41/194     0.469  0.005
strings have unique chars                         0.003  248/269     0.000   18/30      0.919 4465/4536    0.002   24/52      0.000   14/34      0.000   15/20      0.922  0.002
lists shorter than 432                            6.343 1696/5118102  1.049  412/457     6.108 1612/4863421  0.981  405/450     6.087 1667/5037661  0.126  419/447    18.537  2.156
lists shorter than 4332                           2.172   13/190735  3.331 4022/4087    1.455   11/126052  3.580 4020/4067    1.410    7/126607  2.408 4013/4055    5.037  9.318
lists equal to duplication                        0.145   20/23      0.470    4/7       0.000    7/13      0.000    3/6       0.021   20/25      0.091   17/35      0.165  0.561
fold_left fold_right uncurried                    2.496   97/80630   0.044  376/1550    0.144   38/390     2.160 2064/8057    0.000    5/20      0.000    4/17      2.640  2.204
fold_left test, fun first                         0.001   40/57      0.001   15/28     40.082  191/44563   3.318   47/9773   10.359  223/75912   0.002   36/64     50.442  3.321
                                                                                                                                                                   78.402 19.754

Step 1 Here's the performance of the simpler list shrinker. Note how the two lists shorter than benchmarks now run much faster.
On the other hand, the change causes fold_left test, fun first to really slow down because of the 93854 and 524691 shrink attempts with seed 8743 and 6789, respectively.
This latter test combines both lists, strings, and functions.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string never has a \255 char                      0.001  249/316     0.001   59/318     0.100 4466/4520    0.003   97/529     0.413 9260/9365    0.002   41/194     0.514  0.005
strings have unique chars                         0.003  248/269     0.000   18/30      0.917 4465/4536    0.001   24/52      0.000   14/34      0.000   15/20      0.920  0.002
lists shorter than 432                            0.161 1632/19461   0.903  412/457     0.194 1677/20004   1.016  405/450     0.076 1735/20729   0.112  419/447     0.431  2.032
lists shorter than 4332                           0.085   13/67      3.474 4022/4087    0.128   10/35      3.538 4020/4067    0.009    8/55      2.529 4013/4055    0.222  9.541
lists equal to duplication                        0.157   26/35      0.419    4/7       0.000    6/12      0.000    3/6       0.009   25/35      0.044   17/35      0.166  0.463
fold_left fold_right uncurried                    0.003   25/125     0.046  376/1550    0.023   34/119     2.255 2064/8057    0.000    5/20      0.000    4/17      0.025  2.301
fold_left test, fun first                         0.002   41/60      0.001   15/28     1488.644 1230/93854   3.423   47/9773   2123.311 2759/524691  0.002   36/64     3611.956  3.426
                                                                                                                                                                   3614.426 19.923

Step 2 Here's the performance of using the improved list shrinker for strings. This reduces the runtime of the two string tests to just 0.001s. Note how for strings have unique chars with seed 8743 this reduces 4465 successful shrink attempts out of 4536 down to just 15 out of 30.

The string improvement also helps a bit on fold_left test, fun first which is now down to 67879 and 124382 shrink attempts with seed 8743 and 6789, respectively.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string never has a \255 char                      0.000   14/28      0.001   59/318     0.001   15/24      0.005   97/529     0.001   21/38      0.003   41/194     0.001  0.008
strings have unique chars                         0.000   13/31      0.000   18/30      0.001   15/30      0.003   24/52      0.000    5/14      0.000   15/20      0.001  0.003
lists shorter than 432                            0.158 1632/19461   0.893  412/457     0.200 1677/20004   0.934  405/450     0.069 1735/20729   0.062  419/447     0.427  1.889
lists shorter than 4332                           0.140   13/67      3.365 4022/4087    0.116   10/35      3.214 4020/4067    0.018    8/55      2.391 4013/4055    0.273  8.970
lists equal to duplication                        0.160   26/35      0.382    4/7       0.000    6/12      0.000    3/6       0.009   25/35      0.103   17/35      0.169  0.485
fold_left fold_right uncurried                    0.003   25/125     0.046  376/1550    0.023   34/119     2.162 2064/8057    0.000    5/20      0.000    4/17      0.026  2.209
fold_left test, fun first                         0.017  275/543     0.001   15/28     669.151 1006/67879   3.552   47/9773   110.429  726/124382  0.002   36/64     779.596  3.555
                                                                                                                                                                   780.559 19.211
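For readers wondering what step 2 amounts to, here is a minimal sketch of reusing a list-spine shrinker for strings by going through a char list. It is not the code from d731af8: `string_spine` is a hypothetical name, and the sketch only covers dropping characters, not shrinking the individual chars with a char shrinker.

```ocaml
(* The list_spine shrinker as given in the PR description. *)
let rec list_spine l yield =
  let rec split l len acc = match len,l with
    | _,[]
    | 0,_ -> List.rev acc, l
    | _,x::xs -> split xs (len-1) (x::acc) in
  match l with
  | [] -> ()
  | [_] -> yield []
  | [x;y] -> yield []; yield [x]; yield [y]
  | _::_ ->
    let len = List.length l in
    let xs,ys = split l ((1 + len) / 2) [] in
    yield xs;
    list_spine xs (fun xs' -> yield (xs'@ys))

(* Hypothetical: shrink a string's length by shrinking its char list. *)
let string_spine s yield =
  let chars = List.of_seq (String.to_seq s) in
  list_spine chars (fun cs -> yield (String.of_seq (List.to_seq cs)))

let () = string_spine "abcdefgh" print_endline
```

On "abcdefgh" this prints abcd, abefgh, cdefgh, acdefgh, and bcdefgh, mirroring the five children of the eight-element integer list.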

Step 3 Here's the performance of adding the improved function shrinker on top.
This last one restores decent performance for fold_left test, fun first which is now down to 1551 and 1483 shrink attempts with seed 8743 and 6789, respectively.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string never has a \255 char                      0.000   14/28      0.001   59/318     0.000   15/24      0.005   97/529     0.001   21/38      0.003   41/194     0.002  0.008
strings have unique chars                         0.000   13/31      0.000   18/30      0.001   15/30      0.003   24/52      0.000    5/14      0.000   15/20      0.002  0.003
lists shorter than 432                            0.160 1632/19461   0.904  412/457     0.197 1677/20004   0.909  405/450     0.067 1735/20729   0.060  419/447     0.425  1.873
lists shorter than 4332                           0.135   13/67      3.534 4022/4087    0.116   10/35      3.196 4020/4067    0.020    8/55      2.364 4013/4055    0.270  9.094
lists equal to duplication                        0.156   26/35      0.381    4/7       0.000    6/12      0.000    3/6       0.009   25/35      0.103   17/35      0.165  0.483
fold_left fold_right uncurried                    0.002   44/199     0.044  376/1550    0.032   55/170     2.106 2064/8057    0.000    5/15      0.000    4/17      0.034  2.151
fold_left test, fun first                         0.016  275/543     0.000   15/28      4.206  305/1551    3.254   47/9773    0.272  296/1483    0.002   36/64      4.494  3.257
                                                                                                                                                                    5.456 18.971

Step 4 Here's the performance of adding the improved int shrinker on top.
Note how lists shorter than 432 has its total shrink attempts reduced from 19461, 20004, and 20729 down to 4881, 4728, and 4793, respectively.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string never has a \255 char                      0.000   14/28      0.001   59/318     0.000   15/24      0.003   97/529     0.001   21/38      0.002   41/194     0.002  0.005
strings have unique chars                         0.000   13/31      0.000   18/30      0.001   15/30      0.002   24/52      0.000    5/14      0.000   15/20      0.001  0.002
lists shorter than 432                            0.107  417/4881    0.831  412/457     0.143  404/4728    0.871  405/450     0.021  407/4793    0.060  419/447     0.271  1.762
lists shorter than 4332                           0.133   13/67      3.335 4022/4087    0.083   10/35      3.583 4020/4067    0.010    8/55      2.265 4013/4055    0.226  9.183
lists equal to duplication                        0.166   20/23      0.385    4/7       0.000    3/6       0.000    3/6       0.010   18/21      0.101   17/35      0.176  0.487
fold_left fold_right uncurried                    0.002   39/233     0.043  376/1550    0.026   49/161     2.119 2064/8057    0.000    2/14      0.000    4/17      0.028  2.163
fold_left test, fun first                         0.017  272/534     0.001   15/28      4.225  305/1558    3.208   47/9773    0.274  288/1434    0.002   36/64      4.516  3.211
                                                                                                                                                                    5.241 18.957

Statistical test For reference here's the total timings of 10 runs of step 3:

5.456
5.508
5.441
5.442
5.406
5.414
5.417
5.454
5.435
5.507

and 10 runs of step 4:

5.241
5.232
5.269
5.240
5.376
5.289
5.405
5.413
5.416
5.408

and the output of ministat:

x shrink_bench.log-stepwise-step3-improved-function-shrinker-timings
+ shrink_bench.log-stepwise-step4-improved-int-shrinker-timings
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|+    ++                 +            +                                                       +                  * +  * *           x   x       xx                                xx|
|          |___________________________________________________A______________________________M_____________________||__________________M___A______________________|                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10         5.406         5.508         5.442         5.448   0.035458896
+  10         5.232         5.416         5.376        5.3289   0.081052041
Difference at 95.0% confidence
	-0.1191 +/- 0.0587783
	-2.18612% +/- 1.0789%
	(Student's t, pooled s = 0.062557)
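As a sanity check, the ministat figures can be recomputed by hand from the twenty timings listed above. The sketch below does this; the critical value 2.101 (two-sided 95%, 18 degrees of freedom) is hardcoded as an assumption rather than computed, and all function names are illustrative.

```ocaml
let step3 = [5.456; 5.508; 5.441; 5.442; 5.406; 5.414; 5.417; 5.454; 5.435; 5.507]
let step4 = [5.241; 5.232; 5.269; 5.240; 5.376; 5.289; 5.405; 5.413; 5.416; 5.408]

let mean xs = List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

(* Sample variance with the n-1 denominator. *)
let sample_var xs =
  let m = mean xs in
  List.fold_left (fun acc x -> acc +. (x -. m) ** 2.0) 0.0 xs
  /. float_of_int (List.length xs - 1)

let n3 = float_of_int (List.length step3)
let n4 = float_of_int (List.length step4)

(* Pooled standard deviation of the two samples. *)
let pooled_s =
  sqrt (((n3 -. 1.) *. sample_var step3 +. (n4 -. 1.) *. sample_var step4)
        /. (n3 +. n4 -. 2.))

let diff = mean step4 -. mean step3

(* Assumed two-sided 95% Student's t critical value for 18 dof. *)
let t_crit = 2.101
let half_width = t_crit *. pooled_s *. sqrt (1. /. n3 +. 1. /. n4)

let () =
  Printf.printf "difference: %.4f +/- %.4f (pooled s = %.6f)\n"
    diff half_width pooled_s
```

The recomputed difference, interval half-width, and pooled s agree with the ministat output to the printed precision.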


jmid commented Apr 21, 2022

I will rebase this PR shortly. For reference, here's the considered int shrinker replacement:

   (* inspired by QCheck2's int shrinker algorithm (non-exhaustive) *)
   let int x yield =
     let curr = ref 0 in (*to return 0 repeatedly *)  (*was: let curr = ref (x/2) *)
     (* try some divisors *)
     while !curr <> x do
       yield !curr;
       let half_diff = (x - !curr)/2 in (*was: let half_diff = (x/2) - (!curr/2) in *)
       if half_diff = 0
       then curr := x
       else curr := !curr + half_diff
     done
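To see which candidates this version proposes, the sketch below copies the loop exactly as listed above and collects the values passed to `yield`, which here is just the callback each candidate is handed to. The helper `candidates` is a hypothetical name for illustration, and any lines the full PR may run after the loop are not included.

```ocaml
(* The int shrinker loop as listed above. *)
let int x yield =
  let curr = ref 0 in
  while !curr <> x do
    yield !curr;
    let half_diff = (x - !curr) / 2 in
    if half_diff = 0
    then curr := x
    else curr := !curr + half_diff
  done

(* Hypothetical helper: collect the yielded candidates in order. *)
let candidates x =
  let acc = ref [] in
  int x (fun c -> acc := c :: !acc);
  List.rev !acc

let () =
  List.iter (fun x ->
      Printf.printf "%d: %s\n" x
        (String.concat "; " (List.map string_of_int (candidates x))))
    [100; -26]
```

For 100 the candidates are 0; 50; 75; 87; 93; 96; 98; 99 and for -26 they are 0; -13; -19; -22; -24; -25, so 0 is always tried first and no value is repeated within a single run.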

Edit: To avoid too much forcing and overwriting I'll submit a separate PR instead.

if x<0 then yield (x+1);
()
while !curr <> x do
yield !curr;

Unrelated to your merge request, but what is yield? As x is continuously shrinking, I guess it's in charge both of marking the number of steps and of stopping when the shrinker has found the minimal value?


vch9 commented May 6, 2022

The int shrinker improvement makes sense imo. Starting at 0 should help in the vast majority of cases, and it also fixes the case where the last shrunk element is repeated. LGTM


jmid commented May 9, 2022

Closing as the 3 dominant improvements have been included as part of #242.
Note: during review PR #242 was further extended with an improved char shrinker, which brought the QCheck(1) shrinker benchmark time down to 2 secs.
