Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reimplement logical-scheduler walker more efficiently #741

Merged
merged 1 commit into from Oct 21, 2021
Merged

reimplement logical-scheduler walker more efficiently #741

merged 1 commit into from Oct 21, 2021

Conversation

ghost
Copy link

@ghost ghost commented Oct 12, 2021

Lscheduler-walk-graph was reimplemented. It now runs a fairly standard
DFS-based topological ordering, but this is done on the reverse DAG
that's embodied by the structure, allowing the functions at each node
to be called without allocating any storage that the GC has to deal
with. This DFS uses an explicit stack, not recursion. A couple of
earlier attempts to do a DFS-based topological sorting failed in
extreme cases:

  • a straightforward recursive-function based DFS: would die with Lisp
    stack overflow for huge graphs

  • a straightforward explicit-stack DFS on the forward DAG, i.e.,
    collecting a topo-sorted list in reverse order: the list destined to
    become garbage would overwhelm SBCL GC, resulting in "heap exhausted
    during garbage collection" errors in extreme cases (e.g., some 600K+
    node graph encountered in qasm benchmmarks).

Besides the new walker, a few small tweaks in this file:

  • lscheduler-all-instructions implemented more efficiently
  • lscheduler-calculate-log-fidelity tweaked to call binding-from-instr less
  • lscheduler-calculate-volume doc string added

@ghost
Copy link
Author

ghost commented Oct 12, 2021

Here's timing output from make benchmark-nq for NEW WALKER:

+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| STATIC/FULLY-CONNECTED 10  |     2.32 |     0 |        2 |
| STATIC/FULLY-CONNECTED 30  |    17.67 |     0 |        2 |
| STATIC/FULLY-CONNECTED 50  |    50.30 |     0 |        2 |
| STATIC/FULLY-CONNECTED 70  |   102.46 |     0 |        2 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| BELL/FULLY-CONNECTED 10    |     4.72 |     0 |        9 |
| BELL/FULLY-CONNECTED 30    |    31.36 |     0 |       29 |
| BELL/FULLY-CONNECTED 50    |    70.14 |     0 |       49 |
| BELL/FULLY-CONNECTED 70    |   128.42 |     0 |       69 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| QFT/FULLY-CONNECTED 10     |     9.42 |     0 |       37 |
| QFT/FULLY-CONNECTED 30     |    88.79 |     0 |      117 |
| QFT/FULLY-CONNECTED 50     |   298.36 |     0 |      197 |
| QFT/FULLY-CONNECTED 70     |   638.87 |     0 |      277 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| HADAMARD/FULLY-CONNECTED 10 |     1.63 |     0 |        0 |
| HADAMARD/FULLY-CONNECTED 30 |    15.79 |     0 |        0 |
| HADAMARD/FULLY-CONNECTED 50 |    44.58 |     0 |        0 |
| HADAMARD/FULLY-CONNECTED 70 |    91.12 |     0 |        0 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| STATIC/LINEAR 10           |     0.69 |     0 |        2 |
| STATIC/LINEAR 30           |     1.41 |     0 |        2 |
| STATIC/LINEAR 50           |     2.19 |     0 |        2 |
| STATIC/LINEAR 70           |     2.99 |     0 |        2 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| BELL/LINEAR 10             |     2.57 |     0 |        9 |
| BELL/LINEAR 30             |     8.90 |     0 |       29 |
| BELL/LINEAR 50             |    16.58 |     0 |       49 |
| BELL/LINEAR 70             |    25.02 |     0 |       69 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| QFT/LINEAR 10              |     5.64 |    61 |       79 |
| QFT/LINEAR 30              |    16.60 |   641 |      265 |
| QFT/LINEAR 50              |    59.71 |  1821 |      451 |
| QFT/LINEAR 70              |   153.12 |  3601 |      637 |
+----------------------------+----------+-------+----------+
+----------------------------+----------+-------+----------+
|        NAME                | TIME (s) | SWAPS | 2Q DEPTH |
+----------------------------+----------+-------+----------+
| HADAMARD/LINEAR 10         |     0.06 |     0 |        0 |
| HADAMARD/LINEAR 30         |     0.26 |     0 |        0 |
| HADAMARD/LINEAR 50         |     0.59 |     0 |        0 |
| HADAMARD/LINEAR 70         |     1.12 |     0 |        0 |
+----------------------------+----------+-------+----------+

NOTE: the code to run above benchmark data is in a pending PR (#733). However, although running the specific benchmark relies on that PR, this PR does NOT depend on that PR.

NOTE 2: Environment info:

Run on Fri Sep 17 20:40:24 PDT 2021
Hardware:
  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro16,1
  Processor Name:	6-Core Intel Core i7
  Processor Speed:	2.6 GHz
  Number of Processors:	1
  Total Number of Cores:	6
  L2 Cache (per Core):	256 KB
  L3 Cache:	12 MB
Software:
  System Version:	macOS 10.15.7 (19H1217)
  Kernel Version:	Darwin 19.6.0

@ghost
Copy link
Author

ghost commented Oct 12, 2021

FYI, here are charts (as PNG files) showing the main performance improvements of this optimization:

qft-linear-baseline-vs-new-walker-visualization

qft-fully-connected-baseline-vs-new-walker-visualization

Based on CSV data here:
benchmark-nq-data.csv

@stylewarning
Copy link
Member

Super cool! Over 2x improvement in a lot of benchmarks. :D

@notmgsk
Copy link
Member

notmgsk commented Oct 12, 2021

Wunderbar!

@@ -179,14 +179,22 @@
(last-instrs :initform nil
:accessor lscheduler-last-instrs
:documentation "List of the instructions appearing at the \"bottom\" of a logical scheduler. These are sorted topologically ascending: earlier items in the list come logically after deeper items in the list.")
(later-instrs :initform (make-hash-table :test #'eql)
(later-instrs :initform (make-instr-hash-table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO make-instruction-hash-table

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I guess instr is commonly used in the lscheduler code, so 🤷🏽

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷🏽 here too, but I found instr standalone was more often used than the spelled out variant.

stylewarning
stylewarning previously approved these changes Oct 19, 2021
@stylewarning
Copy link
Member

@mhdavid-hrl I'm ready to merge. All good?

@ghost
Copy link
Author

ghost commented Oct 19, 2021

@mhdavid-hrl I'm ready to merge. All good?

Could I just squash the typo commit with the base commit and eliminate the 1-line commit message, just for a slightly nicer history?

@stylewarning
Copy link
Member

@mhdavid-hrl I'm ready to merge. All good?

Could I just squash the typo commit with the base commit and eliminate the 1-line commit message, just for a slightly nicer history?

Yes of course; go ahead.

…verse DAG

Lscheduler-walk-graph was reimplemented. It now runs a fairly standard
DFS-based topological ordering, but this is done on the reverse DAG
that's embodied by the structure, allowing the functions at each node
to be called without allocating any storage that the GC has to deal
with. This DFS uses an explicit stack, not recursion. A couple of
earlier attempts to do a DFS-based topological sorting failed in
extreme cases:

- a straightforward recursive-function based DFS: would die with Lisp
  stack overflow for huge graphs

- a straightforward explicit-stack DFS on the forward DAG, i.e.,
  collecting a topo-sorted list in reverse order: the list destined to
  become garbage would overwhelm SBCL GC, resulting in "heap exhausted
  during garbage collection" errors in extreme cases (e.g., some 600K+
  node graph encountered in qasm benchmmarks).

Besides the new walker, a few small tweaks in this file:
- lscheduler-all-instructions implemented more efficiently
- lscheduler-calculate-log-fidelity tweaked to call binding-from-instr less
- lscheduler-calculate-volume doc string added
@ghost
Copy link
Author

ghost commented Oct 19, 2021

Done with rebase/squash/fix commit log, and all checks have passed. Good to merge now.

@stylewarning stylewarning merged commit d84d198 into quil-lang:master Oct 21, 2021
@ghost
Copy link
Author

ghost commented Nov 10, 2021

Just adding here that I just noticed this partially addressed issue "Clean up lscheduler-walk-graph #96".

@ghost ghost mentioned this pull request Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants