Feature: branch pruning #844

frantzmiccoli · 2019-01-07T13:07:54Z

Repost of #818 after a rebase on master.

This is:

- [ ] a bugfix
- [x] a new feature

Checklist:

Changes are covered by unit tests
Code style is respected
Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
CHANGELOG.md contains a short summary of the change
Documentation is updated as necessary

Why is this change needed?

The calculation engine resolved every function by first resolving all of its arguments, including the arguments of IF calls. This caused significant over-evaluation when IFs were used, because every branch was evaluated regardless of the condition.
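To make the over-evaluation concrete, here is a minimal sketch in Python rather than PHP. It is not the PhpSpreadsheet implementation: the PR tags parsed tokens with a branch identifier, whereas this sketch uses a small recursive expression tree for brevity. All names (`Const`, `Call`, `FUNCTIONS`, `evaluate`, `EXPENSIVE`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Const:
    """A literal value in the hypothetical expression tree."""
    value: Any


@dataclass
class Call:
    """A function call with already-parsed argument nodes."""
    name: str
    args: List[Any]


# Hypothetical function table; EXPENSIVE stands in for a costly formula.
FUNCTIONS = {
    "IF": lambda cond, if_true, if_false: if_true if cond else if_false,
    "SUM": lambda *values: sum(values),
    "EXPENSIVE": lambda x: x * 2,
}


def evaluate(node: Any, prune: bool) -> Tuple[Any, List[str]]:
    """Return (result, names of the functions that were actually evaluated)."""
    evaluated: List[str] = []

    def walk(current: Any) -> Any:
        if isinstance(current, Const):
            return current.value
        if prune and current.name == "IF":
            # Pruned mode: resolve the condition first, then only the
            # branch that the condition selects.
            condition = walk(current.args[0])
            evaluated.append("IF")
            return walk(current.args[1] if condition else current.args[2])
        # Eager mode: resolve every argument before calling the function,
        # so both IF branches are always computed.
        values = [walk(arg) for arg in current.args]
        evaluated.append(current.name)
        return FUNCTIONS[current.name](*values)

    return walk(node), evaluated


# Roughly IF(TRUE, SUM(1, 2), EXPENSIVE(21))
formula = Call("IF", [
    Const(True),
    Call("SUM", [Const(1), Const(2)]),
    Call("EXPENSIVE", [Const(21)]),
])

eager_result, eager_calls = evaluate(formula, prune=False)
pruned_result, pruned_calls = evaluate(formula, prune=True)
```

Both modes return the same value, but only the eager mode ever calls `EXPENSIVE`. The saving depends on how costly the skipped branches are, which is consistent with a speed-up that varies widely from file to file.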

I have tested against 5 files made by 4 different people to check that this does not introduce regressions, and I have observed none. The speed improvement ranges from 0% to 80% on those files.

EDIT: Adding details from the discussion I had with @PowerKiKi

Yes, this touches the core of the calculation engine, so it would not be too surprising if it introduced regressions. I did my best to test it thoroughly. If you see extra tests to perform, let me know.

As for the extra public methods:

- Stack::getStackItem(): enables code factorization, as Calculation::_parseFormula() does not manipulate tokens only through a stack and was also manually creating the arrays representing tokens.
- Stack::__toString(): was really convenient for debugging. I would recommend keeping it, but I could truncate it.
- I followed the result caching logic, which introduced a few extra methods in Calculation: setBranchPruningEnabled(), enableBranchPruning(), disableBranchPruning() and clearBranchStore().
- Nothing to do with this pull request, but I am wondering whether some functions, like processTokenStack(), are public only to enable testing. I usually prefer to use the reflection API in my unit tests.

A side point about CalculationTest::testBranchPruningFormulaParsing: this test was pretty dense. I could not use a data provider, as checking the expected result is far from trivial, so I split the test into several functions.

As for splitting the code in the calculation engine itself: the _parseFormula() and processTokenStack() functions do not use context objects that could be passed around to subroutines. I could spend some time splitting those huge methods, as I think I have gained some understanding of the calculation engine's inner workings, but I think this should be left for another pull request. I don't think it would deeply impact execution speed.

stale · 2019-03-08T13:07:58Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If this is still an issue for you, please try to help by debugging it further and sharing your results.
Thank you for your contributions.

frantzmiccoli · 2019-03-11T15:24:47Z

Did a merge with master to avoid merge conflicts

PowerKiKi · 2019-07-27T04:38:57Z

@MarkBaker this PR still looks quite promising. Should we merge it now? Or would it conflict with your work on the new calculation engine?

frantzmiccoli · 2019-07-30T08:15:38Z

Side note: we have been using it in production for a few weeks now on some pretty advanced sheets without observing any bugs.

PowerKiKi · 2019-08-12T01:27:45Z

Thank you for your work and patience. I finally squashed and merged it as 0b387e7

1.9.0

### Added

- When `<br>` appears in a table cell, set the cell to wrap [#1071](#1071) and [#1070](#1070)
- Add MAXIFS, MINIFS, COUNTIFS and Remove MINIF, MAXIF [#1056](#1056)
- HLookup needs an ordered list even if range_lookup is set to false [#1055](#1055) and [#1076](#1076)
- Improve performance of IF function calls via branch pruning to avoid resolution of every branch [#844](#844)
- MATCH function supports `*?~` Excel functionality, when match_type=0 [#1116](#1116)
- Allow HTML Reader to accept HTML as a string [#1136](#1136)

### Fixed

- Fix to AVERAGEIF() function when called with a third argument
- Eliminate duplicate fill none style entries [#1066](#1066)
- Fix number format masks containing literal (non-decimal point) dots [#1079](#1079)
- Fix number format masks containing named colours that were being misinterpreted as date formats; and add support for masks that fully replace the value with a full text string [#1009](#1009)
- Stricter-typed comparison testing in COUNTIF() and COUNTIFS() evaluation [#1046](#1046)
- COUPNUM should not return zero when settlement is in the last period [#1020](#1020) and [#1021](#1021)
- Fix handling of named ranges referencing sheets with spaces or "!" in their title
- Cover `getSheetByName()` with tests for name with quote and spaces [#739](#739)
- Best effort to support invalid colspan values in HTML reader - [#878](#878)
- Fixes incorrect rows deletion [#868](#868)
- MATCH function fix (value search by type, stop search when match_type=-1 and unordered element encountered) [#1116](#1116)
- Fix `getCalculatedValue()` error with more than two INDIRECT [#1115](#1115)
- Writer\Html did not hide columns [#985](#985)

frantzmiccoli added 8 commits January 7, 2019 14:04

Introduce elements to identify ifs and enable better branch resolution

5e09192

(pruning). We tag parsed tokens to associate a branch identifier to them

Working branch pruning with tests

147bd46

Branch bruning extratests

029bff2

Other branch pruning tests

edd2b43

Update changelog close PHPOffice#788: branch pruning

075377c

Remove debug comment

9a9eb87

Remove null coalescing operator "??" and other minor changes for comp…

1267900

…atibility reasons

Fix style errors and splitting tests

b1ef7ef

frantzmiccoli mentioned this pull request Jan 22, 2019

Calculation engine error on specific Multi-Cell computation #785

Closed

stale bot added the stale label Mar 8, 2019

MarkBaker removed the stale label Mar 8, 2019

frantzmiccoli force-pushed the feature/branch-pruning branch 2 times, most recently from aa38599 to c7e3e8b Compare March 11, 2019 15:21

frantzmiccoli force-pushed the feature/branch-pruning branch from c7e3e8b to da19445 Compare April 3, 2019 09:26

Resolve merge conflict and remove debugging code

4d8ca2a

frantzmiccoli force-pushed the feature/branch-pruning branch from da19445 to 4d8ca2a Compare April 3, 2019 09:27

PowerKiKi added the pinned label (pinned issue to avoid them becoming stale) May 26, 2019

PowerKiKi closed this in 0b387e7 Aug 12, 2019

rolandsusans mentioned this pull request Aug 28, 2019

Calculation Branch Pruning should be Off by default #1149

Closed

1 task
