Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster chunkedStream_getByte() #5005

Merged
merged 2 commits into from
Jul 18, 2014

Conversation

fkaelberer
Copy link
Contributor

Even though #4966 accelerated ensureRange() a lot, it is faster to avoid the check altogether. If the byte array is non-zero at the requested position, the data must have been loaded already, so we can skip the range check.

This speeds up some PDFs considerably, for example, page 11 of the tracemonkey paper now loads 20-30% faster.

@CodingFabian
Copy link
Contributor

totally makes sense, nice find

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/1f75c93a1085574/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/0d238b05d33328b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/1f75c93a1085574/output.txt

Total script time: 21.06 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/0d238b05d33328b/output.txt

Total script time: 23.52 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1bd4974acd87bba/output.txt

@timvandermeij
Copy link
Contributor

Looks really good to me.

this.ensureByte(pos);
}
this.pos++;
return byte;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this return 0 if ensureByte was needed? There should be a byte = this.bytes[pos]; after the ensureByte.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the byte at position pos has already been loaded, then it is correct to return 0 at return byte.
However, if this.bytes[pos] has not been loaded yet, then this.ensureByte(pos) will throw an exception and no value will be returned at all. I admit that this is not so obvious from the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could rename ensureByte? (and then also fix the name of the anon function which is right now ensureRange)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkaelberer Perhaps it would be a good idea to add a comment here about ensureByte throwing an exception if the byte isn't loaded, to avoid confusion.

@samlh ensureByte doesn't modify this this.bytes in any way, please see: /~https://github.com/mozilla/pdf.js/blob/master/src/core/chunked_stream.js#L94.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could rename ensureByte? (and then also fix the name of the anon function which is right now ensureRange)

This does not seem relevant to the current PR.
(Furthermore, I don't understand what's wrong (or unclear) with the currently used names. E.g. ensureByte ensures that the byte we're trying to get actually exists, so that name looks fine to me.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the projects i have seen so far, ensure had the meaning of getOrCreate, which in this context would be: if its there ok, otherwise load it.
I personally would prefer "check" or "verify" as name, but thats just bikeshedding :)

@fkaelberer
Copy link
Contributor Author

Added a text comment to the code.

@samlh
Copy link

samlh commented Jun 27, 2014

Yeah, I should have looked a bit more closely. In my opinion, the naming is fine, I just (thoughtlessly) assumed blocking instead of exception/retry. Thanks for humoring my confusion.

@fkaelberer
Copy link
Contributor Author

I added another commit that avoids calling getByte() repeatedly in loops.

@CodingFabian
Copy link
Contributor

i reviewed it and looks good to me. makes good sense and I would like to see it merged.

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/022c09f82d50fda/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/9cf188a9e4183f9/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/9cf188a9e4183f9/output.txt

Total script time: 36.01 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

Snuffleupagus added a commit that referenced this pull request Jul 18, 2014
@Snuffleupagus Snuffleupagus merged commit 9c6316f into mozilla:master Jul 18, 2014
@Snuffleupagus
Copy link
Collaborator

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants