refactor(tokenizer): Re-use object for current location #397

Closed
fb55 wants to merge 2 commits from refactor/loc

@fb55 fb55 commented Feb 7, 2022

Previously, `ctLoc` was overwritten with a fresh location object on every new character. This change re-uses a single object for the current location, which should substantially reduce allocations when location information is enabled.
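As a rough illustration of the idea, here is a minimal sketch, not the code from this PR. The `Location` shape, the `LocationTracker` class, and its `advance`/`snapshot` methods are hypothetical names invented for the example. It keeps one mutable location object that is updated in place per character and copies it only when a token actually needs a start location.

```typescript
// Hypothetical types and names for illustration only; not parse5's actual API.
interface Location {
    startLine: number;
    startCol: number;
    startOffset: number;
}

class LocationTracker {
    // A single mutable object, updated in place for every character.
    private readonly current: Location = { startLine: 1, startCol: 1, startOffset: 0 };

    /** Advance past one character without allocating a new object. */
    advance(ch: string): void {
        this.current.startOffset++;
        if (ch === '\n') {
            this.current.startLine++;
            this.current.startCol = 1;
        } else {
            this.current.startCol++;
        }
    }

    /** Copy the current position; meant to be called only when a token starts. */
    snapshot(): Location {
        return {
            startLine: this.current.startLine,
            startCol: this.current.startCol,
            startOffset: this.current.startOffset,
        };
    }
}

// Usage: many characters are consumed, but only one copy is allocated.
const tracker = new LocationTracker();
const input = 'ab\ncd';
for (let i = 0; i < input.length; i++) {
    tracker.advance(input.charAt(i));
}
const tokenStart = tracker.snapshot();
```

The trade-off is that the shared object must never be handed out directly: anything that stores a location has to take a copy first, otherwise later characters would silently change it.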

fb55 added 2 commits February 7, 2022 08:12
@@ -925,7 +953,7 @@ export class Tokenizer {
     //------------------------------------------------------------------
     private _stateData(cp: number): void {
         this.preprocessor.dropParsedChunk();
-        this.ctLoc = this._getCurrentLocation();
+        this.updateStartLocation();
Collaborator

feels a bit weird having to do this book-keeping — its existence seems to indicate that the data might be out of date and before you access it you’d have to call this function?

Collaborator Author

I played around with this some more. Turns out we can actually get rid of location updates in the *DATA states altogether: #402

fb55 added a commit to parse5/parse5-fork that referenced this pull request Feb 11, 2022
Supersedes inikulin#397

Previously, `ctLoc` was overwritten on every new character. inikulin#397 attempted to improve this by maintaining the locations in an updated object, and creating copies when needed.

With this PR, we instead use explicit offsets when creating tokens. There is no more need to update locations; instead, we exploit the fact that a *DATA state is entered only after emitting the previous token.

This change should lead to a great reduction in allocations when using location information.
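To make the difference from the previous approach concrete, here is a toy sketch of the general idea, under the assumption that a token's start is simply the position at which the previous token was emitted. The `Token` shape and the `tokenize` function are hypothetical and much simpler than the real tokenizer. No location is updated per character, and offsets are passed explicitly only when a token is created.

```typescript
// Hypothetical toy tokenizer; names and token shapes are illustrative only.
interface Token {
    type: 'text' | 'tag';
    value: string;
    startOffset: number;
    endOffset: number;
}

function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    let pos = 0;
    // The data state is (re-)entered only right after a token was emitted,
    // so the next token starts exactly where the previous one ended.
    let tokenStart = 0;

    // Create a token from explicit offsets; skip empty spans.
    const emit = (type: Token['type'], end: number): void => {
        if (end > tokenStart) {
            tokens.push({
                type,
                value: input.slice(tokenStart, end),
                startOffset: tokenStart,
                endOffset: end,
            });
        }
        tokenStart = end;
    };

    while (pos < input.length) {
        if (input.charAt(pos) === '<') {
            emit('text', pos); // flush any pending text before the tag
            const close = input.indexOf('>', pos);
            pos = close === -1 ? input.length : close + 1;
            emit('tag', pos);
        } else {
            pos++; // plain character: no location bookkeeping at all
        }
    }
    emit('text', pos);
    return tokens;
}

const result = tokenize('ab<i>c');
```

In this sketch the only allocations tied to locations happen inside `emit`, once per token rather than once per character.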
@fb55 fb55 closed this Feb 11, 2022
@fb55 fb55 deleted the refactor/loc branch February 11, 2022 13:21

fb55 commented Feb 11, 2022

Closing in favour of #402
