Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for skipping the next translation group #69

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

dyoo
Copy link
Collaborator

@dyoo dyoo commented Aug 30, 2023

This is the beginning of support for handling special comments for translation, #63

This adds support for adding a comment of the form: <!-- mdbook-xgettext: skip -->. This will cause the system to skip the next message group that would otherwise be translated.

It adds a dependency to the regex crate to match for the comment skip pattern.

@mgeisler
Copy link
Collaborator

Wow, this looks great!

src/lib.rs Show resolved Hide resolved
Copy link
Collaborator

@djmitche djmitche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks solid. However, I suspect that mdbook-xgettext: skip is rather a lot to remember when writing a slide.

Thoughts on using <!-- NO-TRANSLATE --> instead? I think the use of all-caps signifies that this comment is for machine interpretation, and the words are specific enough that they would not conflict with anything not translation-related, so there is no need to scope them with mdbook-xgettext.

@mgeisler
Copy link
Collaborator

Thoughts on using <!-- NO-TRANSLATE --> instead? I think the use of all-caps signifies that this comment is for machine interpretation, and the words are specific enough that they would not conflict with anything not translation-related, so there is no need to scope them with mdbook-xgettext.

Yeah, it's rather long indeed... I guess I was thinking people would copy-paste in front of large code blocks as needed.

I suggested the long form in #63 and it's slightly inspired by how xgettext supports special xgettext:c-format and xgettext:no-c-format comments. However, the manual also talks about a TRANSLATORS: prefix, which is more your style (I believe you trigger this with xgettext --add-comments=TRANSLATORS).

I'm not entirely sure where we should land, but we should try to think of how skipping can work with more directives in the future: translator comments and a message context.

@dyoo
Copy link
Collaborator Author

dyoo commented Aug 30, 2023

@mgeisler 's questions on what happens with the current implementation and inline sections made me reconsider. I didn't know offhand what exactly would happen, and that to me is a warning sign.

I'd rather it be obviously clear. So maybe it might make a safer and robust feature to have the skips define regions, rather than implicitly skip the next block. Explicitly marking off the start and end of a skip section feels both simpler and clearer to me.

It would look something like this

<!-- mdbook-xgettext: skip-start -->
This content is ignored.
<!-- mdbook-xgettext: skip-end -->

This would make it easier to see what sections are being ignored, removing ambiguity on the boundaries of the skipped region.

Would you be amendable to this proposed change to the feature?

@mgeisler
Copy link
Collaborator

It would look something like this

<!-- mdbook-xgettext: skip-start -->
This content is ignored.
<!-- mdbook-xgettext: skip-end -->

This would make it easier to see what sections are being ignored, removing ambiguity on the boundaries of the skipped region.

Would you be amendable to this proposed change to the feature?

I'm more thinking that we could have both. The immediate use case for skipping is to remove some very large code listings from the PO files for Comprehensive Rust. There your current implementation will already work great and I think we should merge it already.

A next step could be to support skipping regions like you suggest — I think this is a great idea! Infact, I just found a tool today which supports similar comments: mdpo.

Seeing the comments they support gives me an idea: we could use i18n as a shorthand for mdbook-i18n-helpers and implement these comments:

  • i18n-skip: this PR, skips next event,
  • i18n-disable: starts skipping events,
  • i18n-enable: stops skipping,
  • i18n-comment: translator comment,
  • i18n-context and i18n-ctx: sets message context for next (or following?) event.

@dyoo
Copy link
Collaborator Author

dyoo commented Aug 31, 2023

Sounds good! I have added more tests but have left the rest of the implementation as is; we can follow up with changing the keyword in a subsequent PR.

@@ -207,13 +222,14 @@ pub fn group_events<'a>(events: &'a [(usize, Event<'a>)]) -> Vec<Group<'a>> {
// make the group self-contained.
Event::Start(Tag::Paragraph | Tag::CodeBlock(..)) => {
// A translatable group starts here.
groups.push(state.into_group(idx, events));
groups.push(state.into_group(idx, events, &mut skip_next_group));

state = State::Translate(idx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid the &mut bool in-out parameter in into_group. I find that in-out parameters makes it hard to know what is updated where (and why it is updated).

I think we could avoid it if we instead change state to State::Skip here when skip_next_group == true. Perhaps we could have a pure function which takes a State and a Boolean and returns the new state and the new Boolean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I'll add that in. I had done it that way initially, and started getting tired of the state threading. But given that we'll eventually expand what we're keeping track of to include translator comments and message contexts, we might as well prepare the way for it.

I've adjusted the code to manage a GroupingContext object, and have into_group() consume and produce it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

@dyoo
Copy link
Collaborator Author

dyoo commented Sep 3, 2023

I note that there is appears to be a bug in the Markdown parser that I've reported upstream as pulldown-cmark/pulldown-cmark#712. The bug causes content on the same line as an inline HTML comment to be "glued" to the comment. In the case of our comment-skipping directive, this means that some of the behavior in this PR is due to accident rather than intention. :)

I do have a workaround for the inline HTML comment bug, prepared in main...dyoo:mdbook-i18n-helpers:isolate_first_html_comment. If it turns out that fixing the bug upstream is taking too much time, I can clean up that branch and prepare it for a PR. I wanted to give you a heads up so that we're aware of the issue.

Copy link
Collaborator

@mgeisler mgeisler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this should be a great start!

@mgeisler mgeisler changed the title Implements a comment directive to skip the next translation group. Add support for skipping the next translation group Sep 4, 2023
This adds support for adding a comment of the form: `<!--
mdbook-xgettext: skip -->`. This will cause the system to skip the
next message group that would otherwise be translated.

It adds a dependency to the regex crate to match for the comment skip
pattern.

Tests were added, including tests that cover some odd situations
involving inline HTML, which unfortunately appear to be due to
pulldown-cmark/pulldown-cmark#712.
@mgeisler mgeisler merged commit 08efd0b into google:main Sep 5, 2023
@mgeisler
Copy link
Collaborator

mgeisler commented Sep 5, 2023

Hey @dyoo, thanks for the update! I ran dprint fmt (the formatter we use in this repository) and pushed the result back to your branch. I also squashed the commits: that way every commit will pass all CI tests.

In general, I prefer to edit existing commits instead of having a string of commits in the history — especially if the string of commits represent fixes to comments that came up through code review. I wrote down some thoughts on this more than 10 years ago, in case you're curious 😄

@dyoo dyoo deleted the skip branch September 6, 2023 05:07
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Sep 26, 2023
This builds on the work of @dyoo in
google/mdbook-i18n-helpers#69: by adding a
special `<!-- mdbook-xgettext: skip -->` comment, we can skip the
following code block.

I also modified a few code blocks to remove translatable text:
variable names are not expected to be translated, so it’s fine to have
a line with `println!("foo: {foo}")` in the code block.

Part of #1257.
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Sep 26, 2023
This builds on the work of @dyoo in
google/mdbook-i18n-helpers#69: by adding a
special `<!-- mdbook-xgettext: skip -->` comment, we can skip the
following code block.

I also modified a few code blocks to remove translatable text: variable
names are not expected to be translated, so it’s fine to have a line
with `println!("foo: {foo}")` in the code block.

This PR removes 36 messages from the POT file. The number of lines drop
by 633 (3%).

Part of #1257.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants