-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add XPath and XQuery lexers #1089
Conversation
@MaximeKjaer Sorry that it's taken so long to get back to you on this. I'm going through the lexers and will have some comments soon! |
Sorry, there's a lot there! Please don't hesitate to post any questions you might have! Or any corrections to silly things I said :) |
- uriQName should be tested before qName, otherwise it will never match - The "as" keyword should have word delimiters around it
As pointed out by @pyrmont, this reduces the memory footprint when rouge loads all lexers.
Thank you so much for this extensive review 🎉 I'm happy to have learned a bunch while working on this, it's been fun :) I've replied to all your points individually above. As I said in one of the replies, I rebased to make sure all code examples throughout the commit history are from a permissive copyright license, and used with the appropriate copyright notice. Rebasing messed a little bit with the displayed ordering of commits on GitHub. so I'm sorry about that 😬 Apparently this is only a GitHub-specific issue; the commits are fine, and are in the correct order -- it's just GitHub displaying them incorrectly on the PR UI. As the help page says:
|
@MaximeKjaer In order to reduce warnings issued from the Ruby interpreter, we recommend using the Also, it'd be better if individual method definitions are separated by a blank line: # Terminal literals:
# https://www.w3.org/TR/xpath-31/#terminal-symbols
def self.digits
@digits ||= %r/[0-9]+/
end
def self.decimalLiteral
@decimalLiteral ||= %r/\.#{digits}|#{digits}\.[0-9]*/
end
def self.doubleLiteral
@doubleLiteral ||= %r/(\.#{digits})|#{digits}(\.[0-9]*)?[eE][+-]?#{digits}/
end # Operators
rule XPath.operators, Operator
rule %r/\b#{XPath.word_operators}\b/, Operator::Word
rule %r/\b#{XPath.keywords}\b/, Keyword
rule %r/[?,{}()\[\]]/, Punctuation
# Functions
rule %r/(function)(\s*)(#{XPath.openParen})/ do # function declaration
groups Keyword, Text::Whitespace, Punctuation
end For further details and examples, you may refer to #1180 |
@ashmaroli Thank you for the feedback! Spacing is fixed in 64f0276, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is really an impressive amount of work. Please let me know if any of my comments don't make sense.
Oh, and I think you're missing a spec for XPath.
These aren't necessary
The code review is a lot of work to, so thank you! I've addressed all your comments individually, and added an XPath spec in c5ceb2c. |
@MaximeKjaer Thanks for doing such a massive amount of work and for being so responsive to all our suggestions. Rouge is a better library as a result of this PR :) |
Awesome, thank you all! Hope GitHub updates to this version of Rouge soon. |
@zyv My understanding is that GitHub has its own proprietary syntax highlighting library for the code displayed on GitHub proper. With GitHub Pages, GitHub does use Rouge but it's unfortunately a very old version. I'm hoping they update that soon :( |
Hope they update the version soon. I wrote the Support and in their Forum. Would be of utmost importance for me. |
This PR adds support for XPath 3.1 and XQuery 3.1. These are W3C recommendation languages made for querying tree-structured data.
From the XQuery documentation:
Therefore, the XQuery lexer extends the XPath.
I'm new to Ruby (but not to lexers and parsers!), so if there's anything that could be done better, let me know. For instance, I would have wanted to inherit the regex for
eqName
in the XQuery lexer, but had trouble finding a solution that both allowed me to inherit and use the regex in rules 🤷♂️ There are therefore a few regexes that have been duplicated in the XQuery lexer.I'm keeping the Git history true to the way I wrote these lexers, but I'm happy to rebase, or for the commits to be squashed on merge.
I don't expect CI to pass because of the issue highlighted in #1062.