Add XPath and XQuery lexers #1089

Merged
merged 26 commits into from
Jul 1, 2019
Commits
60957f3  Add XPath and XQuery samples  (MaximeKjaer, Jun 14, 2019)
a204bd8  Add XPath 2.0 lexer  (MaximeKjaer, Feb 4, 2019)
a24b4fd  Add XPath 3.0 support  (MaximeKjaer, Feb 5, 2019)
1fbf275  Add XPath 3.1 support  (MaximeKjaer, Feb 5, 2019)
ee02f3c  Fix XPath 3.1 lexing  (MaximeKjaer, Feb 5, 2019)
be3fd56  Fix XPath lexer description  (MaximeKjaer, Feb 5, 2019)
b2c1709  Merge root and operator states in XPath lexer  (MaximeKjaer, Feb 6, 2019)
4ecc931  Fix small bugs in XPath lexer  (MaximeKjaer, Feb 7, 2019)
8b471d3  Optimize stack usage in XPath lexer  (MaximeKjaer, Feb 8, 2019)
0d0f0a2  Add XQuery 3.1 lexer  (MaximeKjaer, Jun 14, 2019)
2ec2ed2  Move regexes to object methods  (MaximeKjaer, Jun 7, 2019)
9384abb  Add count argument to pop! instead of calling it twice  (MaximeKjaer, Jun 7, 2019)
02b524d  Change Text tokens for whitespace to Text::Whitespace  (MaximeKjaer, Jun 7, 2019)
c7cb938  Fix order of token and state commands to be consistent  (MaximeKjaer, Jun 7, 2019)
64f0276  Add spacing to method definitions  (MaximeKjaer, Jun 24, 2019)
96f2614  Change regexes to %r syntax  (MaximeKjaer, Jun 24, 2019)
a90fe44  Replace goto with push/pop to keep :root at the bottom of the stack  (MaximeKjaer, Jun 24, 2019)
9b30f21  Rename kindTestForPi to kindTestForPI  (MaximeKjaer, Jun 30, 2019)
532834f  Remove word boundary from beginning of regexes  (MaximeKjaer, Jun 30, 2019)
9e9d3c4  Replace Keyword::Reserved with Operator token for wildcards  (MaximeKjaer, Jun 30, 2019)
5deaee0  Add mixin for comments and whitespace  (MaximeKjaer, Jun 30, 2019)
70631b9  Escape curly brackets in XQuery lexer  (MaximeKjaer, Jun 30, 2019)
511e19a  Remove unnecessary push of :root state  (MaximeKjaer, Jun 30, 2019)
b96fb12  Remove unnecessary capturing group  (MaximeKjaer, Jun 30, 2019)
971db33  Escape curly brackets in XPath lexer  (MaximeKjaer, Jun 30, 2019)
c5ceb2c  Add XPath spec  (MaximeKjaer, Jun 30, 2019)
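
Several of the commits above change how the lexers manipulate the state stack (`goto` versus `push`/`pop!`, and a count argument for `pop!`). The sketch below is a toy model in plain Ruby, not Rouge's implementation: the `StateStack` class and its method bodies are hypothetical and only illustrate why swapping `goto` for `push`/`pop!` keeps `:root` at the bottom of the stack.

```ruby
# Toy model of a regex lexer's state stack (hypothetical; not Rouge's code).
class StateStack
  attr_reader :states

  def initialize
    @states = [:root]
  end

  # Enter a new state on top of the current one.
  def push(state)
    @states.push(state)
    self
  end

  # Leave the `count` most recently entered states.
  def pop!(count = 1)
    @states.pop(count)
    self
  end

  # Replace the current state instead of stacking on top of it.
  def goto(state)
    @states[-1] = state
    self
  end
end

# goto from the initial state overwrites :root ...
p StateStack.new.goto(:itemtype).states
# ... whereas push followed by pop! returns to it.
p StateStack.new.push(:itemtype).pop!.states

# "goto then push" queues a follow-up state underneath the one being entered.
nested = StateStack.new.push(:itemtype).goto(:occurrenceindicator).push(:kindtest)
p nested.states
# One pop!(2) call unwinds both levels at once.
p nested.pop!(2).states
```

In this model, a `goto` issued while only `:root` is on the stack discards `:root`, so a later `pop!` has nothing to return to. Pushing instead keeps the base state in place.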
2 changes: 2 additions & 0 deletions lib/rouge/demos/xpath
@@ -0,0 +1,2 @@
(: Authors named Bob Joe who didn't graduate from Harvard :)
//author[first-name = "Joe" and last-name = "Bob"][degree/@from != "Harvard"]
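
The demo opens with an XPath comment, and XPath comments may nest. The lexer in this pull request handles nesting by letting its `:comment` state push itself. A minimal stand-alone sketch of the same idea, using only Ruby's standard `StringScanner` and a depth counter in place of a state stack, could look like this (the `comment_span` helper is hypothetical and is not part of Rouge):

```ruby
require 'strscan'

# Returns the text of the (possibly nested) XPath comment that starts at the
# beginning of `source`, or nil if there is no comment or it is unterminated.
# Hypothetical helper for illustration only.
def comment_span(source)
  scanner = StringScanner.new(source)
  return nil unless scanner.scan(/\(:/)

  depth = 1
  until depth.zero?
    if scanner.scan(/:\)/)            # comment end
      depth -= 1
    elsif scanner.scan(/\(:/)         # nested comment start
      depth += 1
    elsif scanner.scan(/[^:(]+/m) || scanner.scan(/[:(]/)
      next                            # ordinary comment contents
    else
      return nil                      # ran out of input before the comment closed
    end
  end

  source[0, scanner.pos]
end

puts comment_span("(: outer (: inner :) still outer :) //author")
```

The delimiters are tried before the catch-all `[:(]` rule so that a lone `:` or `(` inside the comment is consumed as content without hiding a real `:)` or `(:`.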
22 changes: 22 additions & 0 deletions lib/rouge/demos/xquery
@@ -0,0 +1,22 @@
declare namespace html = "http://www.w3.org/1999/xhtml";

declare function local:test-function($catalog as document-node()) {
  <html>
    <head>
      <title>XQuery example for the Rouge highlighter</title>
      <link href="style.css"/>
    </head>
    <body>
      <h1>List</h1>
      <ul>
        {for $product in $catalog/items/product[@sell-by > current-date()] return
        <li>
          <ul>
            <li>{data($product/name)}</li>
            <li>{$product/price * (1 + $product/tax)}$</li>
          </ul>
        </li>}
      </ul>
    </body>
  </html>
};
332 changes: 332 additions & 0 deletions lib/rouge/lexers/xpath.rb
@@ -0,0 +1,332 @@
# -*- coding: utf-8 -*- #
# frozen_string_literal: true

module Rouge
  module Lexers
    class XPath < RegexLexer
      title 'XPath'
      desc 'XML Path Language (XPath) 3.1'
      tag 'xpath'
      filenames '*.xpath'

      # Terminal literals:
      # https://www.w3.org/TR/xpath-31/#terminal-symbols
      def self.digits
        @digits ||= %r/[0-9]+/
      end

      def self.decimalLiteral
        @decimalLiteral ||= %r/\.#{digits}|#{digits}\.[0-9]*/
      end

      def self.doubleLiteral
        @doubleLiteral ||= %r/(?:\.#{digits}|#{digits}(?:\.[0-9]*)?)[eE][+-]?#{digits}/
      end

      def self.stringLiteral
        @stringLiteral ||= %r/("(("")|[^"])*")|('(('')|[^'])*')/
      end

      def self.ncName
        @ncName ||= %r/[a-z_][a-z_\-.0-9]*/i
      end

      def self.qName
        @qName ||= %r/(?:#{ncName})(?::#{ncName})?/
      end

      def self.uriQName
        @uriQName ||= %r/Q\{[^{}]*\}#{ncName}/
      end

      def self.eqName
        @eqName ||= %r/(?:#{uriQName}|#{qName})/
      end

      def self.commentStart
        @commentStart ||= %r/\(:/
      end

      def self.openParen
        @openParen ||= %r/\((?!:)/
      end

      # Terminal symbols:
      # https://www.w3.org/TR/xpath-30/#id-terminal-delimitation
      def self.kindTest
        @kindTest ||= Regexp.union %w(
          element attribute schema-element schema-attribute
          comment text node document-node namespace-node
        )
      end

      def self.kindTestForPI
        @kindTestForPI ||= Regexp.union %w(processing-instruction)
      end

      def self.axes
        @axes ||= Regexp.union %w(
          child descendant attribute self descendant-or-self
          following-sibling following namespace
          parent ancestor preceding-sibling preceding ancestor-or-self
        )
      end

      def self.operators
        @operators ||= Regexp.union %w(, => = := : >= >> > <= << < - * != + // / || |)
      end

      def self.keywords
        @keywords ||= Regexp.union %w(let for some every if then else return in satisfies)
      end

      def self.word_operators
        @word_operators ||= Regexp.union %w(
          and or eq ge gt le lt ne is
          div mod idiv
          intersect except union
          to
        )
      end

      def self.constructorTypes
        @constructorTypes ||= Regexp.union %w(function array map empty-sequence)
      end

      # Mixin states:

      state :commentsAndWhitespace do
        rule XPath.commentStart, Comment, :comment
        rule %r/\s+/m, Text::Whitespace
      end

      # Lexical states:
      # https://www.w3.org/TR/xquery-xpath-parsing/#XPath-lexical-states
      # https://lists.w3.org/Archives/Public/public-qt-comments/2004Aug/0127.html
      # https://www.w3.org/TR/xpath-30/#id-revision-log
      # https://www.w3.org/TR/xpath-31/#id-revision-log

      state :root do
        mixin :commentsAndWhitespace

        # Literals
        rule XPath.doubleLiteral, Num::Float
        rule XPath.decimalLiteral, Num::Float
        rule XPath.digits, Num
        rule XPath.stringLiteral, Literal::String

        # Variables
        rule %r/\$/, Name::Variable, :varname

        # Operators
        rule XPath.operators, Operator
        rule %r/#{XPath.word_operators}\b/, Operator::Word
        rule %r/#{XPath.keywords}\b/, Keyword
        rule %r/[?,{}()\[\]]/, Punctuation

        # Functions
        rule %r/(function)(\s*)(#{XPath.openParen})/ do # function declaration
          groups Keyword, Text::Whitespace, Punctuation
        end
        rule %r/(map|array|empty-sequence)/, Keyword # constructors
        rule %r/(#{XPath.kindTest})(\s*)(#{XPath.openParen})/ do # kindtest
          groups Keyword, Text::Whitespace, Punctuation
          push :kindtest
        end
        rule %r/(#{XPath.kindTestForPI})(\s*)(#{XPath.openParen})/ do # processing instruction kindtest
          groups Keyword, Text::Whitespace, Punctuation
          push :kindtestforpi
        end
        rule %r/(#{XPath.eqName})(\s*)(#{XPath.openParen})/ do # function call
          groups Name::Function, Text::Whitespace, Punctuation
        end
        rule %r/(#{XPath.eqName})(\s*)(#)(\s*)(\d+)/ do # namedFunctionRef
          groups Name::Function, Text::Whitespace, Name::Function, Text::Whitespace, Name::Function
        end

        # Type commands
        rule %r/(cast|castable)(\s+)(as)/ do
          groups Keyword, Text::Whitespace, Keyword
          push :singletype
        end
        rule %r/(treat)(\s+)(as)/ do
          groups Keyword, Text::Whitespace, Keyword
          push :itemtype
        end
        rule %r/(instance)(\s+)(of)/ do
          groups Keyword, Text::Whitespace, Keyword
          push :itemtype
        end
        rule %r/(as)\b/ do
          token Keyword
          push :itemtype
        end

        # Paths
        rule %r/(#{XPath.ncName})(\s*)(:)(\s*)(\*)/ do
          groups Name::Tag, Text::Whitespace, Punctuation, Text::Whitespace, Operator
        end
        rule %r/(\*)(\s*)(:)(\s*)(#{XPath.ncName})/ do
          groups Operator, Text::Whitespace, Punctuation, Text::Whitespace, Name::Tag
        end
        rule %r/(#{XPath.axes})(\s*)(::)/ do
          groups Keyword, Text::Whitespace, Operator
        end
        rule %r/\.\.|\.|\*/, Operator
        rule %r/@/, Name::Attribute, :attrname
        rule XPath.eqName, Name::Tag
      end

      state :singletype do
        mixin :commentsAndWhitespace

        # Type name
        rule XPath.eqName do
          token Keyword::Type
          pop!
        end
      end

      state :itemtype do
        mixin :commentsAndWhitespace

        # Type tests
        rule %r/(#{XPath.kindTest})(\s*)(#{XPath.openParen})/ do
          groups Keyword::Type, Text::Whitespace, Punctuation
          # go to kindtest then occurrenceindicator
          goto :occurrenceindicator
          push :kindtest
        end
        rule %r/(#{XPath.kindTestForPI})(\s*)(#{XPath.openParen})/ do
          groups Keyword::Type, Text::Whitespace, Punctuation
          # go to kindtestforpi then occurrenceindicator
          goto :occurrenceindicator
          push :kindtestforpi
        end
        rule %r/(item)(\s*)(#{XPath.openParen})(\s*)(\))/ do
          groups Keyword::Type, Text::Whitespace, Punctuation, Text::Whitespace, Punctuation
          goto :occurrenceindicator
        end
        rule %r/(#{XPath.constructorTypes})(\s*)(#{XPath.openParen})/ do
          groups Keyword::Type, Text::Whitespace, Punctuation
        end

        # Type commands
        rule %r/(cast|castable)(\s+)(as)/ do
          groups Keyword, Text::Whitespace, Keyword
          goto :singletype
        end
        rule %r/(treat)(\s+)(as)/ do
          groups Keyword, Text::Whitespace, Keyword
          goto :itemtype
        end
        rule %r/(instance)(\s+)(of)/ do
          groups Keyword, Text::Whitespace, Keyword
          goto :itemtype
        end
        rule %r/(as)\b/, Keyword

        # Operators
        rule XPath.operators do
          token Operator
          pop!
        end
        rule %r/#{XPath.word_operators}\b/ do
          token Operator::Word
          pop!
        end
        rule %r/#{XPath.keywords}\b/ do
          token Keyword
          pop!
        end
        rule %r/[\[),]/ do
          token Punctuation
          pop!
        end

        # Other types (e.g. xs:double)
        rule XPath.eqName do
          token Keyword::Type
          goto :occurrenceindicator
        end
      end

      # For pseudo-parameters for the KindTest productions
      state :kindtest do
        mixin :commentsAndWhitespace

        # Pseudo-parameters:
        rule %r/[?*]/, Operator
        rule %r/,/, Punctuation
        rule %r/(element|schema-element)(\s*)(#{XPath.openParen})/ do
          groups Keyword::Type, Text::Whitespace, Punctuation
          push :kindtest
        end
        rule XPath.eqName, Name::Tag

        # End of pseudo-parameters
        rule %r/\)/, Punctuation, :pop!
      end

      # Similar to :kindtest, but recognizes NCNames instead of EQNames
      state :kindtestforpi do
        mixin :commentsAndWhitespace

        # Pseudo-parameters
        rule XPath.ncName, Name
        rule XPath.stringLiteral, Literal::String

        # End of pseudo-parameters
        rule %r/\)/, Punctuation, :pop!
      end

      state :occurrenceindicator do
        mixin :commentsAndWhitespace

        # Occurrence indicator
        rule %r/[?*+]/ do
          token Operator
          pop!
        end

        # Otherwise, lex it in root state:
        rule %r/(?![?*+])/ do
          pop!
        end
      end

      state :varname do
        mixin :commentsAndWhitespace

        # Function call
        rule %r/(#{XPath.eqName})(\s*)(#{XPath.openParen})/ do
          groups Name::Variable, Text::Whitespace, Punctuation
          pop!
        end

        # Variable name
        rule XPath.eqName, Name::Variable, :pop!
      end

      state :attrname do
        mixin :commentsAndWhitespace

        # Attribute name
        rule XPath.eqName, Name::Attribute, :pop!
        rule %r/\*/, Operator, :pop!
      end

      state :comment do
        # Comment end
        rule %r/:\)/, Comment, :pop!

        # Nested comment
        rule XPath.commentStart, Comment, :comment

        # Comment contents
        rule %r/[^:(]+/m, Comment
        rule %r/[:(]/, Comment
      end
    end
  end
end
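
Regex definitions like the ones in this lexer depend on two properties of Ruby regexes that are easy to overlook: alternation binds more loosely than concatenation, and `Regexp.union` tries its alternatives in the order given. The following is a small stand-alone sketch of both points, using only core Ruby. The constant names are made up for the illustration and do not appear in Rouge.

```ruby
DIGITS = /[0-9]+/

# Without grouping, `|` splits the whole pattern: the left alternative is just
# ".digits", and only the right alternative requires an exponent.
UNGROUPED_DOUBLE = /\.#{DIGITS}|#{DIGITS}(?:\.[0-9]*)?[eE][+-]?#{DIGITS}/

# With the mantissa alternatives grouped, the exponent is required for both
# forms, as in the XPath 3.1 DoubleLiteral production.
GROUPED_DOUBLE = /(?:\.#{DIGITS}|#{DIGITS}(?:\.[0-9]*)?)[eE][+-]?#{DIGITS}/

p ".5e3"[/\A#{UNGROUPED_DOUBLE}/]  # stops before the exponent
p ".5e3"[/\A#{GROUPED_DOUBLE}/]    # consumes the whole literal

# Regexp.union keeps the order of its arguments, and the first alternative
# that matches wins, so longer operators must come before their prefixes.
LONGEST_FIRST  = Regexp.union(%w(>= >> >))
SHORTEST_FIRST = Regexp.union(%w(> >= >>))

p ">="[LONGEST_FIRST]
p ">="[SHORTEST_FIRST]
```

Interpolating one `Regexp` into another wraps it in its own non-capturing group, which is why `DIGITS` can be reused safely. The same is not true of alternatives written inline in the pattern.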