Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable syntax highlighting for inline code #17585

Merged
merged 9 commits into from
Apr 2, 2021

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Mar 30, 2021

Add highlighting of text in single backticks as Nim code, just like it works for code-blocks.

An example from Nim Manual:

image

An artificial example:

image

the corresponding RST text:

Use `import std/os` statement to import this module.
More examples: `proc f()` and `var x: int` and `result = 5`.
So `"abc".match(re"(\w)").get.captures[0] == "a"`.

And 6 other languages that ``highlite.nim`` supports too:

* Python: `class X(object): pass`:python:
* C: `typedef unsigned char BYTE;`:c:
* YAML: `- {name: John Smith, age: 33}`:yaml:
* Java: `double x = Math.sin(1);`:java:
* C#: `listOfFoo.Where(delegate(Foo x) { return x.size > 10; });`:csharp:
* C++: `throw std::runtime_error("error");`:cpp:

Literals without highlighting still can be input: `import`:literal: and ``import``.

It also highlights blindly non-code like this:

image

To avoid this spurious highlighting the following rule is added to contributing.rst for using single and double backticks:

  • use single backticks for fragments of code in Nim and other
    programming languages, including identifiers

  • prefer double backticks otherwise:

    • for file names: ``os.nim``
    • for fragments of strings not enclosed by " and " and not
      related to code, e.g. text of compiler messages
    • for command line options: ``--docInternal``
    • also when code ends with a standalone \ (otherwise a combination of
      \ and a final ` would get escaped)

Ref. nim-lang/RFCs#355 for discussion on syntax.

Later it will be possible to switch default role (programming language or just literal), ref bug #17340.
cc @narimiran @timotheecour

@timotheecour
Copy link
Member

timotheecour commented Mar 30, 2021

@a-mr

  • how about disable the automatic syntax highlighting as nim for single backtick that doesn't contain an explicit :nim: role? I'm assuming this should be a small change in your PR.

IMO that's the only controversial thing in this PR due to the other implications you listed. In subsequent PR we can revisit that point, so that it doesn't block the rest of this PR

  • in future work, it would be nice to support string litterals with syntax highlighting, which emit could then use, either explicitly or via compiler logic, eg:
{.emit: """
#include <stdio.h>
""".lang(cpp).}

const s = """
import os
""".lang(py)
writeFile("foo.py", s)

# or even reusing apostrophe syntax, eg: 
const s = """
import os
"""'py

where lang(cpp) would not affect codegen but would affect rendering

@a-mr
Copy link
Contributor Author

a-mr commented Mar 30, 2021

@timotheecour
I grepped for both single and double backticks and estimated roughly that it's actually code in more than 95% cases in *.nim files (and it's almost always Nim code) and more than 50% in *.rst in this repo. You can grep and check yourself.

  • So it makes sense to make Nim code the default option and change to double backticks only when needed, at least for *.nim.
  • Also different *.rst files have different proportions of those 2 cases, so it makes sense to change .. default-role:: literal only for some of them.

default-role can be implemented in this PR or in a follow-up.

@timotheecour
Copy link
Member

timotheecour commented Mar 30, 2021

then how about this for single backtick without explicit role:

  • if in nim source file, use implicit nim role and syntax highlight as nim
  • if in rst source file, don't use implicit nim role; at least for now (can be revisited later, maybe)

that fits well with your 95% vs 50% stats, as well as with rst spec (ie honors default-role:code which is now present in most/all rst files); in subsequent PR we can think about whether to change those rst files to default-role:nim or similar)

@a-mr
Copy link
Contributor Author

a-mr commented Mar 30, 2021

I think we will have another spell in every rst file instead:

.. role:: nim(code)
   :language: nim
.. default-role:: nim

Github does not highlight it but it's still rendered as code: /~https://github.com/a-mr/Nim/blob/test-branch/doc/tut2.rst

(of course, currently Nim would say "Error: invalid directive: 'role'")

@timotheecour
Copy link
Member

I think we will have another spell in every rst file instead:

can these lines be replace by a single include instead?
something like:

.. include:: rstcommon.rst

and then we can add to rstcommon.rst whatever's in common, eg:

.. role:: nim(code)
   :language: nim
.. default-role:: nim

(and perhaps more, eg related to a common index, or C language highlighting, etc)
(maybe #4864 needs to be fixed first?)

@a-mr
Copy link
Contributor Author

a-mr commented Mar 30, 2021

Tried it. include works for rst2html.py but does not for Github: /~https://github.com/a-mr/Nim/blob/test-branch/doc/tut1.rst — it displayed as a normal text.

(#4864 seems unrelated)

@timotheecour
Copy link
Member

timotheecour commented Mar 30, 2021

ok how about:

.. default-role:: code
.. include:: rstcommon.rst

which should at least display as code in github, and then still allows factoring all common definitions in 1 place (including overriding .. default-role:: code if needed)

@a-mr
Copy link
Contributor Author

a-mr commented Mar 31, 2021

Agreed.

Added missing parts for default-role and some minimal support for role directive.

@@ -514,10 +525,17 @@ proc defaultFindFile*(filename: string): string =
if fileExists(filename): result = filename
else: result = ""

proc defaultRoleKind(options: RstParseOptions): RstNodeKind =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding defaultRoleKind, you could infer it from currRole, eg:

p.s.currRoleKind = defaultRoleKind(p.s.options)
p.s.currRole = defaultRole(p.s.options)

=>

p.s.currRole = defaultRole(p.s.options)
p.s.currRoleKind = whichRole(p, p.s.currRole)

(a bit more DRY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

proc defaultRoleKind(options: RstParseOptions): RstNodeKind =
if roNimFile in options: rnInlineCode else: rnInlineLiteral
proc defaultRole(options: RstParseOptions): string =
if roNimFile in options: "nim" else: "literal"
Copy link
Member

@timotheecour timotheecour Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add

type BuiltinRoles = enum
  brUnknown = "unknown"
  brNim = "nim"
  brLiteral = "literal"
  brCode = "code"
  ... # doesn't need to be a complete list

and then use it everytime you're using one of those builtin roles, it's more typesafe eg:

if roNimFile in options: $brNim else: $brLiteral

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too little profit for a new type. Most roles are met only once and immediately converted to RstNodeKind. The only exception is "nim" and "literal", which occur twice, but both occurrences are within a few lines.

@@ -1018,15 +1036,43 @@ proc fixupEmbeddedRef(n, a, b: PRstNode) =
for i in countup(0, sep - incr): a.add(n.sons[i])
for i in countup(sep + 1, n.len - 2): b.add(n.sons[i])

proc whichRole(sym: string): RstNodeKind =
case sym
const supportedLanguages = ["nim", "yaml", "python", "java", "c",
Copy link
Member

@timotheecour timotheecour Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that supposed to stay in sync with

const
  sourceLanguageToStr*: array[SourceLanguage, string] = ["none",
    "Nim", "C++", "C#", "C", "Java", "Yaml", "Python"]

from lib/packages/docutils/highlite.nim? if so, refactor (DRY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally, yes, but mirror should be maintained anyway, because c# role would not be allowed according to RST spec:

A role name is a single word consisting of alphanumerics plus isolated internal hyphens, underscores, plus signs, colons, and periods; no whitespace or other characters are allowed.

result = newRstNode(rnInlineCode)
var args = newRstNode(rnDirArg)
var lang = language
if language == "cpp": lang = "c++"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is inconsistent:

.. code-block:: cpp

  #include <stdio.h>

.. code-block:: c++

  #include <stdio.h>

`#include <stdio.h>`:cpp:

`#include <stdio.h>`:c++:

gives
image

=> depending on block vs inline, the syntax that works is c++ vs cpp

we should have one that works in all contexts, and recommend it (including in example from eariler).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that :c++: is not parsed is another bug in our parser. Added to the list of minor bugs.

The proper fix (with code reusing) is not trivial, it's better to handle in another PR.

@@ -1052,14 +1098,23 @@ proc parsePostfix(p: var RstParser, n: PRstNode): PRstNode =
result = newRstNode(newKind, newSons)
elif match(p, p.idx, ":w:"):
# a role:
newKind = whichRole(nextTok(p).symbol)
let roleName = nextTok(p).symbol
newKind = whichRole(p, roleName)
if newKind == rnGeneralRole:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case newKind
of ...
of ...
else: ...

dispA(d.target, result, blockStart,
"\\begin{rstpre}\n" & n.anchor.idS & "\n", [])
else: # rnInlineCode
blockStart = "<tt class=\"docutils literal\"><span class=\"pre\">"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer """ ... """ to make it easier to copy paste

@@ -201,14 +206,14 @@ not in table"""
`|` outside a table cell should render as `\|`
consistently with markdown, see https://stackoverflow.com/a/66557930/1426932
]#
doAssert output1 == """
check(output1 == """
Copy link
Member

@timotheecour timotheecour Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about import std/strformat and using &"". {...}..""" to avoid breaking the string?

ditto in other examples below

IMO, resulting code is less noisy, easier to read/edit

check """`foo\`bar`""".toHtml == """<tt class="docutils literal"><span class="pre">foo`bar</span></tt>"""
check """`\`bar`""".toHtml == """<tt class="docutils literal"><span class="pre">`bar</span></tt>"""
check """`a\b\x\\ar`""".toHtml == """<tt class="docutils literal"><span class="pre">a\b\x\\ar</span></tt>"""
check("""`foo.bar`""".toHtml ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these get more and more complicated to read/write/maintain, we should really look into timotheecour#676 at some point

Copy link
Member

@timotheecour timotheecour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, remaining comments can be addressed preferably before merging but in followup PR is ok too

a-mr and others added 3 commits April 2, 2021 19:29
Co-authored-by: Timothee Cour <timothee.cour2@gmail.com>
Co-authored-by: Timothee Cour <timothee.cour2@gmail.com>
@a-mr a-mr mentioned this pull request Apr 2, 2021
28 tasks
Copy link
Member

@timotheecour timotheecour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still LGTM , and previous LGTM comment still applies: remaining comments can be addressed either in this PR or in followup PR

@Araq Araq merged commit e35946f into nim-lang:devel Apr 2, 2021
@timotheecour timotheecour added the TODO: followup needed remove tag once fixed or tracked elsewhere label Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO: followup needed remove tag once fixed or tracked elsewhere
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants