Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simpler doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions #125

Open
timotheecour opened this issue Jan 30, 2019 · 37 comments

Comments

@timotheecour
Copy link
Member

timotheecour commented Jan 30, 2019

this PR nim-lang/Nim#10492 adds a lot of great documentation how ever it also adds a lot of things like:

`getFilePermissions <#getFilePermissions,string>`_

which are required today by docgen to produce valid links.
This is not DRY for 2 reasons:

  • the name is consistently repeated twice
  • the args are mentioned which is a different kind of DRY violation (args mention in source code as well as in that doc link)

other drawbacks:

  • this makes for less readable docs when browsing source code

  • it increases risk of links becoming out of sync when proc signature changes (eg an extra (optional) param is later added, which can often happen)

  • discourages using proper doc links as it adds a barrier to "get it right" (which explains why so few procs have these links)

  • EDIT: doc links are order-dependent:

proc fn*[T: SomeFloat](a: T)=discard
proc fn*[T: SomeInteger](a: T)=discard

generates:

fn,T
fn,T_2

proposal

  • add a DRY and easy to remember syntax to refer to a symbol (see below for concrete syntax)
  • let docgen resolve it to an actual doc link exactly as it would if user typed getFilePermissions <#getFilePermissions,string>_ in full
  • the only problem is when symbol is overloaded; there are 2 approaches
    • make docgen issue an ambiguity error and require full syntax getFilePermissions <#getFilePermissions,string>_ or something a bit DRY-er getFilePermissions <string>_
    • make docgen resolve to the 1st symbol it finds; shouldn't be a big deal; overloads are generally placed next to each other and oftentimes only 1 is documented anyway;

in any case this is a practicaly tradeoff: we improve situation for the 95% case and make the 5% case well, as bad as it already is today

concrete syntax for the proposal

open to suggestions;
maybe:

see also $`getFilePermissions` which is related # docgen will resolve to same as `getFilePermissions <#getFilePermissions,string>`_
see also $`strutils.foo` which is related # docgen will resolve to same as `strutils.foo <#strutils.foo,string,int>`_ # ie, can use modules in other symbols

benefits:

  • would fix Fix broken links in docs Nim#16337
  • broken links would give CT errors when running nim doc
  • links would be easier to read/write/edit
  • each link would give rise to an entry in the index
  • links wouldn't break when you change a proc to a func or add optional params
  • there are easy ways to handle overloaded procs in the same module, I can describe what I have in mind
@timotheecour timotheecour changed the title DRY doc links: getFilePermissions <#getFilePermissions,string>_ => getFilePermissions DRY doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions Jan 30, 2019
@Araq Araq changed the title DRY doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions simpler doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions Oct 28, 2020
@timotheecour timotheecour changed the title simpler doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions simpler doc links: getFilePermissions <#getFilePermissions,string>_ => ``$getFilePermissions```` Dec 7, 2020
@timotheecour timotheecour changed the title simpler doc links: getFilePermissions <#getFilePermissions,string>_ => ``$getFilePermissions```` simpler doc links: getFilePermissions <#getFilePermissions,string>_ => "$getFilePermissions`" Dec 7, 2020
@timotheecour timotheecour changed the title simpler doc links: getFilePermissions <#getFilePermissions,string>_ => "$getFilePermissions`" simpler doc links: getFilePermissions <#getFilePermissions,string>_ => $getFilePermissions Dec 7, 2020
@ringabout
Copy link
Member

Yeah, I agree.
Original doc links way makes proc + nosideEffect => func a bit annoying.

@a-mr
Copy link

a-mr commented Feb 11, 2021

RST has the mechanism of custom roles for "interpreted text" (between ``).

So proper RST-style syntax would be something like:

:nim:`proc getFilePermissions`
:nim:proc:`getFilePermissions`

We can parse the text inside `` to generate proper link so we can get close to actual definition of the proc:

:nim:`proc getFilePermissions(string)`
:nim:proc:`getFilePermissions(string)`

Probably we can also study how it's done in sphinx

cc @konsumlamm

@timotheecour
Copy link
Member Author

timotheecour commented Mar 5, 2021

@a-mr

Probably we can also study how it's done in sphinx

we can do better than requiring users to write :py:const: or :py:class:, because nim is strongly typed and nim has access to resolved symbols when running nim doc, in particular nim doc --project can be expected to resolve all references including ones outside a module (or even nim doc without --project thanks to index files; theindex.html is proof this can all work)

:nim:proc getFilePermissions
:nim:proc getFilePermissions(string)

that's not DRY though, and still suffers from being sensitive to things like proc<=>func edits which would break links.

For nim files, all that's needed IMO is a single char to trigger link resolution. eg:

## See also: :`delete` # a reference to `delete` in same module
## See also: :`strutils.delete` # a reference to `delete` in a potentially different module

overloads

When a symbol is overloaded, IMO following is good enough:

## See also: :`httpclient.delete` # references 1st overload of `delete` in `httpclient`

docgen can, for overloads within a module, make it easy to "click to go to next overload". In the rare cases where showing 1st overload isn't good enough, we can extend the syntax, there are many ways, but this isn't needed in 1st version of this feature.

benefits

  • easy to read/write/edit such links in code
  • automatically generate warning when such symbols are not found
  • links are robust to future API additions such as adding params, adding overloads, etc

bikeshedding

I've tried to see which special char would work, see

criterion:

  • the links should use same syntax in rst and nim files
  • it should be as simple as possible, just 1 char to trigger resolving a link

conclusion:
2 options seem to render well in both nim and rst in github (note that nim rst2html and nim doc can always be customize as needed):

option1: `strutils.delete`_

option2: :`strutils.delete`
  • option 1 renders in rst github as a link that points to itself. This is fine since the main use case is nim doc and nim rst2html. Note that the linke can point to something if there's another definition somewhere, for eg:
see also: `strutils.delete`_.

.. _strutils.delete: /~https://github.com/nim-lang

but that's not the intended use (although it opens door for including an auto-generated rst file which would define all those links if we want to make those links work from github viewer).

conclusion

I prefer option 1. Any downside I haven't thought about?

links

@timotheecour
Copy link
Member Author

timotheecour commented Mar 18, 2021

after nim-lang/Nim#17372, there's now another, possibly better option:

strutils.delete_

instead of:

`strutils.delete`_

example:

##[
foo1_

foo.bar2_

works with a comma separating links too:

foo.bar3_, foo.bar4_

`foo.bar5`_

`foo.bar6`_, `foo.bar7`_
##]

example of how it renders in github rst: /~https://github.com/timotheecour/vitanim/blob/3ae7daf173b4e074774af7062476beb28418de5b/testcases/tests/t10468b.rst#id11

@timotheecour
Copy link
Member Author

timotheecour commented Jul 26, 2021

@a-mr if you want to tackle this I'm happy to help, I think your previous contributions to docgen/rst (2 pass approach, underscore link support, sorting, etc) should make this more within reach; I'd be ok with the syntax strutils.delete_ which is as simple as it gets (simply adds an underscore).

For semantics we could do this:

  • if symbol is not overloaded, the hyperlink points to the symbol definition directly
  • if symbol is overloaded, the hyperlink points to the 1st overload; thanks to docgen: sort symbols (fix #17910) Nim#18560, the other overloads will all appear in order right below. In the rare cases where we absolutely must point to a specific overload, we can devise a syntax for that later, but I don't see it as critical

@a-mr
Copy link

a-mr commented Jul 26, 2021

@timotheecour ,
I think I need to clarify 2 things first (note I know very little about the compiler). Let us assume that we are talking about `strutils.delete`_

  • is this new syntax going to "resolve" doc links to modules that are not imported? E.g. if strutils was not imported in the module being processed then docgen would know nothing about delete, right?
    • if no then we should just mechanically fall back to some default HTML link that will be generated e.g. for every proc?
  • what is its interaction with modules system/paths? E.g. where is strutils searched?
    • considering that there can be module with same name with different paths: should not we add possibility to create the same directory hierarchy as main code? I'm a bit bothered by the fact that currently koch docs dumps *.html to web/upload/1.5.1 (except the separate compiler directory). Also should not our link look something like `std/strutils.delete`_ then?

@timotheecour
Copy link
Member Author

timotheecour commented Jul 26, 2021

Let us assume that we are talking about `strutils.delete`_

eg:

# good:
## See also strutils.delete_
# bad:
## See also `strutils.delete`_

is this new syntax going to "resolve" doc links to modules that are not imported? E.g. if strutils was not imported in the module being processed then docgen would know nothing about delete, right?

  • yes, this shall resolve doc links that are not imported in current module (and in fact, this point should be irrelevant), so that in system.nim you can write:
proc compileOption(option, arg: string): bool {.magic: "CompileOptionArg".} =
  ## See also compilesettings.querySetting_

then docgen will transform this into a link (that may not exist yet, depending on how you run nim doc vs nim doc --project):

  ## See also <a class="reference external" href="compilesettings.html#_querySetting">std/compilesettings.querySetting</a>

if no then we should just mechanically fall back to some default HTML link that will be generated e.g. for every proc?

  • This indeed requires adding an additional "short-name anchor" for each symbol foo, eg:
<a id="foo,string,string"></a> # classical anchor
<a id="_foo"></a> # short-name anchor

(it's easy to generate 2 anchors per symbol)
I'm using a prefix _ to disambiguate short-name anchors from other ones; it's an implementation detail as doc comments in user code shall use an intermediate syntax which docgen translates to those.

what is its interaction with modules system/paths? E.g. where is strutils searched?

this is yet another benefit of this RFC: it lets docgen handle resolving of external module links, so that you can write compilesettings.querySetting_ and then docgen transforms it to a correct href regardless of how you run nim doc (so long it's consistent obviously for all modules that are docgen'd).

  • for nim doc --project foo.nim (the newer way to build docs, which is used for building compiler docs, fusion etc), the layout follows the filesystem (and .. is transformed into _._ to avoid going out of outdir, refs nim doc --project: @@ caused issues; use _._ instead Nim#14454)
  • for nim doc foo.nim (which is used by stdlib, currently), layout is flattened

in either way though, docgen should know where to find a module relatively to another module (by inspecting whether user passed in --project, ie optWholeProject flag to see if it's not flattened, and also checking --docRoot)

What's needed is this API:

type DocMode = enum
  dmFlat # for `nim doc`, eg used by stdlib
  dmFilesystem # for `nim doc --project`, eg used by compiler docs, fusion etc

proc genRelativeLink(docMode: DocMode, toModule: string, fromModule: string): string = ...
  runnableExamples:
    assert genRelativeLink(dmFlat, "std/system", "std/compilesettings") == "system.html"
    assert genRelativeLink(dmFilesystem, "compiler/plugins/locals", "compiler/pathutils") == "locals.html"
    assert genRelativeLink(dmFilesystem, "compiler/pathutils", "compiler/plugins/locals") == "_._/pathutils.html"

note 1

for external modules, this should also be possible, but would need some way to register location of external docs in some place; this can be discussed in another RFC:

# we can use a clean syntax for external links:
## See also pkg/regex.findAllBounds_

note 2

it's simplest to do everything in terms of canonical paths (introduced in nim-lang/Nim#16999), ie:

  • system/assertions, fusion/matching, std/tables

happy to discuss more

@Araq
Copy link
Member

Araq commented Jul 27, 2021

Please ensure backwards compatibility, links are shared and kept around for good.

@timotheecour
Copy link
Member Author

we will emit a warning that there is an ambiguity

at what time do you issue a warning? it's about whether the context is known to nim doc or not.

eg, if you call nim doc system.nim which has a comment see also strutils.split_ ; how can nim doc know that split_ could resolve to both iterator or proc, given that it's in another module it hasn't seen yet?

if you issue a warning when compiling nim doc strutils.nim, that means you'd systematically get a warning whenever an iterator overloads a proc, regardless whether you use the link split_

the warning can make sense if all the context is known at the time of nimdoc, eg if see also split_ is used within strutils (pointing to same module), or, maybe in other cases with nim doc --project

in that case we will also select the "winning" overload according to fixed priority "procs > iterators > templates" because people normally imply procs

that's fine, and a rare case anyways (iterator overloading a proc), and if symbols are listed alphabetically regardless of kind, it's really not a big problem

split(string, char, int)_ anyway

for the explicit overload syntax, we could require the symbol kind, eg:

iterator split(string, char, int)_

@a-mr
Copy link

a-mr commented Jul 30, 2021

at what time do you issue a warning?

I've thought that through only for local resolution — I will issue a warning in rst 2nd pass (finishGenerateDoc). Regarding different-module symbol I'm not yet sure but I guess it does not matter, I will need to retrieve all the symbols of that module anyway so it can be done at any stage (including finishGenerateDoc)

@timotheecour
Copy link
Member Author

timotheecour commented Jul 30, 2021

oh actually there's a really simple solution to this problem: we can rely on index files (or reuse/augment the one that's generated via nim doc which IIRC implies --index; preferably json format) to list all the references there as well as the anchors generated; then nim buildIndex (called after all the nim doc foo commands) can issue warnings for each problem found:

# foo.nim:
## see bar.baz_
## see bar.baz2_
## see bar.baz2(int, float)_

# bar.nim:
iterator baz(): int = discard
proc baz(): int = discard
nim doc foo # generates htmldocs/foo.idx
nim doc bar # generates htmldocs/bar.idx
nim buildIndex # generates htmldocs/theindex.html from all the idx files

nim buildIndex will now report the following warnings:

  • bar.baz_ should be disambiguated as either iterator bar.baz_ or proc bar.baz_
  • bar.baz2_ link reference invalid, did you mean xxx ?
  • bar.baz2(int, float)_ link reference invalid, did you mean xxx ?

note 1

nim buildIndex is an existing command that's already called to generate theindex.html; this simply enhances what it does to also report link inconsistencies

note 2

it will work the same regardless of separate compilation (nim doc foo followed by nim doc bar) or joint (via nim doc --project all), and regardless of same module references vs cross-modules references; no need to distinguish those cases (simpler!)

note 3

this will allow fixing nim-lang/Nim#16337 by reporting all broken links automatically

@a-mr
Copy link

a-mr commented Jul 30, 2021

sounds like a great idea. But I cannot figure out how to use this command:

$ bin/nim buildIndex lib/pure 
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/config.nims' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/config.nims' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nimdoc.cfg' [Conf]
fatal.nim(53)            sysFatal
Error: unhandled exception: options.nim(651, 3) `not conf.outFile.isEmpty`  [AssertionDefect]

docgen.rst says it accepts a directory.

I know alternative: for the compiler we can run bin/nim doc --project --index:on compiler/main.nim but what about stdlib...

@timotheecour
Copy link
Member Author

for separate nim doc invocations, see how it's done in kochdocs; this works (but needs better documentation, PR welcome):

nim doc --outdir:/tmp/d19 --index lib/pure/strutils
nim doc --outdir:/tmp/d19 --index lib/pure/os
nim buildIndex -o:/tmp/d19/theindex.html /tmp/d19

@a-mr
Copy link

a-mr commented Aug 2, 2021

During implementation/writing a test I found a problem with interpretation of spaces and case sensitivity.
I'm implementing it by adding additional anchors from docgen.nim.

According to RST spec, links are case-insensitive and any number of whitespace is equivalent to one space character. Current rst.nim takes that into account.

This opens a question how to deal with links like SortOrder_ and `fill(openArray[T], T)`_. Changing openArray -> openarray is harmless but it's not so for SortOrder -> sortorder and T -> t. Also what to do with a space after comma (openArray, T)?

My current plan is to normalize both references from the rst.nim side and generated anchors from docgen.nim side in this way:

  1. any spaces before and after punctuation — ,, (, ), [, ] — are deleted.
  2. tokens inside this references/anchors are normalized the Nim way — first char is preserved — in these cases:
    • if they are at the very 1st char of a reference/anchor
    • if they are after keywords like proc and const
    • if they are after the mentioned punctuation signs

So the examples above become Sortorder and fill(openarray[T],T).
Both points are not exactly RST spec-compliant but what else can be done?

@a-mr
Copy link

a-mr commented Aug 2, 2021

It seems that having a separate set for docgen-generated anchors in rst.nim is the way to go. Then we can match references with anchors by dedicated rules independently.

@timotheecour
Copy link
Member Author

timotheecour commented Aug 2, 2021

can you clarfiy?

will the following work?

type SortOrder* = enum k0, k1
proc sortOrder*() = discard
proc foo*[T](a: int, sortOrder: SortOrder) = discard
proc foo*[T](a: int, sortOrder: SortOrder, c: int) = discard
proc foo*() =
  ## See: SortOrder_ (will link to type)
  ## See: sortOrder_ (will link to proc)
  ## See: foo_ (will link to 1st foo overload)
  ## See: `foo[T](int,SortOrder,int)` (will link to 2nd overload)

can we preserve case as describe above, even at expense of RST compliance (makes it easier to copy paste from code)? what would break?

if links generated cannot distinguish between SortOrder_ and sortOrder_, we can think about an internal encoding for caps, which would be hidden from users (ie references in docs would not need the encoding, it'd be generated automatically)

sortOrder_ => sortorder_
SortOrder_ => ^sortorder_

but i don't think it's needed, is it?

if you have a WIP PR i can test against, this would clarify things

@a-mr
Copy link

a-mr commented Aug 2, 2021

yes, it will work more or less. sortOrder and SortOrder will resolve OK. The other 2 cases with work the following nuances:

  • foo_ is actually referring not to first overload but to whole group of procs foo (it will add a wrapper group <div id="foo-procs-all">, likewise for all same-name overloads)
  • last case should be spelled as `foo(int,SortOrded,int)`_

If you meant using T inside input parameters like

proc foo*[T](a: int, sortOrder: SortOrder, c: T) = discard

then the link should be spelled as `foo(int,SortOrder,T)`_ without any [T]. It's because that's how anchors are preserved, you can check that the generated anchor id = foo,int,SortOrder,T. So T in parameter is just inserted literally. I guess this cannot be changed without breaking reverse compatibility or introducing extra level of wrappers.

Regarding copy-pasting: I don't expect supporting parameters names since by the same reason they are redundant, wdyt?

I'm not sure what to do with [T], I did not expect to add any semantic checking in incorrect cases like `proc foo*[U](T)`_. So everything I can do for this support is to delete [T] after after foo* in my internal anchor.

I'll try to send the PR tomorrow.

@timotheecour
Copy link
Member Author

can foo,int,SortOrder,T introduce ambiguities?
eg:

proc foo[T](a: int, b: SortOrder, c: T)
proc foo[T](a: int, b: SortOrder[T])

future work:

  • osproc.nim#L1230 has:
  ## * `posix_utils.sendSignal(pid: Pid, signal: int) <posix_utils.html#sendSignal,Pid,int>`_

which is not DRY; I wonder if we can support writing:

  ## * posix_utils.sendSignal_

and then show in the docs a correct link along with this link name:

proc posix_utils.sendSignal(pid: Pid, signal: int)

this probably require a 2-pass algorithm, hence "future work"

@a-mr
Copy link

a-mr commented Aug 8, 2021

can foo,int,SortOrder,T introduce ambiguities?
e.g.

No, it cannot. Only the first example has this signature, for second one it will be foo,int,SortOrder[T].

which is not DRY; I wonder if we can support writing:

Yes the current plan (for the second part of work) is to support both variants:

  • posix_utils.sendSignal_
  • `posix_utils.sendSignal(Pid,int)`_

and then show in the docs a correct link along with this link name:

A good news is that current idx files already preserve them so it shouldn't be hard to implement.

I think you are right, it's the most logical behaviour.

@a-mr
Copy link

a-mr commented Aug 11, 2021

@timotheecour
Current plan is to set a full link name like you wrote above:

proc sendSignal(pid: Pid, signal: int)

What if we want to provide our own name for the link like:

this Posix function

?
Is there any syntax expected for that?

@timotheecour
Copy link
Member Author

Is there any syntax expected for that?

ya this should be discussed (but the preferred/encouraged way should be the simplest See posix_utils.sendSignal_)

some options:

  # this should work but not recommended (brittle links etc)
  ## See `this <posix_utils.html#sendSignal,Pid,int>`_

  # maybe this? (general group + individual overload)
  ## See `this <posix_utils.sendSignal_>`_ posix function.
  ## See `this <posix_utils.sendSignal(Pid,int)_>`_ posix function.

  # or this? (general group + individual overload)
  ## See [this](posix_utils.sendSignal_) posix function.
  ## See [this](posix_utils.sendSignal(Pid,int)_) posix function.

@a-mr
Copy link

a-mr commented Aug 12, 2021

@timotheecour
2 more suggestions:

  1. What about this DRY but limited form (without arbitrary names) with using " to delimit a part for inclusion into link text:

    `"proc posix_utils.sendSignal" (Pid, int)`_ -> proc posix_utils.sendSignal
    `proc posix_utils."sendSignal" (Pid, int)`_ -> sendSignal
    `"proc " posix_utils."sendSignal" (Pid, int)`_ -> proc sendSignal
    ?

  2. What if we specify proc after the name as a request to use this name literally:
    `posix_utils.sendSignal proc`_ -> posix_utils.sendSignal proc
    Format "short name + proc" is widespread in os.nim and other files in Nim repository, so it makes sense to add special syntax for it. And we can still show a tip with full name when hovering over the link.

@timotheecour
Copy link
Member Author

What about this DRY but limited form (without arbitrary names) with using " to delimit a part for inclusion into link text:

i don't really like these; if i decide to use a custom link name, I shouldn't be limited to using the symbol name (otherwise might as well stick to the default See posix_utils.sendSignal_)

What if we specify proc after the name as a request to use this name literally:

is the problem you're trying to solve to allow a syntax to show the short-form (without the full declaration), or to allow showing a custom link name?

I think we should allow writing arbitrary custom link name, which neither 1 nor 2 do (but by default encourage the simplest See posix_utils.sendSignal_ which auto-generates link name from declaration, showing nb of overloads if there are overloads, else showing the resolved declaration if there is a single symbol)

Format "short name + proc" is widespread in os.nim and other files in Nim repository

But See posix_utils.sendSignal_ could be used in each such instance, advantageously.

So all that's needed is to agree on a syntax to allow arbitrary custom link names; i've suggested one in #125 (comment) but feel free to suggest alternatives

@a-mr
Copy link

a-mr commented Aug 12, 2021

is the problem you're trying to solve to allow a syntax to show the short-form

yes, in most cases the full form is overwhelming IMHO. Consider this modified example from os.nim (splitPath):

  ## See also: BEFORE
  ## * `joinPath(head, tail) proc <#joinPath,string,string>`_
  ## * `joinPath(varargs) proc <#joinPath,varargs[string]>`_
  ## * `/ proc <#/,string,string>`_
  ## * `/../ proc <#/../,string,string>`_
  ## * `relativePath proc <#relativePath,string,string>`_
  ##
  ## See also: AFTER
  ## * `joinPath(string, string)`_
  ## * `joinPath(varargs[string])`_
  ## * `/`_
  ## * `/../`_
  ## * relativePath_

image

To my taste the AFTER case is unnecessarily detailed.

@timotheecour
Copy link
Member Author

timotheecour commented Aug 12, 2021

3 possibilities:

  • how about controlling in the UI whether the long or short form is shown (just like we have a switch in UI for dark mode vs light mode)
  • or, use a similar technique as we use for expanding the pragmas by clicking on it
  • or just simply always show the short-form for links

the main point is to allow keeping the nice, short form for all links and not have to introduce a decision making each and every time we write a link to a proc in docs

@a-mr
Copy link

a-mr commented Aug 28, 2021

In the updated edition of PR link text is what is actually displayed:

image

Referencing by just function names is also working now:

* `joinPath(head, tail)`_

image

Full signature is displayed as a tooltip on hovering over the link.

@a-mr
Copy link

a-mr commented Nov 6, 2021

There is one more design decision regarding cross-module doc links: whether .idx should be read implicitly or not.

If we input `mod1.proc1`_ link then the mod1.idx module should be loaded.

I'd prefer to introduce explicit syntax for that:

.. import:: mod1.idx

This is a new dedicated RST directive import here. It is supposed to be placed at the 1st doc comment in .nim file or somewhere in the beginning of .rst file. Then all symbols exported from mod1.nim (and recorded in mod1.idx) will be available for referencing in the current .nim or .rst file.

@a-mr
Copy link

a-mr commented Sep 17, 2022

Updated plan

  1. No separate stage for link checking (like nim buildIndex proposed above) is required.
  2. No explicit import for .idx files.

Instead:

  1. doc processing will be done in 2 runs — a) with .idx generation b) with link resolution.
  2. we'll have a rule that specially formatted link string like module: linkTarget (note the colon :) will load a <module>.idx file implicitly. This will work for .md/.rst files also, so e.g. it will be possible to reference section headings (and other anchors) in documentation between .md files in the simple format.

Using a recently introduced Markdown syntax [link string] (instead of RST `link string`_) the referencing will look like:

Ref. [manual: Lexical Analysis]
  -- reference section "Lexical Analysis" of manual.md from any `.nim/.md` file
Ref. [posix_utils: sendSignal(Pid, int)] 
  -- reference the procedure of module posix_utils.nim from any file

In this example manual.idx and strutils.idx will be sought for in output dir (default htmldocs or one configured manually in cmdline).
During the first run, when .idx files are not yet generated, nim doc/md2html will emit warnings. To suppress them a new option --noCheckLinks will be introduced. So 2 runs will typically look like:

  • nim doc --noCheckLinks --index:on <file> for all files in the project
  • nim doc <file> for all files in the project

In practice double running should not be too bothersome because all .idx can be generated once (e.g. in Nim repository — by ./koch docs ) and then one can write docs doing referencing with link checking using existing .idx files.

@Araq
Copy link
Member

Araq commented Sep 21, 2022

Think about the consequences regarding algorithmic complexity. It used to be the case that index generation was done without .idx files entirely but it caused an O(n^2) algorithm or even some exponential explosion.

a-mr added a commit to a-mr/Nim that referenced this issue Dec 1, 2022
Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one
a-mr added a commit to a-mr/Nim that referenced this issue Dec 16, 2022
Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one
Varriount pushed a commit to nim-lang/Nim that referenced this issue Jan 4, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: #18642 (for internal links)
and #20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
survivorm pushed a commit to survivorm/Nim that referenced this issue Feb 28, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
capocasa pushed a commit to capocasa/Nim that referenced this issue Mar 31, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
bung87 pushed a commit to bung87/Nim that referenced this issue Jul 29, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants