Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docgen: implement cross-document links #20990

Merged
merged 6 commits into from
Jan 4, 2023

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Dec 1, 2022

This proposal fully implements nim-lang/RFCs#125 Follow-up of: #18642 (for internal links) and #20127.

Overview

Explicit import-like directive is required, called .. importdoc::. (the syntax is RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they were in the local file (but they will be prefixed with a module name or markup document in link text).
It's possible to reference anything from anywhere (any direction in .nim/.md/.rst files).

See doc/docgen.md for full description.

Working is based on .idx files, hence one needs to generate all .idx beforehand. A dedicated option --index:only is introduced (and a separate stage for --index:only is added to kochdocs.nim).

Performance note

Full run for ./koch docs now takes 185% of the time before this PR. (After: 315 s, before: 170 s on my PC).
All the time seems to be spent on --index:only run, which takes almost as much (85%) of normal doc run -- it seems that most time is spent on file parsing, turning off HTML generation phase has not helped much.
(One could avoid it by specifying list of files that can be referenced and pre-processing only them. But it can become error-prone and I assume that these linke will be everywhere in the repository anyway, especially considering nim-lang/RFCs#478. So every .nim/.md file is processed for .idx first).

But that's all without significant part of repository converted to cross-module auto links. To estimate impact I checked the time for docing a few files (after all indexes have been generated), and everywhere difference was negligible.
E.g. for lib/std/private/osfiles.nim that importdocs large os.idx and hence should have been a case with relatively large performance impact, but:

  • After: 0.59 s.
  • Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing

  1. added extlinks test to nimdoc/
  2. checked that theindex.html is still correct
  3. fixed broken auto-links for modules that were derived from os.nim
    by adding appropriate importdoc

Implementation note

Parsing and formating of .idx entries is moved into a dedicated rstidx.nim module from rstgen.nim.

.idx file format changed:

  • fields are not escaped in most cases because we need original strings for referencing, not HTML ones (the exception is linkTitle for titles and headings). Escaping happens later -- on the stage of rstgen buildIndex, etc.
  • all lines have fixed number of columns 6
  • added discriminator tag as a first column, it always allows distinguish Nim/markup entries, titles/headings, etc. rstgen does not rely any more (in most cases) on ad-hoc logic to determine what type each entry is.
  • there is now always a title entry added at the first line.
  • add a line number as 6th column
  • linkTitle (4th) column has a different format: before it was like module: funcName(), now it's proc funcName(). (This format is also propagated to theindex.html and search results, I kept it that way since I like it more though it's discussible.) This column is what used for Nim symbols resolution.
  • also changed details on column format for headings and titles: "keyword" is original, "linkTitle" is HTML one

compiler/docgen.nim Outdated Show resolved Hide resolved
a-mr and others added 4 commits December 17, 2022 00:34
Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one
Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
@a-mr a-mr force-pushed the cross-document-concise-links branch from aeaa544 to 6c68c65 Compare December 16, 2022 21:36
@Varriount
Copy link
Contributor

@a-mr Any further work required for this?

@a-mr
Copy link
Contributor Author

a-mr commented Dec 23, 2022

@Varriount
No, that was the last (mostly cosmetic) change I wanted to add.

@Varriount Varriount merged commit 2620da9 into nim-lang:devel Jan 4, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2023

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 2620da9

Hint: mm: orc; opt: speed; options: -d:release
166134 lines; 8.488s; 611.559MiB peakmem

survivorm pushed a commit to survivorm/Nim that referenced this pull request Feb 28, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
capocasa pushed a commit to capocasa/Nim that referenced this pull request Mar 31, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
a-mr added a commit to a-mr/Nim that referenced this pull request Apr 2, 2023
This fixes a regression introduced in nim-lang#20990 . When a group referencing
is used and one of the overloaded symbols is in `include`d file, then
`nim doc` crashes. The fix is in distinguishing (the index of) module
and file where the symbol is defined, and using only module as the
key in hash table for group referencing.
Araq pushed a commit that referenced this pull request Apr 2, 2023
This fixes a regression introduced in #20990 . When a group referencing
is used and one of the overloaded symbols is in `include`d file, then
`nim doc` crashes. The fix is in distinguishing (the index of) module
and file where the symbol is defined, and using only module as the
key in hash table for group referencing.
capocasa pushed a commit to capocasa/Nim that referenced this pull request May 15, 2023
This fixes a regression introduced in nim-lang#20990 . When a group referencing
is used and one of the overloaded symbols is in `include`d file, then
`nim doc` crashes. The fix is in distinguishing (the index of) module
and file where the symbol is defined, and using only module as the
key in hash table for group referencing.
capocasa pushed a commit to capocasa/Nim that referenced this pull request May 16, 2023
This fixes a regression introduced in nim-lang#20990 . When a group referencing
is used and one of the overloaded symbols is in `include`d file, then
`nim doc` crashes. The fix is in distinguishing (the index of) module
and file where the symbol is defined, and using only module as the
key in hash table for group referencing.
bung87 pushed a commit to bung87/Nim that referenced this pull request Jul 29, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
bung87 pushed a commit to bung87/Nim that referenced this pull request Jul 29, 2023
This fixes a regression introduced in nim-lang#20990 . When a group referencing
is used and one of the overloaded symbols is in `include`d file, then
`nim doc` crashes. The fix is in distinguishing (the index of) module
and file where the symbol is defined, and using only module as the
key in hash table for group referencing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants