This project is a source code transpiler for commutative diagrams.
The aim is to translate to and from any format, most of which are LaTeX DSLs.
Here is the progress on the planned ones:

Target | Import | Export |
---|---|---|
amscd | ██████████ | ██████████ |
amscdx | ██████░░░░ | ███████░░░ |
CoDi | ░░░░░░░░░░ | ░░░░░░░░░░ |
quiver | ████████░░ | ███████░░░ |
tikz-cd | ░░░░░░░░░░ | ░░░░░░░░░░ |
xymatrix | ██░░░░░░░░ | ░░░░░░░░░░ |
... | | |

Private repo: https://github.com/paolobrasolin/ouroboros
A transpiler like this could be realized with many technologies. I have a few end goals:
- integrating this in quiver;
- creating a conversion service with no backend infrastructure;
- using a pleasant language, with good libraries;
- learning something new.
TypeScript therefore looks like the best choice. On top of it, two outstanding libraries that trivialize a lot of groundwork are nearley.js for grammar-based parsing and Superstruct for data validation and coercion.
Freely transpiling among many DSLs requires a transpilation procedure for each ordered source/target language pair we want to connect.
How many transpilers do we need in total?
- If we connect `n` DSLs directly, then we need two times `n(n-1)/2` (i.e. twice the number of edges of a `Kₙ` graph).
- If we connect `n` DSLs through an artificial Universal Language, then we need two times `n` (i.e. twice the number of edges of the `Sₙ` graph).
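
As a quick sanity check on these counts, here is a minimal TypeScript sketch. The function names `directTranspilers` and `hubTranspilers` are made up for illustration and are not part of the project.

```typescript
// One-way transpilers needed to connect n DSLs pairwise:
// one per ordered pair, i.e. twice the edges of the complete graph Kₙ.
function directTranspilers(n: number): number {
  return 2 * ((n * (n - 1)) / 2);
}

// One-way transpilers needed when every DSL only talks to the UL:
// one importer and one exporter per DSL, i.e. twice the edges of the star graph.
function hubTranspilers(n: number): number {
  return 2 * n;
}

// The six formats named in the progress table.
const n = 6;
console.log(directTranspilers(n)); // 30
console.log(hubTranspilers(n)); // 12
```

The two counts coincide at `n = 3`; from `n = 4` onwards the hub approach needs strictly fewer transpilers.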
Implementing a Universal Language (UL for short) is clearly the winning strategy: the number of transpilers grows linearly rather than quadratically with the number of DSLs.
Each DSL will have a dedicated folder. It will contain a few components allowing it to be transpiled to and from the UL.
- `schema` describes the AST with `superstruct` structures.
  - Optional fragments of the DSL are accounted for by using `optional`. Anything which is valid for the original processor must `validate`.
  - Implicit defaults of the DSL are accounted for by using `defaulted`. The schema must be the single source of truth about the DSL defaults: consumers of the AST must simply trust coercion (e.g. via `create`) to make the defaults explicit.
  - In nested objects, `defaulted`s must be on children `Struct`s, while `optional`s should be on the parent `Struct`s. This allows `assert`s to be a simple way (after coercion) to get rid of the `... | undefined` from the signatures of `optional` parts when processing the AST.
- `grammar` describes the DSL with a `nearley` grammar.
  - It is an optional component which might be used by the `parser`.
  - It must not be ambiguous.
- `parser` implements a `parse` function to transform source code into an AST.
  - `parse` is responsible for performing any extra decoding/deserialization needed on the input.
  - `parse` outputs a bona fide object respecting `schema`, meaning that the signatures are correct but no explicit validation (and especially no coercion) is done at this time.
  - `parse` may output an array to account for ambiguity and simplify testing, but it should only contain a single object as we ban ambiguous grammars.
- `injector` implements an `inject` function mapping the DSL AST into the UL AST.
  - `inject` must assume schema coercion has been done, so it needs no knowledge of the DSL defaults and can simply perform a few `assert`s to check for presence and circumvent the inconvenient `* | undefined` signatures.
  - `inject` must `create` its output, so it can reason only about the features being actively used. This allows targeted testing with `toMatchObject` and avoids the need for backtracking when adding new features to the UL, all while keeping the UL fully explicit.
- `projector` implements a `project` function mapping the UL AST onto the DSL AST.
  - `project` maps only features available in the target DSL.
  - TODO: a policy for approximating missing features and collecting warnings for unsupported ones must be established.
- `renderer` implements a `render` function to transform an AST to source code.
  - TODO: perhaps `render` should include a minification process to produce the minimal code leveraging implicit defaults of the DSL. Maybe avoiding coercion is enough, but I haven't made up my mind yet.
- `index` ties together all components into a simple API.
  - It implements `read = inject ∘ coerce ∘ parse`, which translates DSL source into its representation in the universal language.
  - It implements `write = render ∘ project ∘ coerce`, which translates a universal language representation into DSL source. (NOTE: coercion here can be omitted as long as we keep the UL completely explicit.)
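
To make the division of labour concrete, here is a dependency-free sketch of how the components could compose. Everything in it is hypothetical: the toy "one arrow per line" DSL, the `DslAst`/`UlAst` types and the function bodies are invented for illustration, and the hand-written `coerce` merely stands in for schema coercion (which the real components would delegate to `superstruct`).

```typescript
// Hypothetical toy DSL: one arrow per line, written "A -> B" or "A -> B : f".

// DSL AST as produced by the parser: the label is an optional fragment.
type DslArrow = { source: string; target: string; label?: string };
type DslAst = { arrows: DslArrow[] };

// UL AST: fully explicit, nothing is optional.
type UlArrow = { source: string; target: string; label: string };
type UlAst = { arrows: UlArrow[] };

function parseLine(line: string): DslArrow {
  const colon = line.indexOf(":");
  const path = colon === -1 ? line : line.slice(0, colon);
  const arrow = path.indexOf("->");
  if (arrow === -1) throw new Error(`not an arrow: ${line}`);
  const source = path.slice(0, arrow).trim();
  const target = path.slice(arrow + 2).trim();
  return colon === -1
    ? { source, target }
    : { source, target, label: line.slice(colon + 1).trim() };
}

// parser: source code -> DSL AST (no validation, no coercion).
function parse(source: string): DslAst {
  const lines = source.split("\n").filter((line) => line.trim().length > 0);
  return { arrows: lines.map(parseLine) };
}

// Stand-in for schema coercion: makes the implicit default (empty label) explicit.
function coerce(ast: DslAst): DslAst {
  return { arrows: ast.arrows.map((a) => ({ ...a, label: a.label ?? "" })) };
}

// injector: DSL AST -> UL AST; it trusts coercion and only checks for presence.
function inject(ast: DslAst): UlAst {
  return {
    arrows: ast.arrows.map(({ source, target, label }) => {
      if (label === undefined) throw new Error("expected a coerced AST");
      return { source, target, label };
    }),
  };
}

// projector: UL AST -> DSL AST, keeping only what the toy DSL supports.
function project(ul: UlAst): DslAst {
  return {
    arrows: ul.arrows.map(({ source, target, label }) => ({ source, target, label })),
  };
}

// renderer: DSL AST -> source code (an empty label is simply not written).
function render(ast: DslAst): string {
  return ast.arrows
    .map(({ source, target, label }) =>
      label ? `${source} -> ${target} : ${label}` : `${source} -> ${target}`
    )
    .join("\n");
}

// index: the two compositions; UL coercion is skipped because the UL is explicit.
const read = (source: string): UlAst => inject(coerce(parse(source)));
const write = (ul: UlAst): string => render(project(ul));

const ul = read("A -> B : f\nB -> C");
console.log(write(ul)); // prints the two input lines back
```

Note that `inject` never mentions the default label: it only refuses an AST that has not been coerced, which is the separation of concerns described in the list above.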
The UL will also have its own folder. It will contain much less than the DSL folders, since it's used only for internal representation.
- `schema` describes the AST with `superstruct` structures.
  - This is the only component of the UL and is used only for internal representation.
A few more words should be spent on the design of the UL, as two very different approaches can be followed for the usage of `optional` structures.
- Everything is optional (except topology).
  - PRO: injector output can be limited to the attributes in use.
  - CON: projector input needs to be `assert`ed to circumvent partial signatures (after the input has been coerced externally, of course).
- Everything is mandatory (and has reasonable defaults).
  - CON: injector output must be `create`d, as the injector must not know about defaults and all properties are mandatory; this also avoids breakage on UL extensions.
  - PRO: projector input has simple signatures (no `* | undefined`) and can be destructured right away while simply ignoring unsupported features of the target DSL.
It's a matter of balance, but ultimately the latter alternative has slightly better ergonomics, and a fully explicit UL schema should be simpler to reason about.
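
A minimal sketch of the "everything is mandatory" approach with `superstruct` might look as follows. The `Arrow` and `Diagram` structs, their fields and the two toy functions are assumptions made up for this example and are not the project's actual UL schema; only `object`, `array`, `string`, `boolean`, `defaulted`, `create` and `Infer` come from the library.

```typescript
import { array, boolean, create, defaulted, object, string } from "superstruct";
import type { Infer } from "superstruct";

// Hypothetical UL fragment: topology (source/target) is required,
// every other property is mandatory but carries a default.
const Arrow = object({
  source: string(),
  target: string(),
  label: defaulted(string(), ""),
  dashed: defaulted(boolean(), false),
});
const Diagram = object({
  arrows: defaulted(array(Arrow), []),
});
type Diagram = Infer<typeof Diagram>;

// Injector for an imaginary DSL that has no dashed arrows: it only mentions
// the features it uses, and `create` fills in the rest from the schema.
function inject(dsl: { from: string; to: string; text: string }[]): Diagram {
  return create(
    { arrows: dsl.map(({ from, to, text }) => ({ source: from, target: to, label: text })) },
    Diagram
  );
}

// Projector for another imaginary DSL: the input has no `| undefined`
// in its signatures, so it can be destructured right away.
function project(ul: Diagram): string[] {
  return ul.arrows.map(
    ({ source, target, label, dashed }) =>
      `${source} ${dashed ? "-->" : "->"} ${target}${label ? ` : ${label}` : ""}`
  );
}

const ul = inject([{ from: "A", to: "B", text: "f" }]);
console.log(ul.arrows[0]); // `dashed: false` was filled in by `create`
console.log(project(ul));
```

If a property such as `dashed` were added to the UL later, the injector above would keep working unchanged, which is the "avoids breakage on UL extensions" point made in the list.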