Make all values lists of strings, instead of strings #2458

Open
casey opened this issue Nov 8, 2024 · 6 comments

Comments

@casey
Owner

casey commented Nov 8, 2024

Just values are all strings. This is convenient, because it means that Just is effectively statically typed. It is impossible to provoke a type error.

However, it means that many desirable things are impossible:

  • Can't have lists.
  • No way to represent a *arg. Currently arg is a space-separated string, but users may want to do something with each entry in the list, and re-splitting is error-prone.
  • Representation of booleans is tricky. Some functions return true/false, but these are also valid strings.

We could add a type system, see #528. But another interesting option would be to change our single type from strings to lists of strings. This is done in rc, Plan 9's shell.

This seems crazy! Everything is a list? However, consider this: The shell, and Just, benefit from enormous simplicity because everything is a string. No type errors! No type system! We could abandon this, and add a type system, but another path is to make the one type we do have more powerful.

All current values, instead of being strings, would be lists containing single strings. When used in interpolations, or as arguments to / and +, a list of a single string would behave in exactly the same way as a string does now, so this would be a backwards compatible change.

This would allow the following nice things:

  • The only false value would be the empty list, and any non-empty list is true. [''] is true, so it would be possible to provide the empty string as a value, and still allow a fallback with value || fallback.
  • make has a ton of functions which treat a string as a space-separated list. These are useful, but error prone, so I haven't wanted to add these. We could add useful, non-error-prone versions of these.
  • + could append an extension to every element of a list: ["hello", "goodbye"] + '.c' -> ["hello.c", "goodbye.c"]
  • Ditto for /: ["/", "/usr"] / "bin" -> ["/bin", "/usr/bin"]
  • false could be represented by the empty list, and all other values would be interpreted as true. Thus [''] would be true, allowing the user to provide the empty string for a value.
  • for which evaluates to a list: for a in var { a + ".c" }
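The bullets above can be made concrete with a small model. This is only a sketch: `Val`, `single`, `from_strs`, `append`, `join_path`, `or`, and `map` are hypothetical names, not part of Just's code base. `join_path` trims a trailing slash from the left-hand side so that the `["/", "/usr"] / "bin"` example comes out as written above; that detail is an assumption of the sketch.

```rust
/// Hypothetical single value type: every value is a list of strings.
#[derive(Debug, Clone, PartialEq)]
pub struct Val(pub Vec<String>);

impl Val {
    /// A plain string becomes a list holding exactly one string.
    pub fn single(s: &str) -> Val {
        Val(vec![s.to_string()])
    }

    /// Build a list value from several strings.
    pub fn from_strs(items: &[&str]) -> Val {
        Val(items.iter().map(|s| s.to_string()).collect())
    }

    /// Only the empty list is false, so `['']` is true.
    pub fn is_truthy(&self) -> bool {
        !self.0.is_empty()
    }

    /// `value || fallback`: keep `value` unless it is the empty list.
    pub fn or(self, fallback: Val) -> Val {
        if self.is_truthy() {
            self
        } else {
            fallback
        }
    }

    /// `list + suffix`: append the suffix to every element.
    pub fn append(&self, suffix: &str) -> Val {
        Val(self.0.iter().map(|s| format!("{}{}", s, suffix)).collect())
    }

    /// `list / component`: join the component onto every element.
    /// Trimming the trailing slash is an assumption made for this sketch.
    pub fn join_path(&self, component: &str) -> Val {
        Val(self
            .0
            .iter()
            .map(|s| format!("{}/{}", s.trim_end_matches('/'), component))
            .collect())
    }

    /// `for a in var { ... }`: evaluate a body once per element.
    pub fn map<F>(&self, body: F) -> Val
    where
        F: Fn(&str) -> String,
    {
        Val(self.0.iter().map(|s| body(s.as_str())).collect())
    }
}
```

With a one-element list, `append` and `join_path` produce a one-element list, which is the backwards-compatibility property described above.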

The main downside, I think, is that it is very unfamiliar and counter-intuitive. Users are used to shell-like languages, including make and just, where everything is a string. "Everything is a list of strings" is very weird. rc implements and makes a compelling case for lists of strings being the single type, but it's unfamiliar and mostly forgotten.

Open questions:

  • Should we call them lists or arrays?
  • Should we go even further and make the only type associative arrays? Associative arrays are more powerful, but emulating lists with associative arrays is problematic, and lists are a popular type. Plus we're really going down the dynamic language path at that point, since property access can fail.
  • Is this totally crazy and we should not do this?
  • What other features would be easy to add if we did this?
@neunenak
Contributor

neunenak commented Nov 8, 2024

> Open questions:
>
> * Should we call them lists or arrays?

I personally prefer the terminology list, but it doesn't really matter.

> * Should we go even further and make the only type associative arrays? Associative arrays are more powerful, but emulating lists with associative arrays is problematic, and lists are a popular type. Plus we're really going down the dynamic language path at that point, since property access can fail.

Associative arrays are useful, but they can be emulated with a list of 2-lists representing keys and values. On the other hand, I don't think that treating a list as an implicit associative array with natural number keys, Javascript-style, is necessarily a problem. If there's any mechanism for indexing into a list with a natural number, then that has the possibility of failing even if there's no notion of an associative array.
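As a rough illustration of both points, the sketch below emulates an associative array with a list of 2-lists and shows that plain positional indexing can fail in the same way a key lookup can. The function names `lookup` and `index` are hypothetical and chosen only for this example.

```rust
/// Look up `key` in a list of `[key, value]` pairs.
/// Returns `None` when the key is absent, so access can fail.
pub fn lookup<'a>(pairs: &'a [[String; 2]], key: &str) -> Option<&'a str> {
    pairs
        .iter()
        .find(|pair| pair[0] == key)
        .map(|pair| pair[1].as_str())
}

/// Index into a plain list with a natural number.
/// This can fail too, even with no notion of an associative array.
pub fn index(list: &[String], i: usize) -> Option<&str> {
    list.get(i).map(|s| s.as_str())
}
```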

@gyreas
Contributor

gyreas commented Nov 14, 2024

hmmm...

I still think a clear separation between list and string is the way to go: essentially introducing an additional type, list, which maintains the static typing of Just. I think it's more intuitive than ditching singular strings.

@casey
Owner Author

casey commented Nov 14, 2024

> I still think a clear separation between list and string is the way to go: essentially introducing an additional type, list, which maintains the static typing of Just. I think it's more intuitive than ditching singular strings.

Adding two types, lists and strings, would mean that just would either become dynamically typed, and have type errors when you tried to use a list where a string was necessary, or vice versa; or we would have to add a static type system, which would be a huge undertaking.
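To illustrate the dynamically typed branch of that trade-off, here is a minimal sketch of a two-type value model in which an operator has to check its operands at evaluation time. `Value`, `TypeError`, and `concat` are hypothetical names for this example and do not describe Just's implementation.

```rust
/// Hypothetical two-type value model: a string or a list of strings.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    List(Vec<String>),
}

/// Error raised when a value of the wrong type reaches an operator.
#[derive(Debug, PartialEq)]
pub struct TypeError {
    pub expected: &'static str,
    pub found: &'static str,
}

impl Value {
    /// Succeeds only for strings; a list here is a runtime type error.
    pub fn as_str(&self) -> Result<&str, TypeError> {
        match self {
            Value::Str(s) => Ok(s.as_str()),
            Value::List(_) => Err(TypeError {
                expected: "string",
                found: "list",
            }),
        }
    }
}

/// `lhs + rhs` where both operands must be strings.
pub fn concat(lhs: &Value, rhs: &Value) -> Result<Value, TypeError> {
    Ok(Value::Str(format!("{}{}", lhs.as_str()?, rhs.as_str()?)))
}
```

With a single list-of-strings type there is no second variant to mismatch, which is why the type error cannot arise there.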

@gyreas
Contributor

gyreas commented Nov 14, 2024

In a way, though, you'd still be creating two types, since a list is a composite type; it needs elements to make sense (or not). As for dynamic typing, like I mention below, singular strings will convert to singular lists. This conversion can happen somewhere between parsing and evaluation. Besides, isn't Just an interpreter of commands? Dynamic typing shouldn't come as a shock to it.

Re-reading Drew's response to another hacker's experiment on whitespace in shell here, he mentions that the key ingredient is adding lists of strings (alongside strings).

I think for Just, we should treat strings like a list with one element. Citing your example: ["/", "/usr"] / "bin", you already instinctively feel that "bin" should be used like that rather than ["/", "/usr"] / ["bin"].

PS: I haven't looked at Just source in a long while, maybe out of touch with this reality. Lemme know

@latk

latk commented Jan 11, 2025

I've been taking a shot at implementing the infrastructure for this idea (in the context of #1988), and changing the data model seems to be quite doable. However, I've been hitting a dead end regarding the exact semantics.

The problem is that this issue makes a key assumption that "All current values, instead of being strings, would be lists containing single strings." However, this is incompatible with the idea in #1988 to use such lists for lossless handling of variadic recipe arguments, which are effectively already a kind of list in the current Just implementation.

Consider this example recipe:

set unstable   # for the && || control flow ops
set positional-arguments

@example *args:
  echo bool={{ args && "true" || "false" }}
  echo path={{ args / "foo" }}
  echo narg=$#

The current behavior is shown below: the args are immediately joined into a single string. Only the number of positional arguments makes some cases distinguishable, because the shell receives the arguments before they are collected into Just variables.

$ just example
bool=false
path=/foo
narg=0
$ just example ''
bool=false
path=/foo
narg=1
$ just example a b
bool=true
path=a b/foo
narg=2
$ just example 'a b'
bool=true
path=a b/foo
narg=1

If the *args were changed into a list that preserves the individual arguments, and if the interesting ideas in this issue were implemented, then (a) the truthiness of the `just example ''` case would change from false to true, and (b) the path concatenation in the `just example a b` case would change to a/foo b/foo.
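The difference can be modelled in a few lines. This is a sketch of the two semantics as described in this comment, not code from Just; `current` and `list_aware` are hypothetical names, and each returns the `bool` and `path` results of the example recipe.

```rust
/// Behaviour shown in the transcript above: the variadic arguments are
/// joined with spaces first, then treated as one string.
pub fn current(args: &[&str]) -> (bool, String) {
    let joined = args.join(" ");
    let truthy = !joined.is_empty();
    (truthy, format!("{}/foo", joined))
}

/// Behaviour if `*args` stayed a list, the empty list were the only
/// false value, and `/` applied to every element.
pub fn list_aware(args: &[&str]) -> (bool, Vec<String>) {
    let truthy = !args.is_empty();
    let paths = args.iter().map(|a| format!("{}/foo", a)).collect();
    (truthy, paths)
}
```

For the argument list `['']`, `current` reports false while `list_aware` reports true; for `['a', 'b']`, the path changes from `a b/foo` to the two entries `a/foo` and `b/foo`.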

I think this means that once we have list values, we have a choice:

  • Either turn variadic recipe arguments into lists, which are joined into a single string whenever they encounter a single-value operator like / or +.
  • Or support operators that automatically distribute over lists (['a', 'b'] + ['.h', '.c'] == ['a.h', 'b.h', 'a.c', 'b.c']), but this means variadic recipe arguments cannot be lists.

(The third choice would be to introduce a breaking change, but I wouldn't want that.)

I'm biased because I really want #1988 lossless recipe arg forwarding, and I'm not sure if it's ever possible to get that if we don't use lists for that feature. But that will mean that all existing operators will have to treat multi-item lists and single-strings equivalently as per their joined string representation. For example ['a', 'b'] == ['a b'] == 'a b' for all purposes. Instead, we'd need a new mechanism for per-list-item application, e.g. something like Raku's hyperoperators or some list generator syntax such as for expressions.
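The two operator designs can be put side by side. The sketch below is an illustration under the assumptions stated in this comment; `concat_joined` and `concat_distributed` are hypothetical names. The distributed version iterates the right-hand side in the outer loop so that its output order matches the `['a.h', 'b.h', 'a.c', 'b.c']` example above.

```rust
/// Option 1: `+` sees each operand only through its joined string, so
/// `['a', 'b']`, `['a b']` and `'a b'` are indistinguishable.
pub fn concat_joined(lhs: &[&str], rhs: &[&str]) -> Vec<String> {
    vec![format!("{}{}", lhs.join(" "), rhs.join(" "))]
}

/// Option 2: `+` distributes over both lists, producing every
/// combination of a left element followed by a right element.
pub fn concat_distributed(lhs: &[&str], rhs: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(lhs.len() * rhs.len());
    for r in rhs {
        for l in lhs {
            out.push(format!("{}{}", l, r));
        }
    }
    out
}
```

Under option 1 a separate per-item mechanism is still needed, which is where hyperoperators or `for` expressions would come in.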

Next steps:

  • I'm continuing my experiments for #1988 ("Question: How to forward *args?") with a list-based Val type under the assumption that all existing operators shall treat ['a', 'b'] == 'a b'.
  • I recommend against stabilizing the && and || operators until we have clearer insight into their interaction with lists – it's still possible to have [''] be truthy, this would only break unstable justfiles that depend on the truthiness of variadic recipe args.

@casey
Owner Author

casey commented Jan 11, 2025

Good summary! I agree with all your points. Some thoughts:

  • One good thing to note is that although we cannot introduce breaking changes, we can introduce breaking changes behind a new, opt-in setting. And, if such a setting were really just better, we could eventually roll it into an edition (see Just Editions #1201).

  • I've intentionally held off on stabilizing && / || and on allowing expressions in if … { } conditionals because of this kind of thing. And if we have lists, we definitely need to think about what's falsey and truthy with lists in mind. Using the empty list as false and everything else as true is definitely the most appealing option, but it might surprise users if an empty string (i.e. ['']) is true.

  • == is tough! You can have a variadic argument and do var_arg == "", which is currently true if no args were passed, but would be false if var_arg were an empty list.

  • We would need syntax for constructing lists. [a, b, …] is the obvious choice, but I haven't thought too hard about whether that works with the grammar. I think it's fine though: we only use brackets for attributes, which cannot appear in the same places as expressions.

  • Unstable settings or syntax can be used for longer-term experimentation. For example, an unstable setting which, if set, makes variadic arguments lists of strings, and if not set, they're turned into a list containing a single space-separated string. We could land such a PR, since it should be backwards compatible. We can then experiment, and either we figure out a way to make everything backwards compatible, in which case we can remove the setting and make it default on, or if we can't, we can stabilize the setting so users can opt into the new behavior.

  • Operators that distribute over lists seem really nice. It also seems like a really big design space for the exact behavior.
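The setting idea and the == concern above can be sketched together. Everything here is hypothetical: `list_variadics` stands in for an unstable setting that does not exist, and `equals_str` assumes that a string literal is a one-element list and that == compares lists element by element.

```rust
/// Bind a variadic parameter. With the hypothetical setting off, the
/// arguments collapse into a list containing one space-separated string;
/// with it on, every argument stays a separate list element.
pub fn bind_variadic(args: &[&str], list_variadics: bool) -> Vec<String> {
    if list_variadics {
        args.iter().map(|a| a.to_string()).collect()
    } else {
        vec![args.join(" ")]
    }
}

/// `value == "literal"`, assuming the literal is a one-element list and
/// equality is element-wise (an assumption of this sketch).
pub fn equals_str(value: &[String], literal: &str) -> bool {
    value.len() == 1 && value[0] == literal
}
```

With no arguments passed, `equals_str(&bind_variadic(&[], false), "")` is true while `equals_str(&bind_variadic(&[], true), "")` is false, which is the behavior change that would need to sit behind the opt-in setting.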

Anyways, will definitely hold off on stabilizing && and ||.

latk added a commit to latk/just.rs that referenced this issue Jan 11, 2025
In support of:

* casey#2458
* casey#1988

A `Val` is a list of strings, but at each point of usage we must decide whether
to view it as a singular joined string or as a list of its parts. Previously,
Just values were single strings, so in most places we have to invoke
`to_joined()` in order to maintain compatibility. In particular, recipes,
functions, and operators like `+` or `/` operate solely on strings. This
includes logical operators like `&&`, which continue to be defined on strings.
That means, the values `[]` and `['']` are currently completely equivalent.

So far, this is a purely internal change without externally visible effects.
Only the `Bindings`/`Scope`/`Evaluator` had API changes.

No new syntax is implemented. However, in expectation of expressions that build
lists, a new `evaluate_list_expression() -> Vec<String>` method is introduced
that could be used to implement splat or generator expressions. It is already
referenced wherever we have lists of arguments, e.g. variadic functions like
`shell()` and dependency arguments. But because singular expressions are
equivalent to a joined string, this is an unobservable detail for now.

For experimenting with lists of strings, variadic recipe parameters like `*args`
now produce a multi-part Val, and default to an empty list (not a list with an
empty string). Because all operators use `to_joined()`, this is an unobservable
implementation detail. However, if any operator becomes list-aware, then this
detail should be reverted, or moved behind an unstable setting.

For better understanding of the current behavior, I added a bunch of tests.
These will help detect regressions if functions or operators become list-aware.
No existing tests had to be touched.

Next steps: This change is just the foundation for other work, but some ideas
are mutually exclusive. Relevant aspects:

* list syntax in casey#2458
* list aware operators in casey#2458
* lossless forwarding of variadics: casey#1988
* invoking dependencies multiple times: casey#558

The preparatory work like `evaluate_list_expression()` is biased towards
implementing a splat operator that would enable casey#2458 list syntax and casey#1988 list
forwarding, but doesn't commit us to any particular design yet.
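As a reading aid for the commit message above, here is a minimal sketch of a list-backed value whose operators only ever look at the joined string. It is not the code from the referenced commit: the struct layout, `from_parts`, and the free function `or` are assumptions made for illustration, and only the name `to_joined()` is taken from the message. The `||` rule used here (return the left side unless it is the empty string) is likewise an assumption of the sketch.

```rust
/// Hypothetical list-backed value; only `to_joined` is named in the
/// commit message, the rest is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Val {
    parts: Vec<String>,
}

impl Val {
    /// Build a value from its individual parts.
    pub fn from_parts(parts: &[&str]) -> Val {
        Val {
            parts: parts.iter().map(|p| p.to_string()).collect(),
        }
    }

    /// View the value as a single string, joining parts with spaces.
    pub fn to_joined(&self) -> String {
        self.parts.join(" ")
    }
}

/// `lhs || rhs` defined on joined strings: an empty joined string is
/// false, so `[]` and `['']` behave identically.
pub fn or(lhs: &Val, rhs: &Val) -> String {
    let joined = lhs.to_joined();
    if joined.is_empty() {
        rhs.to_joined()
    } else {
        joined
    }
}
```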