Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement printf %()T %.s %*s %.*s #668

Merged
merged 10 commits into from
Mar 19, 2020

Conversation

akinomyoga
Copy link
Collaborator

@akinomyoga akinomyoga commented Mar 18, 2020

Resolve #650

Note on %()T

In the implementation of %()T, I needed to reflect the exported shell variable TZ in the real environment variable TZ. This is the corresponding section. I'm not sure whether I did this in the right way. (Maybe a new method like self.mem.GetExportedVar(name) should be added to class Mem). Could you check and edit if necessary?

Note on %6.s

In the current master branch, the parsing of %6.s is broken to generate the following errors (s is treated as a precision), so I added a check for the precision and treat missing precision as 0 (as POSIX requirements).

osh$ printf '%6.s\n' a
  %6.s\n
      ^
(source.ArgvWord word_spid:2):1: Invalid printf format character
osh$ printf '%6.ss\n' a
Traceback (most recent call last):
  File "/home/murase/prog/ext-github/oilshell.oil/bin/oil.py", line 965, in <module>
    sys.exit(main(sys.argv))
  File "/home/murase/prog/ext-github/oilshell.oil/bin/oil.py", line 911, in main
    return AppBundleMain(argv)
  File "/home/murase/prog/ext-github/oilshell.oil/bin/oil.py", line 884, in AppBundleMain
    status = ShellMain('osh', argv0, main_argv, login_shell)
  File "/home/murase/prog/ext-github/oilshell.oil/bin/oil.py", line 686, in ShellMain
    prompt_plugin, errfmt)
  File "/home/murase/prog/ext-github/oilshell.oil/core/main_loop.py", line 95, in Interactive
    is_return, _ = ex.ExecuteAndCatch(node)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 1862, in ExecuteAndCatch
    status = self._Execute(node, fork_external=fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 1807, in _Execute
    status, check_errexit = self._Dispatch(node, fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 1061, in _Dispatch
    status = self._RunSimpleCommand(cmd_val, fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 802, in _RunSimpleCommand
    return self.RunSimpleCommand(cmd_val, fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 890, in RunSimpleCommand
    return self._RunBuiltin(builtin_id, cmd_val, fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 524, in _RunBuiltin
    status = self._RunBuiltinAndRaise(builtin_id, cmd_val, fork_external)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/cmd_exec.py", line 467, in _RunBuiltinAndRaise
    status = builtin_func.Run(cmd_val)
  File "/home/murase/prog/ext-github/oilshell.oil/osh/builtin_printf.py", line 218, in Run
    precision = int(part.precision.val)
ValueError: invalid literal for int() with base 10: 's'

Note on printf # flag

The tests for flag # in the current master branch doesn't fully describe the behavior, so I added and modified tests.

@akinomyoga akinomyoga changed the title Impl printf Implement printf %()T %.s %*s %.*s Mar 18, 2020
@andychu
Copy link
Contributor

andychu commented Mar 18, 2020

Thanks this is great! I pulled and ran the tests and they passed.

And thanks for pointing out the bug. I think the existing code might not be behaving as much like a "parser" as I want it do -- I'm looking at why.


Please separate the | rules into separate lines like this, since the lexer can't be converted to C otherwise:

-   R('[1-9][0-9]*|\*', Id.Format_Num),
+   R('[1-9][0-9]*', Id.Format_Num),
+   R('\*', Id.Format_Num),
    C('.', Id.Format_Dot),
    # We support dsq.  The others we parse to display an error message.
-   R('[disqbcouxXeEfFgG]|\([^()]*\)T', Id.Format_Type),
+   R('[disqbcouxXeEfFgG]', Id.Format_Type),
+   R('\([^()]*\)T', Id.Format_Type),

(build/dev.sh fastlex which is part of build/dev.sh all fails with | because of the custom regex translator I use. )


In addition I try to maintain different token types rather than looking at token values, see property #2 here:

http://www.oilshell.org/blog/2019/02/07.html#three-nice-properties-of-a-parser

So it would be nice if

  • * is Id.Format_Star
  • %()T is Id.Format_Time

These are defined in frontend/id_kind.py under 'Format'. You can add Star and Time if you think the code fits into the code extra.

This is more of a style thing but I think it makes the code cleaner. If it causes complications let me know.


One nice way to view the AST is to turn if 0 to if 1 in Run(). That's how I view the separation between parsing and runtime.


OK I'm looking at this a little more... I think it's good but I think the width parsing was already messed up as you noticed.

@andychu
Copy link
Contributor

andychu commented Mar 18, 2020

Oh I see there are two problems:

  1. The grammar at line 48 is wrong? Because the number after . is optional right?
  2. The code isn't properly checking that the grammar is satisifed -- it puts an invalid value in part.precision.
    fmt           = Format_Percent Flag? Num? (Dot Num)? Type

/~https://github.com/oilshell/oil/blob/master/osh/builtin_printf.py#L48

- /tmp/lSedbk_builtin_printf.py
+ osh/builtin_printf.py

@@ -85,6 +85,8 @@

      if self.token_type == Id.Format_Dot:
        self._Next(lex_mode_e.PrintfPercent)  # past dot
+       if self.token_type != Id.Format_Num:
+         p_die('unexpected token after .', token=self.cur_token)
        part.precision = self.cur_token
        self._Next(lex_mode_e.PrintfPercent)

Now I get this:

$ bin/osh -c 'printf "%6.ss\n" x'
  %6.ss\n
     ^
(source.ArgvWord word_spid:2):1: unexpected token after .

So I think it is important to update the grammar and then follow the code. But if you think that will mess things up let me know and I can try it. It's more important that it works at first than to follow a particular style.


Long term I want to share the printf lexer and parser with a static variant:

echo ${x %.3f}   # printf formatting

This is analogous to $'\n' and echo -e '\n' being static and dynamic variants. Oil is trying to be more static. Those actually share the same lexing rules! (with some changes since bash's duplicate code led to slightly different languages in these two cases)


Also another consequence of this is that I try to never synthesize Token values. Those should come from the lexer. If necessary the schema in frontend/syntax.asdl may need to be changed instead. Again I use the if 1 block to make sure the schema makes sense. We can talk about it on Zulip if you have questions.

@andychu
Copy link
Contributor

andychu commented Mar 18, 2020

Hm yeah I think the grammar should be something like:

  control = Num | Star
  fancy = Flag? control? (Dot control?)? Type
  fmt     = Format_Percent (fancy | Time)

instead of

  fmt           = Format_Percent Flag? Num? (Dot Num)? Type

Does that look right?

No actually that's not right because you can combine width and precision with time:

andy@lisa:~/git/oilshell/oil$ printf '%10(%Y-%m)T\n'
   2020-03

Maybe add those test cases to make sure they work?

@andychu
Copy link
Contributor

andychu commented Mar 18, 2020

So then I think the grammar is more like:

  control = Num | Star
  fmt = Percent control? (Dot control?)? (Type | Time)

?

@akinomyoga
Copy link
Collaborator Author

Thank you for your review!

Please separate the | rules into separate lines like this

In addition I try to maintain different token types rather than looking at token values

b695317 I added token IDs Star, Time, and Zero. Zero was added because 0 can appear in both flags context and precision context.

So I think it is important to update the grammar and then follow the code.

27e435c Updated.

Also another consequence of this is that I try to never synthesize Token values.

28c169f Changed so that Dot is assigned to part.precision when a value is not supplied.

No actually that's not right because you can combine width and precision with time:

Maybe add those test cases to make sure they work?

1c84df0 Added a test case and supported precision for %()T.

Multiple flags?

By the way, I noticed that the current implementation only accepts a single flag. Since Oil currently supports only - and 0, which are mutually exclusive, there is no use case for specifying multiple flags, but basically the parser should accept multiple flags such as %#010.8x.

Copy link
Contributor

@andychu andychu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parsing part looks great! Thank you. The Zero is an improvement.

I don't know about multiple flags... I guess it's OK to parse more as long as nothing that exists is broken. We can recognize the full language and implement it later, as long as there is a decent error message about what's not implemented.


About the TZ part -- this will have to be rewritten in C at some point, but it doesn't have to be done now (e.g. let's worry about the rest of #653 before that).

So the important part is really the tests. As long as the tests are covering the intended behavior and will catch bugs in future C code, it's good.

@@ -252,14 +314,31 @@ def Run(self, cmd_val):
s = '%x' % d
elif typ == 'X':
s = '%X' % d
elif part.type.id == Id.Format_Time:
# set timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this part in a high level comment? I don't know these APIs very well and eventually they will have to be translated to C.

(Actually if you prefer to write in C, then native/libc.c has related functions and is fairly easy to modify.)

Does strftime read os.environ['TZ']? I'm not sure how it works exactly.


OK I looked over the docs a bit, I guess time.tzset() reads os.environ['TZ']. That's kind of confusing. Please add a comment about that.

Copy link
Collaborator Author

@akinomyoga akinomyoga Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cb08494 Thank you! I added code comments. (edit: added a typofix)

d = None
elif d == -2: # shell start time
d = shell_start_time
s = time.strftime(typ[1:-2], time.localtime(d));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no semicolon needed

Also where does -2 come from? I'm having trouble following that. May need more comments.

Copy link
Collaborator Author

@akinomyoga akinomyoga Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the semicolon and added comments in cb08494. Also, I changed to explicitly set d = time.time() (not relying on the default behavior of time.localtime(None)). (edit: added a typo fix)

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Mar 19, 2020

88360e5 I also added support of multiple flags.

@andychu andychu merged commit cde3f10 into oils-for-unix:master Mar 19, 2020
@akinomyoga akinomyoga deleted the impl-printf branch March 19, 2020 08:19
@andychu
Copy link
Contributor

andychu commented Mar 19, 2020

Great thank you! This is one of the biggest contributions ever to Oil.

bash printf is so complicated! This is why I want a simple functions in Oil like:

var s = strftime('%Y-%m-%d') 

:-)

I made this change to accomodate how we strip down the CPython interpreter (will be obsolete after C++ translation, but it's closer to what C looks like)

172e654

Also I adjusted this zsh output:

eba6506

Can you try building shells like this? At least 2 people have done it before and it takes 2-3 minutes.

/~https://github.com/oilshell/oil/blob/master/test/spec-bin.sh

That is the "golden version" of the shell we test. (Some of them are old but they are stable/known. The spec tests tickle lots of verison-specific behavior.)


Thanks again!

@akinomyoga
Copy link
Collaborator Author

Thank you!

172e654

FYI: I understand the motivation for using POSIX API and I'm not sure if it causes a real problem or not. but Python documentation recommends to call setenv/unsetenv indirectly through os.environ instead of directly calling them to keep consistency between the environment variables that Python caches and the real environment variables.

Can you try building shells like this?

Oh, I'm sorry. I actually noticed the results of test cases are sometimes different but didn't think about it seriously. Now I built the shells! Thank you!

@andychu
Copy link
Contributor

andychu commented Mar 19, 2020

Minor updates:

I removed a condition on TZ, and added some tests, see rationale here:

fddcf67

Basically there is no equivalent of posix.environ in C++ -- it's just getenv, putenv, and unsetenv.


Also, I thought that TZ was like LANG. LANG is respected even if it's NOT exported. But that's not true of TZ. #527

I looked in lib/sh/strftime.c in bash and it does something ad hoc and even has some #ifdefs.

I think the current behavior is fine but if you see any issues let me know.


Right before I mentioned spec-bin.sh I moved it to _deps/spec-bin:

ff303b6

Then I realized you might not have pulled in the last 5 minutes. So if you have _tmp/spec-bin just move it to _deps/spec-bin and it should work fine.

I did this for the Travis build which caches the _deps directory: http://travis-ci.oilshell.org/jobs/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement printf %*s and %.*s
2 participants