Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: utility method for detecting ES module syntax #27808

Closed

Conversation

GeoffreyBooth
Copy link
Member

This PR adds a utility method to detect whether input JavaScript source code contains ES module syntax:

const { containsModuleSyntax } = require('module');

containsModuleSyntax('import { shuffle } from "lodash"'); // true
containsModuleSyntax('console.log(process.version)'); // false

containsModuleSyntax('import "./file.mjs"'); // true
containsModuleSyntax('import("./file.mjs")'); // false

The use case for this is any utility that currently uses vm.Script’s runInThisContext and might also want to support ESM input source code, and therefore would need to know when to use vm.Script versus the ES module vm.SourceTextModule (and has no definitive signal from the user as to the intended source code type, and where the risk of evaluating in the wrong module system is acceptable for the use case). Two such utilities that I’m familiar with are Babel and CoffeeScript, which each take arbitrary source code as user input (like --eval) and use vm.runInThisContext for evaluation. I would also imagine that this method would be useful to some loaders or testing frameworks.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@ljharb
Copy link
Member

ljharb commented May 22, 2019

(link to nodejs/modules#330 and nodejs/ecmascript-modules#69)
cc @nodejs/modules

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more useful to have a method that returns an array of possible parse goals - ie, esm, cjs, or both - which allows for future options like “wasm”, “json”, etc - but “contains module syntax” certainly does what it says on the tin.

@devsnek
Copy link
Member

devsnek commented May 22, 2019

i am concerned about the longevity of this api, not that the api itself is a bad thing. is this the right place to put it, is there some other api in the future we might want instead, etc. perhaps flagging it would make sense for now?

@devsnek devsnek added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. semver-minor PRs that contain new features and should be released in the next minor version. labels May 22, 2019
@BridgeAR
Copy link
Member

I agree with @devsnek that this API might change again. An alternative to a flag would be to mark it as experimental.

@Fishrock123
Copy link
Contributor

Does this api really belong in core? Can’t it be an npm module?

Especially since it’s future may be uncertain, out of core flexibility seems better to me...

@GeoffreyBooth
Copy link
Member Author

Does this api really belong in core? Can’t it be an npm module?

Of course, any utility method could be outside core. I was thinking that this belonged in core because it’s useful in relation to vm.Script and vm.SourceTextModule. We might also find ourselves using this method in core someday, for example if we wanted to provide some kind of automatic way to choose the type for --input-type.

@TimothyGu
Copy link
Member

Hmm, I think we should wait until such a usage in core becomes apparent and then discuss adding this. It seems possible right now that a future feature would in fact require a slightly different algorithm for deciding if a file is CJS or a module.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented May 22, 2019

Ultimately I think there will be multiple algorithms for deciding how a file or a source input string should be treated. There’s no one ideal algorithm that will work for all scenarios.

The difficulty in “detecting” CommonJS is that most (all?) CommonJS files that have 'use strict' or could have 'use strict' without erroring are also valid as ESM. Such source code may reference the CommonJS globals like require, but that’s permitted in ES module code; such references would just throw, unless the reference was one like if (require). So you could make a theoretical “probably CommonJS source” method, that looks for references to these globals and possibly uses of non-strict mode syntax like with or HTML comments, and that might be a useful companion to this method. But I wouldn’t combine them. There’s value in having a method that does exactly one thing and does it fast. containsModuleSyntax doesn’t even parse the source code, it only tokenizes; and it breaks as soon as an import or export statement token is encountered. Since the current behavior of Node is to default to running all code as CommonJS, it will be useful to have a way to very quickly do “if definitively ESM, run as ESM; else run as CommonJS” for the tools that want to do so.

@devsnek
Copy link
Member

devsnek commented May 22, 2019

There’s no one ideal algorithm that will work for all scenarios.

that's a pretty good argument to leave it out of core for now. perhaps you could develop a module with lots of algorithms and such that can be merged into core in the future?

@sendilkumarn
Copy link

Yeah, I also think adding it as an npm module makes more sense at this moment rather than having this in the core.

@SMotaal
Copy link

SMotaal commented May 23, 2019

Thinking of this long enough — I must admit it is neither a core thing nor just another npm thing — Some things are just to intertwined to specific features of core to a point where it creates too much noise when forced to always play catch up in the wild.

So in my opinion — this needs to have:

  1. Steady core support — ie any aspects used from core in the detection.

To be clear, not an actual API, just what an API like this one needs to get the job done without ugly package soup.

  1. A clear procedural document in core — ie how to detect.

To the point, ESM where otherwise not unambiguous is a source text that meets the following criteria… etc.

  1. A recommended npm package — while maintained.

Opportunity to transition this out right here :)

Last thing we need is a popularity contest on how a package seems to magically detect ESM modules only to discover it just drops safe guards to do so — that is if we are serious about having an auto detect mode (assuming it is a loader extension at some point, yes?).

@addaleax
Copy link
Member

addaleax commented Jun 3, 2019

Does the API here lock us into shipping with a parser module like acorn? If so, I’d also really prefer this to be an npm module.

@SMotaal
Copy link

SMotaal commented Jun 3, 2019

npm module

@addaleax is there precedence for @nodejs npm modules? I'd really like this to have the benefit of being a Node.js backed utility so that it always meets the bar.

@addaleax
Copy link
Member

addaleax commented Jun 3, 2019

@SMotaal If I recall correctly, we own the @nodejs namespace on npm, so if we wanted to, we could do it, but I’m not sure if this is really a case where it’s necessary to have something “official” – it seems like this module would not typically require an unusual amount of maintenance?

@SMotaal
Copy link

SMotaal commented Jun 3, 2019

a parser module like acorn

@addaleax I raised this a few months back, and the rational at the moment was acorn ships because of REPL, which I must point out was already causing me issues, but reason still dictated that since it is there already, that was a different problem space to work out, and that was more appropriate than adding yet another parser to the mix.

I am working on that other one regardless in unrelated efforts, looking for better ways to avoid AST parse reporting errors artificially when they are not.

@SMotaal
Copy link

SMotaal commented Jun 3, 2019

have something “official”

I think that this is a very fine grain special case — when people implement loaders, they will aim for convenience, they will sell anything goes mentality and claim that "we the mojo will always detect CJS vs ESM formats for you" which is in very high demand these days.

I'd like this problem to be solved by an "official" solution so that people do not solve this in ways that overload the loader or make it too hard for security aspects that they fail to properly predict what code will be loaded before it is actually loaded.

(sorry for the edits — it is annoying but takes my me time to catch my mistakes)

// writing, Acorn doesn't support import() expressions as they are only Stage
// 3; yet Node already supports them.
const acorn = require('internal/deps/acorn/acorn/dist/acorn');
source = stripShebang(source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct thing to use now is stripShebangOrBOM if you know that you're parsing cjs. If you're parsing ESM, you should only use stripShebang. This obviously presents a problem, and I don't have a solution for it.

Copy link
Member Author

@GeoffreyBooth GeoffreyBooth Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripShebangOrBOM was removed in 702331b, because the V8 parser handles stripping shebangs/hashbangs now. Only stripBOM remains.

This is a problem, as Acorn doesn’t strip hashbangs. If I remove stripShebang from my method, the following throws:

containsModuleSyntax('#!/usr/bin/env node\nimport "./file.js"');

When it should return true. I could resurrect stripShebangOrBOM for this method, I suppose?

I guess this is the problem with relying on Acorn for parsing in certain places (like the REPL) and V8 everywhere else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, Acorn has an allowHashBang option, and it’s supported by its tokenizer. All is well. I rebased from the latest master and added a test with a shebang line.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth can we maybe have a preflight checklist re making sure that acorn configs/plugins... etc remain effective for this feature?

I'm thinking that if it is used sparsely, we might want to sanely keep track of dependencies so that updates don't break features like this.

@benjamingr
Copy link
Member

I've opened an issue to discuss vendoring this and other packages as NPM modules as @addaleax and @SMotaal discussed above.

@GeoffreyBooth GeoffreyBooth force-pushed the detect-import-export branch from 695d780 to a8a7776 Compare June 5, 2019 06:13
@GeoffreyBooth
Copy link
Member Author

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

@targos
Copy link
Member

targos commented Jun 5, 2019

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

As long as this doesn't expose anything specific to Acorn (such as error messages or objects returned by the library), I'm fine with it.

@GeoffreyBooth
Copy link
Member Author

As long as this doesn’t expose anything specific to Acorn (such as error messages or objects returned by the library), I’m fine with it.

It doesn’t. The method either returns true or false or throws whatever exception is thrown by new vm.Script(source, { displayErrors: true });.

@GeoffreyBooth
Copy link
Member Author

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don’t even strip the bom for esm.

@devsnek Okay, I removed stripBOM.

@GeoffreyBooth GeoffreyBooth force-pushed the detect-import-export branch from a8a7776 to 750d617 Compare June 5, 2019 19:54
@devsnek
Copy link
Member

devsnek commented Jun 5, 2019

@GeoffreyBooth but we do strip bom for cjs.

@GeoffreyBooth GeoffreyBooth force-pushed the detect-import-export branch from 750d617 to 0d7db35 Compare June 5, 2019 22:55
@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Jun 5, 2019

@GeoffreyBooth but we do strip bom for cjs.

@devsnek Okay, then we should keep stripBOM. The BOM might confuse Acorn, so we can and should remove it. Whether or not source code has a BOM is irrelevant to whether or not the source contains module syntax.

Whether or not the source code is runnable is outside the scope of what this method is aiming to tell you. If a string has a BOM and also has module syntax, then this method should return true, even if Node chokes on the BOM when it tries to run the file. This method isn’t aiming to tell you if Node can successfully parse or evaluate the source, just whether or not it contains module syntax.

@GeoffreyBooth
Copy link
Member Author

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don't even strip the bom for esm.

@devsnek Whether or not source code contains a BOM or a hashbang is irrelevant to whether or not the source contains module syntax. For the purposes of this method we can strip out anything we want that's not an import or export statement, and we have to strip out anything that causes Acorn to choke when it shouldn't (like a hashbang). So I think the current version is correct. Do you mind lifting your block?

@devsnek
Copy link
Member

devsnek commented Jun 13, 2019

@GeoffreyBooth it's a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

@GeoffreyBooth
Copy link
Member Author

@GeoffreyBooth it’s a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

I’m not sure what you’re asking for here. If a BOM is permissible in ESM, and therefore ESM may or may not contain a BOM, then sniffing it doesn’t tell us anything about whether code contains module syntax.

The stripping, of BOM or of hashbangs, is only to prevent Acorn from throwing an exception on code that should be parseable. Now that hashbangs are permitted in JavaScript, Acorn shouldn’t require a special option to strip them—but it does. Now that BOMs are permitted in JavaScript, Acorn shouldn’t require us to strip them preemptively—but we need to. These edits are just to work around Acorn’s limitations.

@SMotaal
Copy link

SMotaal commented Jun 13, 2019

it's a syntactic difference for esm and cjs

@devsnek can you elaborate what you mean, would:

  1. BOM presence negate the outcome for a source text that is otherwise ESM?
  2. BOM absence negate the outcome for a source text that is otherwise not ESM?

I am almost certain we're not considering BOM a disambiguating aspect here so my apologies for needing more clarification please.

@ljharb
Copy link
Member

ljharb commented Jun 13, 2019

(note that hashbangs are stage 3, so not yet permitted in JS)

@devsnek
Copy link
Member

devsnek commented Jun 13, 2019

what i'm saying is we don't strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

@GeoffreyBooth
Copy link
Member Author

what i’m saying is we don’t strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

This method doesn’t look for indicators of CommonJS. It only looks for module syntax.

@GeoffreyBooth
Copy link
Member Author

Thanks @devsnek.

Can someone please trigger a build? The last build failed on some crypto tests that have nothing to do with this PR.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot
Copy link
Collaborator

@BridgeAR
Copy link
Member

Should this stay open?

@GeoffreyBooth
Copy link
Member Author

Technically it's still on the modules roadmap, and there's no alternative yet. Until we discuss and agree to table it, I think this stays open.

@GeoffreyBooth
Copy link
Member Author

Closing per nodejs/modules#465.

@ljharb ljharb deleted the detect-import-export branch January 23, 2020 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.