Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add javascript processor #1406

Closed
wants to merge 7 commits into from
Closed

Conversation

kabukky
Copy link
Contributor

@kabukky kabukky commented Aug 22, 2022

You won't remember, but a few months ago I went on discord and asked if a javascript processor is something you guys would consider merging into main.

I got a very positive reaction, so here it is!

There are a few TODOs in here that are worth discussing I think, so feel free to comment.

Here are two examples:

With inline code:

config.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      code: |
          console.log(getMeta("http_server_request_path"));
          var root = getRoot();
          root.foo = "bar";
          setRoot(root);

output:
  sync_response: {}

With .js files:

config.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      file: test.js

output:
  sync_response: {}
  processors:
    - bloblang: |
        root = "see console"

test.js

let foo = require("foo");

console.log(foo.bar());

foo.js

const bar = () => {
    return "foobar!"
}

exports.bar = bar;

I've tested this a bit and found JavaScript very helpful when complex logic is needed that is sometimes hard to read or difficult to implement in bloblang.

This is a first draft that works well for me. What do you think?

@kabukky kabukky requested a review from Jeffail as a code owner August 22, 2022 17:56
@Jeffail
Copy link
Collaborator

Jeffail commented Aug 24, 2022

Hey @kabukky, this is awesome, it might take me a little while to dig into the implementation properly but at a glance it looks great and this is going to be super powerful.

@kabukky
Copy link
Contributor Author

kabukky commented Aug 25, 2022

@Jeffail that is good to hear. Please let me know if you have any questions or suggestions. I will get right to it.

Best

@packman80
Copy link

May be add v8go (/~https://github.com/rogchap/v8go) support instead goja?. It's binding for v8 and it's much faster than goja (it's used in current implementation).
From comment: dop251/goja#2 (comment)

function factorial(n) {
    return n === 1 ? n : n * factorial(--n);
}

var i = 0;

while (i++ < 1e6) {
    factorial(10);
}

The execution times roughly were:
otto: 33.195s
goja: 3.937s
duktape: 1.545s
v8 (go binding): 0.309s
v8 native (d8): 0.187s

In that test v8go x10 times faster than goja.

@kabukky
Copy link
Contributor Author

kabukky commented Aug 27, 2022

@packman80 I'm not sure the performance benefit is worth the hassle of using v8go.
v8go would add dependencies to Cgo and V8 binaries to Benthos. And as far as I can see, v8go only brings binaries for Linux and macOS.

A Go native JavaScript VM is much easier to maintain in my opinion. But if the maintainers think otherwise, I will change this to another Go JS library, no problem.

func createRequireRegistry() {
// Initialize global registry. This is where modules (JS files) live. This enables easy code re-use.
// TODO: We need to set a registry root folder. Can we make this configurable somehow? Setting it to the working directory for now.
requireRegistry = require.NewRegistry(require.WithGlobalFolders(""))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kabukky, I still need to put time aside to properly play around with this, but I think the major question I have is whether the registry here could be instanciated per-processor, and how bad the performance hit would be in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Jeffail, thanks for looking into this! The performance overhead would be minimal as the require.NewRegistrycall just instantiates a new Struct. I'll change my code.

@darcyg
Copy link

darcyg commented Oct 25, 2022

@kabukky
Great feature.
Can I directly read and write the cache?
Or directly access data from other components with label names.
It would be even better if you could call output directly.
When I need mqtt.publish, the current output mechanism is too cumbersome.
If you can use outputs.getCompentsByLabel(labelname).asMqtt.publish(json) directly
That would be great.

@kabukky
Copy link
Contributor Author

kabukky commented Oct 25, 2022

@kabukky Great feature. Can I directly read and write the cache? Or directly access data from other components with label names. It would be even better if you could call output directly. When I need mqtt.publish, the current output mechanism is too cumbersome. If you can use outputs.getCompentsByLabel(labelname).asMqtt.publish(json) directly That would be great.

@darcyg That's not possible atm.
@Jeffail Is that something that fits into the design principle of Benthos?

@darcyg
Copy link

darcyg commented Oct 25, 2022

@darcyg That's not possible atm. @Jeffail Is that something that fits into the design principle of Benthos?

I think
It's not a big problem if you use a scripting engine in your outputs.
Using javascript for conditional branching can simplify configuration logic
Especially the branch configuration of switch:case, when there are nested branches.
My ambition is to implement a little more complicated logic.
Especially conditional output, or data read and write.
yaml is too complicated to configure.
I had to do it with multiple streams.

Of course I use benthos as a rule engine.
Not exactly the idea of a streaming engine.

@darcyg
Copy link

darcyg commented Nov 2, 2022

@kennyp
I redid the javascript patch on the latest benthos (4.10).
No interface with v4/public/service
The interface under v4/internal/ is used
It is convenient for later reading and calling other process and output and component functions of cache
0001-process-javascript-new.patch.zip

My code is on my own gitlab server, so I submit it as a patch.

@darcyg
Copy link

darcyg commented Nov 16, 2022

@kabukky @Jeffail
Based on the previous patch files.
Add cache and output resource access

cat test.yaml

input:
  http_server:
    path: /test
    allowed_verbs:
      - POST
    sync_response:
      status: 200

pipeline:
  processors:
  - javascript:
      cache_res:
        - testa
      code: |
          console.log(getMeta("http_server_request_path")+" 1234");
          var root = getRoot();
          root.foo = getMeta("http_server_request_path")+"_1234";
          setRoot(root);
          setCache("cc1","300s");
          root.foo1 = getCache("cc1");
output:
  label: js1
  broker:
    outputs:
      - javascript:
          cache_res:
            - testa
          output_res:
            - mqtt_ores
          code: |
            console.log(getMeta("http_server_request_path")+" 5678");
            var root = getRoot();
            root.foo = getMeta("http_server_request_path")+"_5678";
            root.topic = "abc"+root.foo;
            console.log(root.foo);
            if (root.abc == "1") {
              console.log("check yes");
              benthos_output("mqtt_ores",JSON.stringify(root))
            } else {
              console.log("check no");
            }
           
      - sync_response: {}
      
cache_resources:
  - label: testa
    memory:
      default_ttl: 600s
      compaction_interval: ""
      init_values:
        cc1: bar1
        cc2: bar2
        
output_resources:
  - label: mqtt_ores
    mqtt:
      urls:
        - tcp://192.168.36.1:1883
      topic: ${! json("topic")}

call

curl -X POST http://127.0.0.1:4195/test -d '{"abc":"1"}'  # Send mqtt data, print "check yes"
curl -X POST http://127.0.0.1:4195/test -d '{"abc":"0"}'  # not Send mqtt data, print "check no"

mqtt

0003-javascript-call-cache-and-output-components.zip

@packman80
Copy link

@darcyg good job. But can you please issue it as a pull request

@darcyg
Copy link

darcyg commented Dec 9, 2022

@darcyg good job. But can you please issue it as a pull request

I will deal with it after the release of 4.11.0 a few days later. I'm busy these days.
My github account is too messy. Fork too many projects.

@kabukky
Copy link
Contributor Author

kabukky commented Jan 18, 2023

Did you get a chance to look at this?
I still think this would be the minimally invasive way of adding a JS processor.

Best

@packman80
Copy link

packman80 commented Feb 7, 2023

Hey @darcyg, any news ?

@darcyg
Copy link

darcyg commented Feb 20, 2023

Hey @darcyg, any news ?

/~https://github.com/darcyg/benthos
darcyg@0a4a670

Sorry, I rarely use github to manage projects. It is unclear how pull requests submit commit to this # 1406.

Maybe you can clone my project, modify it and submit it to the main project.

I didn't write test code. My go language is temporary learning and not proficient.

@Jeffail
Copy link
Collaborator

Jeffail commented Apr 24, 2023

Thanks @kabukky, I finally got around to working on this, I've done a pretty hefty rewrite of some bits, especially the docs/functions, but the core bits are mostly the same.

Merged via 226de1b
And 813a342

@Jeffail Jeffail closed this Apr 24, 2023
@kabukky
Copy link
Contributor Author

kabukky commented Apr 26, 2023

Thanks so much @Jeffail
I'm not sure about the VM re-use via the pool. Goja runtime re-use is tricky. See here for example: dop251/goja#291

@Jeffail
Copy link
Collaborator

Jeffail commented Apr 27, 2023

Thanks so much @Jeffail I'm not sure about the VM re-use via the pool. Goja runtime re-use is tricky. See here for example: dop251/goja#291

Yeah I had a play around with the different options, I've updated the docs to explain how the current system works with caveats. Basically if programs that utilise variables encapsulate their logic within anonymous functions then if I've understood things correctly we should be fine. I've also added examples that show it in action (the structure mutation one here: https://www.benthos.dev/docs/components/processors/javascript#examples).

If things turn ugly and it trips up users we can do some tinkering to get it right, but for now I think for the performance benefits it's worth trying this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants