
Creation of MediaStreamTrack from raw PCM stream; Example 18 with a JSON file instead of messaging #2570

Closed
guest271314 opened this issue Aug 30, 2020 · 7 comments

@guest271314

To work around Chromium's refusal to support exposure and capture of monitor devices with getUserMedia() on Linux, I have created several workarounds (/~https://github.com/guest271314/captureSystemAudio/) that, from my perspective, could be simplified.

I am able to get a ReadableStream whose read values are raw PCM audio data: 2 channels, 44100 Hz sample rate, s16le.
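
For reference, a minimal sketch of the sample conversion involved, assuming each chunk read from the stream is a Uint8Array aligned on a 2-byte sample boundary (a real reader would carry an odd trailing byte over to the next chunk):

// Convert one s16le chunk (Uint8Array) to Float32 samples in [-1, 1),
// the format Web Audio consumes; interleaved channel order is preserved.
function s16leToFloat32(chunk) {
  // byteOffset must be even for the Int16Array view; chunks from a fetch()
  // reader are freshly allocated, so byteOffset is 0 in practice
  const int16 = new Int16Array(chunk.buffer, chunk.byteOffset, chunk.byteLength >> 1);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // 2^15, the s16 full-scale magnitude
  }
  return float32;
}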

The current version uses a browser extension and Native Messaging to fetch() from localhost, where the output is the monitor device data passed through PHP passthru(); then, because Chromium extension messaging does not support transfer, the data is converted to text and messaged to a different origin.

There has to be a simpler way to do this.

Two options occur to me, though there are certainly other options that I perhaps have not conceived of yet; thus this question:

  1. Somehow create a MediaStreamTrack directly from raw PCM input and somehow get that MediaStreamTrack exposed at a different origin
  2. Use Native File System to write and read either a single file or multiple files to accomplish ICE, offer, answer

Option 1 is probably more involved, though it avoids using WebAssembly.Memory, SharedArrayBuffer, ArrayBuffer, and TypedArrays, which have limitations both by default design and by architecture: https://bugs.chromium.org/p/v8/issues/detail?id=7881#c60

#59: No, it does not. WebAssembly.Memory.grow() will attempt to grow all the way up to 4GB. On 32-bit systems, it is very unlikely that a large amount of contiguous memory address space is still free, so the kernel will often deny such requests. (There's no hard limit on that; in a fresh process you might be able to allocate as much as 2GB, with sufficient address space fragmentation in a long-lived process even getting 256MB might fail.)

Unrelated to memory growth itself, the limit for TypedArray length on 32-bit systems is currently (and will likely continue to be) 1GB. So while you might be able to grow a Wasm memory to be bigger than that, you won't be able to create a TypedArray spanning its entire underlying ArrayBuffer.

That limitation is observable on 32-bit systems (wasmerio/wasmer-php#121 (comment)): when dynamically calling WebAssembly.Memory.grow(1) because the current value (a Uint8Array) plus the previously written values exceeds the initial or current SharedArrayBuffer byteLength, the allocation of increased memory can ultimately fail; e.g., trying to capture 30 minutes of audio (written to memory while the memory grows dynamically) can result in only 16 minutes and 43 seconds of audio being recorded.
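
For illustration, a minimal sketch of that failing growth pattern, with hypothetical page counts:

// Append PCM chunks to a growable WebAssembly.Memory, one 64 KiB page at a
// time. On 32-bit systems grow() can throw RangeError long before the 4 GB
// architectural maximum, which is the early cutoff described above.
const memory = new WebAssembly.Memory({ initial: 16, maximum: 65536, shared: true });
let view = new Uint8Array(memory.buffer);
let offset = 0;

function append(chunk /* Uint8Array */) {
  while (offset + chunk.length > memory.buffer.byteLength) {
    memory.grow(1);                       // may throw RangeError under address space fragmentation
    view = new Uint8Array(memory.buffer); // re-create the view over the grown buffer
  }
  view.set(chunk, offset);
  offset += chunk.length;
}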

Option 2 is probably the simplest in this case: somehow create a MediaStreamTrack from raw PCM input, then write and read the offer, answer, "ICE", negotiation, et al. using a local (JSON) file, in order to avoid using JavaScript ArrayBuffer, SharedArrayBuffer, and WebAssembly.Memory.grow(growNPages) at all.
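
A rough sketch of what that file-based exchange could look like, using the current File System Access API names (the entry points differed under the Native File System origin trial, and signal.json is a hypothetical user-chosen file):

// Write a local description to the shared signaling file
async function writeSignal(handle, description) {
  const writable = await handle.createWritable();
  await writable.write(JSON.stringify({ type: description.type, sdp: description.sdp }));
  await writable.close();
}

// Poll the file until the other peer has written the expected description
async function readSignal(handle, expectedType) {
  while (true) {
    const file = await handle.getFile();
    try {
      const description = JSON.parse(await file.text());
      if (description.type === expectedType) return description;
    } catch (e) {
      // partial write in progress; retry
    }
    await new Promise(resolve => setTimeout(resolve, 500));
  }
}

// Offer side:
// const handle = await showSaveFilePicker({ suggestedName: 'signal.json' });
// await pc.setLocalDescription(await pc.createOffer());
// /* after ICE gathering completes */ await writeSignal(handle, pc.localDescription);
// await pc.setRemoteDescription(await readSignal(handle, 'answer'));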

However, to achieve that, I am asking for the precise flow chart of exchanges of offer and answer between two RTCPeerConnections, as the instances will be on different origins. Since Chromium extension messaging can occasionally require reloading the extension, using Native File System to write and read the file(s) bypasses the need for messaging, which requires communication between the Native Messaging host, the Chromium extension, and an arbitrary web page.

Can the specification be updated with an example that performs the complete steps necessary to establish a peer connection for both sides of the connection, taking Example 18 (https://w3c.github.io/webrtc-pc/#example-18) as the base case, using a single JSON file (or multiple files if needed) instead of signaling (messaging)?

In this case the extension code will make the send-only offer and the arbitrary web page will make the receive-only answer.

Ideally, we should be able to somehow just pass the raw PCM to a method of RTCPeerConnection, et al. and not use the Web Audio API AudioWorklet, TypedArray, or ArrayBuffer at all. Is that possible?

Alternatively, are there any other ways to solve this problem that I am not considering?

@aboba aboba added the question label Sep 3, 2020
@youennf
Contributor

youennf commented Sep 3, 2020

Can the specification be updated with an example

Signaling is out of scope, but there are other resources and tutorials on the web that could help you.

Is that possible?

WebAudio is designed for that purpose.

@aboba aboba self-assigned this Sep 3, 2020
@aboba
Contributor

aboba commented Sep 3, 2020

We have discussed providing mechanisms for accessing/transforming/encoding raw media, as part of WebRTC-NV. But currently, WebAudio is the most straightforward way to create a MediaStreamTrack from PCM audio.
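
For concreteness, a minimal sketch of that WebAudio route, assuming the PCM has already been converted to interleaved Float32 chunks (buffering and de-interleaving details will vary):

// Schedule decoded PCM chunks back-to-back into a MediaStreamAudioDestinationNode;
// destination.stream then carries a live MediaStreamTrack.
const ac = new AudioContext({ sampleRate: 44100 });
const destination = ac.createMediaStreamDestination();
let startTime = 0;

function playChunk(float32 /* interleaved Float32Array */, channels = 2) {
  const frames = float32.length / channels;
  const buffer = ac.createBuffer(channels, frames, ac.sampleRate);
  for (let c = 0; c < channels; c++) {
    const channelData = buffer.getChannelData(c);
    for (let i = 0; i < frames; i++) channelData[i] = float32[i * channels + c];
  }
  const source = ac.createBufferSource();
  source.buffer = buffer;
  source.connect(destination);
  startTime = Math.max(startTime, ac.currentTime);
  source.start(startTime); // queue immediately after the previous chunk
  startTime += buffer.duration;
}

const [track] = destination.stream.getAudioTracks(); // MediaStreamTrack from PCM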

@guest271314
Author

WebAudio with SharedArrayBuffer or TypedArray as backing has limitations on 32-bit architectures where the stream is dynamic and has no definitive end; see WebAudio/web-audio-api-v2#97. There is currently no way to use a ReadableStream directly with AudioWorklet without involving some type of memory allocation, pre-allocated or dynamic. It should be possible to pass raw PCM directly to the constructor, a method, or a transceiver, or to create a MediaStream or MediaStreamTrack directly from a ReadableStream, WritableStream, or TransformStream (w3c/webrtc-encoded-transform#41), without needing to use JavaScript TypedArrays, ArrayBuffer, and SharedArrayBuffer.

@guest271314
Author

Signaling is out of scope, but there are other resources and tutorials on the web that could help you.

And yet there is a signaling example in the specification, and that example is incomplete.

This is the precise case:

  • Arbitrary web page passes a message to the Chromium extension
  • Chromium extension connects to the Native Messaging host to start a server that serves STDOUT of the audio output device - not the microphone, even though Chromium labels the microphone as "audiooutput"
  • Extension has permission to fetch() localhost, where the raw PCM is the response that streams system audio output until the request is aborted (a sketch follows this list)
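
A minimal sketch of that third step, with a hypothetical localhost URL for the server started by the Native Messaging host:

// Stream raw s16le PCM from the local server until capture is stopped
const controller = new AbortController();
const response = await fetch('http://localhost:8000', { signal: controller.signal });
const reader = response.body.getReader();
(async () => {
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // value: Uint8Array of raw PCM to buffer, convert, or forward
  }
})().catch(e => {
  // AbortError lands here when controller.abort() ends the capture
  console.error(e);
});
// later, to stop capture: controller.abort();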

In order to avoid using JavaScript TypedArray, ArrayBuffer, and SharedArrayBuffer, it would be preferable to create a MediaStreamTrack at the OS and somehow get that to JavaScript, though that is not possible.

Instead we should be able to create an RTCPeerConnection() with raw PCM (or Opus, AV1, etc.).

WebAudio has no standardized means to accept dynamic live streams of binary data besides AudioWorklet, and fetch() is not defined in AudioWorkletGlobalScope, so we have to use SharedArrayBuffer or TypedArray and/or Transferable Streams to dynamically stream binary input data that is indeterminate and has no definitive end.
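
To illustrate, a sketch of the Transferable Streams workaround as it runs in Chromium (the module name is hypothetical, and transferring a ReadableStream into a worklet was Chromium-specific and flag-gated at the time):

// main thread: transfer the response body into the worklet, since fetch()
// is not defined in AudioWorkletGlobalScope
const ac = new AudioContext({ sampleRate: 44100 });
await ac.audioWorklet.addModule('pcm-processor.js');
const node = new AudioWorkletNode(ac, 'pcm-processor', { outputChannelCount: [2] });
const response = await fetch('http://localhost:8000');
node.port.postMessage(response.body, [response.body]); // zero-copy transfer
node.connect(ac.destination);

// pcm-processor.js: buffer transferred chunks ahead of process(), which runs
// once per 128-frame render quantum
class PCMProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.chunks = [];
    this.port.onmessage = async ({ data: readable }) => {
      const reader = readable.getReader();
      for (let r = await reader.read(); !r.done; r = await reader.read()) {
        this.chunks.push(r.value); // reading must stay ahead of playback
      }
    };
  }
  process(inputs, outputs) {
    // ...shift queued samples, de-interleave into outputs[0][0] and outputs[0][1]
    return true;
  }
}
registerProcessor('pcm-processor', PCMProcessor);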

In this case the closest we can get is using RTCPeerConnection at the extension, though we still have the deficit of trying to store the input stream's binary data and keep the processing ahead of the 384 calls to AudioWorkletProcessor.process() per second - a challenge on 32-bit systems.

Thus the three related issues: with a common interest, it is certainly possible to write out a specification for converting raw audio binary data to a MediaStreamTrack that can be passed by reference and zero-copy transferred to a different context - or, kindly, complete the signaling example to show how to exchange the SDP in the case above.

@guest271314
Author

Signaling is out of scope, but there are other resources and tutorials on the web that could help you.

The reason I specifically asked for the signaling example to be completed is that the goal is not to use Chromium extension messaging through postMessage() at all to complete the procedure.

Instead, to avoid attempting to pass messages generated asynchronously through a synchronous messaging API, once I gather the exact required flow chart I will experiment with writing and reading the SDP as JSON to a single file, as an array of objects, using Native File System at the browser and inotify-tools at the OS to perform tasks when the single file is closed after offer and answer; that is, perfect negotiation using a single file, with file event notification accessible at the browser.
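
A sketch of the browser side of that file watch, polling lastModified since the browser has no native file event notification (the OS-side inotify-tools piece is not shown, and the array-of-objects layout is the one described above):

// React whenever the signaling file changes
async function watchSignal(handle, onChange) {
  let lastModified = 0;
  while (true) {
    const file = await handle.getFile();
    if (file.lastModified !== lastModified) {
      lastModified = file.lastModified;
      try {
        const entries = JSON.parse(await file.text()); // array of { type, sdp }
        onChange(entries);
      } catch (e) {
        // partially written JSON; wait for the next poll
      }
    }
    await new Promise(resolve => setTimeout(resolve, 250));
  }
}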

@guest271314
Author

@youennf

Signaling is out of scope, but there are other resources and tutorials on the web that could help you.

The signaling portion of the Issue is solved.

After re-reading /~https://github.com/fippo/paste and manually performing the paste several dozen times, I gathered the minimal requirements.

The use case in this instance is capture and streaming of monitor devices in the Chromium browser, which refuses to support capture or listing of monitor devices on Linux with getUserMedia() and enumerateDevices(), respectively.

Firefox does support capture of monitor devices on Linux.

One solution that achieves the requirement of capturing a monitor device at Chromium, or at least gaining access to the device by any means, is to perform the capture at Nightly, then establish a WebRTC connection with Chromium to access the MediaStreamTrack and MediaStream created at Nightly.

The use of the clipboard is not ideal. It is used here to automate the procedure to the extent possible; Chromium requires focus on the document to read from and write to navigator.clipboard.

At Nightly, set the flags dom.events.testing.asyncClipboard and media.navigator.permission.disabled to true.

<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
  </head>
  <body>
    <script>
      (async _ => {
        const webrtc = new RTCPeerConnection({ sdpSemantics: 'unified-plan' });
        [
          'onsignalingstatechange',
          'oniceconnectionstatechange',
          'onicegatheringstatechange',
        ].forEach(event => webrtc.addEventListener(event, console.log));
        let sdp;
        webrtc.onicecandidate = async event => {
          console.log('candidate', event.candidate);
          if (!event.candidate) {
            sdp = webrtc.localDescription.sdp;
            if (sdp.indexOf('a=end-of-candidates') === -1) {
              sdp += 'a=end-of-candidates\r\n';
            }
            try {
              await navigator.clipboard.writeText(sdp);

              async function* readClipboard() {
                while (true) {
                  try {
                    await new Promise(resolve => setTimeout(resolve, 1000));
                    // dom.events.testing.asyncClipboard
                    // optionally dom.events.asyncClipboard.dataTransfer
                    const text = await navigator.clipboard.readText();
                    if (
                      text.replace(/[\n\s]+/g, '') !==
                      sdp.replace(/[\n\s]+/g, '')
                    ) {
                      sdp = text;
                      console.log({ sdp, text });
                      break;
                    }
                    yield text;
                  } catch (e) {
                    console.error(e);
                    throw e;
                  }
                }
              }
              for await (const text of readClipboard()) {
                console.log(text);
              }

              await webrtc.setRemoteDescription({ type: 'answer', sdp: sdp });
            } catch (e) {
              throw e;
            }
          }
        };
        try {
          // media.navigator.permission.disabled
          let stream = await navigator.mediaDevices.getUserMedia({
            audio: true,
          });
          const label = 'Monitor of Built-in Audio Analog Stereo';
          let [track] = stream.getAudioTracks();
          if (track.label !== label) {
            const device = (
              await navigator.mediaDevices.enumerateDevices()
            ).find(({ label: _ }) => label === _);
            const { deviceId } = device;
            console.log(device);
            track.stop();
            stream = await navigator.mediaDevices.getUserMedia({
              audio: { deviceId: { exact: deviceId } },
            });
            [track] = stream.getAudioTracks();
          }

          // addTransceiver() returns an RTCRtpTransceiver, not a sender
          const transceiver = webrtc.addTransceiver(stream.getAudioTracks()[0], {
            streams: [stream],
            direction: 'sendonly',
          });
          const offer = await webrtc.createOffer();
          await webrtc.setLocalDescription(offer);
        } catch (e) {
          throw e;
        }
      })().catch(console.error);
    </script>
  </body>
</html>

at Chromium

<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <style>
      body *:not(script) {
        display: block;
      }
    </style>
  </head>
  <body>
    <button id="capture">Capture system audio</button>
    <audio id="audio" autoplay controls muted></audio>

    <script>
      const audio = document.getElementById('audio');
      const capture = document.getElementById('capture');
      ['loadedmetadata', 'play', 'playing'].forEach(event =>
        audio.addEventListener(event, console.log)
      );
      const webrtc = new RTCPeerConnection({ sdpSemantics: 'unified-plan' });
      [
        'onsignalingstatechange',
        'oniceconnectionstatechange',
        'onicegatheringstatechange',
      ].forEach(event => webrtc.addEventListener(event, console.log));

      webrtc.onicecandidate = async event => {
        if (!event.candidate) {
          let sdp = webrtc.localDescription.sdp;
          if (sdp.indexOf('a=end-of-candidates') === -1) {
            sdp += 'a=end-of-candidates\r\n';
          }
          try {
            await navigator.clipboard.writeText(sdp);
          } catch (e) {
            console.error(e);
          }
        }
      };
      webrtc.ontrack = ({ transceiver, streams: [stream] }) => {
        console.log(transceiver);
        const {
          receiver: { track },
        } = transceiver;
        track.onmute = track.onunmute = e => console.log(e);
        audio.srcObject = stream;
      };
      onfocus = async _ => {
        onfocus = null;
        try {
          const sdp = await navigator.clipboard.readText();
          console.log(sdp);
          await webrtc.setRemoteDescription({ type: 'offer', sdp });
          const answer = await webrtc.createAnswer();
          await webrtc.setLocalDescription(answer);
          // setting the local description starts ICE gathering; the
          // onicecandidate handler above writes the answer SDP to the clipboard
        } catch (e) {
          console.error(e);
        }
      };
    </script>
  </body>
</html>

(screenshot attached: Screenshot_2020-09-07_16-30-22)

TODO: Improve signaling method. Establish RTCPeerConnection on any page at Chromium.

@aboba
Contributor

aboba commented Nov 5, 2020

Overtaken by Events (proposals for new raw media APIs).

@aboba aboba reopened this Nov 5, 2020
@aboba aboba closed this as completed Nov 5, 2020