index.html

<!DOCTYPE HTML>
<!--
	Editorial by HTML5 UP
	html5up.net | @ajlkn
	Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
	<head>
		<title>Ultra Dual-Path Compression For Joint Echo Cancellation and Noise Suppression</title>
		<meta charset="utf-8" />
		<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
		<link rel="stylesheet" href="assets/css/main.css" />
	</head>
	<body class="is-preload">

		<!-- Wrapper -->
			<div id="wrapper">

				<!-- Main -->
					<div id="main">
						<div class="inner">
							<!-- Banner -->
								<section id="banner">
									<div class="content">
										<header>
											<h2>Ultra Dual-Path Compression For Joint Echo Cancellation and Noise Suppression</h2>
											<p>Accepted by Interspeech 2023</p>
											<p>Authors: Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng</p>
											<p>Tencent AI Lab, Audio and Speech Signal Processing Oteam</p>
											<p>Email: erichtchen@tencent.com</p>
										</header>
										<p>Abstract:
										 Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. 
										 In this paper, we introduce time-frequency dual-path compression to achieve a wide range of compression ratios on computational cost. 
										 Specifically, for frequency compression, trainable filters are used to replace manually designed filters for dimension reduction. 
										 For time compression, only using frame skipped prediction causes large performance degradation, which can be alleviated by a post-processing network with full sequence modeling. 
										 We have found that under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement, covering compression ratios from 4x to 32x with little model size change. 
										 Moreover, the proposed models show competitive performance compared with fast FullSubNet and DeepFilterNet.
										</p>
										<p><a href=/~https://github.com/tencent-ailab/UltraDualPathCompression>Core source code is available.</a></p>
										<!--
										<ul class="actions">
											<li><a href="#" class="button big">Learn More</a></li>
										</ul>-->
									</div>
								</section>
								
								<section id="poster">
									<h2>Poster</h2>
									<tr style="border-top:1px solid black">
										<td><img id="poster_fig" alt="" width="100%" src="images/poster.png"></td>
									</tr>
								</section>

							<!-- Section -->
								<section id="details">
									<header class="major">
										<h2>A Quick Review</h2>
									</header>
									<div class="features">
										<article>
											<div class="content">
												<h3>Background</h3>
												<p>Model performance relates to (1) architecture and (2) # params. & computational cost. 
												When the architecture is determined, the # params as well as the computational cost can be tuned by hyperparameters. </p>
												<p>AEC&ANS for full-duplex communication needs low computational cost when deployed on PC/Phones.
												Thus, an important research topic is to lower computational cost while lowering performance degradation.</p>
											</div>
										</article>
										<article>
											<div class="content">
												<h3>Experimental setup</h3>
												<p>Models are trained on a simulated dataset with 100K iterations. Though the model performance can be further improved with a larger iteration number, we adopt such an iteration number for fast comparison.  </p>
												<p>Models are tested on a semi-simulated dataset, where the echos are real-recorded ones from the AEC challenge.</p>
												<p>Models have a window size and a hop size of 20ms and 10ms, respectively. Additional look ahead is not allowed due to causal processing.</p>
											</div>
										</article>
										<article>
											<div class="content">
												<h3>Framework & Method</h3>
												<p>The whole framework consists of a linear AEC (Kalman-filter based), an online DPT-FSNet, compression and decompression modules and a post network.</p>
												<p>Compression and Decompression are operated on the input spectra and the output features, respectively. 
													In detail, the compression module compresses the input spectra first along the time axis and then the frequency using linear transform. The time compression uses the past and the current frames while the frequency compression is conducted with the Mel scale.
													The decompression module decompresses the output feature first along frequency and then the time axis using linear transform. The time decompression predicts the current feature and copies it for future frames.
													A post network (1-layer GRU) is needed to eliminate the degradation caused by the time compression.
												</p>
											</div>
										</article>
										<article>
											<div class="content">
												<h3>Results</h3>
												<p>1. Time compression needs an ultra-light post-net.</p>
												<p>2. Dual-path compression performs better than single-path compression.
												For example, from 1.8G MACs/s to 140M MACs/s, the model achieves a 13x compression ratio but only has 0.22 PESQ drop, 0.16 and 0.14 PESQ higher than single-path compression.</p>
											</div>
										</article>
									</div>
								</section>

								<section id="results_table">
									<h2>Detailed results</h2>
									<tr style="border-top:1px solid black">
										<td><img id="results_fig" alt="" width="75%" src="images/results.png"></td>
									</tr>
								</section>

							<!-- Section -->
								<section id="dt">
									<header class="major">
										<h2>Double talk</h2>
									</header>
									<div class="posts">
										<article>
											<table cellspacing="50" style="width: 50%">
												<thead>
													<tr>
														<th><center>MIC</center> </th>
														<th><center>LPB(REF)</center> </th>
														<th><center>Uncompressed model</center> </th>
														<th><center>Dual-path</center> </th>
														<th><center>Dual-path</center> </th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp;</th>
														<th>&nbsp; </th>
														<th>&nbsp;</center></th>
														<th><center>T2 x F2</center></th>
														<th><center>T4 x F4</center></th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp;</th>
														<th>&nbsp; </th>
														<th><center>MACs: 1.8G/s</center></th>
														<th><center>MACs: 486M/s</center></th>
														<th><center>MACs: 140M/s</center></th>
													</tr>
												</thead>
												<tbody>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP15_SNRP99_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP15_SNRP99_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP15_SNRP99_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP15_SNRP99_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP15_SNRP99_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP15_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP15_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP15_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP15_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP15_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP99_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP99_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP99_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP99_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP05_SNRP99_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP15_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP15_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP15_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP15_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP15_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP99_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP99_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP99_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP99_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN05_SNRP99_0000001.wav.est.16.wav" ></audio></td>
													</tr>
												</tbody>
											</table>

											<!--<ul class="actions">
												<li><a href="#" class="button">More</a></li>
											</ul>-->
										</article>
									</div>
								</section>
								
							<!-- Section -->
								<section id="nest">
									<header class="major">
										<h2>Near-end single talk</h2>
									</header>
									<div class="posts">
										<article>
											<!--a href="#" class="image"><img src="images/pic02.jpg" alt="" /></--a>-->
											<table cellspacing="50" style="width: 50%">
												<thead>
													<tr>
														<th><center>MIC</center> </th>
														<th><center>Uncompressed model</center> </th>
														<th><center>Dual-path</center> </th>
														<th><center>Dual-path</center> </th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp; </th>
														<th>&nbsp;</center></th>
														<th><center>T2 x F2</center></th>
														<th><center>T4 x F4</center></th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp; </th>
														<th><center>MACs: 1.8G/s</center></th>
														<th><center>MACs: 486M/s</center></th>
														<th><center>MACs: 140M/s</center></th>
													</tr>
												</thead>
												<tbody>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000000.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000000.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000000.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000000.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP05_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000000.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000000.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000000.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000000.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRP99_SNRP15_0000001.wav.est.16.wav" ></audio></td>
													</tr>
												</tbody>
											</table>

											<!--<ul class="actions">
												<li><a href="#" class="button">More</a></li>
											</ul>-->
										</article>
									</div>
								</section>

							<!-- Section -->
								<section id="fest">
									<header class="major">
										<h2>Far-end single talk</h2>
									</header>
									<div class="posts">
										<article>
											<table cellspacing="50" style="width: 50%">
												<thead>
													<tr>
														<th><center>MIC</center> </th>
														<th><center>LPB(REF)</center> </th>
														<th><center>Uncompressed model</center> </th>
														<th><center>Dual-path</center> </th>
														<th><center>Dual-path</center> </th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp;</th>
														<th>&nbsp; </th>
														<th>&nbsp;</center></th>
														<th><center>T2 x F2</center></th>
														<th><center>T4 x F4</center></th>
													</tr>
													<tr style="font-size: small">
														<th>&nbsp;</th>
														<th>&nbsp; </th>
														<th><center>MACs: 1.8G/s</center></th>
														<th><center>MACs: 486M/s</center></th>
														<th><center>MACs: 140M/s</center></th>
													</tr>
												</thead>
												<tbody>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000000.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000000.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000000.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000000.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000000.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000001.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000001.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000001.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000001.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000001.wav.est.16.wav" ></audio></td>
													</tr>
													<tr style="border-top:1px solid black">
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000002.wav.mix.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000002.wav.lpb.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000002.wav.est.1.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000002.wav.est.4.wav" ></audio></td>
														<td><audio controls class="audio-player" preload="metadata" style="width:180px;">
														  <source src="demos_audios/test_SIRN99_SNRP99_0000002.wav.est.16.wav" ></audio></td>
													</tr>
												</tbody>
											</table>
											<!--<ul class="actions">
												<li><a href="#" class="button">More</a></li>
											</ul>-->
										</article>
									</div>
								</section>

							<!-- Section -->
								<section id="rtf_test">
									<header class="major">
										<h2>Real-time factor (RTF) test</h2>
									</header>
									<div class="posts">
										<article>
											<table cellspacing="50" style="width: 100%">
												<thead>
													<tr>
														<th> RTF test on Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz using a model with 13x compression rato (T4 x F4) </th>
													</tr>
													<tr>
														<th> The test is carried with frame-by-frame process to mimic real-world online processing.</th>
													</tr>
												</thead>
												<tbody>
													<tr style="border-top:1px solid black">
														<td><video id="rtf_test_video" width="150%" controls>
														  <source src="video/rtf_test.webm" ></video></td>
													</tr>
													
												</tbody>
											</table>
											<!--<ul class="actions">
												<li><a href="#" class="button">More</a></li>
											</ul>-->
										</article>
									</div>
								</section>

							<!-- Section -->
							<section>
								<header class="major">
									<h2>Related works</h2>
								</header>
								<p>[1] F. Dang, H. Chen, and P. Zhang, 
									“Dpt-fsnet: Dual-path transformer based full-band and sub-band fusion network for speech enhancement,” 
									in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022. IEEE, 202</p>
								<p>[2] S. Zhang, Z. Wang, J. Sun, Y. Fu, B. Tian, Q. Fu, and L. Xie,
									“Multi-task deep residual echo suppression with echo-aware loss,” 
									in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022. IEEE, 2022, pp. 9127–9
								</p>
								<p>[3] X. Hao and X. Li, 
									“Fast fullsubnet: Accelerate full-band and subband fusion model for single-channel speech enhancement,” arXiv preprint arXiv:2212.09019, 2022
								</p>
							</section>
						</div>
					</div>


				<!-- Sidebar -->
					<div id="sidebar">
						<div class="inner">
							<!-- Search
								<section id="search" class="alt">
									<form method="post" action="#">
										<input type="text" name="query" id="query" placeholder="Search" />
									</form>
								</section>
							-->
							<!-- Menu -->
								<nav id="menu">
									<header class="major">
										<h2>Menu</h2>
									</header>
									<ul>
										<li><a href="#banner">Abstract</a></li>
										<li><a href="#poster">Poster</a></li>
										<li><a href="#details">A Quick Review</a></li>
										<li><a href="#results_table">Detailed results</a></li>
										<li>
											<span class="opener">Audio samples</span>
											<ul>
												<li><a href="#dt">Double talk</a></li>
												<li><a href="#nest">Near-end single talk</a></li>
												<li><a href="#fest">Far-end single talk</a></li>
											</ul>
										</li>
										<li><a href="#rtf_test">Real-time factor (RTF) test</a></li>
									</ul>
								</nav>


								<footer id="footer">
									<p>
										<center>Created by Hangting Chen </center>
									</p>
									<p>
										<center>Augest 13th, 2023</center>
									</p>
									<p class="copyright" style="display: flex; flex-direction: column; align-items: center; justify-content: center;">
										<img src="images/tencent_icon.png" width="50%" alt="">
									</p>

								</footer>
						</div>
					</div>

			</div>

		<!-- Scripts -->
			<script src="assets/js/jquery.min.js"></script>
			<script src="assets/js/browser.min.js"></script>
			<script src="assets/js/breakpoints.min.js"></script>
			<script src="assets/js/util.js"></script>
			<script src="assets/js/main.js"></script>

	</body>
</html>