Acoustic Echo Cancellation¶
Acoustic echo cancellation (AEC) techniques themselves are well-developed; see [HAY02] [TAS09] [HS04] for an overview and practical algorithms. However, there are still research activities such as an efficient combination method of AEC and multi-channel processing [HNK05] [TAS09] and residual echo suppression due to the non-linear distortion [FF18] [CSVH18]. Generally speaking, the use of the AEC before beamforming provides better speech enhancement performance at the expense of computational complexity [KEL97]. The BTK provides the NLMS AEC [HAY02] [TAS09] [HS04]. The BTK also implements the subband AEC with Kalman filters [MCKR+11] [MKR11] rather than the full band AEC [EV06] [FF18].
Listing 3 shows an example AEC script, unit_test/test_subband_aec.py in the Git repository. As shown in Listing 3, we first need to import the AEC Python module. We then set the analysis filterbank feature pointer object to an AEC object. The AEC feature instance is finally given to the synthesis filter bank in order to reconstruct the echo-suppressed signal.
Kalman Filtering AEC¶
By default, the sample script described in Listing 3 performs Kalman filtering-based AEC in the subband domain. To try it, run unit_test/test_subband_aec.py in the Git repository as
$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_aec.py \
-i data/speech_and_reverb_lt.wav \
-p data/lt.wav \
-o out/kf_aec_output.wav
Notice that the script uses the Nyquist(M) filter bank with M=256, m=4 and r=1, by default. You can see how well the Kalman filtering-based AEC can suppress the interfering music signal by comparing the input file (data/speech_and_reverb_lt.wav) with the output file (out/kf_aec_output.wav).
As specified from line 167 to to 177 in Listing 3, Kalman filtering-based AEC has a couple of hyper-parameters. The sensitivity of the the Kalman filtering AEC on recognition performance was investigated in [MCKR+11] [MKR11].
NLMS AEC¶
The BTK also provides the simplest AEC method based on the normalized least mean square (NMLS) algorithm as a reference. To do it, we just need to specify the NLMS option in a JSON file as
{ "type":"nlms",
"energy_threshold":100,
"delta":100,
"epsilon":10E-4}
Then, we feed the JSON file to unit_test/test_subband_aec.py:
$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_aec.py \
-c nlms.json \
-i data/speech_and_reverb_lt.wav \
-p data/lt.wav \
-o out/kf_aec_output.wav
In practice, the NLMS-based AEC methods need additional components to avoid diverging a filter estimate such as accurate double-talk detection or meticulous step size parameter setting. Kalman filters will not require such heuristic methods since the step size parameter can be adjusted in a statistical framework; see Chapter 10 in [HAY02] for the details of the difference.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 | #!/usr/bin/python
"""
Test subband acoustic echo cancellation on the single channel data.
.. moduleauthor:: John McDonough, Kenichi Kumatani <k_kumatani@ieee.org>
"""
import argparse, json
import os.path
import pickle
import wave
import sys
import numpy
from btk20.common import *
from btk20.stream import *
from btk20.feature import *
from btk20.modulated import *
from btk20.aec import *
def test_subband_aec(analysis_filter_path,
synthesis_filter_path,
M, m, r,
input_audio_path,
reference_audio_path,
out_path,
aec_conf,
samplerate=16000):
D = M / 2**r # frame shift
# Read analysis prototype 'h'
with open(analysis_filter_path, 'r') as fp:
h_fb = pickle.load(fp)
# Read synthesis prototype 'g'
with open(synthesis_filter_path, 'r') as fp:
g_fb = pickle.load(fp)
# Instantiation of an audio file reader
input_sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
reference_sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
# Instantiation of over-sampled DFT analysis filter bank
input_afb = OverSampledDFTAnalysisBankPtr(input_sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)
reference_afb = OverSampledDFTAnalysisBankPtr(reference_sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)
# Instantiation of subband AEC
if aec_conf['type'].lower() == 'information_filter':
# Information Kalman filter AEC
aec = InformationFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
sample_num = aec_conf.get('filter_length', 2),
beta = aec_conf.get('beta', 0.95),
sigmau2 = aec_conf.get('sigmau2', 10E-4),
sigmak2 = aec_conf.get('sigmak2', 5.0),
snr_threshold = aec_conf.get('snr_threshold', 0.01),
energy_threshold = aec_conf.get('energy_threshold', 100),
smooth = aec_conf.get('smooth', 0.9),
loading = aec_conf.get('loading', 1.0E-02),
amp4play = aec_conf.get('amp4play', 1.0))
elif aec_conf['type'].lower() == 'square_root_information_filter':
# Square root information filter
aec = SquareRootInformationFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
sample_num = aec_conf.get('filter_length', 2),
beta = aec_conf.get('beta', 0.95),
sigmau2 = aec_conf.get('sigmau2', 10E-4),
sigmak2 = aec_conf.get('sigmak2', 5.0),
snr_threshold = aec_conf.get('snr_threshold', 0.01),
energy_threshold = aec_conf.get('energy_threshold', 100),
smooth = aec_conf.get('smooth', 0.9),
loading = aec_conf.get('loading', 1.0E-02),
amp4play = aec_conf.get('amp4play', 1.0))
elif aec_conf['type'].lower() == 'dtd_block_kalman_filter':
# Kalman filtering AEC with double-talk detector (DTD)
aec = DTDBlockKalmanFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
sample_num = aec_conf.get('filter_length', 2),
beta = aec_conf.get('beta', 0.95),
sigmau2 = aec_conf.get('sigmau2', 10E-4),
sigmak2 = aec_conf.get('sigmak2', 5.0),
snr_threshold = aec_conf.get('snr_threshold', 0.01),
energy_threshold = aec_conf.get('energy_threshold', 100),
smooth = aec_conf.get('smooth', 0.9),
amp4play = aec_conf.get('amp4play', 1.0))
elif aec_conf['type'].lower() == 'nlms':
# Normalized least mean square AEC
aec = NLMSAcousticEchoCancellationFeaturePtr(reference_afb, input_afb,
delta = aec_conf.get('delta', 100.0),
epsilon = aec_conf.get('epsilon', 1.0E-04),
threshold = aec_conf.get('energy_threshold', 100.0))
else:
raise KeyError('Invalid AEC type {}'.format(aec_conf['type']))
# Instantiation of over-sampled DFT synthesis filter bank
sfb = OverSampledDFTSynthesisBankPtr(aec, prototype = g_fb, M = M, m = m, r = r, delay_compensation_type=2)
# Read the observed audio file
input_sample_feat.read(input_audio_path, samplerate)
# Read the reference audio file
reference_sample_feat.read(reference_audio_path, samplerate)
if not os.path.exists(os.path.dirname(out_path)):
try:
os.makedirs(os.path.dirname(out_path))
except:
pass
wavefile = wave.open(out_path, 'w')
wavefile.setnchannels(1)
wavefile.setsampwidth(2)
wavefile.setframerate(int(samplerate))
# Perform subband AEC through the oversampled DFT-modulated filer bank
for frame_no, b in enumerate(sfb):
if frame_no % 128 == 0:
print('%0.2f sec. processed' %(frame_no * D / samplerate))
storewave = numpy.array(b, numpy.int16)
wavefile.writeframes(storewave.tostring())
wavefile.close()
def build_parser():
M = 256
m = 4
r = 1
protoPath = 'prototype.ny'
analysis_filter_path = '%s/h-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)
synthesis_filter_path = '%s/g-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)
parser = argparse.ArgumentParser(description='test subband AEC.')
parser.add_argument('-a', dest='analysis_filter_path',
default=analysis_filter_path,
help='analysis filter prototype file')
parser.add_argument('-s', dest='synthesis_filter_path',
default=synthesis_filter_path,
help='synthesis filter prototype file')
parser.add_argument('-M', dest='M',
default=M, type=int,
help='no. of subbands')
parser.add_argument('-m', dest='m',
default=m, type=int,
help='Prototype filter length factor')
parser.add_argument('-r', dest='r',
default=r, type=int,
help='Decimation factor')
parser.add_argument('-i', dest='input_audio_path',
default='data/speech_and_reverb_lt.wav',
help='observation audio file')
parser.add_argument('-o', dest='out_path',
default='out/aec_output.wav',
help='output audio file')
parser.add_argument('-p', dest='reference_audio_path',
default='data/lt.wav',
help='reference audio file')
parser.add_argument('-c', dest='aec_conf_path',
default=None,
help='JSON path for AEC configuration')
return parser
if __name__ == '__main__':
parser = build_parser()
args = parser.parse_args()
if args.aec_conf_path is None:
# Default AEC configuration
aec_conf={'type':'dtd_block_kalman_filter', # 'information_filter' or 'square_root_information_filter'
'filter_length':36, # length of the subband Kalman filter
'loading':10e-4, # diagonal loading added to the information matrix
'sigmau2':10e-6, # initial variance
'sigmak2':5.0, # initial Kalman gain
'beta':0.95, # forgetting factor recursive observation noise variance estimation
'snr_threshold':0.01,
'energy_threshold':1.0E+01,
'smooth':0.95,
'amp4play':1.0,
}
else:
with open(args.aec_conf_path, 'r') as jsonfp:
aec_conf = json.load(jsonfp)
print('AEC config.')
print(json.dumps(aec_conf, indent=4))
print('')
test_subband_aec(args.analysis_filter_path,
args.synthesis_filter_path,
args.M, args.m, args.r,
args.input_audio_path,
args.reference_audio_path,
args.out_path,
aec_conf,
samplerate=16000)
|