Acoustic Echo Cancellation

Acoustic echo cancellation (AEC) techniques themselves are well-developed; see [HAY02] [TAS09] [HS04] for an overview and practical algorithms. However, there are still research activities such as an efficient combination method of AEC and multi-channel processing [HNK05] [TAS09] and residual echo suppression due to the non-linear distortion [FF18] [CSVH18]. Generally speaking, the use of the AEC before beamforming provides better speech enhancement performance at the expense of computational complexity [KEL97]. The BTK provides the NLMS AEC [HAY02] [TAS09] [HS04]. The BTK also implements the subband AEC with Kalman filters [MCKR+11] [MKR11] rather than the full band AEC [EV06] [FF18].

Listing 3 shows an example AEC script, unit_test/test_subband_aec.py in the Git repository. As shown in Listing 3, we first need to import the AEC Python module. We then set the analysis filterbank feature pointer object to an AEC object. The AEC feature instance is finally given to the synthesis filter bank in order to reconstruct the echo-suppressed signal.

Kalman Filtering AEC

By default, the sample script described in Listing 3 performs Kalman filtering-based AEC in the subband domain. To try it, run unit_test/test_subband_aec.py in the Git repository as

$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_aec.py  \
  -i data/speech_and_reverb_lt.wav \
  -p data/lt.wav \
  -o out/kf_aec_output.wav

Notice that the script uses the Nyquist(M) filter bank with M=256, m=4 and r=1, by default. You can see how well the Kalman filtering-based AEC can suppress the interfering music signal by comparing the input file (data/speech_and_reverb_lt.wav) with the output file (out/kf_aec_output.wav).

As specified from line 167 to to 177 in Listing 3, Kalman filtering-based AEC has a couple of hyper-parameters. The sensitivity of the the Kalman filtering AEC on recognition performance was investigated in [MCKR+11] [MKR11].

NLMS AEC

The BTK also provides the simplest AEC method based on the normalized least mean square (NMLS) algorithm as a reference. To do it, we just need to specify the NLMS option in a JSON file as

{ "type":"nlms",
   "energy_threshold":100,
   "delta":100,
   "epsilon":10E-4}

Then, we feed the JSON file to unit_test/test_subband_aec.py:

$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_aec.py  \
  -c nlms.json \
  -i data/speech_and_reverb_lt.wav \
  -p data/lt.wav \
  -o out/kf_aec_output.wav

In practice, the NLMS-based AEC methods need additional components to avoid diverging a filter estimate such as accurate double-talk detection or meticulous step size parameter setting. Kalman filters will not require such heuristic methods since the step size parameter can be adjusted in a statistical framework; see Chapter 10 in [HAY02] for the details of the difference.

Listing 6 unit_test/test_subband_aec.py
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
#!/usr/bin/python
"""
Test subband acoustic echo cancellation on the single channel data.

.. moduleauthor:: John McDonough, Kenichi Kumatani <k_kumatani@ieee.org>
"""
import argparse, json
import os.path
import pickle
import wave
import sys
import numpy

from btk20.common import *
from btk20.stream import *
from btk20.feature import *
from btk20.modulated import *
from btk20.aec import *

def test_subband_aec(analysis_filter_path,
                     synthesis_filter_path,
                     M, m, r,
                     input_audio_path,
                     reference_audio_path,
                     out_path,
                     aec_conf,
                     samplerate=16000):

    D = M / 2**r # frame shift

    # Read analysis prototype 'h'
    with open(analysis_filter_path, 'r') as fp:
        h_fb = pickle.load(fp)

    # Read synthesis prototype 'g'
    with open(synthesis_filter_path, 'r') as fp:
        g_fb = pickle.load(fp)

    # Instantiation of an audio file reader
    input_sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
    reference_sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
    # Instantiation of over-sampled DFT analysis filter bank
    input_afb = OverSampledDFTAnalysisBankPtr(input_sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)
    reference_afb = OverSampledDFTAnalysisBankPtr(reference_sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)

    # Instantiation of subband AEC
    if aec_conf['type'].lower() == 'information_filter':
        # Information Kalman filter AEC
        aec = InformationFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
                                                          sample_num = aec_conf.get('filter_length', 2),
                                                          beta = aec_conf.get('beta', 0.95),
                                                          sigmau2 = aec_conf.get('sigmau2', 10E-4),
                                                          sigmak2 = aec_conf.get('sigmak2', 5.0),
                                                          snr_threshold = aec_conf.get('snr_threshold', 0.01),
                                                          energy_threshold = aec_conf.get('energy_threshold', 100),
                                                          smooth = aec_conf.get('smooth', 0.9),
                                                          loading = aec_conf.get('loading', 1.0E-02),
                                                          amp4play = aec_conf.get('amp4play', 1.0))
    elif aec_conf['type'].lower() == 'square_root_information_filter':
        # Square root information filter
        aec = SquareRootInformationFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
                                                                    sample_num = aec_conf.get('filter_length', 2),
                                                                    beta = aec_conf.get('beta', 0.95),
                                                                    sigmau2 = aec_conf.get('sigmau2', 10E-4),
                                                                    sigmak2 = aec_conf.get('sigmak2', 5.0),
                                                                    snr_threshold = aec_conf.get('snr_threshold', 0.01),
                                                                    energy_threshold = aec_conf.get('energy_threshold', 100),
                                                                    smooth = aec_conf.get('smooth', 0.9),
                                                                    loading = aec_conf.get('loading', 1.0E-02),
                                                                    amp4play = aec_conf.get('amp4play', 1.0))
    elif aec_conf['type'].lower() == 'dtd_block_kalman_filter':
        # Kalman filtering AEC with double-talk detector (DTD)
        aec = DTDBlockKalmanFilterEchoCancellationFeaturePtr(reference_afb, input_afb,
                                                             sample_num = aec_conf.get('filter_length', 2),
                                                             beta = aec_conf.get('beta', 0.95),
                                                             sigmau2 = aec_conf.get('sigmau2', 10E-4),
                                                             sigmak2 = aec_conf.get('sigmak2', 5.0),
                                                             snr_threshold = aec_conf.get('snr_threshold', 0.01),
                                                             energy_threshold = aec_conf.get('energy_threshold', 100),
                                                             smooth = aec_conf.get('smooth', 0.9),
                                                             amp4play = aec_conf.get('amp4play', 1.0))
    elif aec_conf['type'].lower() == 'nlms':
        # Normalized least mean square AEC
        aec =  NLMSAcousticEchoCancellationFeaturePtr(reference_afb, input_afb,
                                                      delta = aec_conf.get('delta', 100.0),
                                                      epsilon = aec_conf.get('epsilon', 1.0E-04),
                                                      threshold = aec_conf.get('energy_threshold', 100.0))
    else:
        raise KeyError('Invalid AEC type {}'.format(aec_conf['type']))

    # Instantiation of over-sampled DFT synthesis filter bank
    sfb = OverSampledDFTSynthesisBankPtr(aec, prototype = g_fb, M = M, m = m, r = r, delay_compensation_type=2)
    # Read the observed audio file
    input_sample_feat.read(input_audio_path, samplerate)
    # Read the reference audio file
    reference_sample_feat.read(reference_audio_path, samplerate)

    if not os.path.exists(os.path.dirname(out_path)):
        try:
            os.makedirs(os.path.dirname(out_path))
        except:
            pass
    wavefile = wave.open(out_path, 'w')
    wavefile.setnchannels(1)
    wavefile.setsampwidth(2)
    wavefile.setframerate(int(samplerate))

    # Perform subband AEC through the oversampled DFT-modulated filer bank
    for frame_no, b in enumerate(sfb):
        if frame_no % 128 == 0:
            print('%0.2f sec. processed' %(frame_no * D / samplerate))
        storewave = numpy.array(b, numpy.int16)
        wavefile.writeframes(storewave.tostring())

    wavefile.close()


def build_parser():

    M = 256
    m = 4
    r = 1

    protoPath    = 'prototype.ny'
    analysis_filter_path  = '%s/h-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)
    synthesis_filter_path = '%s/g-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)

    parser = argparse.ArgumentParser(description='test subband AEC.')
    parser.add_argument('-a', dest='analysis_filter_path',
                        default=analysis_filter_path,
                        help='analysis filter prototype file')
    parser.add_argument('-s', dest='synthesis_filter_path',
                        default=synthesis_filter_path,
                        help='synthesis filter prototype file')
    parser.add_argument('-M', dest='M',
                        default=M, type=int,
                        help='no. of subbands')
    parser.add_argument('-m', dest='m',
                        default=m, type=int,
                        help='Prototype filter length factor')
    parser.add_argument('-r', dest='r',
                        default=r, type=int,
                        help='Decimation factor')
    parser.add_argument('-i', dest='input_audio_path',
                        default='data/speech_and_reverb_lt.wav',
                        help='observation audio file')
    parser.add_argument('-o', dest='out_path',
                        default='out/aec_output.wav',
                        help='output audio file')
    parser.add_argument('-p', dest='reference_audio_path',
                        default='data/lt.wav',
                        help='reference audio file')
    parser.add_argument('-c', dest='aec_conf_path',
                        default=None,
                        help='JSON path for AEC configuration')

    return parser


if __name__ == '__main__':

    parser = build_parser()
    args = parser.parse_args()

    if args.aec_conf_path is None:
        # Default AEC configuration
        aec_conf={'type':'dtd_block_kalman_filter', # 'information_filter' or 'square_root_information_filter'
                  'filter_length':36, # length of the subband Kalman filter
                  'loading':10e-4, # diagonal loading added to the information matrix
                  'sigmau2':10e-6, # initial variance
                  'sigmak2':5.0, # initial Kalman gain
                  'beta':0.95, # forgetting factor recursive observation noise variance estimation
                  'snr_threshold':0.01,
                  'energy_threshold':1.0E+01,
                  'smooth':0.95,
                  'amp4play':1.0,
                }
    else:
        with open(args.aec_conf_path, 'r') as jsonfp:
            aec_conf = json.load(jsonfp)

    print('AEC config.')
    print(json.dumps(aec_conf, indent=4))
    print('')
    test_subband_aec(args.analysis_filter_path,
                     args.synthesis_filter_path,
                     args.M, args.m, args.r,
                     args.input_audio_path,
                     args.reference_audio_path,
                     args.out_path,
                     aec_conf,
                     samplerate=16000)