Dereverberation

Several papers have reported that reverberation degrades far-field speech recognition accuracy [TA05] [KDGH+16] [YN12]. Dereverberation can be achieved through super-Gaussian beamforming [KMB12] or the weighted prediction error (WPE) algorithm [YN12]. The BTK implements the single-channel and multi-channel WPE algorithms in the subband domain.

It is worth noting that there is another pure Python implementation of the WPE algorithm, Nara WPE. The main differences between the BTK 2.0 and Nara WPE implementations are:

  • Easy use of different multi-rate filter bank systems in addition to the naive overlap-add method based on the short-term FFT (illustrated in the sketch after this list), and
  • A C++ implementation of the core WPE computation, which makes processing faster than the pure Python implementation.
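
As a point of reference for the first difference, the sketch below shows what the naive short-term FFT overlap-add loop looks like in plain numpy. The function name, window choice, and parameters are illustrative assumptions, not BTK or Nara WPE code; the over-sampled DFT filter banks used in Listing 5 play the same analysis/synthesis role with better aliasing behavior.

    import numpy

    def stft_overlap_add(x, M=256, D=128):
        """Naive short-term FFT analysis/resynthesis with overlap-add (illustrative only)."""
        win = numpy.sqrt(numpy.hanning(M))            # analysis and synthesis window
        y = numpy.zeros(len(x))
        for start in range(0, len(x) - M + 1, D):
            frame = x[start:start + M] * win          # windowed frame
            spec = numpy.fft.rfft(frame)              # "subband" (frequency bin) coefficients
            # ... per-frame spectral processing (e.g., dereverberation) would go here ...
            y[start:start + M] += numpy.fft.irfft(spec, n=M) * win   # overlap-add resynthesis
        return y

With a square-root Hann window and a hop of M/2, this analysis/synthesis pair approximately reconstructs the input when no per-frame processing is applied.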

We will go through the example code Listing 5 step-by-step.

Subband Weighted Prediction Error (WPE) Algorithm

The “dereverberation” module in the BTK provides the single-channel and multi-channel WPE dereverberators. Notice that the filter estimation process is separated from the filtering process in the BTK 2.0, so you can apply a dereverberation filter estimated on a small portion of data to the rest of a recording.
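
A minimal sketch of that separation is shown below. It reuses only constructor and method calls that appear in Listing 5; the excerpt/recording file names and the specific parameter values are illustrative assumptions.

    import pickle
    from btk20.common import *
    from btk20.stream import *
    from btk20.feature import *
    from btk20.modulated import *
    from btk20.dereverberation import *

    # Filter bank parameters and prototypes, as in Listing 5
    M, m, r = 256, 4, 1
    D = M // 2**r                                 # frame shift
    samplerate = 16000
    with open('prototype.ny/h-M256-m4-r1.pickle', 'rb') as fp:
        h_fb = pickle.load(fp)                    # analysis prototype
    with open('prototype.ny/g-M256-m4-r1.pickle', 'rb') as fp:
        g_fb = pickle.load(fp)                    # synthesis prototype

    # Build the chain: audio reader -> analysis bank -> WPE dereverberator -> synthesis bank
    sample_feat = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=True)
    afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype=h_fb, M=M, m=m, r=r, delay_compensation_type=2)
    dereverb = SingleChannelWPEDereverberationFeaturePtr(afb, lower_num=0, upper_num=32,
                                                         iterations_num=2, load_db=-18.0,
                                                         band_width=0.0, samplerate=samplerate)
    sfb = OverSampledDFTSynthesisBankPtr(dereverb, prototype=g_fb, M=M, m=m, r=r, delay_compensation_type=2)

    # 1. Estimate the WPE filter on a short excerpt (hypothetical file name)
    sample_feat.read('excerpt_with_speech.wav', samplerate)
    dereverb.estimate_filter()

    # 2. Apply the fixed filter to the whole recording and resynthesize the time signal
    sample_feat.read('full_recording.wav', samplerate)
    for block in sfb:
        pass   # each `block` holds dereverberated time-domain samples ready to be written out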

Listing 5 shows how to use the “dereverberation” module to run the WPE dereverberator. Depending on the number of input audio files, the script switches between the single-channel dereverberator function single_channel_wpe(), defined at line 27, and the multi-channel version multi_channel_wpe(), defined at line 93.

Single channel WPE dereverberator

To try the single-channel WPE dereverberator, run unit_test/test_subband_dereverberator.py:

$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_dereverberator.py  \
  -i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
  -o out/U1001_1M_16k_b16_c1.wav

This will generate the dereverberated output as out/U1001_1M_16k_b16_c1.wav.

You can configure several WPE parameters by feeding the JSON file:

    {"lower_num":0,
    "upper_num":32,
    "iterations_num":2,
    "load_db": -18.0,
    "band_width":0.0,
    "diagonal_bias":0.0001
}

into the script as

$ python test_subband_dereverberator.py  \
  -c your_wpe_param.json \
  -i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
  -o out/U1001_1M_16k_b16_c1.wav

Tip

The reverberation suppression performance can be improved by increasing the WPE parameter ‘upper_num’ (specified at line 279 in Listing 5), which results in a longer dereverberation filter. Note that more filter coefficients also mean higher computational complexity.
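
For example, since the filter length equals upper_num - lower_num (see the comment at line 279 of Listing 5), the configuration below doubles the default filter length; the value 64 is only an illustration of the trade-off:

    {
        "lower_num": 0,
        "upper_num": 64,
        "iterations_num": 2,
        "load_db": -18.0,
        "band_width": 0.0,
        "diagonal_bias": 0.0001
    }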

To build the single channel WPE processor, unit_test/test_subband_dereverberator.py creates the feature pointer object graph:

  • SampleFeaturePtr(),
  • OverSampledDFTAnalysisBankPtr(),
  • SingleChannelWPEDereverberationFeaturePtr(), and
  • OverSampledDFTSynthesisBankPtr().

First, the script runs dereverberation filter estimation by calling SingleChannelWPEDereverberationFeaturePtr.estimate_filter() at line 73 in Listing 5. After filter estimation, the Python iterator is invoked to reconstruct the time-domain signal (at line 84). The iterator runs the next() method of each feature pointer object, pulling data back from OverSampledDFTSynthesisBankPtr() to SampleFeaturePtr().
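
This pull-based behavior can be pictured with ordinary Python generators. The sketch below is only an analogy for how each feature pointer object calls next() on its upstream object; it is not the BTK implementation, and the placeholder transforms are hypothetical.

    def sample_source(blocks):
        """Stands in for SampleFeaturePtr: yields blocks of time-domain samples."""
        for block in blocks:
            yield block

    def analysis_bank(upstream):
        """Stands in for OverSampledDFTAnalysisBankPtr: yields subband frames."""
        for block in upstream:
            yield [x * 1.0 for x in block]        # placeholder subband transform

    def dereverberator(upstream):
        """Stands in for SingleChannelWPEDereverberationFeaturePtr."""
        for frame in upstream:
            yield [x * 0.5 for x in frame]        # placeholder dereverberation filtering

    def synthesis_bank(upstream):
        """Stands in for OverSampledDFTSynthesisBankPtr: yields time-domain blocks."""
        for frame in upstream:
            yield frame

    # Iterating over the last stage pulls each block through every upstream stage,
    # just as iterating over `sfb` drives the whole chain in Listing 5.
    chain = synthesis_bank(dereverberator(analysis_bank(sample_source([[1.0, 2.0], [3.0, 4.0]]))))
    for block in chain:
        print(block)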

Multi-channel WPE dereverberator

The multi-channel WPE version can be tested by typing the following command:

$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_dereverberator.py  \
  -i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
     data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c2.wav \
     data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c3.wav \
     data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c4.wav \
  -o out/U1001_1M_16k_b16_c1.wav \
      out/U1001_1M_16k_b16_c2.wav \
      out/U1001_1M_16k_b16_c3.wav \
      out/U1001_1M_16k_b16_c4.wav \
   -b 26000 \
   -e 64000

Tip

Notice that the command above estimates the filter coefficients using the speech signal between the 26000th and 64000th frames of the input files. For the sake of computational efficiency, it is better not to include silent audio data; silence or weak signals will not lead to a good filter solution.

Warning

Do not feed identical audio files into the multi-channel WPE estimator; doing so causes numerical instability. If the input signals are very similar, consider running the single-channel WPE algorithm on each channel independently, which will also speed up filter estimation.
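
If you do fall back to per-channel processing, a minimal sketch reusing the single_channel_wpe() function from Listing 5 would be the loop below; it assumes the filter bank prototypes (h_fb, g_fb), the parameters (D, M, m, r), the WPE configuration, and the frame range have already been set up as in that script.

    input_audio_paths = ['data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c%d.wav' % c
                         for c in (1, 2, 3, 4)]
    out_paths = ['out/U1001_1M_16k_b16_c%d.wav' % c for c in (1, 2, 3, 4)]

    # Dereverberate each channel independently with the single-channel WPE
    for input_audio_path, out_path in zip(input_audio_paths, out_paths):
        single_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_path, out_path,
                           wpe_conf, samplerate, start_frame_no, end_frame_no)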

The multi-channel WPE dereverberator consists of

  • Multiple SampleFeaturePtr() instances,
  • Multiple OverSampledDFTAnalysisBankPtr() instances,
  • A MultiChannelWPEDereverberationPtr() instance,
  • Multiple MultiChannelWPEDereverberationFeaturePtr() instances, and
  • Multiple OverSampledDFTSynthesisBankPtr() instances.

In order to generate multiple outputs, the script calls the chain of iterators for each channel at line 173.
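
Condensed from Listing 5 (the object names mirror the script; the WPE parameter values shown are just the defaults from the listing, and the filter bank setup is assumed to exist as in that script), the multi-channel wiring and the synchronized per-channel iteration look like this:

    channels_num = len(input_audio_paths)
    # One shared estimator receives the subband output of every channel
    pre_dereverb = MultiChannelWPEDereverberationPtr(subbands_num=M, channels_num=channels_num,
                                                     lower_num=0, upper_num=32, iterations_num=2,
                                                     load_db=-20.0, band_width=0.0,
                                                     diagonal_bias=0.001, samplerate=samplerate)
    sample_feats = []
    for input_audio_path in input_audio_paths:
        sample_feat = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=True)
        sample_feat.read(input_audio_path, samplerate)
        afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype=h_fb, M=M, m=m, r=r, delay_compensation_type=2)
        pre_dereverb.set_input(afb)                # every channel feeds the same estimator
        sample_feats.append(sample_feat)

    pre_dereverb.estimate_filter()                 # joint estimation over all channels

    # Per-channel feature objects share the filters held by pre_dereverb
    sfbs = []
    for c in range(channels_num):
        sample_feats[c].read(input_audio_paths[c], samplerate)
        dereverb = MultiChannelWPEDereverberationFeaturePtr(pre_dereverb, channel_no=c)
        sfbs.append(OverSampledDFTSynthesisBankPtr(dereverb, prototype=g_fb, M=M, m=m, r=r, delay_compensation_type=2))

    # Pull one block from every channel per iteration so that the outputs stay time-aligned
    while True:
        try:
            blocks = [sfb.next() for sfb in sfbs]  # Python 2 iterator protocol, as in the script
        except StopIteration:
            break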

Listing 5 unit_test/test_subband_dereverberator.py
#!/usr/bin/python
"""
Test a single-channel or multi-channel weighted prediction error (WPE) dereverberator in the subband domain.

It will switch to the multi-channel WPE if multiple input audio files are specified.

.. reference::
[1] K. Kumatani, J. W. McDonough, S. Schachl, D. Klakow, P. N. Garner and W. Li, "Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming," in ICASSP, Las Vegas, USA, 2008.

[2] T. Yoshioka and T. Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Trans. Audio, Speech, and Language Processing, pp. 2707-2720, 2012.

.. moduleauthor:: John McDonough, Kenichi Kumatani <k_kumatani@ieee.org>
"""
import argparse, json
import os.path
import pickle
import wave
import sys
import numpy

from btk20.common import *
from btk20.stream import *
from btk20.feature import *
from btk20.modulated import *
from btk20.dereverberation import *

def single_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_path, out_path, wpe_conf, samplerate, start_frame_no, end_frame_no):
    """
    Run weighted prediction error (WPE) dereverberation on single channel data

    :param h_fb: Analysis filter coefficients; these must be generated with /tools/filterbank/{design_nyquist_filter.py,design_de_haan_filter.py}
    :type h_fb: numpy float vector
    :param g_fb: Synthesis filter coefficients paired with h_fb
    :type g_fb: numpy float vector
    :param M: Number of subbands
    :type M: integer
    :param m: Filter length factor
    :type m: integer
    :param D: Decimation factor
    :type D: integer
    :param input_audio_path: Input audio file that you would like to dereverberate
    :type input_audio_path: string
    :param out_path: Output audio file
    :type out_path: string
    :param wpe_conf: Dictionary to specify WPE parameters
    :type wpe_conf: Python dictionary
    :param samplerate: Sampling rate of the input audio
    :type samplerate: integer
    :param start_frame_no: Start point used for filter estimation
    :type start_frame_no: integer
    :param end_frame_no: End point used for filter estimation
    :type end_frame_no: integer
    """

    # Instantiation of an audio file reader
    sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
    # Instantiation of over-sampled DFT analysis filter bank
    afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)
    # Instantiation of single channel WPE dereverberator
    dereverb = SingleChannelWPEDereverberationFeaturePtr(afb,
                                                         lower_num = wpe_conf.get('lower_num', 0),
                                                         upper_num = wpe_conf.get('upper_num', 64),
                                                         iterations_num = wpe_conf.get('iterations_num', 2),
                                                         load_db = wpe_conf.get('load_db', -20.0),
                                                         band_width = wpe_conf.get('band_width', 0.0),
                                                         samplerate = samplerate)
    # Instantiation of synthesis filter bank
    sfb = OverSampledDFTSynthesisBankPtr(dereverb, prototype=g_fb, M=M, m=m, r=r, delay_compensation_type=2)

    # Estimate the dereverberation filter
    sample_feat.read(input_audio_path, samplerate)
    dereverb.print_objective_func(50)
    frame_num = dereverb.estimate_filter()
    print('%d frames are used for filter estimation' %frame_num)

    # Opening the output audio file
    wavefile = wave.open(out_path, 'w')
    wavefile.setnchannels(1)
    wavefile.setsampwidth(2)
    wavefile.setframerate(int(samplerate))

    # Run WPE dereverberation
    sample_feat.read(input_audio_path, samplerate)
    for frame_no, b in enumerate(sfb):
        if frame_no % 128 == 0:
            print('%0.2f sec. processed' %(frame_no * D / samplerate))
        storewave = numpy.array(b, numpy.int16)
        wavefile.writeframes(storewave.tostring())

    wavefile.close()


def multi_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths, out_paths, wpe_conf, samplerate, start_frame_no, end_frame_no):
    """
    Run weighted prediction error (WPE) on multi-channel data

    :param h_fb: Analysis filter coefficients; these must be generated with /tools/filterbank/{design_nyquist_filter.py,design_de_haan_filter.py}
    :type h_fb: numpy float vector
    :param g_fb: Synthesis filter coefficients paired with h_fb
    :type g_fb: numpy float vector
    :param M: Number of subbands
    :type M: integer
    :param m: Filter length factor
    :type m: integer
    :param D: Decimation factor
    :type D: integer
    :param input_audio_paths: List of input audio files that you would like to dereverberate
    :type input_audio_paths: list of strings
    :param out_paths: List of output audio files
    :type out_paths: list of strings
    :param wpe_conf: Dictionary to specify WPE parameters
    :type wpe_conf: Python dictionary
    :param samplerate: Sampling rate of the input audio
    :type samplerate: integer
    :param start_frame_no: Start point used for filter estimation
    :type start_frame_no: integer
    :param end_frame_no: End point used for filter estimation
    :type end_frame_no: integer
    """

    channels_num = len(input_audio_paths)
    # Instantiation of multi-channel dereverberation filter estimation based on WPE
    pre_dereverb = MultiChannelWPEDereverberationPtr(subbands_num=M, channels_num=channels_num,
                                                     lower_num = wpe_conf.get('lower_num', 0),
                                                     upper_num = wpe_conf.get('upper_num', 32),
                                                     iterations_num = wpe_conf.get('iterations_num', 2),
                                                     load_db = wpe_conf.get('load_db', -20.0),
                                                     band_width = wpe_conf.get('band_width', 0.0),
                                                     diagonal_bias = wpe_conf.get('diagonal_bias', 0.001),
                                                     samplerate = samplerate)
    pre_dereverb.print_objective_func(50)

    sample_feats = []
    afbs = []
    for c, input_audio_path in enumerate(input_audio_paths):
        # Instantiation of an audio file reader
        sample_feat = SampleFeaturePtr(block_len = D, shift_len = D, pad_zeros = True)
        sample_feat.read(input_audio_path, samplerate)
        # Instantiation of over-sampled DFT analysis filter bank
        afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype = h_fb, M = M, m = m, r = r, delay_compensation_type=2)
        pre_dereverb.set_input(afb)
        # Keep the instances
        sample_feats.append(sample_feat)
        afbs.append(afb)

    # build the dereverberation filter
    frame_num = pre_dereverb.estimate_filter()
    print('%d frames are used for filter estimation' %frame_num)

    sfbs = []
    wavefiles = []
    for c in range(channels_num):
        # Reread the test audio
        sample_feats[c].read(input_audio_paths[c], samplerate)
        # Instantiate the multi-channel WPE feature object
        dereverb = MultiChannelWPEDereverberationFeaturePtr(pre_dereverb, channel_no=c)
        sfb = OverSampledDFTSynthesisBankPtr(dereverb, prototype = g_fb, M = M, m = m, r = r, delay_compensation_type = 2)
        sfbs.append(sfb)
        # Open an output file pointer
        wavefile = wave.open(out_paths[c], 'w')
        wavefile.setnchannels(1)
        wavefile.setsampwidth(2)
        wavefile.setframerate(int(samplerate))
        wavefiles.append(wavefile)

    # Perform dereverberation on each channel data
    frame_no = 0
    while True:
        if frame_no % 128 == 0:
            print('%0.2f sec. processed' %(frame_no * D / samplerate))
        try:
            for c in range(channels_num):
                wavefiles[c].writeframes(numpy.array(sfbs[c].next(), numpy.int16).tostring())
        except StopIteration:
            break
        frame_no += 1

    # Close all the output file pointers
    for wavefile in wavefiles:
        wavefile.close()


def test_subband_dereverberator(analysis_filter_path,
                                synthesis_filter_path,
                                M, m, r,
                                input_audio_paths,
                                out_paths,
                                wpe_conf,
                                samplerate=16000,
                                start_frame_no = 0,
                                end_frame_no =  -1):

    assert len(input_audio_paths) == len(out_paths), 'No. input files have to be equal to no. output files'
    D = M / 2**r # frame shift

    # Read analysis prototype 'h'
    with open(analysis_filter_path, 'r') as fp:
        h_fb = pickle.load(fp)

    # Read synthesis prototype 'g'
    with open(synthesis_filter_path, 'r') as fp:
        g_fb = pickle.load(fp)

    for out_path in out_paths:
        if not os.path.exists(os.path.dirname(out_path)):
            try:
                os.makedirs(os.path.dirname(out_path))
            except:
                pass

    if len(input_audio_paths) == 1:
        single_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths[0], out_paths[0], wpe_conf, samplerate, start_frame_no, end_frame_no)
    else:
        multi_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths, out_paths, wpe_conf, samplerate, start_frame_no, end_frame_no)


def build_parser():

    M = 256
    m = 4
    r = 1

    protoPath    = 'prototype.ny'
    analysis_filter_path  = '%s/h-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)
    synthesis_filter_path = '%s/g-M%d-m%d-r%d.pickle' %(protoPath, M, m, r)

    default_input_audio_paths = ['data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c2.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c3.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c4.wav']
    default_out_paths =         ['out/U1001_1M_c1.wav',
                                 'out/U1001_1M_c2.wav',
                                 'out/U1001_1M_c3.wav',
                                 'out/U1001_1M_c4.wav']

    parser = argparse.ArgumentParser(description='test subband WPE dereverberation.')
    parser.add_argument('-a', dest='analysis_filter_path',
                        default=analysis_filter_path,
                        help='analysis filter prototype file')
    parser.add_argument('-s', dest='synthesis_filter_path',
                        default=synthesis_filter_path,
                        help='synthesis filter prototype file')
    parser.add_argument('-M', dest='M',
                        default=M, type=int,
                        help='no. of subbands')
    parser.add_argument('-m', dest='m',
                        default=m, type=int,
                        help='Prototype filter length factor')
    parser.add_argument('-r', dest='r',
                        default=r, type=int,
                        help='Decimation factor')
    parser.add_argument('-i', dest='input_audio_paths', nargs='+',
                        default=default_input_audio_paths,
                        help='observation audio file(s)')
    parser.add_argument('-o', dest='out_paths', nargs='+',
                        default=default_out_paths,
                        help='output audio file(s)')
    parser.add_argument('-c', dest='wpe_conf_path',
                        default=None,
                        help='JSON path for WPE dereverberator configuration')
    parser.add_argument('-b', dest='start_frame_no',
                        default=26000,
                        help='Start frame point for filter estimation')
    parser.add_argument('-e', dest='end_frame_no',
                        default=-1, # 62000
                        help='end frame point for filter estimation. Will be the end of the file if it is -1')

    return parser


if __name__ == '__main__':

    parser = build_parser()
    args = parser.parse_args()

    if args.wpe_conf_path is None:
        # Default WPE configuration
        wpe_conf={'lower_num':0,
                  'upper_num':32, # upper_num - lower_num == filter length,
                  'iterations_num':2,
                  'load_db': -18.0,
                  'band_width':0.0,
                  'diagonal_bias':0.0001, # Diagonal loading for Cholesky decomposition stabilization (Multi-channel WPE only)
        }
    else:
        with open(args.wpe_conf_path, 'r') as jsonfp:
            wpe_conf = json.load(jsonfp)

    print('WPE config.')
    print(json.dumps(wpe_conf, indent=4))
    print('')
    test_subband_dereverberator(args.analysis_filter_path,
                                args.synthesis_filter_path,
                                args.M, args.m, args.r,
                                args.input_audio_paths,
                                args.out_paths,
                                wpe_conf,
                                samplerate=16000,
                                start_frame_no=args.start_frame_no,
                                end_frame_no=args.end_frame_no)
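
The script is normally driven from the command line, but test_subband_dereverberator() can also be called directly from Python. Below is a minimal sketch, assuming unit_test/ is on your PYTHONPATH so that the module can be imported; all paths and values are taken from the defaults in Listing 5.

    from test_subband_dereverberator import test_subband_dereverberator

    wpe_conf = {'lower_num': 0, 'upper_num': 32, 'iterations_num': 2,
                'load_db': -18.0, 'band_width': 0.0, 'diagonal_bias': 0.0001}

    test_subband_dereverberator('prototype.ny/h-M256-m4-r1.pickle',   # analysis prototype
                                'prototype.ny/g-M256-m4-r1.pickle',   # synthesis prototype
                                256, 4, 1,                            # M, m, r
                                ['data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav'],
                                ['out/U1001_1M_16k_b16_c1.wav'],
                                wpe_conf,
                                samplerate=16000)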