Dereverberation¶
Several papers have reported that reverberation degrades far-field speech recognition accuracy [TA05] [KDGH+16] [YN12]. Dereverberation can be achieved through super-Gaussian beamforming [KMB12] or the weighted prediction error (WPE) algorithm [YN12]. The BTK implements the single-channel and multi-channel WPE algorithms in the subband domain.
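In each subband, WPE estimates a delayed linear-prediction filter and subtracts the predicted late reverberation, reweighting the estimation by the time-varying variance of the desired signal. The core recursion can be sketched for a single subband as follows (a simplified NumPy illustration, not the BTK implementation):

```python
import numpy as np

def wpe_single_band(y, taps=10, delay=3, iterations=2, eps=1e-8):
    """Single-band WPE sketch: y is a complex subband signal of length T."""
    T = len(y)
    # Stack delayed observations: Y[k, t] = y[t - delay - k]
    Y = np.zeros((taps, T), dtype=complex)
    for k in range(taps):
        Y[k, delay + k:] = y[:T - delay - k]
    d = y.copy()
    for _ in range(iterations):
        lam = np.maximum(np.abs(d) ** 2, eps)  # time-varying variance estimate
        Yw = Y / lam                           # variance-weighted delayed observations
        R = Yw @ Y.conj().T                    # weighted correlation matrix
        p = Yw @ y.conj()                      # weighted cross-correlation vector
        g = np.linalg.solve(R, p)              # prediction filter
        d = y - g.conj() @ Y                   # subtract predicted late reverberation
    return d
```

Increasing `taps` (the counterpart of the script's `upper_num` parameter below) lengthens the filter and raises the cost of solving the normal equations.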
Note that there is another pure Python implementation of the WPE algorithm, Nara WPE. The main differences between Nara WPE and BTK2.0 are:
- Easy use of different multi-rate filter bank systems in addition to the naive overlap-add method based on the short-term FFT, and
- A C++ implementation of the core WPE process, which makes processing faster than the pure Python implementation.
We will go through the example code in Listing 5 step by step.
Subband Weighted Prediction Error (WPE) Algorithm¶
The “dereverberation” module in the BTK provides the single channel and multi-channel WPE dereverberators. Note that the filter estimation process is separated from the filtering process in BTK2.0, so you can apply a dereverberation filter estimated from a small portion of data to the rest of a recording.
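This separation means a filter fitted on a short, speech-rich segment can be reused on the whole recording. Conceptually (a plain NumPy sketch with a simple unweighted least-squares predictor, not the BTK API):

```python
import numpy as np

def _delayed_taps(y, taps, delay):
    # Matrix of delayed copies: Y[k, t] = y[t - delay - k]
    Y = np.zeros((taps, len(y)), dtype=complex)
    for k in range(taps):
        Y[k, delay + k:] = y[:len(y) - delay - k]
    return Y

def fit_prediction_filter(segment, taps=8, delay=3):
    # Fit the delayed linear predictor on a small portion of the data only ...
    Y = _delayed_taps(segment, taps, delay)
    g, *_ = np.linalg.lstsq(Y.T, segment, rcond=None)
    return g

def apply_dereverb(y, g, delay=3):
    # ... then apply it to the whole recording
    Y = _delayed_taps(y, len(g), delay)
    return y - g @ Y
```

For example, `g = fit_prediction_filter(y[:2000])` followed by `d = apply_dereverb(y, g)` estimates on the first 2000 frames and filters everything.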
Listing 5 shows how to use the “dereverberation” module to run the WPE dereverberator. Depending the number of input audio files, the script switches the single channel dereverberator function single_channel_wpe() defined at line 27 to the multi-channel versionmulti_channel_wpe() declared at line 183.
Single channel WPE dereverberator¶
To try the single channel WPE, run unit_test/test_subband_dereverberator.py:
$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_dereverberator.py \
-i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
-o out/U1001_1M_16k_b16_c1.wav
This will generate the dereverberated output as out/U1001_1M_16k_b16_c1.wav.
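To sanity-check the output, its basic properties can be inspected with the standard wave module (a small helper, not part of the BTK):

```python
import wave

def describe_wav(path):
    """Return (channels, samplerate, frames) of a WAV file."""
    with wave.open(path, 'rb') as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()
```

The dereverberated file should report one channel at the input sampling rate.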
You can configure several WPE parameters by feeding a JSON file:
{"lower_num":0,
"upper_num":32,
"iterations_num":2,
"load_db": -18.0,
"band_width":0.0,
"diagonal_bias":0.0001
}
into the script as
$ python test_subband_dereverberator.py \
-c your_wpe_param.json \
-i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
-o out/U1001_1M_16k_b16_c1.wav
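Any key omitted from the JSON file falls back to a built-in default, since the script looks each parameter up with wpe_conf.get(key, default). A sketch of that fallback behavior (the helper name is illustrative; the defaults mirror the JSON example above):

```python
import json

# Defaults mirroring the script's default WPE configuration
DEFAULT_WPE_CONF = {'lower_num': 0,
                    'upper_num': 32,
                    'iterations_num': 2,
                    'load_db': -18.0,
                    'band_width': 0.0,
                    'diagonal_bias': 0.0001}

def load_wpe_conf(path=None):
    conf = dict(DEFAULT_WPE_CONF)
    if path is not None:
        with open(path, 'r') as fp:
            conf.update(json.load(fp))  # user values override the defaults
    return conf
```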
Tip
The reverberation suppression performance can be improved by increasing the WPE parameter ‘upper_num’ (see the default WPE configuration in Listing 5). This results in a longer dereverberation filter. Note that more filter coefficients also mean higher computational complexity.
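The filter length equals upper_num - lower_num (per the comment in the script's default configuration), and the dominant estimation cost is solving the per-subband normal equations, which grows cubically in the stacked filter length. A rough back-of-the-envelope helper (illustrative, not a BTK function; it assumes taps are stacked across channels as in standard multi-channel linear prediction):

```python
def wpe_solve_cost(lower_num, upper_num, channels_num=1):
    """Return (stacked filter length, relative cost of one normal-equation solve)."""
    taps = (upper_num - lower_num) * channels_num  # taps stacked over channels
    return taps, taps ** 3                         # Gaussian elimination is O(taps^3)
```

Doubling upper_num doubles the filter length but makes each solve roughly eight times more expensive.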
To build the single channel WPE processor, unit_test/test_subband_dereverberator.py creates the feature pointer object graph:
- SampleFeaturePtr(),
- OverSampledDFTAnalysisBankPtr(),
- SingleChannelWPEDereverberationFeaturePtr(), and
- OverSampledDFTSynthesisBankPtr().
First, the script runs dereverberation filter estimation by calling SingleChannelWPEDereverberationFeaturePtr.estimate_filter() (see Listing 5). After filter estimation, the Python iterator is invoked to reconstruct the time signal. The iterator runs the next() method of each feature pointer object, pulling back from OverSampledDFTSynthesisBankPtr() to SampleFeaturePtr().
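This pull-based pattern, where iterating the last node drags frames through every upstream node, can be mimicked in a few lines (the class names here are illustrative stand-ins, not BTK classes):

```python
class Source:
    """Stands in for SampleFeaturePtr: yields raw frames."""
    def __init__(self, frames):
        self.frames = frames
    def __iter__(self):
        return iter(self.frames)

class Stage:
    """Stands in for a processing node: pulls frames from upstream on demand."""
    def __init__(self, upstream, fn):
        self.upstream = upstream
        self.fn = fn
    def __iter__(self):
        for frame in self.upstream:
            yield self.fn(frame)

src = Source([1.0, 2.0, 3.0])
analysis = Stage(src, lambda x: 2 * x)       # analysis bank stand-in
dereverb = Stage(analysis, lambda x: x - 1)  # dereverberator stand-in
result = list(dereverb)                      # pulls frames through the whole chain
```

Nothing is computed until the last stage is iterated, which is why the script only needs to drive the synthesis bank.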
Multi-channel WPE dereverberator¶
The multi-channel WPE version can be tested by typing the following command:
$ cd ${your_btk_git_repository}/unit_test
$ python test_subband_dereverberator.py \
-i data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav \
data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c2.wav \
data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c3.wav \
data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c4.wav \
-o out/U1001_1M_16k_b16_c1.wav \
out/U1001_1M_16k_b16_c2.wav \
out/U1001_1M_16k_b16_c3.wav \
out/U1001_1M_16k_b16_c4.wav \
-b 26000 \
-e 64000
Tip
Notice that the command above estimates the filter coefficients from the speech signal present between the 26000th and the 64000th frame of the input files. For both computational efficiency and estimation accuracy, avoid using silent audio: silence or weak signals will not lead to a good filter solution.
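One crude way to find a speech-heavy region for -b/-e is to scan for the most energetic window (a simple heuristic, not part of the BTK; it assumes, as in the example above, that the frame indices are sample positions at the input rate):

```python
import numpy as np

def most_energetic_segment(x, samplerate, win_sec=2.0):
    """Return (start, end) sample indices of the highest-energy window."""
    win = int(win_sec * samplerate)
    # Sliding-window energy via convolution with a rectangular window
    energy = np.convolve(np.asarray(x, dtype=float) ** 2, np.ones(win), mode='valid')
    start = int(np.argmax(energy))
    return start, start + win
```

The returned indices can then be passed to the script as `-b start -e end`.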
Warning
Do not feed multiple identical audio files into the multi-channel WPE estimator; doing so causes numerical instability. If the input signals are very similar, consider running the single channel WPE algorithm on each channel independently, which will also speed up filter estimation.
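Before running the multi-channel estimator, it can therefore be worth checking that the channels are not near-duplicates. A simple correlation test (a hypothetical helper, not a BTK function):

```python
import numpy as np

def channels_near_identical(x, threshold=0.999):
    """x: (channels, samples) array. True if any channel pair is nearly
    identical, which would make the multi-channel WPE statistics
    ill-conditioned."""
    xc = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(xc, axis=1)
    corr = (xc @ xc.T) / np.outer(norms, norms)  # normalized cross-correlation
    off_diag = corr[~np.eye(x.shape[0], dtype=bool)]
    return bool(np.any(np.abs(off_diag) > threshold))
```

If the check fires, fall back to per-channel single channel WPE as suggested above.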
The multi-channel WPE dereverberator consists of
- Multiple SampleFeaturePtr() instances,
- Multiple OverSampledDFTAnalysisBankPtr() instances,
- A MultiChannelWPEDereverberationPtr() instance,
- Multiple MultiChannelWPEDereverberationFeaturePtr() instances, and
- Multiple OverSampledDFTSynthesisBankPtr() instances.
To generate multiple outputs, the script runs the iterator chain for each channel (see Listing 5).
Listing 5: unit_test/test_subband_dereverberator.py

#!/usr/bin/python
"""
Test a single-channel or multi-channel weighted prediction error (WPE) dereverberator in the subband domain.

It will switch to the multi-channel WPE if multiple input audio files are specified.

.. reference::
[1] K. Kumatani, J. W. McDonough, S. Schachl, D. Klakow, P. N. Garner and W. Li, "Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming," in ICASSP, Las Vegas, USA, 2008.
[2] T. Yoshioka and T. Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Trans. Audio, Speech, Language Process., pp. 2707-2720, 2012.

.. moduleauthor:: John McDonough, Kenichi Kumatani <k_kumatani@ieee.org>
"""
import argparse, json
import os.path
import pickle
import wave
import sys
import numpy

from btk20.common import *
from btk20.stream import *
from btk20.feature import *
from btk20.modulated import *
from btk20.dereverberation import *


def single_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_path, out_path, wpe_conf, samplerate, start_frame_no, end_frame_no):
    """
    Run weighted prediction error (WPE) dereverberation on single channel data

    :param h_fb: Analysis filter coefficients; this must be generated with /tools/filterbank/{design_nyquist_filter.py,design_de_haan_filter.py}
    :type h_fb: numpy float vector
    :param g_fb: Synthesis filter coefficients paired with h_fb
    :type g_fb: numpy float vector
    :param D: Decimation factor (frame shift)
    :type D: integer
    :param M: Number of subbands
    :type M: integer
    :param m: Filter length factor
    :type m: integer
    :param r: Exponent of the decimation factor (D = M / 2**r)
    :type r: integer
    :param input_audio_path: Input audio file that you would like to dereverberate
    :type input_audio_path: string
    :param out_path: Output audio file
    :type out_path: string
    :param wpe_conf: Dictionary to specify WPE parameters
    :type wpe_conf: Python dictionary
    :param samplerate: Sampling rate of the input audio
    :type samplerate: integer
    :param start_frame_no: Start point used for filter estimation
    :type start_frame_no: integer
    :param end_frame_no: End point used for filter estimation
    :type end_frame_no: integer
    """
    # Instantiation of an audio file reader
    sample_feat = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=True)
    # Instantiation of over-sampled DFT analysis filter bank
    afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype=h_fb, M=M, m=m, r=r, delay_compensation_type=2)
    # Instantiation of single channel WPE dereverberator
    dereverb = SingleChannelWPEDereverberationFeaturePtr(afb,
                                                         lower_num=wpe_conf.get('lower_num', 0),
                                                         upper_num=wpe_conf.get('upper_num', 64),
                                                         iterations_num=wpe_conf.get('iterations_num', 2),
                                                         load_db=wpe_conf.get('load_db', -20.0),
                                                         band_width=wpe_conf.get('band_width', 0.0),
                                                         samplerate=samplerate)
    # Instantiation of synthesis filter bank
    sfb = OverSampledDFTSynthesisBankPtr(dereverb, prototype=g_fb, M=M, m=m, r=r, delay_compensation_type=2)

    # Estimate the dereverberation filter
    sample_feat.read(input_audio_path, samplerate)
    dereverb.print_objective_func(50)
    frame_num = dereverb.estimate_filter()
    print('%d frames are used for filter estimation' % frame_num)

    # Open the output audio file
    wavefile = wave.open(out_path, 'w')
    wavefile.setnchannels(1)
    wavefile.setsampwidth(2)
    wavefile.setframerate(int(samplerate))

    # Run WPE dereverberation
    sample_feat.read(input_audio_path, samplerate)
    for frame_no, b in enumerate(sfb):
        if frame_no % 128 == 0:
            print('%0.2f sec. processed' % (frame_no * D / samplerate))
        storewave = numpy.array(b, numpy.int16)
        wavefile.writeframes(storewave.tobytes())

    wavefile.close()


def multi_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths, out_paths, wpe_conf, samplerate, start_frame_no, end_frame_no):
    """
    Run weighted prediction error (WPE) dereverberation on multi-channel data

    :param h_fb: Analysis filter coefficients; this must be generated with /tools/filterbank/{design_nyquist_filter.py,design_de_haan_filter.py}
    :type h_fb: numpy float vector
    :param g_fb: Synthesis filter coefficients paired with h_fb
    :type g_fb: numpy float vector
    :param D: Decimation factor (frame shift)
    :type D: integer
    :param M: Number of subbands
    :type M: integer
    :param m: Filter length factor
    :type m: integer
    :param r: Exponent of the decimation factor (D = M / 2**r)
    :type r: integer
    :param input_audio_paths: List of input audio files that you would like to dereverberate
    :type input_audio_paths: list of strings
    :param out_paths: List of output audio files
    :type out_paths: list of strings
    :param wpe_conf: Dictionary to specify WPE parameters
    :type wpe_conf: Python dictionary
    :param samplerate: Sampling rate of the input audio
    :type samplerate: integer
    :param start_frame_no: Start point used for filter estimation
    :type start_frame_no: integer
    :param end_frame_no: End point used for filter estimation
    :type end_frame_no: integer
    """
    channels_num = len(input_audio_paths)
    # Instantiation of multi-channel dereverberation filter estimation based on WPE
    pre_dereverb = MultiChannelWPEDereverberationPtr(subbands_num=M, channels_num=channels_num,
                                                     lower_num=wpe_conf.get('lower_num', 0),
                                                     upper_num=wpe_conf.get('upper_num', 32),
                                                     iterations_num=wpe_conf.get('iterations_num', 2),
                                                     load_db=wpe_conf.get('load_db', -20.0),
                                                     band_width=wpe_conf.get('band_width', 0.0),
                                                     diagonal_bias=wpe_conf.get('diagonal_bias', 0.001),
                                                     samplerate=samplerate)
    pre_dereverb.print_objective_func(50)

    sample_feats = []
    afbs = []
    for c, input_audio_path in enumerate(input_audio_paths):
        # Instantiation of an audio file reader
        sample_feat = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=True)
        sample_feat.read(input_audio_path, samplerate)
        # Instantiation of over-sampled DFT analysis filter bank
        afb = OverSampledDFTAnalysisBankPtr(sample_feat, prototype=h_fb, M=M, m=m, r=r, delay_compensation_type=2)
        pre_dereverb.set_input(afb)
        # Keep the instances
        sample_feats.append(sample_feat)
        afbs.append(afb)

    # Build the dereverberation filter
    frame_num = pre_dereverb.estimate_filter()
    print('%d frames are used for filter estimation' % frame_num)

    sfbs = []
    wavefiles = []
    for c in range(channels_num):
        # Reread the test audio
        sample_feats[c].read(input_audio_paths[c], samplerate)
        # Instantiate the multi-channel WPE feature object
        dereverb = MultiChannelWPEDereverberationFeaturePtr(pre_dereverb, channel_no=c)
        sfb = OverSampledDFTSynthesisBankPtr(dereverb, prototype=g_fb, M=M, m=m, r=r, delay_compensation_type=2)
        sfbs.append(sfb)
        # Open an output file pointer
        wavefile = wave.open(out_paths[c], 'w')
        wavefile.setnchannels(1)
        wavefile.setsampwidth(2)
        wavefile.setframerate(int(samplerate))
        wavefiles.append(wavefile)

    # Perform dereverberation on each channel
    frame_no = 0
    while True:
        if frame_no % 128 == 0:
            print('%0.2f sec. processed' % (frame_no * D / samplerate))
        try:
            for c in range(channels_num):
                wavefiles[c].writeframes(numpy.array(sfbs[c].next(), numpy.int16).tobytes())
        except StopIteration:
            break
        frame_no += 1

    # Close all the output file pointers
    for wavefile in wavefiles:
        wavefile.close()


def test_subband_dereverberator(analysis_filter_path,
                                synthesis_filter_path,
                                M, m, r,
                                input_audio_paths,
                                out_paths,
                                wpe_conf,
                                samplerate=16000,
                                start_frame_no=0,
                                end_frame_no=-1):

    assert len(input_audio_paths) == len(out_paths), 'The number of input files must equal the number of output files'
    D = M // 2**r  # frame shift

    # Read analysis prototype 'h'
    with open(analysis_filter_path, 'rb') as fp:
        h_fb = pickle.load(fp)
    # Read synthesis prototype 'g'
    with open(synthesis_filter_path, 'rb') as fp:
        g_fb = pickle.load(fp)

    # Create the output directories if necessary
    for out_path in out_paths:
        if not os.path.exists(os.path.dirname(out_path)):
            try:
                os.makedirs(os.path.dirname(out_path))
            except OSError:
                pass

    if len(input_audio_paths) == 1:
        single_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths[0], out_paths[0], wpe_conf, samplerate, start_frame_no, end_frame_no)
    else:
        multi_channel_wpe(h_fb, g_fb, D, M, m, r, input_audio_paths, out_paths, wpe_conf, samplerate, start_frame_no, end_frame_no)


def build_parser():
    M = 256
    m = 4
    r = 1
    protoPath = 'prototype.ny'
    analysis_filter_path = '%s/h-M%d-m%d-r%d.pickle' % (protoPath, M, m, r)
    synthesis_filter_path = '%s/g-M%d-m%d-r%d.pickle' % (protoPath, M, m, r)

    default_input_audio_paths = ['data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c1.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c2.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c3.wav',
                                 'data/CMU/R1/M1005/KINECT/RAW/segmented/U1001_1M_16k_b16_c4.wav']
    default_out_paths = ['out/U1001_1M_c1.wav',
                         'out/U1001_1M_c2.wav',
                         'out/U1001_1M_c3.wav',
                         'out/U1001_1M_c4.wav']

    parser = argparse.ArgumentParser(description='test subband WPE dereverberation.')
    parser.add_argument('-a', dest='analysis_filter_path',
                        default=analysis_filter_path,
                        help='analysis filter prototype file')
    parser.add_argument('-s', dest='synthesis_filter_path',
                        default=synthesis_filter_path,
                        help='synthesis filter prototype file')
    parser.add_argument('-M', dest='M',
                        default=M, type=int,
                        help='no. of subbands')
    parser.add_argument('-m', dest='m',
                        default=m, type=int,
                        help='prototype filter length factor')
    parser.add_argument('-r', dest='r',
                        default=r, type=int,
                        help='decimation factor')
    parser.add_argument('-i', dest='input_audio_paths', nargs='+',
                        default=default_input_audio_paths,
                        help='observation audio file(s)')
    parser.add_argument('-o', dest='out_paths', nargs='+',
                        default=default_out_paths,
                        help='output audio file(s)')
    parser.add_argument('-c', dest='wpe_conf_path',
                        default=None,
                        help='JSON path for WPE dereverberator configuration')
    parser.add_argument('-b', dest='start_frame_no',
                        default=26000, type=int,
                        help='start frame point for filter estimation')
    parser.add_argument('-e', dest='end_frame_no',
                        default=-1, type=int,
                        help='end frame point for filter estimation; the end of the file if -1')

    return parser


if __name__ == '__main__':
    parser = build_parser()
    args = parser.parse_args()

    if args.wpe_conf_path is None:
        # Default WPE configuration
        wpe_conf = {'lower_num': 0,
                    'upper_num': 32,  # upper_num - lower_num == filter length
                    'iterations_num': 2,
                    'load_db': -18.0,
                    'band_width': 0.0,
                    'diagonal_bias': 0.0001  # diagonal loading for Cholesky decomposition stabilization (multi-channel WPE only)
                    }
    else:
        with open(args.wpe_conf_path, 'r') as jsonfp:
            wpe_conf = json.load(jsonfp)

    print('WPE config.')
    print(json.dumps(wpe_conf, indent=4))
    print('')
    test_subband_dereverberator(args.analysis_filter_path,
                                args.synthesis_filter_path,
                                args.M, args.m, args.r,
                                args.input_audio_paths,
                                args.out_paths,
                                wpe_conf,
                                samplerate=16000,
                                start_frame_no=args.start_frame_no,
                                end_frame_no=args.end_frame_no)