Feature Extraction

This page shows how to use BTK for speech feature extraction. The example scripts below also illustrate how to use the stream feature pointer objects and their iterators.

Log Power Feature

Listing 1 computes log-power features from an audio file. The script first defines the dependencies between processors in lines 17 to 25. At line 17, it instantiates the audio sample feature pointer object, SampleFeaturePtr, which reads audio samples block by block. The SampleFeaturePtr instance, samplefe, is then cascaded into the HammingFeaturePtr object, which applies a Hamming window to each block of audio samples. The HammingFeaturePtr instance is passed to the FFT processor, FFTFeaturePtr, to obtain complex-valued DFT coefficients. These frequency components are then converted to power by SpectralPowerFeaturePtr and transformed into the log domain by LogFeaturePtr.
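
To make the arithmetic behind this chain concrete, the following NumPy sketch computes one log-power vector per block without using BTK. It is an illustration under stated assumptions, not BTK's implementation: the function name log_power_frames, the small floor added before the logarithm, and the choice to drop an incomplete final block are all hypothetical, and BTK's window definition and scaling may differ from NumPy's.

```python
import numpy as np

def log_power_frames(samples, block_len=160, fft_len=256, floor=1e-10):
    """Yield one log-power vector per non-overlapping block of samples."""
    pow_num = fft_len // 2 + 1             # number of non-redundant FFT bins
    window = np.hamming(block_len)         # NumPy's Hamming window (assumption)
    n_frames = len(samples) // block_len   # this sketch drops an incomplete tail
    for i in range(n_frames):
        block = samples[i * block_len:(i + 1) * block_len]
        # rfft zero-pads the windowed block from block_len to fft_len samples
        spectrum = np.fft.rfft(block * window, n=fft_len)
        power = np.abs(spectrum) ** 2      # squared magnitude of each bin
        yield np.log(power[:pow_num] + floor)  # floor avoids log(0)

# Example input: one second of a 1 kHz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 1000.0 * t)
frames = list(log_power_frames(tone))
```

With a 256-point FFT at 16 kHz, each bin is 62.5 Hz wide, so the 1 kHz tone in this example peaks at bin 16 of every frame.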

The actual computation is performed through the Python iterator at line 33. Each iteration runs the processing chain in order: Hamming windowing, FFT, power computation, and the log operation.
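
The pull-based behaviour described here can be sketched in plain Python. The classes Source and Stage below are hypothetical stand-ins, not BTK classes. They only show the general pattern in which constructing the chain records dependencies and iterating over the last stage triggers every upstream stage, one block at a time.

```python
class Source:
    """Yields fixed-size blocks from a list of samples."""
    def __init__(self, samples, block_len):
        self.samples = samples
        self.block_len = block_len

    def __iter__(self):
        last_start = len(self.samples) - self.block_len
        for start in range(0, last_start + 1, self.block_len):
            yield self.samples[start:start + self.block_len]


class Stage:
    """Applies func to every block pulled from the upstream stage."""
    def __init__(self, upstream, func, name, trace):
        self.upstream = upstream
        self.func = func
        self.name = name
        self.trace = trace

    def __iter__(self):
        for block in self.upstream:
            self.trace.append(self.name)  # record when this stage actually runs
            yield self.func(block)


trace = []
source = Source(list(range(8)), block_len=4)
doubled = Stage(source, lambda b: [2 * x for x in b], "double", trace)
summed = Stage(doubled, sum, "sum", trace)

# Building the chain has not computed anything yet.
assert trace == []

# Iterating over the last stage drives the whole chain, block by block.
results = list(summed)
```

After the loop, trace alternates between "double" and "sum", which shows that each block passes through the whole chain before the next block is read.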

The final output of each iteration, the log-power vector, is stored in log_vector at line 33 and written to the output file with pickle.dump at line 36.

Listing 1 unit_test/log_power_extractor.py
#!/usr/bin/python
"""
Compute log power feature from an audio file
"""
import pickle, numpy
from btk20.common import *
from btk20.stream import *
from btk20.feature import *

D = 160 # 10 msec for 16 kHz audio
fft_len = 256
pow_num = fft_len//2 + 1
input_filename = "../tools/filterbank/Headset1.wav"
output_filename = "log_power.pickle"

# Audio file reader
samplefe  = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=False)
# Hamming window calculator
hammingfe = HammingFeaturePtr(samplefe)
# FFT feature extractor
fftfe     = FFTFeaturePtr(hammingfe, fft_len=fft_len)
# Power (complex square) feature extractor
powerfe   = SpectralPowerFeaturePtr(fftfe, pow_num=pow_num)
# Log feature extractor
logfe     = LogFeaturePtr(powerfe)

# Reading the audio file
samplefe.read(input_filename)

with open(output_filename, 'wb') as ofp:  # pickle writes bytes, so open in binary mode
    frame_no = 0
    # compute the log power feature at each frame
    for log_vector in logfe:
        # print the first 10-dimension vector
        print('fr. {}: {}..'.format(frame_no, numpy.array2string(log_vector[0:10], formatter={'float_kind':lambda x: "%.2f" % x})))
        pickle.dump(log_vector, ofp, True)
        frame_no += 1
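
Because the script calls pickle.dump once per frame, the output file holds a sequence of pickled objects rather than a single list. A minimal sketch of reading such a file back follows. The helper name load_frames is hypothetical, and the demonstration writes stand-in lists to a temporary file instead of real feature vectors.

```python
import os
import pickle
import tempfile

def load_frames(filename):
    """Return every object stored by successive pickle.dump calls."""
    frames = []
    with open(filename, "rb") as ifp:
        while True:
            try:
                frames.append(pickle.load(ifp))
            except EOFError:  # raised once all records have been read
                break
    return frames

# Self-contained demonstration with stand-in data
path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
with open(path, "wb") as ofp:
    for vector in ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]):
        pickle.dump(vector, ofp)

loaded = load_frames(path)
```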

Mel-Frequency Cepstral Coefficient (MFCC)

MFCC feature extraction can be implemented in the same way as log-power feature extraction. Listing 2 shows an example of MFCC computation. As the code shows, we only need to insert a few more feature pointer objects into the chain to obtain the MFCC.

Listing 2 unit_test/mfcc_extractor.py
#!/usr/bin/python
"""
Compute mel frequency cepstrum coefficient (MFCC) from an audio file
"""
import pickle, numpy
from btk20.common import *
from btk20.stream import *
from btk20.feature import *

samplerate  = 16000.0
D = 160 # 10 msec for 16 kHz audio
fft_len = 256
pow_num = fft_len//2 + 1
mel_num     = 30      # no. mel-filter bank output
lower       = 100.0   # lower frequency for the mel-filter bank
upper       = 6800.0  # upper frequency for the mel-filter bank
ncep        = 13      # no. cepstral coefficients

input_filename = "../tools/filterbank/Headset1.wav"
output_filename = "mfcc.pickle"

# Audio file reader
samplefe  = SampleFeaturePtr(block_len=D, shift_len=D, pad_zeros=False)
sample_storage = StorageFeaturePtr(samplefe)
# Hamming window calculator
hammingfe = HammingFeaturePtr(sample_storage)
# FFT feature extractor
fftfe     = FFTFeaturePtr(hammingfe, fft_len=fft_len)
# Power (complex square) feature extractor
powerfe   = SpectralPowerFeaturePtr(fftfe, pow_num=pow_num)
# Vocal tract length normalizer
vtlnfe    = VTLNFeaturePtr(powerfe, coeff_num=pow_num, edge=0.8, version=2)
# Mel-filter bank feature extractor
melfe     = MelFeaturePtr(vtlnfe, pow_num=pow_num, filter_num=mel_num, rate=samplerate, low=lower, up=upper, version=2)
# Log feature extractor
logfe     = LogFeaturePtr(melfe)
# Cepstrum computation
cepfe     = CepstralFeaturePtr(logfe, ncep=ncep)
# Store the MFCC features
cep_storage = StorageFeaturePtr(cepfe)

# Reading the audio file
samplefe.read(input_filename)

with open(output_filename, 'wb') as ofp:  # pickle writes bytes, so open in binary mode
    frame_no = 0
    # compute the MFCC at each frame
    for cep_vector in cep_storage:
        print('fr. {}: {}'.format(frame_no, numpy.array2string(cep_vector, formatter={'float_kind':lambda x: "%.2f" % x})))
        pickle.dump(cep_vector, ofp, True)
        frame_no += 1
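
The two stages that turn a power spectrum into cepstral coefficients, the mel filter bank and the cepstral transform, can be sketched in NumPy as follows. This is an illustration under assumptions and does not reproduce BTK's MelFeaturePtr or CepstralFeaturePtr. The mel formula 2595 log10(1 + f/700), the triangular filter shape, the unnormalized DCT-II, and the log floor are common conventions chosen for this sketch, all function names are hypothetical, and the VTLN step is omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(filter_num, fft_len, rate, low, up):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    pow_num = fft_len // 2 + 1
    bin_hz = np.arange(pow_num) * rate / fft_len   # frequency of each FFT bin
    edges = mel_to_hz(np.linspace(hz_to_mel(low), hz_to_mel(up), filter_num + 2))
    bank = np.zeros((filter_num, pow_num))
    for m in range(filter_num):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_hz - left) / (centre - left)
        falling = (right - bin_hz) / (right - centre)
        # Triangle: minimum of the two slopes, clipped at zero outside the band
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_matrix(ncep, filter_num):
    """Unnormalized DCT-II basis mapping log mel energies to cepstra."""
    n = np.arange(filter_num)
    k = np.arange(ncep)[:, None]
    return np.cos(np.pi * k * (n + 0.5) / filter_num)

def mfcc(power, bank, dct, floor=1e-10):
    """Mel filtering, log compression, then the cepstral transform."""
    return dct @ np.log(bank @ power + floor)

# Same parameter values as Listing 2
bank = mel_filterbank(filter_num=30, fft_len=256, rate=16000.0, low=100.0, up=6800.0)
dct = dct_matrix(ncep=13, filter_num=30)
flat_power = np.ones(129)   # stand-in for one frame of power-spectrum output
cep = mfcc(flat_power, bank, dct)
```

The order of operations mirrors the chain in Listing 2: the filter bank reduces 129 power bins to 30 mel energies, the logarithm compresses them, and the DCT keeps the first 13 coefficients.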

Minimum Variance Distortionless Response (MVDR) Feature

[WM05] [VSR13]

These scripts can be found in btk20_src/unit_test of the git repository.