API Docs¶
torch_pesq module¶
- class torch_pesq.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)¶
Bases:
Module
Perceptual Evaluation of Speech Quality
Implementation of the PESQ score in the PyTorch framework, closely following the ITU P.862 reference. There are two major differences:
no time alignment
energy normalization uses an IIR filter
- Parameters:
factor (float) – Scaling of the loss function
sample_rate (int) – Sampling rate of the time signal; the input is resampled if it differs from 16 kHz
nbarks (int) – Number of Bark bands
win_length (int) – Window size used in the STFT
n_fft (int) – Number of frequency bins
hop_length (int) – Distance between different frames
- to_spec¶
Perform a Short-Time Fourier Transform on the time signal, returning the power spectral density
- Type:
torch.nn.Module
- fbank¶
Apply a Bark scaling to the power distribution
- Type:
torch.nn.Module
- loudness¶
Estimate perceived loudness of the Bark scaled spectrogram
- Type:
torch.nn.Module
- power_filter¶
IIR filter coefficients used to calculate the power in the 325 Hz to 3.25 kHz band
- Type:
TensorType
- pre_filter¶
Pre-emphasis filter, applied to the reference and degraded signals
- Type:
TensorType
- factor: float¶
- align_level(signal)¶
Align the power in the 325 Hz to 3.25 kHz band to 10**7
- Parameters:
signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]
- Returns:
Tensor containing the scaled time signal
- Return type:
TensorType[“batch”, “sample”]
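The idea of level alignment can be illustrated with a small stdlib-only sketch. The helper below is hypothetical: it takes the band power as an argument, whereas the implementation measures it internally with its IIR power_filter.

```python
def align_level(signal, band_power, target=10.0 ** 7):
    """Scale a signal so that its measured band power matches the target.

    Illustrative stand-in: `band_power` is assumed to be the power already
    measured in the 325 Hz to 3.25 kHz band. Power scales with the square
    of the amplitude, hence the square-root gain.
    """
    gain = (target / band_power) ** 0.5
    return [s * gain for s in signal]
```

For example, a signal whose band power is four times the target is attenuated by a factor of two in amplitude.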
- preemphasize(signal)¶
Pre-emphasize a signal
This pre-emphasis filter is also applied in the reference implementation, and the filter coefficients are taken from there.
- Parameters:
signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]
- Returns:
Tensor containing the pre-emphasized signal
- Return type:
TensorType[“batch”, “sample”]
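For illustration, a generic first-order pre-emphasis filter looks like the sketch below; alpha=0.97 is a conventional value, not the P.862 coefficient set used by torch_pesq.

```python
def preemphasize(x, alpha=0.97):
    # Generic first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].
    # The library uses the filter coefficients from the P.862 reference,
    # which differ from this single-coefficient sketch.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]
```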
- raw(ref, deg)¶
Calculate symmetric and asymmetric distances
- Parameters:
ref (Tensor) – Reference signal
deg (Tensor) – Degraded signal
- Return type:
Tuple[Tensor, Tensor]
- mos(ref, deg)¶
Calculate Mean Opinion Score
- Parameters:
ref (TensorType["batch", "sample"]) – Reference signal
deg (TensorType["batch", "sample"]) – Degraded signal
- Returns:
Mean Opinion Score in range (1.08, 4.999)
- Return type:
TensorType[“batch”]
- forward(ref, deg)¶
Calculate a loss variant of the MOS score
This function combines the symmetric and asymmetric distances but does not apply range compression; the sign is flipped so that minimizing the loss maximizes the MOS.
- Parameters:
ref (TensorType["batch", "sample"]) – Reference signal
deg (TensorType["batch", "sample"]) – Degraded signal
- Returns:
Loss value in range [0, inf)
- Return type:
TensorType[“batch”]
- class torch_pesq.BarkScale(nfreqs=256, nbarks=49)¶
Bases:
Module
Bark filterbank according to P.862; can be extended with linear interpolation
The ITU P.862 standard models perception with a Bark-scaled filterbank. It uses rectangular filters with a constant width up to a 4 kHz centre frequency. This implementation uses interpolation to approximate the original parametrization when the number of bands differs from the reference implementation.
- Parameters:
nfreqs (int) – Number of frequency bins
nbarks (int) – Number of Bark bands
- pow_dens_correction¶
Power density correction factors for each filter band
- Type:
list
- width_hz¶
Width of each filter in Hz
- Type:
list
- width_bark¶
Width of each filter in Bark
- Type:
list
- centre¶
Centre frequency of each band
- Type:
list
- fbank¶
Filterbank matrix converting power spectrum to band powers
- Type:
TensorType[“band”, “bark”]
- weighted_norm(tensor, p=2)¶
Calculate the p-norm, taking band width into account
- Parameters:
tensor (TensorType["batch", "frame", "band"]) – Power spectrogram with nfreqs frequency bins
p (float) – Norm value
- Returns:
Scaled norm value
- Return type:
TensorType[“batch”, “frame”]
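The idea can be sketched in plain Python for a single frame. The weighting form below (each band's contribution scaled by its width) is an assumption for illustration; the exact weighting in the implementation may differ.

```python
def weighted_norm(bands, widths, p=2.0):
    # Band-width weighted p-norm: wider bands contribute more.
    # Illustrative stand-in, not the library's exact formula.
    return sum(w * abs(x) ** p for x, w in zip(bands, widths)) ** (1.0 / p)
```

With unit widths this reduces to the ordinary p-norm.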
- forward(tensor)¶
Converts a Hz-scaled spectrogram to a Bark-scaled spectrogram
- Parameters:
tensor (TensorType["batch", "frame", "band"]) – A Hz-scaled power spectrogram
- Returns:
A Bark-scaled power spectrogram
- Return type:
TensorType[“batch”, “frame”, “bark”]
- training: bool¶
- class torch_pesq.Loudness(nbark=49)¶
Bases:
Module
Apply a loudness curve to the Bark spectrogram
- Parameters:
nbark (int) – Number of bark bands
- threshs¶
Hearing threshold per band; below it, a band is assumed to contain no significant energy
- Type:
TensorType[1, 1, “band”]
- exp¶
Exponent of each band
- Type:
TensorType[1, 1, “band”]
- total_audible(tensor, factor=1.0)¶
Calculate total audible energy for each frame over all bands
- Parameters:
tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]
factor (float) – Scaling factor of the hearing threshold
- Returns:
A tensor containing the hearable energy with shape [batch_size, nframes]
- Return type:
TensorType[“batch”, “frame”]
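For a single frame, the thresholding can be sketched in plain Python; this stand-in assumes bands at or below the scaled threshold contribute nothing, which mirrors the description above but not necessarily the exact implementation.

```python
def total_audible(frame_bands, threshs, factor=1.0):
    # Sum the energy of bands whose level exceeds the (scaled) hearing
    # threshold; sub-threshold bands are treated as inaudible.
    return sum(e for e, t in zip(frame_bands, threshs) if e > factor * t)
```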
- time_avg_audible(tensor, silent)¶
Calculate arithmetic mean of audible energy for each band over all frames
- Parameters:
tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]
silent (TensorType["batch", "frame"]) – Indicates whether a frame is silent or not
- Returns:
A tensor containing the hearable energy with shape [batch_size, nbands]
- Return type:
TensorType[“batch”, “band”]
- forward(pow_dens)¶
Transform Bark scaled power spectrogram to audible energy per band
- Parameters:
pow_dens (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]
- Returns:
A tensor containing the hearable energy with shape [batch_size, nframes, nbands]
- Return type:
TensorType[“batch”, “frame”, “band”]
- training: bool¶
torch_pesq.bark module¶
- torch_pesq.bark.interp(values, nelms_new)¶
Apply linear interpolation to the list of values
- Parameters:
values (list) – The list of values to be interpolated
nelms_new (int) – Number of values of returned list
- Returns:
A tensor of interpolated values
- Return type:
TensorType
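A stdlib-only sketch of linear resampling to a new number of points, illustrating what interp does; the library version returns a tensor rather than a list, and its endpoint handling may differ.

```python
def linear_interp(values, nelms_new):
    # Resample a list of values to nelms_new points by linear
    # interpolation between neighbouring entries.
    n = len(values)
    out = []
    for i in range(nelms_new):
        pos = i * (n - 1) / (nelms_new - 1)   # fractional index into values
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out

linear_interp([0.0, 2.0], 3)  # → [0.0, 1.0, 2.0]
```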
- class torch_pesq.bark.BarkScale(nfreqs=256, nbarks=49)¶
Bases:
Module
Re-exported at package level as torch_pesq.BarkScale; see the documentation above.
torch_pesq.loss module¶
- class torch_pesq.loss.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)¶
Bases:
Module
Re-exported at package level as torch_pesq.PesqLoss; see the documentation above.
torch_pesq.loudness module¶
- class torch_pesq.loudness.Loudness(nbark=49)¶
Bases:
Module
Re-exported at package level as torch_pesq.Loudness; see the documentation above.