API Docs

torch_pesq module

class torch_pesq.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)

Bases: Module

Perceptual Evaluation of Speech Quality

Implementation of the PESQ score in the PyTorch framework, closely following the ITU P.862 reference. There are two major differences:

  1. no time alignment

  2. energy normalization uses an IIR filter

Parameters:
  • factor (float) – Scaling of the loss function

  • sample_rate (int) – Sampling rate of the time signal; the signal is resampled if it differs from 16 kHz

  • nbarks (int) – Number of Bark bands

  • win_length (int) – Window size used in the STFT

  • n_fft (int) – Number of frequency bins

  • hop_length (int) – Distance between different frames

to_spec

Perform a short-time Fourier transform (STFT) on the time signal and return the power spectral density

Type:

torch.nn.Module

fbank

Apply a Bark scaling to the power distribution

Type:

torch.nn.Module

loudness

Estimate perceived loudness of the Bark scaled spectrogram

Type:

torch.nn.Module

power_filter

IIR filter coefficients used to calculate the power in the 325 Hz to 3.25 kHz band

Type:

TensorType

pre_filter

Pre-emphasis filter, applied to the reference and degraded signals

Type:

TensorType

factor: float

align_level(signal)

Align the signal power to 10**7 in the 325 Hz to 3.25 kHz band

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the scaled time signal

Return type:

TensorType[“batch”, “sample”]
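The alignment itself is a single gain: scale the signal so the band power becomes 10**7. A pure-Python sketch of that step — the band-power estimate normally comes from the IIR power_filter, which is elided here, and the function name/interface are illustrative rather than torch_pesq's internals:

```python
import math

def align_level(signal, band_power):
    """Scale `signal` so its (precomputed) band power becomes 10**7.

    `band_power` stands in for the output of the IIR power filter.
    """
    gain = math.sqrt(1e7 / band_power)
    return [gain * x for x in signal]

signal = [0.1, -0.2, 0.3]
power = sum(x * x for x in signal) / len(signal)  # toy power estimate
scaled = align_level(signal, power)
new_power = sum(x * x for x in scaled) / len(scaled)
# new_power is now 10**7 (up to floating-point error)
```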

preemphasize(signal)

Pre-emphasize a signal

This pre-emphasis filter is also applied in the reference implementation; its coefficients are taken from there.

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the pre-emphasized signal

Return type:

TensorType[“batch”, “sample”]
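The actual coefficients come from the P.862 reference code; a classic first-order pre-emphasis with an illustrative 0.97 coefficient shows the shape of the operation:

```python
def preemphasize(signal, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The 0.97 coefficient is illustrative only; torch_pesq uses the
    filter coefficients from the ITU P.862 reference implementation.
    """
    out = [signal[0]]  # first sample has no predecessor
    for prev, cur in zip(signal, signal[1:]):
        out.append(cur - coeff * prev)
    return out

print(preemphasize([1.0, 1.0, 1.0]))
```

Pre-emphasis boosts high frequencies relative to low ones, which is why a constant (DC-like) input is strongly attenuated after the first sample.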

raw(ref, deg)

Calculate symmetric and asymmetric distances

Parameters:
  • ref (Tensor) – Reference signal

  • deg (Tensor) – Degraded signal

Return type:

Tuple[Tensor, Tensor]

mos(ref, deg)

Calculate Mean Opinion Score

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Mean Opinion Score in range (1.08, 4.999)

Return type:

TensorType[“batch”, “sample”]
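The quoted range (1.08, 4.999) comes from squashing a raw PESQ-style score into the MOS scale with a logistic curve. The ITU-T P.862.1 listening-quality mapping has this shape; torch_pesq's exact constants may differ, so treat this as a sketch:

```python
import math

def mos_lqo(pesq_raw):
    """P.862.1 mapping from a raw PESQ score to MOS-LQO.

    Maps raw scores (roughly -0.5 to 4.5) into the MOS range;
    the asymptotes of the logistic are 0.999 and 4.999.
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * pesq_raw + 4.6607))

for raw in (-0.5, 2.0, 4.5):
    print(round(mos_lqo(raw), 3))
```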

forward(ref, deg)

Calculate a loss variant of the MOS score

This function combines the symmetric and asymmetric distances, but does not apply range compression, and flips the sign so that minimizing the loss maximizes the MOS.

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Loss value in range [0, inf)

Return type:

TensorType[“batch”, “sample”]
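In the P.862 reference the raw score is 4.5 - 0.1·d_sym - 0.0309·d_asym; dropping the constant and flipping the sign leaves a non-negative distance to minimize. A pure-Python sketch using those reference weights — torch_pesq's internal weights may differ:

```python
def pesq_loss(d_sym, d_asym, factor=1.0):
    """Loss variant of the raw P.862 score.

    The reference computes raw = 4.5 - 0.1*d_sym - 0.0309*d_asym and
    compresses it into the MOS range; the loss variant keeps only the
    sign-flipped distance term, so smaller loss means higher MOS.
    Weights shown are the P.862 reference constants, used here
    illustratively.
    """
    return factor * (0.1 * d_sym + 0.0309 * d_asym)

print(pesq_loss(2.0, 1.0))  # more distortion yields a larger loss
```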

class torch_pesq.BarkScale(nfreqs=256, nbarks=49)

Bases: Module

Bark filterbank according to P.862; can be extended with linear interpolation

The ITU P.862 standard models perception with a Bark-scaled filterbank. It uses rectangular filters of constant width up to a 4 kHz center frequency. This implementation uses interpolation to approximate the original parametrization when the number of bands differs from the reference implementation.

Parameters:
  • nfreqs (int) – Number of frequency bins

  • nbarks (int) – Number of Bark bands

pow_dens_correction

Power density correction factors for each filter band

Type:

list

width_hz

Width of each filter in Hz

Type:

list

width_bark

Width of each filter in Bark

Type:

list

centre

Centre frequency of each band

Type:

list

fbank

Filterbank matrix converting power spectrum to band powers

Type:

TensorType[“band”, “bark”]

weighted_norm(tensor, p=2)

Calculate the p-norm, taking the band widths into account

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – Power spectrogram with nfreqs frequency bins

  • p (float) – Norm value

Returns:

scaled norm value

Return type:

TensorType[“batch”, “frame”]
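One plausible reading of "taking band width into consideration" is a p-norm whose terms are weighted by each band's width; a pure-Python sketch (torch_pesq's exact weighting may differ):

```python
def weighted_norm(band_powers, widths, p=2.0):
    """Width-weighted p-norm over bands: (sum_i w_i * |x_i|**p)**(1/p).

    With unit widths this reduces to the ordinary p-norm.
    """
    total = sum(w * abs(x) ** p for x, w in zip(band_powers, widths))
    return total ** (1.0 / p)

print(weighted_norm([3.0, 4.0], [1.0, 1.0]))  # → 5.0, the plain 2-norm
```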

forward(tensor)

Converts a Hz-scaled spectrogram to a Bark-scaled spectrogram

Parameters:

tensor (TensorType["batch", "frame", "band"]) – A Hz-scaled power spectrogram

Returns:

A Bark-scaled power spectrogram

Return type:

TensorType[“batch”, “frame”, “bark”]
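The conversion is a matrix product of each frame's power spectrum with the (rectangular) filterbank matrix. A toy pure-Python sketch with 3 frequency bins and 2 Bark bands — shapes are illustrative, not the defaults:

```python
# Toy Hz -> Bark conversion for a single frame.
# fbank has one row per frequency bin and one column per Bark band.
fbank = [
    [1.0, 0.0],  # freq bin 0 -> Bark band 0
    [1.0, 0.0],  # freq bin 1 -> Bark band 0
    [0.0, 1.0],  # freq bin 2 -> Bark band 1
]

def to_bark(frame):
    """Sum the spectral power falling into each Bark band."""
    nbarks = len(fbank[0])
    return [sum(frame[f] * fbank[f][b] for f in range(len(fbank)))
            for b in range(nbarks)]

print(to_bark([1.0, 2.0, 3.0]))  # → [3.0, 3.0]
```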

training: bool

class torch_pesq.Loudness(nbark=49)

Bases: Module

Apply a loudness curve to the Bark spectrogram

Parameters:

nbark (int) – Number of Bark bands

threshs

Hearing threshold per band; below this threshold a band is assumed to contain no significant energy

Type:

TensorType[1, 1, “band”]

exp

Exponent of each band

Type:

TensorType[1, 1, “band”]

total_audible(tensor, factor=1.0)

Calculate total audible energy for each frame over all bands

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • factor (float) – Scaling factor of the hearing threshold

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes]

Return type:

TensorType[“batch”, “frame”]
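For a single frame this amounts to summing the bands whose power exceeds the (scaled) hearing threshold. A hedged pure-Python sketch — band-width weighting is omitted and the thresholding rule is one plausible reading of the description:

```python
def total_audible(frame_bands, threshs, factor=1.0):
    """Sum the energy of bands whose power exceeds factor * threshold.

    Sketch for one frame; torch_pesq applies this per frame over the
    whole spectrogram.
    """
    return sum(p for p, t in zip(frame_bands, threshs) if p > factor * t)

# bands 0 and 2 are above the unit threshold, band 1 is inaudible
print(total_audible([5.0, 0.1, 3.0], [1.0, 1.0, 1.0]))  # → 8.0
```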

time_avg_audible(tensor, silent)

Calculate arithmetic mean of audible energy for each band over all frames

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • silent (TensorType["batch", "frame"]) – Indicates whether a frame is silent or not

Returns:

A tensor containing the hearable energy with shape [batch_size, nbands]

Return type:

TensorType[“batch”, “band”]
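A pure-Python sketch of the per-band averaging, under the assumption that silent frames are excluded from the mean:

```python
def time_avg_audible(frames, silent):
    """Per-band arithmetic mean of energy over the non-silent frames.

    `frames` is [nframes][nbands]; `silent` flags frames to skip.
    """
    active = [f for f, s in zip(frames, silent) if not s]
    nbands = len(frames[0])
    return [sum(f[b] for f in active) / len(active) for b in range(nbands)]

frames = [[2.0, 4.0], [0.0, 0.0], [4.0, 8.0]]
print(time_avg_audible(frames, silent=[False, True, False]))  # → [3.0, 6.0]
```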

forward(pow_dens)

Transform Bark scaled power spectrogram to audible energy per band

Parameters:

pow_dens (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes, nbands]

Return type:

TensorType[“batch”, “frame”, “band”]
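P.862 maps band power to loudness with a modified Zwicker law, using the per-band thresholds and exponents stored in the threshs and exp attributes. A hedged single-band sketch — the 0.23 exponent and unit scaling constant are the commonly cited values, not necessarily torch_pesq's:

```python
def loudness(power, thresh, exp=0.23, sl=1.0):
    """Modified Zwicker law as used in PESQ-style loudness mapping.

    Below the hearing threshold the band contributes nothing; above it,
    power is compressed by the band exponent. Constants illustrative.
    """
    if power <= thresh:
        return 0.0
    return sl * (thresh / 0.5) ** exp * ((0.5 + 0.5 * power / thresh) ** exp - 1.0)

print(loudness(0.1, 1.0))                        # inaudible band
print(loudness(10.0, 1.0) > loudness(2.0, 1.0))  # monotone above threshold
```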

training: bool

torch_pesq.bark module

torch_pesq.bark.interp(values, nelms_new)

Apply linear interpolation to the list of values

Parameters:
  • values (list) – The list of values to be interpolated

  • nelms_new (int) – Number of values in the returned list

Returns:

A list of interpolated values

Return type:

TensorType
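A pure-Python stand-in for this resampling (torch_pesq.bark.interp returns a tensor; the logic here is endpoint-preserving linear interpolation):

```python
def interp(values, nelms_new):
    """Linearly resample `values` to `nelms_new` points.

    Endpoints are preserved; interior points are interpolated between
    their two nearest neighbours.
    """
    if nelms_new == 1:
        return [values[0]]
    step = (len(values) - 1) / (nelms_new - 1)
    out = []
    for i in range(nelms_new):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out

print(interp([0.0, 2.0], 3))  # → [0.0, 1.0, 2.0]
```

This is how BarkScale can approximate the reference parametrization lists (widths, centres, corrections) for a non-default number of bands.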

class torch_pesq.bark.BarkScale(nfreqs=256, nbarks=49)

Bases: Module

Bark filterbank according to P.862; can be extended with linear interpolation

The ITU P.862 standard models perception with a Bark-scaled filterbank. It uses rectangular filters of constant width up to a 4 kHz center frequency. This implementation uses interpolation to approximate the original parametrization when the number of bands differs from the reference implementation.

Parameters:
  • nfreqs (int) – Number of frequency bins

  • nbarks (int) – Number of Bark bands

pow_dens_correction

Power density correction factors for each filter band

Type:

list

width_hz

Width of each filter in Hz

Type:

list

width_bark

Width of each filter in Bark

Type:

list

centre

Centre frequency of each band

Type:

list

fbank

Filterbank matrix converting power spectrum to band powers

Type:

TensorType[“band”, “bark”]

weighted_norm(tensor, p=2)

Calculate the p-norm, taking the band widths into account

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – Power spectrogram with nfreqs frequency bins

  • p (float) – Norm value

Returns:

scaled norm value

Return type:

TensorType[“batch”, “frame”]

forward(tensor)

Converts a Hz-scaled spectrogram to a Bark-scaled spectrogram

Parameters:

tensor (TensorType["batch", "frame", "band"]) – A Hz-scaled power spectrogram

Returns:

A Bark-scaled power spectrogram

Return type:

TensorType[“batch”, “frame”, “bark”]

training: bool

torch_pesq.loss module

class torch_pesq.loss.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)

Bases: Module

Perceptual Evaluation of Speech Quality

Implementation of the PESQ score in the PyTorch framework, closely following the ITU P.862 reference. There are two major differences:

  1. no time alignment

  2. energy normalization uses an IIR filter

Parameters:
  • factor (float) – Scaling of the loss function

  • sample_rate (int) – Sampling rate of the time signal; the signal is resampled if it differs from 16 kHz

  • nbarks (int) – Number of Bark bands

  • win_length (int) – Window size used in the STFT

  • n_fft (int) – Number of frequency bins

  • hop_length (int) – Distance between different frames

to_spec

Perform a short-time Fourier transform (STFT) on the time signal and return the power spectral density

Type:

torch.nn.Module

fbank

Apply a Bark scaling to the power distribution

Type:

torch.nn.Module

loudness

Estimate perceived loudness of the Bark scaled spectrogram

Type:

torch.nn.Module

power_filter

IIR filter coefficients used to calculate the power in the 325 Hz to 3.25 kHz band

Type:

TensorType

pre_filter

Pre-emphasis filter, applied to the reference and degraded signals

Type:

TensorType

factor: float

align_level(signal)

Align the signal power to 10**7 in the 325 Hz to 3.25 kHz band

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the scaled time signal

Return type:

TensorType[“batch”, “sample”]

preemphasize(signal)

Pre-emphasize a signal

This pre-emphasis filter is also applied in the reference implementation; its coefficients are taken from there.

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the pre-emphasized signal

Return type:

TensorType[“batch”, “sample”]

raw(ref, deg)

Calculate symmetric and asymmetric distances

Parameters:
  • ref (Tensor) – Reference signal

  • deg (Tensor) – Degraded signal

Return type:

Tuple[Tensor, Tensor]

mos(ref, deg)

Calculate Mean Opinion Score

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Mean Opinion Score in range (1.08, 4.999)

Return type:

TensorType[“batch”, “sample”]

forward(ref, deg)

Calculate a loss variant of the MOS score

This function combines the symmetric and asymmetric distances, but does not apply range compression, and flips the sign so that minimizing the loss maximizes the MOS.

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Loss value in range [0, inf)

Return type:

TensorType[“batch”, “sample”]

training: bool

torch_pesq.loudness module

class torch_pesq.loudness.Loudness(nbark=49)

Bases: Module

Apply a loudness curve to the Bark spectrogram

Parameters:

nbark (int) – Number of Bark bands

threshs

Hearing threshold per band; below this threshold a band is assumed to contain no significant energy

Type:

TensorType[1, 1, “band”]

exp

Exponent of each band

Type:

TensorType[1, 1, “band”]

total_audible(tensor, factor=1.0)

Calculate total audible energy for each frame over all bands

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • factor (float) – Scaling factor of the hearing threshold

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes]

Return type:

TensorType[“batch”, “frame”]

time_avg_audible(tensor, silent)

Calculate arithmetic mean of audible energy for each band over all frames

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • silent (TensorType["batch", "frame"]) – Indicates whether a frame is silent or not

Returns:

A tensor containing the hearable energy with shape [batch_size, nbands]

Return type:

TensorType[“batch”, “band”]

forward(pow_dens)

Transform Bark scaled power spectrogram to audible energy per band

Parameters:

pow_dens (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes, nbands]

Return type:

TensorType[“batch”, “frame”, “band”]

training: bool