API Docs

torch_pesq module

class torch_pesq.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)

Bases: Module

Perceptual Evaluation of Speech Quality

Implementation of the PESQ score in the PyTorch framework, closely following the ITU P.862 reference. There are two major differences:

  1. no time alignment

  2. energy normalization uses an IIR filter

Parameters:
  • factor (float) – Scaling of the loss function

  • sample_rate (int) – Sampling rate of the time signal; the signal is resampled if it differs from 16 kHz

  • nbarks (int) – Number of Bark bands

  • win_length (int) – Window size used in the STFT

  • n_fft (int) – Number of frequency bins

  • hop_length (int) – Distance between different frames

to_spec

Perform a short-time Fourier transform (STFT) on the time signal and return the power spectral density

Type:

torch.nn.Module

fbank

Apply a Bark scaling to the power distribution

Type:

torch.nn.Module

loudness

Estimate perceived loudness of the Bark scaled spectrogram

Type:

torch.nn.Module

power_filter

IIR filter coefficients used to calculate the power in the 325 Hz to 3.25 kHz band

Type:

TensorType

pre_filter

Pre-emphasis filter, applied to the reference and degraded signals

Type:

TensorType

factor: float

align_level(signal)

Align the signal power to 10**7 in the 325 Hz to 3.25 kHz band

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the scaled time signal

Return type:

TensorType[“batch”, “sample”]
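The alignment itself is a single gain: scale the signal so the band power becomes 10**7. A pure-Python sketch of that step — the band-power estimate normally comes from the IIR power_filter, which is elided here, and the function name/interface are illustrative rather than torch_pesq's internals:

```python
import math

def align_level(signal, band_power):
    """Scale `signal` so its (precomputed) band power becomes 10**7.

    `band_power` stands in for the output of the IIR power filter.
    """
    gain = math.sqrt(1e7 / band_power)
    return [gain * x for x in signal]

signal = [0.1, -0.2, 0.3]
power = sum(x * x for x in signal) / len(signal)  # toy power estimate
scaled = align_level(signal, power)
new_power = sum(x * x for x in scaled) / len(scaled)
# new_power is now 10**7 (up to floating-point error)
```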

preemphasize(signal)

Pre-emphasize a signal

This pre-emphasis filter is also applied in the reference implementation; its coefficients are taken from there.

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the pre-emphasized signal

Return type:

TensorType[“batch”, “sample”]
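The actual coefficients come from the P.862 reference code; a classic first-order pre-emphasis with an illustrative 0.97 coefficient shows the shape of the operation:

```python
def preemphasize(signal, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The 0.97 coefficient is illustrative only; torch_pesq uses the
    filter coefficients from the ITU P.862 reference implementation.
    """
    out = [signal[0]]  # first sample has no predecessor
    for prev, cur in zip(signal, signal[1:]):
        out.append(cur - coeff * prev)
    return out

print(preemphasize([1.0, 1.0, 1.0]))
```

Pre-emphasis boosts high frequencies relative to low ones, which is why a constant (DC-like) input is strongly attenuated after the first sample.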

raw(ref, deg)

Calculate symmetric and asymmetric distances

Parameters:
  • ref (Tensor) – Reference signal

  • deg (Tensor) – Degraded signal

Return type:

Tuple[Tensor, Tensor]

mos(ref, deg)

Calculate Mean Opinion Score

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Mean Opinion Score in range (1.08, 4.999)

Return type:

TensorType[“batch”, “sample”]
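The quoted range (1.08, 4.999) comes from squashing a raw PESQ-style score into the MOS scale with a logistic curve. The ITU-T P.862.1 listening-quality mapping has this shape; torch_pesq's exact constants may differ, so treat this as a sketch:

```python
import math

def mos_lqo(pesq_raw):
    """P.862.1 mapping from a raw PESQ score to MOS-LQO.

    Maps raw scores (roughly -0.5 to 4.5) into the MOS range;
    the asymptotes of the logistic are 0.999 and 4.999.
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * pesq_raw + 4.6607))

for raw in (-0.5, 2.0, 4.5):
    print(round(mos_lqo(raw), 3))
```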

forward(ref, deg)

Calculate a loss variant of the MOS score

This function combines the symmetric and asymmetric distances, but does not apply range compression, and flips the sign so that minimizing the loss maximizes the MOS.

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Loss value in range [0, inf)

Return type:

TensorType[“batch”, “sample”]
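In the P.862 reference the raw score is 4.5 - 0.1·d_sym - 0.0309·d_asym; dropping the constant and flipping the sign leaves a non-negative distance to minimize. A pure-Python sketch using those reference weights — torch_pesq's internal weights may differ:

```python
def pesq_loss(d_sym, d_asym, factor=1.0):
    """Loss variant of the raw P.862 score.

    The reference computes raw = 4.5 - 0.1*d_sym - 0.0309*d_asym and
    compresses it into the MOS range; the loss variant keeps only the
    sign-flipped distance term, so smaller loss means higher MOS.
    Weights shown are the P.862 reference constants, used here
    illustratively.
    """
    return factor * (0.1 * d_sym + 0.0309 * d_asym)

print(pesq_loss(2.0, 1.0))  # more distortion yields a larger loss
```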

class torch_pesq.BarkScale(nfreqs=256, nbarks=49)

Bases: Module

Bark filterbank according to P.862; can be extended with linear interpolation

The ITU P.862 standard models perception with a Bark-scaled filterbank. It uses rectangular filters of constant width up to a 4 kHz center frequency. This implementation uses interpolation to approximate the original parametrization when the number of bands differs from the reference implementation.

Parameters:
  • nfreqs (int) – Number of frequency bins

  • nbarks (int) – Number of Bark bands

pow_dens_correction

Power density correction factors for each filter band

Type:

list

width_hz

Width of each filter in Hz

Type:

list

width_bark

Width of each filter in Bark

Type:

list

centre

Centre frequency of each band

Type:

list

fbank

Filterbank matrix converting power spectrum to band powers

Type:

TensorType[“band”, “bark”]

weighted_norm(tensor, p=2)

Calculate the p-norm, taking the band widths into account

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – Power spectrogram with nfreqs frequency bins

  • p (float) – Norm value

Returns:

scaled norm value

Return type:

TensorType[“batch”, “frame”]
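One plausible reading of "taking band width into consideration" is a p-norm whose terms are weighted by each band's width; a pure-Python sketch (torch_pesq's exact weighting may differ):

```python
def weighted_norm(band_powers, widths, p=2.0):
    """Width-weighted p-norm over bands: (sum_i w_i * |x_i|**p)**(1/p).

    With unit widths this reduces to the ordinary p-norm.
    """
    total = sum(w * abs(x) ** p for x, w in zip(band_powers, widths))
    return total ** (1.0 / p)

print(weighted_norm([3.0, 4.0], [1.0, 1.0]))  # → 5.0, the plain 2-norm
```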

forward(tensor)

Converts a Hz-scaled spectrogram to a Bark-scaled spectrogram

Parameters:

tensor (TensorType["batch", "frame", "band"]) – A Hz-scaled power spectrogram

Returns:

A Bark-scaled power spectrogram

Return type:

TensorType[“batch”, “frame”, “bark”]
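The conversion is a matrix product of each frame's power spectrum with the (rectangular) filterbank matrix. A toy pure-Python sketch with 3 frequency bins and 2 Bark bands — shapes are illustrative, not the defaults:

```python
# Toy Hz -> Bark conversion for a single frame.
# fbank has one row per frequency bin and one column per Bark band.
fbank = [
    [1.0, 0.0],  # freq bin 0 -> Bark band 0
    [1.0, 0.0],  # freq bin 1 -> Bark band 0
    [0.0, 1.0],  # freq bin 2 -> Bark band 1
]

def to_bark(frame):
    """Sum the spectral power falling into each Bark band."""
    nbarks = len(fbank[0])
    return [sum(frame[f] * fbank[f][b] for f in range(len(fbank)))
            for b in range(nbarks)]

print(to_bark([1.0, 2.0, 3.0]))  # → [3.0, 3.0]
```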

training: bool

class torch_pesq.Loudness(nbark=49)

Bases: Module

Apply a loudness curve to the Bark spectrogram

Parameters:

nbark (int) – Number of Bark bands

threshs

Hearing threshold per band; below this threshold a band is assumed to contain no significant energy

Type:

TensorType[1, 1, “band”]

exp

Exponent of each band

Type:

TensorType[1, 1, “band”]

total_audible(tensor, factor=1.0)

Calculate total audible energy for each frame over all bands

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • factor (float) – Scaling factor of the hearing threshold

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes]

Return type:

TensorType[“batch”, “frame”]
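For a single frame this amounts to summing the bands whose power exceeds the (scaled) hearing threshold. A hedged pure-Python sketch — band-width weighting is omitted and the thresholding rule is one plausible reading of the description:

```python
def total_audible(frame_bands, threshs, factor=1.0):
    """Sum the energy of bands whose power exceeds factor * threshold.

    Sketch for one frame; torch_pesq applies this per frame over the
    whole spectrogram.
    """
    return sum(p for p, t in zip(frame_bands, threshs) if p > factor * t)

# bands 0 and 2 are above the unit threshold, band 1 is inaudible
print(total_audible([5.0, 0.1, 3.0], [1.0, 1.0, 1.0]))  # → 8.0
```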

time_avg_audible(tensor, silent)

Calculate arithmetic mean of audible energy for each band over all frames

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • silent (TensorType["batch", "frame"]) – Indicates whether a frame is silent or not

Returns:

A tensor containing the hearable energy with shape [batch_size, nbands]

Return type:

TensorType[“batch”, “band”]
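A pure-Python sketch of the per-band averaging, under the assumption that silent frames are excluded from the mean:

```python
def time_avg_audible(frames, silent):
    """Per-band arithmetic mean of energy over the non-silent frames.

    `frames` is [nframes][nbands]; `silent` flags frames to skip.
    """
    active = [f for f, s in zip(frames, silent) if not s]
    nbands = len(frames[0])
    return [sum(f[b] for f in active) / len(active) for b in range(nbands)]

frames = [[2.0, 4.0], [0.0, 0.0], [4.0, 8.0]]
print(time_avg_audible(frames, silent=[False, True, False]))  # → [3.0, 6.0]
```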

forward(pow_dens)

Transform Bark scaled power spectrogram to audible energy per band

Parameters:

pow_dens (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes, nbands]

Return type:

TensorType[“batch”, “frame”, “band”]
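P.862 maps band power to loudness with a modified Zwicker law, using the per-band thresholds and exponents stored in the threshs and exp attributes. A hedged single-band sketch — the 0.23 exponent and unit scaling constant are the commonly cited values, not necessarily torch_pesq's:

```python
def loudness(power, thresh, exp=0.23, sl=1.0):
    """Modified Zwicker law as used in PESQ-style loudness mapping.

    Below the hearing threshold the band contributes nothing; above it,
    power is compressed by the band exponent. Constants illustrative.
    """
    if power <= thresh:
        return 0.0
    return sl * (thresh / 0.5) ** exp * ((0.5 + 0.5 * power / thresh) ** exp - 1.0)

print(loudness(0.1, 1.0))                        # inaudible band
print(loudness(10.0, 1.0) > loudness(2.0, 1.0))  # monotone above threshold
```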

training: bool

torch_pesq.bark module

torch_pesq.bark.interp(values, nelms_new)

Apply linear interpolation to the list of values

Parameters:
  • values (list) – The list of values to be interpolated

  • nelms_new (int) – Number of values in the returned list

Returns:

A list of interpolated values

Return type:

TensorType
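A pure-Python stand-in for this resampling (torch_pesq.bark.interp returns a tensor; the logic here is endpoint-preserving linear interpolation):

```python
def interp(values, nelms_new):
    """Linearly resample `values` to `nelms_new` points.

    Endpoints are preserved; interior points are interpolated between
    their two nearest neighbours.
    """
    if nelms_new == 1:
        return [values[0]]
    step = (len(values) - 1) / (nelms_new - 1)
    out = []
    for i in range(nelms_new):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out

print(interp([0.0, 2.0], 3))  # → [0.0, 1.0, 2.0]
```

This is how BarkScale can approximate the reference parametrization lists (widths, centres, corrections) for a non-default number of bands.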

class torch_pesq.bark.BarkScale(nfreqs=256, nbarks=49)

Bases: Module

Bark filterbank according to P.862; can be extended with linear interpolation

The ITU P.862 standard models perception with a Bark-scaled filterbank. It uses rectangular filters of constant width up to a 4 kHz center frequency. This implementation uses interpolation to approximate the original parametrization when the number of bands differs from the reference implementation.

Parameters:
  • nfreqs (int) – Number of frequency bins

  • nbarks (int) – Number of Bark bands

pow_dens_correction

Power density correction factors for each filter band

Type:

list

width_hz

Width of each filter in Hz

Type:

list

width_bark

Width of each filter in Bark

Type:

list

centre

Centre frequency of each band

Type:

list

fbank

Filterbank matrix converting power spectrum to band powers

Type:

TensorType[“band”, “bark”]

weighted_norm(tensor, p=2)

Calculate the p-norm, taking the band widths into account

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – Power spectrogram with nfreqs frequency bins

  • p (float) – Norm value

Returns:

scaled norm value

Return type:

TensorType[“batch”, “frame”]

forward(tensor)

Converts a Hz-scaled spectrogram to a Bark-scaled spectrogram

Parameters:

tensor (TensorType["batch", "frame", "band"]) – A Hz-scaled power spectrogram

Returns:

A Bark-scaled power spectrogram

Return type:

TensorType[“batch”, “frame”, “bark”]

training: bool

torch_pesq.loss module

class torch_pesq.loss.PesqLoss(factor, sample_rate=48000, nbarks=49, win_length=512, n_fft=512, hop_length=256)

Bases: Module

Perceptual Evaluation of Speech Quality

Implementation of the PESQ score in the PyTorch framework, closely following the ITU P.862 reference. There are two major differences:

  1. no time alignment

  2. energy normalization uses an IIR filter

Parameters:
  • factor (float) – Scaling of the loss function

  • sample_rate (int) – Sampling rate of the time signal; the signal is resampled if it differs from 16 kHz

  • nbarks (int) – Number of Bark bands

  • win_length (int) – Window size used in the STFT

  • n_fft (int) – Number of frequency bins

  • hop_length (int) – Distance between different frames

to_spec

Perform a short-time Fourier transform (STFT) on the time signal and return the power spectral density

Type:

torch.nn.Module

fbank

Apply a Bark scaling to the power distribution

Type:

torch.nn.Module

loudness

Estimate perceived loudness of the Bark scaled spectrogram

Type:

torch.nn.Module

power_filter

IIR filter coefficients used to calculate the power in the 325 Hz to 3.25 kHz band

Type:

TensorType

pre_filter

Pre-emphasis filter, applied to the reference and degraded signals

Type:

TensorType

factor: float

align_level(signal)

Align the signal power to 10**7 in the 325 Hz to 3.25 kHz band

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the scaled time signal

Return type:

TensorType[“batch”, “sample”]

preemphasize(signal)

Pre-emphasize a signal

This pre-emphasis filter is also applied in the reference implementation; its coefficients are taken from there.

Parameters:

signal (TensorType["batch", "sample"]) – Input time signal with size [batch, sample]

Returns:

Tensor containing the pre-emphasized signal

Return type:

TensorType[“batch”, “sample”]

raw(ref, deg)

Calculate symmetric and asymmetric distances

Parameters:
  • ref (Tensor) – Reference signal

  • deg (Tensor) – Degraded signal

Return type:

Tuple[Tensor, Tensor]

mos(ref, deg)

Calculate Mean Opinion Score

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Mean Opinion Score in range (1.08, 4.999)

Return type:

TensorType[“batch”, “sample”]

forward(ref, deg)

Calculate a loss variant of the MOS score

This function combines the symmetric and asymmetric distances, but does not apply range compression, and flips the sign so that minimizing the loss maximizes the MOS.

Parameters:
  • ref (TensorType["batch", "sample"]) – Reference signal

  • deg (TensorType["batch", "sample"]) – Degraded signal

Returns:

Loss value in range [0, inf)

Return type:

TensorType[“batch”, “sample”]

training: bool

torch_pesq.loudness module

class torch_pesq.loudness.Loudness(nbark=49)

Bases: Module

Apply a loudness curve to the Bark spectrogram

Parameters:

nbark (int) – Number of Bark bands

threshs

Hearing threshold per band; below this threshold a band is assumed to contain no significant energy

Type:

TensorType[1, 1, “band”]

exp

Exponent of each band

Type:

TensorType[1, 1, “band”]

total_audible(tensor, factor=1.0)

Calculate total audible energy for each frame over all bands

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • factor (float) – Scaling factor of the hearing threshold

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes]

Return type:

TensorType[“batch”, “frame”]

time_avg_audible(tensor, silent)

Calculate arithmetic mean of audible energy for each band over all frames

Parameters:
  • tensor (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

  • silent (TensorType["batch", "frame"]) – Indicates whether a frame is silent or not

Returns:

A tensor containing the hearable energy with shape [batch_size, nbands]

Return type:

TensorType[“batch”, “band”]

forward(pow_dens)

Transform Bark scaled power spectrogram to audible energy per band

Parameters:

pow_dens (TensorType["batch", "frame", "band"]) – A Bark scaled spectrogram with shape [batch_size, nframes, nbands]

Returns:

A tensor containing the hearable energy with shape [batch_size, nframes, nbands]

Return type:

TensorType[“batch”, “frame”, “band”]

training: bool