`gaborator.h`

template<class T> class analyzer;

The Gaborator supports three types of frequency scales:
logarithmic, linear, and mel scales. Each is represented by a class
deriving from the abstract base class `fq_scale`

.

class fq_scale {

Comparison operators are provided for compatibility with standard container classes.

bool operator<(const fq_scale &rhs) const; bool operator==(const fq_scale &rhs) const;

};

The `log_fq_scale`

class represents a logarithmic
frequency scale, corresponding to a constant-Q transform or
constant-Q spectrogram.

class log_fq_scale: public fq_scale {

log_fq_scale(double bands_per_octave, double ff_min, double ff_ref = 0.5);

`bands_per_octave`

- The number of frequency bands per octave. Values from 4 to 384 (inclusive) are supported. Integer values yield the best performance, but non-integer values are also supported. The minimum of 4 applies when the default overlap of 0.7 is used; if the overlap is increased from the default, the minimum should be increased proportionally. Values greater than 384 probably work but are not regularly tested.
`ff_min`

- The lower limit of the analysis frequency range, in units of the
sample rate. The analysis filter bank will extend low enough in
frequency that the lowest bandpass filter center frequency falls at
or below
`ff_min`

. Values from 0.0001 up to but not including 0.5 are supported. Values below 0.0001 probably work but are not regularly tested. `ff_ref`

- The reference frequency, in units of the sample rate.
This allows fine-tuning of the analysis and synthesis filter
banks such that the center frequency of one of the filters
is aligned with
`ff_ref`

. If`ff_ref`

falls outside the frequency range of the bandpass filter bank, this works as if the range were extended to include`ff_ref`

. Must be positive. A typical value when analyzing music is`440.0 / fs`

, where`fs`

is the sample rate in Hz.

};

The `lin_fq_scale`

class represents a linear
frequency scale, corresponding to a short-term Fourier transform
(STFT) or constant-bandwidth spectrogram.

class lin_fq_scale: public fq_scale {

lin_fq_scale(double size);

`size`

The size of the transform. This corresponds to the "FFT size" parameter that is typically used with FFT based spectrum analyzers: the frequency spacing between bands is the sample rate divided by

*size*, and the number of frequency bands produced is roughly half of*size*because negative frequencies are omitted. However, in the Gaborator, the*size*parameter is not limited to powers of two, or even to integer values, so any arbitrary band spacing of*x*Hz can be achieved by specifying a*size*of`fs / x`

where`fs`

is the sample rate in Hz.Values from 2 to 10000 are supported. Larger values probably work but are not regularly tested.

};

The `mel_fq_scale`

class represents a mel
frequency scale, a scale designed to mimic human pitch perception
such that one mel is close to the smallest percepible difference
in frequency.
It is defined by the formula *mel = 2595
log _{10}(1 + f / 700)*, where

The mel scale is approximately logarithmic at high frequencies, and approximately linear at low frequencies, with a smooth transition between the two.

class mel_fq_scale: public fq_scale {

mel_fq_scale(double bands_per_mel, double fs);

`bands_per_mel`

- The number of frequency bands per mel. Supported values range from 0.003 (yielding a total of 12 frequency bands between 0 and 22.5 kHz at a 44.1 kHz sample rate) to 1.0 (yielding some 3924 bands). The minimum of 0.003 applies when the default overlap of 0.7 is used; if the overlap is increased from the default, the minimum should be increased proportionally. Values larger than 1.0 probably work but are not regularly tested.
`fs`

- The sample rate in Hz. Because the mel scale is defined in terms of absolute frequencies in Hz, the sample rate needs to be specified explicitly; this is an exception to the sample rate agnostic design of the rest of the Gaborator API. Values from 8000 to 96000 are supported. Values outside this range probably work but are not regularly tested.

};

enum class coef_phase { global, local };

The spectrogram cofficients produced by the Gaborator are complex numbers, and as such they have both a magnitude and a phase. The phase can be defined in several different ways, using different reference points for what is considered a phase of zero.

Under one possible definition, an input signal of a real and
positive impulse at some time *t* will cause all
spectrogram coefficients at that time *t* to also be real
and positive, or in other words, to have a phase of zero, for any
frequency. We call this the *local* phase convention, and it is
the default beginning with version 2 of the Gaborator.

Under an alternative definition, an input signal of a cosine wave
at some frequency *f* will cause all spectrogram coefficients at
that frequency *f* to have a phase of zero, at any point in time.
We call this the *global* phase convention, and it was the
default in version 1 of the Gaborator.

The local phase convention is preferred because it guarantees that the coefficients for a given point in time only depend on the local signal behavior around that point in time, and not on how far that point in time is from the reference point of t=0. This avoids a loss of floating point precision for signals far from t=0.

A `parameters`

object holds a set of parameters for
the spectrum analysis and resynthesis.

class parameters {

The only required parameter is the frequency scale, which is passed
as an argument to the `parameters`

constructor. Optional
parameters may be specified by assigning to data members of
the `parameters`

object after construction.

parameters(const fq_scale &scale);

For backwards compatibilty with version 1 of the Gaborator, the following constrctor is also supported:

parameters(double bands_per_octave, double ff_min, double ff_ref = 1.0);

This constructs a set of parameters with a logarithmic frequency scale, like
`parameters(log_fq_scale(bands_per_octave, ff_min, ff_ref))`

except that the `phase`

member is initialized to the version 1 default
of `coef_phase::global`

rather than the verison 2 default of
`coef_phase::local`

.

The phase convention
to be used for the analysis and resynthesis
can be set or examined using the public data member `phase`

.

coef_phase phase;

The amount of overlap between
adjacent bandpass bands can be set or examined using the public data
member `overlap`

. The default is 0.7, meaning each band
will have a standard deviation of 0.7 times the band spacing.

double overlap;

Comparison operators are provided for compatibility with standard container classes.

bool operator<(const parameters &rhs) const; bool operator==(const parameters &rhs) const;

};

A `coefs`

object stores a set of spectrogram coefficients.
It is a dynamic data structure and will be automatically grown to
accommodate new time ranges, for example as newly recorded audio is analyzed.
The template argument `T`

must match that of the `analyzer`

(usually `float`

).
The template argument `C`

is the data type used to store each
coefficient value; there is usually no need to specify it explicitly as
it will default to `std::complex<T>`

.

template <class T, class C = std::complex<T>> class coefs {

coefs(analyzer<T> &a);

Construct an empty set of coefficients for use with the spectrum
analyzer `a`

. This represents a signal that is zero
at all points in time.

};

The `analyzer`

object performs spectrum analysis and/or resynthesis
according to the given parameters. The template argument `T`

is
the floating-point type to use for the calculations. This is typically `float`

;
alternatively, `double`

can be used for increased accuracy at the
expense of speed and memory consumption.

template <class T> class analyzer {

analyzer(const parameters ¶ms);

`params`

- The spectrum analysis parameters.

void analyze(const T *signal, int64_t t0, int64_t t1, coefs<T> &coefs) const;

Spectrum analyze the samples at `*signal`

and add the
resulting coefficients to `coefs`

.

`signal`

- The signal samples to analyze, beginning with the sample from time
`t0`

and ending with the last sample before time`t1`

, for a total of`t1 - t0`

samples. `t0`

- The point in time when the sample at
`signal[0]`

was taken, in samples. For example, when analyzing an audio recording, this is typically 0 for the first sample in the recording, but this reference point is arbitrary, and negative times are valid. Accuracy begins to successively decrease outside the range of about ±10^{8}samples, so using large time values should be avoided when they are not necessary because of the length of the track. `t1`

- The point in time of the sample one past the
end of the array of samples at
`signal`

, in samples. `coefs`

- The coefficient object that the results of the spectrum analysis are added to.

If the `coefs`

object already contains some
coefficients, the new coefficients are summed to those already
present. Because the analysis is a linear operation, this allows a
signal to be analyzed in blocks, by making multiple calls
to `analyze()`

with non-overlapping ranges that together
cover the entire signal. For efficiency, the blocks should
be large, as in
`analyze(first_131072_samples, 0, 131072, coefs)`

,
`analyze(next_131072_samples, 131072, 262144, coefs)`

,
etc.

void synthesize(const coefs<T> &coefs, int64_t t0, int64_t t1, T *signal) const;

Synthesize signal samples from the coefficients `coefs`

and store them at `*signal`

.

`coefs`

- The coefficients to synthesize the signal from.
`t0`

- The point in time of the first sample to synthesize,
in samples, using the same time scale as in
`analyze()`

. `t1`

- The point in time of the sample one past the last one to synthesize.
`signal`

- The synthesized signal samples will be written here,
beginning with the sample from time
`t0`

and and ending with the last sample before time`t1`

, for a total of`t1 - t0`

samples.

The time range `t0`

...`t1`

may extend outside
the range analyzed using `analyze()`

, in which case the
signal is assumed to be zero in the un-analyzed range.

A signal may be synthesized in blocks by making multiple calls to
`analyze()`

with different sample ranges. For efficiency,
the blocks should be large, and each `t0`

should
be multiple of a large power of two.

The frequency bands of the analysis filter bank are numbered by non-negative integers that increase towards lower (sic) frequencies. Although this numbering may seem backwards, it makes a certain amount of sense for constant-Q analysis where the frequency range has a hard upper limit (the Nyquist frequency) but no hard lower limit. Numbering the bands from high to low frequency allows the frequency range to be extended into arbitrarily low frequencies without using negative band numbers or having to renumber existing bands. It also maps logically to computer graphics coordinate systems where the Y coordinate increases downwards.

When using a logarithmic frequency scale,
there is a number of *bandpass bands* corresponding to the
logarithmically spaced bandpass analysis filters, from near 0.5
(the Nyquist frequency, or half the sample rate) to
near f_{min}, and a single *lowpass band* containing the
residual signal from frequencies below f_{min}.
Note that there is no special highpass band; frequencies at or
close to the Nyquist frequency are included in the highest-frequency
bandpass band(s).

When using a linear or mel frequency scale, all the bands are
considered *bandpass bands* and there is no special *lowpass band*.

The band numbering can be examined using the following methods:

int bands_begin() const;

Return the lowest valid band number. This is always 0.

int bands_end() const;

Return the highest valid band number plus one. This is also the total number of bands.

int bandpass_bands_begin() const;

Return the lowest valid bandpass band number, corresponding to the
highest-frequency bandpass band. This is currently 0, but should
a future version of the Gaborator support a highpass band, that will
become band 0 and `bandpass_bands_begin()`

will return 1.

int bandpass_bands_end() const;

Return the highest valid bandpass band number plus one, corresponding to one past the lowest-frequency bandpass band.

int band_lowpass() const;

For an anzlyer using a logarithmic frequency scale only,
return the band number of the lowpass band. This returns the same
number as `bandpass_bands_end()`

, but is preferred for
clarity when referring to the lowpass band rather than
the excluded upper bound of the range of bandpass bands.

int band_ref() const;

For an anzlyer using a logarithmic frequency scale only,
return the band number corresponding to the reference frequency
`ff_ref`

. If `ff_ref`

falls within
the frequency range of the bandpass filter bank, this is
a valid bandpass band number, otherwise it is not.

double band_ff(int band) const;

Return the center frequency of band number *band*, in units of the
sample rate. The center frequency of the lowpass band (if present) is 0.

double bandpass_band_ff(double band) const;

Return the center frequency of bandpass band number *band*, in
units of the sample rate. Unlike `band_ff`

, this takes a
floating point argument and supports interpolating between bands by
giving a non-integer band number, and/or extrapolating outside the valid
range of bandpass band numbers.

double band_q(double band) const;

Return the Q (quality factor) of the analysis filter of bandpass
band *band*. Q is defined as the -3 dB bandwidth divided by
the center frequency. When using a logarithmic frequency scale, the
bands form a constant-Q filter bank and Q is the same for all bands,
and when using a linear or mel frequency scale, Q will vary from band
to band.

double band_analysis_support(int band) const;

Return the one-sided time domain *support* of any of the analysis
filter for band *band*.
When calling `analyze()`

with a sample at time *t*,
the spectrogram coefficients of band *band* will change
significantly only within the time range *t ± support*.
Coefficients outside the range may change slightly,
but the changes will sufficiently small that they may be ignored without
significantly reducing accuracy.

double analysis_support() const;

Return the largest `band_analysis_support(band)`

of any band.

double band_synthesis_support(int band) const;

Returns the one-sided time domain *support* of the
reconstruction filter for band *band*. When
calling `synthesize()`

to synthesize a sample at
time *t*, the sample will only be significantly affected by
spectrogram coefficients of band *band* in the time range *t
± support*. Coefficients outside the range may be used in
the synthesis, but substituting zeroes for the actual coefficient
values will not significantly reduce accuracy.

double synthesis_support() const;

Return the largest `band_synthesis_support(band)`

of any band.

};

template <class T, class F, class C0, class... CI> void process(F f, int b0, int b1, int64_t t0, int64_t t1, coefs<T, C0> &coefs0, coefs<T, CI>&... coefsi);

Process one or more coefficient sets `coefs0`

... by applying
the function `f`

to each coefficient present in `coefs0`

,
in an indeterminate order.

This can be optionally limited to coefficients whose
band number *b* and sample time *t* satisfy
`b0`

≤ *b* < `b1`

and
`t0`

≤ *t* < `t1`

.
To process every coefficient present
in `coefs0`

, pass `INT_MIN, INT_MAX, INT64_MIN, INT64_MAX`

for the arguments `b0`

, `b1`

, `t0`

,
and `t1`

, respectively.

The function `f`

should have the call signature

template <class T> void f(int b, int64_t t, std::complex<T> &c0, std::complex<T> &ci...);

where

`b`

- The band number of the frequency band the coefficients
`c0`

and`ci...`

pertain to. This may be either a bandpass band or the lowpass band. `t`

- The point in time the coefficients
`c0`

and`ci...`

pertain to, in samples `c0`

- A reference to a complex coefficient from
`coefs0`

`ci...`

- Optional references to complex coefficients from the additional
coefficient sets
`coefsi...`

.

The function `f`

may read and/or modify each of the
coefficients passed through `c0`

and each
`ci...`

.

The first coefficient set `c0`

is a special case when
it comes to the treatment of missing values. Coefficients missing
from `c0`

will not be iterated over at all, but when a
coefficient *is* iterated over and is missing from one of the additional
coefficient sets `ci...`

, it will be automatically created
and initialized to zero in that additional coefficient set.

For example, in the common case where the processing takes one input
and produces one output, `c0`

should be the input
and `c1`

should be the output. This ensures that the
entire input is iterated over, and that the output coefficients get
created as needed.

*Note: The template parameters C0
and CI... exist to support the processing of coefficient
sets containing data of types other
than std::complex<T>, which is not currently part of the
documented API. In typical use, there is no need to specify them
because the template parameter list
can be deduced, but if they are explicitly specified, they should all
be std::complex<T>.
*

template <class T, class F, class C0, class... CI> void fill(F f, int b0, int b1, int64_t t0, int64_t t1, coefs<T, C0> &coefs0, coefs<T, CI>&... coefsi);

Fill a region of the time-frequency plane with coefficients
and apply the function `f`

to each.

This works like `process()`

except that it is not limited
to processing coefficients that already exist in `coefs0`

;
instead, any missing coefficients in `coefs0`

as well as
any of the `coefsi`

... are created and initialized to zero
before `f`

is called.

The `t0`

and `t1`

arguments must specify an
explicit, bounded time range — they must not be given as
INT64_MIN and/or INT64_MAX as that would mean creating coefficients
for an an astronomically large time range, requiring a correspondingly
astronomical amount of memory.

template <class T> void forget_before(const analyzer<T> &a, coefs<T> &c, int64_t limit);

Allow the coefficients for points in time before `limit`

(a time in units of samples) to be forgotten.
Streaming applications can use this to free memory used by coefficients
that are no longer needed. Coefficients that have been forgotten will
read as zero. This does not guarantee that all coefficients before
`limit`

are forgotten, only that ones for
`limit`

or later are not, and that the amount of memory
consumed by any remaining coefficients before `limit`

is
bounded.

Prior to version 1.5, iteration over the coefficients was done
using the `apply()`

function.
It is similar to `process()`

, except that it

- requires an additional
`analyzer`

argument, - takes arguments in a different order,
- applies a function
`f`

taking arguments in a different order, - does not support restricting the processing to a range of band numbers,
- only supports iterating over a single coefficient set, and
- provides default values for t0 and t1.

In new code, `process()`

is preferred.

template <class T, class F> void apply(const analyzer<T> &a, coefs<T> &c, F f, int64_t t0 = INT64_MIN, int64_t t1 = INT64_MAX);

Apply the function `f`

to each coefficient in the coefficient
set `c`

for points in time *t* that satisfy
`t0`

≤ *t* < `t1`

.
If the `t0`

and `t1`

arguments are omitted, `f`

is applied to every coefficient.

`a`

- The spectrum analyzer that produced the coefficients
`c`

`c`

- A set of spectrogram coefficients
`f`

- A function to apply to each coefficient in
`c`

, with the call signaturetemplate <class T> void f(std::complex<T> &coef, int band, int64_t t);

`coef`

- A reference to a single complex coefficient. This may be read and/or modified.
`band`

- The band number of the frequency band the coefficient
`coef0`

pertains to. This may be either a bandpass band or the lowpass band. `t`

- The point in time the coefficient
`c0`

pertains to, in samples

`t0`

- When not
`INT64_MIN`

, only apply`f`

to the coefficients for time ≥`t0`

`t1`

- When not
`INT64_MAX`

, only apply`f`

to the coefficients for time <`t1`