gaborator.h
template<class T> class analyzer;
The Gaborator supports three types of frequency scales:
logarithmic, linear, and mel scales. Each is represented by a class
deriving from the abstract base class fq_scale.
class fq_scale {
Comparison operators are provided for compatibility with standard container classes.
bool operator<(const fq_scale &rhs) const;
bool operator==(const fq_scale &rhs) const;
};
The log_fq_scale class represents a logarithmic
frequency scale, corresponding to a constant-Q transform or
constant-Q spectrogram.
class log_fq_scale: public fq_scale {
log_fq_scale(double bands_per_octave, double ff_min, double ff_ref = 0.5);
bands_per_octave
The number of frequency bands per octave.
ff_min
The lower limit of the analysis frequency range, in units of the sample rate.
Values from 0.0001 up to but not including 0.5 are supported.
Values below 0.0001 probably work but are not regularly tested.
ff_ref
The reference frequency, in units of the sample rate. If ff_ref
falls outside the frequency range of the bandpass filter bank, this
works as if the range were extended to include ff_ref.
Must be positive. A typical value when analyzing music is 440.0 / fs,
where fs is the sample rate in Hz.
};
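Both ff_min and ff_ref are given in units of the sample rate rather than in Hz. As a minimal sketch, a frequency in Hz can be converted by dividing by the sample rate. The helper name hz_to_ff is hypothetical and is not part of gaborator.h.

```cpp
#include <cassert>

// Hypothetical helper (not part of gaborator.h): convert a frequency in Hz
// to units of the sample rate, as expected by ff_min and ff_ref.
inline double hz_to_ff(double hz, double fs) {
    assert(fs > 0.0);
    return hz / fs;
}
```

With fs = 44100, hz_to_ff(440.0, fs) gives the typical ff_ref for music mentioned above, and a lower limit of, say, 20 Hz becomes hz_to_ff(20.0, fs), which lies inside the supported ff_min range. The 20 Hz figure is only an illustrative choice.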
The lin_fq_scale class represents a linear
frequency scale, corresponding to a short-term Fourier transform
(STFT) or constant-bandwidth spectrogram.
class lin_fq_scale: public fq_scale {
lin_fq_scale(double size);
size
The size of the transform. This
corresponds to the "FFT size" parameter that is typically used with
FFT based spectrum analyzers: the frequency spacing between bands is
the sample rate divided by size, and the number of frequency
bands produced is roughly half of size because negative
frequencies are omitted. However, in the Gaborator, the size
parameter is not limited to powers of two, or even to integer values,
so any arbitrary band spacing of x Hz can be achieved by
specifying a size of fs / x, where fs is the sample rate in Hz.
Values from 2 to 10000 are supported. Larger values probably work but are not regularly tested.
};
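The relationship between size, band spacing, and band count described above can be sketched as follows. Both helper names are hypothetical and are not part of gaborator.h; the band count is only the rough "half of size" estimate given in the text.

```cpp
#include <cassert>

// Hypothetical helper: transform size that yields a band spacing of
// spacing_hz at sample rate fs (both in Hz). The result need not be an integer.
inline double lin_size_for_spacing(double fs, double spacing_hz) {
    assert(fs > 0.0 && spacing_hz > 0.0);
    return fs / spacing_hz;
}

// Hypothetical helper: rough number of bands produced, about half of size
// because negative frequencies are omitted.
inline double approx_band_count(double size) {
    return size / 2.0;
}
```

For example, a 10 Hz band spacing at a 44.1 kHz sample rate corresponds to a size of 4410, or roughly 2205 bands, which is inside the supported range of 2 to 10000.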
The mel_fq_scale class represents a mel
frequency scale, a scale designed to mimic human pitch perception
such that one mel is close to the smallest perceptible difference
in frequency.
It is defined by the formula mel = 2595 log10(1 + f / 700),
where f is the frequency in Hz.
The mel scale is approximately logarithmic at high frequencies, and approximately linear at low frequencies, with a smooth transition between the two.
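The formula above and its inverse can be written out directly. This is a minimal sketch for experimenting with the scale; the function names are hypothetical and are not part of gaborator.h.

```cpp
#include <cmath>

// Hypothetical helper: frequency in Hz to mel, per mel = 2595 log10(1 + f / 700).
inline double hz_to_mel(double f_hz) {
    return 2595.0 * std::log10(1.0 + f_hz / 700.0);
}

// Hypothetical helper: inverse of hz_to_mel, obtained by solving the formula for f.
inline double mel_to_hz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}
```

hz_to_mel(22050.0) evaluates to a little over 3923 mel, so the range from 0 to 22.05 kHz spans just under 3924 mel.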
class mel_fq_scale: public fq_scale {
mel_fq_scale(double bands_per_mel, double fs);
bands_per_mel
The number of frequency bands per mel. Supported values range from 0.003 (yielding a total of 12 frequency bands between 0 and 22.05 kHz at a 44.1 kHz sample rate) to 1.0 (yielding some 3924 bands). Values larger than 1.0 probably work but are not regularly tested.
The minimum of 0.003 applies when the default overlap of 0.7 is used; if the overlap is increased from the default, the minimum should be increased proportionally. Also, if the sample rate is lower than 44.1 kHz, the minimum should be increased in inverse proportion to the sample rate. For example, at a sample rate of 22.05 kHz and the default overlap of 0.7, the minimum number of bands per mel is 0.006.
fs
The sample rate in Hz.
};
enum class coef_phase { global, local };
The spectrogram coefficients produced by the Gaborator are complex numbers, and as such they have both a magnitude and a phase. The phase can be defined in several different ways, using different reference points for what is considered a phase of zero.
Under one possible definition, an input signal of a real and positive impulse at some time t will cause all spectrogram coefficients at that time t to also be real and positive, or in other words, to have a phase of zero, for any frequency. We call this the local phase convention, and it is the default beginning with version 2 of the Gaborator.
Under an alternative definition, an input signal of a cosine wave at some frequency f will cause all spectrogram coefficients at that frequency f to have a phase of zero, at any point in time. We call this the global phase convention, and it was the default in version 1 of the Gaborator.
The local phase convention is preferred because it guarantees that the coefficients for a given point in time only depend on the local signal behavior around that point in time, and not on how far that point in time is from the reference point of t=0. This avoids a loss of floating point precision for signals far from t=0.
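The precision argument can be illustrated numerically. A phase referenced to t=0 depends on the number of signal cycles elapsed since t=0, and the larger that number, the coarser the steps in which a single-precision value can represent it. The sketch below only demonstrates this floating point effect; it is an illustration under that assumption, not the library's actual phase computation, and both function names are hypothetical.

```cpp
#include <cmath>

// Illustration only: spacing between x and the next larger float,
// i.e. the finest step with which values near x can be represented.
inline float float_resolution(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Number of cycles elapsed since t = 0 at frequency ff (in units of the
// sample rate) at time t (in samples), rounded to single precision.
inline float cycles_since_zero(double ff, double t) {
    return static_cast<float>(ff * t);
}
```

At ff = 0.01 and t = 100 the cycle count is 1 and float_resolution() is about 1.2e-7 cycles. At t = 1e8 the cycle count is 1e6 and the resolution is 0.0625 cycles, or 22.5 degrees of phase.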
A parameters object holds a set of parameters for
the spectrum analysis and resynthesis.
class parameters {
The only required parameter is the frequency scale, which is passed
as an argument to the parameters constructor. Optional
parameters may be specified by assigning to data members of
the parameters object after construction.
parameters(const fq_scale &scale);
For backwards compatibility with version 1 of the Gaborator, the following constructor is also supported:
parameters(double bands_per_octave, double ff_min, double ff_ref = 1.0);
This constructs a set of parameters with a logarithmic frequency scale, like
parameters(log_fq_scale(bands_per_octave, ff_min, ff_ref))
except that the phase member is initialized to the version 1 default
of coef_phase::global rather than the version 2 default of
coef_phase::local.
The phase convention to be used for the analysis and resynthesis
can be set or examined using the public data member phase.
coef_phase phase;
The amount of overlap between
adjacent bandpass bands can be set or examined using the public data
member overlap. The default is 0.7, meaning each band
will have a standard deviation of 0.7 times the band spacing.
double overlap;
Comparison operators are provided for compatibility with standard container classes.
bool operator<(const parameters &rhs) const;
bool operator==(const parameters &rhs) const;
};
A coefs object stores a set of spectrogram coefficients.
It is a dynamic data structure and will be automatically grown to
accommodate new time ranges, for example as newly recorded audio is analyzed.
The template argument T must match that of the analyzer
(usually float).
The template argument C is the data type used to store each
coefficient value; there is usually no need to specify it explicitly as
it will default to std::complex<T>.
template <class T, class C = std::complex<T>> class coefs {
coefs(analyzer<T> &a);
Construct an empty set of coefficients for use with the spectrum
analyzer a. This represents a signal that is zero
at all points in time.
};
The analyzer object performs spectrum analysis and/or resynthesis
according to the given parameters. The template argument T is
the floating-point type to use for the calculations. This is typically float;
alternatively, double can be used for increased accuracy at the
expense of speed and memory consumption.
template <class T> class analyzer {
analyzer(const parameters &params);
params
The parameters for the spectrum analysis and resynthesis.
void analyze(const T *signal, int64_t t0, int64_t t1, coefs<T> &coefs) const;
Spectrum analyze the samples at *signal and add the
resulting coefficients to coefs.
signal
The samples to analyze, beginning with the sample from time t0
and ending with the last sample before time t1, for a total of
t1 - t0 samples.
t0
The point in time when the sample at signal[0] was taken,
in samples. For example, when analyzing an audio recording, this is typically
0 for the first sample in the recording, but this reference point is arbitrary,
and negative times are valid. When using the global phase convention,
accuracy begins to successively decrease outside the range of about
±10^8 samples, so using large time values should be
avoided. When using the local phase convention, no such loss of accuracy
occurs.
t1
The point in time one past the last sample of signal,
in samples.
coefs
If the coefs object already contains some
coefficients, the new coefficients are added to those already
present. Because the analysis is a linear operation, this allows a
signal to be analyzed in blocks, by making multiple calls
to analyze() with non-overlapping ranges that together
cover the entire signal. For efficiency, the blocks should
be large, as in
analyze(first_131072_samples, 0, 131072, coefs),
analyze(next_131072_samples, 131072, 262144, coefs),
etc.
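The reason block-wise analysis works can be shown with a toy linear operation. The function below is not the Gaborator's filter bank. It is a hypothetical stand-in that correlates the samples with a single complex sinusoid and, like analyze(), adds its result to an existing coefficient, so that analyzing non-overlapping blocks accumulates to the same value as analyzing the whole signal at once.

```cpp
#include <cmath>
#include <complex>
#include <cstdint>

// Toy stand-in for analyze() (hypothetical, not part of gaborator.h):
// correlate the samples for times t0 <= t < t1, stored at signal[0] onwards,
// with a complex sinusoid of frequency ff (in units of the sample rate),
// and ADD the result to coef.
inline void toy_analyze(const double *signal, int64_t t0, int64_t t1,
                        double ff, std::complex<double> &coef) {
    const double two_pi = 6.283185307179586;
    for (int64_t t = t0; t < t1; t++) {
        coef += signal[t - t0] *
                std::polar(1.0, -two_pi * ff * static_cast<double>(t));
    }
}
```

Calling toy_analyze(x, 0, 10, ff, whole) once, or toy_analyze(x, 0, 4, ff, blocks) followed by toy_analyze(x + 4, 4, 10, ff, blocks), leaves the same value in whole and blocks. Note how each block passes a pointer to its own first sample together with that sample's absolute time.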
void synthesize(const coefs<T> &coefs, int64_t t0, int64_t t1, T *signal) const;
Synthesize signal samples from the coefficients coefs
and store them at *signal.
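coefs
The coefficients to synthesize the signal from.
t0
The point in time of the first sample to synthesize, in samples,
on the same time scale as used with analyze().
t1
The point in time one past the last sample to synthesize, in samples.
signal
The synthesized samples are stored here, beginning with the sample from time t0
and ending with the last sample before time t1,
for a total of t1 - t0 samples.
The time range t0...t1 may extend outside
the range analyzed using analyze(), in which case the
signal is assumed to be zero in the un-analyzed range.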
A signal may be synthesized in blocks by making multiple calls to
synthesize() with different sample ranges. For efficiency,
the blocks should be large, and each t0 should
be a multiple of a large power of two.
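One way to choose block boundaries that follow this advice is sketched below. The helper is hypothetical and is not part of gaborator.h; it only computes the (t0, t1) pairs, each of which would then be passed to a separate call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split [t_begin, t_end) into consecutive blocks whose
// boundaries fall on multiples of block_size (ideally a large power of two).
// Only the first t0 can be unaligned, and only when t_begin itself is.
inline std::vector<std::pair<int64_t, int64_t>>
aligned_blocks(int64_t t_begin, int64_t t_end, int64_t block_size) {
    assert(block_size > 0);
    std::vector<std::pair<int64_t, int64_t>> blocks;
    int64_t t0 = t_begin;
    while (t0 < t_end) {
        // Offset of t0 above the previous multiple of block_size,
        // adjusted so that it is also correct for negative times.
        int64_t r = t0 % block_size;
        if (r < 0) r += block_size;
        int64_t t1 = std::min(t0 - r + block_size, t_end);
        blocks.emplace_back(t0, t1);
        t0 = t1;
    }
    return blocks;
}
```

For example, aligned_blocks(100, 300000, 131072) yields the ranges 100 to 131072, 131072 to 262144, and 262144 to 300000.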
The frequency bands of the analysis filter bank are numbered by non-negative integers that increase towards lower (sic) frequencies. Although this numbering may seem backwards, it makes a certain amount of sense for constant-Q analysis where the frequency range has a hard upper limit (the Nyquist frequency) but no hard lower limit. Numbering the bands from high to low frequency allows the frequency range to be extended into arbitrarily low frequencies without using negative band numbers or having to renumber existing bands. It also maps logically to computer graphics coordinate systems where the Y coordinate increases downwards.
When using a logarithmic frequency scale, there is a number of bandpass bands corresponding to the logarithmically spaced bandpass analysis filters, from near 0.5 (the Nyquist frequency, or half the sample rate) to near fmin, and a single lowpass band containing the residual signal from frequencies below fmin. Note that there is no special highpass band; frequencies at or close to the Nyquist frequency are included in the highest-frequency bandpass band(s).
When using a linear or mel frequency scale, all the bands are considered bandpass bands and there is no special lowpass band.
The band numbering can be examined using the following methods:
int bands_begin() const;
Return the lowest valid band number. This is always 0.
int bands_end() const;
Return the highest valid band number plus one. This is also the total number of bands.
int bandpass_bands_begin() const;
Return the lowest valid bandpass band number, corresponding to the
highest-frequency bandpass band. This is currently 0, but should
a future version of the Gaborator support a highpass band, that will
become band 0 and bandpass_bands_begin() will return 1.
int bandpass_bands_end() const;
Return the highest valid bandpass band number plus one, corresponding to one past the lowest-frequency bandpass band.
int band_lowpass() const;
For an analyzer using a logarithmic frequency scale only,
return the band number of the lowpass band. This returns the same
number as bandpass_bands_end(), but is preferred for
clarity when referring to the lowpass band rather than
the excluded upper bound of the range of bandpass bands.
int band_ref() const;
For an analyzer using a logarithmic frequency scale only,
return the band number corresponding to the reference frequency
ff_ref. If ff_ref falls within
the frequency range of the bandpass filter bank, this is
a valid bandpass band number, otherwise it is not.
double band_ff(int band) const;
Return the center frequency of band number band, in units of the sample rate. The center frequency of the lowpass band (if present) is 0.
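The band numbering methods above can be combined to list the center frequencies of all bandpass bands in Hz. The sketch below is written as a template so that it relies only on the documented methods bandpass_bands_begin(), bandpass_bands_end(), and band_ff(). The function name and the fake_analyzer type are hypothetical; the latter exists only so the sketch compiles without the library, and a real analyzer<T> would be passed instead.

```cpp
#include <vector>

// Collect the center frequency, in Hz, of every bandpass band of a,
// in order of increasing band number. fs is the sample rate in Hz.
template <class Analyzer>
std::vector<double> bandpass_center_frequencies_hz(const Analyzer &a, double fs) {
    std::vector<double> hz;
    for (int b = a.bandpass_bands_begin(); b < a.bandpass_bands_end(); b++)
        hz.push_back(a.band_ff(b) * fs);  // band_ff() is in units of the sample rate
    return hz;
}

// Hypothetical stand-in with three bands one octave apart; its values are
// made up for illustration and do not come from the Gaborator.
struct fake_analyzer {
    int bandpass_bands_begin() const { return 0; }
    int bandpass_bands_end() const { return 3; }
    double band_ff(int band) const { return 0.4 / (1 << band); }
};
```

Because band numbers increase towards lower frequencies, the returned frequencies are in descending order.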
double bandpass_band_ff(double band) const;
Return the center frequency of bandpass band number band, in
units of the sample rate. Unlike band_ff, this takes a
floating point argument and supports interpolating between bands by
giving a non-integer band number, and/or extrapolating outside the valid
range of bandpass band numbers.
double band_q(double band) const;
Return the Q (quality factor) of the analysis filter of bandpass band band. Q is defined as the center frequency divided by the -3 dB bandwidth. When using a logarithmic frequency scale, the bands form a constant-Q filter bank and Q is the same for all bands, and when using a linear or mel frequency scale, Q will vary from band to band.
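Using the conventional definition of the quality factor, Q = center frequency / bandwidth, the -3 dB bandwidth of a band in Hz can be derived from its center frequency and Q. The helper below is a hypothetical sketch and is not part of gaborator.h; its inputs are assumed to come from band_ff() and band_q().

```cpp
#include <cassert>

// Hypothetical helper: -3 dB bandwidth in Hz of a band with center frequency
// ff (in units of the sample rate) and quality factor q, at sample rate fs (Hz).
inline double band_bandwidth_hz(double ff, double q, double fs) {
    assert(q > 0.0);
    return ff * fs / q;
}
```

For example, a band centered at 0.25 times a 44.1 kHz sample rate with a Q of 10 has a bandwidth of 1102.5 Hz.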
double band_analysis_support(int band) const;
Return the one-sided time domain support of the analysis
filter for band band.
When calling analyze()
with a sample at time t,
the spectrogram coefficients of band band will change
significantly only within the time range t ± support.
Coefficients outside the range may change slightly,
but the changes will be sufficiently small that they may be ignored without
significantly reducing accuracy.
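Applied to a whole block of samples rather than a single one, the support gives the time range of coefficients that a call to analyze() may significantly change. The helper below is a hypothetical sketch of that bookkeeping and is not part of gaborator.h; the support value is assumed to come from band_analysis_support() or analysis_support().

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical helper: conservative half-open time range [first, second) of
// coefficients that may change significantly when samples t0 <= t < t1 are
// analyzed, given a one-sided support in samples.
inline std::pair<int64_t, int64_t>
affected_range(int64_t t0, int64_t t1, double support) {
    return { static_cast<int64_t>(std::floor(t0 - support)),
             static_cast<int64_t>(std::ceil(t1 + support)) };
}
```

For example, analyzing samples 1000 to 2000 with a support of 250.5 samples affects coefficients in the range 749 to 2251.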
double analysis_support() const;
Return the largest band_analysis_support(band)
of any band.
double band_synthesis_support(int band) const;
Returns the one-sided time domain support of the
reconstruction filter for band band. When
calling synthesize()
to synthesize a sample at
time t, the sample will only be significantly affected by
spectrogram coefficients of band band in the time range t
± support. Coefficients outside the range may be used in
the synthesis, but substituting zeroes for the actual coefficient
values will not significantly reduce accuracy.
double synthesis_support() const;
Return the largest band_synthesis_support(band)
of any band.
};
template <class T, class F, class C0, class... CI> void process(F f, int b0, int b1, int64_t t0, int64_t t1, coefs<T, C0> &coefs0, coefs<T, CI>&... coefsi);
Process one or more coefficient sets coefs0... by applying
the function f to each coefficient present in coefs0,
in an indeterminate order.
This can be optionally limited to coefficients whose
band number b and sample time t satisfy
b0 ≤ b < b1 and t0 ≤ t < t1.
To process every coefficient present in coefs0, pass
INT_MIN, INT_MAX, INT64_MIN, INT64_MAX
for the arguments b0, b1, t0, and t1, respectively.
The function f should have the call signature
template <class T> void f(int b, int64_t t, std::complex<T> &c0, std::complex<T> &ci...);
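where
b
The band number of the frequency band the coefficients c0
and ci... pertain to.
This may be either a bandpass band or the lowpass band.
t
The point in time the coefficients c0 and ci...
pertain to, in samples.
c0
The coefficient from coefs0.
ci...
The coefficient(s) from coefsi....
The function f may read and/or modify each of the
coefficients passed through c0 and each ci....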
The first coefficient set c0 is a special case when
it comes to the treatment of missing values. Coefficients missing
from c0 will not be iterated over at all, but when a
coefficient is iterated over and is missing from one of the additional
coefficient sets ci..., it will be automatically created
and initialized to zero in that additional coefficient set.
For example, in the common case where the processing takes one input
and produces one output, c0 should be the input
and c1 should be the output. This ensures that the
entire input is iterated over, and that the output coefficients get
created as needed.
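A function object for the one-input, one-output case could look like the following sketch. It follows the call signature given above for T = float; the band_gain type and its members are hypothetical names chosen for illustration.

```cpp
#include <complex>
#include <cstdint>

// Hypothetical callback for process(): copy each input coefficient c0 to the
// output coefficient c1, scaling it by gain when the band number b satisfies
// b0 <= b < b1. The time argument is not used by this particular filter.
struct band_gain {
    int b0, b1;
    float gain;
    void operator()(int b, int64_t /* t */,
                    std::complex<float> &c0, std::complex<float> &c1) const {
        c1 = (b >= b0 && b < b1) ? c0 * gain : c0;
    }
};
```

Assuming in and out are coefs<float> objects belonging to the same analyzer, the callback would be passed as process(band_gain{0, 12, 0.5f}, INT_MIN, INT_MAX, INT64_MIN, INT64_MAX, in, out), with the input first so that every input coefficient is visited. The band range and gain shown are illustrative values only.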
Note: The template parameters C0 and CI... exist to support
the processing of coefficient sets containing data of types other
than std::complex<T>, which is not currently part of the
documented API. In typical use, there is no need to specify them
because the template parameter list
can be deduced, but if they are explicitly specified, they should all
be std::complex<T>.
template <class T, class F, class C0, class... CI> void fill(F f, int b0, int b1, int64_t t0, int64_t t1, coefs<T, C0> &coefs0, coefs<T, CI>&... coefsi);
Fill a region of the time-frequency plane with coefficients
and apply the function f to each.
This works like process() except that it is not limited
to processing coefficients that already exist in coefs0;
instead, any missing coefficients in coefs0 as well as
any of the coefsi... are created and initialized to zero
before f is called.
The t0 and t1 arguments must specify an
explicit, bounded time range; they must not be given as
INT64_MIN and/or INT64_MAX as that would mean creating coefficients
for an astronomically large time range, requiring a correspondingly
astronomical amount of memory.
template <class T> void forget_before(const analyzer<T> &a, coefs<T> &c, int64_t limit);
Allow the coefficients for points in time before limit
(a time in units of samples) to be forgotten.
Streaming applications can use this to free memory used by coefficients
that are no longer needed. Coefficients that have been forgotten will
read as zero. This does not guarantee that all coefficients before
limit are forgotten, only that ones for
limit or later are not, and that the amount of memory
consumed by any remaining coefficients before limit is
bounded.
Prior to version 1.5, iteration over the coefficients was done
using the apply() function.
It is similar to process(), except that it takes an
analyzer argument and applies a function f
taking arguments in a different order. In new code, process()
is preferred.
template <class T, class F> void apply(const analyzer<T> &a, coefs<T> &c, F f, int64_t t0 = INT64_MIN, int64_t t1 = INT64_MAX);
Apply the function f to each coefficient in the coefficient
set c for points in time t that satisfy
t0 ≤ t < t1.
If the t0 and t1 arguments are omitted, f
is applied to every coefficient.
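a
The spectrum analyzer associated with the coefficients c.
c
The set of spectrogram coefficients.
f
The function to apply to each coefficient in c,
with the call signature
template <class T> void f(std::complex<T> &coef, int band, int64_t t);
coef
A reference to the coefficient.
band
The band number of the frequency band the coefficient coef
pertains to.
This may be either a bandpass band or the lowpass band.
t
The point in time the coefficient coef
pertains to, in samples.
t0
When not INT64_MIN, only apply f
to the coefficients for time ≥ t0.
t1
When not INT64_MAX, only apply f
to the coefficients for time < t1.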
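A function object matching the apply() call signature, with the coefficient first and the band and time after it, could look like the following sketch for T = float. It records the coefficient of largest magnitude; the peak and peak_finder names are hypothetical. The result is kept outside the function object because f is passed by value and may be copied.

```cpp
#include <complex>
#include <cstdint>

// Hypothetical result record: the largest magnitude seen so far and the
// band and time at which it occurred.
struct peak {
    float magnitude;
    int band;
    int64_t t;
};

// Hypothetical callback for apply(): update *best whenever a coefficient
// with a larger magnitude is visited. The coefficient itself is not modified.
struct peak_finder {
    peak *best;
    void operator()(std::complex<float> &coef, int band, int64_t t) const {
        float m = std::abs(coef);
        if (m > best->magnitude) {
            best->magnitude = m;
            best->band = band;
            best->t = t;
        }
    }
};
```

Assuming a is an analyzer<float>, c is a coefs<float>, and p is a peak initialized to {0.0f, -1, 0}, the callback would be passed as apply(a, c, peak_finder{&p}).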