This example shows how to process streaming audio a block at a time, rather than operating on a complete recording at once as in the previous examples.
The program doesn't do anything particularly useful: it merely
inverts the phase of the signal, and not by the obvious method of
changing the sign of each sample, but by changing the sign of each
spectrogram coefficient. Consider the expression coef = -coef
a placeholder for your own streaming filter or effect code.
#include <climits>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>
#include <sndfile.h>
#include <gaborator/gaborator.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        std::cerr << "usage: stream input.wav output.wav\n";
        exit(1);
    }
We again use libsndfile to read the input and write the output. To keep it simple, this example only handles mono files.
    SF_INFO sfinfo;
    memset(&sfinfo, 0, sizeof(sfinfo));
    SNDFILE *sf_in = sf_open(argv[1], SFM_READ, &sfinfo);
    if (! sf_in) {
        std::cerr << "could not open input audio file: "
            << sf_strerror(sf_in) << "\n";
        exit(1);
    }
    if (sfinfo.channels != 1) {
        std::cerr << "only mono files are supported\n";
        exit(1);
    }
    double fs = sfinfo.samplerate;
    SNDFILE *sf_out = sf_open(argv[2], SFM_WRITE, &sfinfo);
    if (! sf_out) {
        std::cerr << "could not open output audio file: "
            << sf_strerror(sf_out) << "\n";
        exit(1);
    }
The next couple of lines work around a design flaw in libsndfile. By default, when reading a 16-bit audio file as floating point data and then writing that data to another 16-bit audio file, libsndfile will use slightly different scale factors on input and output, and the output will not be bit-identical to the input. To make it easier to verify that this example actually yields correct results to within the full 16-bit precision, we select a non-normalized floating point representation, which does not suffer from this flaw.
    sf_command(sf_in, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);
    sf_command(sf_out, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);
As in Example 1, the parameters are chosen for analyzing music, but to reduce the latency, the number of frequency bands per octave is reduced from 48 to 12 (one per semitone), and the lower frequency limit of the bandpass filter bank is raised from 20 Hz to 200 Hz.
    gaborator::log_fq_scale scale(12, 200.0 / fs, 440.0 / fs);
    gaborator::parameters params(scale);
    gaborator::analyzer<float> analyzer(params);
The spectrogram coefficients are calculated by applying symmetric FIR filters to the audio signal. This means that a spectrogram coefficient for any given point in time t is a weighted average of samples from both before and after t, representing both past and future signal. The width of a filter's impulse response depends on its bandwidth, which in turn depends on the center frequency of its band. The lowest-frequency filters have the narrowest bandwidths, and therefore the widest impulse responses, and need the greatest amount of past and future signal. The width of a filter's impulse response is called its support, and the worst-case (widest) support of any analysis filter can be found by calling gaborator::analyzer::analysis_support(). This returns the one-sided support, that is, the width of the impulse response to each side of its center, as a floating point number. To be on the safe side, let's round this up to the next integer:
size_t analysis_support = ceil(analyzer.analysis_support());
Similarly, when resynthesizing audio from the coefficients, calculating a sample at time t involves applying symmetric FIR reconstruction filters, calculating a weighted average of both past and future spectrogram coefficients. The support of the widest reconstruction filter can be found by calling gaborator::analyzer::synthesis_support():
size_t synthesis_support = ceil(analyzer.synthesis_support());
In a real-time application, the need to access future signal samples and/or coefficients causes latency. A real-time audio analysis application that needs to examine the coefficients for time t can only do so when it has received the input samples up to time t + analysis_support, and therefore has a minimum latency of analysis_support. A real-time filtering or effect application, such as the present example, incurs latency from both analysis and reconstruction filters, and can only produce the output sample for time t once it has received the input samples up to t + analysis_support + synthesis_support, for a minimum latency of analysis_support + synthesis_support. Let's print this total latency to standard error:
std::cerr << "latency: " << analysis_support + synthesis_support << " samples\n";
In a practical real-time system, there will be additional latency caused by processing the signal in blocks of samples rather than a sample at a time. Since the block size is a property of the overall system, and causes latency even if the Gaborator is not involved, that latency is considered outside the scope of this discussion.
To mimic a typical real-time system, the audio is processed in fixed-size blocks (here, 1024 samples). If the size of the input file is not divisible by the block size, the last block is padded with zeroes. The variable t_in keeps track of time, indicating the sampling time of the first sample of the current input block, in units of samples.
    gaborator::coefs<float> coefs(analyzer);
    const size_t blocksize = 1024;
    std::vector<float> buf(blocksize);
    int64_t t_in = 0;
    for (;;) {
        sf_count_t n_read = sf_readf_float(sf_in, buf.data(), blocksize);
        if (n_read == 0)
            break;
        if (n_read < blocksize)
            std::fill(buf.data() + n_read, buf.data() + blocksize, 0);
Now we can spectrum analyze the current block of samples. Note how the time range, t_in...t_in + blocksize, is explicitly passed to analyze():
        analyzer.analyze(buf.data(), t_in, t_in + blocksize, coefs);
The call to analyze() updates the coefficients for the time range from t_in - analysis_support to t_in + blocksize + analysis_support. The oldest blocksize samples of this time range, that is, from t_in - analysis_support to t_in - analysis_support + blocksize, have now been updated for the last time and will not be affected by future input blocks. Therefore, it is now safe to examine and/or modify these coefficients as required by your application. Here, by way of example, we simply change their signs to invert the phase of the signal. Note that unlike the earlier filter example, where process() applied a function to all the coefficients, here it is applied only to the coefficients within a limited time range.
        process(
            [&](int, int64_t, std::complex<float> &coef) {
                coef = -coef;
            },
            INT_MIN, INT_MAX,
            t_in - (int)analysis_support,
            t_in - (int)analysis_support + (int)blocksize,
            coefs);
Next, we will generate a block of output samples. To get correct results, we can only generate output where the coefficients that the output samples depend on will no longer change. Specifically, a resynthesized audio sample for time t will depend on the coefficients of the time range t - synthesis_support...t + synthesis_support. To ensure that the resynthesis uses only coefficients that have already been processed by the process() call above, the most recent block of samples that can safely be resynthesized ranges from t_out = t_in - analysis_support - synthesis_support to t_out + blocksize:
        int64_t t_out = t_in - analysis_support - synthesis_support;
        analyzer.synthesize(coefs, t_out, t_out + blocksize, buf.data());
The synthesized audio can now be written to the output file:
        sf_count_t n_written = sf_writef_float(sf_out, buf.data(), blocksize);
        if (n_written != blocksize) {
            std::cerr << "write error\n";
            exit(1);
        }
Coefficients older than t_out + blocksize - synthesis_support
will no longer be needed to synthesize the next block of output signal, so
we can now forget them to free up the memory they used:
        forget_before(analyzer, coefs, t_out + blocksize - synthesis_support);
This concludes the block-by-block processing loop.
        t_in += blocksize;
    }
    sf_close(sf_in);
    sf_close(sf_out);
    return 0;
}
Like the previous ones, this example can also be built using a one-line build command:
c++ -std=c++11 -I.. -O3 -ffast-math $(pkg-config --cflags sndfile) stream.cc $(pkg-config --libs sndfile) -o stream
Or using the vDSP FFT on macOS:
c++ -std=c++11 -I.. -O3 -ffast-math -DGABORATOR_USE_VDSP $(pkg-config --cflags sndfile) stream.cc $(pkg-config --libs sndfile) -framework Accelerate -o stream
Or using PFFFT (see Example 1 for how to download and build PFFFT):
c++ -std=c++11 -I.. -Ipffft -O3 -ffast-math -DGABORATOR_USE_PFFFT $(pkg-config --cflags sndfile) stream.cc pffft/pffft.o pffft/fftpack.o $(pkg-config --libs sndfile) -o stream
Running the following shell commands will download an example audio file containing an impulse (a single sample of maximum amplitude) padded with silence to a total of 65536 samples, and process it.
wget http://download.gaborator.com/audio/impulse.wav
./stream impulse.wav impulse_streamed.wav
The file impulse_streamed.wav will be identical to impulse.wav, except that the impulse will be of opposite polarity and delayed by the latency of analysis_support + synthesis_support samples.