Example 3: Streaming

Introduction

This example shows how to process streaming audio a block at a time, rather than operating on a complete recording at once as in the previous examples.

This program doesn't do anything particularly useful: it just inverts the phase of the signal, and it does so not by the obvious method of changing the sign of each sample, but by changing the sign of each spectrogram coefficient. Consider the expression coef = -coef a placeholder for your own streaming filter or effect code.

Preamble

#include <limits.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <algorithm>
#include <iostream>
#include <vector>
#include <sndfile.h>
#include <gaborator/gaborator.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        std::cerr << "usage: stream input.wav output.wav\n";
        exit(1);
    }

Opening the Streams

We again use libsndfile to read the input and write the output. To keep it simple, this example only handles mono files.

    SF_INFO sfinfo;
    memset(&sfinfo, 0, sizeof(sfinfo));
    SNDFILE *sf_in = sf_open(argv[1], SFM_READ, &sfinfo);
    if (! sf_in) {
        std::cerr << "could not open input audio file: "
            << sf_strerror(sf_in) << "\n";
        exit(1);
    }
    if (sfinfo.channels != 1) {
        std::cerr << "only mono files are supported\n";
        exit(1);
    }
    double fs = sfinfo.samplerate;

    SNDFILE *sf_out = sf_open(argv[2], SFM_WRITE, &sfinfo);
    if (! sf_out) {
        std::cerr << "could not open output audio file: "
            << sf_strerror(sf_out) << "\n";
        exit(1);
    }

The next couple of lines work around a design flaw in libsndfile. By default, when reading a 16-bit audio file as floating point data and then writing that data as another 16-bit audio file, libsndfile will use slightly different scale factors on input and output, and the output will not be bit-identical to the input. To make it easier to verify that this example actually yields correct results to within the full 16-bit precision, we select a non-normalized floating point representation, which does not suffer from this flaw.

    sf_command(sf_in, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);
    sf_command(sf_out, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);

Spectrum Analysis Parameters

As in Example 1, the parameters are chosen for analyzing music, but to reduce the latency, the number of frequency bands per octave is reduced from 48 to 12 (one per semitone), and the lower frequency limit of the bandpass filter bank is raised from 20 Hz to 200 Hz.

    gaborator::log_fq_scale scale(12, 200.0 / fs, 440.0 / fs);
    gaborator::parameters params(scale);
    gaborator::analyzer<float> analyzer(params);

Calculating Latency

The spectrogram coefficients are calculated by applying symmetric FIR filters to the audio signal. This means that a spectrogram coefficient for any given point in time t is a weighted average of samples from both before and after t, representing both past and future signal. The width of the filter impulse response depends on the bandwidth, which in turn depends on the center frequency of its band. The lowest-frequency filters have the narrowest bandwidths, and therefore the widest impulse responses, and need the greatest amount of past and future signal. The width of the filter impulse response is called its support, and the worst-case (widest) support of any analysis filter can be found by calling the function gaborator::analyzer::analysis_support(). This returns the one-sided support, that is, the width of the impulse response to each side of its center, as a floating point number. To be on the safe side, let's round this up to the next integer:

    size_t analysis_support = ceil(analyzer.analysis_support());

Similarly, when resynthesizing audio from coefficients, calculating a sample at time t involves applying symmetric FIR reconstruction filters, calculating a weighted average of both past and future spectrogram coefficients. The support of the widest reconstruction filter can be calculated by calling gaborator::analyzer::synthesis_support():

    size_t synthesis_support = ceil(analyzer.synthesis_support());

In a real-time application, the need to access future signal samples and/or coefficients causes latency. A real-time audio analysis application that needs to examine the coefficients for time t can only do so when it has received the input samples up to time t + analysis_support, and therefore has a minimum latency of analysis_support. A real-time filtering or effect application, such as the present example, incurs latency from both analysis and reconstruction filters, and can only produce the output sample for time t once it has received the input samples up to t + analysis_support + synthesis_support, for a minimum latency of analysis_support + synthesis_support. Let's print this total latency to standard error:

    std::cerr << "latency: " << analysis_support + synthesis_support << " samples\n";

In a practical real-time system, there will be additional latency caused by processing the signal in blocks of samples rather than a sample at a time. Since the block size is a property of the overall system, and causes latency even if the Gaborator is not involved, that latency is considered outside the scope of this discussion.
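Even so, a rough end-to-end figure can be useful. The following one-line sketch assumes, purely for illustration, that the surrounding system buffers exactly one block of blocksize samples (blocksize is introduced in the next section); the variable name total_latency is hypothetical:

    // Assumption: the system buffers one block of input, adding
    // blocksize samples of latency on top of the Gaborator's own.
    size_t total_latency = analysis_support + synthesis_support + blocksize;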

Streaming

To mimic a typical real-time system, the audio is processed in fixed-size blocks (here, 1024 samples). If the size of the input file is not divisible by the block size, the last block is padded with zeroes. The variable t_in keeps track of time, indicating the sampling time of the first sample of the current input block, in units of samples.

    gaborator::coefs<float> coefs(analyzer);
    const size_t blocksize = 1024;
    std::vector<float> buf(blocksize);
    int64_t t_in = 0;
    for (;;) {
        sf_count_t n_read = sf_readf_float(sf_in, buf.data(), blocksize);
        if (n_read == 0)
            break;
        if (n_read < blocksize)
            std::fill(buf.data() + n_read, buf.data() + blocksize, 0);

Now we can spectrum analyze the current block of samples. Note how the time range, t_in...t_in + blocksize, is explicitly passed to analyze().

        analyzer.analyze(buf.data(), t_in, t_in + blocksize, coefs);

The call to analyze() updates the coefficients for the time range from t_in - analysis_support to t_in + blocksize + analysis_support. The oldest blocksize samples of this time range, that is, from t_in - analysis_support to t_in - analysis_support + blocksize, have now been updated for the last time and will not be affected by future input blocks. Therefore, it is now safe to examine and/or modify these coefficients as required by your application. Here, by way of example, we simply change their signs to invert the phase of the signal. Note that unlike the earlier filter example, where process() applied a function to all the coefficients, here it is applied only to the coefficients within a limited time range.

        process(
            [&](int, int64_t, std::complex<float> &coef) {
                 coef = -coef;
            },
            INT_MIN, INT_MAX,
            t_in - (int)analysis_support,
            t_in - (int)analysis_support + (int)blocksize,
            coefs);
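The body of the lambda is the placeholder mentioned in the introduction, and any per-coefficient operation can go in its place. As a purely hypothetical illustration (the 6 dB gain value is an assumption, not part of this example), here is a sketch of a variant that attenuates the signal instead of inverting it, using the same call and the same time range:

        process(
            [&](int, int64_t, std::complex<float> &coef) {
                 // Hypothetical effect: attenuate each coefficient by
                 // about 6 dB instead of inverting its sign.
                 coef *= 0.5f;
            },
            INT_MIN, INT_MAX,
            t_in - (int)analysis_support,
            t_in - (int)analysis_support + (int)blocksize,
            coefs);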

Next, we will generate a block of output samples. To get correct results, we can only generate output when the coefficients that the output samples depend on will no longer change. Specifically, a resynthesized audio sample for time t will depend on the coefficients of the time range t - synthesis_support...t + synthesis_support. To ensure that the resynthesis uses only coefficients that have already been processed by the process() call above, the most recent block of samples that can safely be resynthesized ranges from t_out = t_in - analysis_support - synthesis_support to t_out + blocksize.

        int64_t t_out = t_in - analysis_support - synthesis_support;
        analyzer.synthesize(coefs, t_out, t_out + blocksize, buf.data());

The synthesized audio can now be written to the output file:

        sf_count_t n_written = sf_writef_float(sf_out, buf.data(), blocksize);
        if (n_written != blocksize) {
            std::cerr << "write error\n";
            exit(1);
        }

Coefficients older than t_out + blocksize - synthesis_support will no longer be needed to synthesize the next block of output signal, so we can now forget them to free up the memory they used:

        forget_before(analyzer, coefs, t_out + blocksize - synthesis_support);

This concludes the block-by-block processing loop.

        t_in += blocksize;
    }

Postamble

    sf_close(sf_in);
    sf_close(sf_out);
    return 0;
}

Compiling

Like the previous ones, this example can also be built using a one-line build command:

c++ -std=c++11 -I.. -O3 -ffast-math $(pkg-config --cflags sndfile) stream.cc $(pkg-config --libs sndfile) -o stream

Or using the vDSP FFT on macOS:

c++ -std=c++11 -I.. -O3 -ffast-math -DGABORATOR_USE_VDSP $(pkg-config --cflags sndfile) stream.cc $(pkg-config --libs sndfile) -framework Accelerate -o stream

Or using PFFFT (see Example 1 for how to download and build PFFFT):

c++ -std=c++11 -I.. -Ipffft -O3 -ffast-math -DGABORATOR_USE_PFFFT $(pkg-config --cflags sndfile) stream.cc pffft/pffft.o pffft/fftpack.o $(pkg-config --libs sndfile) -o stream

Running

Running the following shell commands will download an example audio file containing an impulse (a single sample of maximum amplitude) padded with silence to a total of 65536 samples, and process it.

wget http://download.gaborator.com/audio/impulse.wav
./stream impulse.wav impulse_streamed.wav

The file impulse_streamed.wav will be identical to impulse.wav except that the impulse will be of opposite polarity and delayed by analysis_support + synthesis_support samples.
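One way to check this is to read both files back and compare them sample by sample. The following standalone program is a hypothetical verification sketch, not part of the Gaborator distribution; it assumes mono 16-bit files, assumes the inverted samples do not clip, and takes the latency printed by stream as its third argument:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <vector>
#include <sndfile.h>

// Hypothetical helper: verify that output.wav equals input.wav
// with inverted polarity and a given delay in samples.
int main(int argc, char **argv) {
    if (argc < 4) {
        std::cerr << "usage: verify input.wav output.wav latency\n";
        exit(1);
    }
    int64_t latency = atoll(argv[3]);
    SF_INFO info_in, info_out;
    memset(&info_in, 0, sizeof(info_in));
    memset(&info_out, 0, sizeof(info_out));
    SNDFILE *sf_in = sf_open(argv[1], SFM_READ, &info_in);
    SNDFILE *sf_out = sf_open(argv[2], SFM_READ, &info_out);
    if (! sf_in || ! sf_out) {
        std::cerr << "could not open audio files\n";
        exit(1);
    }
    // Use non-normalized floats so that 16-bit samples compare
    // exactly, as in the streaming example itself.
    sf_command(sf_in, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);
    sf_command(sf_out, SFC_SET_NORM_FLOAT, NULL, SF_FALSE);
    std::vector<float> in(info_in.frames), out(info_out.frames);
    sf_readf_float(sf_in, in.data(), info_in.frames);
    sf_readf_float(sf_out, out.data(), info_out.frames);
    for (int64_t t = 0; t < (int64_t)in.size(); t++) {
        if (t + latency >= (int64_t)out.size())
            break;
        if (out[t + latency] != -in[t]) {
            std::cerr << "mismatch at sample " << t << "\n";
            exit(1);
        }
    }
    std::cout << "verified\n";
    sf_close(sf_in);
    sf_close(sf_out);
    return 0;
}

It can be compiled like the examples above:

c++ -std=c++11 $(pkg-config --cflags sndfile) verify.cc $(pkg-config --libs sndfile) -o verify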