Sox Sampling rate conversion

The program xsndcardtest can be used to look at the effects of rate changes on the noise and distortion of a signal. I will look at the three rate change options provided by sox, the sound program under Linux. The three forms of rate change are called rate, resample, and polyphase. rate is a simple linear interpolation scheme, and has the advantage of speed. resample reconstructs the signal from its discrete samples and then samples that reconstruction at the new rate. polyphase is by far the slowest technique, but it is better at maintaining the spectrum near the sampling rate limit. As we will see below from the rate changes of the test signal, rate has by far the worst performance, even though it is fast: it introduces noise only about 20dB below the signal level.
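To make the idea of linear interpolation concrete, here is a minimal Python sketch of a rate changer that draws straight lines between neighbouring input samples. It illustrates the general technique only and is not sox's actual code; the function name resample_linear and its arguments are made up for this example.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of floats by linear interpolation between neighbours.

    Illustrative only: a sketch of the general technique, not sox's code.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        # Position of output sample k, measured in units of input samples.
        pos = k * src_rate / dst_rate
        i = int(pos)
        frac = pos - i
        # At the very end there is no right-hand neighbour, so hold the last sample.
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


# A ramp stays a ramp when the rate is doubled.
doubled = resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4)
print(doubled)
```

Each output sample costs two multiplications and one addition, which is why this kind of scheme is fast. The straight-line segments are only a crude approximation to the band-limited signal between samples, which is where the added noise comes from.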

In each case we take a signal recorded at 44100 samples/sec. The signal has an equal number of sine wave signals per octave with random phases, and extends over 10 octaves. I.e., it is a "pink noise" type of signal (equal energy per octave), but it allows one to look for increases in noise of whatever form (distortion and noise) in the regions of the spectrum between the sine wave signals. This spectrum is far more typical of real signals than is the usual single sine wave or the double sine wave signal used for IM distortion tests.
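A signal of this kind could be generated along the following lines. This is a sketch under assumptions: the number of tones per octave (3 here), the equal tone amplitudes and the function names are all hypothetical, since the text only says that each octave holds the same number of sine waves with random phases.

```python
import math
import random


def tone_frequencies(f_low=16.0, octaves=10, tones_per_octave=3):
    """Log-spaced frequencies, so every octave holds the same number of tones.

    tones_per_octave=3 is an arbitrary choice for this example.
    """
    count = octaves * tones_per_octave
    return [f_low * 2.0 ** (k / tones_per_octave) for k in range(count)]


def multitone(freqs, rate=44100, seconds=1.0, peak=0.98, seed=0):
    """Sum equal-amplitude sines with random phases, scaled to a given peak.

    Equal amplitudes are an assumption; with the same number of tones in each
    octave they give equal energy per octave.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    n = int(rate * seconds)
    raw = [
        sum(math.sin(2.0 * math.pi * f * t / rate + p) for f, p in zip(freqs, phases))
        for t in range(n)
    ]
    scale = peak / max(abs(x) for x in raw)
    return [x * scale for x in raw]


freqs = tone_frequencies()
signal = multitone(freqs, seconds=0.1)
print(len(freqs), "tones from", freqs[0], "Hz to", round(freqs[-1]), "Hz")
```

Because the tones sit at isolated frequencies, anything that a conversion adds between them shows up directly as noise or distortion in the spectrum.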

I use sox to convert the sampling rate of the source from 44100 up to 48000 using the three conversion techniques, and look at the spectrum and the noise. I then resample that 48000 samples/sec file back down to 44100. This gives a measure of the combined conversion noise from both up and down conversions. Note that this is a far more stringent test of the conversion than, say, doubling or halving the rate. It is also more typical of the situation encountered in practice, since many soundcards (e.g. Soundblasters) only play at 48000 while all CDs are recorded at 44100.
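One plausible way to see why 44100 to 48000 is a harder case than doubling is to count how many different sub-sample positions the output samples occupy between input samples. The sketch below does this with exact fractions; the helper name fractional_offsets is invented for this illustration.

```python
from fractions import Fraction


def fractional_offsets(src_rate, dst_rate):
    """Distinct positions, between two input samples, at which outputs fall."""
    step = Fraction(src_rate, dst_rate)  # input samples advanced per output sample
    return sorted({(k * step) % 1 for k in range(step.denominator)})


ratio = Fraction(48000, 44100)  # reduced to lowest terms automatically
cd_to_48k = fractional_offsets(44100, 48000)
doubling = fractional_offsets(44100, 88200)
print(ratio, len(cd_to_48k), len(doubling))
```

The ratio reduces to 160/147, so the output samples cycle through 160 different offsets before the pattern repeats, whereas doubling only ever needs the original samples and their midpoints.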

The signal file is an 11MB file (about 1 min).

Original Signal

This is the analysis of the original file. The signal is flat from 16Hz to 16KHz as shown in the top graph. The noise (signals at all frequencies outside the specific frequencies at which the original signal is located) is truncation error noise, and lies from 120dB to 100dB below the signal. (The noise increases with frequency because of the larger number of noise frequency bands per octave as the frequencies increase.) The black stars indicate the noise level if there were 1 bit noise per frequency interval (1/rate). Note that the truncation error is significantly lower than 1 bit per frequency interval.

The signal amplitude is almost maximal (i.e., at its peak the signal reaches 98% of the maximum value allowed with 16 bits per sample).
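For orientation, these figures can be put on a dB scale relative to 16-bit full scale. The short calculation below is only arithmetic on the numbers quoted above; the helper name dbfs is introduced for this example.

```python
import math

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def dbfs(amplitude):
    """Level of a linear amplitude relative to 16-bit full scale, in dB."""
    return 20.0 * math.log10(amplitude / FULL_SCALE)


peak_db = dbfs(0.98 * FULL_SCALE)  # headroom left by a 98% peak
lsb_db = dbfs(1)                   # a single quantisation step

print(f"98% peak: {peak_db:.2f} dB, one step: {lsb_db:.1f} dB")
```

A 98% peak leaves less than 0.2dB of headroom, and a single quantisation step lies about 90dB below full scale. How that step maps onto the per-frequency-interval reference marked by the black stars depends on the analysis bandwidth, which is not stated here.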

rate conversion

Command: sox play.wav -r 48000 play-rate-48000.wav rate
Time: 0.37user 0.06system 0:00.44elapsed 97%CPU

Note the severe falloff in sound level above about 3KHz, reaching 5dB by 16KHz. Note also the huge noise level introduced by the simple linear interpolation conversion: the noise has risen by over 50dB at low frequencies and by up to 70dB at higher frequencies.
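A falloff of this size is roughly what an idealised model of linear interpolation predicts. Joining samples with straight lines is equivalent to convolving with a triangular kernel, whose frequency response is the square of a sinc function. The sketch below evaluates that textbook formula; it is a model under that assumption, not a description of sox's implementation, and the function name is made up.

```python
import math


def linear_interp_gain_db(freq_hz, src_rate_hz):
    """Gain in dB of an ideal triangular (linear interpolation) kernel.

    Uses |H(f)| = sinc^2(f / src_rate), with sinc(x) = sin(pi x) / (pi x).
    """
    x = freq_hz / src_rate_hz
    if x == 0:
        return 0.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    return 20.0 * math.log10(sinc * sinc)


for f in (1000, 3000, 8000, 16000):
    print(f, "Hz:", round(linear_interp_gain_db(f, 44100), 2), "dB")
```

For a 44100 source the model gives a loss of a little under 4dB at 16KHz and almost nothing below 3KHz, which is the same order as the measured falloff.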

Command: sox play-rate-48000.wav -r 44100 play-rate-44100.wav rate
Time: 0.34user 0.06system 0:00.49elapsed 81%CPU

As would be expected, the falloff in dB is now double that of the previous case, and the noise has increased by about another 5-10dB across the band. This is atrocious performance and would be highly audible.

resample conversion

Command: sox play.wav -r 48000 play-resample-48000.wav resample
Time: 1.35user 0.08system 0:01.50elapsed 95%CPU

Note that the noise level has increased hardly at all from the original (at most about 3dB) and the roll off at high frequencies is minimal. The roll off is very sharp, starting at about 14KHz, but falls only about 0.5dB by 16KHz.

Command: sox play-resample-48000.wav -r 44100 play-resample-44100.wav resample
Time: 5.05user 0.07system 0:05.17elapsed 99%CPU

The time to resample down is significantly longer (by a factor of about 4) than the time to resample up. However, we notice that again the noise level has increased by about 5dB at most. The roll off at the highest frequencies is slightly worse than in the case of the upsampling, but only by very little.

These added noises would be inaudible, and correspond to the noise introduced into the signal by a very good soundcard itself. The mild rolloff at high frequencies would again be inaudible: the frequency band from 16KHz to 22KHz is inaudible, no matter what the sound level, to most men older than about 25, and to many women. Even for people who can hear these frequencies, the energy in any piece of music in this band is very small.

polyphase conversion

Command: sox play.wav -r 48000 play-polyphase-48000.wav polyphase
Time: 11.31user 0.07system 0:11.62elapsed 97%CPU

The only difference from the resample option is that there is no frequency roll-off at all at the highest frequencies. The noise level is essentially identical to that of the resample option. The conversion takes about 8 times as long as with the resample option.

Command: sox play-polyphase-48000.wav -r 44100 play-polyphase-44100.wav polyphase
Time: 11.26user 0.09system 0:11.62elapsed 97%CPU

The down and up conversion times are the same, and again the noise level is the same as for the resample option. However, in this case there is only a factor of about 2 between the down conversion times of the resample and polyphase options. Again the only difference seems to be in the very small high frequency roll off of the resample option.

The only advantage of the polyphase conversion would seem to be the slightly better frequency behaviour at the highest frequencies. Given the huge time penalty that polyphase has over the resample conversion, especially for up-conversion, this slight advantage does not seem to be worth much.
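The timing comparisons can be checked directly from the user CPU times reported for each command. The snippet below is simple arithmetic on those figures; the dictionary layout and the helper name ratio are just one convenient way of writing it down.

```python
# User CPU seconds, copied from the "Time:" lines in this document.
user_seconds = {
    ("rate", "up"): 0.37,
    ("rate", "down"): 0.34,
    ("resample", "up"): 1.35,
    ("resample", "down"): 5.05,
    ("polyphase", "up"): 11.31,
    ("polyphase", "down"): 11.26,
}


def ratio(a, b):
    """How many times longer conversion a took than conversion b."""
    return user_seconds[a] / user_seconds[b]


resample_down_vs_up = ratio(("resample", "down"), ("resample", "up"))
polyphase_vs_resample_up = ratio(("polyphase", "up"), ("resample", "up"))
polyphase_vs_resample_down = ratio(("polyphase", "down"), ("resample", "down"))

print(f"resample down/up:        {resample_down_vs_up:.1f}")
print(f"polyphase/resample up:   {polyphase_vs_resample_up:.1f}")
print(f"polyphase/resample down: {polyphase_vs_resample_down:.1f}")
```

The three ratios come out at roughly 3.7, 8.4 and 2.2, in line with the factors of about 4, 8 and 2 quoted in the text.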

Copyright W. G. Unruh 2006. Please send any comments to . This document may be reproduced as long as the author's name is not removed, and as long as any changes to the document are clearly marked as such and are sent to the author.