Project Evaluation
A latency of around 20ms was chosen as this delay time would not be noticeable. However, in order to achieve this the maximum frequency resolution achievable was 86Hz, which would not be fine enough to distinguish individual notes. As described in the next section, the deadline was relaxed to 40ms in order to achieve a finer frequency resolution.
At 200ms delays, audio feedback becomes distracting and very noticeable. This was identified as the hard real-time limit for our project: audio must be acquired, shifted, and returned to the user within this time-frame. However, a trained musician can detect latencies of around 20ms, so an additional goal was set to reduce the latency to <20ms using real-time techniques such as threading and callbacks. We were able to reduce our latency below this limit, but only at the expense of the accuracy of the frequency detection. A compromise was found with latencies of slightly above 40ms, which is noticeable to a trained ear but still reasonably hard to detect.
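The 86Hz figure can be checked with a short calculation. The sketch below is illustrative only and is not taken from the project code: it assumes the 44100Hz sampling rate and the 512 and 2048 sample buffers described later on this page, and the function names are hypothetical. It compares the spacing between FFT bins with the gap between two neighbouring notes.

```python
SAMPLE_RATE_HZ = 44100  # sampling rate stated in this evaluation


def bin_width_hz(buffer_size, sample_rate_hz=SAMPLE_RATE_HZ):
    """Spacing between adjacent FFT bins for one buffer of audio."""
    return sample_rate_hz / buffer_size


def semitone_gap_hz(note_hz):
    """Distance from a note to the next semitone up (equal temperament)."""
    return note_hz * (2 ** (1 / 12) - 1)


# Buffer sizes are the two mentioned in this evaluation.
for size in (512, 2048):
    print(f"{size} samples: {bin_width_hz(size):.1f} Hz per bin")
print(f"A4 to A#4: {semitone_gap_hz(440.0):.1f} Hz")
```

With 512 samples each bin is about 86.1Hz wide, far wider than the roughly 26.2Hz between A4 and A#4, while 2048 samples give bins of about 21.5Hz. Lower notes sit closer together, so even the larger buffer does not separate every pair of notes on bin spacing alone.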
It was decided that the practical deadline from audio input to output was to be <<200ms, with a softer goal of <20ms. However, we deemed anything under around 50ms to be a success.
Sampling was performed at 44100Hz in order to have high-quality audio data while keeping the data rate into the Raspberry Pi, and thus the amount of processing required, to a minimum.
Data was taken either in buffers of 512 samples for maximum temporal resolution with poor frequency resolution, or - in most cases - at 2048 samples to achieve a good frequency resolution while keeping temporal latency under 50ms.
Buffering is the main cause of latency in PitchPerfector, introducing up to 46ms of latency at the highest frequency resolution (2048 samples), or about 12ms at the lowest frequency resolution (512 samples).
Due to the high latency caused by buffering, we were forced to minimise processing latency as much as possible. In order to do this, the FFTW3 library was used to perform Fourier transforms. Despite being harder to use than other FFT libraries (KISS FFT, for example), its low-latency FFT performance allowed us to maintain a reasonable latency even with our large buffer sizes.
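The buffering figures follow directly from the buffer size and the sampling rate. As a rough sketch (the function names are hypothetical, and restricting the search to power-of-two sizes is an assumption made here because those are the sizes quoted above), the time to fill one buffer and the largest buffer that fits a given deadline can be worked out as:

```python
SAMPLE_RATE_HZ = 44100  # sampling rate stated in this evaluation


def buffer_latency_ms(buffer_size, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to fill one buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate_hz


def largest_buffer_within(deadline_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest power-of-two buffer whose fill time fits the deadline.

    Assumption: only power-of-two sizes are considered.
    """
    size = 1
    while buffer_latency_ms(size * 2, sample_rate_hz) <= deadline_ms:
        size *= 2
    return size


for size in (512, 2048):
    print(f"{size} samples: {buffer_latency_ms(size):.1f} ms")
```

Under these assumptions a 2048 sample buffer takes about 46.4ms to fill and a 512 sample buffer about 11.6ms. `largest_buffer_within(50)` gives 2048 and `largest_buffer_within(20)` gives 512, which matches the two buffer sizes used against the 50ms and 20ms targets. Processing time comes on top of these figures.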
This application uses a single channel input (microphone) and a single channel output (mono jack).
The I2S protocol, specifically designed for the transfer of audio data, was used for data acquisition in the project.
The Linux realtime kernel was used for the project to reduce the input latency. It modifies the inbuilt prioritisation of the input stream to optimise performance.
I2S data is captured through the pins on the Raspberry Pi. The I2S data is converted to waveform audio and then processed with FFTs. The GUI displays both the waveform audio and the FFT data. Originally the audio output was to be done through the custom hardware, but due to the change in circumstances the built-in DAC and AUX output on the Raspberry Pi were used.
How many threads are needed and/or how can the load be distributed to allow a responsive application?
Three threads are used. The first obtains the data and the second runs the callback function that processes the incoming data. The third is used for the GUI.
A refresh rate of 10ms is used.
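The general shape of this arrangement can be sketched as below. This is a simplified illustration and not the project's implementation, which is built on the RtAudio callback: for brevity it uses only two worker threads, with the callback invoked on the acquisition thread, and `process`, `acquire`, `gui_loop` and `run` are hypothetical names. The GUI side wakes at least every 10ms to pick up new results.

```python
import queue
import threading

REFRESH_S = 0.010  # GUI refresh period from the text (10ms)


def process(buffer):
    """Hypothetical stand-in for the pitch-shifting callback."""
    return [2 * x for x in buffer]


def acquire(buffers, on_buffer):
    """Acquisition thread: hand each incoming buffer to the callback."""
    for buf in buffers:
        on_buffer(buf)


def gui_loop(results, shown, stop):
    """GUI thread: take the next result if one arrives within a refresh."""
    while not (stop.is_set() and results.empty()):
        try:
            shown.append(results.get(timeout=REFRESH_S))
        except queue.Empty:
            pass  # nothing new this refresh; keep the previous frame


def run(buffers):
    """Push buffers through the callback and return what the GUI saw."""
    results = queue.Queue()
    shown = []
    stop = threading.Event()
    gui = threading.Thread(target=gui_loop, args=(results, shown, stop))
    audio = threading.Thread(
        target=acquire, args=(buffers, lambda b: results.put(process(b)))
    )
    gui.start()
    audio.start()
    audio.join()  # every buffer has been processed and queued
    stop.set()    # let the GUI loop drain the queue and exit
    gui.join()
    return shown
```

Passing results through a queue means the audio side never waits on drawing, which is the property that keeps the application responsive.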
Our software was implemented in three distinct classes: the vocoder class, which analyses and manipulates the frequency spectra; the fft class, which takes audio buffers from either the RtAudio API or the vocoder class and performs forward and inverse Fourier transforms on them using the FFTW3 library; and the dispatch class, which is used as a workaround for the callback functionality of the RtAudio API, which only accepts a function and not an object. The dispatch class collects the fft and vocoder objects when it is instantiated, and contains a static method (performing the entire pitch-shifting process) which can be passed to the API.
Tests were developed for all of the classes, along with an end-to-end test suite to ensure that every method was doing what was intended, and that the system could operate under real-time constraints. This involved testing functions with pre-generated signals of known frequencies, and testing the software using the RtAudio API with a constant output of information from the classes.
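A test of the known-frequency kind might look like the sketch below. It is not the project's test code: it uses a naive DFT written in plain Python in place of FFTW3, a small 8000Hz and 256 sample configuration chosen here to keep it fast, and hypothetical function names.

```python
import cmath
import math


def sine(freq_hz, sample_rate_hz, n_samples):
    """Pre-generated test tone of known frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def peak_frequency_hz(samples, sample_rate_hz):
    """Frequency of the strongest bin of a naive DFT (positive bins only)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate_hz / n


def within_one_bin(freq_hz, sample_rate_hz=8000, n_samples=256):
    """True if the detected peak lies within one bin width of the tone."""
    detected = peak_frequency_hz(
        sine(freq_hz, sample_rate_hz, n_samples), sample_rate_hz
    )
    return abs(detected - freq_hz) <= sample_rate_hz / n_samples


if __name__ == "__main__":
    for tone in (440.0, 1000.0):
        print(tone, within_one_bin(tone))
```

The tolerance is tied to the bin width because a single transform cannot place a tone more precisely than the spacing of its bins.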
Regular meetings were held to discuss distribution of tasks among defined, but flexible, team roles. See the Team Roles and Contributions Wiki for more details.
The team roles that were established were one hardware designer and two software designers. Some non-custom hardware was acquired at the start of the project so that software could be tested as it was developed. This allowed hardware and software development to continue without one relying on the other. Work on the custom hardware unfortunately ceased and was postponed indefinitely, because the closure of the University prevented the PCB from being reprinted. At this point the team decided, in collaboration with the project supervisors, that the non-custom hardware would be used for the project. The team member in charge of hardware design transitioned into a role focusing on the build system and documentation of the project. The software for the project is in continuous development, with a number of additional features outlined for the next release.
GitHub has been adopted for the project. The first release of the project (v1.0) was made on 20th April 2020. A number of further improvement activities have been added to the Further Improvements milestone, to be completed after the project deadline.
What is the release strategy / publication / publicity? How is that measured and deemed to be successful?
PitchPerfector is an open-source project primarily maintained on GitHub. Releases have been used to maintain stable versions of the software. The project has been publicised on a number of social media channels, namely YouTube and Twitter.
We have experienced a lot of activity and interaction on Twitter. For example, one of our followers tweeted us directly and said: "@PerfectorPitch great project. I'll keep a beady eye on your progress. Keep at it. #Covid_19 doesn't seem to be slowing you down."
Over the course of the project, we have accumulated over 600 followers on Twitter including tech enthusiasts, promotion accounts and some other interesting accounts. See the Promotion Wiki section for more details.
The application successfully captures input audio, processes it in real-time and shifts it to the correct note, as set out in the project aim.
Although Auto-Tune itself is widely used, the benefit of PitchPerfector is the ability to apply the effect in real-time at minimal cost using a handheld, portable device, as if it were happening in the microphone itself.