Dr. Hawley had already developed a Python-based application with a graphical interface, and he wanted a version with a standard desktop-application experience that a wider audience would find easier to use. That meant it needed to be easy to distribute and install, and it needed a polished GUI.
What are Spectrograms?
Training is an automated process: a set of examples, each paired with its correct result, is fed through the network. The network’s output is compared to the correct answer, and the function’s parameters are adjusted to improve the results. This repeats until the function’s accuracy stops improving.
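To make that loop concrete, here is a minimal sketch in Python. It swaps in a tiny logistic-regression model for the CNN so that it runs with only the standard library; the names `train`, `predict`, and `accuracy`, and the `patience` setting (how many passes without improvement to tolerate before stopping), are illustrative assumptions rather than part of Vibrary or Panotti.

```python
import math
import random


def predict(weights, bias, x):
    """Logistic model: probability that example x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, examples):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in examples)
    return correct / len(examples)


def train(examples, lr=0.5, patience=3, max_epochs=1000, seed=0):
    """Feed labelled examples through the model, compare each output with the
    correct answer, adjust the parameters, and stop once accuracy stops improving.
    `examples` is a list of (feature_list, label) pairs with labels 0 or 1."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    best_acc, best_params, stale = -1.0, (list(weights), bias), 0
    for _ in range(max_epochs):
        order = examples[:]
        rng.shuffle(order)
        for x, y in order:
            # Signed difference between output and correct answer
            # (also the log-loss gradient with respect to z).
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
        acc = accuracy(weights, bias, examples)
        if acc > best_acc:  # improvement: remember these parameters
            best_acc, best_params, stale = acc, (list(weights), bias), 0
        else:  # no improvement on this pass
            stale += 1
            if stale >= patience:
                break
    return best_params, best_acc
```

For example, `train([([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)])` reaches 100% accuracy on this toy set after the first pass, then stops once three further passes bring no improvement.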
Dr. Hawley’s Panotti convolutional neural network takes an image as an input. The network’s first few layers extract features from the image. Subsequent layers use that information to decide upon the classification of that image as the final output.
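The split between feature extraction and classification can be sketched in plain Python. This is not Panotti’s architecture: the two 3×3 kernels and the class weights are hand-picked assumptions, whereas a real network learns them during training. Treating rows as frequency bands and columns as time frames, as in a spectrogram, the first kernel responds to a horizontal line (a sustained tone) and the second to a vertical line (a broadband click).

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    return max(v for row in feature_map for v in row)


# Hand-picked kernels (a trained CNN would learn its own).
HORIZONTAL_LINE = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]  # sustained tone
VERTICAL_LINE = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]    # broadband click


def extract_features(image):
    """Feature extraction: one number per kernel, 'how strongly is this pattern present?'"""
    return [global_max_pool(relu(conv2d(image, k))) for k in (HORIZONTAL_LINE, VERTICAL_LINE)]


def classify(image, class_weights, class_names):
    """Classification: score each class linearly from the features and pick the best."""
    features = extract_features(image)
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in class_weights]
    return class_names[scores.index(max(scores))]
```

With identity class weights `[[1, 0], [0, 1]]` and names `["tone", "click"]`, a 5×5 image whose middle row is lit is classified as `"tone"`, and one whose middle column is lit as `"click"`.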
In Vibrary, training involves helping users find a set of examples for each type of sound they wish to identify, uploading those examples to the training server, and then downloading the trained network’s parameters for storage and use on the user’s computer.
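A minimal sketch of the data-set preparation step, assuming the user has already tagged their files. The one-folder-per-tag layout, the `Samples` root folder, and the `MIN_EXAMPLES_PER_TAG` threshold are assumptions for illustration; they are not taken from Vibrary’s code.

```python
from collections import defaultdict

MIN_EXAMPLES_PER_TAG = 10  # hypothetical guidance threshold, not from Vibrary


def group_examples(tagged_files):
    """Group (path, tag) pairs into one sorted, de-duplicated file list per tag."""
    groups = defaultdict(set)
    for path, tag in tagged_files:
        groups[tag].add(path)
    return {tag: sorted(groups[tag]) for tag in sorted(groups)}


def check_training_set(groups, min_examples=MIN_EXAMPLES_PER_TAG):
    """Return plain-language warnings to show the user before uploading."""
    warnings = []
    if len(groups) < 2:
        warnings.append("A classifier needs examples of at least two kinds of sound.")
    for tag, paths in groups.items():
        if len(paths) < min_examples:
            warnings.append(
                f"'{tag}' has only {len(paths)} example(s); at least {min_examples} recommended."
            )
    return warnings


def upload_layout(groups, root="Samples"):
    """(source, destination) pairs for an assumed one-folder-per-tag layout on the server."""
    return [
        (path, f"{root}/{tag}/{path.rsplit('/', 1)[-1]}")
        for tag, paths in groups.items()
        for path in paths
    ]
```

Checking the set before anything is uploaded means the user gets guidance while it is still cheap to add more examples, rather than after a remote training run.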
Training can be computationally expensive. GPU hardware can accelerate it significantly, but that hardware is costly, so Vibrary supports training on a remote computer and users don’t need a GPU of their own.
// Goals + Objectives
Ease of Use
Using a neural network, and especially training one, is complicated. Vibrary needed to make this as easy as possible while remaining useful to users at all skill levels.
To that end, Vibrary:
- Lets users focus only on their audio, hiding the CNN details
- Provides a simple UI for handling large sets of audio files
- Gives guidance on creating training data sets
- Uses a simple tag metaphor as the search criteria
- Auto-completes tag names to prevent entry errors
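SQLite appears in Vibrary’s technology stack, and tag search and auto-completion map naturally onto a few SQL queries. The sketch below uses Python’s built-in `sqlite3` module with a hypothetical three-table schema (`files`, `tags`, `file_tags`); Vibrary’s real schema and C++ code may differ.

```python
import sqlite3

# Hypothetical schema: files, tags, and a many-to-many link table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER REFERENCES files(id),
    tag_id  INTEGER REFERENCES tags(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_library(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def tag_file(conn, path, tag):
    """Attach a tag to a file, creating either row if it does not exist yet."""
    conn.execute("INSERT OR IGNORE INTO files(path) VALUES (?)", (path,))
    conn.execute("INSERT OR IGNORE INTO tags(name) VALUES (?)", (tag,))
    conn.execute(
        "INSERT OR IGNORE INTO file_tags(file_id, tag_id) "
        "SELECT f.id, t.id FROM files f, tags t WHERE f.path = ? AND t.name = ?",
        (path, tag),
    )


def complete_tag(conn, prefix, limit=10):
    """Suggest existing tag names that start with `prefix`."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM tags WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    )
    return [name for (name,) in rows]


def files_with_tag(conn, tag):
    """Search: every file carrying exactly this tag."""
    rows = conn.execute(
        "SELECT f.path FROM files f "
        "JOIN file_tags ft ON ft.file_id = f.id "
        "JOIN tags t ON t.id = ft.tag_id "
        "WHERE t.name = ? ORDER BY f.path",
        (tag,),
    )
    return [path for (path,) in rows]
```

Because suggestions only ever come from tags that already exist, picking one avoids creating a misspelled duplicate. SQLite’s `LIKE` is case-insensitive for ASCII letters by default, so typing `Dr` also suggests `drum`.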
The target feature set was kept small, ignoring or postponing anything not directly related to finding audio with specific qualities. There are many ways we could have allowed users to have more control over the whole process. We favored simplicity and a single paradigm.
We kept development bare-bones, taking care not to create future headaches along the way. In particular, the remote training server was left to work as it did in Dr. Hawley’s application rather than being rebuilt as, for example, a REST API available through a web interface.
TensorFlow is Google’s powerful open-source machine learning library. However, managing custom C++ builds of TensorFlow and its dependencies in a way that allowed easy distribution of the application fell outside TensorFlow’s typical use case.
Quick Development vs Ease of Use
The workflow behind Vibrary involves several distinct tasks:
- Creating data sets
- Uploading to the server
- Triggering remote training
- Locally creating the neural network for classification
- Error recovery
It takes work to hide the complexity of those tasks while still keeping users informed of progress and protecting them from loss of work in error conditions.
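One way to keep users informed and protect their work is to run the tasks above as checkpointed steps, so that a failure in, say, the upload leaves the earlier steps marked done and the next run resumes where it stopped. This is a hypothetical sketch: the step names, the JSON state file, and the `Workflow` class are illustrative and are not Vibrary’s implementation.

```python
import json
from pathlib import Path

# Hypothetical step names mirroring the task list.
STEPS = ["create_data_set", "upload", "train_remotely", "download_parameters", "build_local_network"]


class Workflow:
    def __init__(self, state_file, actions, report=print):
        self.state_file = Path(state_file)
        self.actions = actions  # step name -> zero-argument callable
        self.report = report    # where progress messages go (e.g. a status bar)

    def _load(self):
        if self.state_file.exists():
            return json.loads(self.state_file.read_text())
        return {"completed": []}

    def _save(self, state):
        # Write to a temporary file, then rename over the old one, so an
        # interrupted write never leaves a half-written state file behind.
        tmp = self.state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.state_file)

    def run(self):
        """Run the remaining steps in order; return True once all are done."""
        state = self._load()
        for i, step in enumerate(STEPS, start=1):
            if step in state["completed"]:
                self.report(f"[{i}/{len(STEPS)}] {step}: already done, skipping")
                continue
            self.report(f"[{i}/{len(STEPS)}] {step}...")
            try:
                self.actions[step]()
            except Exception as exc:
                self.report(f"{step} failed: {exc}. Progress saved; run again to resume.")
                return False
            state["completed"].append(step)
            self._save(state)  # checkpoint after every successful step
        return True
```

Checkpointing after each step trades a little extra disk I/O for never repeating a finished step, which matters most for the slow ones such as remote training.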
TensorFlow, SQLite, C++
AWS Ubuntu Linux server instance
Panotti, TensorFlow, Python
// The Team
Dr. Scott Hawley
J. Carlos Perez
Art+Logic is now open-sourcing Vibrary, encouraging the community to take it over and expand it in ways we might not have imagined, so that it can be useful to as many people as need it.