One of the goals of Magenta is to use machine learning to develop new avenues of human expression. And so today we are proud to announce NSynth (Neural Synthesizer), a novel approach to music synthesis designed to aid the creative process.

Unlike a traditional synthesizer, which generates audio from hand-designed components like oscillators and wavetables, NSynth uses deep neural networks to generate sounds at the level of individual samples. Because it learns directly from data, NSynth gives artists intuitive control over timbre and dynamics, and the ability to explore new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer.
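To make "generating sound at the level of individual samples" concrete, here is a toy sketch of sample-by-sample autoregressive generation. The `toy_model` below is a purely hypothetical stand-in for a trained neural network (the real model is described in the arXiv paper); the point is only the generation loop, where each audio sample is predicted from the samples before it.

```python
import numpy as np

def toy_model(context):
    # Hypothetical stand-in for a trained network: a damped echo of the
    # previous sample plus a fixed 440 Hz oscillation. Illustrative only.
    t = len(context)
    prev = context[-1] if context else 0.0
    return 0.5 * prev + 0.1 * np.sin(2 * np.pi * 440 * t / 16000)

def generate(n_samples, model):
    # Autoregressive loop: each new sample is conditioned on all prior samples.
    audio = []
    for _ in range(n_samples):
        audio.append(model(audio))
    return np.array(audio)

samples = generate(16000, toy_model)  # one second of audio at 16 kHz
```

A neural synthesizer replaces `toy_model` with a deep network whose predictions are learned from recordings rather than hand-designed.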

The acoustic qualities of the learned instrument depend on both the model used and the available training data, so we are delighted to release improvements to both: a new dataset and a new algorithm, described below.

A full description of the dataset and the algorithm can be found in our arXiv paper.

The NSynth Dataset

We wanted to develop a creative tool for musicians and also provide a new challenge for the machine learning community to galvanize research in generative models for music. To satisfy both of these objectives, we built the NSynth dataset, a large collection of annotated musical notes sampled from individual instruments across a range of pitches and velocities. With ~300k notes from ~1000 instruments, it is an order of magnitude larger than comparable public datasets. You can download it here.
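A minimal sketch of what one annotated note in such a dataset might look like. The field names here are illustrative assumptions, not the released schema (see the dataset download for the actual format); the point is that each record is a single note from one instrument at a specific pitch and velocity.

```python
from dataclasses import dataclass

@dataclass
class Note:
    # Hypothetical record layout for one annotated musical note.
    instrument: str    # identifier of the source instrument
    pitch: int         # MIDI pitch, 0-127
    velocity: int      # MIDI velocity, 0-127
    sample_rate: int   # audio sample rate in Hz
    audio: list        # raw waveform samples for this single note

note = Note(instrument="instrument_000", pitch=60, velocity=100,
            sample_rate=16000, audio=[0.0] * 16000)
```

Sampling every instrument across a grid of pitches and velocities is what makes the dataset's annotations uniform and machine-learning friendly.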

A motivation behind the NSynth dataset is that it lets us explicitly factorize the generation of music into notes and other musical qualities. We could further factorize those qualities, but for simplicity we don't.
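One way to write such a factorization (our sketch of the idea; the exact form appears in the arXiv paper) is to treat audio as generated by first choosing a note and then rendering audio conditioned on that note:

```latex
P(\text{audio}) \;=\; \sum_{\text{note}} P(\text{audio} \mid \text{note})\, P(\text{note})
```

Under this view, a dataset of isolated, annotated notes directly supports learning the conditional term $P(\text{audio} \mid \text{note})$.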