Audio analysis and Deep Learning

Source: Internet  Editor: 程序博客网  Time: 2024/06/07 10:00

Genre of a song with Deep Learning 

General overview:
build a simplified representation of each song in the library
train a deep neural network to classify the songs
use the classifier to fill in the missing genres

Data
iTunes library
2000 songs

Data preprocessing
there are too many genres and subgenres, so the labels are simplified by removing some examples or assigning them to a broader genre

Sampling frequency: 44100 Hz
every second of audio contains 44100 values per channel
Tip:
discard the stereo channel (keep a single mono channel)
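
As a minimal sketch of the tip above, the stereo signal can be reduced to one channel with NumPy. Averaging the two channels is an assumption here (the notes only say the stereo channel is discarded), and the signal is synthetic, not real song data.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second, as stated in the notes


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Return a 1-D signal from a (num_samples, num_channels) array.

    Assumption: channels are averaged; keeping only one channel would also work.
    """
    audio = audio.astype(np.float32)
    if audio.ndim == 1:
        return audio  # already mono
    return audio.mean(axis=1)


# Synthetic stand-in for one second of stereo audio (hypothetical data).
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
left = np.sin(2 * np.pi * 440.0 * t)
right = np.sin(2 * np.pi * 440.0 * t + 0.1)
stereo = np.stack([left, right], axis=1)

mono = to_mono(stereo)
print(stereo.shape, mono.shape)
```

One second of stereo audio holds 2 x 44100 values; after the conversion only 44100 remain.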


Use the Fourier transform to convert the audio data to the frequency domain and export the result as a spectrogram. The spectrogram is saved as a PNG image that contains all the frequencies of the song over time.


A 44100 Hz sampling rate allows frequencies up to 22050 Hz to be reconstructed (Nyquist-Shannon sampling theorem).

A resolution of 50 pixels per second (20 ms per pixel) is enough.
Use a spectrogram with 128 frequency levels.
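
The numbers above can be tied together in a short NumPy sketch: one spectrogram column every 20 ms (882 samples at 44100 Hz) and 128 frequency levels covering 0 to 22050 Hz. The window length `n_fft`, the Hann window, the linear pooling of FFT bins into 128 bands, and the log scaling are all assumptions made for illustration; the notes do not say which tool or parameters produced the original PNG files.

```python
import numpy as np

SAMPLE_RATE = 44100
PIXELS_PER_SECOND = 50
HOP = SAMPLE_RATE // PIXELS_PER_SECOND  # 882 samples = 20 ms per pixel column
N_BANDS = 128                           # frequency levels (image height)
NYQUIST = SAMPLE_RATE / 2               # highest representable frequency, 22050 Hz


def spectrogram(mono: np.ndarray, n_fft: int = 2048) -> np.ndarray:
    """Return a (128, num_columns) log-magnitude spectrogram.

    Assumes len(mono) >= n_fft. Row 0 holds the lowest frequencies.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(mono) - n_fft) // HOP
    frames = np.stack(
        [mono[i * HOP : i * HOP + n_fft] * window for i in range(n_frames)]
    )
    # Short-time Fourier transform: one magnitude spectrum per frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    # Pool the FFT bins into 128 roughly equal-width bands (an assumption).
    bands = np.array_split(magnitude, N_BANDS, axis=1)
    pooled = np.stack([band.mean(axis=1) for band in bands], axis=0)
    return np.log1p(pooled)


# Synthetic 3-second, 1000 Hz tone as a stand-in for a song (hypothetical data).
t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)
spec = spectrogram(tone)
print(spec.shape)
```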

Further processing
deal with the varying length of the songs:
create fixed-length slices of the spectrogram and treat them as independent samples representing the genre
cut the spectrogram into 128x128 pixel slices, each covering 2.56 s
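
A minimal sketch of the slicing step, assuming the spectrogram is a NumPy array of shape (128, width) and that a trailing remainder shorter than 128 columns is simply dropped (the notes do not say how the remainder is handled). The random array is placeholder data.

```python
import numpy as np

SLICE_SIZE = 128         # slice width in pixels
PIXELS_PER_SECOND = 50   # spectrogram resolution from the notes


def slice_spectrogram(spec: np.ndarray, size: int = SLICE_SIZE) -> list:
    """Cut a (height, width) spectrogram into non-overlapping (height, size) slices."""
    n_slices = spec.shape[1] // size  # incomplete last slice is dropped (assumption)
    return [spec[:, i * size : (i + 1) * size] for i in range(n_slices)]


spec = np.random.rand(128, 1000)  # placeholder for a 20-second spectrogram
slices = slice_spectrogram(spec)
seconds_per_slice = SLICE_SIZE / PIXELS_PER_SECOND  # 128 / 50 = 2.56 s
print(len(slices), slices[0].shape, seconds_per_slice)
```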

Tips:
We can expand the dataset by adding random noise to the images, or by slightly stretching them horizontally and then cropping them. We cannot rotate the images or flip them horizontally, because sounds are not symmetrical.
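
The two augmentations mentioned in the tips could be sketched as follows. The noise scale, the stretch factor, the linear interpolation, and the center crop are illustrative assumptions, not values from the notes.

```python
import numpy as np


def add_noise(img: np.ndarray, rng: np.random.Generator, scale: float = 0.01) -> np.ndarray:
    """Add Gaussian noise to a spectrogram slice (scale is an assumed value)."""
    return img + rng.normal(0.0, scale, size=img.shape)


def stretch_and_crop(img: np.ndarray, factor: float = 1.1) -> np.ndarray:
    """Stretch the time axis by `factor` (>= 1), then center-crop to the original width."""
    height, width = img.shape
    new_width = int(round(width * factor))
    old_x = np.arange(width)
    new_x = np.linspace(0, width - 1, new_width)
    # Linear interpolation along time, one frequency row at a time.
    stretched = np.stack([np.interp(new_x, old_x, row) for row in img])
    start = (new_width - width) // 2
    return stretched[:, start : start + width]


rng = np.random.default_rng(0)
slice_img = rng.random((128, 128))  # placeholder spectrogram slice
noisy = add_noise(slice_img, rng)
stretched = stretch_and_crop(slice_img)
print(noisy.shape, stretched.shape)
```

Both functions keep the 128x128 shape, so augmented slices can be fed to the same classifier as the originals.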

Model: classifier
sample: songs are represented as square spectrogram images (the slices)
algorithm: Deep Convolutional Neural Network to classify these samples
tool: TFLearn, a wrapper around TensorFlow


Details:
dataset split: Training (70%), validation (20%), testing (10%)
model: Convolutional neural network.
layers: kernels of size 2x2 with a stride of 2
optimizer: RMSProp.
activation function: ELU (Exponential Linear Unit), chosen for the performance it has shown compared to ReLUs
initialization: Xavier for the weight matrices in all layers.
regularization: Dropout with probability 0.5
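
Two of the details above, the 70/20/10 split and the ELU activation, can be illustrated in plain NumPy. This is only a sketch and not the TFLearn code behind the notes; the function names, the random shuffling, and the placeholder data are assumptions.

```python
import numpy as np


def split_dataset(samples, labels, rng, train=0.7, val=0.2):
    """Shuffle, then split into training, validation, and testing sets (70/20/10 by default)."""
    order = rng.permutation(len(samples))
    n_train = int(round(len(samples) * train))
    n_val = int(round(len(samples) * val))
    train_idx = order[:n_train]
    val_idx = order[n_train : n_train + n_val]
    test_idx = order[n_train + n_val :]  # whatever remains, about 10%
    return (
        (samples[train_idx], labels[train_idx]),
        (samples[val_idx], labels[val_idx]),
        (samples[test_idx], labels[test_idx]),
    )


def elu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


rng = np.random.default_rng(0)
samples = np.zeros((1000, 4, 4))          # tiny placeholders for spectrogram slices
labels = rng.integers(0, 6, size=1000)    # 6 genres, as in the notes
train_set, val_set, test_set = split_dataset(samples, labels, rng)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))
print(elu(np.array([-1.0, 0.0, 2.0])))
```

Unlike ReLU, ELU returns small negative values for negative inputs instead of zero.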

Result:
2000 songs, 6 genres
12,000 128x128 spectrogram slices
accuracy: 90%

Classify:
slice the new song
classify each slice and combine the predicted classes (voting system)
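
A minimal sketch of the voting step, assuming each slice of the new song has already been assigned one genre label by the classifier. A simple majority vote is an assumption; the notes only mention a voting system, and the genre names below are hypothetical.

```python
from collections import Counter


def vote(slice_predictions):
    """Return the genre predicted for the largest number of slices."""
    counts = Counter(slice_predictions)
    genre, _ = counts.most_common(1)[0]
    return genre


# Hypothetical per-slice predictions for one song.
predictions = ["rock", "rock", "jazz", "rock", "electro"]
song_genre = vote(predictions)
print(song_genre)  # rock
```

Voting over many slices makes the song-level prediction more robust than any single 2.56 s slice.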


Recognizing Sounds (A Deep Learning Case Study)

Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning

References
https://chatbotslife.com/finding-the-genre-of-a-song-with-deep-learning-da8f59a61194
https://medium.com/@awjuliani/recognizing-sounds-a-deep-learning-case-study-1bc37444d44d
https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a
