Song Style Transfer with TensorFlow
Source: Internet  Editor: 程序博客网  Time: 2024/04/29 21:26
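Background:
Style transfer on images (Style Transfer, paper, Torch-based implementation: neural-style) can be applied to audio in the same way.
This article uses the English song TravelingLight.mp3 as the content audio and 东风破.mp3 as the style reference. Because memory is limited, only a 10-second clip of each file is used for the style transfer.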
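To make the memory concern concrete, here is a rough back-of-the-envelope sketch of the array sizes involved. It assumes librosa's default sample rate (22050 Hz) and default STFT hop length (n_fft // 4), and takes N_FFT, the filter width of 11 and the 4096 filters from the script in this article. The exact frame count depends on the real clip length, so treat the numbers as estimates.

```python
# Assumed defaults: librosa.load resamples to 22050 Hz, and librosa.stft
# uses hop_length = n_fft // 4 when no hop length is given.
SAMPLE_RATE = 22050
N_FFT = 2048
HOP_LENGTH = N_FFT // 4
CLIP_SECONDS = 10

# Values taken from the script in this article.
KERNEL_WIDTH = 11
N_FILTERS = 4096

n_samples = SAMPLE_RATE * CLIP_SECONDS      # samples in a 10 s clip
n_bins = 1 + N_FFT // 2                     # frequency bins per STFT frame
n_frames = 1 + n_samples // HOP_LENGTH      # frames for a centered STFT

# The convolution kernel has shape (1, 11, n_bins, N_FILTERS).
kernel_weights = KERNEL_WIDTH * n_bins * N_FILTERS
kernel_mib_float32 = kernel_weights * 4 / 2**20

print("frequency bins:", n_bins)
print("frames in 10 s:", n_frames)
print("kernel weights:", kernel_weights)
print("kernel size as float32 (MiB): %.1f" % kernel_mib_float32)
```

The kernel alone holds tens of millions of weights, and its size does not depend on the clip length. The spectrogram and the convolution output (4096 values per time position) do grow with the number of frames, which is why a short clip keeps memory use manageable.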
Code:
# -*- coding: utf-8 -*-
# Requires TensorFlow 1.x (tf.contrib is used below) and ffmpeg on the PATH.
__author__ = 'jason'

import os
import subprocess

import librosa  # used to load the audio files; see <中文语音识别>
import numpy as np
import tensorflow as tf

# Audio file paths
content_audio = "TravelingLight.mp3"
style_audio = "东风破.mp3"
# Goal: give the English song <Traveling Light> a Jay Chou flavour.


# Cut a clip out of an audio file. By default the first 10 s are taken;
# longer clips need too much memory.
def cut_audio(filename, start_pos='00:00:00', lens=10):
    newfile = os.path.splitext(os.path.basename(filename))[0] + '_' + str(lens) + 's.mp3'
    # Make sure ffmpeg is installed; this calls its command-line interface.
    # Note the -acodec copy option: without it the command reported an error.
    # Passing an argument list avoids shell-quoting problems with file names.
    cmd = ["ffmpeg", "-i", filename, "-ss", start_pos, "-t", str(lens),
           "-acodec", "copy", newfile]
    subprocess.call(cmd)
    return newfile


content_audio_10s = cut_audio(content_audio, start_pos='00:00:33')
style_audio_10s = cut_audio(style_audio, start_pos='00:00:38')
# content_audio_10s = "TravelingLight_10s.mp3"
# style_audio_10s = "东风破_10s.mp3"

# Short-time Fourier transform: turn the audio into a spectrogram
# (a 1-D signal becomes a 2-D array that can be treated like an image).
# https://en.wikipedia.org/wiki/Short-time_Fourier_transform
N_FFT = 2048


def read_audio(filename):
    x, fs = librosa.load(filename)
    S = librosa.stft(x, n_fft=N_FFT)
    # Keep the log-magnitude of the first 430 frames; the phase is discarded.
    S = np.log1p(np.abs(S[:, :430]))
    return S, fs


content_data, _ = read_audio(content_audio_10s)
style_data, fs = read_audio(style_audio_10s)

samples_n = content_data.shape[1]  # 430
channels_n = style_data.shape[0]   # 1025
style_data = style_data[:channels_n, :samples_n]

content_data_tf = np.ascontiguousarray(content_data.T[None, None, :, :])
style_data_tf = np.ascontiguousarray(style_data.T[None, None, :, :])

# filter shape "[filter_height, filter_width, in_channels, out_channels]"
N_FILTERS = 4096
std = np.sqrt(2) * np.sqrt(2.0 / ((channels_n + N_FILTERS) * 11))
kernel = np.random.randn(1, 11, channels_n, N_FILTERS) * std

# Content and style features
g = tf.Graph()
with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
    # data shape "[batch, in_height, in_width, in_channels]"
    x = tf.placeholder('float32', [1, 1, samples_n, channels_n], name="x")
    kernel_tf = tf.constant(kernel, name="kernel", dtype='float32')
    conv = tf.nn.conv2d(x, kernel_tf, strides=[1, 1, 1, 1], padding="VALID", name="conv")
    net = tf.nn.relu(conv)

    content_features = net.eval(feed_dict={x: content_data_tf})
    style_features = net.eval(feed_dict={x: style_data_tf})

    features = np.reshape(style_features, (-1, N_FILTERS))
    style_gram = np.matmul(features.T, features) / samples_n

# Optimize
ALPHA = 0.01  # the larger ALPHA is, the more the content dominates; 0 means no content term
result = None
with tf.Graph().as_default():
    learning_rate = 0.001  # only used here to scale the random initial spectrogram
    x = tf.Variable(np.random.randn(1, 1, samples_n, channels_n).astype(np.float32) * learning_rate, name="x")
    kernel_tf = tf.constant(kernel, name="kernel", dtype='float32')
    conv = tf.nn.conv2d(x, kernel_tf, strides=[1, 1, 1, 1], padding="VALID", name="conv")
    net = tf.nn.relu(conv)

    content_loss = ALPHA * 2 * tf.nn.l2_loss(net - content_features)

    _, height, width, number = map(lambda i: i.value, net.get_shape())
    feats = tf.reshape(net, (-1, number))
    gram = tf.matmul(tf.transpose(feats), feats) / samples_n
    style_loss = 2 * tf.nn.l2_loss(gram - style_gram)

    # Total loss
    loss = content_loss + style_loss
    opt = tf.contrib.opt.ScipyOptimizerInterface(loss, method='L-BFGS-B', options={'maxiter': 300})

    # Optimization
    init_op = tf.initialize_all_variables()  # note the initializer used here
    with tf.Session() as sess:
        # sess.run(tf.global_variables_initializer())  # the original initializer; it raised an error here
        sess.run(init_op)
        opt.minimize(sess)
        result = x.eval()

# Convert the spectrogram back to audio
audio = np.zeros_like(content_data)
audio[:channels_n, :] = np.exp(result[0, 0].T) - 1

# Only magnitudes were optimized, so recover a phase iteratively,
# starting from random values.
p = 2 * np.pi * np.random.random_sample(audio.shape) - np.pi
for i in range(500):
    S = audio * np.exp(1j * p)
    x = librosa.istft(S)
    p = np.angle(librosa.stft(x, n_fft=N_FFT))

# write_wav writes WAV data, so the file is named .wav (the original used "output.mp3").
# librosa.output was removed in librosa 0.8; on newer versions use soundfile.write("output.wav", x, fs).
librosa.output.write_wav("output.wav", x, fs)
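The style term in the script compares Gram matrices of the convolution activations. The NumPy sketch below isolates that step on toy data. The function names gram_matrix and style_loss and the tiny array sizes are hypothetical and chosen only for illustration; the formula follows the script, where 2 * tf.nn.l2_loss(d) equals the sum of the squared entries of d.

```python
import numpy as np


def gram_matrix(features, n_frames):
    """Gram matrix of activations shaped (batch, height, width, n_filters)."""
    n_filters = features.shape[-1]
    flat = features.reshape(-1, n_filters)   # one row per time position
    return flat.T @ flat / n_frames          # shape (n_filters, n_filters)


def style_loss(generated, style_gram, n_frames):
    """Sum of squared differences between the two Gram matrices."""
    diff = gram_matrix(generated, n_frames) - style_gram
    return float(np.sum(diff ** 2))


# Toy activations (hypothetical sizes): 6 time positions, 3 filters.
rng = np.random.default_rng(0)
n_frames = 6
style_feats = rng.random((1, 1, n_frames, 3))
style_gram = gram_matrix(style_feats, n_frames)

# Reversing the time axis leaves the Gram matrix essentially unchanged ...
shuffled = style_feats[:, :, ::-1, :]
loss_shuffled = style_loss(shuffled, style_gram, n_frames)

# ... while unrelated activations give a clearly positive loss.
other_feats = rng.random((1, 1, n_frames, 3))
loss_other = style_loss(other_feats, style_gram, n_frames)

print(style_gram.shape, loss_shuffled, loss_other)
```

Because the Gram matrix sums over time positions, it records which filters tend to fire together but not when they fire. That is why it can serve as a description of style, while the content term compares the activations position by position.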
Result:
Download the output produced by the run