Writing a Python script to download and unzip the MNIST dataset (1)


[UpdateTime: 20170611]


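I. Purpose

MNIST is to machine learning and deep learning what cout<<"hello world" is to programming (quoted from the TensorFlow tutorial). I recently started learning deep learning, without neglecting machine learning, and have come across all kinds of MNIST-based examples. The main contribution of this article is a Python program that automatically downloads and unzips MNIST. I have organized it and am sharing it here, and I will keep updating it as my studies progress.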


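The packages this article relies on are listed in the import statements at the top of the script. Because I had installed several deep learning frameworks before running this experiment, the relevant packages were already present on my system. If you run into a problem, install the missing package as the error message suggests (pip install xxx).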
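As a rough sketch of how to find out which packages are missing before running the script: gzip, subprocess and os ship with Python, so only six and numpy need checking. The helper name missing_packages is made up for this illustration, and the snippet assumes Python 3 (importlib.util.find_spec is not available in Python 2.7).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules imported at the top of the script in this article.
REQUIRED = ["six", "numpy"]

for name in missing_packages(REQUIRED):
    print("missing:", name, "-> try: pip install", name)
```

If nothing is printed, both packages are already installed.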


The idea is simple: the dataset is downloaded with the following code (using urllib):

filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
statinfo = os.stat(filepath)
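For readers on Python 3, the same download step can be sketched without six. This is only an illustration under that assumption, not the script from this article: the function name download_if_missing and its return value are invented here, and it skips the download when the file is already on disk.

```python
import os
import urllib.request


def download_if_missing(base_url, filename, data_dir):
    """Fetch base_url + filename into data_dir unless the file already exists.

    Returns (filepath, fetched), where fetched says whether a download happened.
    """
    os.makedirs(data_dir, exist_ok=True)
    filepath = os.path.join(data_dir, filename)
    if os.path.exists(filepath):
        return filepath, False
    urllib.request.urlretrieve(base_url + filename, filepath)
    return filepath, True
```

It would be called as download_if_missing(SOURCE_URL, 'train-images-idx3-ubyte.gz', './mnist'), with SOURCE_URL set as in the full script.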

The dataset is then unzipped with the following code (using gzip):

cmd = ['gzip', '-d', target_path]
print('Unzip ', target_path)
subprocess.call(cmd)
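Calling the gzip command requires the gzip binary on the system. As a possible alternative (not what the script in this article does), Python's built-in gzip module can do the same job. The function name gunzip_file and the keep_original parameter are invented for this sketch.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path, keep_original=False):
    """Decompress gz_path next to itself and return the output path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-3]  # strip the ".gz" suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so the file is not loaded whole
    if not keep_original:
        os.remove(gz_path)  # mirrors `gzip -d`, which removes the archive
    return out_path
```

Unlike the subprocess version, this takes the full archive name including the .gz suffix.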

II. Environment

1. Ubuntu environment: http://blog.csdn.net/houchaoqun_xmu/article/details/72453187

2. Anaconda2: http://blog.csdn.net/houchaoqun_xmu/article/details/72461592


III. Code

# Copyright 20170611 . All Rights Reserved.
# Prerequisites:
# Python 2.7
# gzip, subprocess, numpy
#
# ==============================================================================
"""Functions for downloading and uzip MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import subprocess
import os
import numpy
from six.moves import urllib


def maybe_download(filename, data_dir, SOURCE_URL):
    """Download the data from Yann's website, unless it's already here."""
    filepath = os.path.join(data_dir, filename)
    if not os.path.exists(filepath):
        filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')


def check_file(data_dir):
    if os.path.exists(data_dir):
        return True
    else:
        os.mkdir(data_dir)
        return False


def uzip_data(target_path):
    # uzip mnist data
    cmd = ['gzip', '-d', target_path]
    print('Unzip ', target_path)
    subprocess.call(cmd)


def read_data_sets(data_dir):
    if check_file(data_dir):
        print(data_dir)
        print('dir mnist already exist.')
        # delete the dir mnist
        cmd = ['rm', '-rf', data_dir]
        print('delete the dir', data_dir)
        subprocess.call(cmd)
        os.mkdir(data_dir)
    SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
    data_keys = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
                 't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']
    for key in data_keys:
        if os.path.isfile(os.path.join(data_dir, key)):
            print("[warning...]", key, "already exist.")
        else:
            maybe_download(key, data_dir, SOURCE_URL)
    # uzip the mnist data.
    uziped_data_keys = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
                        't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']
    for key in uziped_data_keys:
        if os.path.isfile(os.path.join(data_dir, key)):
            print("[warning...]", key, "already exist.")
        else:
            target_path = os.path.join(data_dir, key)
            uzip_data(target_path)


if __name__ == '__main__':
    print("===== running - input_data() script =====")
    read_data_sets("./mnist")
    print("=============   =============")

Open a terminal and run the following command:

python get_mnist.py

The result looks like this: (screenshot not preserved)

Code download: http://download.csdn.net/detail/houchaoqun_xmu/9867456
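Once the script has run, one way to confirm that the files unzipped correctly is to read the header of an unzipped file. The sketch below assumes the IDX layout described on the MNIST page: a 4-byte big-endian magic number whose last byte is the number of dimensions, followed by one 4-byte big-endian size per dimension. The function name read_idx_header is invented for this illustration and is not part of the script above.

```python
import struct


def read_idx_header(path):
    """Return (magic, dims) read from the header of an uncompressed IDX file."""
    with open(path, "rb") as f:
        magic = struct.unpack(">I", f.read(4))[0]
        ndim = magic & 0xFF  # low byte of the magic number = number of dimensions
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return magic, dims
```

For ./mnist/train-images-idx3-ubyte the header should come back as magic 2051 with dimensions (60000, 28, 28), and for ./mnist/train-labels-idx1-ubyte as magic 2049 with dimensions (60000,). Anything else suggests an incomplete download or a failed unzip.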

IV. References

Activation-Visualization-Histogram: https://github.com/shaohua0116/Activation-Visualization-Histogram

Getting started with MNIST for machine learning: http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_beginners.html

Reading MNIST with Python: http://blog.csdn.net/mmmwhy/article/details/62891092

Python code for downloading the MNIST handwritten digit dataset with TensorFlow: http://download.csdn.net/detail/yhhyhhyhhyhh/9738704

MNIST code with batch processing (tensorflow_GPU): http://download.csdn.net/detail/houchaoqun_xmu/9851221



