How to generate hundreds of millions of rows of data for database import

Source: Internet  Editor: 程序博客网  Date: 2024/06/06 01:39

1. A Python script can easily generate data that meets the requirements, as follows:

# coding: utf-8
import sys
import time
import datetime

N = 100000000  # number of rows to generate

def gen_meid():
    # Left empty in the original post: should yield N device IDs (strings).
    return

def gen_seq():
    # Left empty in the original post: should yield N sequence values (strings).
    return

def generate_message(meid, seq):
    ts = time.time()
    time_st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    # r'\N' is written literally; MySQL reads it as NULL when loading a text file.
    print('\t'.join(( meid, seq, r'\N', r'\N', r'\N', r'\N', '0', '0', '0', '0', time_st, r'\N', r'\N', '0', r'\N', r'\N', r'\N', r'\N', time_st )))

def main(args):
    # Header row: keep it for MongoDB, delete this line for MySQL.
    # The '...' stands for column names elided in the original post.
    print('\t'.join(( 'deviceID', 'battery', ... , 'accumulatedTime', 'createDate' )))
    for meid, seq in zip(gen_meid(), gen_seq()):
        generate_message(meid, seq)
    return 0

#==============================
if __name__ == "__main__":
    main(sys.argv)
#==============================

$ python a.py > device.tsv
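
The gen_meid and gen_seq functions in the script are empty stubs, so it cannot produce rows as posted. The following is a minimal sketch of how they might be filled in, written for Python 3. The ID format (a fixed 7-character prefix followed by a 7-digit hexadecimal counter), the `prefix` and `start` parameters, and the `n` argument are all assumptions made for illustration; the original post does not say what a device ID or sequence value looks like.

```python
def gen_meid(n, prefix="A100000"):
    """Yield n unique 14-character IDs: prefix + 7-digit hex counter.

    The format is hypothetical; replace it with whatever the target
    table actually expects.
    """
    for i in range(n):
        # 7 hex digits cover 268,435,456 values, enough for 100,000,000 rows.
        yield "%s%07X" % (prefix, i)


def gen_seq(n, start=1):
    """Yield n consecutive sequence numbers as strings (hypothetical format)."""
    for i in range(start, start + n):
        yield str(i)
```

Because both functions are generators, the IDs are produced one at a time and never held in memory together. With these definitions the loop in main would read `for meid, seq in zip(gen_meid(N), gen_seq(N))`.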

2. Split the data (optional)

tail -n +1      device.tsv | head -n 100000 > part1.txt

tail -n +100001 device.tsv | head -n 100000 > part2.txt

tail -n +200001 device.tsv | head -n 100000 > part3.txt

tail -n +300001 device.tsv | head -n 100000 > part4.txt
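
Each tail/head pipeline above rereads the file from the start, which gets slow for later parts of a very large file. As an alternative, here is a rough sketch of the same split done in a single pass in Python 3. The function name `split_lines` and its parameters are made up for this example; it is not part of the original script.

```python
import itertools


def split_lines(src_path, lines_per_part, prefix="part", max_parts=None):
    """Split src_path into files of lines_per_part lines each.

    Files are named prefix1.txt, prefix2.txt, ... and the list of
    written paths is returned. The last file may be shorter.
    """
    written = []
    with open(src_path, "r", encoding="utf-8") as src:
        index = 1
        while max_parts is None or index <= max_parts:
            # islice continues from where the previous chunk stopped.
            chunk = list(itertools.islice(src, lines_per_part))
            if not chunk:
                break
            path = "%s%d.txt" % (prefix, index)
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            written.append(path)
            index += 1
    return written
```

Calling `split_lines("device.tsv", 100000, max_parts=4)` would write part1.txt through part4.txt with the same line ranges as the four commands, holding only one chunk of lines in memory at a time.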


3. Generate a txt file

python a.py > device.txt
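
Before importing a file of this size, it may be worth confirming that every line has the same number of columns, since one malformed row can abort or skew a bulk load. The sketch below is one way to do that in Python 3. The helper name `check_columns` is hypothetical, and the default of 19 fields is simply the number of values that generate_message joins with tabs.

```python
def check_columns(path, expected=19, sep="\t"):
    """Return (line_number, field_count) for every line whose field
    count differs from `expected`. An empty list means the file is
    consistent."""
    bad = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            count = len(line.rstrip("\n").split(sep))
            if count != expected:
                bad.append((lineno, count))
    return bad
```

The file is read line by line, so memory use stays flat regardless of how many rows it contains.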


