How to extract specified data from multiple text files and name the output file by time

Source: Internet | Posted by: linux中链接目的 | Editor: 程序博客网 | Time: 2024/05/28 09:34


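r'''
The goal of this experiment is to scan the files under e:\demo\res for lines whose key content matches an entry in the list x,
save the extracted key content in e:\demo\res, and name the output file in the form '<current date and time>.txt'. The input files for this experiment are the ones in the res folder.

The text files to be searched are prepared in advance in e:\demo\res; their file names can be anything.
For example, sample text 1: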

1
2
3
4
5
6
7
gpgpu_n_mem_read_global
8
9
000
gpgpu_n_mem_write_global
11
gpgpu_n_mem_texture
12233
gpgpu_n_mem_const
1232223
12312
3123123

123123
gpgpu_n_param_mem_insn

Sample text 2:

hellow Ayu, I miss you.
gpu_sim_insn
gpu_ipc
L1I_total_cache_accesses
L1D_total_cache_accesses
gpgpu_n_tot_thrd_icount
gpgpu_n_tot_w_icount
gpgpu_n_mem_read_local
f
33
4
5
6
2345
34
345
6
345
gpgpu_n_param_mem_insn

'''

import datetime
import os

# Get the current time
now = datetime.datetime.now()  # returns a datetime object

# Convert it to the desired format
otherStyleTime = now.strftime("%Y_%m_%d %H_%M_%S")

# Raw string, so that \r in the path is not read as a carriage return
path = r'e:\demo\res'

# Define the output file
fout = open(r"e:\demo\res\%s.txt" % otherStyleTime, 'w')

x = [
    'gpu_sim_insn',
    'gpu_ipc',
    'L1I_total_cache_accesses',
    'L1D_total_cache_accesses',
    'gpgpu_n_tot_thrd_icount',
    'gpgpu_n_tot_w_icount',
    'gpgpu_n_mem_read_local',
    'gpgpu_n_mem_write_local',
    'gpgpu_n_mem_read_global',
    'gpgpu_n_mem_write_global',
    'gpgpu_n_mem_texture',
    'gpgpu_n_mem_const',
    'gpgpu_n_load_insn',
    'gpgpu_n_store_insn',
    'gpgpu_n_shmem_insn',
    'gpgpu_n_tex_insn',
    'gpgpu_n_const_mem_insn',
    'gpgpu_n_param_mem_insn'
]

# Change the working directory

os.chdir(path)

# Iterate over all files in the directory
for filename in os.listdir():
    # Python 3 needs the encoding given explicitly; the file may be saved as ANSI (the default),
    # Unicode, Unicode big endian, or utf-8. 'ANSI' is an alias of 'mbcs' and exists on Windows only.
    with open(filename, 'r', encoding='ANSI') as fs:
        # Process every line of the file
        for line in fs:
            a = line.split()
            if a != [] and a[0] in x:
                # Originally fout.write(a[-1]+'\n'); changed here to test how a[-1] differs from a[0]
                fout.write(a[0] + '\n')
                # Compare a[0] and a[-1] and print which case applies
                if a[0] == a[-1]:
                    print('a[0]==a[-1]')
                else:
                    print('a[0]!=a[-1]')
                # This does not generalize well: it only works for text that ends with
                # 'gpgpu_n_param_mem_insn', and still needs improvement
                if a[0] == 'gpgpu_n_param_mem_insn':
                    fout.write('\n')
                    break

fout.write('\n')
fout.close()
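
The comment in the loop notes that stopping at 'gpgpu_n_param_mem_insn' only works for text that ends with that keyword. One possible way around it, given here as a minimal sketch and not as the original author's method, is to read each file to the end and write a blank line after every file. The names `KEYS`, `matching_keys` and `extract_to_timestamped_file` are hypothetical, `KEYS` is a shortened stand-in for the list x, and the separate output directory and the utf-8 default are assumptions (pass `encoding='ANSI'` on Windows to match the script above).

```python
import datetime
from pathlib import Path

# Hypothetical, shortened stand-in for the list x in the script above.
KEYS = {"gpu_sim_insn", "gpu_ipc", "gpgpu_n_mem_read_global", "gpgpu_n_param_mem_insn"}


def matching_keys(lines, keys):
    """Return the first token of every line whose first token is in keys."""
    found = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in keys:
            found.append(tokens[0])
    return found


def extract_to_timestamped_file(src_dir, out_dir, keys, encoding="utf-8"):
    """Scan every .txt file in src_dir and write the matches to <timestamp>.txt in out_dir."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    stamp = datetime.datetime.now().strftime("%Y_%m_%d %H_%M_%S")
    out_path = out_dir / f"{stamp}.txt"
    with open(out_path, "w", encoding=encoding) as fout:
        for src in sorted(src_dir.glob("*.txt")):
            if src.resolve() == out_path.resolve():
                continue  # never read the file that is being written
            with open(src, encoding=encoding) as fs:
                for key in matching_keys(fs, keys):
                    fout.write(key + "\n")
            fout.write("\n")  # blank line after each source file, no sentinel keyword needed
    return out_path
```

Writing the result to a directory other than the one being scanned also keeps output files from earlier runs out of the next search, which the script above does not guard against.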
