Implementing a Python Script for Processing Web Page Files

Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 17:40

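An embedded web server differs from a traditional server: its web pages must be converted into arrays and stored in flash so that the lwIP network interface can serve them. Recently, business needs forced me to change the web pages frequently, and minifying and converting them by hand every time was tedious. So I used what I know to write a Python script that batch-processes the web page files, minifies them, and converts them into arrays.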

Environment the script runs in (later versions should also work):

Python 3.5.1 (see online tutorials for download, installation and configuration)

node.js v4.4.7, with the uglifyjs package installed, used to minify js files (real minification, not just whitespace removal)
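Before running the script it can help to confirm that both prerequisites are actually available. A minimal sketch, assuming `uglifyjs` was installed globally with npm so that it is on PATH; `check_prerequisites` is a hypothetical helper, not part of the original script.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 5)):
    """Return a list of problems; an empty list means ready to run.

    Hypothetical helper: checks the Python version and whether an
    `uglifyjs` command can be found on PATH.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python {}.{}+ required".format(*min_python))
    # shutil.which returns None when the command is not on PATH
    # (on Windows it also honours PATHEXT, so uglifyjs.cmd is found)
    if shutil.which("uglifyjs") is None:
        problems.append("uglifyjs not found on PATH")
    return problems


if __name__ == "__main__":
    for p in check_prerequisites():
        print("missing:", p)
```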

The implementation is as follows:

#!/usr/bin/env python3
import os
import shutil
from functools import partial


# Minify an html/css file: drop blank lines, remove '\n' and '\t',
# and keep only one space wherever whitespace repeats.
# Note: the remaining lines are concatenated with no separator.
def FileReduce(inpath, outpath):
    with open(inpath, "r", encoding="utf-8") as infp, \
            open(outpath, "w", encoding="utf-8") as outfp:
        for li in infp:
            if li.split():  # skip blank lines
                # split() with no argument also removes '\n' and '\t'
                outfp.write(' '.join(li.split()))
    print(outpath + " minified")


# Call uglifyjs (UglifyJS 2) from the shell to minify a js file
def ShellReduce(inpath, outpath):
    # Quote the paths so folders containing spaces still work
    Command = 'uglifyjs "{}" -m -o "{}"'.format(inpath, outpath)
    print(Command)
    os.system(Command)


# Read a file in binary mode and save its bytes as a C array
def filehex(inpath, outpath):
    i = 0
    count = 0
    a = ''
    # Array name = output file name without extension, e.g. index_html
    name = os.path.splitext(os.path.basename(outpath))[0]
    with open(inpath, 'rb') as inf:
        # Read one byte at a time until read() returns b''
        for r in iter(partial(inf.read, 1), b''):
            a += '0x{:02x}, '.format(r[0])
            i += 1
            count += 1
            if i == 16:  # 16 bytes per line
                a += '\n'
                i = 0
    # unsigned char so that 0x80-0xff fit; the declaration must end with ';'
    a = ("static const unsigned char " + name + "[" + str(count) + "] = {\n"
         + a + "\n};\n\n")
    with open(outpath, 'w') as outf:
        outf.write(a)
    print(outpath + " converted to array")


# Create a folder (including parents) if it does not exist yet
def mkdir(path):
    path = path.strip()
    if not os.path.exists(path):
        os.makedirs(path)
        print(path + " created")
    return path


# Delete a folder together with everything inside it
def deldir(path):
    path = path.strip()
    if os.path.exists(path):
        shutil.rmtree(path)
        print(path + " deleted")


def WebProcess(path):
    # Original pages:      ..\basic\
    # Minified pages:      ..\reduce\
    # Generated .c files:  ..\program\
    BasicPath = os.path.join(path, "basic")
    ProgramPath = os.path.join(path, "program")
    ReducePath = os.path.join(path, "reduce")
    # Delete the old output folders, then create them again
    deldir(ProgramPath)
    deldir(ReducePath)
    mkdir(ProgramPath)
    for root, dirs, files in os.walk(BasicPath):
        # Mirror the folder structure of basic\ under reduce\
        ReduceDir = mkdir(os.path.normpath(
            os.path.join(ReducePath, os.path.relpath(root, BasicPath))))
        for item in files:
            ext = os.path.splitext(item)[1]
            InFilePath = os.path.join(root, item)
            OutReducePath = os.path.join(ReduceDir, item)
            OutProgramPath = os.path.join(ProgramPath, item.replace('.', '_') + '.c')
            # Process according to the file extension:
            #   html/css:    remove '\n' and '\t', keep one space per whitespace run
            #   js:          minify with uglifyjs2
            #   gif/jpg/ico: copy unchanged
            #   other:       copy unchanged
            # Every type except "other" is also converted to a hex array
            # and saved as a .c file
            if ext in (".html", ".css"):
                FileReduce(InFilePath, OutReducePath)
                filehex(OutReducePath, OutProgramPath)
            elif ext == ".js":
                ShellReduce(InFilePath, OutReducePath)
                filehex(OutReducePath, OutProgramPath)
            elif ext in (".gif", ".jpg", ".ico"):
                shutil.copy(InFilePath, OutReducePath)
                filehex(OutReducePath, OutProgramPath)
            else:
                shutil.copy(InFilePath, OutReducePath)


# Use the folder that contains this script as the working path
path = os.path.dirname(os.path.realpath(__file__))
WebProcess(path)
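filehex() issues one read() call per byte and grows the string step by step. For the small files of an embedded web UI that works, but the whole file can also be read at once and formatted in one pass. A sketch under that assumption; `bytes_to_c_array` and `file_to_c_array` are hypothetical names, and the output layout (16 bytes per line, `static const unsigned char`, closing `};`) is a choice made here rather than the author's exact format.

```python
import os


def bytes_to_c_array(data, name, per_line=16):
    """Format raw bytes as a C array definition (hypothetical helper).

    Hex bytes, `per_line` values per row, array length in the brackets.
    unsigned char is used so values 0x80-0xff fit without a warning.
    """
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(", ".join("0x{:02x}".format(b) for b in chunk) + ",")
    body = "\n".join(lines)
    return "static const unsigned char {}[{}] = {{\n{}\n}};\n".format(
        name, len(data), body)


def file_to_c_array(inpath, outpath):
    """Read the whole file at once and write the C array to outpath."""
    # Array name = output file name without extension, as in filehex()
    name = os.path.splitext(os.path.basename(outpath))[0]
    with open(inpath, "rb") as f:
        text = bytes_to_c_array(f.read(), name)
    with open(outpath, "w") as f:
        f.write(text)
```

For example, `bytes_to_c_array(b"<p>", "index_html")` yields `static const unsigned char index_html[3] = {` followed by `0x3c, 0x70, 0x3e,` and `};`. Separating the formatting from the file I/O keeps the formatting testable without touching the disk.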

The implementation works as follows:

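1. Walk the folder to be processed (path ..\basic; the user creates it, copies the files to be processed into it, and places the script in its parent folder). This is done by WebProcess.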

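2. Create the minified-page folder (..\reduce, which holds the minified files). The script creates it and processes each file as follows:

html, css: remove redundant spaces and line breaks from the text

js: minify with uglifyjs

gif, jpg, ico and other files: copy unchanged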

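3. Create the output folder (..\program, which holds the generated .c files). The script creates it and processes each file as follows:

Read the minified file in binary mode, convert it to a hexadecimal array and write it into this folder as a .c file (files of the "other" type are only copied, not converted).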
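The html/css rule in step 2 can be written as a pure function on text, which makes it easy to check in isolation. A sketch mirroring FileReduce(); `minify_text` and its `sep` parameter are hypothetical. With the default `sep=""` it reproduces the script's behaviour of concatenating lines directly, which means a phrase that was split across two lines of HTML text gets glued together; `sep=" "` avoids that at the cost of one byte per line.

```python
def minify_text(text, sep=""):
    """Collapse whitespace the way FileReduce() does.

    Blank lines are dropped, each remaining line has tabs and runs of
    spaces collapsed to a single space, and the lines are joined with sep.
    """
    out = []
    for line in text.splitlines():
        words = line.split()  # no argument: split on any whitespace
        if words:             # skip blank or whitespace-only lines
            out.append(" ".join(words))
    return sep.join(out)
```

For instance, `minify_text("<p>\n\thello   world\n\n</p>\n")` gives `<p>hello world</p>`, while `minify_text("a\nb")` gives `ab`.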

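In that folder, Shift + right-click to open a Windows command prompt and run python web.py. The script repeats these three steps for every file until all of them are processed.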
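After a run, one way to confirm that a generated .c file really contains the minified file's bytes is to parse the hex values back and compare. A sketch, assuming the .c file holds a single array of comma-separated `0x..` literals as filehex() writes them; `c_array_to_bytes` and `verify` are hypothetical helpers.

```python
import re

# One hex byte literal such as 0x3c or 0xa (hex() output omits leading zeros)
HEX_BYTE = re.compile(r"0x([0-9a-fA-F]{1,2})\b")


def c_array_to_bytes(c_source):
    """Decode the hex literals between the first '{' and the last '}'."""
    body = c_source[c_source.index("{") + 1:c_source.rindex("}")]
    return bytes(int(h, 16) for h in HEX_BYTE.findall(body))


def verify(reduced_path, c_path):
    """Return True if c_path encodes exactly the bytes of reduced_path."""
    with open(reduced_path, "rb") as f:
        original = f.read()
    with open(c_path) as f:
        return c_array_to_bytes(f.read()) == original
```

Only the text between the braces is scanned, so the decimal length in `name[count]` is never mistaken for data.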

Important: every file to be processed must be saved as UTF-8; otherwise reading it fails with a decoding error (reported as a "gbk" error).
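Because a file saved in another encoding only fails when the script reaches it, it can be convenient to scan the basic folder beforehand. A minimal sketch; `is_utf8` and `find_non_utf8` are hypothetical helpers, and by default they only check the types the script opens as text (html, css), since js files are read by uglifyjs rather than by Python.

```python
import os


def is_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(folder, exts=(".html", ".css")):
    """List files under folder with one of exts that are not valid UTF-8."""
    bad = []
    for root, dirs, files in os.walk(folder):
        for item in files:
            if os.path.splitext(item)[1] in exts:
                path = os.path.join(root, item)
                with open(path, "rb") as f:
                    if not is_utf8(f.read()):
                        bad.append(path)
    return bad
```

Running `find_non_utf8(os.path.join(path, "basic"))` before WebProcess would list the files that need to be re-saved as UTF-8.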

The results are shown below.

HTML file:

[Figure]

Converted array:

[Figure]

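As a bonus, here is a small script that counts the total lines and empty lines of the source files in the current folder and its subfolders (a spin-off from testing the script above):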

#!/usr/bin/env python3
import os

total_count = 0
empty_count = 0


# Count all lines and empty lines of one file
def CountLine(path):
    global total_count
    global empty_count
    # errors="replace" keeps a non-UTF-8 file from aborting the count
    with open(path, encoding="utf-8", errors="replace") as tempfile:
        for lines in tempfile:
            total_count += 1
            if len(lines.strip()) == 0:
                empty_count += 1


# Walk the folder tree and count lines of the listed source file types
def TotalLine(path):
    for root, dirs, files in os.walk(path):
        for item in files:
            ext = item.split('.')[-1]
            if ext in ["cpp", "c", "h", "java", "php"]:
                CountLine(os.path.join(root, item))


path = os.path.dirname(os.path.realpath(__file__))
TotalLine(path)
print("Input Path:", path)
print("total lines: ", total_count)
print("empty lines: ", empty_count)
print("code lines: ", total_count - empty_count)
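The global counters work, but a variant that returns the counts is easier to reuse and to test. A sketch, not the original author's code; `count_lines` is a hypothetical name and takes already-read text rather than a path. A line counts as empty when it holds only whitespace, the same test as `len(lines.strip()) == 0` in CountLine().

```python
def count_lines(text):
    """Return (total, empty) line counts for a block of text."""
    total = 0
    empty = 0
    for line in text.splitlines():
        total += 1
        if not line.strip():  # whitespace-only lines count as empty
            empty += 1
    return total, empty
```

For example, `count_lines("int a;\n\n  \nreturn 0;\n")` returns `(4, 2)`; summing the pairs over the files found by `os.walk` gives the same totals as the script.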

Source: 博客园 (cnblogs)

0 0