Automatically Generating a Corpus

Source: Internet | Published by: 客户数据 | Editor: 程序博客网 | Time: 2024/05/19 09:12

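Here we go again with another very simple trick. The one regret is that it is still just an awk script; I should put together a C++ version at some point. Here is the code: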

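#!/bin/awk -f
# Expand the first "(a|b|...)" group on each line into one line per alternative.
{
    ind = index($0, "(")
    tmp1 = substr($0, ind + 1)              # text after the first "("
    ind1 = index(tmp1, ")")                 # first ")" after that "("
    if (ind > 0 && ind1 > 0) {
        name1 = substr($0, 1, ind - 1)      # text before the group
        name2 = substr(tmp1, 1, ind1 - 1)   # the alternatives, e.g. "26|27|28"
        name3 = substr(tmp1, ind1 + 1)      # text after the group
        # A single-character separator is taken literally, so "|" needs no escaping.
        n = split(name2, name2_arr, "|")
        # Loop by index: "for (k in arr)" does not guarantee any order.
        for (i = 1; i <= n; i++)
            # No spaces between the parts, so the output matches the example.
            printf("%s%s%s\n", name1, name2_arr[i], name3)
    } else {
        # Lines without a complete "(...)" group pass through unchanged.
        print $0
    }
}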

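Usage: awk -f test2.awk infile > outfile
For each sentence, the script expands the part enclosed in (ASCII) parentheses and separated by "|", producing one output sentence per alternative.
Example:
infile: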

温度调到(26|27|28)度

The corresponding output file is
outfile:

温度调到26度
温度调到27度
温度调到28度

I really must get a C++ version done! Fighting!
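For reference, here is a minimal sketch of what such a C++ version might look like. It is not the author's code: the function name `expand_line` is hypothetical, and the sketch assumes the same rule as the awk script, namely that only the first parenthesized group on a line is expanded and that lines without a complete group are left alone. Because "(", ")" and "|" are ASCII, a byte-wise search is safe on UTF-8 input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): expands the first
// "(a|b|c)" group in `line` into one string per alternative.
// Lines without a complete "(...)" group are returned unchanged.
// Design choice of this sketch: an empty group "()" yields a single
// line with the parentheses removed.
std::vector<std::string> expand_line(const std::string& line) {
    const std::size_t open = line.find('(');
    const std::size_t close = (open == std::string::npos)
                                  ? std::string::npos
                                  : line.find(')', open + 1);
    if (close == std::string::npos) {
        return {line};
    }

    const std::string prefix = line.substr(0, open);
    const std::string group = line.substr(open + 1, close - open - 1);
    const std::string suffix = line.substr(close + 1);

    std::vector<std::string> result;
    std::size_t start = 0;
    while (true) {
        const std::size_t bar = group.find('|', start);
        // When no further '|' exists, take everything up to the end of the group.
        const std::size_t len =
            (bar == std::string::npos) ? std::string::npos : bar - start;
        result.push_back(prefix + group.substr(start, len) + suffix);
        if (bar == std::string::npos) {
            break;
        }
        start = bar + 1;
    }
    return result;
}
```

To mirror the awk usage, a small `main` could read lines from standard input with `std::getline` and print every string returned by `expand_line` on its own line.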

0 0