A Detailed Analysis of dtree


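To read the original post, please visit my blog: http://blog.sina.com.cn/baoxiaopan
   Over the past few days, after some heated discussions with A Cheng, I finally arrived at a reasonably clear understanding of the script below. It is only two short lines, but it covers a lot of ground, and learning takes an unceasing desire and enthusiasm to learn. I set out my understanding below. Some parts may not be expressed very clearly, but I believe I have conveyed what I meant to say. If you find mistakes or have something to add, please email me; I would be very grateful. If anything is unclear, feel free to contact me by email or QQ at any time: xiaopan3322@gmail.com, 157526632.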
Description
    dtree is a utility that will display a directory hierarchy or tree.
    While Linux comes with hundreds of utilities, something you got used to on another system always seems to be missing. One program in this category is something that will display a directory hierarchy or tree.
    While some file managers that run under X-Windows will do this sort of task, it is sometimes very handy to have a command-line version. While not Linux-specific, the dtree utility is such a program.
    I will first explain how to use dtree, then explain how it works. If you invoke it by just entering its name it will display the directory hierarchy starting at the current directory. If you invoke it with an argument, that argument is used as the starting directory. For example, if you enter dtree /home/fyl/Cool, a tree of directories under /home/fyl/Cool will be displayed.
    dtree is written in the finest old-time Unix tradition using common utilities with a short shell script to glue them together. Here is the program:

    Script code:
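    The listing itself did not survive in this copy of the post. What follows is a reconstruction pieced together from the expressions quoted in the analysis and in the author's explanation further down, not a verbatim copy of the original file. Two things are assumptions: the body is wrapped in a shell function named dtree so it can be pasted into a terminal (the original is a two-line script file), and the last expression uses a pipe followed by four spaces, chosen to match the sample output in this post.

```shell
# Reconstructed sketch of dtree, wrapped in a function for easy testing.
# Usage: dtree [directory]   (defaults to the current directory)
dtree() {
    # Line 1: print the absolute path of the start directory from a sub-shell.
    (cd ${1-.}; pwd)
    # Line 2: list directories, sort ignoring case, then draw the tree with sed.
    find ${1-.} -type d -print | sort -f |
        sed -e "s,^${1-.},," \
            -e "/^$/d" \
            -e "s,[^/]*/\([^/]*\)$,\`-----\1," \
            -e "s,[^/]*/,|    ,g"
}
```

    Saving the two commands inside the braces as dtree.sh gives the script form used in the examples, e.g. sh dtree.sh bak_config.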

 


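Script analysis:
1. The first line, (cd ${1-.}; pwd), runs two commands inside a sub-shell. The benefit is that the cd takes effect only inside the sub-shell: the working directory and any environment variables changed there are not passed back to the parent process, so the rest of the script still runs from the directory in which it was invoked. The purpose of the line is simply to print the absolute path of the directory being listed, and that is why a sub-shell is used.
2. The ${1-.} in the first line is a choice between two values. The command takes an optional argument, the path the user wants to inspect. If the user supplies a path, the expansion yields $1; if not, it falls back to the current directory, i.e. the . directory.
3. find ${1-.} -type d -print | sort -f is straightforward: it finds every directory under the directory the user gave (or the current directory) and sorts the names alphabetically, ignoring case.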
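A small sketch of points 1 and 2, using a hypothetical helper start_dir that only echoes what ${1-.} expands to, followed by a check that a cd inside parentheses leaves the calling shell where it was:

```shell
# start_dir is a hypothetical helper that just echoes what ${1-.} expands to.
start_dir() {
    printf '%s\n' "${1-.}"
}

start_dir            # no argument: the default "." is used
start_dir /tmp       # an argument is given: it is used as-is
start_dir ""         # set but empty: ${1-.} keeps the empty string

# The cd inside the parentheses runs in a sub-shell,
# so the working directory of the calling shell is unchanged.
before=$(pwd)
(cd /; pwd)          # prints /
after=$(pwd)
[ "$before" = "$after" ] && echo "parent directory unchanged"
```

The form ${1:-.} (with a colon) would also substitute . for an empty argument; the script uses the form without the colon, which only covers the unset case.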
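4. Analysis of the sed stage after the second pipe:
    4.1 The first -e removes the argument directory (or the current directory) from the start of each line, which leaves an empty line where the directory itself was listed. Taking Example I, the output up to and including the first -e is: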
    tdlteman@hzling06:~$ sh dtree.sh bak_config
    /home/tdlteman/bak_config

    /bak_script
    /bak_script/dos2unix-3.1
    /bak_script/dos2unix-3.1/dos2unix-3.1
    /bak_script/L2_xp
    /bak_script/test
    /configFiles
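    As shown, the second line is empty.
    4.2 The second -e deletes empty lines; its purpose is to remove the empty line produced by the previous step. Taking Example I, the output up to and including the second -e is: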
    /home/tdlteman/bak_config
    /bak_script
    /bak_script/dos2unix-3.1
    /bak_script/dos2unix-3.1/dos2unix-3.1
    /bak_script/L2_xp
    /bak_script/test
    /configFiles
    As shown, the empty second line has been removed.
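    Steps 4.1 and 4.2 can be tried without a real directory tree by feeding sed a simulated listing. strip_prefix is a hypothetical helper name, and the input lines are a shortened version of Example I:

```shell
# strip_prefix is a hypothetical helper: $1 is the start directory, input comes from stdin.
strip_prefix() {
    sed -e "s,^$1,," -e '/^$/d'
}

# Simulated output of "find bak_config -type d -print | sort -f".
printf '%s\n' bak_config bak_config/bak_script bak_config/bak_script/test bak_config/configFiles |
    strip_prefix bak_config
```

    Because the directory name is interpolated into a regular expression, a character such as . is treated as a metacharacter. For the default . this still works, since it strips exactly one leading character from each line.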
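    4.3 The key to the third -e is the use of $ and \(..\). In sed, $ anchors the end of a line: /sed$/ matches every line that ends in sed. \(..\) saves the text it matches, so with s/\(love\)able/\1rs/ the word loveable becomes lovers. In [^/]*/\([^/]*\)$, the first part [^/]*/ matches a (possibly empty) run of non-slash characters followed by a /, and \([^/]*\)$ matches the final run of non-slash characters up to the end of the line and saves it as group 1. Together they match the last path component plus the component and slash immediately before it; whether the line starts with / does not matter. In Example I the first matching line is /bak_script: [^/]* matches the empty string, / matches the leading slash, and group 1 is bak_script, so the whole match /bak_script is replaced by `-----bak_script. For /bak_script/test the match is bak_script/test, which becomes `-----test and leaves /`-----test. The substitution has no g flag and is anchored by $, so it is applied once per line and only to the tail of the line. Taking Example I, the output up to and including the third -e is: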
    /home/tdlteman/bak_config
    `-----bak_script
    /`-----dos2unix-3.1
    /bak_script/`-----dos2unix-3.1
    /`-----L2_xp
    /`-----test
    `-----configFiles
    As shown, the hierarchy is basically in place.
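    The capture group in 4.3 can be checked in isolation. The sketch below first runs the loveable example, then applies the third expression alone to three of the lines from Example I; mark_leaf is a hypothetical name.

```shell
# The textbook capture-group example from the text.
echo loveable | sed 's/\(love\)able/\1rs/'

# mark_leaf is a hypothetical name for the third expression on its own.
mark_leaf() {
    sed -e 's,[^/]*/\([^/]*\)$,`-----\1,'
}

printf '%s\n' /bak_script /bak_script/test /bak_script/dos2unix-3.1/dos2unix-3.1 | mark_leaf
```

    Each line loses its last component together with the component and slash before it, and gets the backquote-and-dashes marker plus the saved last component in their place.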
    Interested readers can try dropping the leading [^/]*/, i.e. changing the expression to "s,\([^/]*\)$,\`-----\1,". Here is my test output at this stage:
    /home/tdlteman/bak_config
    /`-----bak_script
    /bak_script/`-----dos2unix-3.1
    /bak_script/dos2unix-3.1/`-----dos2unix-3.1
    /bak_script/`-----L2_xp
    /bak_script/`-----test
    /`-----configFiles
    So the final result becomes:
    /home/tdlteman/bak_config
    |    `-----bak_script
    |    |    `-----dos2unix-3.1
    |    |    |    `-----dos2unix-3.1
    |    |    `-----L2_xp
    |    |    `-----test
    |    `-----configFiles
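    The effect is clear: from the second line on, every line keeps one extra /, so the final result gains one extra outer level of "|    ". A brief analysis: without [^/]*/, the slash in front of the last component is no longer consumed by the third -e and is left for the fourth -e to match. The output of the modified expression therefore always has one more / than the output of the original, and hence one more "|    " substitution. So the purpose of [^/]*/ is to make the third substitution remove one extra / (together with the characters before it) from each line.
    4.4 The last -e is similar to the third: it replaces every run of non-slash characters followed by a / with "|    ". Because [^/]* may match the empty string, a lone leading / is replaced as well (a special case), and the g flag applies the substitution to every such run on the line. After all the -e expressions have run, the final result is: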
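    The difference between the two variants shows up on a single line. This sketch runs the original and the modified third expression on /bak_script/test, each followed by the fourth expression; the four spaces after the pipe are an assumption taken from the sample output.

```shell
# Fourth expression, shared by both variants (pipe plus four spaces, as in the sample output).
indent='s,[^/]*/,|    ,g'

# Original third expression: consumes the parent component and its slash.
echo /bak_script/test | sed -e 's,[^/]*/\([^/]*\)$,`-----\1,' -e "$indent"

# Modified third expression without the leading [^/]*/ : one more slash survives.
echo /bak_script/test | sed -e 's,\([^/]*\)$,`-----\1,' -e "$indent"
```

    The second command prints one more "|    " than the first, which is the extra outer level seen in the test output above.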
    /home/tdlteman/bak_config
    `-----bak_script
    |    `-----dos2unix-3.1
    |    |    `-----dos2unix-3.1
    |    `-----L2_xp
    |    `-----test
    `-----configFiles

    As shown, the hierarchy is now complete and clearly layered.
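    The role of the g flag in 4.4 can be seen by running the last expression with and without it on one line taken from the output of the third -e. The variable name line is only for illustration.

```shell
line='/bak_script/`-----dos2unix-3.1'

# Without g only the first "run of non-slashes plus slash" is replaced.
printf '%s\n' "$line" | sed -e 's,[^/]*/,|    ,'

# With g every such run is replaced, one "|    " per remaining directory level.
printf '%s\n' "$line" | sed -e 's,[^/]*/,|    ,g'
```

    Without g the first command leaves bak_script/ in place after the first "|    "; with g each remaining level turns into its own "|    ".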
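5. The expression 's,[^/]*/\([^/]*\)$,`-----\1,' is written here with hard (single) quotes, but soft (double) quotes work too. With double quotes the backquote must be escaped with \, so the expression becomes "s,[^/]*/\([^/]*\)$,\`-----\1,".
    Note that "sh dtree.sh bak_config" and "sh dtree.sh bak_config/" give slightly different results, because the text stripped by the first -e is different. I will not analyze this in detail; the process is exactly the same, and interested readers can work through it themselves. Only the results are given below.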
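    A quick way to confirm point 5 is to store both spellings in variables and compare them. The variable names hard and soft are only for illustration.

```shell
hard='s,[^/]*/\([^/]*\)$,`-----\1,'      # single quotes: nothing is interpreted
soft="s,[^/]*/\([^/]*\)$,\`-----\1,"     # double quotes: the backquote must be escaped

if [ "$hard" = "$soft" ]; then
    echo "both quoting styles give sed the same program"
fi
```

    Inside double quotes the shell leaves \( \) and \1 alone and only turns \` into a literal backquote, so sed receives the same text either way. Double quotes are required for the first -e, where ${1-.} has to be expanded by the shell.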


Examples:
I:
##### Result of running sh dtree.sh bak_config:
### Output of the first pipe only (without the sed stage):
tdlteman@hzling06:~$ sh dtree.sh bak_config
/home/tdlteman/bak_config
bak_config
bak_config/bak_script
bak_config/bak_script/dos2unix-3.1
bak_config/bak_script/dos2unix-3.1/dos2unix-3.1
bak_config/bak_script/L2_xp
bak_config/bak_script/test
bak_config/configFiles
### Output of the full script:
tdlteman@hzling06:~$ sh dtree.sh bak_config
/home/tdlteman/bak_config
`-----bak_script
|    `-----dos2unix-3.1
|    |    `-----dos2unix-3.1
|    `-----L2_xp
|    `-----test
`-----configFiles

II:
##### Result of running sh dtree.sh bak_config/:
### Output of the first pipe only (without the sed stage):
tdlteman@hzling06:~$ sh dtree.sh bak_config/
/home/tdlteman/bak_config
bak_config/
bak_config/bak_script
bak_config/bak_script/dos2unix-3.1
bak_config/bak_script/dos2unix-3.1/dos2unix-3.1
bak_config/bak_script/L2_xp
bak_config/bak_script/test
bak_config/configFiles
### Output of the full script:
tdlteman@hzling06:~$ sh dtree.sh bak_config/
/home/tdlteman/bak_config
bak_script
`-----dos2unix-3.1
|    `-----dos2unix-3.1
`-----L2_xp
`-----test
configFiles
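
The difference between Examples I and II comes from the first -e. With a trailing slash in the argument, the slash is stripped along with the directory name, so the remaining lines no longer start with /. The sketch below shows this on one simulated line, then shows one possible way to normalize the argument; that last step is an assumption for illustration and is not part of the original script.

```shell
# Simulated find output line, stripped as in Example I and as in Example II.
printf '%s\n' bak_config/bak_script | sed -e "s,^bak_config,,"     # -> /bak_script
printf '%s\n' bak_config/bak_script | sed -e "s,^bak_config/,,"    # -> bak_script

# One possible normalization (an assumption, not part of the original script):
dir=bak_config/
dir=${dir%/}        # drop a single trailing slash
echo "$dir"
```

For the root directory / this expansion would yield an empty string, so a real script would need to treat that case separately.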


    Finally, here is the original author's explanation, for those who are interested:
    The first line in the output is the name of the directory dtree was run on. This line was produced by the line that begins with (cd. Breaking this line down:
    * ${1-.} means use the first argument from the command line ($1) if it is available, otherwise use . which is a synonym for the current directory. Thus, the cd command either changes to the directory specified on the line that invoked dtree or to the current directory (a virtual no-op).
    * pwd then displays the path name of the current directory.
    * The parentheses around the whole line force the command to be run in a subshell. This means the cd command is local to this line and subsequent commands will be executed from what was the current directory when dtree was initially invoked.
    * The find command prints out all files whose type is d (for directory). The same directory reference is used as in cd.
    * The output of find is piped into sort and the -f option tells sort to fold upper and lower case names together.
    * The tricky formatting of the tree is done by sed in four steps. Each step is set off by -e. This is how you tell sed a program follows.
    * The first expression, "s,^${1-.},,", is a substitute command which tells sed to replace everything between the first two delimiters (a comma is used as the delimiter) with everything between the second. The initial ^ causes the match to be performed only at the beginning of the line. The expression that follows is, again, the starting directory reference, and the string between the second pair of delimiters is null. Thus, the requested directory name from the beginning of the output of sort is trimmed.
    * The second expression, /^$/d, tells sed to delete all blank lines (lines with nothing between the beginning and the end).
    * The third expression is probably the trickiest. It uses the ability to remember a string within a regular expression and then use it later. The expression s,[^/]*/\([^/]*\)$,\`-----\1, tells sed to replace the last two strings separated by a slash (/) with a backquote, five dashes and the last string (following the final slash).
    * Lastly, the final expression, -e "s,[^/]*/,|      ,g", tells sed to replace every occurrence of strings that do not contain a slash but are followed by a slash, with a pipe (|) and six spaces.

    Unless you are familiar with regular expressions you probably didn't follow all that. But you probably learned something and you can easily use dtree without having to understand how it works.
    That is about all. A script is not set in stone: modify it however you like and you will find plenty of interesting behaviour. Changing even a single character changes the result, and that is the charm of shell scripting. Finally, thanks again to A Cheng.

