WhatWeb Source Code Analysis: Execution Flow

Source: Internet. Editor: 程序博客网. Date: 2024/06/06 09:14

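In the first post I got familiar with part of the WhatWeb source code. This post records a debugging session and the WhatWeb execution flow I pieced together from it.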

Before debugging, it helps to run WhatWeb's help command. It lists every option WhatWeb provides and gives a rough idea of what the tool can do.

ruby whatweb -h
.$$$     $.                                   .$$$     $.
$$$$     $$. .$$$  $$$ .$$$$$$.  .$$$$$$$$$$. $$$$     $$. .$$$$$$$. .$$$$$$.
$ $$     $$$ $ $$  $$$ $ $$$$$$. $$$$$ $$$$$$ $ $$     $$$ $ $$   $$ $ $$$$$$.
$ `$     $$$ $ `$  $$$ $ `$  $$$ $$' $ `$ `$$ $ `$     $$$ $ `$      $ `$  $$$'
$. $     $$$ $. $$$$$$ $. $$$$$$ `$  $. $  :' $. $     $$$ $. $$$$   $. $$$$$.
$::$  .  $$$ $::$  $$$ $::$  $$$     $::$     $::$  .  $$$ $::$      $::$  $$$$
$;;$ $$$ $$$ $;;$  $$$ $;;$  $$$     $;;$     $;;$ $$$ $$$ $;;$      $;;$  $$$$
$$$$$$ $$$$$ $$$$  $$$ $$$$  $$$     $$$$     $$$$$$ $$$$$ $$$$$$$$$ $$$$$$$$$'
WhatWeb - Next generation web scanner version 0.4.8-dev.
Developed by Andrew Horton aka urbanadventurer and Brendan Coles.
Homepage: http://www.morningstarsecurity.com/research/whatweb
Usage: whatweb [options] <URLs>
TARGET SELECTION:
  <TARGETs>                     Enter URLs, hostnames, IP adddresses,
                                filenames, or nmap-format IP address ranges.
  --input-file=FILE, -i         Read targets from a file. You can pipe
                                hostnames or URLs directly with -i /dev/stdin.
TARGET MODIFICATION:
  --url-prefix                  Add a prefix to target URLs.
  --url-suffix                  Add a suffix to target URLs.
  --url-pattern                 Insert the targets into a URL.
                                e.g. example.com/%insert%/robots.txt
AGGRESSION:
The aggression level controls the trade-off between speed/stealth and
reliability.
  --aggression, -a=LEVEL        Set the aggression level. Default: 1.
  1. Stealthy                   Makes one HTTP request per target and also
                                follows redirects.
  3. Aggressive                 If a level 1 plugin is matched, additional
                                requests will be made.
  4. Heavy                      Makes a lot of HTTP requests per target. URLs
                                from all plugins are attempted.
HTTP OPTIONS:
  --user-agent, -U=AGENT        Identify as AGENT instead of WhatWeb/0.4.8-dev.
  --header, -H                  Add an HTTP header. eg "Foo:Bar". Specifying a
                                default header will replace it. Specifying an
                                empty value, e.g. "User-Agent:" will remove it.
  --follow-redirect=WHEN        Control when to follow redirects. WHEN may be
                                `never', `http-only', `meta-only', `same-site',
                                `same-domain' or `always'. Default: always.
  --max-redirects=NUM           Maximum number of redirects. Default: 10.
AUTHENTICATION:
  --user, -u=<user:password>    HTTP basic authentication.
  --cookie, -c=COOKIES          Use cookies, e.g. 'name=value; name2=value2'.
PROXY:
  --proxy                       <hostname[:port]> Set proxy hostname and port.
                                Default: 8080.
  --proxy-user                  <username:password> Set proxy user and password.
PLUGINS:
  --list-plugins, -l            List all plugins.
  --info-plugins, -I=[SEARCH]   List all plugins with detailed information.
                                Optionally search with keywords in a comma
                                delimited list.
  --search-plugins=STRING       Search plugins for a keyword.
  --plugins, -p=LIST            Select plugins. LIST is a comma delimited set
                                of selected plugins. Default is all.
                                Each element can be a directory, file or plugin
                                name and can optionally have a modifier, +/-.
                                Examples: +/tmp/moo.rb,+/tmp/foo.rb
                                title,md5,+./plugins-disabled/
                                ./plugins-disabled,-md5
                                -p + is a shortcut for -p +plugins-disabled.
  --grep, -g=STRING             Search for STRING in HTTP responses. Reports
                                with a plugin named Grep.
  --custom-plugin=DEFINITION    Define a custom plugin named Custom-Plugin,
                                Examples: ":text=>'powered by abc'"
                                ":version=>/powered[ ]?by ab[0-9]/"
                                ":ghdb=>'intitle:abc \"powered by abc\"'"
                                ":md5=>'8666257030b94d3bdb46e05945f60b42'"
                                "{:text=>'powered by abc'}"
  --dorks=PLUGIN                List Google dorks for the selected plugin.
OUTPUT:
  --verbose, -v                 Verbose output includes plugin descriptions.
                                Use twice for debugging.
  --colour,--color=WHEN         control whether colour is used. WHEN may be
                                `never', `always', or `auto'.
  --quiet, -q                   Do not display brief logging to STDOUT.
  --no-errors                   Suppress error messages.
LOGGING:
  --log-brief=FILE              Log brief, one-line output.
  --log-verbose=FILE            Log verbose output.
  --log-errors=FILE             Log errors.
  --log-xml=FILE                Log XML format.
  --log-json=FILE               Log JSON format.
  --log-sql=FILE                Log SQL INSERT statements.
  --log-sql-create=FILE         Create SQL database tables.
  --log-json-verbose=FILE       Log JSON Verbose format.
  --log-magictree=FILE          Log MagicTree XML format.
  --log-object=FILE             Log Ruby object inspection format.
  --log-mongo-database          Name of the MongoDB database.
  --log-mongo-collection        Name of the MongoDB collection.
                                Default: whatweb.
  --log-mongo-host              MongoDB hostname or IP address.
                                Default: 0.0.0.0.
  --log-mongo-username          MongoDB username. Default: nil.
  --log-mongo-password          MongoDB password. Default: nil.
PERFORMANCE & STABILITY:
  --max-threads, -t             Number of simultaneous threads. Default: 25.
  --open-timeout                Time in seconds. Default: 15.
  --read-timeout                Time in seconds. Default: 30.
  --wait=SECONDS                Wait SECONDS between connections.
                                This is useful when using a single thread.
HELP & MISCELLANEOUS:
  --short-help                  Short usage help.
  --help, -h                    Complete usage help.
  --debug                       Raise errors in plugins.
  --version                     Display version information.
EXAMPLE USAGE:
* Scan example.com.
  ./whatweb example.com
* Scan reddit.com slashdot.org with verbose plugin descriptions.
  ./whatweb -v reddit.com slashdot.org
* An aggressive scan of wired.com detects the exact version of WordPress.
  ./whatweb -a 3 www.wired.com
* Scan the local network quickly and suppress errors.
  whatweb --no-errors 192.168.0.0/24
* Scan the local network for https websites.
  whatweb --no-errors --url-prefix https:// 192.168.0.0/24
* Scan for crossdomain policies in the Alexa Top 1000.
  ./whatweb -i plugin-development/alexa-top-100.txt \
  --url-suffix /crossdomain.xml -p crossdomain_xml
OPTIONAL DEPENDENCIES
--------------------------------------------------------------------------------
To enable MongoDB logging install the mongo gem.
To enable character set detection and MongoDB logging install the rchardet gem.

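As the output shows, WhatWeb offers a rich set of options. Here I run WhatWeb with the -v option to fingerprint one specific target, and use that run to trace the execution flow.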

Set a breakpoint at line 680 of the whatweb source file and start debugging.

[Screenshot: 2017-10-17_11-41-45]

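The code above, up to line 741, initializes variables; GetoptLong.new builds the command-line option parser. Continuing downward, lines 743 to 933 parse the values supplied by the user, as shown below:

[Screenshot: 2017-10-17_11-52-12]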
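
To make the option-parsing step concrete, here is a minimal sketch using Ruby's standard GetoptLong class. It is not WhatWeb's code: the method name parse_options, the settings hash, and the choice of only two options are assumptions made for illustration. It shows how a repeated -v flag can be counted and how the non-option arguments (the targets) remain in ARGV after parsing.

```ruby
require 'getoptlong'

# Hypothetical helper, not part of WhatWeb: parses a small subset of options.
def parse_options(argv)
  saved = ARGV.dup
  ARGV.replace(argv) # GetoptLong reads from ARGV by default

  opts = GetoptLong.new(
    ['--verbose', '-v', GetoptLong::NO_ARGUMENT],
    ['--aggression', '-a', GetoptLong::REQUIRED_ARGUMENT]
  )

  settings = { verbose: 0, aggression: 1 }
  opts.each do |opt, arg|
    case opt
    when '--verbose'    then settings[:verbose] += 1       # each -v adds one
    when '--aggression' then settings[:aggression] = arg.to_i
    end
  end

  settings[:targets] = ARGV.dup # whatever is left over is treated as targets
  settings
ensure
  ARGV.replace(saved) # restore the caller's ARGV
end
```

Calling parse_options(['-v', 'example.com']) in this sketch yields a verbose level of 1 and a single target, which mirrors the run traced in this post.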

Here we passed only the -v option. With a breakpoint at line 745, we see the following:

[Screenshot: 2017-10-17_23-15-02]

[Screenshot: 2017-10-17_23-15-27]

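The value of the verbose variable is incremented by 1. Execution then jumps to the terminal colour configuration, which is set according to the operating system type.

[Screenshot: 2017-10-17_23-17-35]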
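
The colour decision can be sketched roughly as follows. This is an illustration only: the helper name use_colour?, its keyword arguments, and the rule "disable colour on Windows and on non-terminals in auto mode" are my assumptions. The never/always/auto values come from the --colour help text above; WhatWeb's real logic may differ.

```ruby
require 'rbconfig'

# Hypothetical helper: decide whether to emit ANSI colour codes.
# host_os and tty are parameters so the decision can be exercised without
# depending on the machine the code runs on.
def use_colour?(mode, host_os: RbConfig::CONFIG['host_os'], tty: $stdout.tty?)
  case mode
  when 'always' then true
  when 'never'  then false
  else
    # 'auto': assume colour only on a terminal that is not a Windows console
    windows = host_os.match?(/mswin|mingw|cygwin/i)
    tty && !windows
  end
end
```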

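Continuing, the detection plugins are selected according to a set of conditions. If no custom plugin is specified, the default plugins are loaded. Because I passed only -v, use_custom_plugin is false and plugin_selection is nil. Let's step into PluginSupport.load_plugins, which is the function that loads the plugin directories.

[Screenshot: 2017-10-17_23-26-22]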

Inside load_plugins we can see that the default directories are set.

[Screenshot: 2017-10-17_23-27-35]

[Screenshot: 2017-10-17_23-27-56]

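The plugin detection files are searched for under these directories. The code below loads the plugin files that were found.

[Screenshot: 2017-10-18_18-22-05]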
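
The directory search and load step can be approximated as below. This is a sketch under assumptions, not the body of load_plugins: the helper names, the default directory name plugins, and the *.rb glob pattern are mine.

```ruby
# Hypothetical helpers illustrating "search a directory, then load each file".

# Collect every Ruby file directly inside the given plugin directories.
# The default directory name 'plugins' is an assumption.
def find_plugin_files(dirs = ['plugins'])
  dirs.flat_map { |dir| Dir.glob(File.join(dir, '*.rb')) }.sort
end

# Kernel#load executes each file, so any plugin definitions it contains run
# as a side effect of loading.
def load_plugin_files(files)
  files.each { |f| load f }
end
```

Kernel#load re-executes a file every time it is called, unlike require, which suits plugin files that register themselves when they run.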

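Continue into the load_plugin function and follow the load f call on line 298. See the three screenshots below.

[Screenshot: 2017-10-18_18-26-56]

[Screenshot: 2017-10-18_18-31-14]

[Screenshot: 2017-10-18_18-31-50]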

Taken together, these make the assignment below easier to understand.

[Screenshot: 2017-10-18_18-35-43]
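
Why loading a file results in a registered plugin is easier to see with a small model. The class below is a hypothetical stand-in (MiniPlugin, its registered hash, and the matches accessor are my own names), assuming the common Ruby pattern where a define method evaluates the plugin's block with instance_eval and stores the resulting object in a class-level registry.

```ruby
# Hypothetical miniature of a self-registering plugin class.
class MiniPlugin
  @registered = {}

  class << self
    attr_reader :registered

    # Evaluate the block in the context of a new plugin object, then keep
    # the object in the registry under its name.
    def define(name, &block)
      plugin = new(name)
      plugin.instance_eval(&block)
      registered[name] = plugin
    end
  end

  attr_reader :name

  def initialize(name)
    @name = name
    @matches = []
  end

  # Acts as a setter when given a list and as a getter otherwise.
  def matches(list = nil)
    @matches = list if list
    @matches
  end
end

# What a plugin file might contain in this model:
MiniPlugin.define('Demo') do
  matches([{ text: 'powered by abc' }])
end
```

After the file is loaded, MiniPlugin.registered['Demo'] holds the plugin object, so the loader can assign the registry contents to its list of active plugins.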

Further on there is a function call that optimizes the plugins:

[Screenshot: 2017-10-18_18-43-33]

Stepping in, it turns out to refine the plugin detection rules further.

[Screenshot: 2017-10-18_20-19-29]
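
One plausible form of such a refinement is precompiling each rule into a Regexp once, so that matching does not rebuild patterns for every response. The sketch below assumes that interpretation; the helper name precompile_matches and the :regexp_compiled key are illustrative and not confirmed by the screenshots.

```ruby
# Hypothetical optimisation pass: attach a compiled Regexp to each rule.
def precompile_matches(matches)
  matches.map do |m|
    compiled =
      if m[:regexp]
        Regexp.new(m[:regexp])              # accepts a String or a Regexp
      elsif m[:text]
        Regexp.new(Regexp.escape(m[:text])) # literal text, escaped
      end
    compiled ? m.merge(regexp_compiled: compiled) : m
  end
end
```

Escaping matters for text rules: a rule such as 'a.b' should match only the literal string, not 'axb'.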

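After returning from that function and continuing, we reach the definition of the HTTP request headers. They can be customized by the user or left at their default values.

[Screenshot: 2017-10-18_18-46-29]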
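
The header rules described in the --header help text (a custom header replaces a default one, and an empty value removes it) can be sketched like this. The helper name build_headers and the DEFAULT_HEADERS constant are assumptions; only the User-Agent default string is taken from the help output.

```ruby
# Assumed default set; the help text only confirms the User-Agent value.
DEFAULT_HEADERS = { 'User-Agent' => 'WhatWeb/0.4.8-dev' }.freeze

# Hypothetical helper: merge "Name:Value" strings into the default headers.
def build_headers(custom, defaults = DEFAULT_HEADERS)
  headers = defaults.dup
  custom.each do |line|
    name, value = line.split(':', 2)
    name = name.strip
    value = value.to_s.strip
    # Drop any existing header with the same name, ignoring case.
    headers.delete_if { |k, _| k.casecmp?(name) }
    # An empty value means "remove this header", so only re-add non-empty ones.
    headers[name] = value unless value.empty?
  end
  headers
end
```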

Nothing much to add here. Moving on, the next step filters the target URLs:

[Screenshot: 2017-10-18_20-22-31]

Stepping in:

[Screenshot: Selection_001]

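The regular expressions here deserve attention. The one below matches IP range strings of the form 192.168.0.1-200; the regular expression after it does not match a single IP address.

[Screenshot: Selection_002]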
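
Since the actual patterns are only visible in the screenshot, the following sketch uses regular expressions of my own to illustrate the idea: recognise a range such as 192.168.0.1-200 and expand it into individual addresses, while a single IP address is left alone. RANGE_RE, SINGLE_IP_RE, and expand_target are hypothetical names.

```ruby
# Illustrative patterns, not the ones in the WhatWeb source.
RANGE_RE     = /\A(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})-(\d{1,3})\z/
SINGLE_IP_RE = /\A\d{1,3}(\.\d{1,3}){3}\z/

# Expand "a.b.c.x-y" into one address per value of the last octet;
# return anything else unchanged as a one-element list.
def expand_target(target)
  if (m = RANGE_RE.match(target))
    (m[2].to_i..m[3].to_i).map { |last| "#{m[1]}#{last}" }
  else
    [target]
  end
end
```

This sketch does not validate that each octet is at most 255, and it does not handle CIDR notation such as 192.168.0.0/24.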

Next, the URLs are normalized:

[Screenshot: Selection_004]
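
A rough picture of normalization, combining the --url-prefix, --url-suffix and --url-pattern options from the help text with a default scheme, might look like this. The helper name normalize_target and the rule "prepend http:// when no scheme is present" are assumptions about behaviour, not a copy of the source.

```ruby
# Hypothetical helper: turn a raw target into a full URL.
def normalize_target(target, prefix: '', suffix: '', pattern: nil)
  url =
    if pattern
      pattern.sub('%insert%', target) # placeholder name from the help text
    else
      "#{prefix}#{target}#{suffix}"
    end
  # Assume plain HTTP when the result carries no scheme.
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  url
end
```

For example, normalize_target('192.168.0.1', prefix: 'https://') keeps the https scheme, while a bare hostname gets http:// prepended.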

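Now comes the most important part: processing the given URLs and collecting the fingerprint information. This is the core of the code. I spent a long time debugging this section and found it confusing, because it is multithreaded and thread switches make it easy to lose track. I may not be able to explain it completely, but I will try.

[Screenshot: Selection_005]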

This calls the next_target function to obtain the next target URL. Stepping into next_target:

[Screenshot: Selection_008]

[Screenshot: Selection_006]

This is the assignment step. At the end there is a check on whether the number of recent targets exceeds 100:

[Screenshot: Selection_007]

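After returning from next_target, execution continues in a section that behaves like a do…while loop. It jumps to the check on whether the number of threads exceeds the default limit.

[Screenshot: Selection_009]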

Continuing, execution jumps into the Thread.new(target) do |thistarget| block, as follows:

[Screenshot: Selection_010]
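
The overall shape of this loop (take the next target, wait while too many threads are running, then hand the target to a new thread) can be sketched as below. It is a simplified model, not the WhatWeb loop: scan_all, the polling interval, and the result list are my own; the default of 25 threads comes from the --max-threads help text.

```ruby
# Hypothetical driver: runs the given block once per target, with at most
# max_threads worker threads alive at any time.
def scan_all(targets, max_threads: 25)
  queue = targets.dup
  results = []
  lock = Mutex.new
  threads = []

  while (target = queue.shift) # stands in for a next_target call
    # Forget finished threads, then wait until a slot is free.
    threads.select!(&:alive?)
    sleep 0.01 while threads.count(&:alive?) >= max_threads

    # Passing target as an argument gives each thread its own copy of the
    # reference, so a later loop iteration cannot change it underneath.
    threads << Thread.new(target) do |this_target|
      outcome = yield(this_target)
      lock.synchronize { results << [this_target, outcome] }
    end
  end

  threads.each(&:join)
  results
end
```

Handing the target to Thread.new as an argument, rather than reading the loop variable inside the block, is what keeps each thread bound to its own target. The order of results depends on thread scheduling.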

Stepping in, we enter the function that initializes the target:

[Screenshot: Selection_011]

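While debugging further, execution switches back and forth between several code sections. Keep following:

[Screenshot: Selection_012]

[Screenshot: Selection_013]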

Some parameters are set here, then the target URL is requested, and the HTTP response comes back with all of its fields in detail.
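
As an illustration of "set some parameters, then request the URL", here is a sketch with Ruby's standard Net::HTTP. The helper name build_request is an assumption; the timeout defaults of 15 and 30 seconds are taken from the --open-timeout and --read-timeout help text. The sketch only prepares the connection and request objects and does not send anything.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: prepare (but do not send) a GET request for a target.
def build_request(url, headers: {}, open_timeout: 15, read_timeout: 30)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port) # no connection is opened yet
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout
  http.read_timeout = read_timeout

  request = Net::HTTP::Get.new(uri.request_uri) # path plus query string
  headers.each { |name, value| request[name] = value }
  [http, request]
end
```

Sending it would be a matter of calling http.request(request); the returned response object exposes the status code, the headers, and the body, which are the fields the plugins inspect.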

Continuing, we reach the part that matches the HTTP response against the plugin files:

[Screenshot: Selection_015]

Stepping in shows the code that implements it:

[Screenshot: Selection_014]

The implementation above involves the use of locks.
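
A simplified model of matching plus locking is sketched below. Everything here is hypothetical (the plugin hash layout, match_plugins, record_matches, and RESULT_LOCK); it only assumes that rules are either literal text or regular expressions, as in the --custom-plugin examples, and that the lock protects state shared between scanning threads. Which state WhatWeb actually guards is not something this sketch claims to know.

```ruby
RESULT_LOCK = Mutex.new

# Return the names of all plugins with at least one rule matching the body.
# plugins maps a plugin name to a list of rule hashes.
def match_plugins(plugins, body)
  plugins.each_with_object([]) do |(name, rules), found|
    hit = rules.any? do |rule|
      (rule[:text] && body.include?(rule[:text])) ||
        (rule[:regexp] && body.match?(rule[:regexp]))
    end
    found << name if hit
  end
end

# Store the matches for one target in a hash shared between threads.
# The mutex ensures only one thread writes to it at a time.
def record_matches(results, target, plugins, body)
  found = match_plugins(plugins, body)
  RESULT_LOCK.synchronize { results[target] = found }
  found
end
```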

Finally, the results are output.

[Screenshot: Selection_016]

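This walkthrough of the execution flow is still fairly rough. The next post will dig deeper into some of the implementation details.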