BigchainDB Source Code Analysis (1): Parsing Command-Line Arguments and the Configuration File

Source: Internet | Editor: 程序博客网 | Date: 2024/06/07 02:42

BigchainDB versions:

BigchainDB (1.0.0rc1)
bigchaindb-driver (0.3.1)

Command-line argument parsing

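Running whereis shows that the bigchaindb executable is /usr/local/bin/bigchaindb. This script simply calls bigchaindb.commands.bigchaindb.main(). Before that, re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) performs a string substitution: it strips a trailing -script.py, -script.pyw, or .exe from the end of sys.argv[0] by replacing it with the empty string.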

```console
root@bigchain:~# whereis bigchaindb
bigchaindb: /usr/local/bin/bigchaindb
root@bigchain:~# cat /usr/local/bin/bigchaindb
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys

from bigchaindb.commands.bigchaindb import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
root@bigchain:~#
```
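To see what the substitution does to a few typical program names, here is a minimal sketch that reuses the same regular expression. The helper name clean_prog_name and the sample names are made up for illustration.

```python
import re

# Same pattern as the launcher script: an optional trailing "-script.py",
# "-script.pyw" or ".exe", anchored at the end of the string.
SUFFIX = re.compile(r'(-script\.pyw?|\.exe)?$')


def clean_prog_name(argv0: str) -> str:
    """Hypothetical helper: strip the launcher suffix from a program name."""
    return SUFFIX.sub('', argv0)


for name in ["/usr/local/bin/bigchaindb", "bigchaindb-script.py",
             "bigchaindb-script.pyw", "bigchaindb.exe"]:
    print(name, "->", clean_prog_name(name))
```

Because the whole group is optional, a name without any of these suffixes only matches the empty string at its end and comes back unchanged.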

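main simply calls utils.start. Its first argument is the parser returned by create_parser, which uses the argparse module to define the command-line arguments the script understands, including subcommands such as configure and start. Because the subparsers are created with dest='command', the name of the parsed subcommand is stored in args.command.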

```python
def main():
    utils.start(create_parser(), sys.argv[1:], globals())


def create_parser():
    parser = argparse.ArgumentParser(
        description='Control your BigchainDB node.',
        parents=[utils.base_parser])

    # all the commands are contained in the subparsers object,
    # the command selected by the user will be stored in `args.command`
    # that is used by the `main` function to select which other
    # function to call.
    subparsers = parser.add_subparsers(title='Commands',
                                       dest='command')

    # parser for writing a config file
    config_parser = subparsers.add_parser('configure',
                                          help='Prepare the config file '
                                               'and create the node keypair')
    ...
```

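Besides the arguments defined in create_parser, commands/utils.py defines a few more in base_parser, which create_parser pulls in through parents=[utils.base_parser]: -c to specify the configuration file, -l to set the log level, -y to answer yes to all prompts, and -v to print the version.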

```python
base_parser = argparse.ArgumentParser(add_help=False, prog='bigchaindb')

base_parser.add_argument('-c', '--config',
                         help='Specify the location of the configuration file '
                              '(use "-" for stdout)')
...
```
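The way base_parser and create_parser fit together can be reproduced in a few lines. This is a minimal sketch: the option set and subcommand list are trimmed for illustration, and the default 'INFO' for -l is an assumption of this example, not necessarily BigchainDB's. The key points are add_help=False on the parent (otherwise both parsers would define -h) and dest='command' on the subparsers.

```python
import argparse

# Shared options, in the spirit of utils.base_parser. add_help=False avoids a
# clash with the -h option that the child parser adds for itself.
base_parser = argparse.ArgumentParser(add_help=False, prog='bigchaindb')
base_parser.add_argument('-c', '--config', help='path to the config file')
base_parser.add_argument('-l', '--log-level', default='INFO')  # assumed default
base_parser.add_argument('-y', '--yes', action='store_true')


def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Control your BigchainDB node.',
                                     parents=[base_parser])
    # The chosen subcommand's name ends up in args.command.
    subparsers = parser.add_subparsers(title='Commands', dest='command')
    subparsers.add_parser('configure', help='Prepare the config file')
    subparsers.add_parser('show-config', help='Show the current configuration')
    subparsers.add_parser('start', help='Start BigchainDB')
    return parser


args = create_parser().parse_args(['-c', '/tmp/bigchaindb.json', '-y', 'start'])
print(args.command, args.config, args.yes, args.log_level)
```

Options inherited from the parent must appear before the subcommand, because the start subparser itself does not know about -c or -y. With no subcommand at all, args.command is None, which is exactly the case utils.start checks for.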

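With the parser defined, the first thing utils.start does is call parse_args. It then checks that args.command, which holds the subcommand (configure, start, and so on), is set. If it is not, the command line contained no subcommand, so the help text is printed and the program exits.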

```python
def start(parser, argv, scope):
    args = parser.parse_args(argv)

    if not args.command:
        parser.print_help()
        raise SystemExit()
```

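The third parameter of start is scope, and main passes globals() as the argument. globals() is a Python built-in that returns the namespace of the current module (here commands/bigchaindb.py) as a dictionary, including its functions, classes, imported modules, and module-level variables and constants. After parsing the command line, start looks up the function that corresponds to the subcommand: its name is the subcommand string with '-' replaced by '_' and prefixed with run_. So for bigchaindb start, args.command is start and func is run_start. If no such function exists in the module, a NotImplementedError is raised.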

After that, start sets args.multiprocess according to the command-line arguments and finally calls func.

```python
func = scope.get('run_' + args.command.replace('-', '_'))

if not func:
    raise NotImplementedError('Command `{}` not yet implemented'.
                              format(args.command))
...
return func(args)
```
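The name-based dispatch can be tried out on its own. In this minimal sketch, dispatch is a hypothetical stand-in for utils.start (it skips argument parsing and printing the help text), and run_start/run_show_config are dummy handlers; the lookup rule is the one quoted above.

```python
import argparse


def run_start(args):
    # Dummy handler standing in for the real run_start.
    return f"starting with config={args.config}"


def run_show_config(args):
    # Dummy handler for a hyphenated subcommand.
    return "showing config"


def dispatch(args: argparse.Namespace, scope: dict):
    """Resolve args.command to run_<command> in scope and call it."""
    if not args.command:
        # The real code prints the help text before exiting.
        raise SystemExit("no subcommand given")
    func = scope.get('run_' + args.command.replace('-', '_'))
    if not func:
        raise NotImplementedError(
            'Command `{}` not yet implemented'.format(args.command))
    return func(args)


print(dispatch(argparse.Namespace(command='start', config='x.json'), globals()))
print(dispatch(argparse.Namespace(command='show-config', config=None), globals()))
```

Passing globals() keeps the command table implicit: adding a subcommand only requires defining another run_* function in the module, at the cost of a lookup that only fails at run time.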

Executing the subcommand

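Take bigchaindb start as an example: utils.start calls run_start(), which is defined in commands.bigchaindb. run_start has two decorators. Decorators are applied bottom-up, so the name run_start is bound to configure_bigchaindb(start_logging_process(run_start)). When run_start(args) is called, the configure_bigchaindb wrapper therefore runs first, then the start_logging_process wrapper, and only then the body of the original run_start. For an introduction to decorators, see this blog post (http://www.cnblogs.com/SeasonLee/articles/1719444.html).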

```python
@configure_bigchaindb
@start_logging_process
def run_start(args):
    ...
```
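The execution order is easy to verify with two stripped-down decorators that only record when they run. The names mirror the ones in the post, but the bodies here are hypothetical stand-ins that just append to a list.

```python
import functools

calls = []  # records the order in which the wrappers and the body run


def configure_bigchaindb(command):
    @functools.wraps(command)
    def configure(args):
        calls.append('configure')
        return command(args)
    return configure


def start_logging_process(command):
    @functools.wraps(command)
    def start_logging(args):
        calls.append('start_logging')
        return command(args)
    return start_logging


@configure_bigchaindb
@start_logging_process
def run_start(args):
    calls.append('run_start')


# Equivalent to: run_start = configure_bigchaindb(start_logging_process(run_start))
run_start(None)
print(calls)  # ['configure', 'start_logging', 'run_start']
```

The decorator closest to def is applied first, which makes the topmost one the outermost wrapper and therefore the first to run on each call.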

Let's look at the code of the two decorators first, and then at run_start itself.

Configuring bigchaindb

configure_bigchaindb is defined in commands.utils.

```python
def configure_bigchaindb(command):
    @functools.wraps(command)
    def configure(args):
        try:
            print(">>> enter configure")
            config_from_cmdline = {
                'log': {
                    'level_console': args.log_level,
                    'level_logfile': args.log_level,
                },
                'server': {'loglevel': args.log_level},
            }
        except AttributeError:
            config_from_cmdline = None
        bigchaindb.config_utils.autoconfigure(
            filename=args.config, config=config_from_cmdline, force=True)
        command(args)

    return configure
```

The command passed in here is run_start already wrapped by start_logging_process, so the last line of configure, command(args), is equivalent to running:

```python
@start_logging_process
def run_start(args):
    ...

run_start(args)
```

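In other words, start_logging_process runs first, and then the real run_start. The @functools.wraps(command) decorator on configure ensures that attributes of the original function, such as __name__, are not replaced by those of the wrapper. In the example below, which does not use functools.wraps, add.__name__ has become run; functools.wraps keeps the original attributes intact.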

```python
>>> def test(func):
...     def run(x1, x2):
...         print("run>>")
...         return func(x1,x2)
...     return run
...
>>> @test
... def add(x1, x2):
...     print("x1+x2=%d" % (x1+x2))
...
>>> add(1,2)
run>>
x1+x2=3
>>> print(add.__name__)
run
```

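Now back to the body of configure. It calls config_utils.autoconfigure: the first argument is the path of the configuration file given on the command line, and the second is a dictionary that sets the log levels.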

```python
def autoconfigure(filename=None, config=None, force=False):
    # start with the current configuration
    newconfig = bigchaindb.config

    # update configuration from file
    try:
        newconfig = update(newconfig, file_config(filename=filename))
    except FileNotFoundError as e:
        if filename:
            raise
        else:
            logger.info('Cannot find config file `%s`.' % e.filename)

    # override configuration with env variables
    newconfig = env_config(newconfig)
    if config:
        newconfig = update(newconfig, config)

    set_config(newconfig)  # sets bigchaindb.config
```

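autoconfigure first sets newconfig to the current configuration, bigchaindb.config (on the first call this is the default configuration defined in bigchaindb/__init__.py), and then calls update to merge the JSON from the configuration file into newconfig. file_config loads the configuration file with json.load. If the file does not exist, the error is re-raised only when a filename was given explicitly; otherwise autoconfigure just logs a message. The update function is shown below: it walks the configuration-file JSON recursively and copies each key/value pair into newconfig, descending into nested dictionaries instead of overwriting them.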

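```python
def update(d, u):
    # collections.Mapping was removed in Python 3.10;
    # on current Python versions use collections.abc.Mapping instead.
    for k, v in u.items():
        if isinstance(v, collections.Mapping):
            r = update(d.get(k, {}), v)
            d[k] = r
        else:
            d[k] = u[k]
    return d
```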

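autoconfigure then updates newconfig from the environment variables, and after that from the log-level dictionary passed in as the config argument. Later sources win, so command-line values override environment variables, which override the file. Finally it installs newconfig as the configuration used by the running bigchaindb instance.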
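The recursive merge and the layering order can be sketched together. The update function below follows the logic shown above, using collections.abc.Mapping; the configuration values are illustrative, and modelling the environment as a plain dict layer is a simplification (the real code reads environment variables through env_config).

```python
import copy
from collections.abc import Mapping


def update(d, u):
    """Recursively merge mapping u into d (same logic as config_utils.update)."""
    for k, v in u.items():
        if isinstance(v, Mapping):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d


# Illustrative values only.
defaults = {'database': {'host': 'localhost', 'port': 28015},
            'log': {'level_console': 'info'}}
from_file = {'database': {'host': 'db.example.com'}}
from_env = {'database': {'port': 27017}}
from_cmdline = {'log': {'level_console': 'debug'}}

# Later layers win: defaults < config file < environment < command line.
config = copy.deepcopy(defaults)
for layer in (from_file, from_env, from_cmdline):
    config = update(config, layer)

print(config)
```

Because update modifies d in place, the sketch deep-copies the defaults first; nested keys that a layer does not mention (database.port after the file layer, for example) survive the merge.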

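Let's look at env_config first. The call chain is env_config -> map_leafs -> _inner. Inside _inner, func refers to load_from_env and mapping refers to (a deep copy of) newconfig. Like update above, _inner walks newconfig recursively and reassigns every leaf by calling load_from_env with two arguments: the leaf's current value and a path, i.e. the list of keys that leads to it.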

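For example, if newconfig contains {'database': {'host': 'localhost'}}, load_from_env is called with localhost and ['database', 'host']. Its body builds an environment variable name from the path and reads it with os.environ.get: if the variable exists, its value replaces the original one; otherwise the original value is kept.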

```python
CONFIG_PREFIX = 'BIGCHAINDB'
CONFIG_SEP = '_'


def env_config(config):
    def load_from_env(value, path):
        var_name = CONFIG_SEP.join([CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)))
        return os.environ.get(var_name, value)

    return map_leafs(load_from_env, config)


def map_leafs(func, mapping):
    def _inner(mapping, path=None):
        if path is None:
            path = []

        for key, val in mapping.items():
            # collections.abc.Mapping on Python 3.10+
            if isinstance(val, collections.Mapping):
                _inner(val, path + [key])
            else:
                mapping[key] = func(val, path=path+[key])

        return mapping

    return _inner(copy.deepcopy(mapping))
```

Let's look more closely at how the environment variable name is built from the path, i.e. this statement:

```python
var_name = CONFIG_SEP.join([CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)))
```

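A lambda is a small anonymous function: lambda s: s.upper() takes a string s and returns it in upper case. map(func, seq) applies func to every element of seq, so [CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)) is the list of path elements in upper case, with CONFIG_PREFIX inserted at the front. join then turns that list into a single string, with CONFIG_SEP between adjacent elements. So when load_from_env is called with localhost and ['database', 'host'], the corresponding environment variable is BIGCHAINDB_DATABASE_HOST.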
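The whole env_config path can be exercised without touching the real environment. This is a minimal sketch: map_leafs follows the code above (with collections.abc.Mapping), a list comprehension replaces the equivalent map/lambda, and the environ parameter and env_var_name helper are additions of this example so that a plain dict can stand in for os.environ.

```python
import copy
import os
from collections.abc import Mapping

CONFIG_PREFIX = 'BIGCHAINDB'
CONFIG_SEP = '_'


def map_leafs(func, mapping):
    """Apply func(value, path=[...]) to every leaf of a deep copy of mapping."""
    def _inner(mapping, path=None):
        if path is None:
            path = []
        for key, val in mapping.items():
            if isinstance(val, Mapping):
                _inner(val, path + [key])
            else:
                mapping[key] = func(val, path=path + [key])
        return mapping
    return _inner(copy.deepcopy(mapping))


def env_var_name(path):
    # Hypothetical helper: ['database', 'host'] -> 'BIGCHAINDB_DATABASE_HOST'
    return CONFIG_SEP.join([CONFIG_PREFIX] + [s.upper() for s in path])


def env_config(config, environ=os.environ):
    # `environ` is a parameter added for this sketch so it can be tested
    # with a plain dict instead of the real process environment.
    def load_from_env(value, path):
        return environ.get(env_var_name(path), value)
    return map_leafs(load_from_env, config)


config = {'database': {'host': 'localhost', 'port': 28015}}
fake_env = {'BIGCHAINDB_DATABASE_HOST': 'mongodb'}
print(env_var_name(['database', 'host']))
print(env_config(config, environ=fake_env))
```

Note that values taken from a real environment are always strings; only leaves without a matching variable keep their original type, which is why set_config later runs update_types.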

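At this point autoconfigure has the updated newconfig, and its last statement, set_config(newconfig), makes it the current configuration. set_config relies on update_types, which walks newconfig with map_leafs and converts each value to the type of the corresponding entry in the configuration defined in bigchaindb/__init__.py. This matters because values read from environment variables are always strings. Configuration done! The final configuration is stored in bigchaindb.config.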

```python
def set_config(config):
    # Deep copy the default config into bigchaindb.config
    bigchaindb.config = copy.deepcopy(bigchaindb._config)
    # Update the default config with whatever is in the passed config
    update(bigchaindb.config, update_types(config, bigchaindb.config))
    bigchaindb.config['CONFIGURED'] = True
```
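The idea behind update_types, casting each leaf to the type of the matching default, can be sketched as follows. This is a simplified version, not BigchainDB's exact implementation: the special handling of booleans (where bool('false') would be True) and the fallback rules are assumptions of this example, and the configuration values are illustrative.

```python
import copy
from collections.abc import Mapping


def map_leafs(func, mapping):
    """Apply func(value, path=[...]) to every leaf of a deep copy of mapping."""
    def _inner(mapping, path=None):
        if path is None:
            path = []
        for key, val in mapping.items():
            if isinstance(val, Mapping):
                _inner(val, path + [key])
            else:
                mapping[key] = func(val, path=path + [key])
        return mapping
    return _inner(copy.deepcopy(mapping))


def update_types(config, reference):
    """Cast each leaf of config to the type of the same leaf in reference."""
    def _coerce(value, path):
        current = reference
        for key in path:
            if not isinstance(current, Mapping) or key not in current:
                return value  # no reference entry: keep the value as-is
            current = current[key]
        # Assumption of this sketch: parse booleans from common strings.
        if isinstance(current, bool) and isinstance(value, str):
            return value.strip().lower() in ('1', 'true', 'yes')
        if current is None or isinstance(value, type(current)):
            return value
        try:
            return type(current)(value)
        except (TypeError, ValueError):
            return value  # cannot convert: keep the original
    return map_leafs(_coerce, config)


# Illustrative values only.
defaults = {'server': {'bind': 'localhost:9984', 'workers': 4}, 'debug': False}
from_env = {'server': {'bind': '0.0.0.0:9984', 'workers': '8'}, 'debug': 'true'}
print(update_types(from_env, defaults))
```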

Starting logging

Once configuration is complete, start_logging_process runs. It calls setup_logging and then the real run_start. For logging, bigchaindb uses a publisher/subscriber architecture.

```python
def start_logging_process(command):
    @functools.wraps(command)
    def start_logging(args):
        from bigchaindb import config
        setup_logging(user_log_config=config.get('log'))
        command(args)
    return start_logging


def setup_logging(*, user_log_config=None):
    setup_pub_logger()
    setup_sub_logger(user_log_config=user_log_config)
```

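setup_pub_logger sets up the publisher: it creates a logging.handlers.SocketHandler pointed at DEFAULT_SOCKET_LOGGING_HOST:DEFAULT_SOCKET_LOGGING_PORT and attaches it to the root logger, so every log record is pickled and sent over TCP to that address. SocketHandler is a client; it connects to the port rather than listening on it.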

```python
def setup_pub_logger():
    dictConfig(PUBLISHER_LOGGING_CONFIG)
    socket_handler = logging.handlers.SocketHandler(
        DEFAULT_SOCKET_LOGGING_HOST, DEFAULT_SOCKET_LOGGING_PORT)
    socket_handler.setLevel(logging.DEBUG)
    logger = logging.getLogger()
    logger.addHandler(socket_handler)
```
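What the publisher actually puts on the wire can be inspected without opening a connection, because SocketHandler only connects when a record is emitted. This sketch builds one record by hand (the logger name and message are illustrative) and decodes the frame produced by the standard-library makePickle method.

```python
import logging
import logging.handlers
import pickle
import struct

# Creating the handler does not connect yet; we only call makePickle().
handler = logging.handlers.SocketHandler(
    'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)

record = logging.LogRecord(name='bigchaindb', level=logging.INFO,
                           pathname='example.py', lineno=1,
                           msg='Genesis block %s', args=('created',),
                           exc_info=None)
data = handler.makePickle(record)

# Frame = 4-byte big-endian length prefix + pickled dict of record attributes.
(length,) = struct.unpack('>L', data[:4])
restored = logging.makeLogRecord(pickle.loads(data[4:]))
print(length == len(data) - 4, restored.name, restored.getMessage())
handler.close()
```

The message is already formatted before pickling (args is set to None), so the subscriber does not need the original arguments to reproduce the text.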

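setup_sub_logger starts the subscriber: a LogRecordSocketServer, running in a separate process, that listens on DEFAULT_TCP_LOGGING_PORT (9020 in Python's logging.handlers), receives the records, and handles them according to the log section of the configuration file. This also means that if you start two bigchaindb instances on the same machine, both try to listen on DEFAULT_TCP_LOGGING_PORT, and the second one fails with an "Address already in use" error.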

```python
def setup_sub_logger(*, user_log_config=None):
    server = LogRecordSocketServer()
    with server:
        server_proc = Process(
            target=server.serve_forever,
            kwargs={'log_config': user_log_config},
        )
        server_proc.start()


class LogRecordSocketServer(ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self,
                 host='localhost',
                 port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
                 handler=LogRecordStreamHandler):
        super().__init__((host, port), handler)

    def serve_forever(self, *, poll_interval=0.5, log_config=None):
        sub_logging_config = create_subscriber_logging_config(
            user_log_config=log_config)
        dictConfig(sub_logging_config)
        try:
            super().serve_forever(poll_interval=poll_interval)
        except KeyboardInterrupt:
            pass
```
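The publisher/subscriber round trip can be reproduced with the standard library alone. This sketch is modelled on the network-logging recipe in the Python logging cookbook rather than on BigchainDB's own LogRecordStreamHandler; the class names are hypothetical, the subscriber runs in a thread instead of a separate process, and it binds to port 0 so the OS picks a free port. The last part tries to start a second listener on the same port, which fails on Linux even with allow_reuse_address set.

```python
import logging
import logging.handlers
import pickle
import socketserver
import struct
import threading

received = []                  # messages collected by the subscriber
got_record = threading.Event()


class RecordHandler(socketserver.StreamRequestHandler):
    """Subscriber side: decode length-prefixed pickled LogRecords."""
    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break  # publisher closed the connection
            (length,) = struct.unpack('>L', header)
            record = logging.makeLogRecord(pickle.loads(self.rfile.read(length)))
            received.append(record.getMessage())
            got_record.set()


class Server(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


server = Server(('localhost', 0), RecordHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever,
                 kwargs={'poll_interval': 0.1}, daemon=True).start()

# Publisher side: records on this logger are pickled and sent over TCP.
logger = logging.getLogger('demo.publisher')
logger.setLevel(logging.DEBUG)
logger.propagate = False
socket_handler = logging.handlers.SocketHandler('localhost', port)
logger.addHandler(socket_handler)
logger.info('hello from %s', 'publisher')
got_record.wait(timeout=5)

# A second listener on the same port is refused (EADDRINUSE on Linux).
try:
    extra = Server(('localhost', port), RecordHandler)
except OSError:
    second_bind_failed = True
else:
    second_bind_failed = False
    extra.server_close()

socket_handler.close()
server.shutdown()
server.server_close()
print(received, second_bind_failed)
```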

`run_start`

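Finally we reach the real run_start. Ignoring details such as keypair generation, it essentially calls only _run_init() and processes.start(). The former initializes the backend database: it creates the database and the tables, and creates the genesis block. If the database already exists, DatabaseAlreadyExists is caught and ignored, so restarting a node does not fail; if the node keypair is missing, the process exits with CANNOT_START_KEYPAIR_NOT_FOUND.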

```python
@configure_bigchaindb
@start_logging_process
def run_start(args):
    ...
    try:
        _run_init()
    except DatabaseAlreadyExists:
        pass
    except KeypairNotFoundException:
        sys.exit(CANNOT_START_KEYPAIR_NOT_FOUND)
    ...
    processes.start()


def _run_init():
    # Try to access the keypair, throws an exception if it does not exist
    b = bigchaindb.Bigchain()

    schema.init_database(connection=b.connection)

    b.create_genesis_block()
    logger.info('Genesis block created.')
```

processes.start then starts the block, vote, stale, and election processes in turn, followed by the web API and WebSocket server processes.

```python
def start():
    events_queue = setup_events_queue()

    # start the processes
    logger.info('Starting block')
    block.start()

    logger.info('Starting voter')
    vote.start()

    logger.info('Starting stale transaction monitor')
    stale.start()

    logger.info('Starting election')
    election.start(events_queue=events_queue)

    # start the web api
    app_server = server.create_server(bigchaindb.config['server'])
    p_webapi = mp.Process(name='webapi', target=app_server.run)
    p_webapi.start()

    logger.info('WebSocket server started')
    p_websocket_server = mp.Process(name='ws',
                                    target=websocket_server.start,
                                    args=(events_queue,))
    p_websocket_server.start()

    # start message
    logger.info(BANNER.format(bigchaindb.config['server']['bind']))
```
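The overall shape of processes.start, several named mp.Process children sharing an events queue, can be sketched as follows. The worker function and its messages are hypothetical stand-ins for block.start(), vote.start() and the rest, and the sketch assumes a POSIX system where the fork start method is available.

```python
import multiprocessing as mp


def worker(name, events_queue):
    # Hypothetical stand-in for block.start(), vote.start(), ...
    events_queue.put(f"{name} started")


def start_all(names):
    ctx = mp.get_context("fork")  # POSIX-only assumption of this sketch
    events_queue = ctx.Queue()
    procs = [ctx.Process(name=n, target=worker, args=(n, events_queue))
             for n in names]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on unflushed data.
    messages = [events_queue.get(timeout=5) for _ in procs]
    for p in procs:
        p.join()
    return sorted(messages), [p.exitcode for p in procs]


if __name__ == "__main__":
    msgs, codes = start_all(["block", "vote", "stale", "election"])
    print(msgs)
    print(codes)
```

Reading from the queue before calling join() follows the multiprocessing documentation's advice: a child that has put items on a Queue may not terminate until those items have been consumed.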

The database logic and the logic of these processes will be traced in the next post.
