Scrapy Source Code Analysis (2): Settings-Related Class Definitions

Source: Internet | Editor: 程序博客网 | Time: 2024/04/30 15:07

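Scrapy's BaseSettings behaves like a dictionary with priorities: when the same key is set several times, only the value with the highest priority is kept (on a tie, the most recent call wins).

First, a few supporting pieces. The priority dictionary:

SETTINGS_PRIORITIES = {
    'default': 0,
    'command': 10,
    'project': 20,
    'spider': 30,
    'cmdline': 40,
}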

A helper function wraps the priority dictionary: given a string, it returns the matching value from the dict; otherwise it returns the argument unchanged (treating it as an int):

def get_settings_priority(priority):
    """
    Small helper function that looks up a given string priority in the
    :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` dictionary and returns its
    numerical value, or directly returns a given numerical priority.
    """
    if isinstance(priority, six.string_types):
        return SETTINGS_PRIORITIES[priority]
    else:
        return priority
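
A minimal sketch of how this helper behaves, rewritten for Python 3 so that it runs on its own. Replacing `six.string_types` with `str` is an assumption of this sketch and not what the quoted source does.

```python
SETTINGS_PRIORITIES = {
    'default': 0,
    'command': 10,
    'project': 20,
    'spider': 30,
    'cmdline': 40,
}


def get_settings_priority(priority):
    # Python 3 stand-in for the six.string_types check in the quoted source.
    if isinstance(priority, str):
        return SETTINGS_PRIORITIES[priority]
    return priority


named = get_settings_priority('spider')   # looked up in the table
numeric = get_settings_priority(25)       # passed through unchanged
print(named, numeric)
```

Because integers pass through untouched, a caller can pick a level between two named ones (25 sits between 'project' and 'spider').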

The attribute class used by BaseSettings. It accounts for the case where the value is itself a BaseSettings instance, in which case the higher of the two priorities is used:

class SettingsAttribute(object):
    """Class for storing data related to settings attributes.

    This class is intended for internal usage, you should try Settings class
    for settings configuration, not this one.
    """

    def __init__(self, value, priority):
        self.value = value
        if isinstance(self.value, BaseSettings):
            self.priority = max(self.value.maxpriority(), priority)
        else:
            self.priority = priority

    def set(self, value, priority):
        """Sets value if priority is higher or equal than current priority."""
        if priority >= self.priority:
            if isinstance(self.value, BaseSettings):
                value = BaseSettings(value, priority=priority)
            self.value = value
            self.priority = priority

    def __str__(self):
        return "<SettingsAttribute value={self.value!r} " \
               "priority={self.priority}>".format(self=self)

    __repr__ = __str__

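
The "higher or equal priority wins" rule can be tried in isolation. The class below is a hypothetical, simplified stand-in for SettingsAttribute: the name `PriorityValue` is invented for this sketch, and the handling of nested BaseSettings values is left out.

```python
class PriorityValue:
    """Hypothetical stand-in for SettingsAttribute, without nested settings."""

    def __init__(self, value, priority):
        self.value = value
        self.priority = priority

    def set(self, value, priority):
        # Same comparison as the quoted source: equal priority also overwrites.
        if priority >= self.priority:
            self.value = value
            self.priority = priority


attr = PriorityValue('a', 20)

attr.set('b', 10)          # ignored, 10 < 20
after_lower = attr.value

attr.set('c', 20)          # accepted, equal priority
after_equal = attr.value

attr.set('d', 40)          # accepted, higher priority
print(after_lower, after_equal, attr.value, attr.priority)
```
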
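Next comes the main event, the BaseSettings class. It resembles the built-in dict, but each entry carries a priority, and the whole object can be frozen so that it can no longer be modified.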

class BaseSettings(MutableMapping):
    """
    Instances of this class behave like dictionaries, but store priorities
    along with their ``(key, value)`` pairs, and can be frozen (i.e. marked
    immutable).

    Key-value entries can be passed on initialization with the ``values``
    argument, and they would take the ``priority`` level (unless ``values`` is
    already an instance of :class:`~scrapytest.settings.BaseSettings`, in which
    case the existing priority levels will be kept).  If the ``priority``
    argument is a string, the priority name will be looked up in
    :attr:`~scrapytest.settings.SETTINGS_PRIORITIES`. Otherwise, a specific integer
    should be provided.

    Once the object is created, new settings can be loaded or updated with the
    :meth:`~scrapytest.settings.BaseSettings.set` method, and can be accessed with
    the square bracket notation of dictionaries, or with the
    :meth:`~scrapytest.settings.BaseSettings.get` method of the instance and its
    value conversion variants. When requesting a stored key, the value with the
    highest priority will be retrieved.
    """

    def __init__(self, values=None, priority='project'):
        self.frozen = False
        self.attributes = {}
        self.update(values, priority)

    def __getitem__(self, opt_name):
        if opt_name not in self:
            return None
        return self.attributes[opt_name].value

    def __contains__(self, name):
        return name in self.attributes

    def get(self, name, default=None):
        """
        Get a setting value without affecting its original type.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return self[name] if self[name] is not None else default

    def getbool(self, name, default=False):
        """
        Get a setting value as a boolean.

        ``1``, ``'1'``, and ``True`` return ``True``, while ``0``, ``'0'``,
        ``False`` and ``None`` return ``False``.

        For example, settings populated through environment variables set to
        ``'0'`` will return ``False`` when using this method.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return bool(int(self.get(name, default)))

    def getint(self, name, default=0):
        """
        Get a setting value as an int.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return int(self.get(name, default))

    def getfloat(self, name, default=0.0):
        """
        Get a setting value as a float.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return float(self.get(name, default))

    def getlist(self, name, default=None):
        """
        Get a setting value as a list. If the setting original type is a list, a
        copy of it will be returned. If it's a string it will be split by ",".

        For example, settings populated through environment variables set to
        ``'one,two'`` will return a list ['one', 'two'] when using this method.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        value = self.get(name, default or [])
        if isinstance(value, six.string_types):
            value = value.split(',')
        return list(value)

    def getdict(self, name, default=None):
        """
        Get a setting value as a dictionary. If the setting original type is a
        dictionary, a copy of it will be returned. If it is a string it will be
        evaluated as a JSON dictionary. In the case that it is a
        :class:`~scrapytest.settings.BaseSettings` instance itself, it will be
        converted to a dictionary, containing all its current settings values
        as they would be returned by :meth:`~scrapytest.settings.BaseSettings.get`,
        and losing all information about priority and mutability.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        value = self.get(name, default or {})
        if isinstance(value, six.string_types):
            value = json.loads(value)
        return dict(value)

    def getwithbase(self, name):
        """Get a composition of a dictionary-like setting and its `_BASE`
        counterpart.

        :param name: name of the dictionary-like setting
        :type name: string
        """
        compbs = BaseSettings()
        compbs.update(self[name + '_BASE'])
        compbs.update(self[name])
        return compbs

    def getpriority(self, name):
        """
        Return the current numerical priority value of a setting, or ``None`` if
        the given ``name`` does not exist.

        :param name: the setting name
        :type name: string
        """
        if name not in self:
            return None
        return self.attributes[name].priority

    def maxpriority(self):
        """
        Return the numerical value of the highest priority present throughout
        all settings, or the numerical value for ``default`` from
        :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` if there are no settings
        stored.
        """
        if len(self) > 0:
            return max(self.getpriority(name) for name in self)
        else:
            return get_settings_priority('default')

    def __setitem__(self, name, value):
        self.set(name, value)

    def set(self, name, value, priority='project'):
        """
        Store a key/value attribute with a given priority.

        Settings should be populated *before* configuring the Crawler object
        (through the :meth:`~scrapytest.crawler.Crawler.configure` method),
        otherwise they won't have any effect.

        :param name: the setting name
        :type name: string

        :param value: the value to associate with the setting
        :type value: any

        :param priority: the priority of the setting. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        priority = get_settings_priority(priority)
        if name not in self:
            if isinstance(value, SettingsAttribute):
                self.attributes[name] = value
            else:
                self.attributes[name] = SettingsAttribute(value, priority)
        else:
            self.attributes[name].set(value, priority)

    def setdict(self, values, priority='project'):
        self.update(values, priority)

    def setmodule(self, module, priority='project'):
        """
        Store settings from a module with a given priority.

        This is a helper function that calls
        :meth:`~scrapytest.settings.BaseSettings.set` for every globally declared
        uppercase variable of ``module`` with the provided ``priority``.

        :param module: the module or the path of the module
        :type module: module object or string

        :param priority: the priority of the settings. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        if isinstance(module, six.string_types):
            module = import_module(module)
        for key in dir(module):
            if key.isupper():
                self.set(key, getattr(module, key), priority)

    def update(self, values, priority='project'):
        """
        Store key/value pairs with a given priority.

        This is a helper function that calls
        :meth:`~scrapytest.settings.BaseSettings.set` for every item of ``values``
        with the provided ``priority``.

        If ``values`` is a string, it is assumed to be JSON-encoded and parsed
        into a dict with ``json.loads()`` first. If it is a
        :class:`~scrapytest.settings.BaseSettings` instance, the per-key priorities
        will be used and the ``priority`` parameter ignored. This allows
        inserting/updating settings with different priorities with a single
        command.

        :param values: the settings names and values
        :type values: dict or string or :class:`~scrapy.settings.BaseSettings`

        :param priority: the priority of the settings. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        if isinstance(values, six.string_types):
            values = json.loads(values)
        if values is not None:
            if isinstance(values, BaseSettings):
                for name, value in six.iteritems(values):
                    self.set(name, value, values.getpriority(name))
            else:
                for name, value in six.iteritems(values):
                    self.set(name, value, priority)

    def delete(self, name, priority='project'):
        self._assert_mutability()
        priority = get_settings_priority(priority)
        if priority >= self.getpriority(name):
            del self.attributes[name]

    def __delitem__(self, name):
        self._assert_mutability()
        del self.attributes[name]

    def _assert_mutability(self):
        if self.frozen:
            raise TypeError("Trying to modify an immutable Settings object")

    def copy(self):
        """
        Make a deep copy of current settings.

        This method returns a new instance of the :class:`Settings` class,
        populated with the same values and their priorities.

        Modifications to the new object won't be reflected on the original
        settings.
        """
        return copy.deepcopy(self)

    def freeze(self):
        """
        Disable further changes to the current settings.

        After calling this method, the present state of the settings will become
        immutable. Trying to change values through the :meth:`~set` method and
        its variants won't be possible and will be alerted.
        """
        self.frozen = True

    def frozencopy(self):
        """
        Return an immutable copy of the current settings.

        Alias for a :meth:`~freeze` call in the object returned by :meth:`copy`.
        """
        copy = self.copy()
        copy.freeze()
        return copy

    def __iter__(self):
        return iter(self.attributes)

    def __len__(self):
        return len(self.attributes)

    def _to_dict(self):
        return {k: (v._to_dict() if isinstance(v, BaseSettings) else v)
                for k, v in six.iteritems(self)}

    def copy_to_dict(self):
        """
        Make a copy of current settings and convert to a dict.

        This method returns a new dict populated with the same values
        and their priorities as the current settings.

        Modifications to the returned dict won't be reflected on the original
        settings.

        This method can be useful for example for printing settings
        in Scrapy shell.
        """
        settings = self.copy()
        return settings._to_dict()

    def _repr_pretty_(self, p, cycle):
        if cycle:
            p.text(repr(self))
        else:
            p.text(pformat(self.copy_to_dict()))

    @property
    def overrides(self):
        warnings.warn("`Settings.overrides` attribute is deprecated and won't "
                      "be supported in Scrapy 0.26, use "
                      "`Settings.set(name, value, priority='cmdline')` instead",
                      category=ScrapyDeprecationWarning, stacklevel=2)
        try:
            o = self._overrides
        except AttributeError:
            self._overrides = o = _DictProxy(self, 'cmdline')
        return o

    @property
    def defaults(self):
        warnings.warn("`Settings.defaults` attribute is deprecated and won't "
                      "be supported in Scrapy 0.26, use "
                      "`Settings.set(name, value, priority='default')` instead",
                      category=ScrapyDeprecationWarning, stacklevel=2)
        try:
            o = self._defaults
        except AttributeError:
            self._defaults = o = _DictProxy(self, 'default')
        return o

Now let's pick out the methods one by one and see what each of them does:

    def __init__(self, values=None, priority='project'):
        self.frozen = False
        self.attributes = {}
        self.update(values, priority)

    def __getitem__(self, opt_name):
        if opt_name not in self:
            return None
        return self.attributes[opt_name].value

    def __contains__(self, name):
        return name in self.attributes

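1. __init__

Sets frozen to False and attributes to an empty dict, then calls update to load the given values with the given priority.

2. __getitem__

This method handles direct key access on an instance, e.g. print settings[key]. Unlike dict, a missing key returns None instead of raising KeyError. The in test inside it dispatches to __contains__.

3. __contains__

Implements the in operator for instances.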
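
The interplay between `__getitem__` and `__contains__` can be sketched with a tiny class. `Lookup` is a hypothetical name used only here, and the sketch stores raw values instead of SettingsAttribute objects.

```python
class Lookup:
    """Hypothetical mini container mirroring the two methods above."""

    def __init__(self, data):
        self.attributes = dict(data)

    def __contains__(self, name):
        return name in self.attributes

    def __getitem__(self, name):
        # "name not in self" dispatches to __contains__ defined above.
        if name not in self:
            return None
        return self.attributes[name]


settings = Lookup({'BOT_NAME': 'demo'})
present = settings['BOT_NAME']
missing = settings['NO_SUCH_KEY']   # None rather than KeyError
print(present, missing, 'BOT_NAME' in settings)
```

One consequence of this design is that a key stored with the value None cannot be told apart from a missing key through square-bracket access alone.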

    def get(self, name, default=None):
        """
        Get a setting value without affecting its original type.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return self[name] if self[name] is not None else default

    def getbool(self, name, default=False):
        """
        Get a setting value as a boolean.

        ``1``, ``'1'``, and ``True`` return ``True``, while ``0``, ``'0'``,
        ``False`` and ``None`` return ``False``.

        For example, settings populated through environment variables set to
        ``'0'`` will return ``False`` when using this method.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return bool(int(self.get(name, default)))

    def getint(self, name, default=0):
        """
        Get a setting value as an int.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return int(self.get(name, default))

    def getfloat(self, name, default=0.0):
        """
        Get a setting value as a float.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        return float(self.get(name, default))

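4. get

Used like dict.get, except that the default is also returned when the stored value is None.

5. getbool, getint, getfloat

These add a type conversion step on top of get.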
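
A minimal sketch of the conversion step, written as free functions over a plain dict so that it runs without Scrapy. The function signatures and the sample keys are assumptions of this sketch; only the return expressions follow the quoted source.

```python
def get(store, name, default=None):
    # A missing key and a stored None both fall back to the default.
    value = store.get(name)
    return value if value is not None else default


def getbool(store, name, default=False):
    # Same expression as the quoted source: int() first, then bool().
    return bool(int(get(store, name, default)))


store = {'FLAG_STR': '0', 'FLAG_INT': 1, 'EMPTY': None, 'WORD': 'True'}

flag_str = getbool(store, 'FLAG_STR')       # '0' becomes 0, then False
flag_int = getbool(store, 'FLAG_INT')
fallback = get(store, 'EMPTY', 'fallback')  # stored None yields the default

try:
    getbool(store, 'WORD')
    word_error = None
except ValueError as exc:
    # int('True') is not valid, so this version rejects the string 'True'.
    word_error = exc

print(flag_str, flag_int, fallback, type(word_error).__name__)
```

Going through int() is what makes the string '0' come out as False; a plain bool('0') would be True.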

    def getlist(self, name, default=None):
        """
        Get a setting value as a list. If the setting original type is a list, a
        copy of it will be returned. If it's a string it will be split by ",".

        For example, settings populated through environment variables set to
        ``'one,two'`` will return a list ['one', 'two'] when using this method.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        value = self.get(name, default or [])
        if isinstance(value, six.string_types):
            value = value.split(',')
        return list(value)

    def getdict(self, name, default=None):
        """
        Get a setting value as a dictionary. If the setting original type is a
        dictionary, a copy of it will be returned. If it is a string it will be
        evaluated as a JSON dictionary. In the case that it is a
        :class:`~scrapytest.settings.BaseSettings` instance itself, it will be
        converted to a dictionary, containing all its current settings values
        as they would be returned by :meth:`~scrapytest.settings.BaseSettings.get`,
        and losing all information about priority and mutability.

        :param name: the setting name
        :type name: string

        :param default: the value to return if no setting is found
        :type default: any
        """
        value = self.get(name, default or {})
        if isinstance(value, six.string_types):
            value = json.loads(value)
        return dict(value)

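6. getlist, getdict

For settings whose values are lists or dicts. Both also handle the case where the stored value is a string: getlist splits it on ",", and getdict parses it as JSON. The return value is a (shallow) copy of the original.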
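
The two conversions can be sketched as standalone helpers. `as_list` and `as_dict` are hypothetical names for this sketch, and `str` stands in for `six.string_types` under Python 3.

```python
import json


def as_list(value):
    # Strings are split on commas; anything else is copied into a new list.
    if isinstance(value, str):
        value = value.split(',')
    return list(value)


def as_dict(value):
    # Strings are parsed as JSON; anything else is copied into a new dict.
    if isinstance(value, str):
        value = json.loads(value)
    return dict(value)


original = ['one', 'two']
copied = as_list(original)
from_string = as_list('one,two')
spaced = as_list('one, two')        # whitespace is kept as-is
parsed = as_dict('{"a": 1}')
print(copied, from_string, spaced, parsed)
```

Since list() and dict() only copy the outer container, nested objects are still shared with the original value.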

    def getwithbase(self, name):
        """Get a composition of a dictionary-like setting and its `_BASE`
        counterpart.

        :param name: name of the dictionary-like setting
        :type name: string
        """
        compbs = BaseSettings()
        compbs.update(self[name + '_BASE'])
        compbs.update(self[name])
        return compbs

7. getwithbase

Merges the name + '_BASE' setting with the name setting and returns the result wrapped in a BaseSettings instance.
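
A rough sketch of the merge order, using plain dicts. It deliberately ignores priorities, which the real method does take into account because it goes through update, and the middleware paths `mw.A`, `mw.B` and `mw.C` are made up for illustration.

```python
def getwithbase(settings, name):
    # The _BASE entries are loaded first, then the user-facing setting on top.
    merged = {}
    merged.update(settings.get(name + '_BASE') or {})
    merged.update(settings.get(name) or {})
    return merged


settings = {
    'DOWNLOADER_MIDDLEWARES_BASE': {'mw.A': 100, 'mw.B': 200},
    'DOWNLOADER_MIDDLEWARES': {'mw.B': None, 'mw.C': 300},
}

merged = getwithbase(settings, 'DOWNLOADER_MIDDLEWARES')
print(merged)
```
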

    def getpriority(self, name):
        """
        Return the current numerical priority value of a setting, or ``None`` if
        the given ``name`` does not exist.

        :param name: the setting name
        :type name: string
        """
        if name not in self:
            return None
        return self.attributes[name].priority

    def maxpriority(self):
        """
        Return the numerical value of the highest priority present throughout
        all settings, or the numerical value for ``default`` from
        :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` if there are no settings
        stored.
        """
        if len(self) > 0:
            return max(self.getpriority(name) for name in self)
        else:
            return get_settings_priority('default')

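8. getpriority

Returns the priority of a key, or None if the key does not exist.

9. maxpriority

Returns the highest priority among all entries of a settings object, or the numerical value of 'default' if the object is empty.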

    def __setitem__(self, name, value):
        self.set(name, value)

    def set(self, name, value, priority='project'):
        """
        Store a key/value attribute with a given priority.

        Settings should be populated *before* configuring the Crawler object
        (through the :meth:`~scrapytest.crawler.Crawler.configure` method),
        otherwise they won't have any effect.

        :param name: the setting name
        :type name: string

        :param value: the value to associate with the setting
        :type value: any

        :param priority: the priority of the setting. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        priority = get_settings_priority(priority)
        if name not in self:
            if isinstance(value, SettingsAttribute):
                self.attributes[name] = value
            else:
                self.attributes[name] = SettingsAttribute(value, priority)
        else:
            self.attributes[name].set(value, priority)

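10. __setitem__

Delegates to set, using the default priority 'project'.

11. set

Wraps the value in a SettingsAttribute. Note that when a SettingsAttribute already exists for name, the new value is stored only if the incoming priority is greater than or equal to the existing one.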
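
The effect of calling set several times with different priorities can be sketched with a small stand-in class. `MiniSettings` is hypothetical and stores (value, priority) tuples instead of SettingsAttribute objects; the priority table is copied from the top of the article.

```python
PRIORITIES = {'default': 0, 'command': 10, 'project': 20,
              'spider': 30, 'cmdline': 40}


class MiniSettings:
    """Hypothetical, stripped-down model of BaseSettings.set."""

    def __init__(self):
        self.attributes = {}  # name -> (value, numeric priority)

    def set(self, name, value, priority='project'):
        if isinstance(priority, str):
            priority = PRIORITIES[priority]
        current = self.attributes.get(name)
        # New keys are always stored; existing keys need >= priority.
        if current is None or priority >= current[1]:
            self.attributes[name] = (value, priority)

    def get(self, name):
        return self.attributes[name][0]


s = MiniSettings()
s.set('DOWNLOAD_DELAY', 2, 'spider')
s.set('DOWNLOAD_DELAY', 5, 'project')    # ignored, 20 < 30
after_project = s.get('DOWNLOAD_DELAY')
s.set('DOWNLOAD_DELAY', 0, 'cmdline')    # accepted, 40 >= 30
after_cmdline = s.get('DOWNLOAD_DELAY')
print(after_project, after_cmdline)
```

The order of the calls does not matter for the outcome as long as the priorities differ, which is what lets command-line options override project and spider settings regardless of when they are loaded.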

    def setdict(self, values, priority='project'):
        self.update(values, priority)

    def setmodule(self, module, priority='project'):
        """
        Store settings from a module with a given priority.

        This is a helper function that calls
        :meth:`~scrapytest.settings.BaseSettings.set` for every globally declared
        uppercase variable of ``module`` with the provided ``priority``.

        :param module: the module or the path of the module
        :type module: module object or string

        :param priority: the priority of the settings. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        if isinstance(module, six.string_types):
            module = import_module(module)
        for key in dir(module):
            if key.isupper():
                self.set(key, getattr(module, key), priority)

12. setdict, setmodule

Populate the settings object from a dict or from a module. setmodule only picks up the module's uppercase names.

    def update(self, values, priority='project'):
        """
        Store key/value pairs with a given priority.

        This is a helper function that calls
        :meth:`~scrapytest.settings.BaseSettings.set` for every item of ``values``
        with the provided ``priority``.

        If ``values`` is a string, it is assumed to be JSON-encoded and parsed
        into a dict with ``json.loads()`` first. If it is a
        :class:`~scrapytest.settings.BaseSettings` instance, the per-key priorities
        will be used and the ``priority`` parameter ignored. This allows
        inserting/updating settings with different priorities with a single
        command.

        :param values: the settings names and values
        :type values: dict or string or :class:`~scrapy.settings.BaseSettings`

        :param priority: the priority of the settings. Should be a key of
            :attr:`~scrapytest.settings.SETTINGS_PRIORITIES` or an integer
        :type priority: string or int
        """
        self._assert_mutability()
        if isinstance(values, six.string_types):
            values = json.loads(values)
        if values is not None:
            if isinstance(values, BaseSettings):
                for name, value in six.iteritems(values):
                    self.set(name, value, values.getpriority(name))
            else:
                for name, value in six.iteritems(values):
                    self.set(name, value, priority)

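13. update

The values argument accepts a BaseSettings object, a dict, or a JSON string such as the output of json.dumps(dict). None is also accepted and leaves the settings unchanged. When values is a BaseSettings object, each key keeps its own priority and the priority argument is ignored.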

    def delete(self, name, priority='project'):
        self._assert_mutability()
        priority = get_settings_priority(priority)
        if priority >= self.getpriority(name):
            del self.attributes[name]

    def __delitem__(self, name):
        self._assert_mutability()
        del self.attributes[name]

    def _assert_mutability(self):
        if self.frozen:
            raise TypeError("Trying to modify an immutable Settings object")

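14. delete, __delitem__

Both delete a key. delete additionally checks the priority: the key is removed only if the given priority is greater than or equal to the stored one.

15. _assert_mutability

Raises TypeError when code tries to modify a settings object that has been frozen.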

    def copy(self):
        """
        Make a deep copy of current settings.

        This method returns a new instance of the :class:`Settings` class,
        populated with the same values and their priorities.

        Modifications to the new object won't be reflected on the original
        settings.
        """
        return copy.deepcopy(self)

    def freeze(self):
        """
        Disable further changes to the current settings.

        After calling this method, the present state of the settings will become
        immutable. Trying to change values through the :meth:`~set` method and
        its variants won't be possible and will be alerted.
        """
        self.frozen = True

    def frozencopy(self):
        """
        Return an immutable copy of the current settings.

        Alias for a :meth:`~freeze` call in the object returned by :meth:`copy`.
        """
        copy = self.copy()
        copy.freeze()
        return copy

16. copy

Creates a deep copy of the instance.

17. freeze

Freezes (locks) the instance so that it can no longer be modified.

18. frozencopy

Returns a frozen copy.
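
How the frozen flag, the mutability guard and the deep copy fit together can be sketched as follows. `Freezable` is a hypothetical class for this sketch that keeps values in a plain dict; only the guard and the copy-then-freeze pattern follow the quoted source.

```python
import copy


class Freezable:
    """Hypothetical container with the same freeze logic as BaseSettings."""

    def __init__(self, data=None):
        self.frozen = False
        self.data = dict(data or {})

    def _assert_mutability(self):
        if self.frozen:
            raise TypeError("Trying to modify an immutable Settings object")

    def set(self, name, value):
        self._assert_mutability()   # every mutating method starts with this
        self.data[name] = value

    def freeze(self):
        self.frozen = True

    def frozencopy(self):
        clone = copy.deepcopy(self)  # nested values are copied as well
        clone.freeze()
        return clone


original = Freezable({'HOSTS': ['a.example']})
snapshot = original.frozencopy()

original.set('HOSTS', ['b.example'])    # the original stays mutable
try:
    snapshot.set('HOSTS', [])
    error = None
except TypeError as exc:
    error = exc

print(original.data, snapshot.data, error)
```
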

    def __iter__(self):
        return iter(self.attributes)

    def __len__(self):
        return len(self.attributes)

    def _to_dict(self):
        return {k: (v._to_dict() if isinstance(v, BaseSettings) else v)
                for k, v in six.iteritems(self)}

    def copy_to_dict(self):
        """
        Make a copy of current settings and convert to a dict.

        This method returns a new dict populated with the same values
        and their priorities as the current settings.

        Modifications to the returned dict won't be reflected on the original
        settings.

        This method can be useful for example for printing settings
        in Scrapy shell.
        """
        settings = self.copy()
        return settings._to_dict()

19. __iter__, __len__, _to_dict, copy_to_dict

Four simple methods that need no further explanation.

    def _repr_pretty_(self, p, cycle):
        if cycle:
            p.text(repr(self))
        else:
            p.text(pformat(self.copy_to_dict()))

20. _repr_pretty_

Pretty-prints the settings; this is the hook that IPython's pretty printer looks for.

    @property
    def overrides(self):
        warnings.warn("`Settings.overrides` attribute is deprecated and won't "
                      "be supported in Scrapy 0.26, use "
                      "`Settings.set(name, value, priority='cmdline')` instead",
                      category=ScrapyDeprecationWarning, stacklevel=2)
        try:
            o = self._overrides
        except AttributeError:
            self._overrides = o = _DictProxy(self, 'cmdline')
        return o

    @property
    def defaults(self):
        warnings.warn("`Settings.defaults` attribute is deprecated and won't "
                      "be supported in Scrapy 0.26, use "
                      "`Settings.set(name, value, priority='default')` instead",
                      category=ScrapyDeprecationWarning, stacklevel=2)
        try:
            o = self._defaults
        except AttributeError:
            self._defaults = o = _DictProxy(self, 'default')
        return o

21. overrides, defaults

These two are deprecated properties. For how @property works, see the related links.

class _DictProxy(MutableMapping):

    def __init__(self, settings, priority):
        self.o = {}
        self.settings = settings
        self.priority = priority

    def __len__(self):
        return len(self.o)

    def __getitem__(self, k):
        return self.o[k]

    def __setitem__(self, k, v):
        self.settings.set(k, v, priority=self.priority)
        self.o[k] = v

    def __delitem__(self, k):
        del self.o[k]

    def __iter__(self, k, v):
        return iter(self.o)

The _DictProxy class presumably also exists only for compatibility with older code, so I won't study it in depth.

class Settings(BaseSettings):
    """
    This object stores Scrapy settings for the configuration of internal
    components, and can be used for any further customization.

    It is a direct subclass and supports all methods of
    :class:`~scrapytest.settings.BaseSettings`. Additionally, after instantiation
    of this class, the new object will have the global default settings
    described on :ref:`topics-settings-ref` already populated.
    """

    def __init__(self, values=None, priority='project'):
        # Do not pass kwarg values here. We don't want to promote user-defined
        # dicts, and we want to update, not replace, default dicts with the
        # values given by the user
        super(Settings, self).__init__()
        self.setmodule(default_settings, 'default')
        # Promote default dictionaries to BaseSettings instances for per-key
        # priorities
        for name, val in six.iteritems(self):
            if isinstance(val, dict):
                self.set(name, BaseSettings(val, 'default'), 'default')
        self.update(values, priority)

This class is a direct subclass of BaseSettings. Its __init__ does two things before applying the user-supplied values:

1. default_settings

Initializes the settings object from default_settings, at the 'default' priority.

2. Converts the dicts in default_settings into BaseSettings objects, so that each of their keys gets its own priority.


class CrawlerSettings(Settings):

    def __init__(self, settings_module=None, **kw):
        self.settings_module = settings_module
        Settings.__init__(self, **kw)

    def __getitem__(self, opt_name):
        if opt_name in self.overrides:
            return self.overrides[opt_name]
        if self.settings_module and hasattr(self.settings_module, opt_name):
            return getattr(self.settings_module, opt_name)
        if opt_name in self.defaults:
            return self.defaults[opt_name]
        return Settings.__getitem__(self, opt_name)

    def __str__(self):
        return "<CrawlerSettings module=%r>" % self.settings_module

CrawlerSettings = create_deprecated_class(
    'CrawlerSettings', CrawlerSettings,
    new_class_path='scrapytest.settings.Settings')

A class that is being phased out; create_deprecated_class points users to Settings instead.

def iter_default_settings():
    """Return the default settings as an iterator of (name, value) tuples"""
    for name in dir(default_settings):
        if name.isupper():
            yield name, getattr(default_settings, name)


def overridden_settings(settings):
    """Return a dict of the settings that have been overridden"""
    for name, defvalue in iter_default_settings():
        value = settings[name]
        if not isinstance(defvalue, dict) and value != defvalue:
            yield name, value

iter_default_settings:

Yields the default settings as (name, value) pairs, as a generator.

overridden_settings:
Yields the settings whose current value differs from the default. Settings whose default value is a dict are skipped.
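
The comparison against the defaults can be sketched with plain dicts. The default values below are placeholders chosen for this sketch and are not claimed to match Scrapy's real defaults; the function also takes the defaults as a second argument, unlike the original, so that it runs on its own.

```python
DEFAULTS = {                      # placeholder values, for illustration only
    'DOWNLOAD_DELAY': 0,
    'ROBOTSTXT_OBEY': False,
    'ITEM_PIPELINES': {},
}


def overridden_settings(settings, defaults):
    for name, defvalue in defaults.items():
        value = settings.get(name)
        # Dict-valued defaults are skipped, as in the quoted source.
        if not isinstance(defvalue, dict) and value != defvalue:
            yield name, value


current = {
    'DOWNLOAD_DELAY': 2,                 # changed
    'ROBOTSTXT_OBEY': False,             # unchanged
    'ITEM_PIPELINES': {'pipe.X': 1},     # changed, but dict-valued
}

changed = dict(overridden_settings(current, DEFAULTS))
print(changed)
```

Although the docstring in the quoted source says "Return a dict", the function is a generator, so callers need to wrap it in dict() as done here.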



Related links:

8.3. collections — High-performance container datatypes

Python进阶之“属性(property)”详解 (Advanced Python: "property" explained in detail)
