How to use mutable types with the set container in Python and C++

Source: Internet  Editor: 程序博客网  Date: 2024/06/03 21:34

Introduction

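The project I am working on has to track items that are refreshed on a rolling basis. The approach is to save all current items (60 entries), save them again (60 entries) after the next cycle has refreshed everything, and then take the difference of the two collections to find the newly added items. A set container is the natural choice. The catch is that set elements are required to be immutable (more precisely, hashable with a hash that never changes), while the project's item type, SinaItem, has data members that do change, so it cannot be put into a set as-is. What to do?
My first, clumsy idea was to put SinaItem's unchanging member, url, into a set, create a map from url to SinaItem, and wrap both containers in a new type, setmap, whose add/remove/update/lookup methods keep the set and the map's keys in one-to-one correspondence. This works, but it is too crude for my taste and not the clean solution I was after, so I dropped it.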
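For comparison, the rejected two-container idea might look roughly like the sketch below. The class name SetMap, its method names, and the example.com URLs are made up for illustration; the post does not show the original setmap code.

```python
class SetMap:
    """Keeps a set of keys and a key -> item map in sync (hypothetical helper)."""

    def __init__(self):
        self._keys = set()   # the immutable part of each item, e.g. its url
        self._items = {}     # url -> full (mutable) item

    def add(self, key, item):
        self._keys.add(key)
        self._items[key] = item

    def remove(self, key):
        self._keys.discard(key)
        self._items.pop(key, None)

    def get(self, key):
        return self._items.get(key)

    def new_items(self, previous):
        """Items whose keys are present here but absent from `previous`."""
        return [self._items[k] for k in self._keys - previous._keys]


old = SetMap()
old.add("http://example.com/a", {"title": "A"})

new = SetMap()
new.add("http://example.com/a", {"title": "A (edited)"})
new.add("http://example.com/b", {"title": "B"})

added = new.new_items(old)
print(added)  # [{'title': 'B'}]
```

Every operation has to touch both containers, which is the bookkeeping the single-set approach avoids.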
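The best approach I have come up with so far is to modify the SinaItem class itself: implement the methods that each language's data model requires of set elements, so that the part of the object the set relies on stays immutable. In other words, tell the set container what to hash.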

Python 3 implementation:

#!/usr/bin/env python3


class Type:
    def __init__(self, key, val):
        self.key_ = key  # identity: used for hashing and equality, must not change
        self.val_ = val  # payload: may change while the object is in a set

    def __hash__(self):
        return hash(self.key_)

    def __eq__(self, rhs):
        return (self.key_ == rhs.key_)


if __name__ == "__main__":
    hash_set = set()
    hash_set.add(Type('1', 2))
    hash_set.add(Type('2', 2))
    # Change val_ of whichever element iteration yields first.
    for e in hash_set:
        e.val_ = 3
        break
    for e in hash_set:
        print(e.key_, e.val_)

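Output (the element order, and therefore which element receives the value 3, can differ between runs because str hashes are salted):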
$ python test_hash.py
1 3
2 2

C++11 implementation:

#include <set>
#include <unordered_set>
#include <iostream>
#include <functional>

struct Type {
    int key = 0;
    int val = 0;
    Type(const int& i_key, const int& i_val) : key{i_key}, val{i_val} {}
    bool operator<(const Type& rhs) const {
        return (this->key < rhs.key);
    }
    bool operator>(const Type& rhs) const {
        return (this->key > rhs.key);
    }
    bool operator==(const Type& rhs) const {
        return (this->key == rhs.key);
    }
};

namespace std {
template<>
struct hash<Type> {
    typedef Type argument_type;
    typedef size_t result_type;
    result_type operator()(const argument_type& t) const noexcept {
        return std::hash<int>{}(t.key);
    }
};
}  // namespace std

// typedef std::set<Type> SET;
typedef std::unordered_set<Type> SET;

int main() {
    Type i{1, 2};
    Type j{2, 2};
    SET hash_set;
    hash_set.insert(i);
    hash_set.insert(j);
    for (const auto& e : hash_set) {
        std::cout << e.key << '\t' << e.val << '\n';
    }
    return 0;
}

Compile:
g++ -std=c++11 test_hash.cpp -o test_hash
Output:
$ ./test_hash
2 2
1 2

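The Python output shows that a value held in the set can be reassigned, because hashing and equality look only at the key. In C++, set and unordered_set expose their elements as const, so val would additionally have to be declared mutable before it could be modified in place. A single set is enough to meet the project's requirements: a perfect solution.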
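Applied to the original problem, the same idea might look like the sketch below. The SinaItem fields (url, title) and the sample values are assumptions for illustration; the real class is not shown in this post.

```python
class SinaItem:
    """Hypothetical stand-in for the project's item type."""

    def __init__(self, url, title):
        self.url = url      # fixed identity, drives __hash__ and __eq__
        self.title = title  # may change between cycles

    def __hash__(self):
        return hash(self.url)

    def __eq__(self, other):
        if not isinstance(other, SinaItem):
            return NotImplemented
        return self.url == other.url


# Two snapshots taken one refresh cycle apart.
previous = {SinaItem("u1", "first"), SinaItem("u2", "second")}
current = {SinaItem("u2", "second (edited)"), SinaItem("u3", "third")}

# Set difference compares by url only, so the edited "u2" is not reported as new.
new_items = current - previous
new_urls = sorted(item.url for item in new_items)
print(new_urls)  # ['u3']
```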

For reference, here is the Python documentation for __hash__():

object.__hash__(self)
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example:

def __hash__(self):
    return hash((self.name, self.nick, self.color))
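The documentation shows only the method itself. A fuller sketch, assuming a made-up class Pet with the same three attributes, pairs it with an __eq__() that compares the same tuple:

```python
class Pet:
    """Hypothetical class; name, nick and color come from the docs example."""

    def __init__(self, name, nick, color):
        self.name = name
        self.nick = nick
        self.color = color

    def _key(self):
        # The same components take part in both equality and hashing.
        return (self.name, self.nick, self.color)

    def __eq__(self, other):
        if not isinstance(other, Pet):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


a = Pet("Tom", "T", "grey")
b = Pet("Tom", "T", "grey")
c = Pet("Tom", "T", "black")

same_hash = hash(a) == hash(b)  # equal objects must hash equally
distinct = len({a, b, c})       # a and b collapse into one element
print(same_hash, distinct)      # True 2
```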
Note hash() truncates the value returned from an object's custom __hash__() method to the size of a Py_ssize_t. This is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds. If an object's __hash__() must interoperate on builds of different bit sizes, be sure to check the width on all supported builds. An easy way to do this is with python -c "import sys; print(sys.hash_info.width)".
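A small sketch of that truncation, using a made-up class Big whose __hash__() returns a value far wider than any Py_ssize_t:

```python
import sys

width_bits = sys.hash_info.width  # e.g. 64 on a 64-bit build


class Big:
    """Hypothetical class returning an oversized hash on purpose."""

    def __hash__(self):
        return 2 ** 100


truncated = hash(Big())

# hash() reduces the oversized value so that it fits in a signed integer
# of width_bits bits.
fits = -(2 ** (width_bits - 1)) <= truncated < 2 ** (width_bits - 1)
print(width_bits, fits)
```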
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket).
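The "wrong hash bucket" problem is easy to reproduce. The sketch below uses a made-up class Keyed and deliberately changes the hashed attribute after insertion; integer keys are used so the result does not depend on hash salting. The behavior described is what I would expect on CPython.

```python
class Keyed:
    """Hypothetical class whose hash depends on a single attribute."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if not isinstance(other, Keyed):
            return NotImplemented
        return self.key == other.key


item = Keyed(1)
bucket = {item}

found_before = item in bucket  # looked up under hash(1)
item.key = 2                   # hash changes while the object sits in the set
found_after = item in bucket   # looked up under hash(2), the wrong slot

print(found_before, found_after)  # True False
```

This is why only val_ is changed in the example earlier in this post, never key_.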

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
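A quick check of these defaults, using an empty class named Plain purely for illustration:

```python
class Plain:
    """Hypothetical class that defines neither __eq__ nor __hash__."""


x = Plain()
y = Plain()

equal_to_self = (x == x)   # identity comparison
equal_to_other = (x == y)  # two distinct objects never compare equal
both_kept = len({x, y})    # so a set keeps both of them

print(equal_to_self, equal_to_other, both_kept)  # True False 2
```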

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).
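This is exactly what happens if the Type class above loses its __hash__() method. A minimal sketch with a made-up class OnlyEq:

```python
from collections.abc import Hashable


class OnlyEq:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.key == other.key


hash_attr = OnlyEq.__hash__  # implicitly set to None

try:
    {OnlyEq(1)}              # building a set needs hash()
    raised = False
except TypeError:
    raised = True

is_hashable = isinstance(OnlyEq(1), Hashable)
print(hash_attr, raised, is_hashable)  # None True False
```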

If a class that overrides __eq__() needs to retain the implementation of __hash__() from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__.
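A short sketch of that assignment, with made-up classes Base and Child:

```python
class Base:
    """Hypothetical parent that is hashable by key."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Base) and self.key == other.key


class Child(Base):
    def __eq__(self, other):
        return isinstance(other, Child) and self.key == other.key

    # Without this line, overriding __eq__ would set Child.__hash__ to None.
    __hash__ = Base.__hash__


items = {Child("a"), Child("a"), Child("b")}
count = len(items)
print(count)  # 2
```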

If a class that does not override __eq__() wishes to suppress hash support, it should include __hash__ = None in the class definition. A class which defines its own __hash__() that explicitly raises a TypeError would be incorrectly identified as hashable by an isinstance(obj, collections.abc.Hashable) call.
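The difference between the two ways of refusing to hash can be seen in a small sketch. The class names Suppressed and Raises are made up:

```python
from collections.abc import Hashable


class Suppressed:
    """Hypothetical class that opts out of hashing the recommended way."""
    __hash__ = None


class Raises:
    """Hypothetical class that opts out by raising instead."""

    def __hash__(self):
        raise TypeError("unhashable")


suppressed_ok = isinstance(Suppressed(), Hashable)  # correctly False
raises_ok = isinstance(Raises(), Hashable)          # misleadingly True

try:
    hash(Raises())
    hash_failed = False
except TypeError:
    hash_failed = True

print(suppressed_ok, raises_ok, hash_failed)  # False True True
```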

Note By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.
Changed in version 3.3: Hash randomization is enabled by default.
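To observe the salting, the sketch below starts child interpreters with an explicit PYTHONHASHSEED. The helper name str_hash_in_child is made up, and the snippet assumes Python 3.7 or later (for capture_output) and an environment where spawning a subprocess is allowed.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed):
    """Return hash('item') as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('item'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The same seed gives the same str hash in separate processes.
fixed_a = str_hash_in_child(0)
fixed_b = str_hash_in_child(0)

# A different seed will usually, though not necessarily, give a different hash.
other = str_hash_in_child(1)

print(fixed_a == fixed_b, fixed_a, other)
```

Pinning the seed like this also pins the iteration order of a set of strings, which is handy when comparing the output of the test script above across runs.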
