鲁鲁酱 2020-05-05
In the source code of the collections module, you can see:
'''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing
'''
__all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
           'UserString', 'Counter', 'OrderedDict', 'ChainMap']
This tells us what the collections module contains. The sections below cover namedtuple, defaultdict, deque, Counter, OrderedDict, and ChainMap.
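namedtuple creates a subclass of tuple, so everything a tuple can do, a namedtuple can do as well. So what can a tuple do?
A tuple is an immutable data type:
>>> user_tuple = ("zhangsan", 30)  # create a tuple
Once created, a tuple cannot be changed. For example, an assignment like this:
>>> user_tuple[1] = 32
raises an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
However, a tuple's immutability is not absolute. What is fixed is which objects the tuple refers to; if one of those objects is itself mutable, such as a list, it can still be modified in place: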
>>> user_tuple = ("zhangsan", 30, ["reading", "movies"])
>>> user_tuple[2].append("animals")
>>> user_tuple
('zhangsan', 30, ['reading', 'movies', 'animals'])
Like lists and other sequence types, tuples are iterable, so they support looping, slicing, and unpacking:
# iterate with a for loop
user_tuple = ("zhangsan", 30)
for el in user_tuple:
    print(el)
# unpack into a first element and the rest
number_tuple = (1, 2, 3, 4)
first, *others = number_tuple
print(first, others)  # 1 [2, 3, 4]
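Slicing is mentioned above but not shown. As a minimal sketch (the variable names here are only for illustration), slicing a tuple always produces a new tuple, while starred unpacking collects the remaining items into a list:

```python
number_tuple = (1, 2, 3, 4)

head = number_tuple[:2]             # first two items
tail = number_tuple[2:]             # everything from index 2 on
reversed_copy = number_tuple[::-1]  # a reversed copy; the original is untouched
print(head, tail, reversed_copy)    # (1, 2) (3, 4) (4, 3, 2, 1)

# The starred name can sit in the middle as well
first, *middle, last = number_tuple
print(first, middle, last)          # 1 [2, 3] 4
```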
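Dictionary keys must be hashable. A tuple whose elements are all hashable is itself hashable, so it can be used as a key:
condition = ("a", "b")
filter_dict = {}
filter_dict[condition] = "result"
print(filter_dict)  # {('a', 'b'): 'result'}
We usually define a class like this: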
class User:
    def __init__(self, username, password):
        self.username = username
        self.password = password

user = User("zhangsan", 123456)
print(user.username, user.password)  # zhangsan 123456
With namedtuple, the same thing can be written more simply:
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
user = User("zhangsan", 123456)
print(user.username, user.password)  # zhangsan 123456
Arguments are passed just as they are to an ordinary class, so *args and **kwargs both work.
# pass the values from a tuple
User = namedtuple("User", ["username", "password"])
args = ("zhangsan", 123456)
user = User(*args)
print(user.username, user.password)  # zhangsan 123456
# pass the values from a dict
User = namedtuple("User", ["username", "password"])
kwargs = {"username": "zhangsan", "password": 123456}
user = User(**kwargs)
print(user.username, user.password)  # zhangsan 123456
The examples above unpack the values with *args or **kwargs. The _make method is a simpler way to build an instance from an iterable:
from collections import namedtuple

# define the class
User = namedtuple("User", ["username", "password"])
# define the parameters
parameters_list = ["zhangsan", 123456]
parameters_tuple = ("zhangsan", 123456)
parameters_dict = {"username": "zhangsan", "password": 123456}
# create the objects
user = User._make(parameters_list)
user1 = User._make(parameters_tuple)
# iterating over a dict yields its keys, so pass .values() to get the values
user2 = User._make(parameters_dict.values())
# output
print(user.username, user.password)    # zhangsan 123456
print(user1.username, user1.password)  # zhangsan 123456
print(user2.username, user2.password)  # zhangsan 123456
As you can see, _make only needs an iterable that yields the field values in order.
The source of _make:
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
    'Make a new {typename} object from a sequence or iterable'
    result = new(cls, iterable)
    if len(result) != {num_fields:d}:
        raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
    return result
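The length check in that source can be seen in action with a short sketch. The row data and the names rows, users, and error_message are made up for illustration; the point is that _make accepts any iterable with exactly one value per field and raises TypeError otherwise:

```python
from collections import namedtuple

User = namedtuple("User", ["username", "password"])

# Hypothetical rows, e.g. as they might come from a CSV reader or a database cursor
rows = [["zhangsan", 123456], ["lisi", 654321]]
users = [User._make(row) for row in rows]
print(users[1].username)  # lisi

# An iterable with the wrong number of items is rejected
error_message = None
try:
    User._make(["only-one-field"])
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```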
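The _asdict method returns the fields as a mapping from field name to value, in field order. It does not sort anything. Before Python 3.8 the result is an OrderedDict; from 3.8 on it is a regular dict.
from collections import namedtuple

User = namedtuple("User", ["username", "password"])
kwargs = {"username": "zhangsan"}
user = User(**kwargs, password=123456)
print(user)  # User(username='zhangsan', password=123456)
user_dict = user._asdict()
print(user_dict)  # OrderedDict([('username', 'zhangsan'), ('password', 123456)]) before Python 3.8
defaultdict is a subclass of the built-in dict, so it has everything dict has. In addition, its source reads: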
class defaultdict(dict):
    def __init__(self, default_factory=None, **kwargs):  # known case of _collections.defaultdict.__init__
        """
        defaultdict(default_factory[, ...]) --> dict with default factory
        The default factory is called without arguments to produce
        a new value when a key is not present, in __getitem__ only.
        A defaultdict compares equal to a dict with the same items.
        All remaining arguments are treated the same as if they were
        passed to the dict constructor, including keyword arguments.
        # (copied from class doc)
        """
        pass
From this we can see there is a default_factory parameter. When a key that is not in the dict is looked up, default_factory is called with no arguments and its return value is stored under that key as the default. Suppose we have this list:
s = ['yellow', 'blue', 'yellow', 'blue', 'red']
To count how often each element of s occurs, we would typically write something like this:
s = ['yellow', 'blue', 'yellow', 'blue', 'red']
count_dict = {}
for i in s:
    if i not in count_dict:
        count_dict[i] = 1
    else:
        count_dict[i] += 1
print(count_dict)  # {'yellow': 2, 'blue': 2, 'red': 1}
With defaultdict, the same thing is easier:
from collections import defaultdict

s = ['yellow', 'blue', 'yellow', 'blue', 'red']
d = defaultdict(int)  # a missing key gets int(), which is 0, the first time it is accessed
for i in s:
    d[i] += 1
print(d)  # defaultdict(<class 'int'>, {'yellow': 2, 'blue': 2, 'red': 1})
It can also be used to build more complex data structures:
from collections import defaultdict

def gen_default():
    return {
        "username": "",
        "age": 0
    }

d = defaultdict(gen_default)
d["g1"]  # "g1" does not exist, so the default structure is created: {"g1": {"username": "", "age": 0}}
Next is deque. First, the source:
class deque(object):
    def __init__(self, iterable=(), maxlen=None):  # known case of _collections.deque.__init__
        """
        deque([iterable[, maxlen]]) --> deque object
        A list-like sequence optimized for data accesses near its endpoints.
        # (copied from class doc)
        """
        pass
To initialize a double-ended queue with data, pass in an iterable. The argument is optional; without it the deque starts out empty.
from collections import deque

d = deque(["a", "b", "c"])
print(d)  # deque(['a', 'b', 'c'])
A tuple or a dict can be passed in as well (a dict contributes only its keys).
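A short sketch of the two cases just mentioned, plus the maxlen parameter that appears in the signature above. The variable names are illustrative only:

```python
from collections import deque

from_tuple = deque(("a", "b", "c"))
from_dict = deque({"x": 1, "y": 2})  # only the keys are kept
print(from_tuple)  # deque(['a', 'b', 'c'])
print(from_dict)   # deque(['x', 'y'])

# With maxlen set, appending to a full deque drops an item from the other end
recent = deque([1, 2, 3], maxlen=3)
recent.append(4)
print(recent)      # deque([2, 3, 4], maxlen=3)
```

A bounded deque like this is a convenient way to keep only the most recent items of a stream.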
deque has many methods:
class deque(object):
    """
    deque([iterable[, maxlen]]) --> deque object
    A list-like sequence optimized for data accesses near its endpoints.
    """
    def append(self, *args, **kwargs):  # real signature unknown
        """ Add an element to the right side of the deque. """
        pass
    def appendleft(self, *args, **kwargs):  # real signature unknown
        """ Add an element to the left side of the deque. """
        pass
    def clear(self, *args, **kwargs):  # real signature unknown
        """ Remove all elements from the deque. """
        pass
    def copy(self, *args, **kwargs):  # real signature unknown
        """ Return a shallow copy of a deque. """
        pass
    def count(self, value):  # real signature unknown; restored from __doc__
        """ D.count(value) -> integer -- return number of occurrences of value """
        return 0
    def extend(self, *args, **kwargs):  # real signature unknown
        """ Extend the right side of the deque with elements from the iterable """
        pass
    def extendleft(self, *args, **kwargs):  # real signature unknown
        """ Extend the left side of the deque with elements from the iterable """
        pass
    def index(self, value, start=None, stop=None):  # real signature unknown; restored from __doc__
        """
        D.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
        """
        return 0
    def insert(self, index, p_object):  # real signature unknown; restored from __doc__
        """ D.insert(index, object) -- insert object before index """
        pass
    def pop(self, *args, **kwargs):  # real signature unknown
        """ Remove and return the rightmost element. """
        pass
    def popleft(self, *args, **kwargs):  # real signature unknown
        """ Remove and return the leftmost element. """
        pass
    def remove(self, value):  # real signature unknown; restored from __doc__
        """ D.remove(value) -- remove first occurrence of value. """
        pass
    def reverse(self):  # real signature unknown; restored from __doc__
        """ D.reverse() -- reverse *IN PLACE* """
        pass
    def rotate(self, *args, **kwargs):  # real signature unknown
        """ Rotate the deque n steps to the right (default n=1). If n is negative, rotates left. """
        pass
The listing above shows the methods in the source. Some of them in use:
# pop: remove and return the rightmost element
from collections import deque

d = deque(["a", "b", "c"])
print(d.pop())  # c
print(d)  # deque(['a', 'b'])

# popleft: remove and return the leftmost element
from collections import deque

d = deque(["a", "b", "c"])
print(d.popleft())  # a
print(d)  # deque(['b', 'c'])

# append: add an element on the right
from collections import deque

d = deque(["a", "b", "c"])
d.append("d")
print(d)  # deque(['a', 'b', 'c', 'd'])

# appendleft: add an element on the left
from collections import deque

d = deque(["a", "b", "c"])
d.appendleft("d")
print(d)  # deque(['d', 'a', 'b', 'c'])

# extend: append every element of an iterable on the right
from collections import deque

d1 = deque(["a", "b", "c"])
d2 = deque(["d", "e", "f"])
d1.extend(d2)
print(d1)  # deque(['a', 'b', 'c', 'd', 'e', 'f'])
Note: extend returns None; calling d1.extend(d2) modifies d1 in place.

# insert: insert an element before the given index
from collections import deque

d = deque(["a", "b", "c"])
d.insert(1, "d")
print(d)  # deque(['a', 'd', 'b', 'c'])

# reverse: reverse the deque in place
from collections import deque

d = deque(["a", "b", "c"])
d.reverse()
print(d)  # deque(['c', 'b', 'a'])
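The source listing also documents rotate and extendleft, which have no example here. A minimal sketch of both, with illustrative variable names:

```python
from collections import deque

d = deque(["a", "b", "c", "d"])
d.rotate(1)    # one step to the right: the last element moves to the front
print(d)       # deque(['d', 'a', 'b', 'c'])
d.rotate(-2)   # a negative n rotates to the left
print(d)       # deque(['b', 'c', 'd', 'a'])

e = deque(["x"])
# Items are pushed onto the left one at a time, so they end up in reverse order
e.extendleft(["y", "z"])
print(e)       # deque(['z', 'y', 'x'])
```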
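from collections import deque

d1 = deque(["a", "b", "c"])
d2 = d1.copy()
# different ids show that these are two different objects
print(id(d1))  # e.g. 173766656 (the value differs between runs)
print(id(d2))  # e.g. 173766864
# after the copy, changing d1 does not affect d2
d1.insert(2, "d")
print(d1)  # deque(['a', 'b', 'd', 'c'])
print(d2)  # deque(['a', 'b', 'c'])
# what if the deque contains a mutable element?
d3 = deque(["a", "b", ["c", "d"]])
d4 = d3.copy()
print(id(d3))  # e.g. 173570256
print(id(d4))  # e.g. 173570360
# d3 and d4 are different deques, but copy() is shallow: both hold a reference
# to the same inner list, so a change made through d3 is visible through d4
d3[2].append("e")
print(d3)  # deque(['a', 'b', ['c', 'd', 'e']])
print(d4)  # deque(['a', 'b', ['c', 'd', 'e']])
There are many more methods; refer to the source for the rest.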
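If the nested list should not be shared between the copies, one option is the standard library's copy.deepcopy. This is a sketch of that alternative rather than something the original example uses; the names shallow and deep are illustrative:

```python
import copy
from collections import deque

d3 = deque(["a", "b", ["c", "d"]])
shallow = d3.copy()        # shares the inner list with d3
deep = copy.deepcopy(d3)   # gets its own copy of the inner list

d3[2].append("e")
print(shallow[2])  # ['c', 'd', 'e']
print(deep[2])     # ['c', 'd']
```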
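Counter is a subclass of Python's built-in dict, so it has everything dict has. It is mainly used for counting. It is a collection in which the elements are stored as dictionary keys and their counts as dictionary values. A count may be any integer, including zero and negative values.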
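A small sketch of what zero and negative counts look like in practice. The variable names are illustrative; the behaviour shown (a missing element reads as 0 without being added) is how Counter differs from a plain dict:

```python
from collections import Counter

c = Counter("ABCDAD")
count_a = c["A"]
count_missing = c["Z"]   # a missing element counts as zero, no KeyError is raised
print(count_a, count_missing)  # 2 0
print("Z" in c)          # False, the lookup above did not add the key

c["B"] -= 3              # counts are ordinary integers and can go below zero
print(c["B"])            # -2
```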
You can pass an iterable, such as a string or a list, to Counter:
from collections import Counter

counter1 = Counter("ABCDAD")
print(counter1)  # Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1})
# Counter is a subclass of dict, so the dict methods are available;
# note that Counter.update adds counts instead of replacing them
counter2 = Counter("DEFABC")
counter1.update(counter2)
print(counter1)  # Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})

from collections import Counter

counter1 = Counter(["A", "B", "C", "D", "A", "D"])
print(counter1)  # Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1})
# Counter is a subclass of dict, so the dict methods are available;
# note that Counter.update adds counts instead of replacing them
counter2 = Counter(["D", "E", "F", "A", "B", "C"])
counter1.update(counter2)
print(counter1)  # Counter({'A': 3, 'D': 3, 'B': 2, 'C': 2, 'E': 1, 'F': 1})
Counter has a most_common method that returns the n most common elements with their counts.
from collections import Counter

top3 = Counter('abcdeabcdabcaba').most_common(3)
print(top3)  # [('a', 5), ('b', 4), ('c', 3)]
The source of most_common:
class Counter(dict):
    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.
        >>> Counter('abcdeabcdabcaba').most_common(3)
        [('a', 5), ('b', 4), ('c', 3)]
        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
The elements method returns an iterator that repeats each element as many times as its count:
from collections import Counter

c = Counter("ABCDAD")
print(sorted(c.elements()))  # ['A', 'A', 'B', 'C', 'D', 'D']
The subtract method subtracts counts, taken from an iterable or from another mapping (or Counter):
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c)  # Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
OrderedDict is a subclass of dict with all of dict's features, and it is ordered: it remembers the order in which keys were inserted.
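from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
As you can see, the result is not in arbitrary order: the entries appear in the order in which they were inserted.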
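One place where the remembered order matters is equality. As a minimal sketch (the names first and second are illustrative), two OrderedDicts with the same items in a different order compare unequal, while the same data as plain dicts compares equal:

```python
from collections import OrderedDict

first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])

print(list(first))                  # ['a', 'b']
print(first == second)              # False, two OrderedDicts are compared in order
print(dict(first) == dict(second))  # True, plain dicts ignore order when comparing
```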
OrderedDict has a number of methods, for example:
popitem removes and returns the most recently added entry.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(d.popitem())  # ('c', 3)
print(d)  # OrderedDict([('a', 1), ('b', 2)])
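popitem also takes a last argument. A short sketch, with illustrative variable names: passing last=False removes the oldest entry instead of the newest, which makes an OrderedDict usable as a first-in, first-out structure:

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
oldest = d.popitem(last=False)  # remove from the front instead of the back
print(oldest)                   # ('a', 1)
print(list(d.items()))          # [('b', 2), ('c', 3)]
```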
move_to_end moves an existing key to the end of the OrderedDict.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
d.move_to_end("b")
print(d)  # OrderedDict([('a', 1), ('c', 3), ('b', 2)])
pop removes the entry with the given key and returns its value.
from collections import OrderedDict

d = OrderedDict()  # create an empty ordered dict
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(d.pop("b"))  # 2
print(d)  # OrderedDict([('a', 1), ('c', 3)])
ChainMap groups several dicts or other mappings together to create a single, updateable view. Consider the following case:
d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
# print both dicts, one loop each
for k, v in d1.items():
    print(k, v)
for k, v in d2.items():
    print(k, v)
Above, the two dicts are printed with two separate for loops. With ChainMap it can be done like this:
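from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3)  # ChainMap({'a': 1, 'b': 2}, {'c': 1, 'd': 2})
for k, v in d3.items():
    print(k, v)
Note that ChainMap does not copy the two dicts into a new merged dict; it only holds references to the original ones. Besides combining mappings, ChainMap has other methods and attributes, for example:
new_child returns a new ChainMap containing a new mapping followed by all of the mappings in the current instance.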
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
d4 = d3.new_child({"e": 5})  # put a new mapping in front of the existing ones
print(d4)  # ChainMap({'e': 5}, {'a': 1, 'b': 2}, {'c': 1, 'd': 2})
parents is a property that returns a new ChainMap containing all of the maps in the current instance except the first.
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3.parents)  # ChainMap({'c': 1, 'd': 2})
maps is an attribute that holds the list of all the mappings.
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"c": 1, "d": 2}
d3 = ChainMap(d1, d2)
print(d3.maps)  # [{'a': 1, 'b': 2}, {'c': 1, 'd': 2}]
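To make the "single, updateable view" idea concrete, here is a minimal sketch. The dictionaries defaults and overrides and the name settings are hypothetical; the sketch shows that lookups search the mappings in order, that writes go to the first mapping only, and that changes to the underlying dicts are visible through the ChainMap:

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "zhangsan"}
settings = ChainMap(overrides, defaults)

print(settings["user"])     # zhangsan, the first mapping that has the key wins
print(settings["color"])    # red, found in the second mapping

settings["color"] = "blue"  # writes only ever touch the first mapping
print(overrides)            # {'user': 'zhangsan', 'color': 'blue'}
print(defaults["color"])    # red

defaults["size"] = 10       # the ChainMap holds references, so this shows up at once
print(settings["size"])     # 10
```

Layering overrides on top of defaults like this is a common use of ChainMap, since neither dict has to be copied or modified to build the combined view.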