Python's itertools module

LULUBAO 2020-06-05

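I. Infinite iterators

1. itertools.count(start=0, step=1)

Creates an iterator that returns evenly spaced values, starting with start and advancing by step. It is roughly equivalent to: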

def count(start=0, step=1):
    # count(10) --> 10 11 12 13 14 ...
    # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
    n = start
    while True:
        yield n
        n += step

Example:

from itertools import count
import time

for i in count(10):
    time.sleep(2)
    print(i)  # 10, 11, 12, ... (one value every 2 seconds, forever)

count(10) is an itertools.count object. It is commonly passed as an argument to map() or zip(), for example:

# Using count with map (lazy: nothing is computed until it is iterated)
map(lambda x: x * 2, count(5))

# Using count with zip
a = zip(count(10), 'xy')

print(list(a))
"""
[(10, 'x'), (11, 'y')]
"""

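2. itertools.cycle(iterable)

Creates an iterator that returns elements from the iterable and saves a copy of each one. When the iterable is exhausted, it returns elements from the saved copy, repeating indefinitely. Roughly equivalent to: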

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element

Example:

from itertools import cycle

print(cycle('ABCDE'))  # <itertools.cycle object at 0x0000000000649448>

for item in cycle('ABCDE'):
    print(item)  # A, B, C, D, E, A, B, C, D, E, ... (loops forever)
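In practice cycle() is usually paired with something that bounds it. A minimal sketch, with made-up task and worker names used only for illustration: islice() (covered in Part II) takes the first n values, and zip() stops at the end of its finite input, which gives a simple round-robin assignment.

```python
from itertools import cycle, islice

# Take the first 7 items from an endless cycle over 'ABC'
first_seven = list(islice(cycle('ABC'), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# Round-robin assignment: zip() stops when the finite task list runs out
tasks = ['t1', 't2', 't3', 't4', 't5']
workers = ['w1', 'w2']
assignment = list(zip(tasks, cycle(workers)))
print(assignment)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1'), ('t4', 'w2'), ('t5', 'w1')]
```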

3. itertools.repeat(object[, times])

Creates an iterator that returns object over and over again. It runs indefinitely unless the times argument is specified. Roughly equivalent to:

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

It is commonly used to supply a constant argument to map() and zip():

In [1]: from itertools import repeat

In [2]: list(map(pow, range(10), repeat(2)))
Out[2]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [3]: list(zip(range(5),repeat(10)))
Out[3]: [(0, 10), (1, 10), (2, 10), (3, 10), (4, 10)]


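II. Iterators terminating on the shortest input sequence

1. itertools.accumulate(iterable[, func])

Creates an iterator that returns accumulated sums, or the accumulated results of another binary function specified through func. If func is supplied, it should be a function of two arguments. If the input iterable is empty, the output iterable will also be empty. Roughly equivalent to: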

import operator

def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

Example:

from itertools import accumulate

print(accumulate([1,2,3])) #<itertools.accumulate object at 0x00000000006E9448>
print(list(accumulate([1,2,3]))) #[1, 3, 6]
print(list(accumulate([1,2,3],lambda x,y:x*y))) #[1, 2, 6]
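func can be any two-argument callable, not only a lambda. A short sketch with made-up sample data: max() yields a running maximum, operator.mul yields running products, and the initial keyword argument seeds the accumulation (it was added in Python 3.8, so this assumes 3.8 or later).

```python
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Running maximum: func receives (accumulated_value, next_element)
running_max = list(accumulate(data, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]

# Running product using operator.mul instead of a lambda
running_prod = list(accumulate([1, 2, 3, 4], operator.mul))
print(running_prod)  # [1, 2, 6, 24]

# initial (Python 3.8+) is emitted first and seeds the total
with_initial = list(accumulate([1, 2, 3], initial=100))
print(with_initial)  # [100, 101, 103, 106]
```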

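2. itertools.chain(*iterables)

Creates an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. It is used to treat consecutive sequences as a single sequence. Roughly equivalent to: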

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

Example:

from itertools import chain

print(list(chain([1,2,3],[5,6,7]))) #[1, 2, 3, 5, 6, 7]

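3. classmethod chain.from_iterable(iterable)

An alternate constructor for chain(). It takes its chained inputs from a single iterable argument, which is evaluated lazily. Roughly equivalent to: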

def from_iterable(iterables):
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

Example:

from itertools import chain

print(list(chain.from_iterable([[1,2,3],[5,6,7]]))) #[1, 2, 3, 5, 6, 7]

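4. itertools.compress(data, selectors)

Creates an iterator that filters elements from data, returning only those whose corresponding element in selectors evaluates to True. It stops when either data or selectors is exhausted. Roughly equivalent to: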

def compress(data, selectors):
    # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
    return (d for d, s in zip(data, selectors) if s)

Example:

from itertools import compress

data = [1, 2, 3, 4]
selectors = [1, 0, 1, 0]
filter_data = compress(data, selectors)
print(filter_data)  # <itertools.compress object at 0x00000000009E5B00>
print(list(filter_data))  # [1, 3]

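5. itertools.dropwhile(predicate, iterable)

Creates an iterator that drops elements from the iterable as long as the predicate is true; afterwards, it returns every remaining element. Note that the iterator produces no output until the predicate first becomes false, so it may have a lengthy start-up time. Roughly equivalent to: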

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

Example:

from itertools import dropwhile

data = [1, 2, 3, 4, 5]
result = dropwhile(lambda x: x < 3, data)
print(result)  # <itertools.dropwhile object at 0x0000000000D5BD48>
print(list(result))  # [3, 4, 5]
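The difference from an ordinary filter is that dropwhile() stops testing once one element fails the predicate. A minimal comparison, reusing the sample data from the docstring above and contrasting it with filterfalse() (described in the next section), which tests every element independently:

```python
from itertools import dropwhile, filterfalse

data = [1, 4, 6, 4, 1]
pred = lambda x: x < 5

# dropwhile drops only the leading run where pred is true;
# after 6 everything is kept, including the later 4 and 1
dropped = list(dropwhile(pred, data))
print(dropped)   # [6, 4, 1]

# filterfalse checks every element on its own
filtered = list(filterfalse(pred, data))
print(filtered)  # [6]
```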

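6. itertools.filterfalse(predicate, iterable)

Creates an iterator that filters elements from the iterable, returning only those for which the predicate is false. If predicate is None, it returns the items that are themselves false. Roughly equivalent to: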

def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

Example:

from itertools import filterfalse

data = [1, 2, 3, 4, 5]
result = filterfalse(lambda x: x % 2, data)
print(result)  # <itertools.filterfalse object at 0x0000000000675E10>
print(list(result))  # [2, 4]

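7. itertools.groupby(iterable, key=None)

Creates an iterator that returns consecutive keys and groups from the iterable. key is a function that computes a key value for each element; if it is not specified or is None, it defaults to the identity function and the element itself is the key. A new group starts every time the key value changes, so the iterable generally needs to be sorted on the same key function first. Roughly equivalent to: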

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)
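A minimal usage sketch; the word list is made-up sample data. It shows that only consecutive equal keys are grouped, and that sorting by the same key beforehand collects all equal keys together. Because each group shares the underlying iterator with the groupby object, a group is no longer available once groupby advances, so it is stored as a list right away.

```python
from itertools import groupby

# Only consecutive equal keys are grouped, so 'A' and 'B' appear twice
keys = [k for k, g in groupby('AAAABBBCCDAABBB')]
print(keys)  # ['A', 'B', 'C', 'D', 'A', 'B']

# Sort by the same key first so that all equal keys become adjacent
words = ['apple', 'bob', 'avocado', 'banana', 'cherry']
first_letter = lambda w: w[0]
grouped = {
    k: list(g)  # materialize each group before groupby advances
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}
```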

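8. itertools.islice(iterable, stop) / itertools.islice(iterable, start, stop[, step])

Creates an iterator that returns selected elements from the iterable. If start is non-zero, elements are skipped until start is reached. After that, elements are returned consecutively unless step is greater than one, in which case items are skipped. If stop is None, iteration continues until the iterable is exhausted; otherwise it stops at the given position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Roughly equivalent to: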

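import sys

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    # Test for None explicitly so that stop=0 yields nothing
    start = 0 if s.start is None else s.start
    stop = sys.maxsize if s.stop is None else s.stop
    step = 1 if s.step is None else s.step
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        return
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            try:
                nexti = next(it)
            except StopIteration:
                # Since Python 3.7 (PEP 479) a StopIteration escaping a
                # generator becomes RuntimeError, so return explicitly
                return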

In particular, if start is None, iteration starts at zero; if step is None, the step defaults to one.

Example:

from itertools import islice

data = [1, 2, 3, 4, 5, 6]
result = islice(data, 1, 5)
print(result)  # <itertools.islice object at 0x000000000A426EF8>
print(list(result))  # [2, 3, 4, 5]
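islice() is the usual way to take a finite piece of an infinite iterator such as count(). A short sketch showing that, a step argument, and the rejection of negative indices (raised as ValueError when the islice object is created):

```python
from itertools import count, islice

# Take a prefix of an infinite iterator safely
evens = list(islice(count(0, 2), 5))
print(evens)  # [0, 2, 4, 6, 8]

# start=1, stop=None, step=3 picks indices 1 and 4
every_third = list(islice('ABCDEFG', 1, None, 3))
print(every_third)  # ['B', 'E']

# Negative values are rejected, unlike ordinary slicing
try:
    islice('ABCDEFG', -2, None)
    negative_rejected = False
except ValueError:
    negative_rejected = True
print(negative_rejected)  # True
```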

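9. itertools.starmap(function, iterable)

Creates an iterator that computes the function using arguments taken from the iterable. It is used instead of map() when the arguments are already grouped into tuples; the difference between map() and starmap() parallels the difference between function(a, b) and function(*c). Roughly equivalent to: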

def starmap(function, iterable):
    # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)
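A minimal example of that map()/starmap() distinction, reusing the pairs from the comment above: starmap() unpacks each pre-grouped tuple, while map() needs one iterable per parameter.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap calls pow(*pair) for each tuple
powers = list(starmap(pow, pairs))
print(powers)  # [32, 9, 1000]

# map takes the bases and exponents as separate iterables
same = list(map(pow, [2, 3, 10], [5, 2, 3]))
print(same)  # [32, 9, 1000]
```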

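10. itertools.takewhile(predicate, iterable)

Creates an iterator that returns elements from the iterable as long as the predicate is true, and stops at the first element for which it is false. Roughly equivalent to: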

def takewhile(predicate, iterable):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
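takewhile() and dropwhile() split a sequence at the first element that fails the predicate. A short sketch using the sample data from the comment above; it also shows a caveat visible in the equivalent code: the failing element is consumed from the underlying iterator, so it is missing if you keep reading from that same iterator.

```python
from itertools import dropwhile, takewhile

data = [1, 4, 6, 4, 1]
pred = lambda x: x < 5

# takewhile keeps the leading run; dropwhile keeps the rest
head = list(takewhile(pred, data))
tail = list(dropwhile(pred, data))
print(head, tail)  # [1, 4] [6, 4, 1]

# Caveat: takewhile consumes the first failing element (6)
it = iter(data)
list(takewhile(pred, it))
rest = list(it)
print(rest)  # [4, 1]
```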

11. itertools.tee(iterable, n=2)

Returns n independent iterators from a single iterable. Roughly equivalent to:

import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

Example:

from itertools import tee

result = tee([1,2,3],2)
print(result) #(<itertools._tee object at 0x0000000000669448>, <itertools._tee object at 0x00000000006CBD08>)
for item in result:
    print(list(item)) #[1, 2, 3], [1, 2, 3]
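A common use of tee() is looking at each element together with its successor. The sketch below builds a helper named pairwise (a hypothetical name in this sketch; Python 3.10 and later also ship itertools.pairwise). After calling tee(), the original iterator should not be used directly, because the tee copies would not see the values it consumes.

```python
from itertools import tee

def pairwise(iterable):
    # Hypothetical helper: (s0, s1), (s1, s2), (s2, s3), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)

pairs = list(pairwise([1, 3, 6, 10]))
print(pairs)  # [(1, 3), (3, 6), (6, 10)]

# Differences between neighbouring elements
diffs = [y - x for x, y in pairs]
print(diffs)  # [2, 3, 4]
```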

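12. itertools.zip_longest(*iterables, fillvalue=None)

Creates an iterator that aggregates elements from each of the iterables. If the iterables have uneven lengths, missing values are filled in with fillvalue, and iteration continues until the longest iterable is exhausted. Roughly equivalent to: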

from itertools import chain, repeat

class ZipExhausted(Exception):
    pass

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = len(args) - 1
    def sentinel():
        nonlocal counter
        if not counter:
            raise ZipExhausted
        counter -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass
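A minimal comparison with the built-in zip(), using made-up names and scores: zip() silently truncates to the shortest input, while zip_longest() pads with fillvalue (None by default).

```python
from itertools import zip_longest

names = ['a', 'b', 'c', 'd']
scores = [90, 85]

truncated = list(zip(names, scores))
print(truncated)  # [('a', 90), ('b', 85)]

padded = list(zip_longest(names, scores, fillvalue=0))
print(padded)  # [('a', 90), ('b', 85), ('c', 0), ('d', 0)]

default_fill = list(zip_longest('ABCD', 'xy'))
print(default_fill)
# [('A', 'x'), ('B', 'y'), ('C', None), ('D', None)]
```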

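III. Combinatoric iterators

1. itertools.product(*iterables, repeat=1)

Returns the Cartesian product of the input iterables. It is roughly equivalent to nested for-loops in a generator expression; for example, product(A, B) returns the same as:

((x, y) for x in A for y in B)

The repeat argument computes the product of an iterable with itself; for example, product(A, repeat=4) means the same as product(A, A, A, A). Roughly equivalent to: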

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
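A short usage sketch: it checks product() against the nested generator expression and uses repeat=3 to enumerate all 3-bit strings, as in the comment above.

```python
from itertools import product

A, B = 'AB', 'xy'
ab = list(product(A, B))
print(ab)  # [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]

# Same as the nested generator expression
print(ab == [(x, y) for x in A for y in B])  # True

# repeat=3 means product(range(2), range(2), range(2))
bits = [''.join(map(str, p)) for p in product(range(2), repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```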

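2. itertools.permutations(iterable, r=None)

Returns successive r-length permutations of elements in the iterable. If r is not specified or is None, r defaults to the length of the iterable and all possible full-length permutations are generated. Roughly equivalent to: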

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return
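
permutations() can also be expressed in terms of product(): generate every index tuple with product() and keep only the tuples in which no index repeats.

from itertools import product

def permutations(iterable, r=None):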
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)
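A short usage sketch. It shows that elements are treated as distinct by position, not by value (so repeated input values produce repeated tuples), and checks the count n!/(n-r)! against math.perm, which assumes Python 3.8 or later.

```python
from itertools import permutations
from math import perm  # Python 3.8+

two = [''.join(p) for p in permutations('ABC', 2)]
print(two)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']

# Positions, not values, are permuted, so duplicates appear
dup = list(permutations([1, 1, 2]))
print(dup)
# [(1, 1, 2), (1, 2, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 1)]

# Number of r-permutations of n items: n! / (n - r)!
n_perm = len(list(permutations(range(5), 3)))
print(n_perm, perm(5, 3))  # 60 60
```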

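3. itertools.combinations(iterable, r)

Returns r-length subsequences of elements from the input iterable. Combinations are emitted in lexicographic order according to the positions of the input elements, so if the input iterable is sorted, the combination tuples are produced in sorted order. Roughly equivalent to: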

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
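A short usage sketch. Because ordering follows input positions, an unsorted input yields tuples that are not themselves sorted; the count is checked against math.comb (Python 3.8+ assumed).

```python
from itertools import combinations
from math import comb  # Python 3.8+

pairs = [''.join(c) for c in combinations('ABCD', 2)]
print(pairs)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# Order follows input positions, not values
unsorted_in = list(combinations([3, 1, 2], 2))
print(unsorted_in)  # [(3, 1), (3, 2), (1, 2)]

# Number of r-combinations of n items: n! / (r! (n - r)!)
n_comb = len(list(combinations(range(6), 3)))
print(n_comb, comb(6, 3))  # 20 20
```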

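4. itertools.combinations_with_replacement(iterable, r)

Returns r-length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once. Combinations are emitted in lexicographic order according to the positions of the input elements, so if the input iterable is sorted, the combination tuples are produced in sorted order. Roughly equivalent to: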

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
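A short usage sketch reproducing the comment above, plus a count check: choosing r items from n with repetition allowed gives C(n + r - 1, r) results (math.comb assumes Python 3.8 or later; n and r are arbitrary sample values).

```python
from itertools import combinations_with_replacement
from math import comb  # Python 3.8+

pairs = [''.join(c) for c in combinations_with_replacement('ABC', 2)]
print(pairs)  # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']

# Multisets of size r drawn from n items: C(n + r - 1, r)
n, r = 4, 3
n_cwr = len(list(combinations_with_replacement(range(n), r)))
print(n_cwr, comb(n + r - 1, r))  # 20 20
```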
