10.1. itertools — Functions creating iterators for efficient looping
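This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.
The module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" that makes it possible to construct specialized tools succinctly and efficiently in pure Python.
For instance, SML provides a tabulation tool: tabulate(f), which produces a sequence f(0), f(1), .... The same effect can be achieved in Python by combining map() and count() to form map(f, count()).
These tools and their built-in counterparts also work well with the high-speed functions in the operator module. For example, the multiplication operator can be mapped across two vectors to form an efficient dot product: sum(map(operator.mul, vector1, vector2)).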
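As a rough illustration of the two idioms above, the sketch below tabulates a function with map() and count() and computes a dot product with operator.mul. The function `square` and the sample vectors are made up for the example; islice() is used only to cut the infinite stream short.

```python
import operator
from itertools import count, islice

def square(n):
    # Hypothetical function to tabulate; any one-argument callable works.
    return n * n

# map(f, count()) yields f(0), f(1), ... without end, so take a finite prefix.
first_five = list(islice(map(square, count()), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# Made-up sample vectors for the dot product.
vector1 = [1, 2, 3]
vector2 = [4, 5, 6]
dot = sum(map(operator.mul, vector1, vector2))
print(dot)  # 32
```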
Infinite iterators:
Iterator | Arguments | Results | Example |
---|---|---|---|
count() | start, [step] | start, start+step, start+2*step, ... | count(10) --> 10 11 12 13 14 ... |
cycle() | p | p0, p1, ... plast, p0, p1, ... | cycle('ABCD') --> A B C D A B C D ... |
repeat() | elem [,n] | elem, elem, elem, ... endlessly or up to n times | repeat(10, 3) --> 10 10 10 |
Iterators terminating on the shortest input sequence:
Iterator | Arguments | Results | Example |
---|---|---|---|
accumulate() | p [,func] | p0, p0+p1, p0+p1+p2, ... | accumulate([1,2,3,4,5]) --> 1 3 6 10 15 |
chain() | p, q, ... | p0, p1, ... plast, q0, q1, ... | chain('ABC', 'DEF') --> A B C D E F |
chain.from_iterable() | iterable | p0, p1, ... plast, q0, q1, ... | chain.from_iterable(['ABC', 'DEF']) --> A B C D E F |
compress() | data, selectors | (d[0] if s[0]), (d[1] if s[1]), ... | compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F |
dropwhile() | pred, seq | seq[n], seq[n+1], starting when pred fails | dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1 |
filterfalse() | pred, seq | elements of seq where pred(elem) is false | filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 |
groupby() | iterable[, keyfunc] | sub-iterators grouped by value of keyfunc(v) | |
islice() | seq, [start,] stop [, step] | elements from seq[start:stop:step] | islice('ABCDEFG', 2, None) --> C D E F G |
starmap() | func, seq | func(*seq[0]), func(*seq[1]), ... | starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000 |
takewhile() | pred, seq | seq[0], seq[1], until pred fails | takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 |
tee() | it, n | it1, it2, ... itn splits one iterator into n | |
zip_longest() | p, q, ... | (p[0], q[0]), (p[1], q[1]), ... | zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- |
Combinatoric generators:
Iterator | Arguments | Results |
---|---|---|
product() | p, q, ... [repeat=1] | cartesian product, equivalent to a nested for-loop |
permutations() | p[, r] | r-length tuples, all possible orderings, no repeated elements |
combinations() | p, r | r-length tuples, in sorted order, no repeated elements |
combinations_with_replacement() | p, r | r-length tuples, in sorted order, with repeated elements |
product('ABCD', repeat=2) | | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD |
permutations('ABCD', 2) | | AB AC AD BA BC BD CA CB CD DA DB DC |
combinations('ABCD', 2) | | AB AC AD BC BD CD |
combinations_with_replacement('ABCD', 2) | | AA AB AC AD BB BC BD CC CD DD |
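10.1.1. Itertool functions
The following module-level functions all construct and return iterators. Some provide streams of infinite length, so they should only be accessed by functions or loops that truncate the stream.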
itertools.accumulate(iterable[, func])
Make an iterator that returns accumulated sums, or accumulated results of other binary functions (specified via the optional func argument). If func is supplied, it should be a function of two arguments. Elements of the input iterable may be any type that can be accepted as arguments to func. (For example, with the default operation of addition, elements may be any addable type including Decimal or Fraction.) If the input iterable is empty, the output iterable will also be empty.
Roughly equivalent to:

    def accumulate(iterable, func=operator.add):
        'Return running totals'
        # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
        # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
        it = iter(iterable)
        try:
            total = next(it)
        except StopIteration:
            return
        yield total
        for element in it:
            total = func(total, element)
            yield total

There are a number of uses for the func argument. It can be set to min() for a running minimum, max() for a running maximum, or operator.mul() for a running product. Amortization tables can be built by accumulating interest and applying payments. First-order recurrence relations can be modeled by supplying the initial value in the iterable and using only the accumulated total in the func argument:

    >>> data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
    >>> list(accumulate(data, operator.mul))     # running product
    [3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]
    >>> list(accumulate(data, max))              # running maximum
    [3, 4, 6, 6, 6, 9, 9, 9, 9, 9]

    # Amortize a 5% loan of 1000 with 4 annual payments of 90
    >>> cashflows = [1000, -90, -90, -90, -90]
    >>> list(accumulate(cashflows, lambda bal, pmt: bal*1.05 + pmt))
    [1000, 960.0, 918.0, 873.9000000000001, 827.5950000000001]

    # Chaotic recurrence relation https://en.wikipedia.org/wiki/Logistic_map
    >>> logistic_map = lambda x, _: r * x * (1 - x)
    >>> r = 3.8
    >>> x0 = 0.4
    >>> inputs = repeat(x0, 36)     # only the initial value is used
    >>> [format(x, '.2f') for x in accumulate(inputs, logistic_map)]
    ['0.40', '0.91', '0.30', '0.81', '0.60', '0.92', '0.29', '0.79', '0.63',
     '0.88', '0.39', '0.90', '0.33', '0.84', '0.52', '0.95', '0.18', '0.57',
     '0.93', '0.25', '0.71', '0.79', '0.63', '0.88', '0.39', '0.91', '0.32',
     '0.83', '0.54', '0.95', '0.20', '0.60', '0.91', '0.30', '0.80', '0.60']

See functools.reduce() for a similar function that returns only the final accumulated value.
New in version 3.2.
Changed in version 3.3: Added the optional func parameter.
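itertools.chain(*iterables)
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence. Roughly equivalent to:

    def chain(*iterables):
        # chain('ABC', 'DEF') --> A B C D E F
        for it in iterables:
            for element in it:
                yield element

classmethod chain.from_iterable(iterable)
Alternate constructor for chain(). Gets chained inputs from a single iterable argument that is evaluated lazily. Roughly equivalent to:

    def from_iterable(iterables):
        # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
        for it in iterables:
            for element in it:
                yield element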
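The lazy evaluation of chain.from_iterable() matters most when the outer iterable is itself unbounded. The sketch below assumes a made-up generator `rows` that never terminates; it is illustrative only.

```python
from itertools import chain, count, islice

def rows():
    # Hypothetical infinite source of small lists: [0, 0], [1, 1], ...
    for n in count():
        yield [n, n]

# chain(*rows()) would never return, because argument unpacking has to
# collect every row first. chain.from_iterable pulls rows one at a time,
# so taking a finite prefix is cheap.
flat = list(islice(chain.from_iterable(rows()), 5))
print(flat)  # [0, 0, 1, 1, 2]
```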
itertools.combinations(iterable, r)
Return r length subsequences of elements from the input iterable.
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.
Roughly equivalent to:

    def combinations(iterable, r):
        # combinations('ABCD', 2) --> AB AC AD BC BD CD
        # combinations(range(4), 3) --> 012 013 023 123
        pool = tuple(iterable)
        n = len(pool)
        if r > n:
            return
        indices = list(range(r))
        yield tuple(pool[i] for i in indices)
        while True:
            for i in reversed(range(r)):
                if indices[i] != i + n - r:
                    break
            else:
                return
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1
            yield tuple(pool[i] for i in indices)

The code for combinations() can also be expressed as a subsequence of permutations() after filtering out entries where the elements are not in sorted order (according to their position in the input pool):

    def combinations(iterable, r):
        pool = tuple(iterable)
        n = len(pool)
        for indices in permutations(range(n), r):
            if sorted(indices) == list(indices):
                yield tuple(pool[i] for i in indices)

The number of items returned is n! / r! / (n-r)! when 0 <= r <= n, or zero when r > n.
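The item count stated above can be checked directly against the iterator. The helper `n_combinations` below is a name invented for this sketch, not part of itertools.

```python
from itertools import combinations
from math import factorial

def n_combinations(n, r):
    # Hypothetical helper: n! / r! / (n-r)!, or zero when r > n.
    if r > n:
        return 0
    return factorial(n) // factorial(r) // factorial(n - r)

pool = 'ABCD'
# Compare the formula with the actual number of tuples for r = 0..5.
counts = [len(list(combinations(pool, r))) for r in range(6)]
expected = [n_combinations(len(pool), r) for r in range(6)]
print(counts)  # [1, 4, 6, 4, 1, 0]
```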
itertools.combinations_with_replacement(iterable, r)
Return r length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once.
Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, the generated combinations will also be unique.
Roughly equivalent to:

    def combinations_with_replacement(iterable, r):
        # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
        pool = tuple(iterable)
        n = len(pool)
        if not n and r:
            return
        indices = [0] * r
        yield tuple(pool[i] for i in indices)
        while True:
            for i in reversed(range(r)):
                if indices[i] != n - 1:
                    break
            else:
                return
            indices[i:] = [indices[i] + 1] * (r - i)
            yield tuple(pool[i] for i in indices)

The code for combinations_with_replacement() can also be expressed as a subsequence of product() after filtering out entries where the elements are not in sorted order (according to their position in the input pool):

    def combinations_with_replacement(iterable, r):
        pool = tuple(iterable)
        n = len(pool)
        for indices in product(range(n), repeat=r):
            if sorted(indices) == list(indices):
                yield tuple(pool[i] for i in indices)

The number of items returned is (n+r-1)! / r! / (n-1)! when n > 0.
New in version 3.1.
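itertools.compress(data, selectors)
Make an iterator that filters elements from data, returning only those that have a corresponding element in selectors that evaluates to True. Stops when either the data or selectors iterable has been exhausted. Roughly equivalent to:

    def compress(data, selectors):
        # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
        return (d for d, s in zip(data, selectors) if s)

New in version 3.1.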
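A small usage sketch, with made-up names and scores, showing selectors computed from a second sequence and the stop-at-shortest behaviour:

```python
from itertools import compress

names = ['ann', 'bob', 'cy', 'dee']    # made-up sample data
scores = [71, 48, 90, 55]
passed = [s >= 60 for s in scores]     # selectors: True, False, True, False

kept = list(compress(names, passed))
print(kept)  # ['ann', 'cy']

# Iteration stops at the shorter input: 'dee' has no selector here.
truncated = list(compress(names, [1, 1, 1]))
print(truncated)  # ['ann', 'bob', 'cy']
```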
itertools.count(start=0, step=1)
Make an iterator that returns evenly spaced values starting with number start. Often used as an argument to map() to generate consecutive data points. Also, used with zip() to add sequence numbers. Roughly equivalent to:

    def count(start=0, step=1):
        # count(10) --> 10 11 12 13 14 ...
        # count(2.5, 0.5) --> 2.5 3.0 3.5 ...
        n = start
        while True:
            yield n
            n += step

When counting with floating point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: (start + step * i for i in count()).
Changed in version 3.1: Added step argument and allowed non-integer arguments.
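The difference between the additive and the multiplicative form can be seen with a step of 0.1, which has no exact binary representation. This is only a sketch; the size of the accumulated error depends on the values chosen.

```python
from itertools import count, islice

start, step = 0.0, 0.1

# Element 10 produced by repeated addition, as in the equivalent code above.
additive = next(islice(count(start, step), 10, None))

# Element 10 produced by one multiplication per item.
multiplicative = next(islice((start + step * i for i in count()), 10, None))

print(additive)        # close to 1.0, but carries accumulated rounding error
print(multiplicative)  # 1.0
```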
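itertools.cycle(iterable)
Make an iterator returning elements from the iterable and saving a copy of each. When the iterable is exhausted, return elements from the saved copy. Repeats indefinitely. Roughly equivalent to:

    def cycle(iterable):
        # cycle('ABCD') --> A B C D A B C D A B C D ...
        saved = []
        for element in iterable:
            yield element
            saved.append(element)
        while saved:
            for element in saved:
                yield element

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).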
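itertools.dropwhile(predicate, iterable)
Make an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element. Note, the iterator does not produce any output until the predicate first becomes false, so it may have a lengthy start-up time. Roughly equivalent to:

    def dropwhile(predicate, iterable):
        # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
        iterable = iter(iterable)
        for x in iterable:
            if not predicate(x):
                yield x
                break
        for x in iterable:
            yield x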
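One point worth seeing in practice is that the predicate is no longer consulted after it first fails. The input lines below are invented for the illustration.

```python
from itertools import dropwhile

# Made-up input: leading comment lines followed by data.
lines = ['# header', '# more header', 'data 1', '# not a header', 'data 2']

body = list(dropwhile(lambda s: s.startswith('#'), lines))
# The later '#' line is kept: only the leading run is dropped.
print(body)  # ['data 1', '# not a header', 'data 2']
```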
itertools.filterfalse(predicate, iterable)
Make an iterator that filters elements from iterable, returning only those for which the predicate is False. If predicate is None, return the items that are false. Roughly equivalent to:

    def filterfalse(predicate, iterable):
        # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
        if predicate is None:
            predicate = bool
        for x in iterable:
            if not predicate(x):
                yield x
itertools.groupby(iterable, key=None)
Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL's GROUP BY, which aggregates common elements regardless of their input order.
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:

    groups = []
    uniquekeys = []
    data = sorted(data, key=keyfunc)
    for k, g in groupby(data, keyfunc):
        groups.append(list(g))      # Store group iterator as a list
        uniquekeys.append(k)

groupby() is roughly equivalent to:

    class groupby:
        # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
        # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
        def __init__(self, iterable, key=None):
            if key is None:
                key = lambda x: x
            self.keyfunc = key
            self.it = iter(iterable)
            self.tgtkey = self.currkey = self.currvalue = object()
        def __iter__(self):
            return self
        def __next__(self):
            while self.currkey == self.tgtkey:
                self.currvalue = next(self.it)    # Exit on StopIteration
                self.currkey = self.keyfunc(self.currvalue)
            self.tgtkey = self.currkey
            return (self.currkey, self._grouper(self.tgtkey))
        def _grouper(self, tgtkey):
            while self.currkey == tgtkey:
                yield self.currvalue
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    return
                self.currkey = self.keyfunc(self.currvalue)
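The contrast with SQL's GROUP BY is easiest to see by running groupby() on the same data with and without sorting. The word list and the key function `first_letter` are made up for this sketch.

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']  # made-up data

def first_letter(word):
    # Hypothetical key function: group words by their initial.
    return word[0]

# Unsorted input: a new group starts whenever the key changes.
unsorted_keys = [k for k, g in groupby(words, first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']

# Sorted on the same key first: each key now appears exactly once.
by_letter = sorted(words, key=first_letter)
grouped = {k: list(g) for k, g in groupby(by_letter, first_letter)}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}
```

Each group is converted to a list immediately, since the group iterator becomes invalid once groupby() advances.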
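itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one, which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line). Roughly equivalent to:

    def islice(iterable, *args):
        # islice('ABCDEFG', 2) --> A B
        # islice('ABCDEFG', 2, 4) --> C D
        # islice('ABCDEFG', 2, None) --> C D E F G
        # islice('ABCDEFG', 0, None, 2) --> A C E G
        s = slice(*args)
        it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
        try:
            nexti = next(it)
        except StopIteration:
            return
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                try:
                    nexti = next(it)
                except StopIteration:
                    # No indices left to select. Returning explicitly keeps
                    # this valid on interpreters that enforce PEP 479.
                    return

If start is None, then iteration starts at zero. If step is None, then the step defaults to one.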
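The flattened-report use case mentioned above might look like the following sketch. The `report` list is invented for the example; each record is assumed to occupy exactly three consecutive lines.

```python
from itertools import islice

# Made-up flattened report: name, age, city repeated for each record.
report = ['Ann', '31', 'Oslo', 'Bob', '45', 'Lima', 'Cy', '28', 'Rome']

names = list(islice(report, 0, None, 3))   # every third line from line 0
cities = list(islice(report, 2, None, 3))  # every third line from line 2
print(names)   # ['Ann', 'Bob', 'Cy']
print(cities)  # ['Oslo', 'Lima', 'Rome']

# Negative arguments are rejected, unlike ordinary slicing (report[-3:]).
try:
    islice(report, -3, None)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```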
itertools.permutations(iterable, r=None)
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.
Roughly equivalent to:

    def permutations(iterable, r=None):
        # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
        # permutations(range(3)) --> 012 021 102 120 201 210
        pool = tuple(iterable)
        n = len(pool)
        r = n if r is None else r
        if r > n:
            return
        indices = list(range(n))
        cycles = list(range(n, n-r, -1))
        yield tuple(pool[i] for i in indices[:r])
        while n:
            for i in reversed(range(r)):
                cycles[i] -= 1
                if cycles[i] == 0:
                    indices[i:] = indices[i+1:] + indices[i:i+1]
                    cycles[i] = n - i
                else:
                    j = cycles[i]
                    indices[i], indices[-j] = indices[-j], indices[i]
                    yield tuple(pool[i] for i in indices[:r])
                    break
            else:
                return

The code for permutations() can also be expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool):

    def permutations(iterable, r=None):
        pool = tuple(iterable)
        n = len(pool)
        r = n if r is None else r
        for indices in product(range(n), repeat=r):
            if len(set(indices)) == r:
                yield tuple(pool[i] for i in indices)

The number of items returned is n! / (n-r)! when 0 <= r <= n, or zero when r > n.
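itertools.product(*iterables, repeat=1)
Cartesian product of input iterables.
Roughly equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).
The nested loops cycle like an odometer, with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input's iterables are sorted, the product tuples are emitted in sorted order.
To compute the product of an iterable with itself, specify the number of repetitions with the optional repeat keyword argument. For example, product(A, repeat=4) means the same as product(A, A, A, A).
This function is roughly equivalent to the following code, except that the actual implementation does not build up intermediate results in memory:

    def product(*args, repeat=1):
        # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
        # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
        pools = [tuple(pool) for pool in args] * repeat
        result = [[]]
        for pool in pools:
            result = [x+[y] for x in result for y in pool]
        for prod in result:
            yield tuple(prod)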
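Both equivalences described above (the repeat shorthand and the nested-loop ordering) can be checked with a short sketch; the input `A` is arbitrary sample data.

```python
from itertools import product

A = 'ab'  # arbitrary sample input

# repeat=3 is shorthand for passing the same iterable three times.
same_as_repeat = list(product(A, repeat=3)) == list(product(A, A, A))

# Same tuples, in the same order, as nested for-loops:
# the rightmost position varies fastest, like an odometer.
nested = [(x, y) for x in A for y in range(2)]
same_as_nested = list(product(A, range(2))) == nested

print(nested)  # [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
print(same_as_repeat, same_as_nested)  # True True
```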
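itertools.repeat(object[, times])
Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified. Used as an argument to map() for invariant parameters to the called function. Also used with zip() to create an invariant part of a tuple record. Roughly equivalent to:

    def repeat(object, times=None):
        # repeat(10, 3) --> 10 10 10
        if times is None:
            while True:
                yield object
        else:
            for i in range(times):
                yield object

A common use for repeat is to supply a stream of constant values to map or zip:

    >>> list(map(pow, range(10), repeat(2)))
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]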
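The zip() use mentioned above, where repeat() supplies the invariant part of each record, could look like this. The sensor id and readings are made-up values.

```python
from itertools import repeat

readings = [20.5, 21.0, 19.8]   # made-up sensor values

# Pair every reading with the same sensor id.
# zip() stops when readings runs out, so the endless repeat() is safe here.
records = list(zip(repeat('sensor-7'), readings))
print(records)
# [('sensor-7', 20.5), ('sensor-7', 21.0), ('sensor-7', 19.8)]
```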
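itertools.starmap(function, iterable)
Make an iterator that computes the function using arguments obtained from the iterable. Used instead of map() when argument parameters are already grouped in tuples from a single iterable (the data has been "pre-zipped"). The difference between map() and starmap() parallels the distinction between function(a,b) and function(*c). Roughly equivalent to:

    def starmap(function, iterable):
        # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
        for args in iterable:
            yield function(*args)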
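To make the function(a,b) versus function(*c) distinction concrete, the sketch below feeds the same arguments to map() as separate streams and to starmap() as pre-zipped tuples.

```python
from itertools import starmap

bases = [2, 3, 10]
exponents = [5, 2, 3]

# Separate argument streams: map() pairs them up itself.
via_map = list(map(pow, bases, exponents))

# Arguments already grouped into tuples: starmap() unpacks each tuple.
pairs = list(zip(bases, exponents))
via_starmap = list(starmap(pow, pairs))

print(via_map)      # [32, 9, 1000]
print(via_starmap)  # [32, 9, 1000]
```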
itertools.takewhile(predicate, iterable)
Make an iterator that returns elements from the iterable as long as the predicate is true. Roughly equivalent to:

    def takewhile(predicate, iterable):
        # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
        for x in iterable:
            if predicate(x):
                yield x
            else:
                break
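As the equivalent code suggests, the first element that fails the predicate has already been read from the underlying iterator when takewhile() stops. A short sketch of that consequence:

```python
from itertools import takewhile

it = iter([1, 4, 6, 4, 1])
head = list(takewhile(lambda x: x < 5, it))
rest = list(it)  # whatever the shared iterator still holds

print(head)  # [1, 4]
print(rest)  # [4, 1]  (the 6 was read by takewhile and discarded)
```

If the failing element is needed, the input has to be buffered separately before calling takewhile().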
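itertools.tee(iterable, n=2)
Return n independent iterators from a single iterable. Roughly equivalent to:

    def tee(iterable, n=2):
        it = iter(iterable)
        deques = [collections.deque() for i in range(n)]
        def gen(mydeque):
            while True:
                if not mydeque:             # when the local deque is empty
                    try:
                        newval = next(it)   # fetch a new value and
                    except StopIteration:
                        return
                    for d in deques:        # load it to all the deques
                        d.append(newval)
                yield mydeque.popleft()
        return tuple(gen(d) for d in deques)

Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed.
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().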
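The warning about reusing the original iterable can be demonstrated with a small experiment. The outputs shown reflect CPython's behaviour, where tee() fetches items from the source only on demand; treat this as an illustration of the hazard rather than a guarantee about other implementations.

```python
from itertools import tee

source = iter([3, 1, 4, 1, 5])
a, b = tee(source)

# Each tee iterator keeps its own position in the shared stream.
first_a = [next(a), next(a)]   # [3, 1]
first_b = next(b)              # 3

# Advancing the original behind tee's back: both copies miss this item.
skipped = next(source)         # 4

rest_a = list(a)
rest_b = list(b)
print(rest_a)  # [1, 5]
print(rest_b)  # [1, 1, 5]
```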
itertools.zip_longest(*iterables, fillvalue=None)
Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled in with fillvalue. Iteration continues until the longest iterable is exhausted. Roughly equivalent to:

    class ZipExhausted(Exception):
        pass

    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        counter = len(args) - 1
        def sentinel():
            nonlocal counter
            if not counter:
                raise ZipExhausted
            counter -= 1
            yield fillvalue
        fillers = repeat(fillvalue)
        iterators = [chain(it, sentinel(), fillers) for it in args]
        try:
            while iterators:
                yield tuple(map(next, iterators))
        except ZipExhausted:
            pass

If one of the iterables is potentially infinite, then the zip_longest() function should be wrapped with something that limits the number of calls (for example islice() or takewhile()). If not specified, fillvalue defaults to None.
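A minimal sketch of the wrapping advice above, using count() as the infinite input and islice() as the limiter:

```python
from itertools import count, islice, zip_longest

# count() never ends, so zip_longest(count(), 'ab') on its own would
# keep producing tuples forever. islice() caps the number of results.
bounded = list(islice(zip_longest(count(), 'ab', fillvalue='-'), 4))
print(bounded)  # [(0, 'a'), (1, 'b'), (2, '-'), (3, '-')]
```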
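10.1.2. Itertools Recipes
This section shows recipes for creating an extended toolset using the existing itertools as building blocks.
The extended tools offer the same high performance as the underlying toolset. The superior memory performance is kept by processing elements one at a time rather than bringing the whole iterable into memory all at once. Code volume is kept small by linking the tools together in a functional style, which helps eliminate temporary variables. High speed is retained by preferring "vectorized" building blocks over the use of for-loops and generators, which incur interpreter overhead.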
    # Imports assumed by the recipes in this section
    import collections
    import operator
    import random
    from itertools import (chain, combinations, count, cycle, filterfalse,
                           groupby, islice, repeat, starmap, tee, zip_longest)
    from operator import itemgetter

    def take(n, iterable):
        "Return first n items of the iterable as a list"
        return list(islice(iterable, n))

    def tabulate(function, start=0):
        "Return function(0), function(1), ..."
        return map(function, count(start))

    def tail(n, iterable):
        "Return an iterator over the last n items"
        # tail(3, 'ABCDEFG') --> E F G
        return iter(collections.deque(iterable, maxlen=n))

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)

    def nth(iterable, n, default=None):
        "Returns the nth item or a default value"
        return next(islice(iterable, n, None), default)

    def all_equal(iterable):
        "Returns True if all the elements are equal to each other"
        g = groupby(iterable)
        return next(g, True) and not next(g, False)

    def quantify(iterable, pred=bool):
        "Count how many times the predicate is true"
        return sum(map(pred, iterable))

    def padnone(iterable):
        """Returns the sequence elements and then returns None indefinitely.

        Useful for emulating the behavior of the built-in map() function.
        """
        return chain(iterable, repeat(None))
    def ncycles(iterable, n):
        "Returns the sequence elements n times"
        return chain.from_iterable(repeat(tuple(iterable), n))

    def dotproduct(vec1, vec2):
        return sum(map(operator.mul, vec1, vec2))

    def flatten(listOfLists):
        "Flatten one level of nesting"
        return chain.from_iterable(listOfLists)

    def repeatfunc(func, times=None, *args):
        """Repeat calls to func with specified arguments.

        Example:  repeatfunc(random.random)
        """
        if times is None:
            return starmap(func, repeat(args))
        return starmap(func, repeat(args, times))

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return zip_longest(*args, fillvalue=fillvalue)

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        # Recipe credited to George Sakkis
        pending = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while pending:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                pending -= 1
                nexts = cycle(islice(nexts, pending))
    def partition(pred, iterable):
        'Use a predicate to partition entries into false entries and true entries'
        # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
        t1, t2 = tee(iterable)
        return filterfalse(pred, t1), filter(pred, t2)

    def powerset(iterable):
        "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
        s = list(iterable)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element

    def unique_justseen(iterable, key=None):
        "List unique elements, preserving order. Remember only the element just seen."
        # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
        # unique_justseen('ABBCcAD', str.lower) --> A B C A D
        return map(next, map(itemgetter(1), groupby(iterable, key)))
    def iter_except(func, exception, first=None):
        """ Call a function repeatedly until an exception is raised.

        Converts a call-until-exception interface to an iterator interface.
        Like builtins.iter(func, sentinel) but uses an exception instead
        of a sentinel to end the loop.

        Examples:
            iter_except(functools.partial(heappop, h), IndexError)   # priority queue iterator
            iter_except(d.popitem, KeyError)                         # non-blocking dict iterator
            iter_except(d.popleft, IndexError)                       # non-blocking deque iterator
            iter_except(q.get_nowait, Queue.Empty)                   # loop over a producer Queue
            iter_except(s.pop, KeyError)                             # non-blocking set iterator

        """
        try:
            if first is not None:
                yield first()            # For database APIs needing an initial cast to db.first()
            while True:
                yield func()
        except exception:
            pass

    def first_true(iterable, default=False, pred=None):
        """Returns the first true value in the iterable.

        If no true value is found, returns *default*

        If *pred* is not None, returns the first item
        for which pred(item) is true.

        """
        # first_true([a,b,c], x) --> a or b or c or x
        # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x
        return next(filter(pred, iterable), default)
    def random_product(*args, repeat=1):
        "Random selection from itertools.product(*args, **kwds)"
        pools = [tuple(pool) for pool in args] * repeat
        return tuple(random.choice(pool) for pool in pools)

    def random_permutation(iterable, r=None):
        "Random selection from itertools.permutations(iterable, r)"
        pool = tuple(iterable)
        r = len(pool) if r is None else r
        return tuple(random.sample(pool, r))

    def random_combination(iterable, r):
        "Random selection from itertools.combinations(iterable, r)"
        pool = tuple(iterable)
        n = len(pool)
        indices = sorted(random.sample(range(n), r))
        return tuple(pool[i] for i in indices)

    def random_combination_with_replacement(iterable, r):
        "Random selection from itertools.combinations_with_replacement(iterable, r)"
        pool = tuple(iterable)
        n = len(pool)
        indices = sorted(random.randrange(n) for i in range(r))
        return tuple(pool[i] for i in indices)
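Note, many of the above recipes can be optimized by replacing global lookups with local variables defined as default values. For example, the dotproduct recipe can be written as:

    def dotproduct(vec1, vec2, sum=sum, map=map, mul=operator.mul):
        return sum(map(mul, vec1, vec2))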