A Mechanism for Overriding Ufuncs¶

作者：	布莱克格里菲斯
联系：	blake 。 g @ utexas 。 edu
日期：	2013-07-10
作者：	Pauli Virtanen
作者：	纳撒尼尔·史密斯

Executive summary¶

NumPy的通用函数（ufuncs）目前对于使用__array_prepare__和__array_wrap__ [1]的用户定义的ndarray子类进行操作有一些有限的功能，对任意对象很少甚至没有支持。例如SciPy的稀疏矩阵[2] [3]。

这里我们建议添加一个机制来覆盖ufuncs基于ufunc检查每个参数为__numpy_ufunc__方法。发现__numpy_ufunc__时，ufunc将交给该方法的操作。

这涵盖了与Travis Oliphant提出的通过多方法[4]重写NumPy的方法相同的地方，这将解决同样的问题。这里的机制更密切地遵循Python使类重写__mul__和其他二进制操作的方式。

[1]	http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

[2]	https://github.com/scipy/scipy/issues/2123

[3]	https://github.com/scipy/scipy/issues/1569

[4]	http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html

Motivation¶

目前的调度Ufuncs的机制一般被认为不足。已经进行了长时间的讨论和其他建议的解决方案[5]。

使用带有ndarray子类的ufuncs仅限于__array_prepare__和__array_wrap__来准备参数，但这些不允许你更改参数的形状或数据。尝试ufunc不是子类ndarray的东西更困难，因为输入参数倾向于被转换为对象数组，最终产生令人惊讶的结果。

以这个ufuncs与稀疏矩阵的互操作性为例。：

In [1]: import numpy as np
import scipy.sparse as sp

a = np.random.randint(5, size=(3,3))
b = np.random.randint(5, size=(3,3))

asp = sp.csr_matrix(a)
bsp = sp.csr_matrix(b)

In [2]: a, b
Out[2]:(array([[0, 4, 4],
               [1, 3, 2],
               [1, 3, 1]]),
        array([[0, 1, 0],
               [0, 0, 1],
               [4, 0, 1]]))

In [3]: np.multiply(a, b) # The right answer
Out[3]: array([[0, 4, 0],
               [0, 0, 2],
               [4, 0, 1]])

In [4]: np.multiply(asp, bsp).todense() # calls __mul__ which does matrix multi
Out[4]: matrix([[16,  0,  8],
                [ 8,  1,  5],
                [ 4,  1,  4]], dtype=int64)

In [5]: np.multiply(a, bsp) # Returns NotImplemented to user, bad!
Out[5]: NotImplemted

将NotImplemented返回给用户不应发生。此外：

In [6]: np.multiply(asp, b)
Out[6]: array([[ <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>],
                   [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>],
                   [ <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>,
                    <3x3 sparse matrix of type '<class 'numpy.int64'>'
                with 8 stored elements in Compressed Sparse Row format>]], dtype=object)

这里，看起来稀疏矩阵被转换为对象数组标量，其然后与b数组的所有元素相乘。然而，这种行为比有用的更混乱，并且具有TypeError将是优选的。

添加__numpy_ufunc__功能会修复此问题，并会弃用其他ufunc修改函数。

[5]	http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html

Proposed interface¶

要覆盖Ufuncs的对象可以定义__numpy_ufunc__方法。方法签名是：

def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)

这里：

ufunc是被调用的ufunc对象。
method is a string indicating which Ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner").
i是输入中self的索引。
输入是ufunc的输入参数的元组，
kwargs是传递给函数的关键字参数。out参数始终包含在kwargs中，以下讨论位置变量的传递方式。

ufunc的参数首先被归一化为输入数据的元组（inputs）和关键字参数的dict。如果有输出参数，它们的处理如下：

一个位置输出变量x在kwargs dict中作为out ： x传递。
Multiple positional output variables x0, x1, ... are passed as a tuple in the kwargs dict as out : (x0, x1, ...).
Keyword output variables like out = x and out = (x0, x1, ...) are passed unchanged to the kwargs dict like out : x and out : (x0, x1, ...) respectively.
不支持位置和关键字输出变量的组合。

函数调度进行如下：

如果其中一个输入参数实现__numpy_ufunc__，它将被执行，而不是Ufunc。
如果多个输入参数实现了__numpy_ufunc__，它们将按以下顺序尝试：在超类之前的子类，否则从左到右。返回除了NotImplemented之外的其他方法的第一个__numpy_ufunc__方法确定Ufunc的返回值。
如果输入参数的所有__numpy_ufunc__方法返回NotImplemented，则会引发TypeError。
如果__numpy_ufunc__方法引发错误，则会立即传播错误。

如果没有输入参数具有__numpy_ufunc__方法，则执行会返回到默认的ufunc行为。

In combination with Python’s binary operations¶

__numpy_ufunc__机制完全独立于Python的标准操作符覆盖机制，并且两者不直接交互。

然而，它们具有间接交互，因为NumPy的ndarray类型通过Ufuncs实现其二进制操作。有效地，我们有：

class ndarray(object):
    ...
    def __mul__(self, other):
        return np.multiply(self, other)

假设现在我们有第二个类：

class MyObject(object):
    def __numpy_ufunc__(self, *a, **kw):
        return "ufunc"
    def __mul__(self, other):
        return 1234
    def __rmul__(self, other):
        return 4321

在这种情况下，标准的Python覆盖规则结合上面的讨论暗示：

a = MyObject()
b = np.array([0])

a * b    # == 1234       OK
b * a    # == "ufunc"    surprising

这不是天真的预期，因此是一些不希望的行为。

发生这种情况的原因是：因为MyObject不是ndarray子类，所以Python解析表达式b * / t5>，通过先调用b.__mul__。由于NumPy通过Ufunc实现这一点，因此调用被转发到__numpy_ufunc__而不是__rmul__。注意，如果MyObject是ndarray的子类，Python首先调用a.__rmul__。因此，问题是__numpy_ufunc__实现了ndarray行为的“虚拟子类化”，而没有实际的子类化。

这个问题可以通过修改NumPy中的二进制操作方法来解决：

class ndarray(object):
    ...
    def __mul__(self, other):
        if (not isinstance(other, self.__class__)
                and hasattr(other, '__numpy_ufunc__')
                and hasattr(other, '__rmul__')):
            return NotImplemented
        return np.multiply(self, other)

    def __imul__(self, other):
        if (other.__class__ is not self.__class__
                and hasattr(other, '__numpy_ufunc__')
                and hasattr(other, '__rmul__')):
            return NotImplemented
        return np.multiply(self, other, out=self)

b * a    # == 4321    OK

这里的理由如下：由于用户类明确定义了__numpy_ufunc__和__rmul__，实现者很可能确定__rmul__可以处理ndarrays。如果没有，特殊情况很容易处理（只需调用np.multiply）。

可以排除自我的子类，因为Python本身在这种情况下首先调用右手方法。此外，期望ndarray子类能够从ndarray继承右侧二进制运算方法。

对于就地操作也需要进行相同的优先级重排，使得MyObject.__rmul__优先于ndarray.__imul__。

Demo¶

提出请求[6] _，包括本NEP中提出的更改。这里是一个突出显示功能的演示。：

In [1]: import numpy as np;

In [2]: a = np.array([1])

In [3]: class B():
   ...:     def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
   ...:         return "B"
   ...:

In [4]: b = B()

In [5]: np.dot(a, b)
Out[5]: 'B'

In [6]: np.multiply(a, b)
Out[6]: 'B'

一个简单的__numpy_ufunc__被添加到SciPy的稀疏矩阵目前，这只处理np.dot和np.multiply，因为它是两个最常见的情况其中用户将尝试使用具有ufunc的稀疏矩阵。该方法定义如下：

def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs):
    """Method for compatibility with NumPy's ufuncs and dot
    functions.
    """

    without_self = list(inputs)
    del without_self[pos]
    without_self = tuple(without_self)

    if func == np.multiply:
        return self.multiply(*without_self)

    elif func == np.dot:
        if pos == 0:
            return self.__mul__(inputs[1])
        if pos == 1:
            return self.__rmul__(inputs[0])
    else:
        return NotImplemented

所以我们现在得到的预期行为当使用ufuncs与稀疏矩阵。：

In [1]: import numpy as np; import scipy.sparse as sp

In [2]: a = np.random.randint(3, size=(3,3))

In [3]: b = np.random.randint(3, size=(3,3))

In [4]: asp = sp.csr_matrix(a); bsp = sp.csr_matrix(b)

In [5]: np.dot(a,b)
Out[5]:
array([[2, 4, 8],
       [2, 4, 8],
        [2, 2, 3]])

In [6]: np.dot(asp,b)
Out[6]:
array([[2, 4, 8],
       [2, 4, 8],
       [2, 2, 3]], dtype=int64)

In [7]: np.dot(asp, bsp).A
Out[7]:
array([[2, 4, 8],
       [2, 4, 8],
       [2, 2, 3]], dtype=int64)

目录

上一主题

下一主题