Subclassing ndarray¶

Credits¶

This page is based with thanks on the wiki page on subclassing by Pierre Gerard-Marchant - http://www.scipy.org/Subclasses.

Introduction¶

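Subclassing ndarray is relatively simple, but it has some complications compared to other Python objects. On this page we explain the machinery that allows you to subclass ndarray, and the implications for implementing a subclass.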

ndarrays and object creation¶

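Subclassing ndarray is complicated by the fact that new instances of ndarray classes can come about in three different ways. These are:

Explicit constructor call - as in MySubClass(params). This is the usual route to Python instance creation.
View casting - casting an existing ndarray as a given subclass.
New from template - creating a new instance from a template instance. Examples include returning slices from a subclassed array, creating return types from ufuncs, and copying arrays. See Creating new from template for more details.

The last two are characteristics of ndarrays, needed to support things like array slicing. The complications of subclassing ndarray come from the mechanisms numpy has to support these latter two routes of instance creation.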

View casting¶

View casting is the standard ndarray mechanism by which you take an ndarray of any subclass, and return a view of the array as another (specified) subclass:

>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>

Creating new from template¶

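New instances of an ndarray subclass can also come about by a mechanism very similar to View casting, when numpy finds it needs to create a new instance from a template instance. The most obvious place this has to happen is when you are taking slices of subclassed arrays. For example: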

>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False

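The slice is a view onto the original c_arr data. So, when we take a view from the ndarray, we return a new ndarray, of the same class, that points to the data in the original.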

There are other points in the use of ndarrays where we need such views, such as copying arrays (c_arr.copy()), creating ufunc output arrays (see also __array_wrap__ for ufuncs), and reducing methods (like c_arr.mean()).
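To make the difference between a slice and a copy concrete, here is a minimal sketch that reuses the useless subclass C from above and relies on np.shares_memory to check whether two arrays overlap. Both operations go through new-from-template, so both keep the class, but only the slice points at the original data.

```python
import numpy as np

class C(np.ndarray):
    pass

c_arr = np.zeros((3,)).view(C)

v = c_arr[1:]        # slice: new instance of C, same memory
dup = c_arr.copy()   # copy: new instance of C, separate memory

print(type(v) is C, np.shares_memory(v, c_arr))      # True True
print(type(dup) is C, np.shares_memory(dup, c_arr))  # True False

v[0] = 7.0           # writing through the view changes c_arr[1]
print(c_arr[1] == 7.0, dup[1] == 0.0)                # True True
```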

Relationship of view casting and new-from-template¶

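These paths both use the same machinery. We make the distinction here because they result in different input to your methods. Specifically, View casting means you have created a new instance of your array type from any potential subclass of ndarray. Creating new from template means you have created a new instance of your class from a pre-existing instance, allowing you, for example, to copy across attributes that are particular to your subclass.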

Implications for subclassing¶

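If we subclass ndarray, we need to deal not only with explicit construction of our array type, but also with View casting or Creating new from template. Numpy has the machinery to do this, and it is this machinery that makes subclassing slightly non-standard.

There are two aspects to the machinery that ndarray uses to support views and new-from-template in subclasses.

The first is the use of the ndarray.__new__ method for the main work of object initialization, rather than the more usual __init__ method. The second is the use of the __array_finalize__ method to allow subclasses to clean up after the creation of views and new instances from templates.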

A brief Python primer on `__new__` and `__init__`¶

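__new__ is a standard Python method, and, if present, is called before __init__ when we create a class instance. See the python __new__ documentation for more detail.

For example, consider the following Python code: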

class C(object):
    def __new__(cls, *args):
        print('Cls in __new__:', cls)
        print('Args in __new__:', args)
        # object.__new__ takes only the class here; passing args raises TypeError
        return object.__new__(cls)

    def __init__(self, *args):
        print('type(self) in __init__:', type(self))
        print('Args in __init__:', args)

meaning that we get:

>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)

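When we call C('hello'), the __new__ method gets its own class as first argument, and the passed argument, which is the string 'hello'. After python calls __new__, it usually (see below) calls our __init__ method, with the output of __new__ as the first argument (now a class instance), and the passed arguments following.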

As you can see, the object can be initialized in the __new__ method or the __init__ method, or both, and in fact ndarray does not have an __init__ method, because all the initialization is done in the __new__ method.

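Why use __new__ rather than just the usual __init__? Because in some cases, as for ndarray, we want to be able to return an object of some other class. Consider the following: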

class D(C):
    def __new__(cls, *args):
        print('D cls is:', cls)
        print('D args in __new__:', args)
        return C.__new__(C, *args)

    def __init__(self, *args):
        # we never get here
        print('In D __init__')

meaning that:

>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>

The definition of C is the same as before, but for D, the __new__ method returns an instance of class C rather than D. Note that the __init__ method of D does not get called. In general, when the __new__ method returns an object of a class other than the class in which it is defined, the __init__ method of that class is not called.

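This is how subclasses of the ndarray class are able to return views that preserve the class type. When taking a view, the standard ndarray machinery creates the new ndarray object with something like: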

obj = ndarray.__new__(subtype, shape, ...

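where subtype is the subclass. Thus the returned view is of the same class as the subclass, rather than being of class ndarray.

That solves the problem of returning views of the same type, but now we have a new problem. The machinery of ndarray can set the class this way, in its standard methods for taking views, but the ndarray __new__ method knows nothing of what we have done in our own __new__ method in order to set attributes, and so on. (Aside - why not call obj = subtype.__new__(... then? Because we may not have a __new__ method with the same call signature.)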
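The aside about call signatures can be illustrated with a minimal sketch. Labeled, its required label argument, and the new_calls list are hypothetical names made up for this illustration; the constructor uses view casting, which is covered in more detail further down. Slicing succeeds even though numpy could not possibly supply label, because the slice is built without going through Labeled.__new__.

```python
import numpy as np

new_calls = []  # records every call to Labeled.__new__

class Labeled(np.ndarray):
    # Hypothetical subclass whose constructor needs an extra required argument.
    def __new__(cls, input_array, label):
        new_calls.append(label)
        obj = np.asarray(input_array).view(cls)
        obj.label = label
        return obj

a = Labeled([1, 2, 3], label="x")
s = a[1:]   # created by numpy's own machinery, not by Labeled.__new__

print(type(s) is Labeled)   # True
print(new_calls)            # ['x'] : __new__ ran once, for the explicit call only
print(hasattr(s, "label"))  # False: nothing copied the attribute across
```

The missing label on the slice is exactly the gap that __array_finalize__ is meant to fill.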

The role of `__array_finalize__`¶

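__array_finalize__ is the mechanism that numpy provides to allow subclasses to handle the various ways that new instances get created.

Remember that subclass instances can come about in these three ways:

Explicit constructor call (obj = MySubClass(params)). This will call the usual sequence of MySubClass.__new__ then (if it exists) MySubClass.__init__.
View casting
Creating new from template

Our MySubClass.__new__ method only gets called in the case of the explicit constructor call, so we can't rely on MySubClass.__new__ or MySubClass.__init__ to deal with view casting and new-from-template. It turns out that MySubClass.__array_finalize__ does get called for all three methods of object creation, so this is where our object creation housekeeping usually goes.

For the explicit constructor call, our subclass will need to create a new ndarray instance of its own class. In practice this means that we, the authors of the code, will need to make a call to ndarray.__new__(MySubClass,...), or do view casting of an existing array (see below).
For view casting and new-from-template, the equivalent of ndarray.__new__(MySubClass,...) is called, at the C level.

The arguments that __array_finalize__ receives differ for the three methods of instance creation above.

The following code allows us to look at the call sequences and arguments: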

import numpy as np

class C(np.ndarray):
    def __new__(cls, *args, **kwargs):
        print('In __new__ with class %s' % cls)
        return np.ndarray.__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # in practice you probably will not need or want an __init__
        # method for your subclass
        print('In __init__ with class %s' % self.__class__)

    def __array_finalize__(self, obj):
        print('In array_finalize:')
        print('   self type is %s' % type(self))
        print('   obj type is %s' % type(obj))

Now:

>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>

The signature of __array_finalize__ is:

def __array_finalize__(self, obj):

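ndarray.__new__ passes __array_finalize__ the new object, of our own class (self), as well as the object from which the view has been taken (obj). As you can see from the output above, self is always a newly created instance of our subclass, and the type of obj differs for the three instance creation methods:

When called from the explicit constructor, obj is None.
When called from view casting, obj can be an instance of any subclass of ndarray, including our own.
When called in new-from-template, obj is another instance of our own subclass, which we might use to update the new self instance.

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.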
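The three cases listed above can be told apart, up to one ambiguity, by inspecting obj. The following is a minimal sketch; the class name Tracked and the attribute origin are hypothetical and exist only for this illustration. Note that a view cast from an array that is already of our class looks the same as new-from-template, so the second branch cannot distinguish them.

```python
import numpy as np

class Tracked(np.ndarray):
    def __array_finalize__(self, obj):
        if obj is None:
            # Explicit constructor: ndarray.__new__(Tracked, ...) was called
            self.origin = "explicit constructor"
        elif isinstance(obj, Tracked):
            # New-from-template, or a view cast of an existing Tracked array
            self.origin = "template or view cast of our own class"
        else:
            # View casting from a plain ndarray or some other subclass
            self.origin = "view casting"

a = Tracked((3,))                # explicit constructor
b = np.arange(3).view(Tracked)   # view casting
c = b[1:]                        # new-from-template (slicing)

print(a.origin)  # explicit constructor
print(b.origin)  # view casting
print(c.origin)  # template or view cast of our own class
```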

This may be clearer with an example.

Simple example - adding an extra attribute to ndarray¶

import numpy as np

class InfoArray(np.ndarray):

    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
          strides=None, order=None, info=None):
        # Create the ndarray instance of our type, given the usual
        # ndarray input arguments.  This will call the standard
        # ndarray constructor, but return an object of our type.
        # It also triggers a call to InfoArray.__array_finalize__
        obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset, strides,
                         order)
        # set the new 'info' attribute to the value passed
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # ``self`` is a new object resulting from
        # ndarray.__new__(InfoArray, ...), therefore it only has
        # attributes that the ndarray.__new__ constructor gave it -
        # i.e. those of a standard ndarray.
        #
        # We could have got to the ndarray.__new__ call in 3 ways:
        # From an explicit constructor - e.g. InfoArray():
        #    obj is None
        #    (we're in the middle of the InfoArray.__new__
        #    constructor, and self.info will be set when we return to
        #    InfoArray.__new__)
        if obj is None: return
        # From view casting - e.g arr.view(InfoArray):
        #    obj is arr
        #    (type(obj) can be InfoArray)
        # From new-from-template - e.g infoarr[:3]
        #    type(obj) is InfoArray
        #
        # Note that it is here, rather than in the __new__ method,
        # that we set the default value for 'info', because this
        # method sees all creation of default objects - with the
        # InfoArray.__new__ constructor, but also with
        # arr.view(InfoArray).
        self.info = getattr(obj, 'info', None)
        # We do not need to return anything

Using the object looks like this:

>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True

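This class isn't very useful, because it has the same constructor as the bare ndarray object, including passing in buffers and shapes and so on. We would probably prefer the constructor to be able to take an already formed ndarray from the usual numpy calls to np.array and return an object.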

Slightly more realistic example - attribute added to existing array¶

Here is a class that takes a standard ndarray that already exists, casts it as our type, and adds an extra attribute.

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

So:

>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
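One consequence of building the instance by view casting is worth spelling out: the new object shares memory with the input array rather than copying it. The sketch below repeats the RealisticInfoArray definition so that it runs on its own, and shows that writes go through to the original array while copy() produces independent data and still carries info across.

```python
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

arr = np.arange(5)
obj = RealisticInfoArray(arr, info='information')

obj[0] = 99
print(arr[0])            # 99: obj is a view onto arr, not a copy
print(obj.base is arr)   # True

dup = obj.copy()         # new-from-template, with its own memory
dup[0] = -1
print(arr[0], dup.info)  # 99 information
```

If the caller should not be affected by later writes, passing a copy of the input (for example arr.copy()) to the constructor is one way to decouple the two.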

`__array_wrap__` for ufuncs¶

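__array_wrap__ gets called at the end of numpy ufuncs and other numpy functions, to allow a subclass to set the type of the return value and update attributes and metadata. Let's show how this works with an example. First we make the same subclass as above, but with a different name and some print statements: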

import numpy as np

class MySubClass(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        print('In __array_finalize__:')
        print('   self is %s' % repr(self))
        print('   obj is %s' % repr(obj))
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None):
        print('In __array_wrap__:')
        print('   self is %s' % repr(self))
        print('   arr is %s' % repr(out_arr))
        # then just call the parent
        return np.ndarray.__array_wrap__(self, out_arr, context)

We run a ufunc on an instance of our new array:

>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
   self is MySubClass([0, 1, 2, 3, 4])
   obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
   self is MySubClass([0, 1, 2, 3, 4])
   arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
   self is MySubClass([1, 3, 5, 7, 9])
   obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'

Note that the ufunc (np.add) has called the __array_wrap__ method of the input with the highest __array_priority__ value, in this case MySubClass.__array_wrap__, with arguments self as obj, and out_arr as the (ndarray) result of the addition. In turn, the default __array_wrap__ (ndarray.__array_wrap__) has cast the result to class MySubClass, and called __array_finalize__ - hence the copying of the info attribute. This has all happened at the C level.

But, we could do anything we want:

class SillySubClass(np.ndarray):

    def __array_wrap__(self, arr, context=None):
        return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'

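So, by defining a specific __array_wrap__ method for our subclass, we can tweak the output from ufuncs. The __array_wrap__ method requires self, then an argument, which is the result of the ufunc, and an optional parameter context. This parameter is supplied by some ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc, domain of the ufunc). __array_wrap__ should return an instance of its containing class. See the masked array subclass for an implementation.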
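Which input's __array_wrap__ is used when several subclasses meet in one ufunc call depends on __array_priority__, as noted earlier. The following is a minimal sketch; the class names Low and High and the priority values 1.0 and 2.0 are arbitrary choices for illustration, and neither class overrides __array_wrap__, so the default casts the result to the winning class.

```python
import numpy as np

class Low(np.ndarray):
    __array_priority__ = 1.0

class High(np.ndarray):
    __array_priority__ = 2.0

lo = np.arange(3).view(Low)
hi = np.arange(3).view(High)

# The argument order should not matter; the higher priority is expected to win.
res_a = np.add(lo, hi)
res_b = np.add(hi, lo)
print(type(res_a).__name__, type(res_b).__name__)
```

With these priorities both results are expected to be of class High.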

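In addition to __array_wrap__, which is called on the way out of the ufunc, there is also an __array_prepare__ method which is called on the way into the ufunc, after the output arrays are created but before any computation has been performed. The default implementation does nothing but pass through the array. __array_prepare__ should not attempt to access the array data or resize the array. It is intended for setting the output array type, updating attributes and metadata, and performing any checks based on the input that may be desired before computation begins. Like __array_wrap__, __array_prepare__ must return an ndarray or subclass thereof or raise an error.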

Extra gotchas - custom `__del__` methods and ndarray.base¶

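One of the problems that ndarray solves is keeping track of memory ownership of ndarrays and their views. Consider the case where we have created an ndarray, arr, and have taken a slice with v = arr[1:]. The two objects are looking at the same memory. Numpy keeps track of where the data came from for a particular array or view, with the base attribute: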

>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the original array that owns the data
>>> v2.base is arr
True

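In general, if the array owns its own memory, as for arr in this case, then arr.base will be None - there are some exceptions to this - see the numpy book for more details.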

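The base attribute is useful in being able to tell whether we have a view or the original array. This in turn can be useful if we need to know whether or not to do some specific cleanup when the subclassed array is deleted. For example, we may only want to do the cleanup if the original array is deleted, but not the views. For an example of how this can work, have a look at the memmap class in numpy.core.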
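As a rough illustration of that idea, here is a minimal sketch of a subclass whose __del__ only "cleans up" when the instance owns its memory. The names Owner and cleanup_log are hypothetical, the list stands in for releasing a real resource, and the timing shown assumes CPython's reference counting, where del on the last reference runs __del__ immediately. This is not how memmap is implemented; it only shows the base check.

```python
import numpy as np

cleanup_log = []  # hypothetical stand-in for releasing a real resource

class Owner(np.ndarray):
    def __del__(self):
        # Views have a non-None base, so only the owning array cleans up.
        if self.base is None:
            cleanup_log.append("released")

owner = Owner((4,))   # explicit constructor: owns its memory, base is None
view = owner[1:]      # view: base is owner

del view
n_after_view = len(cleanup_log)    # 0 under CPython: the view was skipped

del owner
n_after_owner = len(cleanup_log)   # 1 under CPython: the owner cleaned up
print(n_after_view, n_after_owner)
```

Instances created by view casting (for example np.asarray(x).view(Owner)) always have a non-None base, so a check like this would never fire for them; a real subclass would need a different ownership marker in that case.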
