Arrays base tools

Functions

clean_dtype(dtype[, sort])

Remove offsets from dtype, keeping only names and dtype.

remove_structured_offset(array)

Remove offset fields from structured array.

set_or_add_to_structured(array, data[, copy])

Updates existing structured array, either by replacing the data for existing fields, or by adding new fields to the array.

to_structured(arrays)

Casts a list of arrays and names (and optional dtypes) to a numpy structured array.

to_structured(arrays)

Casts a list of arrays and names (and optional dtypes) to a numpy structured array.

See NumPy’s documentation for how to use structured arrays.

A pandas DataFrame can be converted into a numpy record array using DataFrame.to_records() (see the pandas to_records documentation and the NumPy documentation on record arrays). A record array can then be converted into a structured array with:

>>> recordarr.view(recordarr.dtype.fields, numpy.ndarray)
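As a minimal sketch of this conversion that avoids a pandas dependency, the snippet below builds a record array with numpy.rec.fromarrays (standing in for the result of DataFrame.to_records()) and views it as a plain structured array. The `or rec.dtype` fallback follows the idiom from NumPy's own documentation and only matters when the dtype has no fields; the variable names are illustrative.

```python
import numpy

# Stand-in for DataFrame.to_records(): a small record array with two fields.
rec = numpy.rec.fromarrays(
    [numpy.arange(3, dtype='uint32'),
     numpy.array([0.5, 1.5, 2.5], dtype='float32')],
    names=['a', 'b'],
)

# View the same memory as a plain structured ndarray (a view, not a copy).
struct = rec.view(rec.dtype.fields or rec.dtype, numpy.ndarray)

print(type(rec).__name__, '->', type(struct).__name__)
print(struct['a'], struct['b'])
```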

Parameters

arrays (list-of-tuple) – list((name, array-like, *dtype-infos)) or {name: array-like}. A lone value is broadcast to an array filled with this value, with the same size as the other arrays

Returns

struct-array

Examples

>>> to_structured([
...     ('a', numpy.arange(5), 'uint32'),
...     ('b', 2 * numpy.arange(5), 'float32')
... ])
array([(0, 0.), (1, 2.), (2, 4.), (3, 6.), (4, 8.)],
  dtype=[('a', '<u4'), ('b', '<f4')])

A single value can also be used; it is broadcast to an array filled with this value, with the same size as the other arrays. For instance:

>>> to_structured([
...     ('a', numpy.arange(5), 'uint32'),
...     ('b', 2, 'float32')
... ])
array([(0, 2.), (1, 2.), (2, 2.), (3, 2.), (4, 2.)],
  dtype=[('a', '<u4'), ('b', '<f4')])
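The broadcasting described above mirrors what plain NumPy does when a scalar is assigned to a field. The following sketch reproduces the result of this example without to_structured; it illustrates the behaviour and is not the library's implementation.

```python
import numpy

n = 5
out = numpy.empty(n, dtype=[('a', 'uint32'), ('b', 'float32')])
out['a'] = numpy.arange(n)   # one value per record
out['b'] = 2                 # the scalar is broadcast to every record

print(out)
```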

If the dtype is omitted from a tuple, the function uses the array’s dtype, or infers it for sequences of Python objects.

>>> to_structured([
...     ('a', numpy.arange(5, dtype='uint32')),
...     ('b', [2 * i for i in range(5)])
... ])
array([(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)],
  dtype=[('a', '<u4'), ('b', '<i8')])
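Assuming the function defers to NumPy's usual inference for Python sequences, the '<i8' shown above depends on the platform (NumPy 1.x on Windows, for example, defaults to 32-bit integers). A quick way to check what NumPy would infer, using only numpy.asarray:

```python
import numpy

values = [2 * i for i in range(5)]
inferred = numpy.asarray(values).dtype

# The default integer width is platform dependent; only the kind is certain.
print(inferred, inferred.kind)

# Passing an explicit dtype removes the ambiguity.
fixed = numpy.asarray(values, dtype='int64')
print(fixed.dtype)
```

Passing the dtype explicitly in the tuple, as in the first example, gives the same guarantee.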

Using dictionaries:

>>> to_structured({
...     'a': numpy.arange(5, dtype='uint32'),
...     'b': [2 * i for i in range(5)]
... })
array([(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)],
  dtype=[('a', '<u4'), ('b', '<i8')])

(n, m) 2D numpy arrays are viewed as (n,) arrays of (m,) 1D arrays:

>>> to_structured([
...     ('a', numpy.arange(5)),
...     ('b', numpy.arange(15).reshape(5, 3))
... ])
array([(0, [ 0,  1,  2]), (1, [ 3,  4,  5]), (2, [ 6,  7,  8]),
   (3, [ 9, 10, 11]), (4, [12, 13, 14])],
  dtype=[('a', '<i8'), ('b', '<i8', (3,))])
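
The ('b', '<i8', (3,)) entry above is a NumPy subarray field: each record holds a length-3 vector. Below is a sketch of the same layout built by hand, with an explicit int64 dtype so that the result does not depend on the platform. It uses plain NumPy only and is not the library's code.

```python
import numpy

n, m = 5, 3
out = numpy.empty(n, dtype=[('a', 'int64'), ('b', 'int64', (m,))])
out['a'] = numpy.arange(n)
out['b'] = numpy.arange(n * m).reshape(n, m)   # row i goes into record i

print(out['b'].shape)   # the field is seen as an (n, m) array
print(out[1]['b'])      # the subarray stored in a single record
```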

set_or_add_to_structured(array, data, copy=True)

Updates existing structured array, either by replacing the data for existing fields, or by adding new fields to the array.

Fast alternative to numpy.lib.recfunctions.append_fields.

Parameters
  • array (struct-array) – array to update

  • data (list-of-tuple) – list((name, array-or-scalar)) (scalars are broadcast)

  • copy (bool?) – set to False to avoid copy when possible (default: True)

Returns

struct-array

Examples

Adding field to existing structured array:

>>> array = to_structured([
...     ('a', numpy.arange(5, dtype='uint8')),
...     ('b', 2 * numpy.arange(5, dtype='uint16')),
... ])
>>> new_data = 3 * numpy.arange(5, dtype='float32')
>>> updated_array = set_or_add_to_structured(array, [
...     ('c', new_data),
... ])
>>> updated_array
array([(0, 0,  0.), (1, 2,  3.), (2, 4,  6.), (3, 6,  9.), (4, 8, 12.)],
  dtype=[('a', 'u1'), ('b', '<u2'), ('c', '<f4')])
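
For comparison, here is how the same field can be added with numpy.lib.recfunctions.append_fields, the NumPy function this one is described as a faster alternative to. The structured array is built directly with NumPy so that the snippet does not depend on this module; usemask=False asks for a plain ndarray rather than a masked array.

```python
import numpy
from numpy.lib import recfunctions

array = numpy.empty(5, dtype=[('a', 'uint8'), ('b', 'uint16')])
array['a'] = numpy.arange(5)
array['b'] = 2 * numpy.arange(5)
new_data = 3 * numpy.arange(5, dtype='float32')

# append_fields builds a new array and copies every field into it.
appended = recfunctions.append_fields(array, 'c', new_data, usemask=False)

print(appended.dtype.names)
print(appended['c'])
```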

Replacing data from a structured array:

>>> updated_array = set_or_add_to_structured(array, [
...     ('b', new_data)
... ])
>>> updated_array
array([(0,  0), (1,  3), (2,  6), (3,  9), (4, 12)],
  dtype=[('a', 'u1'), ('b', '<u2')])

Or doing both, while adding broadcasted constants:

>>> updated_array = set_or_add_to_structured(array, [
...     ('b', new_data),
...     ('c', 2 * new_data),
...     ('d', 1),
...     ('e', b'1')
... ])
>>> updated_array
array([
    (0,  0,  0., 1, b'1'),
    (1,  3,  6., 1, b'1'),
    (2,  6, 12., 1, b'1'),
    (3,  9, 18., 1, b'1'),
    (4, 12, 24., 1, b'1')],
    dtype=[('a', 'u1'), ('b', '<u2'), ('c', '<f4'), ('d', '<i8'), ('e', 'S1')])
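
One way to picture the combined replace-and-add operation is as a copy into a freshly allocated array whose dtype is the old one extended with the new fields. The helper below, set_or_add_sketch, is a hypothetical illustration written for this page, not the library's implementation. In particular, it assumes that a replaced field keeps its existing dtype, as the 'b' column does in the output above.

```python
import numpy

def set_or_add_sketch(array, data):
    """Hypothetical helper: replace existing fields, append new ones."""
    descr = [(name, array.dtype[name]) for name in array.dtype.names]
    for name, values in data:
        if name not in array.dtype.names:
            # New field: take the dtype NumPy infers for the values.
            descr.append((name, numpy.asarray(values).dtype))
    out = numpy.empty(array.shape, dtype=descr)
    for name in array.dtype.names:
        out[name] = array[name]
    for name, values in data:
        out[name] = values   # arrays are cast, scalars are broadcast
    return out

base = numpy.empty(5, dtype=[('a', 'uint8'), ('b', 'uint16')])
base['a'] = numpy.arange(5)
base['b'] = 2 * numpy.arange(5)
new_data = 3 * numpy.arange(5, dtype='float32')

updated = set_or_add_sketch(base, [('b', new_data), ('c', 2 * new_data), ('d', 1)])
print(updated.dtype.names)
print(updated['b'], updated['d'])
```

This sketch always copies; the copy=False option of the real function is not modelled here.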

clean_dtype(dtype, sort=False)

Remove offsets from dtype, keeping only names and dtype. (See Numpy dtype documentation.)

Parameters
  • dtype (dtype-descr) – either a numpy.dtype, or a description of it

  • sort (bool?) – whether to sort the fields by name (default: False)

Returns

clean dtype, without offsets (sorted by field names when sort is True)

Example

>>> d = numpy.dtype({
...     'names': ['z_col', 'd_col', 'a_col'],
...     'formats': ['i4', 'f4', 'i4'],
...     'offsets': [0, 4, 40]
... })
>>> d
dtype({'names':['z_col','d_col','a_col'], 'formats':['<i4','<f4','<i4'], 'offsets':[0,4,40], 'itemsize':44})
>>> clean_dtype(d, sort=True)
[('a_col', dtype('int32')),
('d_col', dtype('float32')),
('z_col', dtype('int32'))]
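
To see what removing offsets means in practice, the sketch below rebuilds the same list of (name, dtype) pairs with plain NumPy and compares item sizes. It sorts by field name to match the output above and is only an illustration, not the function's actual code.

```python
import numpy

d = numpy.dtype({
    'names': ['z_col', 'd_col', 'a_col'],
    'formats': ['i4', 'f4', 'i4'],
    'offsets': [0, 4, 40],
})

# dtype.fields maps each name to (dtype, offset); keep only the dtype.
pairs = [(name, d.fields[name][0]) for name in sorted(d.names)]
packed = numpy.dtype(pairs)

print(pairs)
print(d.itemsize, '->', packed.itemsize)   # padding from the offsets is gone
```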

remove_structured_offset(array)

Remove offset fields from structured array. Does not copy the data if the dtype does not have offsets.

Parameters

array (array) – structured array

Returns

structured array without offsets

Example

>>> a = numpy.array([(1, 2, 3), (4, 5, 6)], [('a', 'i4'), ('b', 'i4'), ('c', 'i4')])
>>> b = a[['c', 'a']]
>>> b.dtype
dtype({'names':['c','a'], 'formats':['<i4','<i4'], 'offsets':[8,0], 'itemsize':12})
>>> b = remove_structured_offset(b)
>>> b.dtype
dtype([('c', '<i4'), ('a', '<i4')])
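
NumPy ships a related helper, numpy.lib.recfunctions.repack_fields, which also produces a packed copy of a multi-field view. The sketch below applies it to the same array to show the effect on the dtype and item size. Whether remove_structured_offset relies on it internally is not stated here, so treat this as a comparison only.

```python
import numpy
from numpy.lib import recfunctions

a = numpy.array([(1, 2, 3), (4, 5, 6)],
                dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')])
view = a[['c', 'a']]              # keeps the parent's offsets and item size
packed = recfunctions.repack_fields(view)

print(view.dtype.itemsize, '->', packed.dtype.itemsize)
print(packed.dtype.names, packed.tolist())
```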