Arrays base tools
Functions
| clean_dtype(dtype, sort=False) | Remove offsets from dtype, keeping only names and dtype. |
| remove_structured_offset(array) | Remove offset fields from structured array. |
| set_or_add_to_structured(array, data, copy=True) | Updates existing structured array, either by replacing the data for existing fields, or by adding new fields to the array. |
| to_structured(arrays) | Casts a list of arrays and names (with optional dtypes) to a numpy structured array. |
- to_structured(arrays)
Casts a list of arrays and names (with optional dtypes) to a numpy structured array.
See NumPy's documentation for how to use structured arrays.
A pandas DataFrame can be converted into a numpy record array using DataFrame.to_records() (see the pandas to_records documentation and the NumPy documentation on record arrays). A record array can then be converted into a structured array with:
>>> recordarr.view(recordarr.dtype.fields, numpy.ndarray)
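The record-to-structured conversion can be tried without pandas. The sketch below is only an illustration: it builds a small record array with numpy.rec.fromarrays as a stand-in for the output of DataFrame.to_records() (the field names and values are made up), and it adds the `or recordarr.dtype` fallback used in NumPy's own documentation so that arrays without named fields are also handled.

```python
import numpy

# Stand-in for the result of DataFrame.to_records(): a small record array.
# The field names "a" and "b" and the values are illustrative only.
recordarr = numpy.rec.fromarrays(
    [numpy.arange(3), numpy.array([0.5, 1.5, 2.5])],
    names="a,b",
)

# View the same memory as a plain structured ndarray (no copy is made).
structured = recordarr.view(
    recordarr.dtype.fields or recordarr.dtype, numpy.ndarray
)

print(type(recordarr).__name__)   # recarray
print(type(structured).__name__)  # ndarray
print(structured.dtype.names)     # ('a', 'b')
```

Because this is a view, writing to `structured` also changes `recordarr`.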
- Parameters
arrays (list-of-tuple) – list((name, array-like, *dtype-infos)) or {name: array-like}. A lone value will be broadcast as an array full of this value, with the size of the other arrays.
- Returns
struct-array
Examples
>>> to_structured([
...     ('a', numpy.arange(5), 'uint32'),
...     ('b', 2 * numpy.arange(5), 'float32')
... ])
array([(0, 0.), (1, 2.), (2, 4.), (3, 6.), (4, 8.)],
      dtype=[('a', '<u4'), ('b', '<f4')])
A single value can also be used. It is broadcast to an array filled with this value and with the same size as the other arrays, for instance:
>>> to_structured([
...     ('a', numpy.arange(5), 'uint32'),
...     ('b', 2, 'float32')
... ])
array([(0, 2.), (1, 2.), (2, 2.), (3, 2.), (4, 2.)],
      dtype=[('a', '<u4'), ('b', '<f4')])
Not specifying the dtype in the tuples will cause the function to use the array’s dtype, or to infer it in case of sequences of Python objects.
>>> to_structured([
...     ('a', numpy.arange(5, dtype='uint32')),
...     ('b', [2 * i for i in range(5)])
... ])
array([(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)],
      dtype=[('a', '<u4'), ('b', '<i8')])
Using dictionaries:
>>> to_structured({
...     'a': numpy.arange(5, dtype='uint32'),
...     'b': [2 * i for i in range(5)]
... })
array([(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)],
      dtype=[('a', '<u4'), ('b', '<i8')])
(n, m) 2D numpy arrays are viewed as (n,) arrays of (m,) 1D arrays:
>>> to_structured([
...     ('a', numpy.arange(5)),
...     ('b', numpy.arange(15).reshape(5, 3))
... ])
array([(0, [ 0,  1,  2]), (1, [ 3,  4,  5]), (2, [ 6,  7,  8]),
       (3, [ 9, 10, 11]), (4, [12, 13, 14])],
      dtype=[('a', '<i8'), ('b', '<i8', (3,))])
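To make the behaviours shown in these examples concrete (optional dtypes, scalar broadcasting, dictionaries, and 2D inputs stored as sub-arrays), here is a minimal plain-NumPy sketch. The name to_structured_sketch is hypothetical and this is not the library's implementation; it only mirrors what the examples document.

```python
import numpy


def to_structured_sketch(columns):
    """Hypothetical re-creation of the documented behaviour, NumPy only.

    `columns` is a list of (name, values) or (name, values, dtype) tuples,
    or a {name: values} dictionary.
    """
    if isinstance(columns, dict):
        columns = list(columns.items())

    names = [col[0] for col in columns]
    arrays = [
        numpy.asarray(col[1], dtype=col[2] if len(col) > 2 else None)
        for col in columns
    ]

    # The common length comes from the non-scalar inputs.
    lengths = {a.shape[0] for a in arrays if a.ndim > 0}
    if len(lengths) != 1:
        raise ValueError("need at least one array, all of the same length")
    (size,) = lengths

    # (n, m) inputs become fields holding (m,) sub-arrays.
    descr = [
        (name, a.dtype) if a.ndim <= 1 else (name, a.dtype, a.shape[1:])
        for name, a in zip(names, arrays)
    ]

    out = numpy.empty(size, dtype=descr)
    for name, a in zip(names, arrays):
        out[name] = a  # scalars are broadcast by NumPy on assignment
    return out


result = to_structured_sketch([
    ("a", numpy.arange(5), "uint32"),
    ("b", 2, "float32"),
    ("c", numpy.arange(15).reshape(5, 3)),
])
print(result.dtype.names)  # ('a', 'b', 'c')
```

Allocating the output once and filling it field by field avoids building intermediate Python tuples for every row.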
- set_or_add_to_structured(array, data, copy=True)
Updates existing structured array, either by replacing the data for existing fields, or by adding new fields to the array.
Fast alternative to numpy.lib.recfunctions.append_fields.
- Parameters
array (struct-array) – array to update
data (list-of-tuple) – list((name, array-or-scalar)) (scalars are broadcast)
copy (bool, optional) – set to False to avoid a copy when possible (default: True)
- Returns
struct-array
Examples
Adding a field to an existing structured array:
>>> array = to_structured([
...     ('a', numpy.arange(5, dtype='uint8')),
...     ('b', 2 * numpy.arange(5, dtype='uint16')),
... ])
>>> new_data = 3 * numpy.arange(5, dtype='float32')
>>> updated_array = set_or_add_to_structured(array, [
...     ('c', new_data),
... ])
>>> updated_array
array([(0, 0, 0.), (1, 2, 3.), (2, 4, 6.), (3, 6, 9.), (4, 8, 12.)],
      dtype=[('a', 'u1'), ('b', '<u2'), ('c', '<f4')])
Replacing data in a structured array (the field keeps its existing dtype):
>>> updated_array = set_or_add_to_structured(array, [
...     ('b', new_data)
... ])
>>> updated_array
array([(0, 0), (1, 3), (2, 6), (3, 9), (4, 12)],
      dtype=[('a', 'u1'), ('b', '<u2')])
Or doing both, while adding broadcast constants:
>>> updated_array = set_or_add_to_structured(array, [
...     ('b', new_data),
...     ('c', 2 * new_data),
...     ('d', 1),
...     ('e', b'1')
... ])
>>> updated_array
array([(0,  0,  0., 1, b'1'), (1,  3,  6., 1, b'1'),
       (2,  6, 12., 1, b'1'), (3,  9, 18., 1, b'1'),
       (4, 12, 24., 1, b'1')],
      dtype=[('a', 'u1'), ('b', '<u2'), ('c', '<f4'), ('d', '<i8'), ('e', 'S1')])
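For readers who want to see the replace-or-add logic spelled out, the following is a rough plain-NumPy sketch. The helper set_or_add_sketch is hypothetical and is not the library's code: it always copies, handles only scalar and 1D data, and ignores the copy parameter. The call to numpy.lib.recfunctions.append_fields is included only as the comparison point named above.

```python
import numpy
from numpy.lib import recfunctions


def set_or_add_sketch(array, data):
    """Hypothetical helper: copy `array`, replacing or adding fields."""
    values = [(name, numpy.asarray(value)) for name, value in data]

    # Existing fields keep their dtype; new fields use the dtype of the data.
    descr = [(name, array.dtype[name]) for name in array.dtype.names]
    descr += [
        (name, value.dtype)
        for name, value in values
        if name not in array.dtype.names
    ]

    out = numpy.empty(array.shape, dtype=descr)
    for name in array.dtype.names:
        out[name] = array[name]
    for name, value in values:
        out[name] = value  # scalars broadcast; replaced fields are cast
    return out


base = numpy.zeros(5, dtype=[("a", "u1"), ("b", "<u2")])
base["a"] = numpy.arange(5)
base["b"] = 2 * numpy.arange(5)
new_data = 3 * numpy.arange(5, dtype="float32")

updated = set_or_add_sketch(
    base, [("b", new_data), ("c", 2 * new_data), ("d", 1)]
)
print(updated.dtype.names)  # ('a', 'b', 'c', 'd')

# NumPy's own helper only appends fields; it does not replace existing ones.
appended = recfunctions.append_fields(base, "c", 2 * new_data, usemask=False)
```

The input array is left untouched, since the sketch always writes into a freshly allocated output.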
- clean_dtype(dtype, sort=False)
Remove offsets from dtype, keeping only names and dtype. (See Numpy dtype documentation.)
- Parameters
dtype (dtype-descr) – either a numpy.dtype, or a description of it
sort (bool?) – (default: False)
- Returns
clean dtype, without offsets, sorted by field-names
Example
>>> d = numpy.dtype({
...     'names': ['z_col', 'd_col', 'a_col'],
...     'formats': ['i4', 'f4', 'i4'],
...     'offsets': [0, 4, 40]
... })
>>> d
dtype({'names':['z_col','d_col','a_col'], 'formats':['<i4','<f4','<i4'], 'offsets':[0,4,40], 'itemsize':44})
>>> clean_dtype(d)
[('a_col', dtype('int32')), ('d_col', dtype('float32')), ('z_col', dtype('int32'))]
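The effect of dropping offsets can be checked with plain NumPy. The sketch below uses a hypothetical clean_dtype_sketch helper, not the library function; in particular, treating sort as "order fields alphabetically when True" is this sketch's own reading of the parameter name, not something the documentation above confirms.

```python
import numpy


def clean_dtype_sketch(dtype, sort=False):
    """Hypothetical helper: list (name, field dtype) pairs, without offsets."""
    dtype = numpy.dtype(dtype)
    names = sorted(dtype.names) if sort else dtype.names
    # dtype.fields maps each name to (field dtype, byte offset, ...);
    # keeping only the first item discards the offset.
    return [(name, dtype.fields[name][0]) for name in names]


d = numpy.dtype({
    "names": ["z_col", "d_col", "a_col"],
    "formats": ["i4", "f4", "i4"],
    "offsets": [0, 4, 40],
})

cleaned = clean_dtype_sketch(d, sort=True)
packed = numpy.dtype(cleaned)

print(d.itemsize)       # 44
print(packed.itemsize)  # 12
print(packed.names)     # ('a_col', 'd_col', 'z_col')
```

Rebuilding a dtype from the cleaned list yields a packed layout: three 4-byte fields occupy 12 bytes instead of the 44 bytes implied by the original offsets.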
- remove_structured_offset(array)
Remove offset fields from structured array. Does not copy the data if the dtype does not have offsets.
- Parameters
array (array) – structured array
- Returns
structured array without offsets
Example
>>> a = numpy.array([(1, 2, 3), (4, 5, 6)],
...                 [('a', 'i4'), ('b', 'i4'), ('c', 'i4')])
>>> b = a[['c', 'a']]
>>> b.dtype
dtype({'names':['c','a'], 'formats':['<i4','<i4'], 'offsets':[8,0], 'itemsize':12})
>>> b = remove_structured_offset(b)
>>> b.dtype
dtype([('c', '<i4'), ('a', '<i4')])
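For comparison, NumPy ships a related helper, numpy.lib.recfunctions.repack_fields, which produces a packed copy of a structured array whose dtype carries offsets. The sketch below shows that NumPy function on the same data as the example; it is not the implementation of remove_structured_offset, and whether the two behave identically in every case is not claimed here.

```python
import numpy
from numpy.lib import recfunctions

a = numpy.array(
    [(1, 2, 3), (4, 5, 6)],
    dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")],
)

# On recent NumPy versions, multi-field indexing returns a view whose dtype
# keeps the original byte offsets of the selected fields.
b = a[["c", "a"]]

# repack_fields returns an array with a packed dtype: two 4-byte fields.
packed = recfunctions.repack_fields(b)

print(packed.dtype.names)     # ('c', 'a')
print(packed.dtype.itemsize)  # 8
print(packed["c"].tolist())   # [3, 6]
```

Packing matters when the array is written to disk or passed to code that assumes contiguous fields, since the unpacked view still spans the bytes of the dropped field.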