Array tools
Functions
cumcount_by_value | Compute the rank of appearance of entries grouped by input value.
first | Efficiently find next index satisfying predicate in a 1d array.
in1d | Test whether each element of a 1d array is also present in a second 1d array.
kargmax | Returns the indices of the k maximum values along a specified axis.
kargmin | Returns the indices of the k minimum values along a specified axis.
lexsort_uint32_pair | Faster alternative to numpy.lexsort for two uint32 arrays.
search | Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.
set_or_reallocate | Assigns values in array, starting at array's offset index.
structured_arrays_mean | Computes average of each field given a list of struct-arrays.
unique_count | Count the number of unique elements in values.
- structured_arrays_mean(arrays, keep_missing=False)
Computes the average of each field given a list of struct-arrays.
By default, only fields that are present in every input struct-array are averaged and returned.
The average is taken over the concatenated fields, as if they were in a single structured array, as opposed to averaging the results of several arr['field'].mean() calls.
- Parameters
arrays (list-of-array) – list(struct-array)
keep_missing (boolean?) – if True, use all fields from all arrays, including ones missing from some arrays; if False, only keep the intersection of fields defined in all arrays (default: False)
- Returns
struct-array of shape (1,)
Examples
Obtaining the average of every field of a structured array:
>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> structured_arrays_mean([arr1])
array([(2., 4.)], dtype=[('a', '<f8'), ('b', '<f8')])
Getting the average of the same field scattered across several structured arrays:
>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> arr2 = to_structured([
...     ('a', numpy.ones(3)),
...     ('b', 2 * numpy.ones(3)),
...     ('c', 3 * numpy.ones(3))
... ])
>>> structured_arrays_mean([arr1, arr2])
array([(1.625, 3.25)], dtype=[('a', '<f8'), ('b', '<f8')])
If you want to include all fields, including fields that are only available in a few arrays, use option keep_missing=True:
>>> structured_arrays_mean([arr1, arr2], keep_missing=True)
array([(1.625, 3.25, 3.)], dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
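The difference between averaging over concatenated fields and averaging per-array means can be checked with plain numpy. The sketch below does not call structured_arrays_mean; the two plain arrays are stand-ins for field 'a' of arr1 and arr2 in the examples above.

```python
import numpy

# Stand-ins for field 'a' of arr1 and arr2 (assumption: plain arrays, not struct-arrays)
field_a_1 = numpy.arange(5, dtype=float)   # [0, 1, 2, 3, 4]
field_a_2 = numpy.ones(3)                  # [1, 1, 1]

# Average over the concatenated values, which is what the documentation describes
pooled_mean = numpy.concatenate([field_a_1, field_a_2]).mean()

# Average of the per-array means, which weights each array equally regardless of length
mean_of_means = numpy.mean([field_a_1.mean(), field_a_2.mean()])

print(pooled_mean)     # 1.625
print(mean_of_means)   # 1.5
```

The pooled value matches the 1.625 shown for field 'a' above, while the mean of means does not, because the two arrays have different lengths.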
- first(a, predicate=None, batch_size=None, offset=0)
Efficiently find next index satisfying predicate in a 1d array.
Can also be used on an array of booleans without a predicate.
A comparable function has been proposed for NumPy itself: https://github.com/numpy/numpy/issues/2269
- Parameters
a (array) – (n,) dtype array
predicate (callable?) – function(dtype-array -> bool-array) (or None if dtype=bool)
batch_size (int?) – (default: 4k)
offset (int?) – (default: 0)
- Returns
int, index of first value satisfying predicate
- Raises
StopIteration – if there is no index satisfying predicate after offset
Examples
With predicate:
>>> a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])
>>> idx = first(a, lambda x: x == 2)
>>> idx
3
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
5
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
7
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
StopIteration:
Without predicate on array of booleans:
>>> mask = numpy.array([0, 0, 0, 1, 0, 1, 0, 0], '?')
>>> idx = first(mask)
>>> idx
3
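The batched scan suggested by the batch_size parameter can be sketched in plain numpy. This is an illustration only, not the library's implementation: first_index is a hypothetical name, and the 4096 default is an assumed reading of "4k".

```python
import numpy

def first_index(a, predicate=None, batch_size=4096, offset=0):
    """Hypothetical sketch: scan `a` in batches and stop at the first match."""
    for start in range(offset, len(a), batch_size):
        chunk = a[start:start + batch_size]
        # Without a predicate, the chunk itself is expected to be boolean
        mask = chunk if predicate is None else predicate(chunk)
        hits = numpy.flatnonzero(mask)
        if hits.size:
            return start + int(hits[0])
    # Mirrors the documented behaviour when nothing matches after `offset`
    raise StopIteration

a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])
print(first_index(a, lambda x: x == 2))             # 3
print(first_index(a, lambda x: x == 2, offset=4))   # 5
```

Evaluating the predicate batch by batch avoids computing it over the whole array when a match occurs early, at the cost of one Python-level loop iteration per batch.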
- set_or_reallocate(array, values, offset, growing_factor=2.0, fill=None)
Assigns values (of length n2) in array (of length n1), starting at array's offset index.
Returns the same array if there is enough space in array for assigning array[offset:offset+n2, :] = values; otherwise, when n2 > n1 - offset, returns a new array expanded by growing_factor.
- Parameters
array (array) – (n1, *extra-dims) array
values (array) – (n2, *extra-dims) array
offset (int) – index of array at which to start assigning values: array[offset:offset+n2, :] = values
growing_factor (float?) – growing factor > 1 (default: 2)
fill (float?) – fill value, or None to leave empty
- Returns
the same array if n1 >= offset + n2, or a new array with copied data otherwise; in both cases array[offset:offset+n2, :] = values is assigned
Examples
If n1 >= offset + n2, the same array is returned with array[offset:offset+n2, :] = values. In this case, the assignment is in place and the input array will be affected.
>>> array = numpy.arange(10)
>>> values = - 2 * numpy.arange(5)
>>> set_or_reallocate(array, values, offset=5)
array([ 0, 1, 2, 3, 4, 0, -2, -4, -6, -8])
>>> array
array([ 0, 1, 2, 3, 4, 0, -2, -4, -6, -8])
Otherwise, a new expanded array with copied data is created, with array[offset:offset+n2, :] = values. Without specifying a fill value, the newly expanded data will be arbitrary, starting at index offset + n2. Since a new array is created, the input array will not be affected.
>>> array = numpy.arange(10)
>>> set_or_reallocate(array, values, offset=10)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, -2, -4, -6, -8, 5572452860762084442, 140512663005448, 670512663005448, 0, 2814751914590207])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> set_or_reallocate(array, values, offset=10, fill=0)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, -2, -4, -6, -8, 0, 0, 0, 0, 0])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
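The assign-or-grow behaviour described above can be sketched in plain numpy. assign_or_grow is a hypothetical name, and growing repeatedly until the values fit is an assumption; the library may size the new array differently.

```python
import numpy

def assign_or_grow(array, values, offset, growing_factor=2.0, fill=None):
    """Hypothetical sketch of the documented behaviour, not the library's code."""
    n1, n2 = len(array), len(values)
    if offset + n2 <= n1:
        # Enough room: assign in place and return the very same object
        array[offset:offset + n2] = values
        return array
    # Not enough room: grow by growing_factor until the values fit (assumption)
    new_len = n1
    while new_len < offset + n2:
        new_len = max(new_len + 1, int(new_len * growing_factor))
    shape = (new_len,) + array.shape[1:]
    if fill is None:
        grown = numpy.empty(shape, dtype=array.dtype)   # contents are arbitrary
    else:
        grown = numpy.full(shape, fill, dtype=array.dtype)
    grown[:n1] = array                     # copy the existing data
    grown[offset:offset + n2] = values
    return grown

array = numpy.arange(10)
values = -2 * numpy.arange(5)
grown = assign_or_grow(array, values, offset=10, fill=0)
print(len(grown))   # 20
```

Growing geometrically rather than by exactly n2 keeps the number of reallocations low when values are appended repeatedly.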
- lexsort_uint32_pair(a, b)
Faster alternative to numpy.lexsort for two uint32 arrays.
- Parameters
a (array) – (n,) uint32-array
b (array) – (n,) uint32-array
- Returns
(n,) permutation array that sorts by a first and b second
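One way to obtain such a permutation with plain numpy is to pack each pair into a single uint64 key and sort once. The sketch below is an assumption about how a speed-up of this kind could be obtained, not a description of the library's implementation. It also assumes that "a first and b second" means a is the primary key, which corresponds to numpy.lexsort((b, a)).

```python
import numpy

a = numpy.array([2, 1, 2, 1], dtype=numpy.uint32)
b = numpy.array([0, 5, 1, 3], dtype=numpy.uint32)

# Reference: numpy.lexsort treats the LAST key as the primary one
reference = numpy.lexsort((b, a))

# Sketch: put `a` in the high 32 bits and `b` in the low 32 bits, then sort once
packed = (a.astype(numpy.uint64) << numpy.uint64(32)) | b.astype(numpy.uint64)
permutation = numpy.argsort(packed, kind="stable")

print(reference.tolist())     # [3, 1, 0, 2]
print(permutation.tolist())   # [3, 1, 0, 2]
```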
- kargmax(a, k, axis=0, do_sort=False)
Returns the indices of the k maximum values along a specified axis. Set do_sort to True to get these indices ordered so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.
- Parameters
a (arr) – (n,) input array
k (int) – sets the k in ‘top k maximum values’
axis (int?) – axis along which to look for argmax. Default: 0
do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.max(). Default: False
- Return arr
(k,) indexes of the k maximum values along the specified axis
Examples
>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmax(a, 2)
array([9, 2])
The output array's order is arbitrary. To have the output ordered so that a[output] is sorted, use do_sort=True:
>>> kargmax(a, 2, do_sort=True)
array([2, 9])
On ndarrays:
>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])
>>> kargmax(a, 2, axis=0)
array([[2, 1, 2, 0],
       [1, 2, 0, 1]])
>>> kargmax(a, 2, axis=1)
array([[3, 2],
       [0, 3],
       [0, 1]])
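Since kargmax is described as a wrapper around numpy.argpartition, the underlying idea can be sketched as follows. topk_indices is a hypothetical name and the code is an illustration under that description, not the library's implementation.

```python
import numpy

def topk_indices(a, k, axis=0, do_sort=False):
    """Hypothetical sketch: indices of the k largest values along `axis`."""
    a = numpy.asarray(a)
    n = a.shape[axis]
    # After partitioning at n - k, the last k positions hold the k largest values
    part = numpy.argpartition(a, n - k, axis=axis)
    idx = numpy.take(part, numpy.arange(n - k, n), axis=axis)
    if do_sort:
        # Reorder the k indices so that the selected values are in decreasing order
        vals = numpy.take_along_axis(a, idx, axis=axis)
        order = numpy.flip(numpy.argsort(vals, axis=axis), axis=axis)
        idx = numpy.take_along_axis(idx, order, axis=axis)
    return idx

a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
print(topk_indices(a, 2, do_sort=True))   # [2 9]
```

Partitioning selects the k largest entries without fully sorting the input, which is why the unsorted output order is arbitrary; only the k selected entries are sorted when do_sort=True.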
- kargmin(a, k, axis=0, do_sort=False)
Returns the indices of the k minimum values along a specified axis. Set do_sort to True to get these indices ordered so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.
- Parameters
a (arr) – (n,) input array
k (int) – sets the k in ‘top k minimum values’
axis (int?) – axis along which to look for argmin. Default: 0
do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.min(). Default: False
- Return arr
(k,) indexes of the k minimum values along the specified axis
Examples
>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmin(a, 2)
array([1, 6])
The output array's order is arbitrary. To have the output ordered so that a[output] is sorted, use do_sort=True:
>>> kargmin(a, 2, do_sort=True)
array([6, 1])
On ndarrays:
>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])
>>> kargmin(a, 2, axis=0)
array([[0, 0, 1, 2],
       [2, 1, 2, 0]])
>>> kargmin(a, 2, axis=1)
array([[1, 0],
       [2, 1],
       [2, 3]])
- in1d(needles, haystack, invert=False)
Test whether each element of a 1d array is also present in a second 1d array.
- Parameters
needles (array) – (n1,) dtype array
haystack (array) – (n2,) dtype array
invert (bool?) – if True, the values in the returned array are inverted; in1d(a, b, invert=True) is equivalent to ~in1d(a, b). (default: False)
- Returns
(n1,) bool array
Example
>>> needles = numpy.array([5, 10, 20])
>>> haystack = numpy.arange(15)
>>> in1d(needles, haystack)
array([True, True, False])
>>> in1d(needles, haystack, invert=True)
array([False, False, True])
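For comparison, the same membership test and the invert equivalence can be reproduced with numpy.isin, which is part of NumPy itself. This only illustrates the semantics; it says nothing about how in1d is implemented here.

```python
import numpy

needles = numpy.array([5, 10, 20])
haystack = numpy.arange(15)

present = numpy.isin(needles, haystack)
absent = numpy.isin(needles, haystack, invert=True)

print(present.tolist())   # [True, True, False]
# invert=True gives the element-wise negation of the plain result
print(numpy.array_equal(absent, ~present))   # True
```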
- search(needles, haystack, idx_dtype='uint32')
Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.
If an element of needles is not found in haystack, the corresponding value returned in indexes is 0.
- Parameters
needles (array) – (n1,) dtype array
haystack (array) – (n2,) dtype array
idx_dtype (dtype?) – (default: uint32)
- Returns
tuple(indexes: (n1,) idx_dtype array of indexes in haystack (values < n2), found: (n1,) bool array)
Example
>>> needles = numpy.array([1000, 2000, 3000])
>>> haystack = numpy.arange(50)
>>> haystack[10] = needles[0]
>>> haystack[20] = needles[1]
>>> search(needles, haystack)
(array([10, 20, 0], dtype=uint32), array([ True, True, False]))
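The documented contract (an index per needle, plus a found mask, with 0 as the placeholder for misses) can be sketched with numpy.argsort and numpy.searchsorted. search_sketch is a hypothetical name, the approach is an assumption rather than the library's method, and the sketch assumes a non-empty haystack.

```python
import numpy

def search_sketch(needles, haystack, idx_dtype="uint32"):
    """Hypothetical sketch: locate each needle in a non-empty haystack."""
    order = numpy.argsort(haystack, kind="stable")
    sorted_hay = haystack[order]
    # Candidate position of each needle in the sorted haystack
    pos = numpy.searchsorted(sorted_hay, needles)
    pos = numpy.minimum(pos, len(haystack) - 1)   # keep positions in bounds
    found = sorted_hay[pos] == needles
    # Map back to positions in the original haystack; misses are reported as 0
    indexes = numpy.where(found, order[pos], 0).astype(idx_dtype)
    return indexes, found

needles = numpy.array([1000, 2000, 3000])
haystack = numpy.arange(50)
haystack[10] = needles[0]
haystack[20] = needles[1]
indexes, found = search_sketch(needles, haystack)
print(indexes.tolist(), found.tolist())   # [10, 20, 0] [True, True, False]
```

Because 0 is also a valid index, the found mask is the only reliable way to tell a miss from a match at position 0.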
- cumcount_by_value(values, assume_sorted=False)
Compute the rank of appearance of entries grouped by input value.
If a value appears for the 5th time in the values array at index i, output[i] will be 4 (ranks are zero-based, as the example below shows).
- Parameters
values (array) – uint32 array
assume_sorted (bool?) – if True, the input values are assumed sorted (default: False); for non-integer values, the sort order can be arbitrary (see examples below); for integer values, the order must be that of increasing integers
- Returns
array of int
Example
>>> array = numpy.array([0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 3, 3])
>>> cumcount_by_value(array)
array([0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4])
>>> array = numpy.array(['beta', 'alpha', 'gamma', 'alpha', 'beta', 'alpha', 'delta'])
>>> cumcount_by_value(array)
array([0, 0, 0, 1, 1, 2, 0])
>>> array = numpy.array(['A', 'B', 'B', 'C', 'C', 'C'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 0, 1, 0, 1, 2])
When using assume_sorted on non-integer values, the order can be arbitrary:
>>> array = numpy.array(['B', 'B', 'C', 'C', 'C', 'A'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 1, 0, 1, 2, 0])
- unique_count(values)
Count the number of unique elements in values.
- Parameters
values (array) – (n,) dtype array
- Returns
int, number of unique values
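For reference, the same count can be obtained with plain numpy as shown below. This is only a baseline for checking results; unique_count may use a different method.

```python
import numpy

values = numpy.array([3, 1, 3, 2, 1])

# numpy.unique returns the sorted distinct values; its size is the unique count
n_unique = numpy.unique(values).size
print(n_unique)   # 3
```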