Array tools

Functions

`cumcount_by_value`(values[, assume_sorted])	Compute the rank of appearance of entries grouped by input value.
`first`(a[, predicate, batch_size, offset])	Efficiently find next index satisfying predicate in a 1d array.
`in1d`(needles, haystack[, invert])	Test whether each element of a 1d array is also present in a second 1d array.
`kargmax`(a, k[, axis, do_sort])	Returns the indices of the k maximum values along a specified axis.
`kargmin`(a, k[, axis, do_sort])	Returns the indices of the k minimum values along a specified axis.
`lexsort_uint32_pair`(a, b)	Faster alternative to numpy.lexsort for two uint32 arrays.
`search`(needles, haystack[, idx_dtype])	Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.
`set_or_reallocate`(array, values, offset[, ...])	Assigns `values` (of length `n2`) in `array` (of length `n1`), starting at array's `offset` index.
`structured_arrays_mean`(arrays[, keep_missing])	Computes average of each field given a list of struct-arrays.
`unique_count`(values)	Count the number of unique elements in `values`.

structured_arrays_mean(arrays, keep_missing=False)

Computes average of each field given a list of struct-arrays.

By default, only fields that are present in every input struct-array are averaged and returned.

The average is computed over the concatenated fields, as if they belonged to a single structured array, rather than by averaging the results of several arr['field'].mean() calls.

Parameters

arrays (list-of-array) – list(struct-array)
keep_missing (boolean?) – if True, use all fields from all arrays, including ones missing from some arrays; if False, only keep the intersection of fields defined in all arrays (default: False)

Returns

struct-array of shape (1,)

Examples

Obtaining the average of every field of a structured array:

>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> structured_arrays_mean([arr1])
array([(2., 4.)],
  dtype=[('a', '<f8'), ('b', '<f8')])

Getting the average of the same field scattered across several structured arrays:

>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> arr2 = to_structured([
...     ('a', numpy.ones(3)),
...     ('b', 2 * numpy.ones(3)),
...     ('c', 3 * numpy.ones(3))
... ])
>>> structured_arrays_mean([arr1, arr2])
array([(1.625, 3.25)],
  dtype=[('a', '<f8'), ('b', '<f8')])

If you want to include all fields, including fields that are only available in a few arrays, use option keep_missing=True:

>>> structured_arrays_mean([arr1, arr2], keep_missing=True)
array([(1.625, 3.25, 3.)],
  dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
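
The difference between averaging over concatenated fields and averaging per-array means can be checked with plain numpy. The following is a minimal sketch that builds the structured arrays directly from numpy dtypes (instead of the to_structured helper used above) and does not call structured_arrays_mean itself:

```python
import numpy

# Two structured arrays sharing the field 'a' (built with plain numpy dtypes).
arr1 = numpy.zeros(5, dtype=[('a', 'f8')])
arr1['a'] = numpy.arange(5)          # 0, 1, 2, 3, 4
arr2 = numpy.zeros(3, dtype=[('a', 'f8')])
arr2['a'] = numpy.ones(3)            # 1, 1, 1

# Mean over the concatenated field: (0+1+2+3+4 + 1+1+1) / 8
pooled_mean = numpy.concatenate([arr1['a'], arr2['a']]).mean()

# Mean of the per-array means: (2.0 + 1.0) / 2
mean_of_means = numpy.mean([arr1['a'].mean(), arr2['a'].mean()])

print(pooled_mean)     # 1.625, the value reported for field 'a' above
print(mean_of_means)   # 1.5
```

The pooled value weights every row equally, so longer arrays contribute more to the result.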

first(a, predicate=None, batch_size=None, offset=0)

Efficiently find next index satisfying predicate in a 1d array.

Can also be used on an array of booleans without a predicate.

A comparable function has been requested upstream for numpy: https://github.com/numpy/numpy/issues/2269

Parameters

a (array) – (n,) dtype array
predicate (callable?) – function(dtype-array -> bool-array) (or None if a has dtype bool)
batch_size (int?) – (default: 4k)
offset (int?) – (default: 0)

Returns

int, index of first value satisfying predicate

Raises

StopIteration – if there is no index satisfying predicate after offset

Examples

With predicate:

>>> a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])
>>> idx = first(a, lambda x: x == 2)
>>> idx
3

>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
5

>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
7

>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
StopIteration:

Without predicate on array of booleans:

>>> mask = numpy.array([0, 0, 0, 1, 0, 1, 0, 0], '?')
>>> first(mask)
3
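
The batch_size parameter suggests that the array is scanned in chunks, so the search can stop early instead of evaluating the predicate on the whole array. The following is a minimal sketch of that idea, not the library's implementation; the name first_batched and the default of 4096 elements are assumptions made for illustration:

```python
import numpy

def first_batched(a, predicate=None, batch_size=4096, offset=0):
    """Hypothetical sketch: scan `a` in chunks and stop at the first match."""
    for start in range(offset, len(a), batch_size):
        chunk = a[start:start + batch_size]
        # Without a predicate, the chunk itself is used as the boolean mask.
        mask = chunk if predicate is None else predicate(chunk)
        hits = numpy.flatnonzero(mask)
        if hits.size:
            return start + int(hits[0])
    # Same convention as documented above: no match after `offset`.
    raise StopIteration

a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])
print(first_batched(a, lambda x: x == 2))             # 3
print(first_batched(a, lambda x: x == 2, offset=4))   # 5
```

Smaller batches stop sooner when a match is near the start; larger batches reduce Python-level loop overhead when matches are rare.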

set_or_reallocate(array, values, offset, growing_factor=2.0, fill=None)

Assigns values (of length n2) in array (of length n1), starting at array’s offset index.

Returns the same array if there is enough space in array to assign array[offset:offset+n2, :] = values. Otherwise (when n2 > n1 - offset), returns a new array expanded by growing_factor.

Parameters

array (array) – (n1, *extra-dims) array
values (array) – (n2, *extra-dims) array
offset (int) – index of array to start assigning values. array[offset:offset+n2, :] = values
growing_factor (float?) – growing factor > 1 (default: 2.0)
fill (float?) – fill value for newly allocated entries, or None to leave them uninitialized

Returns

the same array if n1 >= offset + n2, otherwise a new array with the data copied over; in both cases array[offset:offset+n2, :] = values has been assigned

Examples

If n1 >= offset + n2, the same array is returned with array[offset:offset+n2, :] = values. In this case, the assignment is done in place and the input array is modified.

>>> array  = numpy.arange(10)
>>> values = - 2 * numpy.arange(5)
>>> set_or_reallocate(array, values, offset=5)
array([ 0,  1,  2,  3,  4,  0, -2, -4, -6, -8])
>>> array
array([ 0,  1,  2,  3,  4,  0, -2, -4, -6, -8])

Otherwise, a new, expanded array is created with the data copied over and array[offset:offset+n2, :] = values assigned. If no fill value is specified, the entries from index offset + n2 onward are arbitrary (uninitialized). Since a new array is created, the input array is not affected.

>>> array  = numpy.arange(10)
>>> set_or_reallocate(array, values, offset=10)
array([                  0,                   1,                   2,
                         3,                   4,                   5,
                         6,                   7,                   8,
                         9,                   0,                  -2,
                        -4,                  -6,                  -8,
       5572452860762084442,     140512663005448,     670512663005448,
                         0,    2814751914590207])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> set_or_reallocate(array, values, offset=10, fill=0)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  0, -2, -4, -6, -8,  0,  0,
    0,  0,  0])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
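
The behaviour described above follows the usual "grow by a factor" buffer pattern. The following is a minimal sketch under stated assumptions, not the library's code: the name grow_and_set is hypothetical, and the rule of growing to at least offset + n2 when the factor alone is not enough is an assumption.

```python
import numpy

def grow_and_set(array, values, offset, growing_factor=2.0, fill=None):
    """Hypothetical sketch of the set-or-reallocate pattern."""
    n1, n2 = len(array), len(values)
    if offset + n2 > n1:
        # Assumption: grow by the factor, but at least enough to fit `values`.
        new_len = max(offset + n2, int(n1 * growing_factor))
        shape = (new_len,) + array.shape[1:]
        if fill is None:
            grown = numpy.empty(shape, dtype=array.dtype)   # contents arbitrary
        else:
            grown = numpy.full(shape, fill, dtype=array.dtype)
        grown[:n1] = array      # copy the existing data
        array = grown           # the caller's array is left untouched
    array[offset:offset + n2] = values
    return array

buf = numpy.arange(10)
out = grow_and_set(buf, -2 * numpy.arange(5), offset=10, fill=0)
print(len(out))      # 20
print(out is buf)    # False: a new array was allocated
```

Because the return value may or may not be the input array, callers should always rebind: array = set_or_reallocate(array, values, offset).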

lexsort_uint32_pair(a, b)

Faster alternative to numpy.lexsort for two uint32 arrays.

Parameters

a (array) – (n,) uint32-array
b (array) – (n,) uint32-array

Returns

(n,) permutation array that sorts by a first and by b second
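
For reference, sorting "by a first and b second" corresponds to numpy.lexsort((b, a)), since numpy.lexsort treats its last key as the primary one. One common way to speed this up for two uint32 keys is to pack each pair into a single uint64 key and sort once. Whether lexsort_uint32_pair works this way internally is an assumption; the sketch below only shows that the two orderings agree:

```python
import numpy

rng = numpy.random.default_rng(0)
a = rng.integers(0, 4, size=20, dtype=numpy.uint32)
b = rng.integers(0, 4, size=20, dtype=numpy.uint32)

# Reference: numpy.lexsort uses the LAST key as the primary sort key.
reference = numpy.lexsort((b, a))

# Assumed trick: pack each (a, b) pair into one uint64 key, a in the high bits.
packed = (a.astype(numpy.uint64) << numpy.uint64(32)) | b.astype(numpy.uint64)
order = numpy.argsort(packed, kind='stable')

print(numpy.array_equal(order, reference))   # True
```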

kargmax(a, k, axis=0, do_sort=False)

Returns the indices of the k maximum values along a specified axis. Set do_sort to True to order these indices so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.

Parameters

a (arr) – (n,) input array
k (int) – sets the k in ‘top k maximum values’
axis (int?) – axis along which to look for argmax (default: 0)
do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.max() (default: False)

Returns

(k,) array of indexes of the k maximum values along the specified axis

Examples

>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmax(a, 2)
array([9, 2])

The output array’s order is arbitrary. To order the output so that a[output] is sorted, use do_sort=True:

>>> kargmax(a, 2, do_sort=True)
array([2, 9])

On ndarrays:

>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])

>>> kargmax(a, 2, axis=0)
array([[2, 1, 2, 0],
       [1, 2, 0, 1]])

>>> kargmax(a, 2, axis=1)
array([[3, 2],
       [0, 3],
       [0, 1]])
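
Since the function is described as a wrapper around numpy.argpartition, the 1d case can be sketched as follows. The name top_k_indices is hypothetical and this is an illustration of the technique, not the library's code:

```python
import numpy

def top_k_indices(a, k, do_sort=False):
    """Hypothetical 1d sketch built on numpy.argpartition."""
    # After partitioning at position -k, the last k slots hold the k largest values.
    idx = numpy.argpartition(a, -k)[-k:]
    if do_sort:
        # Order the k indices so that a[idx] is decreasing.
        idx = idx[numpy.argsort(a[idx])[::-1]]
    return idx

a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
print(top_k_indices(a, 2, do_sort=True))   # [2 9]
```

Partitioning runs in linear time on average, and only the k selected entries are sorted when do_sort is set, which is why this is cheaper than a full argsort for small k.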

kargmin(a, k, axis=0, do_sort=False)

Returns the indices of the k minimum values along a specified axis. Set do_sort to True to order these indices so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.

Parameters

a (arr) – (n,) input array
k (int) – sets the k in ‘k minimum values’
axis (int?) – axis along which to look for argmin (default: 0)
do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.min() (default: False)

Returns

(k,) array of indexes of the k minimum values along the specified axis

Examples

>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmin(a, 2)
array([1, 6])

The output array’s order is arbitrary. To order the output so that a[output] is sorted, use do_sort=True:

>>> kargmin(a, 2, do_sort=True)
array([6, 1])

On ndarrays:

>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])

>>> kargmin(a, 2, axis=0)
array([[0, 0, 1, 2],
       [2, 1, 2, 0]])

>>> kargmin(a, 2, axis=1)
array([[1, 0],
       [2, 1],
       [2, 3]])
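
For the n-dimensional case, the same numpy.argpartition approach can be combined with numpy.take_along_axis to order the selected indices along the chosen axis. This is a hedged sketch; the name bottom_k_indices is hypothetical and the library may proceed differently:

```python
import numpy

def bottom_k_indices(a, k, axis=0, do_sort=False):
    """Hypothetical n-d sketch built on numpy.argpartition."""
    # The first k slots along `axis` hold the indices of the k smallest values.
    part = numpy.argpartition(a, k - 1, axis=axis)
    idx = numpy.take(part, numpy.arange(k), axis=axis)
    if do_sort:
        # Reorder the k indices so the selected values increase along `axis`.
        vals = numpy.take_along_axis(a, idx, axis=axis)
        idx = numpy.take_along_axis(idx, numpy.argsort(vals, axis=axis), axis=axis)
    return idx

a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
print(bottom_k_indices(a, 2, axis=1, do_sort=True))
# [[1 0]
#  [2 1]
#  [2 3]]
```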

in1d(needles, haystack, invert=False)

Test whether each element of a 1d array is also present in a second 1d array.

Parameters

needles (array) – (n1,) dtype array
haystack (array) – (n2,) dtype array
invert (bool?) – if True, the values in the returned array are inverted: in1d(a, b, invert=True) is equivalent to ~in1d(a, b) (default: False)

Returns

(n1,) bool array

Example

>>> needles = numpy.array([5,10,20])
>>> haystack = numpy.arange(15)
>>> in1d(needles, haystack)
array([ True,  True, False])
>>> in1d(needles, haystack, invert=True)
array([False, False,  True])
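
For comparison, plain numpy provides numpy.isin, which gives the same answers on this example. Whether in1d delegates to it or uses its own implementation is not stated here, so treat this only as a cross-check:

```python
import numpy

needles = numpy.array([5, 10, 20])
haystack = numpy.arange(15)

present = numpy.isin(needles, haystack)
absent = numpy.isin(needles, haystack, invert=True)

print(present)   # [ True  True False]
print(absent)    # [False False  True]
```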

search(needles, haystack, idx_dtype='uint32')

Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.

If an element of needles is not found in haystack, the corresponding value returned in indexes is 0.

Parameters

needles (array) – (n1,) dtype array
haystack (array) – (n2,) dtype array
idx_dtype (dtype?) – (default: uint32)

Returns

tuple(indexes: (n1,) idx_dtype array of indexes in haystack (each < n2), found: (n1,) bool array)

Example

>>> needles = numpy.array([1000, 2000, 3000])
>>> haystack = numpy.arange(50)
>>> haystack[10] = needles[0]
>>> haystack[20] = needles[1]
>>> search(needles, haystack)
(array([10, 20,  0], dtype=uint32), array([ True,  True, False]))
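
Because a missing needle also yields index 0, the found array must be consulted before using indexes. The lookup itself can be sketched in plain numpy with a sorted view and numpy.searchsorted. The name search_sketch is hypothetical, a non-empty haystack is assumed, and the library's actual algorithm may differ (for example, it could be hash-based):

```python
import numpy

def search_sketch(needles, haystack, idx_dtype='uint32'):
    """Hypothetical sketch: locate needles via a sorted view of haystack."""
    order = numpy.argsort(haystack, kind='stable')
    pos = numpy.searchsorted(haystack, needles, sorter=order)
    pos = numpy.minimum(pos, len(haystack) - 1)   # clamp out-of-range insert points
    candidates = order[pos]
    found = haystack[candidates] == needles
    # Follow the documented convention: index 0 where the needle is missing.
    indexes = numpy.where(found, candidates, 0).astype(idx_dtype)
    return indexes, found

needles = numpy.array([1000, 2000, 3000])
haystack = numpy.arange(50)
haystack[10] = needles[0]
haystack[20] = needles[1]
indexes, found = search_sketch(needles, haystack)
print(indexes)   # [10 20  0]
print(found)     # [ True  True False]
```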

cumcount_by_value(values, assume_sorted=False)

Compute the rank of appearance of entries grouped by input value.

Ranks are zero-based: if a value appears for the 5th time in the values array at index i, output[i] will be 4.

Parameters

values (array) – uint32 array
assume_sorted (bool?) – if True, the input values are assumed to be sorted (default: False). For non-integer values, the sort order can be arbitrary (see examples below); for integer values, the order must be increasing.

Returns

array of int

Example

>>> array = numpy.array([0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 3, 3])
>>> cumcount_by_value(array)
array([0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4])
>>> array = numpy.array(['beta', 'alpha', 'gamma', 'alpha', 'beta', 'alpha', 'delta'])
>>> cumcount_by_value(array)
array([0, 0, 0, 1, 1, 2, 0])
>>> array = numpy.array(['A', 'B', 'B', 'C', 'C', 'C'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 0, 1, 0, 1, 2])

When using assume_sorted on non-integer values, the order can be arbitrary:

>>> array = numpy.array(['B', 'B', 'C', 'C', 'C', 'A'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 1, 0, 1, 2, 0])
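
The rank of appearance can be reproduced in plain numpy with a stable sort that groups equal values while keeping their input order. The following is a sketch for illustration only; cumcount_sketch is a hypothetical name and the library's implementation is not shown here:

```python
import numpy

def cumcount_sketch(values):
    """Hypothetical sketch: zero-based rank of appearance within each value group."""
    values = numpy.asarray(values)
    n = len(values)
    # A stable sort groups equal values and keeps their original relative order.
    order = numpy.argsort(values, kind='stable')
    sorted_vals = values[order]
    # Mark the sorted positions where a new group of equal values starts.
    is_start = numpy.ones(n, dtype=bool)
    is_start[1:] = sorted_vals[1:] != sorted_vals[:-1]
    # For every sorted position, the position at which its group started.
    group_start = numpy.maximum.accumulate(numpy.where(is_start, numpy.arange(n), 0))
    out = numpy.empty(n, dtype=numpy.int64)
    out[order] = numpy.arange(n) - group_start
    return out

print(cumcount_sketch(['beta', 'alpha', 'gamma', 'alpha', 'beta', 'alpha', 'delta']))
# [0 0 0 1 1 2 0]
```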

unique_count(values)

Count the number of unique elements in values.

Parameters: values (array) – (n,) dtype array
Returns: int number of unique values
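
No example is given for this function. As a point of comparison, the same quantity can be computed in plain numpy as below; the input array is made up for illustration, and unique_count itself is not called:

```python
import numpy

values = numpy.array([3, 1, 3, 2, 1, 3])

# numpy.unique returns the sorted distinct values; its length is the count.
n_unique = len(numpy.unique(values))
print(n_unique)   # 3
```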