Array tools

Functions

cumcount_by_value(values[, assume_sorted])

Compute the rank of appearance of entries grouped by input value.

first(a[, predicate, batch_size, offset])

Efficiently find next index satisfying predicate in a 1d array.

in1d(needles, haystack[, invert])

Test whether each element of a 1d array is also present in a second 1d array.

kargmax(a, k[, axis, do_sort])

Returns the indices of the k maximum values along a specified axis.

kargmin(a, k[, axis, do_sort])

Returns the indices of the k minimum values along a specified axis.

lexsort_uint32_pair(a, b)

Faster alternative to numpy.lexsort for two uint32 arrays.

search(needles, haystack[, idx_dtype])

Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.

set_or_reallocate(array, values, offset[, ...])

Assigns values (of length n2) in array (of length n1), starting at array's offset index.

structured_arrays_mean(arrays[, keep_missing])

Computes average of each field given a list of struct-arrays.

unique_count(values)

Count the number of unique elements in values.

structured_arrays_mean(arrays, keep_missing=False)

Computes average of each field given a list of struct-arrays.

By default, will only average and return fields that are present in every struct-array in input.

The average is computed over the concatenated fields, as if they were in a single structured array, as opposed to averaging the results of several arr['field'].mean() calls.

Parameters
  • arrays (list-of-array) – list(struct-array)

  • keep_missing (boolean?) – if True, use all fields from all arrays, including ones missing from some arrays; if False, only keep the intersection of fields defined in all arrays (default: False)

Returns

struct-array of shape (1,)

Examples

Obtaining the average of every field of a structured array:

>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> structured_arrays_mean([arr1])
array([(2., 4.)],
  dtype=[('a', '<f8'), ('b', '<f8')])

Getting the average of the same field scattered across several structured arrays:

>>> arr1 = to_structured([
...     ('a', numpy.arange(5)),
...     ('b', 2 * numpy.arange(5))
... ])
>>> arr2 = to_structured([
...     ('a', numpy.ones(3)),
...     ('b', 2 * numpy.ones(3)),
...     ('c', 3 * numpy.ones(3))
... ])
>>> structured_arrays_mean([arr1, arr2])
array([(1.625, 3.25)],
  dtype=[('a', '<f8'), ('b', '<f8')])

If you want to include all fields, including fields that are only available in a few arrays, use option keep_missing=True:

>>> structured_arrays_mean([arr1, arr2], keep_missing=True)
array([(1.625, 3.25, 3.)],
  dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
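
The difference between averaging over the concatenated values and averaging the per-array means can be checked with plain NumPy. This is a minimal sketch that does not call structured_arrays_mean: it builds the two structured arrays by hand (rather than with to_structured), and the variable names are illustrative only.

```python
import numpy

# Hand-built equivalents of arr1 and arr2 from the examples above.
arr1 = numpy.zeros(5, dtype=[('a', 'f8'), ('b', 'f8')])
arr1['a'] = numpy.arange(5)
arr1['b'] = 2 * numpy.arange(5)

arr2 = numpy.zeros(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
arr2['a'] = 1.0
arr2['b'] = 2.0
arr2['c'] = 3.0

# Mean over the concatenated field: 8 values in total for 'a'.
pooled_mean = numpy.concatenate([arr1['a'], arr2['a']]).mean()

# Mean of the per-array means: each array counts once, whatever its length.
mean_of_means = numpy.mean([arr1['a'].mean(), arr2['a'].mean()])

print(pooled_mean)    # 1.625
print(mean_of_means)  # 1.5
```

The pooled value is the one reported for field 'a' in the examples above; the mean of means differs because arr1 and arr2 have different lengths.
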
first(a, predicate=None, batch_size=None, offset=0)

Efficiently find next index satisfying predicate in a 1d array.

Can also be used on an array of booleans without a predicate.

A similar function has been requested for NumPy itself: https://github.com/numpy/numpy/issues/2269

Parameters
  • a (array) – (n,) dtype array

  • predicate (callable?) – function(dtype-array -> bool-array) (or None if a has dtype bool)

  • batch_size (int?) – (default: 4k)

  • offset (int?) – (default: 0)

Returns

int, index of first value satisfying predicate

Raises

StopIteration – if there is no index satisfying predicate after offset

Examples

With predicate:

>>> a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])
>>> idx = first(a, lambda x: x == 2)
>>> idx
3
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
5
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
>>> idx
7
>>> idx = first(a, lambda x: x == 2, offset=idx + 1)
StopIteration:

Without predicate on array of booleans:

>>> mask = numpy.array([0, 0, 0, 1, 0, 1, 0, 0], '?')
>>> first(mask)
3
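
The batch_size parameter suggests that the predicate is evaluated on chunks of the array so that the scan can stop early. The following is a sketch of that idea in plain NumPy, under that assumption; first_sketch is a hypothetical name, the default of 4096 is a guess at "4k", and this is not the library's implementation.

```python
import numpy

def first_sketch(a, predicate=None, batch_size=4096, offset=0):
    """Return the first index >= offset where the predicate holds."""
    for start in range(offset, len(a), batch_size):
        batch = a[start:start + batch_size]
        # Without a predicate, the batch itself is used as a boolean mask.
        mask = batch if predicate is None else predicate(batch)
        hits = numpy.flatnonzero(mask)
        if hits.size:
            return start + int(hits[0])
    raise StopIteration

a = numpy.array([0, 1, 1, 2, 3, 2, 4, 2])

# Collect every matching index by restarting the search after each hit.
hits = []
offset = 0
while True:
    try:
        idx = first_sketch(a, lambda x: x == 2, offset=offset)
    except StopIteration:
        break
    hits.append(idx)
    offset = idx + 1

print(hits)  # [3, 5, 7]
```

Evaluating the predicate per batch avoids computing it over the whole array when a match occurs early.
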
set_or_reallocate(array, values, offset, growing_factor=2.0, fill=None)

Assigns values (of length n2) in array (of length n1), starting at array’s offset index.

Returns the same array if there is enough space in array for assigning array[offset:offset+n2, :] = values. Otherwise, when n2 > n1 - offset, returns a new array expanded by growing_factor.

Parameters
  • array (array) – (n1, *extra-dims) array

  • values (array) – (n2, *extra-dims) array

  • offset (int) – index of array to start assigning values. array[offset:offset+n2, :] = values

  • growing_factor (float?) – growing factor > 1 (default: 2.0)

  • fill (float?) – fill value or None to leave empty

Returns

the same array if n1 >= offset + n2, otherwise a new array with the data copied; in both cases array[offset:offset+n2, :] = values is assigned

Examples

If n1 >= offset + n2, the same array is returned with array[offset:offset+n2, :] = values. In this case, the assignment is done in place and the input array is modified.

>>> array  = numpy.arange(10)
>>> values = - 2 * numpy.arange(5)
>>> set_or_reallocate(array, values, offset=5)
array([ 0,  1,  2,  3,  4,  0, -2, -4, -6, -8])
>>> array
array([ 0,  1,  2,  3,  4,  0, -2, -4, -6, -8])

Otherwise, a new expanded array is created, the data is copied into it, and array[offset:offset+n2, :] = values is assigned. If no fill value is specified, the newly added entries from index offset + n2 onward hold arbitrary data. Since a new array is created, the input array is not affected.

>>> array  = numpy.arange(10)
>>> set_or_reallocate(array, values, offset=10)
array([                  0,                   1,                   2,
                         3,                   4,                   5,
                         6,                   7,                   8,
                         9,                   0,                  -2,
                        -4,                  -6,                  -8,
       5572452860762084442,     140512663005448,     670512663005448,
                         0,    2814751914590207])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> set_or_reallocate(array, values, offset=10, fill=0)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  0, -2, -4, -6, -8,  0,  0,
    0,  0,  0])
>>> array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
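
The assign-or-grow behaviour described above can be sketched in plain NumPy. The growth rule used here (multiply the length by growing_factor until the values fit) is an assumption that happens to reproduce the length-20 result shown above; set_or_reallocate_sketch is a hypothetical name and this is not the library's code.

```python
import numpy

def set_or_reallocate_sketch(array, values, offset, growing_factor=2.0, fill=None):
    n1, n2 = len(array), len(values)
    needed = offset + n2
    if needed > n1:
        # Assumed growth rule: keep multiplying until the values fit.
        new_len = max(n1, 1)
        while new_len < needed:
            new_len = int(numpy.ceil(new_len * growing_factor))
        shape = (new_len,) + array.shape[1:]
        if fill is None:
            grown = numpy.empty(shape, dtype=array.dtype)  # arbitrary content
        else:
            grown = numpy.full(shape, fill, dtype=array.dtype)
        grown[:n1] = array  # copy the existing data
        array = grown
    array[offset:needed] = values
    return array

values = -2 * numpy.arange(5)

# Enough room: the input array itself is modified and returned.
array = numpy.arange(10)
same = set_or_reallocate_sketch(array, values, offset=5)
print(same is array)  # True

# Not enough room: a new, larger array is returned and the input is untouched.
array = numpy.arange(10)
grown = set_or_reallocate_sketch(array, values, offset=10, fill=0)
print(grown is array, len(grown))  # False 20
```

Checking the identity of the returned array (same is array) is a simple way to tell whether a reallocation took place.
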
lexsort_uint32_pair(a, b)

Faster alternative to numpy.lexsort for two uint32 arrays.

Parameters
  • a (array) – (n,) uint32-array

  • b (array) – (n,) uint32-array

Returns

(n,) permutation array that sorts by a first, then by b
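
One way such a two-key sort can be done is to pack each (a, b) pair into a single uint64 key and argsort it once. Whether lexsort_uint32_pair works this way is not stated here; the sketch below only illustrates the expected result and checks it against numpy.lexsort, whose last key is the primary one. lexsort_pair_sketch is a hypothetical name.

```python
import numpy

def lexsort_pair_sketch(a, b):
    # High 32 bits hold a (primary key), low 32 bits hold b (secondary key).
    key = (a.astype(numpy.uint64) << numpy.uint64(32)) | b.astype(numpy.uint64)
    return numpy.argsort(key, kind='stable')

a = numpy.array([2, 1, 2, 1, 0], dtype=numpy.uint32)
b = numpy.array([5, 9, 3, 4, 7], dtype=numpy.uint32)

order = lexsort_pair_sketch(a, b)
print(order.tolist())  # [4, 3, 1, 2, 0]

# numpy.lexsort sorts by the last key first, so a goes last in the tuple.
reference = numpy.lexsort((b, a))
print(numpy.array_equal(order, reference))  # True
```
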

kargmax(a, k, axis=0, do_sort=False)

Returns the indices of the k maximum values along a specified axis. Set do_sort=True to order these indices so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.

Parameters
  • a (arr) – (n,) input array

  • k (int) – sets the k in ‘top k maximum values’

  • axis (int?) – axis along which to look for the maxima (default: 0)

  • do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.max() (default: False)

Returns

(k,) indexes of the k maximum values along the specified axis

Examples

>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmax(a, 2)
array([9, 2])

The order of the output array is arbitrary. To order the output so that a[output] is sorted, use do_sort=True:

>>> kargmax(a, 2, do_sort=True)
array([2, 9])

On ndarrays:

>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])
>>> kargmax(a, 2, axis=0)
array([[2, 1, 2, 0],
       [1, 2, 0, 1]])
>>> kargmax(a, 2, axis=1)
array([[3, 2],
       [0, 3],
       [0, 1]])
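
Since kargmax is described as a wrapper around numpy.argpartition, the 1d case can be sketched directly with it. This is an illustration under that description, not the library's code; kargmax_sketch is a hypothetical name and the axis argument is left out.

```python
import numpy

def kargmax_sketch(a, k, do_sort=False):
    # After partitioning at -k, the last k positions hold the k largest values,
    # in no particular order.
    idx = numpy.argpartition(a, -k)[-k:]
    if do_sort:
        # Reorder the k indices so that a[idx] is decreasing.
        idx = idx[numpy.argsort(a[idx])[::-1]]
    return idx

a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])

unsorted_idx = kargmax_sketch(a, 2)              # indices 2 and 9, in any order
sorted_idx = kargmax_sketch(a, 2, do_sort=True)
print(sorted_idx.tolist())  # [2, 9]
```

Partitioning costs less than a full sort when k is small, which is why the ordering step is optional.
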
kargmin(a, k, axis=0, do_sort=False)

Returns the indices of the k minimum values along a specified axis. Set do_sort=True to order these indices so that a[output] is sorted. User-friendly wrapper around numpy.argpartition.

Parameters
  • a (arr) – (n,) input array

  • k (int) – sets the k in ‘top k minimum values’

  • axis (int?) – axis along which to look for the minima (default: 0)

  • do_sort (bool?) – if False, the order of the output is arbitrary; if True, a[output] is sorted, starting with a.min() (default: False)

Returns

(k,) indexes of the k minimum values along the specified axis

Examples

>>> a = numpy.array([3, 1, 9, 6, 4, 4, 0, 6, 4, 8, 1, 3])
>>> kargmin(a, 2)
array([1, 6])

The order of the output array is arbitrary. To order the output so that a[output] is sorted, use do_sort=True:

>>> kargmin(a, 2, do_sort=True)
array([6, 1])

On ndarrays:

>>> a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))
>>> a
array([[3, 1, 9, 5],
       [6, 4, 0, 6],
       [4, 8, 1, 3]])
>>> kargmin(a, 2, axis=0)
array([[0, 0, 1, 2],
       [2, 1, 2, 0]])
>>> kargmin(a, 2, axis=1)
array([[1, 0],
       [2, 1],
       [2, 3]])
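
The axis handling on ndarrays can be sketched with numpy.argpartition, numpy.take and numpy.take_along_axis. This is a plain-NumPy illustration, not the library's code; kargmin_sketch is a hypothetical name.

```python
import numpy

def kargmin_sketch(a, k, axis=0, do_sort=False):
    # After partitioning at k - 1, the first k positions along the axis hold
    # the k smallest values, in no particular order.
    part = numpy.argpartition(a, k - 1, axis=axis)
    idx = numpy.take(part, numpy.arange(k), axis=axis)
    if do_sort:
        # Reorder the k indices along the axis so the values are increasing.
        vals = numpy.take_along_axis(a, idx, axis=axis)
        order = numpy.argsort(vals, axis=axis)
        idx = numpy.take_along_axis(idx, order, axis=axis)
    return idx

a = numpy.array([3, 1, 9, 5, 6, 4, 0, 6, 4, 8, 1, 3]).reshape((3, 4))

by_column = kargmin_sketch(a, 2, axis=0, do_sort=True)
by_row = kargmin_sketch(a, 2, axis=1, do_sort=True)
print(by_column.tolist())  # [[0, 0, 1, 2], [2, 1, 2, 0]]
print(by_row.tolist())     # [[1, 0], [2, 1], [2, 3]]
```

The output has length k along the chosen axis and keeps the other dimensions of a.
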
in1d(needles, haystack, invert=False)

Test whether each element of a 1d array is also present in a second 1d array.

Parameters
  • needles (array) – (n1,) dtype array

  • haystack (array) – (n2,) dtype array

  • invert (bool?) – if True, the values in the returned array are inverted, in1d(a, b, invert=True) is equivalent to ~in1d(a, b). (default: False)

Returns

(n1,) bool array

Example

>>> needles = numpy.array([5,10,20])
>>> haystack = numpy.arange(15)
>>> in1d(needles, haystack)
array([ True,  True, False])
>>> in1d(needles, haystack, invert=True)
array([False, False,  True])
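
For comparison, NumPy's own numpy.isin produces the same masks on this example, including the equivalence between invert=True and negating the result. Whether the two functions agree in every edge case is not stated here, so this is only a point of reference.

```python
import numpy

needles = numpy.array([5, 10, 20])
haystack = numpy.arange(15)

mask = numpy.isin(needles, haystack)
inverted = numpy.isin(needles, haystack, invert=True)

print(mask.tolist())                      # [True, True, False]
print(numpy.array_equal(inverted, ~mask)) # True
```
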
search(needles, haystack, idx_dtype='uint32')

Return whether each element of a 1d array is present in a second 1d array and the corresponding indexes.

If an element of needles is not found in haystack, the corresponding value returned in indexes is 0. Use the found array to distinguish a missing element from a match at index 0.

Parameters
  • needles (array) – (n1,) dtype array

  • haystack (array) – (n2,) dtype array

  • idx_dtype (dtype?) – (default: uint32)

Returns

tuple (indexes, found): indexes is an (n1,) idx_dtype array of positions in haystack (each < n2), found is an (n1,) bool array

Example

>>> needles = numpy.array([1000, 2000, 3000])
>>> haystack = numpy.arange(50)
>>> haystack[10] = needles[0]
>>> haystack[20] = needles[1]
>>> search(needles, haystack)
(array([10, 20,  0], dtype=uint32), array([ True,  True, False]))
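
The (indexes, found) contract can be sketched in plain NumPy with a sorted lookup. The use of numpy.argsort and numpy.searchsorted is an assumption made for illustration, not a description of how search is implemented; search_sketch is a hypothetical name and a non-empty haystack is assumed.

```python
import numpy

def search_sketch(needles, haystack, idx_dtype='uint32'):
    order = numpy.argsort(haystack, kind='stable')
    # Position of each needle in the sorted haystack.
    pos = numpy.searchsorted(haystack, needles, sorter=order)
    # Needles larger than every haystack value land past the end; clip them.
    pos = numpy.minimum(pos, len(haystack) - 1)
    candidates = order[pos]
    found = haystack[candidates] == needles
    # Missing needles get index 0, as described above.
    indexes = numpy.where(found, candidates, 0).astype(idx_dtype)
    return indexes, found

needles = numpy.array([1000, 2000, 3000])
haystack = numpy.arange(50)
haystack[10] = needles[0]
haystack[20] = needles[1]

indexes, found = search_sketch(needles, haystack)
print(indexes.tolist(), found.tolist())  # [10, 20, 0] [True, True, False]

# Only the entries flagged in found are valid positions.
valid_positions = indexes[found]
```

Filtering with found before indexing avoids mistaking the placeholder 0 for a real match.
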
cumcount_by_value(values, assume_sorted=False)

Compute the rank of appearance of entries grouped by input value.

If a value appears for the 5th time in the values array at index i, output[i] will be 4 (ranks start at 0).

Parameters
  • values (array) – uint32 array

  • assume_sorted (bool?) – if True, the input values are assumed to be sorted (default: False). For non-integer values, the sort order can be arbitrary (see examples below); for integer values, the order must be increasing.

Returns

array of int

Example

>>> array = numpy.array([0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 3, 3])
>>> cumcount_by_value(array)
array([0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4])
>>> array = numpy.array(['beta', 'alpha', 'gamma', 'alpha', 'beta', 'alpha', 'delta'])
>>> cumcount_by_value(array)
array([0, 0, 0, 1, 1, 2, 0])
>>> array = numpy.array(['A', 'B', 'B', 'C', 'C', 'C'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 0, 1, 0, 1, 2])

When using assume_sorted on non-integer values, the order can be arbitrary:

>>> array = numpy.array(['B', 'B', 'C', 'C', 'C', 'A'])
>>> cumcount_by_value(array, assume_sorted=True)
array([0, 1, 0, 1, 2, 0])
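
The zero-based rank of appearance can be reproduced in plain NumPy with a stable sort. This is a sketch of the idea for unsorted input, not the library's implementation; cumcount_sketch is a hypothetical name.

```python
import numpy

def cumcount_sketch(values):
    values = numpy.asarray(values)
    n = len(values)
    # A stable sort groups equal values while keeping their original order.
    order = numpy.argsort(values, kind='stable')
    sorted_vals = values[order]
    # Mark the first element of each run of equal values.
    is_start = numpy.ones(n, dtype=bool)
    is_start[1:] = sorted_vals[1:] != sorted_vals[:-1]
    # For each sorted position, the position where its run starts.
    run_start = numpy.maximum.accumulate(numpy.where(is_start, numpy.arange(n), 0))
    out = numpy.empty(n, dtype=numpy.int64)
    # Rank within the run, scattered back to the original positions.
    out[order] = numpy.arange(n) - run_start
    return out

words = numpy.array(['beta', 'alpha', 'gamma', 'alpha', 'beta', 'alpha', 'delta'])
ranks = cumcount_sketch(words)
print(ranks.tolist())  # [0, 0, 0, 1, 1, 2, 0]
```
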
unique_count(values)

Count the number of unique elements in values.

Parameters
  • values (array) – (n,) dtype array

Returns

int, number of unique values