Aggregate tools
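Functions
argmax_by_idx | Given array of indexes idx and array values, outputs the argmax of the values by idx.
argmin_by_idx | Given array of indexes idx and array values, outputs the argmin of the values by idx.
average_by_idx | Compute average-by-idx given array of indexes idx, values, and optional weights.
connect_adjacents_in_groups | For each group_id in group_ids, connect values that are within max_gap of each other.
get_most_common_by_idx | Given array of indexes idx and array values, outputs the most common value by idx.
get_value_by_idx | Given array of indexes idx and array values, output array such that out[i] = values[idx==i].
igroupby | Efficiently converts two arrays representing a relation (the ids and the associated values) to an iterable of (id, values_associated).
max_by_idx | Given array of indexes idx and array values, outputs the max value by idx.
min_by_idx | Given array of indexes idx and array values, outputs the min value by idx.
ufunc_group_by_idx | Abstract wrapper to compute ufunc grouped by values in array idx.
value_at_argmax_by_idx | Wrapper around argmax_by_idx and get_value_by_idx.
value_at_argmin_by_idx | Wrapper around argmin_by_idx and get_value_by_idx.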
- igroupby(ids, values, n=None, logging_prefix=None, assume_sorted=False, find_next_hint=512)
Efficiently converts two arrays representing a relation (the ids and the associated values) to an iterable of (id, values_associated).
The values are grouped by ids and a sequence of tuples is generated. The i-th tuple generated is (id_i, values[ids == id_i]), id_i being the i-th distinct element of the ids array, once sorted in ascending order.
- Parameters
ids (array) – (>=n,) dtype array
values (array) – (>=n, *shape) uint32 array
n (int?) – length of array to consider, applying igroupby to (ids[:n], values[:n]). Uses the full array when not set.
logging_prefix (string?) – prefix to include while logging progress (default: does not log).
assume_sorted (bool?) – whether ids is sorted (default: False).
find_next_hint (int?) – hint for find_next_lookup (default: 512).
- Generates
tuple(id: int, values_associated: (m, *shape) array slice)
Example
>>> ids = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> gen = igroupby(ids, values)
>>> next(gen)
(0, array([0, 1, 2, 3, 4]))
>>> next(gen)
(1, array([0, 2, 4, 6]))
>>> next(gen)
(3, array([0, 4, 6]))
Example with strings as ids:
>>> ids = numpy.array(["alpha", "alpha", "beta", "omega", "alpha", "gamma", "beta"])
>>> values = numpy.array([1, 2, 10, 100, 3, 1000, 20])
>>> gen = igroupby(ids, values)
>>> next(gen)
('alpha', array([1, 2, 3]))
>>> next(gen)
('beta', array([10, 20]))
>>> next(gen)
('gamma', array([1000]))
>>> next(gen)
('omega', array([100]))
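The grouping semantics of igroupby can be sketched in plain NumPy. The helper name igroupby_sketch is hypothetical and this is not the library's implementation: it ignores n, logging_prefix, assume_sorted and find_next_hint, and only aims to reproduce the tuples described for igroupby.

```python
import numpy as np

def igroupby_sketch(ids, values):
    """Yield (id, values[ids == id]) for each distinct id, in ascending id order.

    Hypothetical illustration only; not the library's implementation.
    """
    ids = np.asarray(ids)
    values = np.asarray(values)
    # A stable sort keeps the original order of the values inside each group.
    order = np.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    sorted_values = values[order]
    # starts[k] is the position where the k-th distinct id begins.
    unique_ids, starts = np.unique(sorted_ids, return_index=True)
    ends = np.append(starts[1:], len(sorted_ids))
    for uid, start, end in zip(unique_ids, starts, ends):
        yield uid, sorted_values[start:end]

ids = np.array(["alpha", "alpha", "beta", "omega", "alpha", "gamma", "beta"])
values = np.array([1, 2, 10, 100, 3, 1000, 20])
groups = [(str(k), v.tolist()) for k, v in igroupby_sketch(ids, values)]
print(groups)
```

Sorting once and slicing avoids evaluating the mask ids == id_i for every distinct id, which would cost one full pass over the array per group.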
- ufunc_group_by_idx(idx, values, ufunc, init, minlength=None)
Abstract wrapper to compute ufunc grouped by values in array idx.
Return an array containing the results of ufunc applied to values grouped by the indexes in array idx (see the list of available NumPy ufuncs).
Warning: the init parameter is not a filling value for missing indexes. If index i is missing, then out[i] = init, but this value also serves as the initialization of ufunc on all the groups of values.
For example, if ufunc is numpy.add and init = -1, then for each index the sum of the corresponding values will be decreased by one.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) dtype array
ufunc (numpy.ufunc) – universal function applied to the groups of values
init (dtype) – initialization value
minlength (int?) – (default: idx.max() + 1)
- Returns
(min-length,) dtype array, such that out[i] = ufunc(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> ufunc_group_by_idx(idx, values, numpy.maximum, -1)
array([ 4, 6, -1, 6])
>>> ufunc_group_by_idx(idx, values, numpy.add, -1)
array([ 9, 11, -1, 9])
>>> ufunc_group_by_idx(idx, values, numpy.add, 0)
array([10, 12, 0, 10])
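The warning about init is easiest to see with an unbuffered ufunc.at update, where init seeds every output slot before any value is folded in. The sketch below is a hypothetical stand-in (the name ufunc_group_by_idx_sketch is made up, and treating minlength as a lower bound on the output length is an assumption), not the library's code.

```python
import numpy as np

def ufunc_group_by_idx_sketch(idx, values, ufunc, init, minlength=None):
    """Hypothetical pure-NumPy stand-in for the documented behaviour."""
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))  # assumption: minlength is a lower bound
    # Every slot starts at init, including slots whose index never occurs.
    out = np.full(size, init, dtype=values.dtype)
    # Unbuffered in-place update: out[idx[k]] = ufunc(out[idx[k]], values[k]).
    ufunc.at(out, idx, values)
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
values = np.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
sum_init_minus_one = ufunc_group_by_idx_sketch(idx, values, np.add, -1)
sum_init_zero = ufunc_group_by_idx_sketch(idx, values, np.add, 0)
max_init_minus_one = ufunc_group_by_idx_sketch(idx, values, np.maximum, -1)
print(sum_init_minus_one, sum_init_zero, max_init_minus_one)
```

With numpy.add, init = -1 shifts every group sum down by one, whereas with numpy.maximum the same init only shows up in the slot of the missing index 2, because -1 is below every value in the example.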
- min_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the min value by idx, aligned on arange(idx.max() + 1). See also argmin_by_idx and value_at_argmin_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: +inf)
- Returns
(min-length,) float array, such that out[i] = min(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([1, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> min_by_idx(idx, values, fill=100)
array([ 1, 0, 100, 0])
>>> min_by_idx(idx, values)
array([1, 0, 9223372036854775807, 0])
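Unlike init in ufunc_group_by_idx, fill here only affects missing indexes. One way to get that behaviour, sketched below under assumptions, is to reduce with the neutral element of min and then overwrite the slots that received no value. The name min_by_idx_sketch is hypothetical, and using the dtype's largest value as the default fill for integer input is inferred from the documented example output.

```python
import numpy as np

def min_by_idx_sketch(idx, values, minlength=None, fill=None):
    """Hypothetical pure-NumPy illustration of min-by-idx with a separate fill."""
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    # The neutral element of min: +inf for floats, the largest integer otherwise.
    if np.issubdtype(values.dtype, np.floating):
        neutral = np.inf
    else:
        neutral = np.iinfo(values.dtype).max
    out = np.full(size, neutral, dtype=values.dtype)
    np.minimum.at(out, idx, values)
    if fill is not None:
        # Slots that never appear in idx get the fill value; the minima are untouched.
        present = np.bincount(idx, minlength=size) > 0
        out[~present] = fill
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3], dtype=np.int64)
values = np.array([1, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6], dtype=np.int64)
with_fill = min_by_idx_sketch(idx, values, fill=100)
without_fill = min_by_idx_sketch(idx, values)
print(with_fill, without_fill)
```

Note that fill=100 is larger than every value yet leaves the computed minima unchanged, which would not hold if it were passed as init to a numpy.minimum reduction.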
- max_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the max value by idx, aligned on arange(idx.max() + 1). See also argmax_by_idx and value_at_argmax_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -inf)
- Returns
(min-length,) float array, such that out[i] = max(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> max_by_idx(idx, values, fill=-1)
array([ 4, 6, -1, 6])
>>> max_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 6, -1, 6, -1, -1, -1, -1, -1, -1])
>>> max_by_idx(idx, values)
array([ 4, 6, -9223372036854775808, 6])
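The same per-index reduction can also be written as a sort followed by numpy.maximum.reduceat over contiguous segments. The sketch below assumes a non-empty idx and, to stay short, requires fill explicitly; max_by_idx_sorted_sketch is a hypothetical name and nothing here is taken from the library's source.

```python
import numpy as np

def max_by_idx_sorted_sketch(idx, values, fill, minlength=None):
    """Hypothetical sort-and-segment illustration of max-by-idx."""
    order = np.argsort(idx, kind="stable")
    sorted_idx = idx[order]
    sorted_values = values[order]
    # present holds the indexes that occur; starts marks where each segment begins.
    present, starts = np.unique(sorted_idx, return_index=True)
    size = int(sorted_idx[-1]) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    out = np.full(size, fill, dtype=values.dtype)
    # reduceat reduces each slice sorted_values[starts[k]:starts[k + 1]].
    out[present] = np.maximum.reduceat(sorted_values, starts)
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
values = np.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
max_default_length = max_by_idx_sorted_sketch(idx, values, fill=-1)
max_padded = max_by_idx_sorted_sketch(idx, values, fill=-1, minlength=10)
print(max_default_length, max_padded)
```

Because only the indexes in present are written, missing indexes keep fill without any comparison against the values.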
- argmin_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the argmin of the values by idx, aligned on arange(idx.max() + 1). See also min_by_idx and value_at_argmin_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -1)
- Returns
(min-length,) int32 array, such that out[i] = argmin_{j}(values[j] : idx[j] == i)
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> argmin_by_idx(idx, values, fill=-1)
array([ 0, 5, -1, 9])
>>> argmin_by_idx(idx, values, minlength=10, fill=-1)
array([ 0, 5, -1, 9, -1, -1, -1, -1, -1, -1])
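The returned positions refer to the original arrays, not to positions inside each group. A two-pass sketch makes this concrete: first compute the per-index minimum, then keep the smallest position at which that minimum is reached. The name argmin_by_idx_sketch is hypothetical, picking the first position on ties is an assumption, and NaN values are not handled.

```python
import numpy as np

def argmin_by_idx_sketch(idx, values, minlength=None, fill=-1):
    """Hypothetical two-pass illustration of argmin-by-idx."""
    values = np.asarray(values, dtype=np.float64)
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    # Pass 1: minimum value of each group.
    mins = np.full(size, np.inf)
    np.minimum.at(mins, idx, values)
    # Pass 2: among positions reaching their group's minimum, keep the smallest one.
    is_min = values == mins[idx]
    positions = np.arange(len(idx))
    sentinel = len(idx)  # larger than any valid position
    out = np.full(size, sentinel, dtype=np.int64)
    np.minimum.at(out, idx[is_min], positions[is_min])
    out[out == sentinel] = fill
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
values = np.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
argmin_positions = argmin_by_idx_sketch(idx, values, fill=-1)
print(argmin_positions)
```

Indexing values with the non-fill entries of the result recovers the per-index minima, which is the composition that value_at_argmin_by_idx is documented to wrap.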
- value_at_argmin_by_idx(idx, sorting_values, fill, output_values=None, minlength=None)
Wrapper around argmin_by_idx and get_value_by_idx. Allows using a different array for the output than for detecting the minimum, and setting a specific fill value that is not compared with the sorting_values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
sorting_values (array) – (n,) array
fill – filling value for output[i] if there is no idx == i
output_values (array?) – (n,) dtype array. Useful if you want to select the min based on one array and get the value from another array.
minlength (int?) – minimum shape for the output array.
- Returns array
(max_idx+1,), dtype array such that out[i] = output_values[j], where j is the argmin of sorting_values over idx == i. When output_values is not given, this is min(sorting_values[idx==i]).
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> value_at_argmin_by_idx(idx, values, fill=-1)
array([ 0, 0, -1, 0])
>>> value_at_argmin_by_idx(idx, values, minlength=10, fill=-1)
array([ 0, 0, -1, 0, -1, -1, -1, -1, -1, -1])
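The documented examples only show the case where the output and sorting arrays coincide. The sketch below illustrates the other use, selecting by one array and reading from another, with a lexsort-based stand-in. The helper name and the prices and labels arrays are made up for illustration, and falling back to sorting_values when output_values is None is an assumption consistent with the examples.

```python
import numpy as np

def value_at_argmin_by_idx_sketch(idx, sorting_values, fill, output_values=None, minlength=None):
    """Hypothetical illustration: pick output_values where sorting_values is minimal."""
    if output_values is None:
        output_values = sorting_values
    # lexsort uses the last key as primary: sort by idx, then by sorting_values.
    order = np.lexsort((sorting_values, idx))
    sorted_idx = idx[order]
    # The first element of each idx segment is that group's minimum.
    present, starts = np.unique(sorted_idx, return_index=True)
    size = int(sorted_idx[-1]) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    out = np.full(size, fill, dtype=output_values.dtype)
    out[present] = output_values[order[starts]]
    return out

idx = np.array([0, 0, 1, 1, 1, 3])
prices = np.array([5.0, 2.0, 7.0, 1.0, 4.0, 9.0])   # made-up sorting values
labels = np.array([10, 11, 12, 13, 14, 15])         # made-up output values
cheapest_label = value_at_argmin_by_idx_sketch(idx, prices, fill=-1, output_values=labels)
print(cheapest_label)
```

Here fill=-1 is written directly into the output for the missing index 2 and never takes part in the comparison of prices.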
- argmax_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the argmax of the values by idx, aligned on arange(idx.max() + 1). See also max_by_idx and value_at_argmax_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -1)
- Returns
(min-length,) int32 array, such that out[i] = argmax_{j}(values[j] : idx[j] == i)
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> argmax_by_idx(idx, values, fill=-1)
array([ 4, 8, -1, 11])
>>> argmax_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 8, -1, 11, -1, -1, -1, -1, -1, -1])
- value_at_argmax_by_idx(idx, sorting_values, fill, output_values=None, minlength=None)
Wrapper around argmax_by_idx and get_value_by_idx. Allows using a different array for the output than for detecting the maximum, and setting a specific fill value that is not compared with the sorting_values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
sorting_values (array) – (n,) array
fill – filling value for output[i] if there is no idx == i
output_values (array?) – (n,) dtype array. Useful if you want to select the max based on one array and get the value from another array.
minlength (int?) – minimum shape for the output array.
- Returns array
(max_idx+1,), dtype array such that out[i] = output_values[j], where j is the argmax of sorting_values over idx == i. When output_values is not given, this is max(sorting_values[idx==i]).
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> value_at_argmax_by_idx(idx, values, fill=-1)
array([ 4, 6, -1, 6])
>>> value_at_argmax_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 6, -1, 6, -1, -1, -1, -1, -1, -1])
- connect_adjacents_in_groups(group_ids, values, max_gap)
For each group_id in group_ids, connect values that are within max_gap of each other.
Return an array mapping the values to the indexes of the newly formed connected components they belong to.
Two values that do not have the same input group_id cannot be connected in the same connected component.
connect_adjacents_in_groups is faster when an array of indexes is provided as group_ids, but also accepts other types of ids.
- Parameters
group_ids (array) – (n,) dtype array
values (array) – (n,) float array
max_gap (float) – maximum distance between a value and the nearest value in the same group.
- Returns
(n,) uint array, such that out[s[i]] == out[s[i+1]] if and only if group_ids[s[i]] == group_ids[s[i+1]] and |values[s[i]] - values[s[i+1]]| <= max_gap, where s[i] is the i-th index when sorting by id and value
Example
>>> group_ids = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3])
>>> values = numpy.array([ 0, 35, 20, 25, 30, 0, 5, 10, 20, 0, 5, 10, 15])
>>> connect_adjacents_in_groups(group_ids, values, max_gap=5)
array([0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4], dtype=uint32)
Example with string group_ids:
>>> group_ids = numpy.array(['alpha', 'alpha', 'alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta', 'beta', 'gamma', 'gamma', 'gamma', 'gamma'])
>>> values = numpy.array([ 0, 35, 20, 25, 30, 0, 5, 10, 20, 0, 5, 10, 15])
>>> connected_components_ids = connect_adjacents_in_groups(group_ids, values, max_gap=5)
The function does not require the group_ids or the values to be sorted:
>>> shuffler = numpy.random.permutation(len(group_ids))
>>> group_ids_shuffled = group_ids[shuffler]
>>> values_shuffled = values[shuffler]
>>> connect_adjacents_in_groups(group_ids_shuffled, values_shuffled, max_gap=5)
array([2, 1, 0, 2, 4, 1, 1, 4, 1, 4, 3, 2, 4], dtype=uint32)
>>> connected_components_ids[shuffler]
array([2, 1, 0, 2, 4, 1, 1, 4, 1, 4, 3, 2, 4], dtype=uint32)
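The condition in the Returns field translates almost directly into NumPy: sort by (group_id, value), start a new component wherever the group changes or the gap to the previous value exceeds max_gap, and map the running count back to the input order. The sketch below is a hypothetical illustration with integer group ids, not the library's implementation.

```python
import numpy as np

def connect_adjacents_sketch(group_ids, values, max_gap):
    """Hypothetical illustration of the documented connected-component rule."""
    # s = order: indexes sorted by group id first, then by value.
    order = np.lexsort((values, group_ids))
    sorted_groups = group_ids[order]
    sorted_values = values[order]
    # A component starts at the first element, at every group change,
    # and wherever consecutive values are more than max_gap apart.
    starts_component = np.ones(len(order), dtype=bool)
    starts_component[1:] = (sorted_groups[1:] != sorted_groups[:-1]) | (
        np.diff(sorted_values) > max_gap
    )
    component_of_sorted = np.cumsum(starts_component) - 1
    # Scatter the labels back to the original positions.
    out = np.empty(len(order), dtype=np.uint32)
    out[order] = component_of_sorted
    return out

group_ids = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3])
values = np.array([0, 35, 20, 25, 30, 0, 5, 10, 20, 0, 5, 10, 15])
components = connect_adjacents_sketch(group_ids, values, max_gap=5)
print(components)
```

Because the labels are computed on the sorted order and scattered back, shuffling the inputs permutes the output in the same way, as in the shuffled example for connect_adjacents_in_groups.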
- get_value_by_idx(idx, values, default, check_unique=True, minlength=None)
Given array of indexes idx and array values (unordered, not necessarily full), output array such that out[i] = values[idx==i].
If all indexes in idx are unique, it is equivalent to sorting the values by their idx and filling with default for missing idx.
If idx elements are not unique and you still want to proceed, you can set check_unique to False. The output values for the non-unique indexes will be chosen arbitrarily among the multiple corresponding values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
values (array) – (n,) dtype array
default (dtype) – filling value for output[i] if there is no idx == i
check_unique (bool) – if True, will check that idx are unique. If False and the idx are not unique, an arbitrary value will be chosen.
minlength (int?) – minimum shape for the output array (default: idx.max() + 1).
- Returns array
(max_idx+1,), dtype array such that out[i] = values[idx==i].
Example
>>> idx = numpy.array([8,2,4,7])
>>> values = numpy.array([100, 200, 300, 400])
>>> get_value_by_idx(idx, values, -1, check_unique=False, minlength=None)
array([ -1, -1, 200, -1, 300, -1, -1, 400, 100])
Example with non-unique elements in idx:
>>> idx = numpy.array([2,2,4,7])
>>> values = numpy.array([100, 200, 300, 400])
>>> get_value_by_idx(idx, values, -1, check_unique=False, minlength=None)
array([ -1, -1, 200, -1, 300, -1, -1, 400])
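In NumPy terms this is a scatter into an array pre-filled with default. The sketch below is a hypothetical stand-in: the name get_value_by_idx_sketch is made up, and raising ValueError on duplicates is an assumption, since the text only says that uniqueness is checked.

```python
import numpy as np

def get_value_by_idx_sketch(idx, values, default, check_unique=True, minlength=None):
    """Hypothetical scatter-based illustration of get-value-by-idx."""
    if check_unique and len(np.unique(idx)) != len(idx):
        # Assumption: the real function's error type is not documented here.
        raise ValueError("idx contains duplicate indexes")
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    out = np.full(size, default, dtype=values.dtype)
    # Scatter: with duplicate indexes, which value survives should be treated as arbitrary.
    out[idx] = values
    return out

idx = np.array([8, 2, 4, 7])
values = np.array([100, 200, 300, 400])
scattered = get_value_by_idx_sketch(idx, values, -1)
print(scattered)
```

The uniqueness check costs an extra sort of idx, which is one plausible reason for exposing check_unique as a switch.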
- get_most_common_by_idx(idx, values, fill, minlength=None)
Given array of indexes idx and array values, outputs the most common value by idx.
- Parameters
idx (array) – (n,) uint array with values < max_idx
values (array) – (n,) non-float, dtype array
fill – filling value for output[i] if there is no idx == i
minlength – minimum shape for the output array.
- Returns
(max_idx+1,), dtype array such that out[i] = the most common value among values[idx==i]
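Since this entry has no example, the following sketch spells out the intended result with a straightforward loop over the distinct indexes. It is a hypothetical, unoptimized illustration; the helper name and the sample arrays are made up, and returning the smallest value on ties is an assumption because the tie-breaking rule is not documented.

```python
import numpy as np

def get_most_common_by_idx_sketch(idx, values, fill, minlength=None):
    """Hypothetical loop-based illustration of most-common-by-idx."""
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    out = np.full(size, fill, dtype=values.dtype)
    for i in np.unique(idx):
        # Distinct values of the group and how often each one occurs.
        candidates, counts = np.unique(values[idx == i], return_counts=True)
        # argmax returns the first maximum, i.e. the smallest value on ties (assumption).
        out[i] = candidates[np.argmax(counts)]
    return out

idx = np.array([0, 0, 0, 1, 1, 3])          # made-up sample data
values = np.array([7, 7, 2, 5, 5, 9])
most_common = get_most_common_by_idx_sketch(idx, values, fill=-1)
print(most_common)
```

The loop evaluates one mask per distinct index, so it is only meant to document the semantics, not to match the performance of a compiled implementation.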
- average_by_idx(idx, values, weights=None, minlength=None, fill=0, dtype='float64')
Compute average-by-idx given array of indexes idx, values, and optional weights.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
weights (array?) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: 0)
dtype (str?) – (default: 'float64')
- Returns
(min-length,) float array, such that out[i] = mean(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> average_by_idx(idx, values, fill=0)
array([ 2. , 3. , 0. , 3.33333333])
>>> weights = numpy.array([0, 1, 0, 0, 0, 1, 2, 3, 4, 1, 1, 0])
>>> average_by_idx(idx, values, weights=weights, fill=0)
array([ 1., 4., 0., 2.])
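The weighted case is the ratio of two grouped sums, sum(weights * values) / sum(weights), which numpy.bincount computes directly. The sketch below is a hypothetical illustration (average_by_idx_sketch is a made-up name); using fill for groups whose weights sum to zero is an assumption.

```python
import numpy as np

def average_by_idx_sketch(idx, values, weights=None, minlength=None, fill=0, dtype="float64"):
    """Hypothetical bincount-based illustration of (weighted) average-by-idx."""
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, int(minlength))
    values = values.astype(dtype)
    if weights is None:
        weights = np.ones(len(idx), dtype=dtype)
    else:
        weights = weights.astype(dtype)
    # Grouped sums: total weight and weighted total per index.
    total_weight = np.bincount(idx, weights=weights, minlength=size)
    weighted_sum = np.bincount(idx, weights=weights * values, minlength=size)
    out = np.full(size, fill, dtype=dtype)
    # Missing indexes and zero-weight groups keep the fill value (assumption).
    has_weight = total_weight != 0
    out[has_weight] = weighted_sum[has_weight] / total_weight[has_weight]
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
values = np.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
weights = np.array([0, 1, 0, 0, 0, 1, 2, 3, 4, 1, 1, 0])
plain_mean = average_by_idx_sketch(idx, values, fill=0)
weighted_mean = average_by_idx_sketch(idx, values, weights=weights, fill=0)
print(plain_mean, weighted_mean)
```

For index 1 the weighted result is (0*1 + 2*2 + 4*3 + 6*4) / (1 + 2 + 3 + 4) = 40 / 10 = 4, matching the documented output.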