Aggregate tools
Functions
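argmax_by_idx | Given array of indexes idx and array values, outputs the argmax of the values by idx.
argmin_by_idx | Given array of indexes idx and array values, outputs the argmin of the values by idx.
average_by_idx | Compute average-by-idx given array of indexes idx, values, and optional weights.
connect_adjacents_in_groups | For each group_id in group_ids, connect values that are at most max_gap apart.
get_most_common_by_idx | Given array of indexes idx and array values, outputs the most common value by idx.
get_value_by_idx | Given array of indexes idx and array values, output array such that out[i] = values[idx==i].
igroupby | Efficiently converts two arrays representing a relation (the ids and the associated values) to an iterable of (id, values_associated).
max_by_idx | Given array of indexes idx and array values, outputs the max value by idx.
min_by_idx | Given array of indexes idx and array values, outputs the min value by idx.
ufunc_group_by_idx | Abstract wrapper to compute ufunc grouped by values in array idx.
value_at_argmax_by_idx | Wrapper around argmax_by_idx and get_value_by_idx.
value_at_argmin_by_idx | Wrapper around argmin_by_idx and get_value_by_idx.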
- igroupby(ids, values, n=None, logging_prefix=None, assume_sorted=False, find_next_hint=512)
Efficiently converts two arrays representing a relation (the ids and the associated values) to an iterable of (id, values_associated).
The values are grouped by ids and a sequence of tuples is generated. The i-th tuple generated is (id_i, values[ids == id_i]), id_i being the i-th distinct element of the ids array, once sorted in ascending order.
- Parameters
ids (array) – (>=n,) dtype array
values (array) – (>=n, *shape) uint32 array
n (int?) – length of array to consider, applying igroupby to (ids[:n], values[:n]). Uses full array when not set.
logging_prefix (string?) – prefix to include while logging progress (default: does not log).
assume_sorted (bool?) – whether ids is sorted (default: False).
find_next_hint (int?) – hint for find_next_lookup (default: 512).
- Generates
tuple(id: int, values_associated: (m, *shape) array slice)
Example
>>> ids = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> gen = igroupby(ids, values)
>>> next(gen)
(0, array([0, 1, 2, 3, 4]))
>>> next(gen)
(1, array([0, 2, 4, 6]))
>>> next(gen)
(3, array([0, 4, 6]))
Example with strings as ids:
>>> ids = numpy.array(["alpha", "alpha", "beta", "omega", "alpha", "gamma", "beta"])
>>> values = numpy.array([1, 2, 10, 100, 3, 1000, 20])
>>> gen = igroupby(ids, values)
>>> next(gen)
('alpha', array([1, 2, 3]))
>>> next(gen)
('beta', array([10, 20]))
>>> next(gen)
('gamma', array([1000]))
>>> next(gen)
('omega', array([100]))
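The grouping described above can be approximated in plain NumPy. This is a minimal sketch, not the library's implementation: `igroupby_sketch` is a hypothetical name, it ignores the `n`, `logging_prefix`, `assume_sorted` and `find_next_hint` options, and it assumes a stable argsort followed by boundary detection is an acceptable way to form the groups.

```python
import numpy as np

def igroupby_sketch(ids, values):
    """Yield (id, values[ids == id]) for each distinct id, in ascending id order."""
    ids = np.asarray(ids)
    values = np.asarray(values)
    if len(ids) == 0:
        return
    # A stable sort keeps the original relative order inside each group.
    order = np.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    sorted_values = values[order]
    # Positions where the sorted id changes mark the group boundaries.
    boundaries = np.flatnonzero(sorted_ids[1:] != sorted_ids[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(sorted_ids)]))
    for start, end in zip(starts, ends):
        yield sorted_ids[start], sorted_values[start:end]
```

Like the documented function, the sketch yields one tuple per distinct id, so callers can stop iterating early without materializing every group.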
- ufunc_group_by_idx(idx, values, ufunc, init, minlength=None)
Abstract wrapper to compute ufunc grouped by values in array idx.
Return an array containing the results of ufunc applied to values grouped by the indexes in array idx. See the NumPy documentation for the available ufuncs.
Warning: the init parameter is not a filling value for missing indexes. If index i is missing, then out[i] = init, but this value also serves as the initialization of ufunc on all the groups of values.
For example, if ufunc is numpy.add and init = -1, then for each index, the sum of the corresponding values will be decreased by one.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) dtype array
ufunc (numpy.ufunc) – universal function applied to the groups of values
init (dtype) – initialization value
minlength (int?) – (default: idx.max() + 1)
- Returns
(minlength,) dtype array, such that out[i] = ufunc(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> ufunc_group_by_idx(idx, values, numpy.maximum, -1)
array([ 4, 6, -1, 6])
>>> ufunc_group_by_idx(idx, values, numpy.add, -1)
array([ 9, 11, -1, 9])
>>> ufunc_group_by_idx(idx, values, numpy.add, 0)
array([10, 12, 0, 10])
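The warning about `init` is easiest to see with NumPy's own `ufunc.at`, which applies a binary ufunc in place at repeated indexes. The sketch below is an assumption about how such a wrapper could behave, not the library's code: `ufunc_group_by_idx_sketch` is a hypothetical name, and the output length is assumed to be `max(minlength, idx.max() + 1)`.

```python
import numpy as np

def ufunc_group_by_idx_sketch(idx, values, ufunc, init, minlength=None):
    """Reduce values per index with a binary ufunc, starting every slot at init."""
    idx = np.asarray(idx)
    values = np.asarray(values)
    size = int(idx.max()) + 1
    if minlength is not None:
        size = max(size, minlength)  # assumed sizing rule
    # Every slot starts at init, including slots whose index never occurs.
    out = np.full(size, init, dtype=values.dtype)
    # Unbuffered in-place update: out[idx[k]] = ufunc(out[idx[k]], values[k]).
    ufunc.at(out, idx, values)
    return out
```

Because every slot starts at `init`, calling the sketch with `np.add` and `init=-1` lowers each group's sum by one, which is the effect the warning describes.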
- min_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the min value by idx, aligned on arange(idx.max() + 1). See also argmin_by_idx and value_at_argmin_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: +inf)
- Returns
(minlength,) float array, such that out[i] = min(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([1, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> min_by_idx(idx, values, fill=100)
array([ 1, 0, 100, 0])
>>> min_by_idx(idx, values)
array([1, 0, 9223372036854775807, 0])
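Unlike `init` in `ufunc_group_by_idx`, the `fill` value here only affects indexes that never occur. One way to get that behavior in plain NumPy is to reduce with a neutral start value and then overwrite the empty slots. This is a sketch under assumptions: `min_by_idx_sketch` is a hypothetical name, and it always returns floats, whereas the documented example keeps the integer dtype of `values`.

```python
import numpy as np

def min_by_idx_sketch(idx, values, minlength=0, fill=np.inf):
    """Per-index minimum; indexes with no value receive fill."""
    idx = np.asarray(idx)
    values = np.asarray(values, dtype=np.float64)
    # Number of values per index; zero marks a missing index.
    counts = np.bincount(idx, minlength=minlength)
    # +inf is neutral for minimum, so it never changes a real group's result.
    out = np.full(counts.shape, np.inf)
    np.minimum.at(out, idx, values)
    # Only the empty slots are overwritten with the fill value.
    out[counts == 0] = fill
    return out
```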
- max_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the max value by idx, aligned on arange(idx.max() + 1). See also argmax_by_idx and value_at_argmax_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -inf)
- Returns
(minlength,) float array, such that out[i] = max(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> max_by_idx(idx, values, fill=-1)
array([ 4, 6, -1, 6])
>>> max_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 6, -1, 6, -1, -1, -1, -1, -1, -1])
>>> max_by_idx(idx, values)
array([ 4, 6, -9223372036854775808, 6])
- argmin_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the argmin of the values by idx, aligned on arange(idx.max() + 1). See also min_by_idx and value_at_argmin_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -1)
- Returns
(minlength,) int32 array, such that out[i] = argmin_{j}(values[j] : idx[j] == i)
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> argmin_by_idx(idx, values, fill=-1)
array([ 0, 5, -1, 9])
>>> argmin_by_idx(idx, values, minlength=10, fill=-1)
array([ 0, 5, -1, 9, -1, -1, -1, -1, -1, -1])
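The returned positions refer to the original arrays, not to positions inside each group. A plain NumPy sketch of that contract follows. It is illustrative only: `argmin_by_idx_sketch` is a hypothetical name, it assumes a non-empty `idx`, it returns int64 rather than the documented int32, and it resolves ties by taking the first occurrence, which the documentation does not specify.

```python
import numpy as np

def argmin_by_idx_sketch(idx, values, minlength=0, fill=-1):
    """Position in the original arrays of the minimum value for each index."""
    idx = np.asarray(idx)
    values = np.asarray(values)
    size = max(minlength, int(idx.max()) + 1)
    # lexsort treats the LAST key as primary: sort by idx, then by value.
    order = np.lexsort((values, idx))
    sorted_idx = idx[order]
    # The first entry of each run of equal idx holds that group's minimum.
    first = np.concatenate(([True], sorted_idx[1:] != sorted_idx[:-1]))
    out = np.full(size, fill, dtype=np.int64)
    out[sorted_idx[first]] = order[first]
    return out
```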
- value_at_argmin_by_idx(idx, sorting_values, fill, output_values=None, minlength=None)
Wrapper around argmin_by_idx and get_value_by_idx. It allows using one array to detect the minimum and another to supply the output, and setting a specific fill value that is not compared with the sorting_values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
sorting_values (array) – (n,) array
fill – filling value for output[i] if there is no idx == i
output_values (array?) – (n,) dtype array. Useful if you want to select the min based on one array, and get the value from another array
minlength (int?) – minimum shape for the output array.
- Returns array
(max_idx+1,), dtype array such that out[i] = min(sorting_values[idx==i]) when output_values is not set; otherwise out[i] is the entry of output_values at the position of that minimum
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> value_at_argmin_by_idx(idx, values, fill=-1)
array([ 0, 0, -1, 0])
>>> value_at_argmin_by_idx(idx, values, minlength=10, fill=-1)
array([ 0, 0, -1, 0, -1, -1, -1, -1, -1, -1])
- argmax_by_idx(idx, values, minlength=None, fill=None)
Given array of indexes idx and array values, outputs the argmax of the values by idx, aligned on arange(idx.max() + 1). See also max_by_idx and value_at_argmax_by_idx.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: -1)
- Returns
(minlength,) int32 array, such that out[i] = argmax_{j}(values[j] : idx[j] == i)
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> argmax_by_idx(idx, values, fill=-1)
array([ 4, 8, -1, 11])
>>> argmax_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 8, -1, 11, -1, -1, -1, -1, -1, -1])
- value_at_argmax_by_idx(idx, sorting_values, fill, output_values=None, minlength=None)
Wrapper around argmax_by_idx and get_value_by_idx. It allows using one array to detect the maximum and another to supply the output, and setting a specific fill value that is not compared with the sorting_values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
sorting_values (array) – (n,) array
fill – filling value for output[i] if there is no idx == i
output_values (array?) – (n,) dtype array. Useful if you want to select the max based on one array, and get the value from another array
minlength (int?) – minimum shape for the output array.
- Returns array
(max_idx+1,), dtype array such that out[i] = max(sorting_values[idx==i]) when output_values is not set; otherwise out[i] is the entry of output_values at the position of that maximum
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> value_at_argmax_by_idx(idx, values, fill=-1)
array([ 4, 6, -1, 6])
>>> value_at_argmax_by_idx(idx, values, minlength=10, fill=-1)
array([ 4, 6, -1, 6, -1, -1, -1, -1, -1, -1])
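The examples above do not exercise `output_values`, so the sketch below shows the intended use: pick the row with the largest sorting value per index, then report a different column for it. It is a plain NumPy approximation under assumptions, not the library's code: `value_at_argmax_sketch` and the `labels` array are hypothetical, `idx` is assumed non-empty, and ties go to the last occurrence.

```python
import numpy as np

def value_at_argmax_sketch(idx, sorting_values, fill, output_values=None, minlength=0):
    """For each index, the output value found where sorting_values is largest."""
    idx = np.asarray(idx)
    sorting_values = np.asarray(sorting_values)
    if output_values is None:
        output_values = sorting_values
    output_values = np.asarray(output_values)
    size = max(minlength, int(idx.max()) + 1)
    # Sort by idx, then by sorting value: the last entry of each run is the maximum.
    order = np.lexsort((sorting_values, idx))
    sorted_idx = idx[order]
    last = np.concatenate((sorted_idx[1:] != sorted_idx[:-1], [True]))
    # fill is written directly into the output and never compared with sorting_values.
    out = np.full(size, fill, dtype=output_values.dtype)
    out[sorted_idx[last]] = output_values[order[last]]
    return out

idx = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
scores = np.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
labels = np.arange(12) * 10  # hypothetical second column to report
best_labels = value_at_argmax_sketch(idx, scores, fill=-1, output_values=labels)
```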
- connect_adjacents_in_groups(group_ids, values, max_gap)
For each group_id in group_ids, connect values that are at most max_gap apart.
Return an array mapping the values to the indexes of the newly formed connected components they belong to.
Two values that don't have the same input group_id can't be connected in the same connected component.
connect_adjacents_in_groups is faster when an array of indexes is provided as group_ids, but also accepts other types of ids.
- Parameters
group_ids (array) – (n,) dtype array
values (array) – (n,) float array
max_gap (float) – maximum distance between a value and the nearest value in the same group.
- Returns
(n,) uint array, such that out[s[i]] == out[s[i+1]] if and only if group_ids[s[i]] == group_ids[s[i+1]] and |values[s[i]] - values[s[i+1]]| <= max_gap, where s[i] is the i-th index when sorting by id and value
Example
>>> group_ids = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3])
>>> values = numpy.array([ 0, 35, 20, 25, 30, 0, 5, 10, 20, 0, 5, 10, 15])
>>> connect_adjacents_in_groups(group_ids, values, max_gap=5)
array([0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4], dtype=uint32)
Example with string group_ids:
>>> group_ids = numpy.array(['alpha', 'alpha', 'alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta', 'beta', 'gamma', 'gamma', 'gamma', 'gamma'])
>>> values = numpy.array([ 0, 35, 20, 25, 30, 0, 5, 10, 20, 0, 5, 10, 15])
>>> connected_components_ids = connect_adjacents_in_groups(group_ids, values, max_gap=5)
The function does not require the group_ids or the values to be sorted:
>>> shuffler = numpy.random.permutation(len(group_ids))
>>> group_ids_shuffled = group_ids[shuffler]
>>> values_shuffled = values[shuffler]
>>> connect_adjacents_in_groups(group_ids_shuffled, values_shuffled, max_gap=5)
array([2, 1, 0, 2, 4, 1, 1, 4, 1, 4, 3, 2, 4], dtype=uint32)
>>> connected_components_ids[shuffler]
array([2, 1, 0, 2, 4, 1, 1, 4, 1, 4, 3, 2, 4], dtype=uint32)
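The return condition, stated in terms of the sorted order `s`, maps directly onto a sort, a boundary test, and a cumulative sum. The following is a sketch under assumptions rather than the library's algorithm: `connect_adjacents_sketch` is a hypothetical name, inputs are assumed non-empty, and group ids are first converted to integer codes so that string ids sort the same way as integer ones.

```python
import numpy as np

def connect_adjacents_sketch(group_ids, values, max_gap):
    """Label connected components of values within each group."""
    group_ids = np.asarray(group_ids)
    values = np.asarray(values)
    # Integer codes preserve the sorted order of the ids (works for strings too).
    _, codes = np.unique(group_ids, return_inverse=True)
    # Sort by group first, then by value inside the group.
    order = np.lexsort((values, codes))
    g, v = codes[order], values[order]
    # A new component starts when the group changes or the gap exceeds max_gap.
    starts_new = np.concatenate(([True], (g[1:] != g[:-1]) | (np.diff(v) > max_gap)))
    component_sorted = np.cumsum(starts_new) - 1
    # Scatter the labels back to the original (unsorted) positions.
    out = np.empty(len(values), dtype=np.uint32)
    out[order] = component_sorted
    return out
```

Numbering components in sorted order is what makes the result independent of how the inputs are shuffled, which matches the last example above.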
- get_value_by_idx(idx, values, default, check_unique=True, minlength=None)
Given array of indexes idx and array values (unordered, not necessarily full), output array such that out[i] = values[idx==i].
If all indexes in idx are unique, it is equivalent to sorting the values by their idx and filling with default for missing idx.
If idx elements are not unique and you still want to proceed, you can set check_unique to False. The output value for each non-unique index will be chosen arbitrarily among the corresponding values.
- Parameters
idx (array) – (n,) uint array with values < max_idx
values (array) – (n,) dtype array
default (dtype) – filling value for output[i] if there is no idx == i
check_unique (bool) – if True, will check that idx are unique. If False and the idx are not unique, then an arbitrary value will be chosen.
minlength (int?) – minimum shape for the output array (default: idx.max() + 1).
- Returns array
(max_idx+1,), dtype array such that out[i] = values[idx==i].
Example
>>> idx = numpy.array([8, 2, 4, 7])
>>> values = numpy.array([100, 200, 300, 400])
>>> get_value_by_idx(idx, values, -1, check_unique=False, minlength=None)
array([ -1, -1, 200, -1, 300, -1, -1, 400, 100])
Example with non-unique elements in idx:
>>> idx = numpy.array([2, 2, 4, 7])
>>> values = numpy.array([100, 200, 300, 400])
>>> get_value_by_idx(idx, values, -1, check_unique=False, minlength=None)
array([ -1, -1, 200, -1, 300, -1, -1, 400])
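In plain NumPy the same scatter can be written as a fill followed by a fancy-index assignment. The sketch below is an approximation under assumptions: `get_value_by_idx_sketch` is a hypothetical name, and raising `ValueError` on duplicates is a guess, since the documentation does not say how a failed `check_unique` is reported.

```python
import numpy as np

def get_value_by_idx_sketch(idx, values, default, check_unique=True, minlength=0):
    """Scatter values to their index positions; missing positions get default."""
    idx = np.asarray(idx)
    values = np.asarray(values)
    if check_unique and len(np.unique(idx)) != len(idx):
        # Assumed error type; the real function's behavior is not documented here.
        raise ValueError("idx contains duplicate indexes")
    size = max(minlength, int(idx.max()) + 1)
    out = np.full(size, default, dtype=values.dtype)
    # With duplicate indexes, which value ends up stored should be treated as arbitrary.
    out[idx] = values
    return out
```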
- get_most_common_by_idx(idx, values, fill, minlength=None)
Given array of indexes idx and array values, outputs the most common value by idx.
- Parameters
idx (array) – (n,) uint array with values < max_idx
values (array) – (n,) non-float, dtype array
fill – filling value for output[i] if there is no idx == i
minlength – minimum shape for the output array.
- Returns
(max_idx+1,), dtype array such that out[i] = the most common value in values[idx==i]
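This entry has no example, so here is a plain NumPy sketch of a per-index mode. It is an illustration under assumptions, not the library's implementation: `most_common_by_idx_sketch` is a hypothetical name, `idx` is assumed non-empty, and ties are broken in favor of the largest value, which the documentation does not specify.

```python
import numpy as np

def most_common_by_idx_sketch(idx, values, fill, minlength=0):
    """Most frequent value for each index; missing indexes get fill."""
    idx = np.asarray(idx)
    values = np.asarray(values)
    size = max(minlength, int(idx.max()) + 1)
    # Encode each value as a small integer code.
    uniq_values, codes = np.unique(values, return_inverse=True)
    n_codes = len(uniq_values)
    # One integer key per (idx, value) pair, then count identical pairs.
    keys = idx.astype(np.int64) * n_codes + codes
    uniq_keys, counts = np.unique(keys, return_counts=True)
    key_idx = uniq_keys // n_codes
    key_code = uniq_keys % n_codes
    # Sort pairs by idx, then by count: the last pair of each idx is the most common.
    order = np.lexsort((counts, key_idx))
    sorted_idx = key_idx[order]
    last = np.concatenate((sorted_idx[1:] != sorted_idx[:-1], [True]))
    out = np.full(size, fill, dtype=uniq_values.dtype)
    out[sorted_idx[last]] = uniq_values[key_code[order[last]]]
    return out

idx = np.array([0, 0, 0, 1, 1, 3])
values = np.array([7, 7, 2, 5, 5, 9])
modes = most_common_by_idx_sketch(idx, values, fill=-1)
```

In this usage example index 0 sees 7 twice and 2 once, index 1 sees only 5, index 2 is missing, and index 3 sees only 9.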
- average_by_idx(idx, values, weights=None, minlength=None, fill=0, dtype='float64')
Compute average-by-idx given array of indexes idx, values, and optional weights.
- Parameters
idx (array) – (n,) int array
values (array) – (n,) float array
weights (array?) – (n,) float array
minlength (int?) – (default: idx.max() + 1)
fill (float?) – filling value for missing idx (default: 0)
dtype (str?) – (default: 'float64')
- Returns
(minlength,) float array, such that out[i] = mean(values[idx==i])
Example
>>> idx = numpy.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3])
>>> values = numpy.array([0, 1, 2, 3, 4, 0, 2, 4, 6, 0, 4, 6])
>>> average_by_idx(idx, values, fill=0)
array([ 2. , 3. , 0. , 3.33333333])
>>> weights = numpy.array([0, 1, 0, 0, 0, 1, 2, 3, 4, 1, 1, 0])
>>> average_by_idx(idx, values, weights=weights, fill=0)
array([ 1., 4., 0., 2.])
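The weighted result is the per-index sum of `values * weights` divided by the per-index sum of `weights`. A plain NumPy sketch using `numpy.bincount` follows. `average_by_idx_sketch` is a hypothetical name, the `dtype` option is omitted, and giving `fill` to groups whose total weight is zero is an assumption.

```python
import numpy as np

def average_by_idx_sketch(idx, values, weights=None, minlength=0, fill=0.0):
    """Weighted mean of values per index; empty groups get fill."""
    idx = np.asarray(idx)
    values = np.asarray(values, dtype=np.float64)
    if weights is None:
        weights = np.ones_like(values)
    weights = np.asarray(weights, dtype=np.float64)
    # bincount with weights sums the given quantity per index.
    weighted_sum = np.bincount(idx, weights=values * weights, minlength=minlength)
    total_weight = np.bincount(idx, weights=weights, minlength=minlength)
    out = np.full(weighted_sum.shape, fill, dtype=np.float64)
    # Divide only where there is weight; the other slots keep the fill value.
    np.divide(weighted_sum, total_weight, out=out, where=total_weight > 0)
    return out
```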