Aggregated data

These are the API components of datreant.data for working with datasets from multiple Treants at once, and treating them in aggregate.

AggData

The class datreant.data.agglimbs.AggData is the interface used by Bundles and Views to access their members’ datasets in aggregate.

class datreant.data.agglimbs.AggData(collection)

Manipulators for collection data.

keys(scope='all')

List available datasets.

Parameters:	scope ({'all', 'any'}) – Keys to list. ‘all’ returns only handles that are present in all members. ‘any’ returns a list of all handles present in at least one member.
Returns:	handles – list of handles to available datasets
Return type:	list
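
The difference between the two scopes can be sketched with plain Python sets: ‘all’ behaves like an intersection of the members’ handles, ‘any’ like a union. This is only an illustration of the semantics described above, not the library’s implementation; the member names, handles, the `list_handles` helper, and the sorted output order are all assumptions made for the example.

```python
# Hypothetical stand-in data: each member name maps to the set of
# dataset handles it stores. None of these names come from datreant.
member_handles = {
    "sprout": {"energies", "distances"},
    "moss": {"energies"},
    "lichen": {"energies", "angles"},
}


def list_handles(member_handles, scope="all"):
    """Return handles present in all members ('all') or in at least one ('any').

    Sorting the result is an assumption made for readability here.
    """
    handle_sets = list(member_handles.values())
    if not handle_sets:
        return []
    if scope == "all":
        combined = set.intersection(*handle_sets)
    elif scope == "any":
        combined = set.union(*handle_sets)
    else:
        raise ValueError("scope must be 'all' or 'any'")
    return sorted(combined)


handles_all = list_handles(member_handles, scope="all")
handles_any = list_handles(member_handles, scope="any")
print(handles_all)  # ['energies']
print(handles_any)  # ['angles', 'distances', 'energies']
```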

retrieve(handle, by='path', **kwargs)

Retrieve aggregated dataset from all members.

This is a convenience method. The stored data structure for each member is read from disk and aggregated. The aggregation scheme is dependent on the form of the data structures pulled from each member:

pandas DataFrames or Series: the structures are appended together, with a new level added to the index giving the member (see by) each set of rows came from
pandas Panel or Panel4D, numpy arrays, pickled python objects: the structures are returned as a dictionary, with keys giving the member (see by) and each value giving the corresponding data structure

This method tries to do smart things with the data it reads from each member. In particular:

members for which there is no data with the given handle are skipped
the lowest-common-denominator data structure is output; this means that if all data structures read are pandas DataFrames, then a multi-index DataFrame is returned; if some structures are pandas DataFrames, while some are anything else, a dictionary is returned
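
The aggregation rules above can be sketched with pandas alone. The `aggregate` helper below is hypothetical and is not the code used by datreant.data; it assumes each member's data has already been read into memory, uses `None` to mark a member with no data for the handle, and returns `None` when no member has data (that last choice is an assumption of this sketch).

```python
import pandas as pd


def aggregate(member_data):
    """Aggregate per-member data structures (hypothetical helper).

    member_data maps a member key (for example its path) to the data
    read for one handle, or to None if the member has no such dataset.
    """
    # Members without data for the handle are skipped.
    present = {key: value for key, value in member_data.items()
               if value is not None}
    if not present:
        return None  # assumption of this sketch

    values = list(present.values())
    all_frames = all(isinstance(v, pd.DataFrame) for v in values)
    all_series = all(isinstance(v, pd.Series) for v in values)
    if all_frames or all_series:
        # Append the structures, adding an outer index level that
        # records which member each set of rows came from.
        return pd.concat(values, keys=list(present.keys()))

    # Lowest common denominator: a dictionary keyed by member.
    return dict(present)


frames = {
    "sprout": pd.DataFrame({"energy": [1.0, 2.0]}),
    "moss": None,  # no dataset with this handle
    "lichen": pd.DataFrame({"energy": [3.0]}),
}
combined = aggregate(frames)  # DataFrame with a two-level index

mixed = {
    "sprout": pd.DataFrame({"energy": [1.0]}),
    "moss": [1, 2, 3],  # e.g. a pickled python object
}
fallback = aggregate(mixed)  # plain dict keyed by member
```

Rows belonging to a single member can then be selected from the outer index level, for example with `combined.loc["lichen"]`.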

Arguments:	handle – name of data to retrieve
Keywords:	by – top-level index of output data structure; ‘path’ uses member paths, ‘name’ uses member names, ‘uuid’ uses member uuids; if names are not unique, it is better to go with ‘path’ or ‘uuid’ [‘path’]

See datreant.data.limbs.Data.retrieve() for more information on keyword usage.

Keywords for pandas data structures:
	where – conditions for what rows/columns to return
	start – row number to start selection
	stop – row number to stop selection
	columns – list of columns to return; all columns returned by default
	iterator – if True, return an iterator [`False`]
	chunksize – number of rows to include in iteration; implies `iterator=True`
Returns:	data – aggregated data structure
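
The advice about non-unique names for `by` can be illustrated with plain dictionaries: when two members share a name, any structure keyed by name can hold only one of them. The member records and the `choose_by` helper below are invented for this sketch and are not part of datreant; how the library itself handles duplicate names is not shown here.

```python
# Hypothetical member records; names, paths and uuids are made up.
members = [
    {"name": "run", "path": "/sims/a/run", "uuid": "uuid-0001"},
    {"name": "run", "path": "/sims/b/run", "uuid": "uuid-0002"},
    {"name": "ref", "path": "/sims/ref", "uuid": "uuid-0003"},
]

# Keying by name: the second "run" overwrites the first in a dict.
by_name = {m["name"]: m["path"] for m in members}

# Keying by path keeps every member distinct.
by_path = {m["path"]: m["name"] for m in members}


def choose_by(members, preferred="name"):
    """Return `preferred` if its values are unique, else fall back to 'path'."""
    values = [m[preferred] for m in members]
    if len(set(values)) == len(values):
        return preferred
    return "path"


print(len(by_name), len(by_path))  # 2 3
print(choose_by(members))          # path
```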