Aggregated data

These are the API components for working with datasets from multiple Treants at once and treating them in aggregate.


The class is the interface used by Bundles and Views to access their members’ datasets in aggregate.


Manipulators for collection data.


List available datasets.

Parameters: scope ({'all', 'any'}) – Keys to list. ‘all’ returns only handles that are present in all members; ‘any’ returns all handles present in at least one member.
Returns: handles – list of handles to available datasets
Return type: list
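
The difference between the two scopes can be sketched with plain set operations. This is only an illustration, not the library's implementation: the function name `list_handles`, its `member_handles` argument, and the sorted output order are assumptions made for this example.

```python
def list_handles(member_handles, scope="all"):
    """Return handles shared by all members ('all') or held by any member ('any').

    `member_handles` is a hypothetical input: one list of handles per member.
    """
    handle_sets = [set(handles) for handles in member_handles]
    if not handle_sets:
        return []
    if scope == "all":
        # Intersection: only handles present in every member.
        found = set.intersection(*handle_sets)
    elif scope == "any":
        # Union: handles present in at least one member.
        found = set.union(*handle_sets)
    else:
        raise ValueError("scope must be 'all' or 'any'")
    # Sorting is an assumption, made here to get a stable order.
    return sorted(found)


members = [["energy", "rmsd"], ["rmsd"], ["rmsd", "contacts"]]
shared = list_handles(members, scope="all")
everything = list_handles(members, scope="any")
```

Here `shared` holds only the handle common to every member, while `everything` holds each handle that appears at least once.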
retrieve(handle, by='path', **kwargs)

Retrieve aggregated dataset from all members.

This is a convenience method. The stored data structure for each member is read from disk and aggregated. The aggregation scheme is dependent on the form of the data structures pulled from each member:

pandas DataFrames or Series
    The structures are appended together, with a new level added to the index giving the member (see by) that each set of rows came from.
pandas Panel or Panel4D, numpy arrays, pickled python objects
    The structures are returned as a dictionary, with keys giving the member (see by) and each value giving the corresponding data structure.

This method tries to do smart things with the data it reads from each member. In particular:

  • members for which there is no data with the given handle are skipped
  • the lowest-common-denominator data structure is output: if all data structures read are pandas DataFrames, a multi-index DataFrame is returned; if some are pandas DataFrames and others are anything else, a dictionary is returned
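
The aggregation rules above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the method's actual code: the function `aggregate`, its `member_data` mapping, the use of `None` to mark a member without data, and the `None` result when no member has data are all hypothetical.

```python
import pandas as pd


def aggregate(member_data):
    """Combine per-member data; `member_data` maps a member key to data or None."""
    # Skip members that have no data stored under the handle.
    present = {key: value for key, value in member_data.items() if value is not None}
    if not present:
        return None  # assumption: nothing to aggregate
    values = list(present.values())
    # All DataFrames (or all Series): stack them, with the member key
    # becoming a new outermost index level.
    for kind in (pd.DataFrame, pd.Series):
        if all(isinstance(value, kind) for value in values):
            return pd.concat(values, keys=list(present.keys()))
    # Mixed or non-pandas data: fall back to a plain dictionary.
    return present


frames = {
    "treants/a": pd.DataFrame({"x": [1, 2]}),
    "treants/b": pd.DataFrame({"x": [3]}),
    "treants/c": None,  # member without this dataset
}
combined = aggregate(frames)
mixed = aggregate({"treants/a": frames["treants/a"], "treants/b": [1, 2, 3]})
```

In this sketch `combined` is a single DataFrame whose outer index level names the member each row came from, and `mixed` is a dictionary because one member's data is not a DataFrame.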

Parameters:
  • handle – name of data to retrieve
  • by – top-level index of output data structure; ‘path’ uses member paths, ‘name’ uses member names, ‘uuid’ uses member uuids; if names are not unique, it is better to use ‘path’ or ‘uuid’ [‘path’]

See for more information on keyword usage.

Keywords for pandas data structures:

conditions for what rows/columns to return


row number to start selection


row number to stop selection


list of columns to return; all columns returned by default


if True, return an iterator [False]


number of rows to include in iteration; implies iterator=True


Returns: aggregated data structure