Aggregated data

These are the API components of datreant.data for working with datasets from multiple Treants at once, and treating them in aggregate.

AggData

The class datreant.data.agglimbs.AggData is the interface used by Bundles and Views to access their members’ datasets in aggregate.

class datreant.data.agglimbs.AggData(collection)

Manipulators for collection data.

keys(scope='all')

List available datasets.

Parameters: scope ({'all', 'any'}) – which keys to list. ‘all’ returns only the handles present in every member; ‘any’ returns all handles present in at least one member.
Returns: handles – list of handles to available datasets
Return type: list
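The two scopes amount to a set intersection (‘all’) and a set union (‘any’) over the members' handles. The sketch below illustrates only that logic. It is not datreant's implementation: the function aggregate_keys and its input (a plain dict mapping each member to its handles) are hypothetical, and the sorting is added here only to make the output deterministic.

```python
def aggregate_keys(member_handles, scope="all"):
    """Combine per-member dataset handles as keys(scope=...) describes.

    member_handles: hypothetical mapping of member -> iterable of handles.
    """
    handle_sets = [set(handles) for handles in member_handles.values()]
    if not handle_sets:
        return []
    if scope == "all":
        # handles present in every member: intersection of the sets
        combined = set.intersection(*handle_sets)
    elif scope == "any":
        # handles present in at least one member: union of the sets
        combined = set.union(*handle_sets)
    else:
        raise ValueError("scope must be 'all' or 'any'")
    # sorted only so the example output is deterministic
    return sorted(combined)


members = {
    "sim1": ["positions", "energies"],
    "sim2": ["positions", "forces"],
}
print(aggregate_keys(members, scope="all"))  # ['positions']
print(aggregate_keys(members, scope="any"))  # ['energies', 'forces', 'positions']
```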
retrieve(handle, by='path', **kwargs)

Retrieve aggregated dataset from all members.

This is a convenience method. The stored data structure for each member is read from disk and aggregated. The aggregation scheme depends on the type of data structure read from each member:

pandas DataFrames or Series
the structures are concatenated, with a new outer level added to the index that gives the member (see by) each set of rows came from
pandas Panel or Panel4D, numpy arrays, pickled Python objects
the structures are returned as a dictionary whose keys identify the member (see by) and whose values are the corresponding data structures
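For the pandas case, the "new level added to the index" behaves like passing a dict of frames to pandas.concat, which uses the dict keys as the outer index level. The sketch below assumes member paths as keys (the default by='path'); the paths and data are made up for illustration, and this is not necessarily how datreant builds its result internally.

```python
import pandas as pd

# Hypothetical per-member DataFrames, keyed by member path.
frames = {
    "/data/sim1": pd.DataFrame({"x": [1.0, 2.0]}),
    "/data/sim2": pd.DataFrame({"x": [3.0]}),
}

# Rows are stacked; the dict keys become an outer index level
# naming the member each row came from.
combined = pd.concat(frames, names=["member", "row"])
print(combined)

# Selecting on the outer level recovers one member's rows.
print(combined.loc["/data/sim2"])
```

For the non-pandas structures, the aggregated result is simply a dict of the same shape as frames above, with one value per member.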

This method adapts its output to the data it reads from each member. In particular:

  • members for which there is no data with the given handle are skipped
  • the output is the lowest-common-denominator data structure: if every structure read is a pandas DataFrame, a DataFrame with a MultiIndex is returned; if some structures are DataFrames and others are anything else, a dictionary is returned
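Taken together, the two rules above suggest a simple dispatch: drop members without data, then concatenate only if every remaining structure is a pandas object of the same kind, and otherwise fall back to a dict. The following is a minimal sketch under stated assumptions, not datreant's code: the function aggregate is hypothetical, a member with no data is represented by None, returning None when no member has data is an arbitrary choice, and treating an all-Series result like the all-DataFrame case is inferred from the aggregation scheme described earlier.

```python
import pandas as pd


def aggregate(per_member):
    """Sketch of the skip-then-lowest-common-denominator rules.

    per_member: hypothetical mapping of member key -> data, with None
    standing in for "this member has no dataset under the handle".
    """
    # Rule 1: members with no data for the handle are skipped.
    found = {key: data for key, data in per_member.items() if data is not None}
    if not found:
        return None  # assumption: nothing to aggregate

    values = list(found.values())
    # Rule 2: if everything is a DataFrame (or everything is a Series),
    # concatenate with the member key as the outer index level.
    if all(isinstance(v, pd.DataFrame) for v in values) or all(
        isinstance(v, pd.Series) for v in values
    ):
        return pd.concat(found, names=["member", "row"])

    # Otherwise the lowest common denominator is a plain dict.
    return found


mixed = aggregate({"a": pd.DataFrame({"x": [1]}), "b": [1, 2]})
print(type(mixed))  # <class 'dict'>
```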
Arguments:
handle

name of data to retrieve

Keywords:
by

top-level index of the output data structure: ‘path’ uses member paths, ‘name’ uses member names, ‘uuid’ uses member UUIDs; if names are not unique, use ‘path’ or ‘uuid’ instead [‘path’]

See datreant.data.limbs.Data.retrieve() for more information on keyword usage.
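The advice about non-unique names follows from how a dictionary keyed by member works: two members with the same name map to the same key, so one entry replaces the other. The sketch below uses a hypothetical Member tuple and index_by helper (neither is part of datreant) to show that path and UUID keys keep both members while name keys do not. How datreant itself handles such a collision is not stated here.

```python
import uuid
from collections import namedtuple

# Hypothetical stand-in for a Treant: only the identifiers `by` can use.
Member = namedtuple("Member", ["name", "path", "uuid"])

members = [
    Member("run", "/sims/a/run", str(uuid.uuid4())),
    Member("run", "/sims/b/run", str(uuid.uuid4())),  # same name, different path
]
data = ["data from a", "data from b"]


def index_by(members, data, by="path"):
    """Key each member's data by the chosen identifier."""
    if by not in ("path", "name", "uuid"):
        raise ValueError("by must be 'path', 'name', or 'uuid'")
    return {getattr(m, by): d for m, d in zip(members, data)}


print(len(index_by(members, data, by="path")))  # 2
print(len(index_by(members, data, by="name")))  # 1 (the second 'run' replaced the first)
```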

Keywords for pandas data structures:
 
where

conditions selecting which rows/columns to return

start

row number at which to start the selection

stop

row number at which to stop the selection

columns

list of columns to return; all columns returned by default

iterator

if True, return an iterator [False]

chunksize

number of rows to return per iteration; implies iterator=True

Returns:
data

aggregated data structure
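The pandas keywords of retrieve share their names with parameters of pandas' HDFStore.select. Whether datreant forwards them unchanged is an assumption that this section does not confirm. The sketch below mimics their meaning on an in-memory DataFrame so that no HDF5 file is needed. The helpers select and iter_chunks are hypothetical, and where is written in DataFrame.query syntax, which differs from the query syntax HDFStore uses on disk.

```python
import pandas as pd

df = pd.DataFrame({
    "time": list(range(10)),
    "energy": [float(i) ** 2 for i in range(10)],
    "step": list(range(100, 110)),
})


def select(df, where=None, start=None, stop=None, columns=None):
    """In-memory analogue of the selection keywords (assumed semantics)."""
    out = df.iloc[start:stop]      # start/stop: row-number slice [start, stop)
    if where is not None:
        out = out.query(where)     # where: row condition (DataFrame.query syntax)
    if columns is not None:
        out = out[columns]         # columns: subset of columns to keep
    return out


def iter_chunks(df, chunksize):
    """chunksize: yield successive blocks of at most `chunksize` rows."""
    for begin in range(0, len(df), chunksize):
        yield df.iloc[begin:begin + chunksize]


sub = select(df, where="energy > 10", start=2, stop=8, columns=["time", "energy"])
print(sub)
print([len(chunk) for chunk in iter_chunks(df, 4)])  # [4, 4, 2]
```

Here start and stop narrow the rows before where is applied, which mirrors how HDFStore.select restricts the rows it searches.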