Aggregated data
These are the API components of datreant.data for working with datasets from multiple Treants at once and treating them in aggregate.
AggData
The class datreant.data.agglimbs.AggData is the interface used by Bundles and Views to access their members’ datasets in aggregate.
class datreant.data.agglimbs.AggData(collection)
Manipulators for collection data.
keys(scope='all')
List available datasets.
Parameters: scope ({'all', 'any'}) – Keys to list. ‘all’ returns only handles that are present in all members. ‘any’ returns a list of all handles present in at least one member.
Returns: handles – list of handles to available datasets
Return type: list
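The difference between the two scopes can be pictured as a set intersection versus a set union over each member's handles. The following is a minimal sketch of that idea only; `list_handles` is a hypothetical helper, not the library's implementation, and the sorted return order is an assumption made here for readability.

```python
def list_handles(member_handles, scope="all"):
    """Combine per-member handle lists.

    'all' keeps handles present in every member (intersection);
    'any' keeps handles present in at least one member (union).
    """
    handle_sets = [set(handles) for handles in member_handles]
    if not handle_sets:
        return []
    if scope == "all":
        combined = set.intersection(*handle_sets)
    elif scope == "any":
        combined = set.union(*handle_sets)
    else:
        raise ValueError("scope must be 'all' or 'any'")
    # Sorting is an assumption of this sketch, used to get a stable order.
    return sorted(combined)


# Hypothetical handles stored in three members.
members = [["energy", "rmsd"], ["energy"], ["energy", "contacts"]]

print(list_handles(members, scope="all"))  # ['energy']
print(list_handles(members, scope="any"))  # ['contacts', 'energy', 'rmsd']
```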
retrieve(handle, by='path', **kwargs)
Retrieve aggregated dataset from all members.
This is a convenience method. The stored data structure for each member is read from disk and aggregated. The aggregation scheme is dependent on the form of the data structures pulled from each member:
- pandas DataFrames or Series: the structures are appended together, with a new level added to the index giving the member (see by) each set of rows came from
- pandas Panel or Panel4D, numpy arrays, pickled python objects: the structures are returned as a dictionary, with keys giving the member (see by) and each value giving the corresponding data structure
This method handles missing and mixed data as follows:
- members for which there is no data with the given handle are skipped
- the lowest-common-denominator data structure is output; this means that if all data structures read are pandas DataFrames, then a multi-index DataFrame is returned; if some structures are pandas DataFrames, while some are anything else, a dictionary is returned
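The aggregation rules above can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's code: `aggregate` is a hypothetical function, a missing dataset is represented by `None`, the member keys are made-up paths, and only the all-DataFrames case is concatenated.

```python
import pandas as pd


def aggregate(datasets):
    """Aggregate a {member_key: data} mapping.

    Members whose value is None (no data for the handle) are skipped.
    If every remaining value is a DataFrame, the frames are stacked and the
    member key becomes a new outer index level; otherwise a dict is returned.
    """
    present = {key: data for key, data in datasets.items() if data is not None}
    if present and all(isinstance(d, pd.DataFrame) for d in present.values()):
        # Passing a dict to pd.concat uses its keys as the outermost index level.
        return pd.concat(present, names=["member"])
    return present


a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})

# All DataFrames: one multi-index DataFrame; the member without data is skipped.
combined = aggregate({"sims/a": a, "sims/b": b, "sims/c": None})

# Mixed structures: fall back to a dictionary keyed by member.
mixed = aggregate({"sims/a": a, "sims/b": [1, 2, 3]})
```

Rows from a single member can then be selected from the combined frame with `combined.loc["sims/a"]`.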
Arguments:
- handle: name of data to retrieve
Keywords:
- by: top-level index of output data structure; ‘path’ uses member path, ‘name’ uses member names, ‘uuid’ uses member uuids; if names are not unique, it is better to go with ‘path’ or ‘uuid’ [‘path’]
See datreant.data.limbs.Data.retrieve() for more information on keyword usage.
Keywords for pandas data structures:
- where: conditions for what rows/columns to return
- start: row number to start selection
- stop: row number to stop selection
- columns: list of columns to return; all columns returned by default
- iterator: if True, return an iterator [False]
- chunksize: number of rows to include in iteration; implies iterator=True
Returns:
- data: aggregated data structure
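The advice on choosing a value for by comes down to key uniqueness: two members that share a name cannot be told apart when the output is keyed by name. A small sketch of that check follows; the member records and the `duplicated_keys` helper are hypothetical, and how the library itself treats colliding keys is not stated here.

```python
from collections import Counter

# Hypothetical member records; names, paths, and uuids are made up.
members = [
    {"name": "run", "path": "projA/run", "uuid": "uuid-1"},
    {"name": "run", "path": "projB/run", "uuid": "uuid-2"},
    {"name": "ref", "path": "projA/ref", "uuid": "uuid-3"},
]


def duplicated_keys(members, by="path"):
    """Return the values of `by` that are shared by more than one member."""
    if by not in ("path", "name", "uuid"):
        raise ValueError("by must be 'path', 'name', or 'uuid'")
    counts = Counter(member[by] for member in members)
    return sorted(key for key, n in counts.items() if n > 1)


print(duplicated_keys(members, by="name"))  # ['run']
print(duplicated_keys(members, by="path"))  # []
print(duplicated_keys(members, by="uuid"))  # []
```

When a check like this reports duplicates for ‘name’, keying the output by ‘path’ or ‘uuid’ keeps every member distinguishable.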
-