Individual datasets

These are the API components of datreant.data for storing and retrieving datasets from individual Treants and Trees.

Data

The class datreant.data.limbs.Data is the interface used by Treants to access their stored datasets.

class datreant.data.limbs.Data(tree)

Interface to stored data.

add(handle, *args, **kwargs)

Store data in the Treant.

A dataset can be a pandas object (Series, DataFrame, Panel), a NumPy array, or any picklable Python object. If no dataset exists for the given handle, it is added; if one already exists, it is replaced.

Arguments:
    handle: name given to the data; needed for retrieval
    data: data structure to store
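The add-or-replace behaviour, together with retrieve() returning None for a missing handle, can be illustrated with a minimal in-memory sketch. The class InMemoryData and its dict-based storage are hypothetical stand-ins, not part of datreant.data; the real class persists datasets to disk. Pickling every object here is an assumption made to mirror the "picklable Python object" case.

```python
import pickle


class InMemoryData:
    """Hypothetical dict-backed stand-in for datreant.data.limbs.Data."""

    def __init__(self):
        # handle -> pickled bytes; a real Treant stores files on disk instead
        self._store = {}

    def add(self, handle, data):
        # Adding under an existing handle silently replaces the old dataset
        self._store[handle] = pickle.dumps(data)

    def retrieve(self, handle):
        # Missing handles yield None rather than raising KeyError
        blob = self._store.get(handle)
        return None if blob is None else pickle.loads(blob)

    def keys(self):
        # Handles of all stored datasets
        return list(self._store)


data = InMemoryData()
data.add("settings", {"temperature": 300})
data.add("settings", {"temperature": 310})  # replaces the first dataset
print(data.retrieve("settings"))  # {'temperature': 310}
print(data.retrieve("missing"))   # None
print(data.keys())                # ['settings']
```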

append(handle, *args, **kwargs)

Append rows to an existing dataset.

The object must be of the same pandas class (Series, DataFrame, Panel) as the existing dataset, and it must have exactly the same columns (names included).

Arguments:
    handle: name of the dataset to append to
    data: data to append
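The compatibility rule for append() can be sketched in plain Python, with a dataset modeled as a hypothetical Table of column names plus rows. This only mimics the documented precondition (same class, identical column names); it is not how datreant or pandas implement the check, and treating column order as significant is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a tabular dataset: ordered columns plus rows."""
    columns: tuple
    rows: list = field(default_factory=list)


def append_rows(existing, new):
    """Append new.rows to existing, enforcing the documented preconditions."""
    # Same class: a Table can only be extended with another Table
    if type(new) is not type(existing):
        raise TypeError("appended data must be of the same class as the dataset")
    # Same columns, names included (compared as ordered tuples in this sketch)
    if new.columns != existing.columns:
        raise ValueError(f"column mismatch: {new.columns} != {existing.columns}")
    existing.rows.extend(new.rows)
    return existing


stored = Table(columns=("A", "B"), rows=[(1, 2)])
append_rows(stored, Table(columns=("A", "B"), rows=[(3, 4)]))
print(stored.rows)  # [(1, 2), (3, 4)]

try:
    append_rows(stored, Table(columns=("A", "C"), rows=[(5, 6)]))
except ValueError as err:
    print(err)  # rejected: column names differ, stored rows are unchanged
```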

keys()

List available datasets.

Returns:
    handles: list of handles to available datasets

remove(handle, **kwargs)

Remove a dataset, or some subset of a dataset.

Note: when the whole dataset is removed, the directory containing the dataset file (Data.h5) is NOT removed if it still contains other files after the dataset file has been deleted.
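The directory-cleanup rule in the note can be sketched with the standard library. The helper remove_dataset_file is hypothetical, not a datreant function; it deletes a dataset file and then removes the containing directory only when nothing else is left in it.

```python
import os
import tempfile


def remove_dataset_file(path):
    """Delete a dataset file, then its directory only if that directory is now empty."""
    os.remove(path)
    directory = os.path.dirname(path)
    # Any remaining files keep the directory in place
    if not os.listdir(directory):
        os.rmdir(directory)


root = tempfile.mkdtemp()
# Two dataset directories; the second also holds an unrelated file
lone = os.path.join(root, "lone")
shared = os.path.join(root, "shared")
for d in (lone, shared):
    os.makedirs(d)
    open(os.path.join(d, "Data.h5"), "wb").close()
open(os.path.join(shared, "notes.txt"), "w").close()

remove_dataset_file(os.path.join(lone, "Data.h5"))
remove_dataset_file(os.path.join(shared, "Data.h5"))
print(os.path.exists(lone))  # False: directory was empty, so it was removed
print(os.listdir(shared))    # ['notes.txt']
```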

For pandas objects (Series, DataFrame, or Panel), subsets of the dataset can be removed using keywords such as start and stop for a range of rows, or columns for selected columns.

Arguments:
    handle: name of the dataset to delete
Keywords:
    where: conditions for which rows/columns to remove
    start: row number at which to start the selection
    stop: row number at which to stop the selection
    columns: columns to remove
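How start, stop and columns narrow a removal can be sketched on a list of dict rows. The helpers remove_rows and remove_columns are hypothetical, and the half-open [start, stop) range (like Python slicing) is an assumption of this sketch rather than something stated above.

```python
def remove_rows(rows, start=None, stop=None):
    """Return rows with the half-open range [start, stop) deleted (assumed semantics)."""
    kept = list(rows)
    # With start and stop both None, every row is removed
    del kept[start:stop]
    return kept


def remove_columns(rows, columns):
    """Return rows with the named columns dropped."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = [{"A": i, "B": i * 10} for i in range(5)]
print([r["A"] for r in remove_rows(rows, start=1, stop=3)])  # [0, 3, 4]
print(remove_columns(rows[:2], ["B"]))  # [{'A': 0}, {'A': 1}]
```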

retrieve(handle, *args, **kwargs)

Retrieve stored data.

The stored data structure is read from disk and returned.

If the dataset doesn’t exist, None is returned.

For pandas objects (Series, DataFrame, or Panel), subsets of the dataset can be returned using keywords such as start and stop for a range of rows, or columns for selected columns.

Also for pandas objects, the where keyword takes a string and can be used to filter rows and columns without loading the full object into memory. For example, given a DataFrame with handle ‘mydata’ and columns (A, B, C, D), one could return columns A and C of all rows for which column D is greater than .3 with:

retrieve('mydata', where='columns=[A,C] & D > .3')

Or, to get all rows with index = 3 (there could be more than one):

retrieve('mydata', where='index = 3')

See pandas.HDFStore.select() for more information.
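To make the two where queries concrete, this stdlib sketch selects the same rows from an in-memory list of records. It mimics only the result: pandas evaluates the real query against the on-disk table without loading it. The sample records and the select helper are hypothetical.

```python
# Hypothetical records standing in for the 'mydata' DataFrame; "index" is the row label
records = [
    {"index": 1, "A": 1.0, "B": 2.0, "C": 3.0, "D": 0.1},
    {"index": 2, "A": 4.0, "B": 5.0, "C": 6.0, "D": 0.5},
    {"index": 3, "A": 7.0, "B": 8.0, "C": 9.0, "D": 0.9},
    {"index": 3, "A": 0.0, "B": 0.0, "C": 0.0, "D": 0.2},
]


def select(rows, predicate, columns=None):
    """Keep rows satisfying predicate; optionally project onto the given columns."""
    out = []
    for row in rows:
        if predicate(row):
            cols = columns if columns is not None else row.keys()
            out.append({c: row[c] for c in cols})
    return out


# Analogue of where='columns=[A,C] & D > .3' (the row label is kept, as in a DataFrame)
print(select(records, lambda r: r["D"] > 0.3, columns=["index", "A", "C"]))
# [{'index': 2, 'A': 4.0, 'C': 6.0}, {'index': 3, 'A': 7.0, 'C': 9.0}]

# Analogue of where='index = 3' (two rows share index 3)
print(len(select(records, lambda r: r["index"] == 3)))  # 2
```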

Arguments:
    handle: name of the dataset to retrieve
Keywords:
    where: conditions for which rows/columns to return
    start: row number at which to start the selection
    stop: row number at which to stop the selection
    columns: list of columns to return; all columns are returned by default
    iterator: if True, return an iterator [`False`]
    chunksize: number of rows to include in each iteration; implies `iterator=True`
Returns:
    data: stored data; `None` if nonexistent
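The iterator and chunksize keywords let a large dataset be processed piece by piece. The generator below is a hypothetical stdlib sketch of that pattern, yielding consecutive blocks of at most chunksize rows; with datreant itself one would instead loop over the iterator returned by retrieve(handle, chunksize=...).

```python
from itertools import islice


def iter_chunks(rows, chunksize):
    """Yield successive lists of at most chunksize rows (hypothetical helper)."""
    # Checked when iteration begins, since this is a generator
    if chunksize < 1:
        raise ValueError("chunksize must be positive")
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


# Aggregate chunk by chunk instead of materialising all rows at once
total = 0
for chunk in iter_chunks(range(10), chunksize=4):
    total += sum(chunk)
print([len(c) for c in iter_chunks(range(10), 4)])  # [4, 4, 2]
print(total)  # 45
```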