Individual datasets

These are the API components of datreant.data for storing and retrieving datasets from individual Treants and Trees.

Data

The class datreant.data.limbs.Data is the interface used by Treants to access their stored datasets.

class datreant.data.limbs.Data(tree)

Interface to stored data.

add(handle, *args, **kwargs)

Store data in Treant.

A dataset can be a pandas object (Series, DataFrame, Panel), a numpy array, or a pickleable Python object. If no dataset exists for the given handle, it is added; if one already exists, it is replaced.

Arguments:
handle

name given to data; needed for retrieval

data

data structure to store
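The add-or-replace behaviour, together with keys() and the "None if nonexistent" rule of retrieve(), can be sketched with a dict-backed stand-in. `InMemoryData` is hypothetical and for illustration only; it is not part of datreant.data, whose Data class writes datasets to disk inside the Treant.

```python
import numpy as np
import pandas as pd


class InMemoryData:
    """Hypothetical in-memory stand-in for datreant.data.limbs.Data.

    It only mimics the documented add/keys/retrieve semantics; nothing
    is written to disk.
    """

    def __init__(self):
        self._store = {}

    def add(self, handle, data):
        # Adds a new dataset, or replaces the one already stored under handle.
        self._store[handle] = data

    def keys(self):
        # Handles of all stored datasets.
        return list(self._store)

    def retrieve(self, handle):
        # Returns None when no dataset exists for handle.
        return self._store.get(handle)


data = InMemoryData()
data.add('positions', np.zeros((3, 3)))                  # numpy array
data.add('settings', {'temperature': 300})               # pickleable object
data.add('positions', pd.DataFrame({'x': [1.0, 2.0]}))   # replaces the array
```

After the third call only two handles remain, and 'positions' now holds the DataFrame.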

append(handle, *args, **kwargs)

Append rows to an existing dataset.

The object must be of the same pandas class (Series, DataFrame, Panel) as the existing dataset, and it must have exactly the same columns (names included).

Arguments:
handle

name of data to append to

data

data to append
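The append rule can be illustrated with plain pandas. `append_rows` is a hypothetical helper, not the datreant.data implementation (which appends to an HDF5 table); it compares column names in order, which may be stricter than the real check.

```python
import pandas as pd


def append_rows(existing, new):
    """Hypothetical helper mimicking the documented append rule.

    Raises ValueError unless `new` has the same pandas class as
    `existing` and, for DataFrames, identical column names (in order).
    """
    if type(new) is not type(existing):
        raise ValueError('data must be of the same pandas class')
    if isinstance(existing, pd.DataFrame) and not new.columns.equals(existing.columns):
        raise ValueError('data must have exactly the same columns')
    # Stack the new rows underneath the existing ones.
    return pd.concat([existing, new])


old = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
more = pd.DataFrame({'A': [5], 'B': [6]}, index=[2])
combined = append_rows(old, more)
```

Appending a DataFrame with columns (A, C), or a Series, to `old` raises ValueError.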

keys()

List available datasets.

Returns:
handles

list of handles to available datasets

remove(handle, **kwargs)

Remove a dataset, or some subset of a dataset.

Note: when the whole dataset is removed, the directory containing the dataset file (Data.h5) is NOT removed if it still contains other files after the dataset file is deleted.

For pandas objects (Series, DataFrame, or Panel), subsets of the whole dataset can be removed using keywords such as start and stop for ranges of rows, and columns for selected columns.

Arguments:
handle

name of dataset to delete

Keywords:
where

conditions for what rows/columns to remove

start

row number to start selection

stop

row number to stop selection

columns

columns to remove
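To make the start/stop/columns keywords concrete, the sketch below applies the same kind of removal to a DataFrame held in memory. `remove_subset` is hypothetical; it assumes stop is exclusive, as in Python slicing, and it is not how datreant.data edits the stored file.

```python
import numpy as np
import pandas as pd


def remove_subset(df, start=None, stop=None, columns=None):
    """Hypothetical in-memory analogue of remove() with start/stop/columns.

    Drops rows at positions start..stop-1 (when start or stop is given),
    then drops the listed columns (when columns is given).
    """
    result = df
    if start is not None or stop is not None:
        # Boolean mask over row positions; False marks rows to remove.
        keep = np.ones(len(result), dtype=bool)
        keep[start:stop] = False
        result = result.iloc[keep]
    if columns is not None:
        result = result.drop(columns=columns)
    return result


df = pd.DataFrame({'A': range(5), 'B': range(5, 10), 'C': range(10, 15)})
trimmed = remove_subset(df, start=1, stop=3, columns=['B'])
```

Here rows at positions 1 and 2 and column B are gone, leaving three rows with columns A and C.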

retrieve(handle, *args, **kwargs)

Retrieve stored data.

The stored data structure is read from disk and returned.

If the dataset doesn’t exist, None is returned.

For pandas objects (Series, DataFrame, or Panel), subsets of the whole dataset can be returned using keywords such as start and stop for ranges of rows, and columns for selected columns.

Also for pandas objects, the where keyword takes a string as input and can be used to select rows and columns without loading the full object into memory. For example, given a DataFrame with handle ‘mydata’ and columns (A, B, C, D), one could return all rows of columns A and C for which column D is greater than 0.3 with:

retrieve('mydata', where='columns=[A,C] & D > .3')

Or, if we wanted all rows with index = 3 (there could be more than one):

retrieve('mydata', where='index = 3')

See pandas.HDFStore.select() for more information.
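For intuition, the two where strings above select the same rows and columns as the in-memory pandas operations below. The difference is that retrieve() applies the condition while reading the stored file, so the full object never has to be loaded. The DataFrame values are made up for illustration.

```python
import pandas as pd

# Made-up example data; index label 3 appears twice on purpose.
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [0.1, 0.5, 0.2, 0.9],
}, index=[1, 3, 3, 4])

# In-memory counterpart of where='columns=[A,C] & D > .3'
subset = df.loc[df['D'] > 0.3, ['A', 'C']]

# In-memory counterpart of where='index = 3'; every row labelled 3 is kept.
index_three = df.loc[df.index == 3]
```

`subset` holds the rows where D is 0.5 and 0.9, restricted to columns A and C; `index_three` holds both rows labelled 3.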

Arguments:
handle

name of data to retrieve

Keywords:
where

conditions for what rows/columns to return

start

row number to start selection

stop

row number to stop selection

columns

list of columns to return; all columns returned by default

iterator

if True, return an iterator [False]

chunksize

number of rows to include in iteration; implies iterator=True

Returns:
data

stored data; None if nonexistent
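The iterator and chunksize keywords let a large dataset be processed piece by piece. The sketch below shows that pattern on an in-memory DataFrame; `iter_chunks` is a hypothetical helper that only imitates the chunked iteration, whereas retrieve() reads each chunk from disk.

```python
import pandas as pd


def iter_chunks(df, chunksize):
    """Hypothetical in-memory analogue of retrieve(..., chunksize=n).

    Yields successive blocks of at most `chunksize` rows.
    """
    for begin in range(0, len(df), chunksize):
        yield df.iloc[begin:begin + chunksize]


df = pd.DataFrame({'x': range(10)})

# Accumulate a column sum one chunk at a time instead of all at once.
total = sum(chunk['x'].sum() for chunk in iter_chunks(df, chunksize=4))
sizes = [len(chunk) for chunk in iter_chunks(df, chunksize=4)]
```

With ten rows and chunksize=4, the chunks hold 4, 4, and 2 rows.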