Individual datasets
These are the API components of datreant.data for storing and retrieving datasets from individual Treants and Trees.
Data
The class datreant.data.limbs.Data is the interface used by Treants to access their stored datasets.
-
class datreant.data.limbs.Data(tree)
Interface to stored data.
-
add(handle, *args, **kwargs)
Store data in Treant.
A data instance can be a pandas object (Series, DataFrame, Panel), a numpy array, or a pickleable python object. If the dataset doesn’t exist, it is added. If a dataset already exists for the given handle, it is replaced.
Arguments:
- handle: name given to data; needed for retrieval
- data: data structure to store
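The add-or-replace behaviour described above can be pictured with a small stand-in. This is only a sketch: it uses the standard library's pickle module and a temporary directory, and the helper names add_object and read_object and the file name object.pkl are hypothetical, not part of datreant.data.

```python
import pickle
import tempfile
from pathlib import Path


def add_object(root, handle, data):
    """Store a picklable object under `handle`, replacing any existing one.

    Hypothetical helper for illustration; not datreant.data's implementation.
    """
    target = Path(root) / handle
    target.mkdir(parents=True, exist_ok=True)
    # Opening with "wb" truncates the file, so a second call with the
    # same handle replaces the stored object.
    with open(target / "object.pkl", "wb") as f:
        pickle.dump(data, f)


def read_object(root, handle):
    """Load the object stored under `handle` (hypothetical helper)."""
    with open(Path(root) / handle / "object.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as root:
    add_object(root, "params", {"temperature": 300})
    first = read_object(root, "params")
    add_object(root, "params", {"temperature": 310})  # same handle: replaced
    second = read_object(root, "params")
```

After the second call only the newer object is stored under the handle, which mirrors the "replaced if it already exists" rule stated for add.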
-
append(handle, *args, **kwargs)
Append rows to an existing dataset.
The object must be of the same pandas class (Series, DataFrame, Panel) as the existing dataset, and it must have exactly the same columns (names included).
Arguments:
- handle: name of data to append to
- data: data to append
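The compatibility rule for append can be checked before any data is written. The sketch below works on in-memory DataFrames only and assumes made-up data; the helper name same_layout is hypothetical and is not provided by datreant.data.

```python
import pandas as pd

existing = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
new_rows = pd.DataFrame({"A": [5.0], "B": [6.0]})
mismatched = pd.DataFrame({"A": [5.0], "C": [6.0]})


def same_layout(existing, new):
    """Return True if `new` has the same pandas class and column names.

    Hypothetical pre-check for DataFrames; it mirrors the rule stated
    above and is not datreant.data's own validation.
    """
    return type(existing) is type(new) and list(existing.columns) == list(new.columns)


ok = same_layout(existing, new_rows)      # same class, same columns
bad = same_layout(existing, mismatched)   # column "C" does not match "B"

# In-memory counterpart of appending rows to the existing dataset.
combined = pd.concat([existing, new_rows])
```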
-
keys()
List available datasets.
Returns:
- handles: list of handles to available datasets
-
remove(handle, **kwargs)
Remove a dataset, or some subset of a dataset.
Note: if the whole dataset is removed, the directory containing the dataset file (Data.h5) will NOT be removed if it still contains file(s) after the removal of the dataset file.
For pandas objects (Series, DataFrame, or Panel), subsets of the whole dataset can be removed using keywords such as start and stop for ranges of rows, and columns for selected columns.
Arguments:
- handle: name of dataset to delete
Keywords:
- where: conditions for what rows/columns to remove
- start: row number to start selection
- stop: row number to stop selection
- columns: columns to remove
-
retrieve(handle, *args, **kwargs)
Retrieve stored data.
The stored data structure is read from disk and returned.
If the dataset doesn’t exist, None is returned.
For pandas objects (Series, DataFrame, or Panel), subsets of the whole dataset can be returned using keywords such as start and stop for ranges of rows, and columns for selected columns.
Also for pandas objects, the where keyword takes a string as input and can be used to filter out rows and columns without loading the full object into memory. For example, given a DataFrame with handle ‘mydata’ with columns (A, B, C, D), one could return all rows for columns A and C for which column D is greater than .3 with:
retrieve('mydata', where='columns=[A,C] & D > .3')
Or, if we wanted all rows with index = 3 (there could be more than one):
retrieve('mydata', where='index = 3')
See pandas.HDFStore.select() for more information.
Arguments:
- handle: name of data to retrieve
Keywords:
- where: conditions for what rows/columns to return
- start: row number to start selection
- stop: row number to stop selection
- columns: list of columns to return; all columns returned by default
- iterator: if True, return an iterator (default: False)
- chunksize: number of rows to include in iteration; implies iterator=True
Returns:
- data: stored data; None if nonexistent
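For orientation, the two where examples given for retrieve select the same rows and columns as the in-memory pandas expressions below. This sketch assumes a small made-up DataFrame that is already loaded, so it shows only the result of each selection, not the on-disk filtering that where performs.

```python
import pandas as pd

# Made-up data standing in for the dataset stored under 'mydata'.
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4],
        "B": [10, 20, 30, 40],
        "C": [5, 6, 7, 8],
        "D": [0.1, 0.5, 0.2, 0.9],
    }
)

# In-memory counterpart of where='columns=[A,C] & D > .3':
# keep rows where D exceeds 0.3, then keep only columns A and C.
subset = df.loc[df["D"] > 0.3, ["A", "C"]]

# In-memory counterpart of where='index = 3': every row labelled 3.
row_three = df.loc[df.index == 3]
```

The difference in practice is that the where keyword avoids reading the full dataset into memory, whereas the expressions above require the whole DataFrame to be loaded first.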
-