Basic DataLoader class
- class mhealthdata.dataloader.DataLoader(path)
This is the class from which all loaders inherit.
DataLoader subclasses build NumPy arrays, all in the local time zone, from basic health sensor data loaded from different mobile health apps.
Basic health data are:
- per-minute values of “steps”, “sleep”, and “bpm”
- per-day values of “weight”, “rhr”, and “hrv”
- per-user values of “dob”, “sex”, and “height”
Data can be accessed:
- as pandas DataFrames in the self.df dict attribute
- as NumPy arrays using the get_device_data() or save_device_npz() methods
Methods are device-centric and can be used to get data for any specific device connected to your health app aggregator, e.g. an iPhone, an Apple Watch, or a Fitbit wristband.
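The per-minute layout described above can be pictured as a grid with one row per day and one column per minute of the day. The sketch below assumes rows are indexed by ordinal days (as produced by `date.toordinal()`); `to_minute_grid` is a hypothetical helper for illustration, not part of mhealthdata.

```python
from datetime import datetime

import numpy as np

MINUTES_PER_DAY = 1440


def to_minute_grid(samples, days):
    """Hypothetical helper: scatter (local datetime, value) samples into a
    (len(days), 1440) array whose rows are the ordinal days in `days`.

    Values falling in the same minute are summed, which suits counts such as
    steps; samples outside the day range are dropped.
    """
    grid = np.zeros((len(days), MINUTES_PER_DAY))
    first_day = int(days[0])
    for ts, value in samples:
        row = ts.toordinal() - first_day
        if 0 <= row < len(days):
            grid[row, ts.hour * 60 + ts.minute] += value
    return grid


# Two consecutive ordinal days, as date.toordinal() would produce them.
days = np.arange(datetime(2023, 5, 1).toordinal(), datetime(2023, 5, 2).toordinal() + 1)
samples = [(datetime(2023, 5, 1, 8, 30), 42), (datetime(2023, 5, 2, 0, 1), 7)]
steps = to_minute_grid(samples, days)
print(steps.shape)  # (2, 1440)
```

Summing within a minute is a choice made for step counts; a heart-rate grid would more naturally average.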
Notes
Other data can be found in the data export (e.g. VO2Max) but are not processed by DataLoader subclasses. If those data are needed, modify the self.categories dict attribute.
- Parameters:
path (str) – Path to unzipped local folder containing health app data
- df
Dictionary of pandas DataFrames of loaded health data for “steps”, “bpm”, etc.
- Type:
dict
- categories
Dictionary of health data categories. Keys are used to find files. Each entry has the attributes “name” (the name the category is renamed to) and “column” (the value column to look up in the corresponding DataFrame).
- Type:
dict
- userdata
Dictionary of “Date-of-birth”, “Biological sex”, and “Height”. Other data, such as country of residence or phone number, are ignored.
- Type:
dict
- start_keys
List of keywords used to find the start-timestamp column in a DataFrame.
- Type:
list
- end_keys
List of keywords used to find the end-timestamp column in a DataFrame.
- Type:
list
- tz_keys
List of keywords used to find the time zone column in a DataFrame (NOT to be applied to timestamps).
- Type:
list
- tz_offset
List of keywords used to find the time zone column in a DataFrame (to be applied to timestamps).
- Type:
list
- dev_col
List of keywords used to find the device ID column in a DataFrame.
- Type:
list
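A keyword list such as start_keys, end_keys, or dev_col can be applied by scanning column names for the first match. This sketch assumes case-insensitive substring matching; the actual matching rule in mhealthdata is not stated here, and `find_column` plus the sample keywords are hypothetical.

```python
def find_column(columns, keys):
    """Return the first column whose name contains any keyword (case-insensitive),
    or None if nothing matches. Substring matching is an assumption of this sketch."""
    for col in columns:
        name = col.lower()
        if any(key.lower() in name for key in keys):
            return col
    return None


# Hypothetical keyword lists and column names, for illustration only.
start_keys = ["start", "begin"]
end_keys = ["end", "stop"]
dev_col = ["device"]
columns = ["dateTime", "startTime", "endTime", "value", "deviceuuid"]

print(find_column(columns, start_keys))  # startTime
print(find_column(columns, end_keys))    # endTime
print(find_column(columns, dev_col))     # deviceuuid
```

Because the first matching column wins, keyword lists should avoid short fragments that also occur inside unrelated column names.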
- path
Path to unzipped local folder containing health app data.
- Type:
str
- property all_categories
Get list of all data categories found (not all loaded).
- Returns:
List of all data categories found in provided path.
- Return type:
list
- property dataframes
Get list of loaded DataFrames.
- Returns:
List of loaded DataFrames.
- Return type:
list
- property devices
Get list of loaded devices.
- Returns:
List of loaded devices.
- Return type:
list
- property devices_dict
Get dictionary of loaded devices identifiers.
- Returns:
Dictionary of loaded devices identifiers.
- Return type:
dict
- get_device_data(device='all', idate=None, uint8=True)
Get a dictionary of per-day and per-minute ndarrays. The method is device-centric and can output data for a specified device.
- Parameters:
device (str, default "all") – Device name (should match one of self.devices).
idate (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived from the min and max dates of “steps”.
uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.
- Returns:
Dictionary of output ndarrays.
- Return type:
dict
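The default date range and the uint8 cast can be sketched as follows. This assumes "min and max dates of steps" means calendar dates converted with `date.toordinal()`. The clipping in the hypothetical `to_uint8` helper is a defensive choice of this sketch: the description above does not say how values above 255 are handled, and a bare `astype(np.uint8)` would not clip them.

```python
from datetime import date

import numpy as np


def default_date_range(step_dates):
    """Continuous 1D array of ordinal days from the earliest to the latest date."""
    ordinals = [d.toordinal() for d in step_dates]
    return np.arange(min(ordinals), max(ordinals) + 1)


def to_uint8(values):
    """Round, then clip to [0, 255] before casting so out-of-range values
    saturate instead of being converted unpredictably."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)


idate = default_date_range([date(2023, 5, 3), date(2023, 5, 1)])
print(len(idate))  # 3
print(to_uint8(np.array([72.4, 180.0, 300.0, -5.0])).tolist())  # [72, 180, 255, 0]
```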
- save_device_npz(output_file, device='all', idate=None, uint8=True)
Save a dictionary of per-day and per-minute ndarrays to .npz. The method is device-centric and can output data for a specified device.
- Parameters:
output_file (str) – Path to output .npz file.
device (str, default "all") – Device name (should match one of self.devices).
idate (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived from the min and max dates of “steps”.
uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.
- Returns:
True if the output ndarrays are not empty, otherwise False.
- Return type:
bool
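A minimal sketch of the save-and-reload round trip, assuming the output is a flat dict of arrays written with NumPy's `np.savez_compressed`. The key names and the emptiness rule in the hypothetical `save_npz` helper are assumptions, not the library's internals.

```python
import os
import tempfile

import numpy as np


def save_npz(output_file, data):
    """Hypothetical helper: write a dict of arrays to a compressed .npz file.

    Returns False (and writes nothing) when every array is empty.
    """
    if all(np.asarray(v).size == 0 for v in data.values()):
        return False
    np.savez_compressed(output_file, **data)
    return True


# Hypothetical output keys: a per-minute grid and its ordinal-day index.
data = {
    "steps": np.zeros((2, 1440), dtype=np.uint8),
    "idate": np.arange(738641, 738643),
}
out = os.path.join(tempfile.mkdtemp(), "device.npz")
print(save_npz(out, data))  # True

with np.load(out) as npz:  # NpzFile supports the context-manager protocol
    print(sorted(npz.files), npz["steps"].shape)
```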
Load Fitbit data
- class mhealthdata.dataloader.FitbitLoader(path)
Notes
Fitbit exports:
- sleep timestamps in local time
- steps and bpm timestamps in UTC
FitbitLoader:
- attempts to infer the time zone offset from this mismatch
- converts all timestamps to local time, see self._fix_timezone()
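One way the sleep/activity mismatch could be turned into an offset estimate is to try each whole-hour offset and keep the one that leaves the fewest UTC steps inside the local sleep window. This is an illustrative heuristic with a hypothetical `infer_offset_hours` helper, not a reproduction of `self._fix_timezone()`; it also ignores half-hour time zones.

```python
import numpy as np


def infer_offset_hours(steps_utc, sleep_local, candidates=range(-12, 15)):
    """Illustrative heuristic: return the whole-hour UTC offset that leaves the
    fewest steps inside the local sleep window.

    steps_utc   -- 1D per-minute step counts indexed by UTC minute
    sleep_local -- 1D boolean per-minute sleep mask indexed by local minute
    """
    best_hours, best_cost = None, None
    for hours in candidates:
        # local = UTC + offset, so rolling the UTC series forward by the
        # offset lines it up with local minutes (np.roll wraps at the ends).
        steps_local = np.roll(steps_utc, hours * 60)
        cost = steps_local[sleep_local].sum()
        if best_cost is None or cost < best_cost:
            best_hours, best_cost = hours, cost
    return best_hours


# Synthetic two-day example for a user at UTC-5 who sleeps 23:00-07:00 local.
minutes = np.arange(2 * 1440)
sleep_local = (minutes >= 23 * 60) & (minutes < 31 * 60)
steps_local_true = np.where(sleep_local, 0, 10)
steps_utc = np.roll(steps_local_true, 5 * 60)  # UTC runs 5 h ahead of local
print(infer_offset_hours(steps_utc, sleep_local))  # -5
```

Real data are noisier (naps, restless sleep, missing minutes), so a practical version would need more days and some tolerance for ties.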
Example
Assume the data export MyFitbitData.zip was downloaded to the folder /Users/username/Downloads/wearable_data/ and unzipped into the subfolder /Users/username/Downloads/wearable_data/User/.
>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/User/'
>>> wdata = mhealthdata.FitbitLoader(path)
- property all_categories
Get list of all data categories found (not all loaded).
- Returns:
List of all data categories found in provided path.
- Return type:
list
- get_device_data(device='all', idate=None, trunc=True)
Get a dictionary of per-day and per-minute ndarrays. The method is device-centric and can output data for a specified device.
- Parameters:
device (str, default "all") – Device name (should match one of self.devices).
idate (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived from the min and max dates of “steps”.
uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.
- Returns:
Dictionary of output ndarrays.
- Return type:
dict
- load_data()
Load data from .csv and .json files, iterating over the categories in the self.categories attribute. The path to search for files is taken from the self.path attribute.
- Returns:
List of loaded DataFrames.
- Return type:
list
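A rough sketch of the load loop described above: for each category key, collect matching .csv and .json files under the export folder and concatenate them. The prefix-matching rule, the flat JSON-list file layout, and the helper names `load_category` and `load_all` are assumptions for illustration. The result is a dict keyed by short name, mirroring self.df rather than the list that load_data() returns.

```python
import json
from pathlib import Path

import pandas as pd


def load_category(path, key):
    """Concatenate every .csv/.json file under `path` whose name starts with `key`.

    Assumes each .json file holds a flat list of records.
    """
    frames = []
    for file in sorted(Path(path).rglob(f"{key}*")):
        if file.suffix == ".csv":
            frames.append(pd.read_csv(file))
        elif file.suffix == ".json":
            with open(file) as fh:
                frames.append(pd.DataFrame(json.load(fh)))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


def load_all(path, categories):
    """Return {short name: DataFrame} for every category that matched a file."""
    df = {}
    for key, attrs in categories.items():
        frame = load_category(path, key)
        if not frame.empty:
            df[attrs["name"]] = frame
    return df
```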
Load Samsung Health data
- class mhealthdata.dataloader.ShealthLoader(path)
Notes
Samsung Health exports:
- .json files (step binning data) with timestamps in local time
- .csv files (sleep, bpm, weight) with timestamps in UTC and an additional time zone column, time_offset
ShealthLoader:
- converts all timestamps to local time, see utils.columns_to_datetime()
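Converting a Samsung Health .csv timestamp to local time amounts to adding that row's offset. The sketch below assumes `time_offset` strings of the form "UTC+0900" and timestamps formatted like "2023-05-01 22:30:00.000"; both formats are assumptions to check against your export, and `utc_to_local` is a hypothetical helper, not the ShealthLoader implementation.

```python
import re
from datetime import datetime, timedelta

# Assumed offset format, e.g. "UTC+0900" or "UTC-0530".
OFFSET_RE = re.compile(r"UTC([+-])(\d{2})(\d{2})")


def utc_to_local(utc_string, time_offset):
    """Shift a naive UTC timestamp string by a 'UTC+HHMM' offset."""
    match = OFFSET_RE.fullmatch(time_offset)
    if match is None:
        raise ValueError(f"unexpected time_offset: {time_offset!r}")
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    utc = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S.%f")
    return utc + delta if sign == "+" else utc - delta


print(utc_to_local("2023-05-01 22:30:00.000", "UTC+0900"))  # 2023-05-02 07:30:00
```

Applying the offset per row, rather than once per file, keeps records correct when the user travelled across time zones.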
Example
Assume the data export was downloaded to the folder /Users/username/Downloads/wearable_data/Samsung Health/, which contains a subfolder samsunghealth_<username>_<date-time>.
>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/Samsung Health/'
>>> wdata = mhealthdata.ShealthLoader(path)
- property all_categories
Get list of all data categories found (not all loaded).
- Returns:
List of all data categories found in provided path.
- Return type:
list
- property devices_dict
Get dictionary of loaded devices identifiers.
- Returns:
Dictionary of loaded devices identifiers.
- Return type:
dict
- load_csv(category)
Load data from a .csv file. The path to search for files is taken from the self.path attribute.
- Parameters:
category (str) – Key used to find health data files.
- Returns:
DataFrame of raw data loaded from .csv.
- Return type:
DataFrame
- load_data()
Load data from .csv and .json files, iterating over the categories in the self.categories attribute. The path to search for files is taken from the self.path attribute.
- Returns:
List of loaded DataFrames.
- Return type:
list
- load_jsons(category, idx='binning_data')
Load data from .json files. The path to search for files is taken from the self.path attribute.
- Parameters:
category (str) – Key used to find health data files.
idx (str, default "binning_data") – Name of the column containing binning data file names.
- Returns:
DataFrame of raw data loaded from .json files.
- Return type:
DataFrame
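The `idx` column can be read as a per-row list of JSON file names to expand. A minimal sketch, assuming each referenced file holds a JSON list of records and sits directly in `json_dir`; `load_binned` is a hypothetical helper, and the real Samsung Health folder layout may differ.

```python
import json
from pathlib import Path

import pandas as pd


def load_binned(table, json_dir, idx="binning_data"):
    """Read every JSON file named in table[idx] and concatenate the records.

    Rows without a file name, or naming a missing file, are skipped.
    """
    frames = []
    for name in table[idx].dropna():
        file = Path(json_dir) / name
        if not file.is_file():
            continue
        with open(file) as fh:
            frame = pd.DataFrame(json.load(fh))
        frame[idx] = name  # remember which bin file each record came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```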
Load Apple Healthkit data
- class mhealthdata.dataloader.HealthkitLoader(path)
Example
Assume path contains the unzipped data export, export.xml or exportación.xml.
>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/apple_health_export/'
>>> wdata = mhealthdata.HealthkitLoader(path)
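export.xml stores samples as Record elements carrying type, sourceName, value, startDate, and endDate attributes. The sketch below streams the file with the standard library's `xml.etree.ElementTree.iterparse` and sums step counts per sourceName, the app or device that recorded each sample. `sum_records` is a hypothetical helper, not part of HealthkitLoader.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def sum_records(source, record_type="HKQuantityTypeIdentifierStepCount"):
    """Stream export.xml and sum Record values of one type per sourceName.

    `source` may be a file name or a binary file object.
    """
    totals = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == record_type:
            totals[elem.get("sourceName")] += float(elem.get("value", 0))
        elem.clear()  # drop processed attributes/children to limit memory use
    return dict(totals)


# Usage (path as in the example above):
# sum_records("/Users/username/Downloads/wearable_data/apple_health_export/export.xml")
```

Streaming matters here because exports covering several years can be large; `ET.parse` would load the whole tree at once.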
- property all_categories
Get list of all data categories found (not all loaded).
- Returns:
List of all data categories found in provided path.
- Return type:
list
- property devices_dict
Get dictionary of loaded devices identifiers.
- Returns:
Dictionary of loaded devices identifiers.
- Return type:
dict