Basic DataLoader class

class mhealthdata.dataloader.DataLoader(path)[source]

This is the class from which all loaders inherit.

DataLoader subclasses build numpy arrays from basic health sensor data loaded from different mobile health apps, with all timestamps converted to the local time zone.

Basic health data are:

  • per minute values of “steps”, “sleep”, and “bpm”

  • per day values of “weight”, “rhr”, and “hrv”

  • per user values of “dob”, “sex”, and “height”

Data can be accessed:

  • as pandas DataFrames in the self.df dict attribute

  • as numpy arrays using the get_device_data() or save_device_npz() methods

Methods are device-centric and can be used to get data for any specific device connected to your health app aggregator, e.g. an iPhone, an Apple Watch, or a Fitbit wristband.

Notes

Other data (e.g. VO2Max) can be found in the data export but are not processed by DataLoader subclasses. If those data are needed, the self.categories dict attribute should be modified.

Parameters:

path (str) – Path to unzipped local folder containing health app data

df

Dictionary of pandas DataFrames of loaded health data for “steps”, “bpm”, etc.

Type:

dict

categories

Dictionary of health data categories. Keys are used to find files. Each entry has the attributes “name” (the name to rename the data to) and “column” (the value column to seek in the corresponding DataFrame).

Type:

dict
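
To illustrate the structure described above, here is a minimal sketch of such a dictionary. The keys and column names are hypothetical placeholders, not the values shipped with mhealthdata; only the “name” and “column” attributes come from the description.

```python
# Hypothetical categories dictionary: each key is a pattern used to find
# export files, "name" is the label the data are renamed to, and "column"
# is the value column to look for in the loaded DataFrame.
categories = {
    "step_count_file": {"name": "steps", "column": "count"},
    "heart_rate_file": {"name": "bpm", "column": "heart_rate"},
}

# Registering an extra category (e.g. VO2Max) would follow the same pattern.
categories["vo2max_file"] = {"name": "vo2max", "column": "value"}

# Map each file key to the name its data would be stored under.
names = {key: spec["name"] for key, spec in categories.items()}
print(names)
```

The real keys depend on the file names used by each health app export, so inspect the export folder before modifying the attribute.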

userdata

Dictionary of “Date-of-birth”, “Biological sex”, and “Height”. Other data like country of residence, or phone number, etc. are ignored.

Type:

dict

start_keys

List of keywords to seek for start timestamp column in a DataFrame.

Type:

list

end_keys

List of keywords to seek for end timestamp column in a DataFrame.

Type:

list

tz_keys

List of keywords to seek for timezone in a DataFrame (NOT to be applied to timestamps).

Type:

list

tz_offset

List of keywords to seek for timezone in a DataFrame (to be applied to timestamps).

Type:

list

dev_col

List of keywords to seek for device id column in a DataFrame.

Type:

list
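
The keyword lists above (start_keys, end_keys, tz_keys, dev_col) are each used to locate a column in a DataFrame. A minimal sketch of one plausible lookup rule, assuming a case-insensitive substring match; `find_column` and the sample column names are hypothetical, and the matching rule used by mhealthdata may differ.

```python
def find_column(columns, keys):
    """Return the first column whose lowercased name contains any keyword.

    Hypothetical helper for illustration; not part of mhealthdata.
    """
    for col in columns:
        if any(key in col.lower() for key in keys):
            return col
    return None


# Sample column names, made up for this sketch.
columns = ["deviceuuid", "start_time", "end_time", "time_offset", "count"]
start_col = find_column(columns, ["start"])
end_col = find_column(columns, ["end"])
missing = find_column(columns, ["weight"])  # no match returns None
print(start_col, end_col, missing)
```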

path

Path to unzipped local folder containing health app data.

Type:

str

property all_categories

Get list of all data categories found (not all loaded).

Returns:

List of all data categories found in provided path.

Return type:

list

property dataframes

Get list of loaded DataFrames.

Returns:

List of loaded DataFrames.

Return type:

list

property devices

Get list of loaded devices.

Returns:

List of loaded devices.

Return type:

list

property devices_dict

Get dictionary of loaded devices identifiers.

Returns:

Dictionary of loaded devices identifiers.

Return type:

dict

get_device_data(device='all', idate=None, uint8=True)[source]

Get a dictionary of per day and per minute ndarrays. The method is device-centric and can output data for a specified device.

Parameters:
  • device (str, default "all") – Device name (should match one of self.devices).

  • date_range (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived automatically from the min and max dates of “steps”.

  • uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.

Returns:

Dictionary of output ndarrays.

Return type:

dict
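
The ordinal-day range mentioned in the parameters can be built with the standard library. This is a minimal sketch, assuming the ordinals are proleptic Gregorian ordinals as returned by `datetime.date.toordinal()` and that per minute arrays hold 1440 values per day; neither assumption is confirmed on this page, and `minute_index` is a hypothetical helper.

```python
from datetime import date, datetime

MINUTES_PER_DAY = 24 * 60  # 1440 per minute slots in one day

# Continuous range of ordinal days for the first week of January 2021.
start = date(2021, 1, 1).toordinal()
stop = date(2021, 1, 8).toordinal()  # exclusive upper bound
date_range = list(range(start, stop))  # wrap in numpy.asarray() for an ndarray


def minute_index(ts, first_day):
    """Map a local timestamp to (day offset, minute of day).

    Hypothetical helper for illustration; not part of mhealthdata.
    """
    day_offset = ts.date().toordinal() - first_day
    minute_of_day = ts.hour * 60 + ts.minute
    return day_offset, minute_of_day


position = minute_index(datetime(2021, 1, 3, 7, 30), date_range[0])
print(len(date_range), position)
```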

save_device_npz(output_file, device='all', idate=None, uint8=True)[source]

Save a dictionary of per day and per minute ndarrays to an .npz file. The method is device-centric and can output data for a specified device.

Parameters:
  • output_file (str) – Path to output .npz file.

  • device (str, default "all") – Device name (should match one of self.devices).

  • date_range (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived automatically from the min and max dates of “steps”.

  • uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.

Returns:

True if the length of the data ndarrays is not zero, else False.

Return type:

bool
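
A saved .npz file can be read back with numpy. The sketch below assumes the file is a standard numpy archive with one named array per dictionary key. The key names and array shapes are stand-ins chosen for illustration, not the actual output of save_device_npz().

```python
import os
import tempfile

import numpy as np

# Stand-in for the dictionary returned by get_device_data(); the key names
# and array shapes are assumptions made for this sketch.
data = {
    "steps": np.zeros((2, 1440), dtype=np.uint8),  # 2 days x 1440 minutes
    "weight": np.array([70, 71], dtype=np.uint8),  # one value per day
}
data["steps"][0, 450] = 120  # 120 steps at 07:30 on the first day

with tempfile.TemporaryDirectory() as tmp:
    output_file = os.path.join(tmp, "device.npz")
    np.savez(output_file, **data)  # one named array per dictionary key
    with np.load(output_file) as npz:
        loaded = {key: npz[key] for key in npz.files}

print(sorted(loaded), loaded["steps"].dtype, loaded["steps"].nbytes)
```

With np.uint8 each value takes one byte, so two days of per minute data occupy 2880 bytes. Values above 255 do not fit in that type.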

Load Fitbit data

class mhealthdata.dataloader.FitbitLoader(path)[source]

Notes

Fitbit exports:

  • sleep in local time

  • steps and bpm timestamps in UTC

FitbitLoader

  • Attempts to infer time zone offset from data mismatch

  • Converts all timestamps to local time, see self._fix_timezone()
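
The conversion step can be pictured as follows. This is a minimal sketch assuming the offset is already known. It does not reproduce how FitbitLoader infers the offset from the data, and the 3-hour value is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed offset for illustration; FitbitLoader is described as inferring it
# from the data, and that inference is not reproduced here.
utc_offset = timedelta(hours=3)
local_tz = timezone(utc_offset)

# A steps/bpm timestamp as exported (UTC).
utc_ts = datetime(2021, 1, 3, 4, 30, tzinfo=timezone.utc)

# Shift to local wall-clock time, then drop tzinfo so it lines up with
# sleep timestamps that are already naive local times.
local_ts = utc_ts.astimezone(local_tz).replace(tzinfo=None)
print(local_ts)
```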

Example

Assume we have data export MyFitbitData.zip downloaded to folder /Users/username/Downloads/wearable_data/ and unzipped into a subfolder /Users/username/Downloads/wearable_data/User/.

>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/User/'
>>> wdata = mhealthdata.FitbitLoader(path)
property all_categories

Get list of all data categories found (not all loaded).

Returns:

List of all data categories found in provided path.

Return type:

list

get_device_data(device='all', idate=None, trunc=True)[source]

Get a dictionary of per day and per minute ndarrays. The method is device-centric and can output data for a specified device.

Parameters:
  • device (str, default "all") – Device name (should match one of self.devices).

  • date_range (ndarray or None, default None) – 1D array of a continuous range of ordinal days. If None, it is derived automatically from the min and max dates of “steps”.

  • uint8 (bool, default True) – Flag to cast all health data to np.uint8 to save disk space.

Returns:

Dictionary of output ndarrays.

Return type:

dict

load_data()[source]

Load data from .csv and .json files, cycling over the categories in the self.categories attribute. The path to seek files is taken from the self.path attribute.

Returns:

List of loaded DataFrames.

Return type:

list

load_nonsleep(category)[source]

Load non-sleep data from .json files. Path to seek files is taken from self.path attribute.

Parameters:

category (str) – Key used to find health data files.

Returns:

DataFrame of raw data loaded from .json files.

Return type:

DataFrame

load_sleep()[source]

Load sleep data from .json files. Path to seek files is taken from self.path attribute.

Returns:

DataFrame of raw data loaded from .json files.

Return type:

DataFrame

Load Samsung Health data

class mhealthdata.dataloader.ShealthLoader(path)[source]

Notes

Samsung Health exports:

  • .json (step binning data) timestamps in local time

  • .csv (sleep, bpm, weight) in UTC with additional time zone column time_offset

ShealthLoader

  • Converts all timestamps to local time, see utils.columns_to_datetime()
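
Applying a per-row offset to a UTC timestamp can be sketched as below. The offset string format ("UTC+0300") is an assumption about the time_offset column, and `parse_offset` is a hypothetical helper rather than part of mhealthdata.

```python
import re
from datetime import datetime, timedelta


def parse_offset(text):
    """Parse an offset string such as 'UTC+0300' or 'UTC-0530' into a timedelta.

    The string format is an assumption about the time_offset column.
    """
    match = re.fullmatch(r"UTC([+-])(\d{2})(\d{2})", text)
    if match is None:
        raise ValueError(f"unrecognized offset: {text!r}")
    sign = 1 if match.group(1) == "+" else -1
    return sign * timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))


utc_ts = datetime(2021, 1, 3, 22, 45)  # naive UTC timestamp from a .csv row
local_ts = utc_ts + parse_offset("UTC-0530")
print(local_ts)
```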

Example

Assume we have data export downloaded to folder /Users/username/Downloads/wearable_data/Samsung Health/ which contains a subfolder samsunghealth_<username>_<date-time>.

>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/Samsung Health/'
>>> wdata = mhealthdata.ShealthLoader(path)
property all_categories

Get list of all data categories found (not all loaded).

Returns:

List of all data categories found in provided path.

Return type:

list

property devices_dict

Get dictionary of loaded devices identifiers.

Returns:

Dictionary of loaded devices identifiers.

Return type:

dict

load_csv(category)[source]

Load data from .csv file. Path to seek files is taken from self.path attribute.

Parameters:

category (str) – Key used to find health data files.

Returns:

DataFrame of raw data loaded from .csv.

Return type:

DataFrame

load_data()[source]

Load data from .csv and .json files, cycling over the categories in the self.categories attribute. The path to seek files is taken from the self.path attribute.

Returns:

List of loaded DataFrames.

Return type:

list

load_jsons(category, idx='binning_data')[source]

Load data from .json files. Path to seek files is taken from self.path attribute.

Parameters:
  • category (str) – Key used to find health data files.

  • idx (str, default "binning_data") – Column name containing binning data file names.

Returns:

DataFrame of raw data loaded from .json files.

Return type:

DataFrame

Load Apple Healthkit data

class mhealthdata.dataloader.HealthkitLoader(path)[source]

Example

Assume path contains unzipped data export.xml or exportación.xml.

>>> import mhealthdata
>>> path = '/Users/username/Downloads/wearable_data/apple_health_export/'
>>> wdata = mhealthdata.HealthkitLoader(path)
property all_categories

Get list of all data categories found (not all loaded).

Returns:

List of all data categories found in provided path.

Return type:

list

property devices_dict

Get dictionary of loaded devices identifiers.

Returns:

Dictionary of loaded devices identifiers.

Return type:

dict

load_data()[source]

Load data from the .xml file, cycling over the categories in the self.categories attribute. The path to seek files is taken from the self.path attribute.

Returns:

List of loaded DataFrames.

Return type:

list
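
For orientation, an Apple Health export.xml stores each measurement as a Record element with attributes such as type, sourceName, startDate, and value. The sketch below reads step records from a tiny inline stand-in using only the standard library. It shows the file layout under that assumption and is not the parsing code used by HealthkitLoader.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Tiny inline stand-in for export.xml; the values are made up.
xml_text = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="iPhone"
          startDate="2021-01-03 07:30:00 +0300" endDate="2021-01-03 07:31:00 +0300" value="42"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" sourceName="Apple Watch"
          startDate="2021-01-03 07:30:00 +0300" endDate="2021-01-03 07:30:00 +0300" value="61"/>
</HealthData>"""

root = ET.fromstring(xml_text)
steps = []
for record in root.iter("Record"):
    if record.get("type") != "HKQuantityTypeIdentifierStepCount":
        continue  # keep only step-count records in this sketch
    start = datetime.strptime(record.get("startDate"), "%Y-%m-%d %H:%M:%S %z")
    # Keep the local wall-clock time carried by the timestamp itself.
    local_start = start.replace(tzinfo=None)
    steps.append((record.get("sourceName"), local_start, float(record.get("value"))))

print(steps)
```

Real exports can be very large, so an incremental parser such as `ET.iterparse` may be preferable to loading the whole tree.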