Utility functions

mhealthdata.df2numpy.combine_arrays(*args, labels=None, mode='valid')[source]

Convert arrays of e.g. steps, bpm, sleep into the same length and combine in a dictionary

Parameters:

*args – Tuples of (data, date) e.g. as output by to_2darray()
labels (list or None, default None) – List of keyword labels for data
mode ({"valid", "full"}, default "valid") – If “valid”, all arrays are shrunk to the minimum overlapping range; otherwise they are expanded

Returns:

Dictionary with numpy arrays of the same length

Return type:

dict
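
The difference between the two modes can be illustrated with a small NumPy sketch. It does not call mhealthdata: `align` is a hypothetical helper, and it assumes that every date array is a continuous range of ordinal days and that days missing from an input are filled with zeros.

```python
import numpy as np

def align(pairs, mode="valid"):
    """Hypothetical helper: bring (data, idate) pairs to a common day range."""
    starts = [idate[0] for _, idate in pairs]
    ends = [idate[-1] for _, idate in pairs]
    if mode == "valid":
        lo, hi = max(starts), min(ends)   # overlapping range only
    else:
        lo, hi = min(starts), max(ends)   # range covering all inputs
    days = np.arange(lo, hi + 1)
    out = []
    for data, idate in pairs:
        # Assumption: days absent from an input are filled with zeros
        full = np.zeros((len(days),) + data.shape[1:], dtype=data.dtype)
        mask = (idate >= lo) & (idate <= hi)
        full[idate[mask] - lo] = data[mask]
        out.append(full)
    return out, days

steps = (np.ones((5, 1440)), np.arange(100, 105))  # days 100..104
bpm = (np.ones((4, 1440)), np.arange(102, 106))    # days 102..105

valid, days_valid = align([steps, bpm], mode="valid")  # days 102..104
full, days_full = align([steps, bpm], mode="full")     # days 100..105
```

In "valid" mode both arrays keep only the three shared days; in "full" mode both span six days, and the days an input did not cover stay at zero.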

mhealthdata.df2numpy.to_1darray(df, column, tstart, tend=None, tz=None, idate=None, x=None, uint8=False)[source]

Get value-per-day health data (“weight”, “rhr”, or “hrv”) as 1D array.

Parameters:

df (DataFrame) – DataFrame of health data records - “steps”, “bpm”, etc.
column (str) – Name of values column
tstart (list) – List of columns to seek for start date/time
tend (list or None, default None) – List of columns to seek for end date/time
tz (list or None, default None) – List of columns to seek for date/time time zone
idate (ndarray or None, default None) – 1D array of continuous range of ordinal days
x (ndarray or None, default None) – Initialized 1D array; if None, will be initialized with np.zeros()
uint8 (bool, default False) – Flag to cast all health data to np.uint8 to save disk space

Returns:

x (ndarray) – 1D array of values of size (N days).
idate (ndarray) – 1D array of record ordinal days.

mhealthdata.df2numpy.to_2darray(df, column, tstart, tend=None, tz=None, dt=None, idate=None, x=None, uint8=False, mode='rate')[source]

Get value-per-minute health data (“steps”, “sleep”, or “bpm”) as 2D array.

Parameters:

df (DataFrame) – DataFrame of health data records - “steps”, “bpm”, etc.
column (str) – Name of values column
tstart (list) – List of columns to seek for start date/time
tend (list or None, default None) – List of columns to seek for end date/time
tz (list or None, default None) – List of columns to seek for date/time time zone
dt (str, ndarray, or None, default None) – Column name or 1D array of record durations [seconds].
idate (ndarray or None, default None) – 1D array of continuous range of ordinal days
x (ndarray or None, default None) – Initialized 2D array; if None, will be initialized with np.zeros()
uint8 (bool, default False) – Flag to cast all health data to np.uint8 to save disk space
mode ({"rate", "count"}, default "rate") – Way to treat values of records longer than 1 minute: if “rate” - duplicate values, if “count” - split evenly between minutes

Returns:

x (ndarray) – 2D array of values of size (N days x 1440 minutes).
idate (ndarray) – 1D array of record ordinal days.
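
The two values of mode differ only in how a record longer than one minute is spread over the minutes it covers. A minimal sketch of that idea, using a hypothetical helper `spread_record` that is not part of mhealthdata:

```python
import numpy as np

def spread_record(value, start_min, n_min, mode="rate"):
    """Hypothetical helper: place one record on a 1440-minute day."""
    day = np.zeros(1440)
    if mode == "rate":
        day[start_min:start_min + n_min] = value          # duplicate the value
    else:
        day[start_min:start_min + n_min] = value / n_min  # split evenly
    return day

# A 5-minute record starting at 08:00 (minute 480)
bpm_day = spread_record(60, 480, 5, mode="rate")      # 60 in each of the 5 minutes
steps_day = spread_record(100, 480, 5, mode="count")  # 20 in each of the 5 minutes
```

“rate” suits quantities such as heart rate, where the value applies to every minute; “count” suits totals such as steps, where the daily sum should be preserved.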

mhealthdata.utils.anomaly_detection(x, wlen=10, wtype='step', cutoff=None)[source]

Detects anomalies using running step or boxcar window.

Parameters:

x (ndarray) – 1D array of data points
wlen (int, default 10) – Window length
wtype ({"step", "box"}, default "step") – Window type
cutoff (float or None, default None) – Cutoff scale for std

Returns:

1D array of anomaly indices

Return type:

ndarray

mhealthdata.utils.calc_cadence(steps)[source]

Calculates walking and running cadence (steps/min).

Parameters:

steps (ndarray) – Array of equispaced time series data

Returns:

walk (float) – Walking cadence
run (float) – Running cadence

mhealthdata.utils.calc_covariance(x, window)[source]

Calculates covariance with running window.

Parameters:

x (ndarray) – 1D array of data points
window (ndarray) – 1D window array

Returns:

1D array of covariance

Return type:

ndarray

mhealthdata.utils.calc_interpolation(x)[source]

Linearly interpolates data array.

Parameters: x (ndarray) – 1D array of data points
Returns: 1D array of interpolated data
Return type: ndarray

mhealthdata.utils.columns_to_datetime(df, tstart, tend=None, tz=None)[source]

Converts DataFrame date/time columns to pandas.Timestamp.

Parameters:

df (DataFrame) – DataFrame
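tstart (array_like) – List of start date/time columns to convert into datetime
tend (array_like or None, default None) – List of end date/time columns to convert into datetime
tz (str or None, default None) – Time zone column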

Returns:

DataFrame with selected columns converted to pandas.Timestamp

Return type:

DataFrame

mhealthdata.utils.defragment_sleep(sleep, tol=60)[source]

Defragments sleep intervals. Some samples have fragmented sleep records. This function imputes short gaps (default 60 min) to merge a series of fragmented sleep bouts into a continuous sleep interval.

Parameters:

sleep (ndarray) – Array of equispaced time series data (N days x 1440 min)
tol (int, default 60) – Max length of imputed intervals [minutes]

Returns:

Defragmented sleep of the same shape as the input sleep

Return type:

ndarray

mhealthdata.utils.fill_gaps(x, gap=None, fill=1)[source]

Fills zero gaps in 1d-array.

Parameters:

x (ndarray) – 1D array of non-negative numeric values
gap (int or None, default None) – Max gap duration (if None - fill all gaps)
fill (float, default 1) – Value to fill zeros

Returns:

1D array with all gaps <= gap filled

Return type:

ndarray
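
A minimal NumPy sketch of gap filling as described above. `fill_zero_gaps` is a hypothetical stand-in, not the library code, and it assumes that leading and trailing runs of zeros count as gaps too.

```python
import numpy as np

def fill_zero_gaps(x, gap=None, fill=1):
    """Hypothetical helper: fill runs of zeros no longer than `gap`."""
    x = np.array(x, dtype=float)
    # Pad the zero mask so runs touching either end are detected as well
    mask = np.concatenate(([0], (x == 0).astype(int), [0]))
    edges = np.flatnonzero(np.diff(mask)).reshape(-1, 2)  # (start, end) per run
    for start, end in edges:
        if gap is None or end - start <= gap:
            x[start:end] = fill
    return x

x = [1, 0, 0, 1, 0, 0, 0, 0, 1]
short_only = fill_zero_gaps(x, gap=2)   # fills the 2-sample gap, keeps the 4-sample gap
everything = fill_zero_gaps(x)          # gap=None fills every gap
```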

mhealthdata.utils.find_columns_by_key(df, keys)[source]

Finds all DataFrame column names containing any of the keys.

Parameters:

df (DataFrame) – DataFrame
keys (array_like) – List of keys of str type

Returns:

List of column names

Return type:

list
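
The matching rule can be sketched on a plain list of column names. `columns_containing` is a hypothetical helper, and case-sensitive substring matching is an assumption; the library may match differently.

```python
def columns_containing(columns, keys):
    """Hypothetical helper: names that contain at least one of the keys."""
    return [name for name in columns if any(key in name for key in keys)]

columns = ["startTime", "endTime", "value", "time_offset"]
matches = columns_containing(columns, ["start", "end"])
```

Here `matches` holds "startTime" and "endTime"; the other two names contain neither key.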

mhealthdata.utils.find_intervals(x, tol=0, nmin=None, sort=False)[source]

Finds continuous positive intervals in 1d-array.

Parameters:

x (ndarray) – 1D array of non-negative numeric values
tol (int, default 0) – Gap duration tolerance
nmin (int or None, default None) – Minimal length of intervals
sort (bool, default False) – If False, keep intervals ordered by index in the array; if True, sort descending by interval duration

Returns:

2D array - N intervals x 2 indices (start, end)

Return type:

ndarray
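
A common way to locate such intervals is to look for sign changes in the difference of a padded boolean mask. The sketch below shows that technique with a hypothetical helper `positive_runs`; it ignores tol, nmin, and sort, and it assumes that the end index is exclusive, which may differ from the library's convention.

```python
import numpy as np

def positive_runs(x):
    """Hypothetical helper: (start, end) index pairs of runs where x > 0.

    Assumption: `end` is exclusive, as in Python slicing.
    """
    mask = np.concatenate(([0], (np.asarray(x) > 0).astype(int), [0]))
    edges = np.flatnonzero(np.diff(mask))  # alternating run starts and ends
    return edges.reshape(-1, 2)

x = np.array([0, 2, 3, 0, 0, 1, 0])
runs = positive_runs(x)  # two runs: x[1:3] and x[5:6]
```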

mhealthdata.utils.from_ordinal(date, fmt='%Y-%m-%d')[source]

Converts ordinal day(s) to date(s), where day 1 = Jan 1st, 1 AD.

Parameters:

date (array_like or int) – Day(s) of type int (ordinal)
fmt (str, default "%Y-%m-%d") – Output date format

Returns:

Date(s)

Return type:

ndarray or str
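
The convention "day 1 = Jan 1st, 1 AD" is the same one used by Python's standard `datetime.date` ordinals, so a round trip can be sketched with the standard library alone. This illustrates the convention and is not necessarily how the function is implemented.

```python
from datetime import date

first_day = date(1, 1, 1).toordinal()   # 1, by definition of the convention
ordinal = date(2020, 1, 1).toordinal()  # 737425
text = date.fromordinal(ordinal).strftime("%Y-%m-%d")  # back to "2020-01-01"
```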

mhealthdata.utils.histogram_peaks(x, bins=100, smooth=False)[source]

Finds local peaks in histogram of values.

Parameters:

x (ndarray) – Array of data values
bins (int or ndarray, default 100) – Histogram bins or number of bins
smooth (bool, default False) – If True, smooth with a Hann window average

Returns:

idx (ndarray) – 1D array of peak coordinate(s)
score (ndarray) – 1D array of peak height(s)

mhealthdata.utils.impute_bpm(bpm, tol=15)[source]

Imputes short bpm gaps (default 15 min) by linear interpolation. Some devices output bpm every 5 or 10 min. This function imputes short gaps to make them compatible with bpm output every 1 min.

Parameters:

bpm (ndarray) – Array of equispaced time series data (N days x 1440 min)
tol (int, default 15) – Max length of imputed intervals [minutes]

Returns:

Imputed bpm of the same shape as the input bpm

Return type:

ndarray
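
The idea of imputing short gaps by linear interpolation can be sketched on a 1D series. `interpolate_short_gaps` is a hypothetical helper, and it assumes that missing samples are stored as zeros and that only gaps enclosed by two known samples are filled.

```python
import numpy as np

def interpolate_short_gaps(x, tol=15):
    """Hypothetical helper: linearly fill interior zero runs of length <= tol."""
    x = np.array(x, dtype=float)
    known = np.flatnonzero(x > 0)
    # Walk over consecutive pairs of known samples
    for left, right in zip(known[:-1], known[1:]):
        gap = right - left - 1
        if 0 < gap <= tol:
            x[left:right + 1] = np.linspace(x[left], x[right], right - left + 1)
    return x

bpm = np.array([60, 0, 0, 0, 0, 70, 0, 0])  # one sample every 5 minutes
filled = interpolate_short_gaps(bpm, tol=15)
```

The four missing minutes between 60 and 70 become 62, 64, 66, 68; the trailing zeros have no right-hand neighbour and are left untouched.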

mhealthdata.utils.series_peaks(x, window, smooth=False)[source]

Finds local peaks in array of time series.

Notes

NaN, Inf values NOT allowed

Parameters:

x (ndarray) – Array of equispaced time series data
window (int) – Window size
smooth (bool, default False) – If True, smooth with a Hann window average

Returns:

idx (ndarray) – 1D array of peak coordinate(s)
score (ndarray) – 1D array of peak height(s)

mhealthdata.utils.sleep_stage_dict(mode='decode')[source]

Gets dictionary to encode/decode sleep stage.

Parameters: mode ({"decode", "encode"}, default "decode") – If “decode”, return dict num -> str; if “encode”, return dict str -> num
Returns: Dictionary for decoding/encoding sleep stages
Return type: dict

mhealthdata.utils.smoother(x, window, pad=nan, epsilon=1e-10, roll=False)[source]

Smooth data using running Hann window.

Parameters:

x (ndarray) – Array of equispaced time series data
window (int) – Window size
pad (float, default np.nan) – Value to pad x if roll is False
epsilon (float, default 1e-10) – Cutoff to account values as non-zeros
roll (bool, default False) – If True, pad with rolled x; otherwise pad with the value of pad

Returns:

Smoothed time series data

Return type:

ndarray
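
Running Hann-window smoothing amounts to convolving the series with a normalized Hann kernel. The sketch below shows only that core step with a hypothetical helper `hann_smooth`; the pad, epsilon, and roll handling described above is not reproduced, so values near the edges are attenuated.

```python
import numpy as np

def hann_smooth(x, window):
    """Hypothetical helper: weighted moving average with Hann weights."""
    w = np.hanning(window + 2)[1:-1]  # drop the two zero-valued end points
    w = w / w.sum()                   # normalize so a constant stays constant
    return np.convolve(x, w, mode="same")

x = np.ones(20)
x[10] = 7.0                 # a single spike
y = hann_smooth(x, window=5)
```

The spike is spread over its five neighbouring samples, while samples far from both the spike and the edges stay at 1.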

mhealthdata.utils.timezone_txt_to_minutes(tz)[source]

Converts timezone name to minutes relative to UTC.

Parameters: tz (str) – Timezone, e.g. “Europe/Madrid” or “UTC+0100”
Returns: Minutes
Return type: int
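
For fixed-offset strings such as “UTC+0100”, the conversion is simple arithmetic on the hours and minutes fields. The sketch below covers only that form with a hypothetical helper `utc_offset_to_minutes`; named zones such as “Europe/Madrid” need a timezone database and are not handled here.

```python
import re

def utc_offset_to_minutes(tz):
    """Hypothetical helper: 'UTC+HHMM' or 'UTC-HHMM' to signed minutes."""
    match = re.fullmatch(r"UTC([+-])(\d{2})(\d{2})", tz)
    if match is None:
        raise ValueError(f"unsupported timezone string: {tz!r}")
    sign = 1 if match.group(1) == "+" else -1
    return sign * (int(match.group(2)) * 60 + int(match.group(3)))

madrid_winter = utc_offset_to_minutes("UTC+0100")  # 60
india = utc_offset_to_minutes("UTC+0530")          # 330
```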

mhealthdata.utils.to_month_abbr(date)[source]

Converts date(s) to name(s) of month (Jan - Dec).

Parameters: date (array_like or str or int) – Date(s)
Returns: Month(s)
Return type: ndarray or str

mhealthdata.utils.to_month_name(date)[source]

Converts date(s) to name(s) of month (January - December).

Parameters: date (array_like or str or int) – Date(s)
Returns: Month(s)
Return type: ndarray or str

mhealthdata.utils.to_ordinal(date)[source]

Converts date(s) to ordinal day(s), where day 1 = Jan 1st, 1 AD.

Parameters: date (array_like or str) – Date(s) of type str
Returns: Ordinal day(s)
Return type: ndarray or int

mhealthdata.utils.to_ordinal_day(date)[source]

Converts date(s) to ordinal day(s), where day 1 = Jan 1st, 1 AD.

Parameters: date (array_like or str) – Date(s) of type str
Returns: Ordinal day(s)
Return type: ndarray or int

mhealthdata.utils.to_ordinal_month(date)[source]

Converts date(s) to ordinal month(s), where month 1 = Jan 1st-31st, 1 AD.

Parameters: date (array_like or str) – Date(s) of type str
Returns: Ordinal month(s)
Return type: ndarray or int
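
If month 1 is January of 1 AD, the ordinal month of a date follows from its year and month alone. A minimal sketch with a hypothetical helper `ordinal_month`, assuming the library counts months this way:

```python
from datetime import date

def ordinal_month(d):
    """Hypothetical helper: months elapsed since Jan, 1 AD (1-based)."""
    return (d.year - 1) * 12 + d.month

first = ordinal_month(date(1, 1, 15))       # 1
march_2020 = ordinal_month(date(2020, 3, 15))  # 2019 * 12 + 3 = 24231
```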

mhealthdata.utils.to_ordinal_week(date)[source]

Converts date(s) to ordinal week(s), where week 1 = Jan 1st-7th, 1 AD.

Parameters: date (array_like or str) – Date(s) of type str
Returns: Ordinal week(s)
Return type: ndarray or int
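
With week 1 covering ordinal days 1 to 7, the ordinal week can be derived from the ordinal day by integer division. A minimal sketch with a hypothetical helper `ordinal_week`, assuming the library uses this counting:

```python
from datetime import date

def ordinal_week(d):
    """Hypothetical helper: 1-based week index, week 1 = ordinal days 1..7."""
    return (d.toordinal() - 1) // 7 + 1

week_a = ordinal_week(date(1, 1, 7))     # 1, last day of week 1
week_b = ordinal_week(date(1, 1, 8))     # 2, first day of week 2
week_c = ordinal_week(date(2020, 1, 1))  # 105347
```

Because ordinal day 1 falls on a Monday in the proleptic Gregorian calendar, weeks counted this way run from Monday to Sunday.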

mhealthdata.utils.to_ordinal_year(date)[source]

Converts date(s) to ordinal year(s).

Parameters: date (array_like or str) – Date(s) of type str
Returns: Ordinal year(s)
Return type: ndarray or int

mhealthdata.utils.to_range(date, pad_to_full_week=True)[source]

Encloses date(s) into a continuous range of ordinal dates.

Parameters:

date (array_like) – List of dates of type int (ordinal) or str
pad_to_full_week (bool, default True) – If True - pad range to full weeks so that its length % 7 == 0

Returns:

Continuous range of ordinal days

Return type:

ndarray
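
One way to pad a range to whole weeks is to move its start back to a Monday and its end forward to a Sunday. The sketch below does that for integer ordinal days with a hypothetical helper `enclosing_range`; which side the library actually pads is an assumption.

```python
import numpy as np

def enclosing_range(days, pad_to_full_week=True):
    """Hypothetical helper: continuous ordinal-day range covering `days`."""
    lo, hi = min(days), max(days)
    if pad_to_full_week:
        lo -= (lo - 1) % 7       # back to Monday (ordinal day 1 is a Monday)
        hi += 6 - (hi - 1) % 7   # forward to Sunday
    return np.arange(lo, hi + 1)

# 737425 is Wed 2020-01-01, 737430 is Mon 2020-01-06
padded = enclosing_range([737425, 737430])
tight = enclosing_range([737425, 737430], pad_to_full_week=False)
```

The padded range runs from Monday 2019-12-30 to Sunday 2020-01-12, 14 days in total, while the unpadded range has 6 days.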

mhealthdata.utils.to_weekdayiso(date)[source]

Converts date(s) to int day of the week (1 - Monday, 7 - Sunday).

Parameters: date (array_like or str or int) – Date(s)
Returns: Day(s) of the week
Return type: ndarray or int

mhealthdata.utils.to_weekdayiso_abbr(date)[source]

Converts date(s) to name(s) of day of the week (Mon - Sun).

Parameters: date (array_like or str or int) – Date(s)
Returns: Day(s) of the week
Return type: ndarray or str

mhealthdata.utils.to_weekdayiso_name(date)[source]

Converts date(s) to name(s) of day of the week (Monday - Sunday).

Parameters: date (array_like or str or int) – Date(s)
Returns: Day(s) of the week
Return type: ndarray or str

mhealthdata.utils.to_year_month_day(date)[source]

Converts date(s) to (arrays of) year, month, day.

Parameters:

date (array_like or str) – Date(s) of type str

Returns:

year (ndarray or int) – Year(s)
month (ndarray or int) – Month(s) of the year
day (ndarray or int) – Day(s) of the month

mhealthdata.utils.unique_sorted(x, return_dict=False)[source]

Sorts unique values in descending order.

Parameters:

x (ndarray) – 1D array of numeric values
return_dict (bool, default False) – If False, return value and count arrays; if True, return dict value -> count

Returns:

value (ndarray, optional) – Unique values array (if return_dict is False)
count (ndarray, optional) – Unique values counts (if return_dict is False)
dict (dict, optional) – Dictionary unique value -> count (if return_dict is True)
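
The description does not say whether "descending" refers to the values or to their counts. The sketch below assumes counts, so the most frequent value comes first; it uses `np.unique` directly and is not the library implementation.

```python
import numpy as np

x = np.array([3, 1, 3, 2, 3, 1])

# np.unique returns values in ascending order together with their counts
value, count = np.unique(x, return_counts=True)

# Assumption: reorder so that the most frequent value comes first
order = np.argsort(-count, kind="stable")
value, count = value[order], count[order]

as_dict = {int(v): int(c) for v, c in zip(value, count)}
```

With this input the values come out as 3, 1, 2 with counts 3, 2, 1.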

mhealthdata.utils.window_avg_std(t, x, window=14, smooth=0)[source]

Calculates running window average and std.

Parameters:

t (ndarray) – 1D array of time indices of values
x (ndarray) – 1D array of values
window (int, default 14) – Window size
smooth (int, default 0) – Window size to smooth average and std

Returns:

t_avg (ndarray) – 1D array of time indices of avg / std values
x_avg (ndarray) – 1D array of running window average values
x_std (ndarray) – 1D array of running window std values

mhealthdata.utils.window_boxcar(n, m=None)[source]

Generates boxcar window (-1 to 1) of length n and width m.

Parameters:

n (int) – Window length
m (int or None, default None) – Window width, if None, m = int(max(1, 0.2 * n))

Returns:

1D array of boxcar window

Return type:

ndarray

mhealthdata.utils.window_sigmoid(n, m=None)[source]

Generates sigmoid window (-1 to 1) of length n and width m.

Parameters:

n (int) – Window length
m (int or None, default None) – Window width, if None, m = int(max(1, 0.2 * n))

Returns:

1D array of sigmoid window

Return type:

ndarray

mhealthdata.utils.xticks_dates(idate, mode='day', ax=None, **kwargs)[source]

Matplotlib xticks as date(s).

Parameters:

idate (array_like) – Array or list of dates of type int (day 1 = Jan 1st, 1 AD)
mode ({"day", "week", "fortnight", "month"}, default "day") – Date spacing
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments

Returns:

ax – Axes for plotting

Return type:

matplotlib.pyplot.Axes object

mhealthdata.utils.xticks_days(x, ax=None, **kwargs)[source]

Matplotlib xticks as days, assuming xlim is (0, 1440 x N days) [min].

Parameters:

x (array_like) – Data, to infer number of days
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments

Returns:

ax – Axes for plotting

Return type:

matplotlib.pyplot.Axes object

mhealthdata.utils.xticks_hours(dt=1, mode='24H', ax=None, **kwargs)[source]

Matplotlib xticks as hours, assuming xlim is (0,1440) [min/day].

Parameters:

dt (int, default 1) – Stride [hours]
mode ({"24H", "12H"}, default "24H") – Time format
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments

Returns:

ax – Axes for plotting

Return type:

matplotlib.pyplot.Axes object

mhealthdata.timezone.find_timezone_mismatch(data, nday=3, dt=60)[source]

For Fitbit only: identifies the timezone from the mismatch between sleep, steps, and bpm. Assumes “sleep” is in local time and automatically detects day-wise “steps” and “bpm” offsets that match “sleep”.

Parameters:

data (dict) – Dictionary of data, should have keys “sleep”, “steps”, “bpm”, each containing array of size N days x 1440 minutes
nday (int, default 3) – Window to find best match timezone offset, [days]
dt (int, default 60) – Stride to find best match timezone offset, [minutes]

Returns:

1D array of length N days with timezone offset [minutes]

Return type:

ndarray

mhealthdata.timezone.fix_timezone_mismatch(data, tz=None)[source]

Fixes the timezone offset for data imported from Fitbit. Assumes “sleep” is in local time and automatically detects day-wise “steps” and “bpm” offsets that match “sleep”.

Parameters:

data (dict) – Dictionary of data, should have keys “sleep”, “steps”, “bpm”. Each key should point to an array of size N days x 1440 minutes
tz (ndarray or None, default None) – 1D array of length N days with timezone offset [minutes]. If None, timezone will be detected automatically

Returns:

Dictionary of data with “steps” and “bpm” rolled to match “sleep”

Return type:

dict
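
"Rolling" a day-by-minute array by a timezone offset means shifting values along the continuous timeline so that they can cross midnight into the neighbouring day. The sketch below shows this for a single constant offset with a hypothetical helper `roll_minutes`; the library works with day-wise offsets, and the sign convention used here is an assumption.

```python
import numpy as np

def roll_minutes(x, offset):
    """Hypothetical helper: shift an (N days x 1440) array by `offset` minutes."""
    # Flatten to one continuous timeline, shift, then restore the day layout
    return np.roll(x.ravel(), offset).reshape(x.shape)

steps = np.zeros((2, 1440))
steps[0, 1430] = 50                # 50 steps at 23:50 on day 0
shifted = roll_minutes(steps, 60)  # now at 00:50 on day 1
```

Note that `np.roll` wraps around, so values shifted past the last minute of the last day reappear at the start of the first day.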