Utility functions
- mhealthdata.df2numpy.combine_arrays(*args, labels=None, mode='valid')[source]
Converts arrays (e.g. steps, bpm, sleep) to the same length and combines them in a dictionary.
- Parameters:
*args – Tuples of (data, date) e.g. as output by to_2darray()
labels (list or None, default None) – List of keyword labels for data
mode ({"valid", "full"}, default "valid") – If “valid”, all arrays are shrunk to the minimum overlapping range; otherwise they are expanded
- Returns:
Dictionary with numpy arrays of the same length
- Return type:
dict
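The difference between the two modes can be sketched with plain NumPy. This is an illustration only, not the library's implementation: the helper name `align` and the zero fill used for expanded ranges are assumptions.

```python
import numpy as np

def align(pairs, mode="valid"):
    """Hypothetical helper: align (data, idate) pairs to one range of ordinal days."""
    starts = [idate[0] for _, idate in pairs]
    ends = [idate[-1] for _, idate in pairs]
    if mode == "valid":
        lo, hi = max(starts), min(ends)   # overlapping range only
    else:
        lo, hi = min(starts), max(ends)   # range covering all inputs
    days = np.arange(lo, hi + 1)
    out = []
    for data, idate in pairs:
        # Assumed fill value for days without data: zeros
        aligned = np.zeros((len(days),) + data.shape[1:], dtype=data.dtype)
        mask = (idate >= lo) & (idate <= hi)
        aligned[idate[mask] - lo] = data[mask]
        out.append(aligned)
    return days, out

# Two (data, idate) tuples with partially overlapping day ranges
steps = (np.ones((5, 1440)), np.arange(738000, 738005))
bpm = (np.full((4, 1440), 60.0), np.arange(738002, 738006))

days_valid, arrays_valid = align([steps, bpm], mode="valid")
days_full, arrays_full = align([steps, bpm], mode="full")
```

With these inputs the "valid" result covers the 3 shared days, while the "full" result covers all 6 days and leaves the days without data at the assumed fill value.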
- mhealthdata.df2numpy.to_1darray(df, column, tstart, tend=None, tz=None, idate=None, x=None, uint8=False)[source]
Get value-per-day health data (“weight”, “rhr”, or “hrv”) as 1D array.
- Parameters:
df (DataFrame) – DataFrame of health data records - “steps”, “bpm”, etc.
column (str) – Name of values column
tstart (list) – List of columns to seek for start date/time
tend (list or None, default None) – List of columns to seek for end date/time
tz (list or None, default None) – List of columns to seek for date/time time zone
idate (ndarray or None, default None) – 1D array of continuous range of ordinal days
x (ndarray or None, default None) – Initialized 1D array; if None, will be initialized with np.zeros()
uint8 (bool, default False) – Flag to cast all health data to np.uint8 to save disk space
- Returns:
x (ndarray) – 1D array of values of size (N days).
idate (ndarray) – 1D array of record ordinal days.
- mhealthdata.df2numpy.to_2darray(df, column, tstart, tend=None, tz=None, dt=None, idate=None, x=None, uint8=False, mode='rate')[source]
Get value-per-minute health data (“steps”, “sleep”, or “bpm”) as 2D array.
- Parameters:
df (DataFrame) – DataFrame of health data records - “steps”, “bpm”, etc.
column (str) – Name of values column
tstart (list) – List of columns to seek for start date/time
tend (list or None, default None) – List of columns to seek for end date/time
tz (list or None, default None) – List of columns to seek for date/time time zone
dt (str, ndarray, or None, default None) – Column name or 1D array of record durations [seconds].
idate (ndarray or None, default None) – 1D array of continuous range of ordinal days
x (ndarray or None, default None) – Initialized 2D array; if None, will be initialized with np.zeros()
uint8 (bool, default False) – Flag to cast all health data to np.uint8 to save disk space
mode ({"rate", "count"}, default "rate") – How to treat values of records longer than 1 minute: if “rate”, duplicate the value for each minute; if “count”, split the value evenly between minutes
- Returns:
x (ndarray) – 2D array of values of size (N days x 1440 minutes).
idate (ndarray) – 1D array of record ordinal days.
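The two values of mode can be illustrated on a single record that spans several minutes. The helper `spread_record` below is hypothetical and only mirrors the description above; it truncates records at midnight, which the library may handle differently.

```python
import numpy as np

def spread_record(day, start_min, duration_min, value, mode="rate"):
    """Hypothetical helper: write one record into a 1440-minute day array."""
    stop = min(start_min + duration_min, day.size)
    if mode == "rate":
        # Rate-like data (e.g. bpm): the same value applies to every minute
        day[start_min:stop] = value
    else:
        # Count-like data (e.g. steps): the total is split evenly between minutes
        day[start_min:stop] += value / duration_min
    return day

# A 5-minute record starting at 10:00 (minute 600)
bpm_day = spread_record(np.zeros(1440), 600, 5, 72, mode="rate")
steps_day = spread_record(np.zeros(1440), 600, 5, 100, mode="count")
```

In "rate" mode each of the five minutes holds 72, while in "count" mode each minute holds 20 so that the day total stays at 100.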
- mhealthdata.utils.anomaly_detection(x, wlen=10, wtype='step', cutoff=None)[source]
Detects anomalies using running step or boxcar window.
- Parameters:
x (ndarray) – 1D array of data points
wlen (int, default 10) – Window length
wtype ({"step", "box"}, default "step") – Window type
cutoff (float or None, default None) – Cutoff scale for std
- Returns:
1D array of anomaly indices
- Return type:
ndarray
- mhealthdata.utils.calc_cadence(steps)[source]
Calculates walking and running cadence (steps/min).
- Parameters:
steps (ndarray) – Array of equispaced time series data
- Returns:
walk (float) – Walking cadence
run (float) – Running cadence
- mhealthdata.utils.calc_covariance(x, window)[source]
Calculates covariance with running window.
- Parameters:
x (ndarray) – 1D array of data points
window (ndarray) – 1D window array
- Returns:
1D array of covariance
- Return type:
ndarray
- mhealthdata.utils.calc_interpolation(x)[source]
Linearly interpolates data array.
- Parameters:
x (ndarray) – 1D array of data points
- Returns:
1D array of interpolated data
- Return type:
ndarray
- mhealthdata.utils.columns_to_datetime(df, tstart, tend=None, tz=None)[source]
Converts DataFrame date/time columns to pandas.Timestamp.
- Parameters:
df (DataFrame) – DataFrame
columns (array_like) – List of columns to convert into datetime
tz_col (str, optional) – Time zone column
- Returns:
DataFrame with selected columns converted to pandas.Timestamp
- Return type:
DataFrame
- mhealthdata.utils.defragment_sleep(sleep, tol=60)[source]
Defragments sleep intervals. Some samples have fragmented sleep records. This function imputes short gaps (default 60 min) to merge a series of fragmented sleep bouts into a continuous sleep interval.
- Parameters:
sleep (ndarray) – Array of equispaced time series data (N days x 1440 min)
tol (int, default 60) – Max length of imputed intervals [minutes]
- Returns:
Defragmented sleep of the same shape as the input sleep
- Return type:
ndarray
- mhealthdata.utils.fill_gaps(x, gap=None, fill=1)[source]
Fills zero gaps in 1d-array.
- Parameters:
x (ndarray) – 1D array of non-negative numeric values
gap (int or None, default None) – Max gap duration (if None - fill all gaps)
fill (float, default 1) – Value to fill zeros
- Returns:
1D array with all gaps <= gap filled
- Return type:
ndarray
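A minimal sketch of gap filling in plain NumPy. It assumes that only interior runs of zeros (those between two non-zero samples) count as gaps; whether the library also fills leading or trailing zeros is not stated here. The name `fill_gaps_sketch` is hypothetical.

```python
import numpy as np

def fill_gaps_sketch(x, gap=None, fill=1):
    """Fill interior runs of zeros no longer than `gap` (all of them if gap is None)."""
    x = np.asarray(x, dtype=float).copy()
    nonzero = np.flatnonzero(x)
    # Inspect the zeros between each pair of neighbouring non-zero samples
    for left, right in zip(nonzero[:-1], nonzero[1:]):
        length = right - left - 1
        if length > 0 and (gap is None or length <= gap):
            x[left + 1:right] = fill
    return x

x = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
filled_short = fill_gaps_sketch(x, gap=2)   # only the 2-sample gap is filled
filled_all = fill_gaps_sketch(x)            # every interior gap is filled
```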
- mhealthdata.utils.find_columns_by_key(df, keys)[source]
Finds all DataFrame column names containing any of the keys.
- Parameters:
df (DataFrame) – DataFrame
keys (array_like) – List of keys of str type
- Returns:
List of column names
- Return type:
list
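The matching rule can be sketched on a plain list of column names (for a DataFrame these would come from `df.columns`). The sketch assumes case-sensitive substring matching, which may not be what the library does; `find_columns_sketch` is a hypothetical name.

```python
def find_columns_sketch(columns, keys):
    """Return the column names that contain at least one of the keys."""
    return [name for name in columns if any(key in name for key in keys)]

columns = ["startTime", "endTime", "value", "time_offset"]
matches = find_columns_sketch(columns, ["Time", "offset"])
```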
- mhealthdata.utils.find_intervals(x, tol=0, nmin=None, sort=False)[source]
Finds continuous positive intervals in 1d-array.
- Parameters:
x (ndarray) – 1D array of non-negative numeric values
tol (int, default 0) – Gap duration tolerance
nmin (int or None, default None) – Minimal length of intervals
sort (bool, default False) – If False, keep interval order by index in the array; if True, sort descending by interval duration
- Returns:
2D array - N intervals x 2 indices (start, end)
- Return type:
ndarray
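One way to find such intervals is to look at the edges of a boolean mask. The sketch below is an assumption-laden illustration, not the library code: it omits the tol parameter and returns exclusive end indices (slice style), while the library's end convention is not documented here.

```python
import numpy as np

def find_intervals_sketch(x, nmin=None, sort=False):
    """Return an (N, 2) array of [start, end) index pairs for runs where x > 0."""
    # Pad with zeros so runs touching either end of the array still have two edges
    positive = np.concatenate(([0], (np.asarray(x) > 0).astype(int), [0]))
    edges = np.diff(positive)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)   # exclusive end (assumption of this sketch)
    intervals = np.column_stack((starts, ends))
    if nmin is not None:
        intervals = intervals[(ends - starts) >= nmin]
    if sort:
        lengths = intervals[:, 1] - intervals[:, 0]
        intervals = intervals[np.argsort(-lengths, kind="stable")]
    return intervals

x = np.array([0, 2, 3, 0, 0, 5, 5, 5, 5, 0, 1])
by_index = find_intervals_sketch(x)
by_length = find_intervals_sketch(x, sort=True)
at_least_two = find_intervals_sketch(x, nmin=2)
```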
- mhealthdata.utils.from_ordinal(date, fmt='%Y-%m-%d')[source]
Converts ordinal day(s) to date(s), where day 1 = Jan 1st, 1 AD.
- Parameters:
date (array_like or int) – Day(s) of type int (ordinal)
fmt (str, default "%Y-%m-%d") – Output date format
- Returns:
Date(s)
- Return type:
ndarray or str
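The ordinal convention described here (day 1 = Jan 1st, 1 AD) is the same one used by Python's standard `datetime.date.toordinal()` and `date.fromordinal()`. The sketch below shows a scalar round trip with the standard library only; the `_sketch` helpers are hypothetical and do not handle arrays.

```python
from datetime import date, datetime

def to_ordinal_sketch(text, fmt="%Y-%m-%d"):
    """Convert a date string to an ordinal day (day 1 = Jan 1st, 1 AD)."""
    return datetime.strptime(text, fmt).date().toordinal()

def from_ordinal_sketch(day, fmt="%Y-%m-%d"):
    """Convert an ordinal day back to a formatted date string."""
    return date.fromordinal(day).strftime(fmt)

day = to_ordinal_sketch("2021-03-15")
text = from_ordinal_sketch(day)
```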
- mhealthdata.utils.histogram_peaks(x, bins=100, smooth=False)[source]
Finds local peaks in histogram of values.
- Parameters:
x (ndarray) – Array of data values
bins (int or ndarray, default 100) – Histogram bins or number of bins
smooth (bool, default False) – If True - apply Hann window averaging smooth
- Returns:
idx (ndarray) – 1D array of peak coordinate(s)
score (ndarray) – 1D array of peak height(s)
- mhealthdata.utils.impute_bpm(bpm, tol=15)[source]
Imputes short bpm gaps (default 15 min) by linear interpolation. Some devices output bpm every 5 or 10 min. This function imputes short gaps to make them compatible with bpm output every 1 min.
- Parameters:
bpm (ndarray) – Array of equispaced time series data (N days x 1440 min)
tol (int, default 15) – Max length of imputed intervals [minutes]
- Returns:
Imputed bpm of the same shape as the input bpm
- Return type:
ndarray
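The idea of imputing only short gaps can be sketched with `np.interp`. This sketch works on a 1D minute series and assumes that missing samples are stored as zeros; the library operates on N days x 1440 arrays and may encode gaps differently. `impute_short_gaps` is a hypothetical name.

```python
import numpy as np

def impute_short_gaps(bpm, tol=15):
    """Linearly interpolate runs of zeros that are at most `tol` samples long."""
    bpm = np.asarray(bpm, dtype=float)
    out = bpm.copy()
    known = np.flatnonzero(bpm > 0)
    # Walk over neighbouring known samples and fill the gap between them if short
    for left, right in zip(known[:-1], known[1:]):
        length = right - left - 1
        if 0 < length <= tol:
            inside = np.arange(left + 1, right)
            out[inside] = np.interp(inside, [left, right], [bpm[left], bpm[right]])
    return out

# Readings every 5 minutes, followed by an 18-minute gap
bpm = np.zeros(30)
bpm[[0, 5, 10]] = [60, 70, 80]
bpm[29] = 90
imputed = impute_short_gaps(bpm, tol=15)
```

The two 4-minute gaps are filled with interpolated values, while the 18-minute gap exceeds tol and stays empty.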
- mhealthdata.utils.series_peaks(x, window, smooth=False)[source]
Finds local peaks in array of time series.
Notes
NaN, Inf values NOT allowed
- Parameters:
x (ndarray) – Array of equispaced time series data
window (int) – Window size
smooth (bool, default False) – If True - apply Hann window averaging smooth
- Returns:
idx (ndarray) – 1D array of peak coordinate(s)
score (ndarray) – 1D array of peak height(s)
- mhealthdata.utils.sleep_stage_dict(mode='decode')[source]
Gets dictionary to encode/decode sleep stage.
- Parameters:
mode ({"decode", "encode"}, default "decode") – If “decode”, return dict num -> str; if “encode”, return dict str -> num
- Returns:
Dictionary for decoding/encoding sleep stages
- Return type:
dict
- mhealthdata.utils.smoother(x, window, pad=nan, epsilon=1e-10, roll=False)[source]
Smooths data using running Hann window.
- Parameters:
x (ndarray) – Array of equispaced time series data
window (int) – Window size
pad (float, default np.nan) – Value to pad x if roll is False
epsilon (float, default 1e-10) – Cutoff to account values as non-zeros
roll (bool, default False) – If True, pad with rolled x, else pad with the pad value
- Returns:
Smoothed time series data
- Return type:
ndarray
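Running Hann-window smoothing can be sketched as a normalized convolution. This is a simplified illustration: it zero-pads the edges via `np.convolve(..., mode="same")` and ignores the pad, epsilon, and roll options, so edge values will differ from the library's output. `hann_smooth` is a hypothetical name.

```python
import numpy as np

def hann_smooth(x, window):
    """Smooth a 1D series with a normalized Hann window of `window` non-zero taps."""
    # np.hanning has zero endpoints, so request two extra points and drop them
    w = np.hanning(window + 2)[1:-1]
    w = w / w.sum()
    return np.convolve(x, w, mode="same")

x = np.ones(50)
x[25] = 11.0                      # a single spike on a flat baseline
y = hann_smooth(x, window=5)
```

Away from the edges the flat baseline is preserved, and the spike is spread over the window while its total contribution is conserved.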
- mhealthdata.utils.timezone_txt_to_minutes(tz)[source]
Converts timezone name to minutes relative to UTC
- Parameters:
tz (str) – Timezone, e.g. “Europe/Madrid” or “UTC+0100”
- Returns:
Minutes
- Return type:
int
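For the fixed-offset form (“UTC+0100”), the conversion is simple arithmetic on hours and minutes. The sketch below handles only that form and assumes positive values mean east of UTC; named zones such as “Europe/Madrid” need a timezone database and are not covered. `utc_offset_to_minutes` is a hypothetical name.

```python
import re

def utc_offset_to_minutes(tz):
    """Parse strings like "UTC+0100" or "UTC-0530" into minutes relative to UTC."""
    match = re.fullmatch(r"UTC([+-])(\d{2})(\d{2})", tz)
    if match is None:
        raise ValueError(f"unsupported timezone string: {tz!r}")
    sign = 1 if match.group(1) == "+" else -1
    return sign * (int(match.group(2)) * 60 + int(match.group(3)))

madrid_winter = utc_offset_to_minutes("UTC+0100")
india = utc_offset_to_minutes("UTC+0530")
```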
- mhealthdata.utils.to_month_abbr(date)[source]
Converts date(s) to name(s) of month (Jan - Dec).
- Parameters:
date (array_like or str or int) – Date(s)
- Returns:
Month(s)
- Return type:
ndarray or str
- mhealthdata.utils.to_month_name(date)[source]
Converts date(s) to name(s) of month (January - December).
- Parameters:
date (array_like or str or int) – Date(s)
- Returns:
Month(s)
- Return type:
ndarray or str
- mhealthdata.utils.to_ordinal(date)[source]
Converts date(s) to ordinal day(s), where day 1 = Jan 1st, 1 AD.
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
Ordinal day(s)
- Return type:
ndarray or int
- mhealthdata.utils.to_ordinal_day(date)[source]
Converts date(s) to ordinal day(s), where day 1 = Jan 1st, 1 AD.
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
Ordinal day(s)
- Return type:
ndarray or int
- mhealthdata.utils.to_ordinal_month(date)[source]
Converts date(s) to ordinal month(s), where month 1 = Jan 1st-31st, 1 AD.
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
Ordinal month(s)
- Return type:
ndarray or int
- mhealthdata.utils.to_ordinal_week(date)[source]
Converts date(s) to ordinal week(s), where week 1 = Jan 1st-7th, 1 AD.
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
Ordinal week(s)
- Return type:
ndarray or int
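The ordinal week and ordinal month definitions above reduce to integer arithmetic on the ordinal day and on the year and month. The sketch below follows directly from those definitions using the standard library; whether the library uses exactly this arithmetic is an assumption, and the function names are hypothetical.

```python
from datetime import date

def ordinal_week_sketch(d):
    """Week 1 = Jan 1st-7th, 1 AD: days 1-7 map to week 1, days 8-14 to week 2, ..."""
    return (d.toordinal() - 1) // 7 + 1

def ordinal_month_sketch(d):
    """Month 1 = January, 1 AD: every full year adds 12 months."""
    return (d.year - 1) * 12 + d.month

week = ordinal_week_sketch(date(2021, 3, 15))
month = ordinal_month_sketch(date(2021, 3, 15))
```

Since Jan 1st, 1 AD is a Monday in the proleptic Gregorian calendar, ordinal weeks defined this way always run Monday to Sunday.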
- mhealthdata.utils.to_ordinal_year(date)[source]
Converts date(s) to ordinal year(s).
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
Ordinal year(s)
- Return type:
ndarray or int
- mhealthdata.utils.to_range(date, pad_to_full_week=True)[source]
Encloses date(s) in a continuous range of ordinal dates.
- Parameters:
date (array_like) – List of dates of type int (ordinal) or str
pad_to_full_week (bool, default True) – If True, pad range to full weeks so that its length % 7 == 0
- Returns:
Continuous range of ordinal days
- Return type:
ndarray
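A sketch of week padding with the standard library. It assumes the range is extended back to the preceding Monday and forward to the following Sunday; the library only guarantees length % 7 == 0 and may align weeks differently. `to_range_sketch` is a hypothetical name and takes ordinal days only.

```python
from datetime import date

def to_range_sketch(days, pad_to_full_week=True):
    """Return a continuous list of ordinal days enclosing all input days."""
    lo, hi = min(days), max(days)
    if pad_to_full_week:
        lo -= date.fromordinal(lo).isoweekday() - 1   # back to Monday (assumed)
        hi += 7 - date.fromordinal(hi).isoweekday()   # forward to Sunday (assumed)
    return list(range(lo, hi + 1))

# Two Wednesdays one week apart
days = [date(2021, 3, 17).toordinal(), date(2021, 3, 24).toordinal()]
padded = to_range_sketch(days)
unpadded = to_range_sketch(days, pad_to_full_week=False)
```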
- mhealthdata.utils.to_weekdayiso(date)[source]
Converts date(s) to int day of the week (1 - Monday, 7 - Sunday).
- Parameters:
date (array_like or str or int) – Date(s)
- Returns:
Day(s) of the week
- Return type:
ndarray or int
- mhealthdata.utils.to_weekdayiso_abbr(date)[source]
Converts date(s) to name(s) of day of the week (Mon - Sun).
- Parameters:
date (array_like or str or int) – Date(s)
- Returns:
Day(s) of the week
- Return type:
ndarray or str
- mhealthdata.utils.to_weekdayiso_name(date)[source]
Converts date(s) to name(s) of day of the week (Monday - Sunday).
- Parameters:
date (array_like or str or int) – Date(s)
- Returns:
Day(s) of the week
- Return type:
ndarray or str
- mhealthdata.utils.to_year_month_day(date)[source]
Converts date(s) to (arrays of) year, month, day.
- Parameters:
date (array_like or str) – Date(s) of type str
- Returns:
year (ndarray or int) – Year(s)
month (ndarray or int) – Month(s) of the year
day (ndarray or int) – Day(s) of the month
- mhealthdata.utils.unique_sorted(x, return_dict=False)[source]
Sorts unique values in descending order.
- Parameters:
x (ndarray) – 1D array of numeric values
return_dict (bool, default False) – If False, return value and count arrays; if True, return dict value -> count
- Returns:
value (ndarray, optional) – Unique values array (if return_dict is False)
count (ndarray, optional) – Unique values counts (if return_dict is False)
dict (dict, optional) – Dictionary unique value -> count (if return_dict is True)
- mhealthdata.utils.window_avg_std(t, x, window=14, smooth=0)[source]
Calculates running window average and std.
- Parameters:
t (ndarray) – 1D array of time indices of values
x (ndarray) – 1D array of values
window (int, default 14) – Window size
smooth (int, default 0) – Window size to smooth average and std
- Returns:
t_avg (ndarray) – 1D array of time indices of avg / std values
x_avg (ndarray) – 1D array of running window average values
x_std (ndarray) – 1D array of running window std values
- mhealthdata.utils.window_boxcar(n, m=None)[source]
Generates boxcar window (-1 to 1) of length n and width m.
- Parameters:
n (int) – Window length
m (int or None, default None) – Window width, if None, m = int(max(1, 0.2 * n))
- Returns:
1D array of boxcar window
- Return type:
ndarray
- mhealthdata.utils.window_sigmoid(n, m=None)[source]
Generates sigmoid window (-1 to 1) of length n and width m.
- Parameters:
n (int) – Window length
m (int or None, default None) – Window width, if None, m = int(max(1, 0.2 * n))
- Returns:
1D array of sigmoid window
- Return type:
ndarray
- mhealthdata.utils.xticks_dates(idate, mode='day', ax=None, **kwargs)[source]
Matplotlib xticks as date(s).
- Parameters:
idate (array_like) – Array or list of dates of type int (day 1 = Jan 1st, 1 AD)
mode ({"day", "week", "fortnight", "month"}, default "day") – Date spacing
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments
- Returns:
ax – Axes for plotting
- Return type:
matplotlib.pyplot.Axes object
- mhealthdata.utils.xticks_days(x, ax=None, **kwargs)[source]
Matplotlib xticks as days, assuming xlim is (0, 1440 x N days) [min].
- Parameters:
x (array_like) – Data, to infer number of days
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments
- Returns:
ax – Axes for plotting
- Return type:
matplotlib.pyplot.Axes object
- mhealthdata.utils.xticks_hours(dt=1, mode='24H', ax=None, **kwargs)[source]
Matplotlib xticks as hours, assuming xlim is (0,1440) [min/day].
- Parameters:
dt (int, default 1) – stride, hours
mode ({"24H", "12H"}, default "24H") – Time format
ax (matplotlib.pyplot.Axes object, default None) – Axes for plotting
**kwargs – Keyword arguments
- Returns:
ax – Axes for plotting
- Return type:
matplotlib.pyplot.Axes object
- mhealthdata.timezone.find_timezone_mismatch(data, nday=3, dt=60)[source]
For Fitbit only: identifies the timezone from the mismatch between sleep, steps, and bpm. Assumes “sleep” is in local time and automatically detects day-wise “steps” and “bpm” offsets to match “sleep”.
- Parameters:
data (dict) – Dictionary of data; should have keys “sleep”, “steps”, “bpm”, each containing an array of size N days x 1440 minutes
nday (int, default 3) – Window to find best match timezone offset, [days]
dt (int, default 60) – Stride to find best match timezone offset, [minutes]
- Returns:
1D array of length N days with timezone offset [minutes]
- Return type:
ndarray
- mhealthdata.timezone.fix_timezone_mismatch(data, tz=None)[source]
Fixes the timezone offset for data imported from Fitbit. Assumes “sleep” is in local time and automatically detects day-wise “steps” and “bpm” offsets to match “sleep”.
- Parameters:
data (dict) – Dictionary of data; should have keys “sleep”, “steps”, “bpm”, each pointing to an array of size N days x 1440 minutes
tz (ndarray or None, default None) – 1D array of length N days with timezone offset [minutes]. If None, the timezone is detected automatically
- Returns:
Dictionary of data with “steps” and “bpm” rolled to match “sleep”
- Return type:
dict
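Rolling a day-by-minute array along the time axis can be sketched with `np.roll`. This simplified illustration applies one constant offset to the whole series, whereas the function above works with day-wise offsets; the sign convention and the wrap-around at the ends of the series are assumptions. `roll_minutes` is a hypothetical name.

```python
import numpy as np

def roll_minutes(x, offset):
    """Shift an (N days x 1440) array by `offset` minutes along flattened time."""
    # Flatten so that a shift past midnight carries over into the next day
    return np.roll(x.reshape(-1), offset).reshape(x.shape)

steps = np.zeros((2, 1440))
steps[0, 1430] = 50                 # 50 steps at 23:50 on day 0
shifted = roll_minutes(steps, 60)   # move everything 60 minutes later
```

After the shift the 50 steps appear at 00:50 on day 1, and the total count is unchanged.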