isaricanalytics.redcap_data

isaricanalytics.redcap_data.add_answer_dict(dictionary: DataFrame) → DataFrame

pandas.DataFrame : Returns the REDCap schema data dictionary with a lookup dict of labels and values.

By default, ignores Yes/No/Unknown radio variables.

Parameters:
dictionary : pandas.DataFrame

REDCap schema data dictionary.

Returns:
pandas.DataFrame

An updated REDCap schema data dictionary with a lookup dict of labels and values.
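
As a rough illustration of the kind of lookup described above, the sketch below parses a choices string in REDCap's usual `raw, label | raw, label` format. `parse_choices` is a hypothetical helper, not part of isaricanalytics, and the direction of the mapping (value to label) is an assumption.

```python
def parse_choices(choices: str) -> dict[str, str]:
    """Parse a REDCap choices string into a {value: label} dict."""
    lookup = {}
    for item in choices.split("|"):
        item = item.strip()
        if not item:
            continue
        # Split on the first comma only, since labels may contain commas.
        value, _, label = item.partition(",")
        lookup[value.strip()] = label.strip()
    return lookup


answers = parse_choices("1, Male | 2, Female | 3, Other, please specify")
```

Splitting on the first comma only keeps labels such as "Other, please specify" intact.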

isaricanalytics.redcap_data.add_onehot_variables(data: DataFrame, dictionary: DataFrame, sep: str = '___') → DataFrame

pandas.DataFrame : Returns the data dictionary with rows for onehot-encoded categorical variables.

Add new rows to the dictionary for onehot-encoded categorical variables, using only the answers that exist within the data, e.g. if checkbox columns exist (after removing columns with only ‘Unchecked’) or if radio column answers are present for at least one subjid.

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

The data dictionary.

sep : str, default="___"

Optional separator of field/variable names and values.

Returns:
pandas.DataFrame

The data dictionary with rows for onehot-encoded categorical variables.

isaricanalytics.redcap_data.combine_unlisted_variables(data: DataFrame, dictionary: DataFrame, sep: str = '___') → tuple[DataFrame]

tuple : Combine variables in repetitions of a question.

Combine variables that exist in repeated versions of the same question (e.g. additional dropdown questions asked after Yes/No/Unknown questions for established variables).

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

sep : str, default="___"

Optional value separator.

Returns:
tuple

The updated data and data dictionary.

isaricanalytics.redcap_data.convert_dictionary_field_type(dictionary: DataFrame) → DataFrame

pandas.DataFrame : Return a dictionary of variable types, based on REDCap structure.

Parameters:
dictionary : pandas.DataFrame

The REDCap data dictionary.

Returns:
pandas.DataFrame

A dictionary of variable types, based on REDCap structure.

isaricanalytics.redcap_data.convert_onehot_to_binary(data: DataFrame, dictionary: DataFrame) → DataFrame

pandas.DataFrame : Converts onehot-encoded columns in the data.

The converted columns hold True/False/NaN values, and answers from the data dictionary are discarded if they exist.

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

Returns:
pandas.DataFrame

The data with the one-hot columns appropriately converted.
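
The True/False/NaN conversion can be sketched on a single column as follows. This is only an illustration, assuming cells hold the labels 'Checked' and 'Unchecked'; `to_binary` is a hypothetical helper and a plain list stands in for a dataframe column.

```python
import math


def to_binary(value):
    """Map a one-hot cell to True/False, leaving missing values as NaN."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    # Assumed labels: 'Checked' becomes True, anything else becomes False.
    return value == "Checked"


column = ["Checked", "Unchecked", math.nan]
converted = [to_binary(v) for v in column]
```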

isaricanalytics.redcap_data.get_branching_logic_variables(branching_logic: str) → list[str]

list : Return all variables included in the branching logic (including checkbox variables).

Parameters:
branching_logic : str

The branching logic string.

Returns:
list

The list of all variables included in the branching logic.
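
A minimal sketch of this kind of extraction, assuming REDCap's bracket syntax (`[variable]`, or `[variable(code)]` for a checkbox option). `branching_variables` is a hypothetical helper, and rendering checkbox options as `variable___code` is an assumption about the output format, not confirmed behaviour of the library.

```python
import re

# Matches [variable] or [variable(code)]; a code in parentheses
# marks a specific checkbox option.
_VARIABLE = re.compile(r"\[([A-Za-z0-9_]+)(?:\(([^)]+)\))?\]")


def branching_variables(branching_logic: str, sep: str = "___") -> list[str]:
    """List the distinct variables referenced in a branching logic string."""
    names = []
    for field, code in _VARIABLE.findall(branching_logic):
        name = f"{field}{sep}{code}" if code else field
        if name not in names:  # keep first-seen order, drop repeats
            names.append(name)
    return names


logic = "[demog_age] > 18 and [symptoms(2)] = '1' or [demog_age] < 5"
variables = branching_variables(logic)
```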

isaricanalytics.redcap_data.get_data_dictionary(redcap_url: str, redcap_api_key: str) → DataFrame

pandas.DataFrame : Returns a data dictionary from the REDCap API.

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

Returns:
pandas.DataFrame

Data dictionary from the REDCap API.
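
For orientation, a data dictionary export from the REDCap API is a POST request with `content` set to `metadata`. The sketch below only builds such a request with the standard library and does not send it. `build_metadata_request`, the URL and the token are placeholders, and this is not necessarily how isaricanalytics issues the call.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metadata_request(redcap_url: str, redcap_api_key: str) -> Request:
    """Build (but do not send) a REDCap metadata export request."""
    payload = {
        "token": redcap_api_key,
        "content": "metadata",  # the data dictionary
        "format": "json",
    }
    data = urlencode(payload).encode("utf-8")
    return Request(redcap_url, data=data, method="POST")


# Placeholder URL and token, for illustration only.
request = build_metadata_request("https://redcap.example.org/api/", "PLACEHOLDER_TOKEN")
```

Passing the request to `urllib.request.urlopen` would send it. The JSON response could then be loaded into a dataframe.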

isaricanalytics.redcap_data.get_df_forms(data: DataFrame, dictionary: DataFrame) → dict[str, DataFrame]

dict : Returns a dict of clinical form names and associated dataframes.

Parameters:
data : pandas.DataFrame

The incoming REDCap data.

dictionary : pandas.DataFrame

The data dictionary.

Returns:
dict

The dict of clinical form names and associated dataframes.
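
The grouping step behind a per-form split can be sketched as below, assuming the dictionary carries the standard REDCap `field_name` and `form_name` columns. `fields_by_form` is a hypothetical helper that returns column names per form rather than dataframes, and the form names are made up.

```python
from collections import defaultdict


def fields_by_form(dictionary_rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group field names by the form they belong to."""
    forms = defaultdict(list)
    for row in dictionary_rows:
        forms[row["form_name"]].append(row["field_name"])
    return dict(forms)


# Illustrative dictionary rows; the form names are invented.
rows = [
    {"field_name": "subjid", "form_name": "presentation"},
    {"field_name": "demog_age", "form_name": "presentation"},
    {"field_name": "outco_outcome", "form_name": "outcome"},
]
forms = fields_by_form(rows)
```

Each list of field names could then be used to select the matching columns from the data.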

isaricanalytics.redcap_data.get_df_map(data: DataFrame, dictionary: DataFrame) → tuple[DataFrame | dict[str, Any]]

pandas.DataFrame : Returns a dataframe with single-event rows converted to a format with one row per patient.

Parameters:
data : pandas.DataFrame

The incoming REDCap data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

Returns:
tuple

Three dataframes, consisting of the transformed data, the data dictionary, and the quality report.

isaricanalytics.redcap_data.get_events_and_forms_info(redcap_url: str, redcap_api_key: str) → DataFrame

pandas.DataFrame : Returns a combined dataframe of events, forms and their mappings from the REDCap API.

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

Returns:
pandas.DataFrame

Events, forms and their mappings from the REDCap API.

isaricanalytics.redcap_data.get_form_event(redcap_url: str, redcap_api_key: str) → DataFrame

pandas.DataFrame : Returns a combined dataframe of events, forms and their mappings from the REDCap API.

Warning

DEPRECATED.

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

Returns:
pandas.DataFrame

Events, forms and their mappings from the REDCap API.

isaricanalytics.redcap_data.get_label(x: Iterable[str]) → list[str]

list : Returns a list of labels.

Warning

DEPRECATED.

Parameters:
x : typing.Iterable

An iterable of label tuples.

Returns:
list

A list of labels.

isaricanalytics.redcap_data.get_labels(x: Iterable[str]) → list[str]

list : Returns a list of labels.

Parameters:
x : typing.Iterable

An iterable of label tuples.

Returns:
list

A list of labels.

isaricanalytics.redcap_data.get_missing_data_codes(redcap_url: str, redcap_api_key: str) → dict[str, str]

dict : Returns missing data codes from the REDCap API, using the project metadata.

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

Returns:
dict

A dict of missing data codes from the REDCap API, using the project metadata. An empty dict is returned in the case there are no missing data codes.

isaricanalytics.redcap_data.get_records(redcap_url: str, redcap_api_key: str, data_access_groups: Iterable[str] | None = None, user_assigned_to_dag: bool = False) → DataFrame

pandas.DataFrame : Returns a dataframe of records from the REDCap API.

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

data_access_groups : typing.Iterable, default=None

An iterable of data access group names.

user_assigned_to_dag : bool, default=False

Whether the user is assigned to a data access group (DAG).

Returns:
pandas.DataFrame

Records from the REDCap API data.

isaricanalytics.redcap_data.get_redcap_data(redcap_url: str, redcap_api_key: str, data_access_groups: Iterable[str] | None = None, user_assigned_to_dag: bool | None = False, country_mapping: dict | None = None) → tuple[DataFrame | dict[str, DataFrame] | dict[str, Any]]

tuple : Returns data from REDCap API and transforms them into analysis-ready dataframes.

Parameters:
redcap_url : str

The REDCap database URL.

redcap_api_key : str

The REDCap API key.

data_access_groups : typing.Iterable, default=None

Optional iterable of data access group (DAG) names.

user_assigned_to_dag : bool, default=False

Whether the user is assigned to a DAG.

country_mapping : dict, default=None

The countries table.

isaricanalytics.redcap_data.get_section_prefix(x: str) → str

str : Returns the section prefix.

Parameters:
x : str

Section name/value.

Returns:
str

The section prefix.

isaricanalytics.redcap_data.get_value(x: Iterable[str]) → list[str]

list : Returns a list of values.

Warning

DEPRECATED.

Parameters:
x : typing.Iterable

An iterable of value tuples.

Returns:
list

A list of values.

isaricanalytics.redcap_data.get_values(x: Iterable[str]) → list[str]

list : Returns a list of values.

Parameters:
x : typing.Iterable

An iterable of value tuples.

Returns:
list

A list of values.

isaricanalytics.redcap_data.harmonise_age(data: DataFrame, age_columns: Iterable[str] = ['demog_age', 'demog_age_units']) → DataFrame

pandas.DataFrame : The data with ages harmonised.

Warning

DEPRECATED. Age should now be included in conversion_table.csv. Convert age from any units into age in years.

Parameters:
data : pandas.DataFrame

The incoming data.

age_columns : typing.Iterable, default=["demog_age", "demog_age_units"]

An iterable (e.g. list) of age columns.

Returns:
pandas.DataFrame

The data with ages harmonised.
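
Converting an age given in arbitrary units into years amounts to a lookup and a multiplication. The sketch below is illustrative only: the unit labels, the 365.25-day year and the helper `age_in_years` are all assumptions, and the deprecated function itself works on the `demog_age` and `demog_age_units` columns of a dataframe.

```python
DAYS_PER_YEAR = 365.25

# Assumed unit labels; the values used in a given project may differ.
YEARS_PER_UNIT = {
    "Years": 1.0,
    "Months": 1.0 / 12.0,
    "Days": 1.0 / DAYS_PER_YEAR,
}


def age_in_years(age: float, units: str) -> float:
    """Convert an age in the given units to years."""
    return age * YEARS_PER_UNIT[units]


ages = [
    age_in_years(30, "Years"),
    age_in_years(18, "Months"),
    age_in_years(731, "Days"),
]
```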

isaricanalytics.redcap_data.homogenise_variables(data: DataFrame, dictionary: DataFrame) → tuple[DataFrame]

pandas.DataFrame : Converts variables in given units in the data based on a conversion table.

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

Conversion table/dictionary, as a Pandas dataframe.

Returns:
pandas.DataFrame

The data with unit conversions applied.

isaricanalytics.redcap_data.initial_data_processing(data: DataFrame, dictionary: DataFrame, missing_data_codes: dict[str, Any]) → tuple[DataFrame]

tuple : Initial processing function invoked after the REDCap API call.

Parameters:
data : pandas.DataFrame

The incoming REDCap data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

missing_data_codes : dict

The dict of missing code keys and values.

Returns:
tuple

A tuple consisting of the updated data and data dictionary dataframes.

isaricanalytics.redcap_data.is_unlisted_item(x: Iterable[str]) → str

str :

Parameters:
x : typing.Iterable

An iterable of strings.

Returns:
str

isaricanalytics.redcap_data.is_yesno(x: str) → str

str : Returns a cleaned version of a Yes/No/Unknown question string.

Warning

DEPRECATED.

Checks if the string is a Yes/No/Unknown question, and removes spaces in case there are variations in the same string.

Parameters:
x : str

A question string.

Returns:
str

A cleaned version of the string if it is a Yes/No/Unknown question. Otherwise the original is returned.

isaricanalytics.redcap_data.is_yesno_question(x: str) → str

str : Returns a cleaned version of a Yes/No/Unknown question string.

Checks if the string is a Yes/No/Unknown question, and removes spaces in case there are variations in the same string.

Parameters:
x : str

A question string.

Returns:
str

A cleaned version of the string if it is a Yes/No/Unknown question. Otherwise the original is returned.

isaricanalytics.redcap_data.list_categorical_onehot_columns(dictionary_row: dict[str, Any], data: DataFrame, sep: str = '___') → list[str]

list : Returns a list of categorical onehot-encoded columns in the given dataframe.

Parameters:
dictionary_row : dict

A row of the data dictionary.

data : pandas.DataFrame

The incoming data.

sep : str, default="___"

Separator of field/variable name and value in the list.

Returns:
list

A list of categorical onehot-encoded columns in the given dataframe.

isaricanalytics.redcap_data.list_checkbox_onehot_columns(dictionary_row: dict[str, Any], data: DataFrame, sep: str = '___') → list[str]

list : Returns a list of checkbox onehot-encoded columns in the given dataframe.

Parameters:
dictionary_row : dict

A row of the data dictionary.

data : pandas.DataFrame

The incoming data.

sep : str, default="___"

Optional separator of field/variable name and value in the list.

Returns:
list

A list of checkbox onehot-encoded columns in the given dataframe.
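
Since checkbox columns are named as the field name followed by the separator and an option value, finding them for one field is a prefix match. A minimal sketch, with a hypothetical helper `checkbox_columns` that takes a field name and a list of column names rather than a dictionary row and a dataframe:

```python
def checkbox_columns(field_name: str, columns: list[str], sep: str = "___") -> list[str]:
    """Return the columns named '<field_name><sep><option>'."""
    prefix = field_name + sep
    return [column for column in columns if column.startswith(prefix)]


# Illustrative column names only.
columns = ["subjid", "symptoms___1", "symptoms___2", "symptoms_other", "demog_age"]
found = checkbox_columns("symptoms", columns)
```

Matching on the field name plus the full separator keeps unrelated columns such as `symptoms_other` out of the result.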

isaricanalytics.redcap_data.load_countries_table(encoding: str = 'latin-1') → DataFrame

pandas.DataFrame : Loads countries from a CSV.

Parameters:
encoding : str, default="latin-1"

Optional file encoding.

Returns:
pandas.DataFrame

The countries table.

isaricanalytics.redcap_data.load_units_conversion_table() → DataFrame

pandas.DataFrame : Loads the conversion table from a CSV.

Returns:
pandas.DataFrame

The conversion table.

isaricanalytics.redcap_data.map_variable(variable: Series, mapping_dict: dict[str, Any], non_nan_value: str = 'Other / Unknown') → Series

pandas.Series : Map a variable column using a dict.

Any non-NaN value not in the dict keys is converted to the value specified by non_nan_value.

Parameters:
variable : pandas.Series

The variable column to map.

mapping_dict : dict

The mapping dict.

non_nan_value : str, default="Other / Unknown"

Optional value with which to replace non-NaN values not in the dict keys.
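
The mapping rule (known keys are mapped, unknown non-NaN values get the fallback, NaN stays NaN) can be sketched as below. `map_values` is a hypothetical stand-in that works on a plain list rather than a pandas Series, and the example values are made up.

```python
import math


def map_values(values, mapping_dict, non_nan_value="Other / Unknown"):
    """Map values through a dict, with a fallback for unknown non-NaN values."""
    mapped = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            mapped.append(value)  # missing values stay missing
        else:
            mapped.append(mapping_dict.get(value, non_nan_value))
    return mapped


result = map_values(["UK", "France", math.nan], {"UK": "United Kingdom"})
```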

isaricanalytics.redcap_data.rename_checkbox_variables(data: DataFrame, dictionary: DataFrame) → DataFrame

pandas.DataFrame : Rename checkbox variable columns.

By default the suffix is their answer option value. Convert this answer option value to the answer option name.

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

Returns:
pandas.DataFrame

The updated data.
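
The renaming described above swaps the option value in each checkbox column suffix for the option name. A minimal sketch, assuming a value-to-name lookup is already available for the field. `rename_checkbox_columns` and the example names are hypothetical, and the exact form of the new suffix in the library may differ.

```python
def rename_checkbox_columns(columns, field_name, choices, sep="___"):
    """Replace the option value in '<field><sep><value>' with its name."""
    prefix = field_name + sep
    renamed = []
    for column in columns:
        if column.startswith(prefix):
            code = column[len(prefix):]
            # Fall back to the original code if it has no known name.
            renamed.append(prefix + choices.get(code, code))
        else:
            renamed.append(column)
    return renamed


new_columns = rename_checkbox_columns(
    ["subjid", "symptoms___1", "symptoms___2"],
    "symptoms",
    {"1": "Fever", "2": "Cough"},
)
```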

isaricanalytics.redcap_data.replace_with_nan_for_missing_code_checkbox(data: DataFrame, missing_data_codes: dict[str, Any]) → DataFrame

pandas.DataFrame : Return the input dataframe with missing code checkbox values converted to NaN.

Parameters:
data : pandas.DataFrame

The incoming data.

missing_data_codes : dict

A dict of missing code keys and values.

Returns:
pandas.DataFrame

The input dataframe with missing code checkbox values converted to NaN.

isaricanalytics.redcap_data.resolve_checkbox_branching_logic(data: DataFrame, dictionary: DataFrame) → DataFrame

pandas.DataFrame : Resolves checkbox logic.

By default, a cell is marked as ‘Unchecked’ in the absence of a positive answer, even if the question was not asked of the subjid. If the question was not asked because of the branching logic, the cell is set to NaN instead. This does not completely check the branching logic, which is a data quality issue.

Parameters:
data : pandas.DataFrame

The incoming data.

dictionary : pandas.DataFrame

The REDCap data dictionary.

Returns:
pandas.DataFrame

The data with the checkbox branching logic resolved.
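
The idea can be sketched for the simplest case, where a checkbox question is shown only when one parent variable has a given answer. Everything here is illustrative: `blank_unasked_checkboxes`, the column names and the single-parent rule are assumptions, and None stands in for NaN.

```python
def blank_unasked_checkboxes(rows, checkbox_columns, parent, shown_when):
    """Blank 'Unchecked' cells for rows where the question was never shown."""
    resolved = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left unchanged
        if row.get(parent) != shown_when:
            for column in checkbox_columns:
                if row.get(column) == "Unchecked":
                    row[column] = None
        resolved.append(row)
    return resolved


# Illustrative records; 'sympt_any' is an invented parent question.
rows = [
    {"subjid": "A", "sympt_any": "Yes", "symptoms___1": "Unchecked"},
    {"subjid": "B", "sympt_any": "No", "symptoms___1": "Unchecked"},
]
resolved = blank_unasked_checkboxes(
    rows, ["symptoms___1"], parent="sympt_any", shown_when="Yes"
)
```

Subject A was asked the question, so 'Unchecked' is a real answer and is kept. Subject B was never asked, so the cell becomes missing.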

isaricanalytics.redcap_data.user_assigned_to_dag(redcap_url: str, redcap_api_key: str) → bool

bool : Whether the user is assigned to a data access group (DAG).

Parameters:
redcap_url : str

REDCap URL.

redcap_api_key : str

REDCap API key.

Returns:
bool

Whether the user is assigned to a REDCap DAG.