isaricanalytics.redcap_data
- isaricanalytics.redcap_data.add_answer_dict(dictionary: DataFrame) -> DataFrame
pandas.DataFrame: Returns the REDCap schema data dictionary with a lookup dict of labels and values.
By default, ignores Yes/No/Unknown radio variables.
- Parameters:
- dictionary : pandas.DataFrame
REDCap schema data dictionary.
- Returns:
- pandas.DataFrame
An updated REDCap schema data dictionary with a lookup dict of labels and values.
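To illustrate what a lookup dict of labels and values might look like, here is a minimal sketch in plain Python. It is not the library's implementation. It assumes the common REDCap convention of storing answer options as a `value, label | value, label` string; the function name `parse_choices` and the example string are hypothetical.

```python
def parse_choices(choices: str) -> dict[str, str]:
    """Parse a REDCap-style choices string into a {value: label} dict.

    Assumes options are separated by '|' and each option is 'value, label'.
    """
    lookup = {}
    for option in choices.split("|"):
        option = option.strip()
        if not option:
            continue  # skip empty fragments, e.g. from a trailing '|'
        # Split on the first comma only, since labels may contain commas.
        value, _, label = option.partition(",")
        lookup[value.strip()] = label.strip()
    return lookup


# Hypothetical choices string for illustration.
answers = parse_choices("1, Male | 2, Female | 99, Other, please specify")
print(answers)
```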
- isaricanalytics.redcap_data.add_onehot_variables(data: DataFrame, dictionary: DataFrame, sep: str = '___') -> DataFrame
pandas.DataFrame: Returns the data dictionary with rows for onehot-encoded categorical variables.
Adds new rows to the dictionary for onehot-encoded categorical variables, using only the answers that exist within the data, e.g. if checkbox columns exist (after removing columns with only ‘Unchecked’) or if radio column answers are present for at least one subjid.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
The data dictionary.
- sep : str, default="___"
Optional separator of field/variable names and values.
- Returns:
- pandas.DataFrame
The data dictionary with rows for onehot-encoded categorical variables.
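The idea of keeping only answers that actually occur in the data can be sketched as follows. This is a simplified illustration in plain Python, not the library's code: the helper name `present_onehot_names`, the choice to build names from answer labels, and the sample values are all assumptions.

```python
def present_onehot_names(field, observed, answers, sep="___"):
    """Build one-hot names for the answers of `field` that occur in `observed`.

    `observed` stands in for a radio column (None marks a missing cell) and
    `answers` is a {value: label} lookup for that field.
    """
    seen = {cell for cell in observed if cell is not None}
    # Assumption: one-hot names are field + separator + answer label.
    return [f"{field}{sep}{label}" for label in answers.values() if label in seen]


# Hypothetical column: only 'Male' is present for at least one subjid.
names = present_onehot_names(
    "demog_sex", ["Male", None, "Male"], {"1": "Male", "2": "Female"}
)
print(names)
```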
- isaricanalytics.redcap_data.combine_unlisted_variables(data: DataFrame, dictionary: DataFrame, sep: str = '___') -> tuple[DataFrame]
tuple: Combines variables in repetitions of a question.
Combines variables that exist in repeated versions of the same question (e.g. additional dropdown questions asked after Yes/No/Unknown questions for established variables).
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- sep : str, default="___"
Optional value separator.
- Returns:
- tuple
The updated data and data dictionary.
- isaricanalytics.redcap_data.convert_dictionary_field_type(dictionary: DataFrame) -> DataFrame
pandas.DataFrame: Returns a dictionary of variable types, based on the REDCap structure.
- Parameters:
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- Returns:
- pandas.DataFrame
A dictionary of variable types, based on the REDCap structure.
- isaricanalytics.redcap_data.convert_onehot_to_binary(data: DataFrame, dictionary: DataFrame) -> DataFrame
pandas.DataFrame: Converts onehot-encoded columns in the data.
The converted values are True/False/NaN, and answers from the data dictionary are discarded if they exist.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- Returns:
- pandas.DataFrame
The data with the onehot-encoded columns appropriately converted.
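A True/False/NaN conversion of this kind could look like the sketch below. It works on single cell values in plain Python rather than on a dataframe, and the label sets are assumptions chosen for illustration; the strings used in real exports may differ.

```python
import math

# Assumed labels; adjust to whatever strings the export actually contains.
TRUE_LABELS = {"Checked", "Yes", "1"}
FALSE_LABELS = {"Unchecked", "No", "0"}


def to_binary(cell):
    """Map a one-hot cell to True, False or NaN."""
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return math.nan  # missing stays missing
    text = str(cell).strip()
    if text in TRUE_LABELS:
        return True
    if text in FALSE_LABELS:
        return False
    return math.nan  # unrecognised answers are treated as missing


print([to_binary(cell) for cell in ["Checked", "Unchecked", None]])
```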
- isaricanalytics.redcap_data.get_branching_logic_variables(branching_logic: str) -> list[str]
list: Returns all variables included in the branching logic (including checkbox variables).
- Parameters:
- branching_logic : str
The branching logic string.
- Returns:
- list
The list of all variables included in the branching logic.
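As a rough illustration of extracting variables from a branching logic string, the sketch below uses a regular expression. It assumes REDCap's bracket syntax, where `[field]` references a variable and `[field(code)]` references one checkbox option, and it assumes the checkbox option maps to a column named `field___code`. The function name `branching_variables` and the example logic are hypothetical, and event-qualified references are not handled.

```python
import re

# Matches [field] and [field(code)]; the code group is optional.
_VARIABLE = re.compile(r"\[([a-z0-9_]+)(?:\(([^)]+)\))?\]")


def branching_variables(branching_logic: str, sep: str = "___") -> list[str]:
    """Return the distinct variables referenced in a branching logic string."""
    variables = []
    for name, code in _VARIABLE.findall(branching_logic):
        # Checkbox references become 'field___code' (assumed naming).
        variable = f"{name}{sep}{code}" if code else name
        if variable not in variables:
            variables.append(variable)
    return variables


# Hypothetical branching logic for illustration.
logic = "[demog_sex] = '2' and ([comor_list(3)] = '1' or [comor_list(88)] = '1')"
print(branching_variables(logic))
```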
- isaricanalytics.redcap_data.get_data_dictionary(redcap_url: str, redcap_api_key: str) -> DataFrame
pandas.DataFrame: Returns a data dictionary from the REDCap API.
- Parameters:
- redcap_url : str
REDCap URL.
- redcap_api_key : str
REDCap API key.
- Returns:
- pandas.DataFrame
Data dictionary from the REDCap API.
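For readers unfamiliar with the REDCap API, a data dictionary request is a POST with `content=metadata`. The sketch below builds such a request with the standard library only; it is not how the library itself is implemented, and the helper names, URL and key are placeholders.

```python
import json
from urllib import parse, request


def build_metadata_request(redcap_url: str, redcap_api_key: str) -> request.Request:
    """Build (but do not send) a REDCap metadata export request."""
    payload = {
        "token": redcap_api_key,
        "content": "metadata",  # the data dictionary
        "format": "json",
        "returnFormat": "json",
    }
    data = parse.urlencode(payload).encode("utf-8")
    return request.Request(redcap_url, data=data, method="POST")


def fetch_metadata(redcap_url: str, redcap_api_key: str) -> list:
    """Send the request and decode the JSON response (needs network access)."""
    req = build_metadata_request(redcap_url, redcap_api_key)
    with request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URL and key; nothing is sent here.
req = build_metadata_request("https://redcap.example.org/api/", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```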
- isaricanalytics.redcap_data.get_df_forms(data: DataFrame, dictionary: DataFrame) -> dict[str, DataFrame]
dict: Returns a dict of clinical form names and associated dataframes.
- Parameters:
- data : pandas.DataFrame
The incoming REDCap data.
- dictionary : pandas.DataFrame
The data dictionary.
- Returns:
- dict
The dict of clinical form names and associated dataframes.
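One plausible way to split data by form is to group column names by the form each field belongs to. The sketch below does this in plain Python, assuming the dictionary rows carry `field_name` and `form_name` keys as in a REDCap metadata export; the helper name `columns_by_form` and the sample fields are hypothetical.

```python
def columns_by_form(dictionary_rows, data_columns):
    """Group the data columns by the form each field belongs to."""
    forms = {}
    for row in dictionary_rows:
        field = row["field_name"]
        if field in data_columns:  # ignore fields absent from the data
            forms.setdefault(row["form_name"], []).append(field)
    return forms


# Hypothetical dictionary rows and data columns.
rows = [
    {"field_name": "subjid", "form_name": "presentation"},
    {"field_name": "demog_age", "form_name": "presentation"},
    {"field_name": "outco_outcome", "form_name": "outcome"},
    {"field_name": "labs_crp", "form_name": "daily"},
]
grouped = columns_by_form(rows, ["subjid", "demog_age", "outco_outcome"])
print(grouped)
```

Each list of column names could then be used to select a per-form subset of the data.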
- isaricanalytics.redcap_data.get_df_map(data: DataFrame, dictionary: DataFrame) -> tuple[DataFrame | dict[str, Any]]
pandas.DataFrame: Returns a dataframe with single-event rows converted to a format with one row per patient.
- Parameters:
- data : pandas.DataFrame
The incoming REDCap data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- Returns:
- tuple
Three dataframes, consisting of the transformed data, the data dictionary, and the quality report.
- isaricanalytics.redcap_data.get_events_and_forms_info(redcap_url: str, redcap_api_key: str) -> DataFrame
pandas.DataFrame: Returns a combined dataframe of events, forms and their mappings from the REDCap API.
- Parameters:
- redcap_url : str
REDCap URL.
- redcap_api_key : str
REDCap API key.
- Returns:
- pandas.DataFrame
Events, forms and their mappings from the REDCap API.
- isaricanalytics.redcap_data.get_form_event(redcap_url: str, redcap_api_key: str) -> DataFrame
pandas.DataFrame: Returns a combined dataframe of events, forms and their mappings from the REDCap API.
Warning
DEPRECATED.
- Parameters:
- redcap_url : str
REDCap URL.
- redcap_api_key : str
REDCap API key.
- Returns:
- pandas.DataFrame
Events, forms and their mappings from the REDCap API.
- isaricanalytics.redcap_data.get_label(x: Iterable[str]) -> list[str]
list: Returns a list of labels.
Warning
DEPRECATED.
- Parameters:
- x : typing.Iterable
An iterable of label tuples.
- Returns:
- list
A list of labels.
- isaricanalytics.redcap_data.get_labels(x: Iterable[str]) -> list[str]
list: Returns a list of labels.
- Parameters:
- x : typing.Iterable
An iterable of label tuples.
- Returns:
- list
A list of labels.
- isaricanalytics.redcap_data.get_missing_data_codes(redcap_url: str, redcap_api_key: str) -> dict[str, str]
dict: Returns missing data codes from the REDCap API, using the project metadata.
- Parameters:
- redcap_url : str
REDCap URL.
- redcap_api_key : str
REDCap API key.
- Returns:
- dict
A dict of missing data codes from the REDCap API, using the project metadata. An empty dict is returned if there are no missing data codes.
- isaricanalytics.redcap_data.get_records(redcap_url: str, redcap_api_key: str, data_access_groups: Iterable[str] | None = None, user_assigned_to_dag: bool = False) -> DataFrame
pandas.DataFrame: Returns a dataframe of records from the REDCap API.
- Parameters:
- redcap_url : str
REDCap URL.
- redcap_api_key : str
REDCap API key.
- data_access_groups : typing.Iterable, default=None
An iterable of data access group names.
- user_assigned_to_dag : bool, default=False
Whether the user is assigned to a data access group (DAG).
- Returns:
- pandas.DataFrame
Records from the REDCap API data.
- isaricanalytics.redcap_data.get_redcap_data(redcap_url: str, redcap_api_key: str, data_access_groups: Iterable[str] | None = None, user_assigned_to_dag: bool | None = False, country_mapping: dict | None = None) -> tuple[DataFrame | dict[str, DataFrame] | dict[str, Any]]
tuple: Retrieves data from the REDCap API and transforms it into analysis-ready dataframes.
- Parameters:
- redcap_url : str
The REDCap database URL.
- redcap_api_key : str
The REDCap API key.
- data_access_groups : typing.Iterable, default=None
Optional iterable of data access group (DAG) names.
- user_assigned_to_dag : bool, default=False
Whether the user is assigned to a DAG.
- country_mapping : dict, default=None
The countries table.
- isaricanalytics.redcap_data.get_section_prefix(x: str) -> str
str: Returns the section prefix.
- Parameters:
- x : str
Section name/value.
- Returns:
- str
The section prefix.
- isaricanalytics.redcap_data.get_value(x: Iterable[str]) -> list[str]
list: Returns a list of values.
Warning
DEPRECATED.
- Parameters:
- x : typing.Iterable
An iterable of value tuples.
- Returns:
- list
A list of values.
- isaricanalytics.redcap_data.get_values(x: Iterable[str]) -> list[str]
list: Returns a list of values.
- Parameters:
- x : typing.Iterable
An iterable of value tuples.
- Returns:
- list
A list of values.
- isaricanalytics.redcap_data.harmonise_age(data: DataFrame, age_columns: Iterable[str] = ['demog_age', 'demog_age_units']) -> DataFrame
pandas.DataFrame: The data with ages harmonised.
Warning
DEPRECATED. Age should now be included in conversion_table.csv.
Converts age from any units into age in years.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- age_columns : typing.Iterable, default=["demog_age", "demog_age_units"]
An iterable (e.g. list) of age columns.
- Returns:
- pandas.DataFrame
The data with ages harmonised.
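Converting an age and its units into years is simple arithmetic, sketched below for a single value. The unit labels and the approximate factor of 365.25 days per year are assumptions for illustration, not values taken from the library or from conversion_table.csv.

```python
# Assumed unit labels and approximate number of units per year.
UNITS_PER_YEAR = {"Years": 1.0, "Months": 12.0, "Days": 365.25}


def age_in_years(age, units):
    """Convert an age in the given units to years; None if it cannot be converted."""
    if age is None or units not in UNITS_PER_YEAR:
        return None
    return age / UNITS_PER_YEAR[units]


print(age_in_years(18, "Months"))
```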
- isaricanalytics.redcap_data.homogenise_variables(data: DataFrame, dictionary: DataFrame) -> tuple[DataFrame]
pandas.DataFrame: Converts variables recorded in different units to common units, based on a conversion table.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
Conversion table/dictionary, as a Pandas dataframe.
- Returns:
- pandas.DataFrame
The data with unit conversions applied.
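A table-driven unit conversion can be sketched as a lookup from a variable and its recorded unit to a target unit and multiplier. The table layout, the variable name `demog_height` and the helper `convert_value` below are hypothetical; the library's actual conversion table may be structured differently.

```python
# Hypothetical conversion table: (variable, from_unit) -> (to_unit, multiplier).
CONVERSIONS = {
    ("demog_height", "m"): ("cm", 100.0),
}


def convert_value(variable, value, unit):
    """Return (value, unit) converted to the target unit if a rule exists."""
    rule = CONVERSIONS.get((variable, unit))
    if rule is None or value is None:
        return value, unit  # nothing to convert
    to_unit, multiplier = rule
    return value * multiplier, to_unit


print(convert_value("demog_height", 1.75, "m"))
```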
- isaricanalytics.redcap_data.initial_data_processing(data: DataFrame, dictionary: DataFrame, missing_data_codes: dict[str, Any]) -> tuple[DataFrame]
tuple: Initial processing function invoked after the REDCap API call.
- Parameters:
- data : pandas.DataFrame
The incoming REDCap data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- missing_data_codes : dict
The dict of missing code keys and values.
- Returns:
- tuple
A tuple consisting of the updated data and data dictionary dataframes.
- isaricanalytics.redcap_data.is_unlisted_item(x: Iterable[str]) -> str
str:
- Parameters:
- x : typing.Iterable
An iterable of strings.
- Returns:
- str
- isaricanalytics.redcap_data.is_yesno(x: str) -> str
str: Returns a cleaned version of a Yes/No/Unknown question string.
Warning
DEPRECATED.
Checks if the string is a Yes/No/Unknown question, and removes spaces in case there are variations in the same string.
- Parameters:
- x : str
A question string.
- Returns:
- str
A cleaned version of the string if it is a Yes/No/Unknown question. Otherwise the original is returned.
- isaricanalytics.redcap_data.is_yesno_question(x: str) -> str
str: Returns a cleaned version of a Yes/No/Unknown question string.
Checks if the string is a Yes/No/Unknown question, and removes spaces in case there are variations in the same string.
- Parameters:
- x : str
A question string.
- Returns:
- str
A cleaned version of the string if it is a Yes/No/Unknown question. Otherwise the original is returned.
- isaricanalytics.redcap_data.list_categorical_onehot_columns(dictionary_row: dict[str, Any], data: DataFrame, sep: str = '___') -> list[str]
list: Returns a list of categorical onehot-encoded columns in the given dataframe.
- Parameters:
- dictionary_row : dict
A row of the data dictionary.
- data : pandas.DataFrame
The incoming data.
- sep : str, default="___"
Separator of field/variable name and value in the list.
- Returns:
- list
A list of categorical onehot-encoded columns in the given dataframe.
- isaricanalytics.redcap_data.list_checkbox_onehot_columns(dictionary_row: dict[str, Any], data: DataFrame, sep: str = '___') -> list[str]
list: Returns a list of checkbox onehot-encoded columns in the given dataframe.
- Parameters:
- dictionary_row : dict
A row of the data dictionary.
- data : pandas.DataFrame
The incoming data.
- sep : str, default="___"
Optional separator of field/variable name and value in the list.
- Returns:
- list
A list of checkbox onehot-encoded columns in the given dataframe.
- isaricanalytics.redcap_data.load_countries_table(encoding: str = 'latin-1') -> DataFrame
pandas.DataFrame: Loads countries from a CSV.
- Parameters:
- encoding : str, default="latin-1"
Optional file encoding.
- Returns:
- pandas.DataFrame
The countries table.
- isaricanalytics.redcap_data.load_units_conversion_table() -> DataFrame
pandas.DataFrame: Loads the conversion table from a CSV.
- Returns:
- pandas.DataFrame
The conversion table.
- isaricanalytics.redcap_data.map_variable(variable: Series, mapping_dict: dict[str, Any], non_nan_value: str = 'Other / Unknown') -> Series
pandas.Series: Maps a variable column using a dict.
Any non-NaN value not in the dict keys is converted to the value specified by non_nan_value.
- Parameters:
- variable : pandas.Series
The variable column to map.
- mapping_dict : dict
The mapping dict.
- non_nan_value : str, default="Other / Unknown"
Optional value with which to replace non-NaN values not in the dict keys.
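The mapping rule described above can be sketched on a plain list instead of a pandas Series. The helper name `map_values` and the sample data are hypothetical, and the sketch assumes that missing values are left as NaN rather than replaced.

```python
import math


def map_values(values, mapping_dict, non_nan_value="Other / Unknown"):
    """Map each value through mapping_dict, with a fallback for unknown values."""
    mapped = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            mapped.append(math.nan)  # assumption: missing stays missing
        else:
            mapped.append(mapping_dict.get(value, non_nan_value))
    return mapped


# Hypothetical country codes; 'XX' is not a key in the mapping.
result = map_values(["GB", "FR", None, "XX"], {"GB": "United Kingdom", "FR": "France"})
print(result)
```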
- isaricanalytics.redcap_data.rename_checkbox_variables(data: DataFrame, dictionary: DataFrame) -> DataFrame
pandas.DataFrame: Renames checkbox variable columns.
By default, the suffix of a checkbox column is its answer option value. This function replaces that value with the answer option name.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- Returns:
- pandas.DataFrame
The updated data.
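Swapping an answer option value for its name in a column suffix can be sketched as below. The helper `rename_checkbox_column`, the `___` separator in the renamed column, and the sample field are assumptions for illustration; the library's exact naming may differ.

```python
def rename_checkbox_column(column, answers, sep="___"):
    """Replace the value suffix of a checkbox column with its answer label.

    `answers` is a {value: label} lookup for the checkbox field.
    """
    field, found, value = column.partition(sep)
    if not found or value not in answers:
        return column  # not a checkbox column we know how to rename
    return f"{field}{sep}{answers[value]}"


# Hypothetical checkbox field with option 3 labelled 'Diabetes'.
print(rename_checkbox_column("comor_list___3", {"3": "Diabetes"}))
```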
- isaricanalytics.redcap_data.replace_with_nan_for_missing_code_checkbox(data: DataFrame, missing_data_codes: dict[str, Any]) -> DataFrame
pandas.DataFrame: Returns the input dataframe with missing code checkbox values converted to NaN.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- missing_data_codes : dict
A dict of missing code keys and values.
- Returns:
- pandas.DataFrame
The input dataframe with missing code checkbox values converted to NaN.
- isaricanalytics.redcap_data.resolve_checkbox_branching_logic(data: DataFrame, dictionary: DataFrame) -> DataFrame
pandas.DataFrame: Resolves checkbox logic.
By default, a cell is marked as ‘Unchecked’ whenever the option was not selected, even if the question was never asked for that subjid. If the question was not asked because of the branching logic, the cell is set to NaN instead. This does not completely check the branching logic, which is a data quality issue.
- Parameters:
- data : pandas.DataFrame
The incoming data.
- dictionary : pandas.DataFrame
The REDCap data dictionary.
- Returns:
- pandas.DataFrame
The data with the checkbox branching logic resolved.
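The distinction between an option that was not selected and a question that was never asked can be sketched for one checkbox column and one parent question. This is a deliberately simplified illustration in plain Python: it assumes a single parent column whose answer `Yes` triggers the checkbox question, whereas real branching logic can combine several conditions. The helper name `resolve_unchecked` and the sample data are hypothetical.

```python
import math


def resolve_unchecked(checkbox_cells, parent_cells, asked_when="Yes"):
    """Replace 'Unchecked' with NaN where the parent answer means 'not asked'."""
    resolved = []
    for cell, parent in zip(checkbox_cells, parent_cells):
        if cell == "Unchecked" and parent != asked_when:
            resolved.append(math.nan)  # question was never shown
        else:
            resolved.append(cell)
    return resolved


# Third subject answered 'No' to the parent question, so the checkbox was not asked.
result = resolve_unchecked(
    ["Checked", "Unchecked", "Unchecked"], ["Yes", "Yes", "No"]
)
print(result)
```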