Welcome to bmdrc!
bmdrc is a Python package for fitting benchmark dose curves to dichotomous, proportional, and larval photomotor response data. The package is a statistics toolkit whose workflow follows 5 main steps: 1. Upload Class Options, 2. Preprocessing Options, 3. Filtering Options, 4. Model Fit Options, and 5. Output File Options.
NOTE! Backend function names do not match the frontend method names on bmdrc objects. Follow the example code and vignettes rather than the backend function descriptions. Thank you.
1. Upload Class Options
Upload data using one of the following class options, depending on your input data format:
Binary Class is for binary data that needs to be converted to proportions. LPR Class is for continuous larval photomotor response data that needs to be converted to proportions. Simplified Class is for data that is already formatted as proportions.
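As a rough illustration of what "converted to proportions" means for binary data, the sketch below counts affected wells per group and divides by the number of non-missing wells. It is not bmdrc's implementation: the record layout, the `to_proportions` helper, and the choice to skip NA wells are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical long-format records: (chemical, concentration, endpoint, value),
# where value is 0, 1, or None (NA).
records = [
    ("A", 0.0, "JAW", 0),
    ("A", 0.0, "JAW", 0),
    ("A", 1.0, "JAW", 1),
    ("A", 1.0, "JAW", 0),
    ("A", 1.0, "JAW", None),
]

def to_proportions(rows):
    """Return {(chemical, concentration, endpoint): affected / non-missing wells}."""
    affected = defaultdict(int)
    total = defaultdict(int)
    for chemical, conc, endpoint, value in rows:
        if value is None:  # assumption: NA wells are left out of the denominator
            continue
        key = (chemical, conc, endpoint)
        affected[key] += value
        total[key] += 1
    return {key: affected[key] / total[key] for key in total}

proportions = to_proportions(records)
```

Data that already looks like the values of `proportions` is the kind of input Simplified Class expects.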
Binary Class
- class bmdrc.BinaryClass.BinaryClass(df, chemical, plate, well, concentration, endpoint=None, value=None, format='long')
Generates a bmdrc object where input values are either a 0, 1, or NA. For proportional data, use SimplifiedClass().
- Parameters:
df – A pandas DataFrame with columns for chemical, plate, well, concentration, endpoint (long format only), and value (long format only) information. If the data is in wide format, all additional columns are assumed to be endpoints.
chemical – A string indicating the name of the column containing the chemical IDs, which should be strings.
plate – A string indicating the name of the column containing the plate IDs, which should be strings.
well – A string indicating the name of the column with the well IDs, which should be strings.
concentration – A string indicating the name of the column containing the concentrations, which should be numerics.
endpoint – A string indicating the name of the column containing endpoints, which should be strings. Note that this parameter is not needed if the data is in wide format.
value – A string indicating the name of the column containing the binary values, which should be 0 for absent, and 1 for present. Note that this parameter is not needed if the data is in wide format.
format – A string to indicate whether the data is in ‘long’ or ‘wide’ format. Wide format requires only the chemical, plate, well, and concentration columns. The rest of the columns are assumed to be endpoints. Wide formats are then converted to the long format.
# Long format
BinaryClass(
df = pd.read_csv("path/to/longfile.csv"), # Input is a pandas DataFrame
chemical = "chemical.id", # The name of the chemical column
plate = "plate.id", # The name of the plate ID column
well = "well", # The name of the column with well names
concentration = "concentration", # The name of the concentration column
endpoint = "endpoint", # The name of the column with endpoints
value = "value", # The name of the column with values
format = "long" # The format of the input data, either 'long' or 'wide' is accepted
)
# Wide format
BinaryClass(
df = pd.read_csv("path/to/widefile.csv"),
chemical = "chemical.id",
plate = "plate.id",
well = "well",
concentration = "conc",
endpoint = "endpoint",
value = "value",
format = "wide"
)
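The wide-to-long conversion mentioned in the format parameter can be pictured with the plain-Python sketch below. It is only an illustration of the reshaping, not the code bmdrc runs; the `wide_to_long` helper, the column names, and the sample rows are hypothetical. With pandas, `DataFrame.melt` performs the same kind of reshaping.

```python
# Columns that identify a well; every other column is treated as an endpoint.
ID_COLUMNS = ["chemical.id", "plate.id", "well", "conc"]

# Hypothetical wide-format rows: one row per well, one column per endpoint.
wide_rows = [
    {"chemical.id": "1", "plate.id": "P1", "well": "A01", "conc": 0.0, "JAW": 0, "NC24": 1},
    {"chemical.id": "1", "plate.id": "P1", "well": "A02", "conc": 1.5, "JAW": 1, "NC24": None},
]

def wide_to_long(rows, id_columns):
    """Emit one record per (well, endpoint) pair with 'endpoint' and 'value' keys."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, val in row.items():
            if col in id_columns:
                continue  # identifier columns are carried along, not melted
            long_rows.append({**ids, "endpoint": col, "value": val})
    return long_rows

long_rows = wide_to_long(wide_rows, ID_COLUMNS)
```

Two wells with two endpoint columns each become four long-format records.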
LPR Class
- class bmdrc.LPRClass.LPRClass(df, chemical, plate, well, concentration, time, value, cycle_length=20.0, cycle_cooldown=10.0, starting_cycle='light')
Generates a bmdrc object from larval photomotor response data, which must be in long format.
- Parameters:
df – A pandas dataframe containing columns with the chemical, concentration, plate, well, time, and value information.
chemical – A string indicating the name of the column containing the chemical IDs, which should be strings
plate – A string indicating the name of the column indicating the plate IDs, which should be strings
well – A string indicating the name of the column with the well IDs, which should be strings
concentration – A string indicating the name of the column containing the concentrations, which should be numerics
time – A string indicating the name of the column containing time, which should be a string or integer. Strings should contain a number.
value – A string indicating the name of the column containing the binary values, which should be 0 for absent, and 1 for present. Not used if the light photomotor response
cycle_length – A numeric for the length of a light or dark cycle. Default is 20. The unit is a 6-second measure, so 20 six-second measures is 2 minutes.
cycle_cooldown – A numeric for the length of time between cycles. Default is 10. The unit is a 6-second measure, so 10 six-second measures is 1 minute.
starting_cycle – A string of either the “light” or “dark” cycle depending on whether the first measurement was a light or dark cycle. Default is “light”.
# Convert the continuous data to dichotomous
LPRClass(
df = pd.read_csv("path/to/lpr.csv"),
chemical = "chemical.id", # Column in file
plate = "plate.id", # Column in file
well = "well", # Column in file
concentration = "conc", # Column in file
time = "variable", # Column in file
value = "value", # Column in file
cycle_length = 20.0, # Length of cycle in 6 second intervals. 20 * 6 = 120 seconds
cycle_cooldown = 10.0, # Length of the cooldown between cycles in 6 second intervals. 10 * 6 = 60 seconds
starting_cycle = "light" # Starting cycle
)
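The unit arithmetic behind cycle_length and cycle_cooldown, and the note that time strings should contain a number, can be checked with the small sketch below. The helper names are hypothetical, and the regular-expression parsing is only one plausible reading of "strings should contain a number", not a description of how bmdrc parses the time column.

```python
import re

SECONDS_PER_MEASURE = 6  # one measurement interval, per the parameter notes

def measures_to_seconds(n_measures):
    """Convert a count of 6-second measures into seconds."""
    return n_measures * SECONDS_PER_MEASURE

def time_to_number(time):
    """Return the integer in a time label such as 't12' (assumed label format)."""
    if isinstance(time, int):
        return time
    match = re.search(r"\d+", time)
    if match is None:
        raise ValueError(f"no number found in time label {time!r}")
    return int(match.group())

cycle_seconds = measures_to_seconds(20.0)     # default cycle_length
cooldown_seconds = measures_to_seconds(10.0)  # default cycle_cooldown
```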
Simplified Class
- class bmdrc.SimplifiedClass.SimplifiedClass(df, chemical, concentration, endpoint, response)
Generates a bmdrc object from proportions (ranging from 0 to 1). Does not contain the pre-processing & filtering options of BinaryClass.
- Parameters:
df – A pandas dataframe containing columns with chemical, concentration, endpoint, and response information.
chemical – A string indicating the name of the column containing the chemical IDs, which should be strings
concentration – A string indicating the name of the column containing the concentrations, which should be numerics
endpoint – A string indicating the name of the column containing endpoints, which should be strings.
response – A string indicating the name of the column containing the response values, which should range from 0 to 1.
SimplifiedClass(
df = pd.read_csv("path/to/proportions.csv"), # Input is a pandas DataFrame
chemical = "chemical.id", # The name of the chemical column
endpoint = "endpoint", # The name of the column with endpoints
concentration = "concentration", # The name of the concentration column
response = "response" # The name of the column with response value ranging from 0 to 1
)
2. Preprocessing Options
There are currently three preprocessing options: Combining Endpoints, Removing Endpoints, and Removing Wells.
Combining Endpoints
- class bmdrc.preprocessing.endpoint_combine(self, endpoint_dict: dict)
Combine existing endpoints into new ones. For example, multiple 24 hour endpoints can be combined to create an “Any 24” endpoint. New endpoints are created with a logical OR: if any of the source endpoints is 1, the new endpoint is 1. Otherwise it is 0, unless all of the source endpoints are NA, in which case the new endpoint is NA.
- Parameters:
endpoint_dict (dict) – A dictionary where each key is the name of a new endpoint and each value is a list of the existing endpoints used to calculate it.
# Dictionary of terms to add
endpoint_dict = {"ANY24":["NC24", "DP24", "SM24"], "ANY":["NC24", "DP24", "SM24", "JAW"]}
# Create a bmdrc object and save it as Long. See the vignettes. Add new endpoint
Long.combine_and_create_new_endpoints(endpoint_dict)
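The combination rule described for this option (1 if any source endpoint is 1, NA only if every source endpoint is NA, otherwise 0) can be written as a small standalone function. This is a sketch of the stated rule rather than bmdrc's own code; `combine_endpoints` is a hypothetical name and None stands in for NA.

```python
def combine_endpoints(values):
    """OR together 0/1/None values; the result is None only if every input is None."""
    observed = [v for v in values if v is not None]
    if not observed:
        return None  # all source endpoints are NA
    return 1 if any(v == 1 for v in observed) else 0

# One well's values for the hypothetical source endpoints NC24, DP24, SM24.
any24_present = combine_endpoints([0, 1, None])
any24_absent = combine_endpoints([0, None, 0])
any24_missing = combine_endpoints([None, None, None])
```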
Removing Endpoints
- class bmdrc.preprocessing.remove_endpoints(self, endpoint_name: list[str])
Completely remove an endpoint or set of endpoints from the dataset
- Parameters:
endpoint_name (list[str]) – A list of endpoints to remove written as strings
# Create a bmdrc object and save it as Long. See the vignettes
# Remove the endpoint that should not be modeled
Long.remove_endpoints("DNC")
Removing Wells
- class bmdrc.preprocessing.well_to_na(self, endpoint_name: list[str], endpoint_value: list[float], except_endpoint: list[str])
Remove any wells where a specific endpoint has a specific value. Wells are set to NA.
- Parameters:
endpoint_name (list[str]) – A list of endpoint names, written as strings, whose values determine which wells are removed
endpoint_value (list[float]) – A list of values that trigger removal. A well is set to NA when an endpoint in endpoint_name has one of these values. For example, to remove wells where an endpoint in endpoint_name equals 0, this value should be 0. See the vignettes for examples.
except_endpoint (list[str]) – A list of endpoints that should not have their wells affected by this rule. For example, a 24 hour mortality endpoint that should not affect an overall mortality endpoint. See the vignettes for examples.
# Create a bmdrc object and save it as Long. See the vignettes. Set invalid endpoints to NA.
Long.set_well_to_na(endpoint_name = "DNC", endpoint_value = 1, except_endpoint = ["ANY24"])
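One way to picture the well-removal rule is the sketch below, which blanks every endpoint in a flagged well except those listed in except_endpoint. The data layout and the `set_wells_to_na` helper are hypothetical. The choice to also blank the triggering endpoint is an assumption, so check the vignettes for the package's actual behavior.

```python
# Hypothetical wells, each mapping endpoint name -> 0, 1, or None (NA).
wells = {
    "A01": {"DNC": 1, "JAW": 0, "ANY24": 1},
    "A02": {"DNC": 0, "JAW": 1, "ANY24": 1},
}

def set_wells_to_na(wells, endpoint_name, endpoint_value, except_endpoint):
    """Return a copy where flagged wells have non-exempt endpoints set to None."""
    result = {}
    for well, endpoints in wells.items():
        # A well is flagged when any listed endpoint holds one of the listed values.
        flagged = any(endpoints.get(name) in endpoint_value for name in endpoint_name)
        result[well] = {
            ep: (None if flagged and ep not in except_endpoint else val)
            for ep, val in endpoints.items()
        }
    return result

cleaned = set_wells_to_na(wells, ["DNC"], [1], ["ANY24"])
```

Well A01 has DNC equal to 1, so its DNC and JAW values become NA while ANY24 is kept. Well A02 is unchanged.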
3. Filtering Options
There are currently three filtering options: Correlation Score Filter, Minimum Concentration Filter, and Negative Control Filter.
Correlation Score Filter
- class bmdrc.filtering.correlation_score(self, score: float, apply: bool, diagnostic_plot: bool)
Filter to remove endpoints whose correlation score falls below a threshold.
- Parameters:
score – A threshold for the correlation score as a float (ranging from -1 to 1).
apply (bool) – A boolean to determine whether the filter is applied to the data. Default is False.
diagnostic_plot (bool) – A boolean to determine whether to make a diagnostic plot if apply is False. Default is False.
# Create a bmdrc object and save it as Long. See the vignettes.
# Apply filter with suggested value of 0.2
Long.filter_correlation_score(score = 0.2, diagnostic_plot = True, apply = False)
Minimum Concentration Filter
- class bmdrc.filtering.min_concentration(self, count: int, apply: bool, diagnostic_plot: bool)
Filter to remove endpoints without enough concentration measurements. This count does not include the baseline/control measurement of a concentration of 0.
- Parameters:
count (int) – An integer indicating the minimum number of concentrations an endpoint and chemical combination needs. Default is 3.
apply (bool) – A boolean to indicate whether the filter should be applied. Default is False.
diagnostic_plot (bool) – A boolean to determine whether to make a diagnostic plot if apply is False. Default is False.
# Create a bmdrc object and save it as Long. See the vignettes.
# Set the count and build the diagnostic plot, but don't actually apply the filter.
Long.filter_min_concentration(count = 3, apply = False, diagnostic_plot = True)
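The counting rule for this filter (distinct concentrations per chemical and endpoint, excluding the control concentration of 0) is sketched below in plain Python. The helper names and sample rows are hypothetical, and this is an illustration of the rule rather than the package's implementation.

```python
from collections import defaultdict

# Hypothetical rows: (chemical, endpoint, concentration).
rows = [
    ("A", "JAW", 0.0), ("A", "JAW", 1.0), ("A", "JAW", 5.0), ("A", "JAW", 10.0),
    ("B", "JAW", 0.0), ("B", "JAW", 1.0), ("B", "JAW", 5.0),
]

def count_nonzero_concentrations(rows):
    """Count distinct non-zero concentrations per (chemical, endpoint)."""
    concentrations = defaultdict(set)
    for chemical, endpoint, conc in rows:
        seen = concentrations[(chemical, endpoint)]  # register the pair even if control-only
        if conc != 0:
            seen.add(conc)
    return {key: len(seen) for key, seen in concentrations.items()}

def below_minimum(rows, count=3):
    """Return the (chemical, endpoint) pairs with fewer than `count` concentrations."""
    counts = count_nonzero_concentrations(rows)
    return sorted(key for key, n in counts.items() if n < count)

flagged_pairs = below_minimum(rows, count=3)
```

Chemical B has only two non-zero concentrations, so it is the pair this sketch would flag.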
Negative Control Filter
- class bmdrc.filtering.negative_control(self, percentage: float, apply: bool, diagnostic_plot: bool)
Filter to remove plates with unusually high expression in the controls.
- Parameters:
percentage (float) – A float between 0 and 100 indicating the percentage of phenotypic expression in the controls that is permissible. Default is 50.
apply (bool) – A boolean to determine whether the filter should be applied. Default is False.
diagnostic_plot (bool) – A boolean to determine whether to make a diagnostic plot if apply is False. Default is False.
# Create a bmdrc object and save it as Long. See the vignettes.
# Filter data with unusually high responses in the controls.
Long.filter_negative_control(percentage = 50, apply = False, diagnostic_plot = False)
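To make the percentage threshold concrete, the sketch below computes the share of control wells (concentration 0) showing a response on each plate and flags plates at or above the threshold. The helper names and data are hypothetical. Whether bmdrc groups by plate alone and whether it uses "greater than or equal" at the boundary are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rows: (plate, concentration, value) with value 0, 1, or None (NA).
rows = [
    ("P1", 0.0, 0), ("P1", 0.0, 0), ("P1", 0.0, 1), ("P1", 0.0, 0), ("P1", 2.0, 1),
    ("P2", 0.0, 1), ("P2", 0.0, 1), ("P2", 0.0, 1), ("P2", 0.0, 0), ("P2", 2.0, 1),
]

def control_response_percent(rows):
    """Percentage of non-missing control wells with a response, per plate."""
    affected = defaultdict(int)
    total = defaultdict(int)
    for plate, conc, value in rows:
        if conc != 0 or value is None:
            continue  # only non-missing control wells are counted
        affected[plate] += value
        total[plate] += 1
    return {plate: 100.0 * affected[plate] / total[plate] for plate in total}

def plates_to_flag(rows, percentage=50):
    """Plates whose control response is at or above the threshold (assumed boundary)."""
    percents = control_response_percent(rows)
    return sorted(plate for plate, pct in percents.items() if pct >= percentage)

control_percents = control_response_percent(rows)
flagged_plates = plates_to_flag(rows, percentage=50)
```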
4. Model Fit Options
Fit Models to Response Curves
- class bmdrc.model_fitting.fit_the_models(self, gof_threshold: float, aic_threshold: float, model_selection: str, diagnostic_mode: bool)
Fit the EPA recommended models to your dataset.
- Parameters:
gof_threshold (float) – A float for the minimum p-value for the goodness-of-fit (gof) test. The default is 0.1
aic_threshold (float) – A float for the Akaike Information Criterion (AIC) threshold. The default is 2.
model_selection (str) – A string naming the model selection method. Currently, only “lowest BMDL” is supported.
diagnostic_mode (bool) – A boolean to indicate whether diagnostic messages should be printed. Default is False
# Create a bmdrc object and save it as Long. See the vignettes
Long.fit_models(gof_threshold = 0.1, aic_threshold = 2, model_selection = "lowest BMDL")
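The three arguments suggest a selection procedure along the following lines: drop models that fail the goodness-of-fit test, keep those whose AIC is within the threshold of the best AIC, then pick the lowest BMDL. The sketch below encodes that reading. It is an assumption about how the thresholds interact, not a statement of what bmdrc does internally, and the candidate values are made-up numbers used only to exercise the logic.

```python
# Hypothetical fit summaries; the numbers are illustrative, not real results.
candidates = [
    {"model": "logistic", "gof_p": 0.45, "aic": 101.2, "bmdl": 1.8},
    {"model": "probit", "gof_p": 0.40, "aic": 102.0, "bmdl": 1.5},
    {"model": "weibull", "gof_p": 0.05, "aic": 100.0, "bmdl": 0.9},
    {"model": "gamma", "gof_p": 0.30, "aic": 108.5, "bmdl": 1.1},
]

def select_model(candidates, gof_threshold=0.1, aic_threshold=2):
    """Pick the lowest-BMDL model among adequately fitting, near-best-AIC models."""
    passing = [c for c in candidates if c["gof_p"] >= gof_threshold]
    if not passing:
        return None  # no model passed the goodness-of-fit test
    best_aic = min(c["aic"] for c in passing)
    close = [c for c in passing if c["aic"] - best_aic <= aic_threshold]
    return min(close, key=lambda c: c["bmdl"])

selected = select_model(candidates)
```

Here the weibull fit is dropped for its low goodness-of-fit p-value, the gamma fit is dropped for its AIC, and probit is chosen over logistic because its BMDL is lower.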
Visualize a Curve
- class bmdrc.model_fitting.gen_response_curve(self, chemical_name: str, endpoint_name: str, model: str, steps: int)
Generate the x and y coordinates of a specific curve, and optionally a plot
- Parameters:
chemical_name (str) – A string denoting the name of the chemical to generate a curve for
endpoint_name (str) – A string denoting the name of the endpoint to generate a curve for
model (str) – A string denoting the model engine used to generate the curve. Options are “logistic”, “gamma”, “weibull”, “log logistic”, “probit”, “log probit”, “multistage2”, or “quantal linear”
steps (int) – An integer for the number of doses between the minimum and maximum dose. Default is 10.
# Create a bmdrc object and save it as Long. See the vignettes
Long.response_curve(chemical_name = "2", endpoint_name = "JAW", model = "log probit")
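To show what the x and y coordinates of a curve look like, the sketch below builds an evenly spaced dose grid and evaluates a two-parameter logistic dose-response function on it. The helper names, the inclusive endpoints of the grid, and the parameter values alpha and beta are assumptions chosen for illustration; they are not fitted values or bmdrc internals.

```python
import math

def dose_grid(min_dose, max_dose, steps=10):
    """Return `steps` evenly spaced doses from min_dose to max_dose inclusive."""
    if steps < 2:
        return [min_dose]
    width = (max_dose - min_dose) / (steps - 1)
    return [min_dose + i * width for i in range(steps)]

def logistic_response(dose, alpha, beta):
    """Logistic dose-response: P(dose) = 1 / (1 + exp(-(alpha + beta * dose)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * dose)))

doses = dose_grid(0.0, 9.0, steps=10)
# Hypothetical parameters; a real curve would use the fitted values.
curve = [(dose, logistic_response(dose, alpha=-3.0, beta=0.5)) for dose in doses]
```

Each tuple in `curve` is one (dose, predicted response) point that could be plotted.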
5. Output File Options
Output Benchmark Doses
- class bmdrc.output_modules.benchmark_dose(self, path: str)
Calculate high-level statistics of the benchmark dose fits
- Parameters:
path (str) – The path to write the benchmark dose file to
# Create a bmdrc object and save it as Long. Run the fit_models() function first. See the vignettes
Long.output_benchmark_dose()
Output Dose Tables
- class bmdrc.output_modules.dose_table(self, path: str)
Calculate confidence intervals for each measured dose
- Parameters:
path (str) – The path to write the dose table file to
# Create a bmdrc object and save it as Long. Run the fit_models() function first. See the vignettes
Long.output_dose_table()
Output Fits Table
- class bmdrc.model_fitting.fits_table(self, path: str)
Calculate several points along a curve for visualization purposes
- Parameters:
path (str) – The path to write the curve fits file to
# Create a bmdrc object and save it as Long. Run the fit_models() function first. See the vignettes
Long.output_fits_table()
Report Files
- class bmdrc.output_modules.report_binary(self, out_folder: str, report_name: str, file_type: str)
Generate a report file in either markdown or json format
- Parameters:
out_folder (str) – A string indicating the path to write the report file to
report_name (str) – A string of the name used for the report
file_type (str) – A string to indicate whether the output file should be a markdown “.md” or json “.json”
# Create a bmdrc object and save it as Long. Run the fit_models() function first. See the vignettes
Long.report(out_folder = "out/file.md", report_name = "example_out", file_type = ".md")