| Title: | Handy and Minimalistic Common Data Model Characterization |
|---|---|
| Description: | Extracts covariates from Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) domains using an R-only pipeline. Supports configurable temporal windows, domain-specific covariates for drug exposure, drug era (including Anatomical Therapeutic Chemical (ATC) groupings), condition occurrence, condition era, concept sets and cohorts. Methods are based on the Observational Health Data Sciences and Informatics (OHDSI) framework described in Hripcsak et al. (2015) <doi:10.1038/sdata.2015.35> and "The Book of OHDSI" OHDSI (2019, ISBN:978-1-7923-0589-8). |
| Authors: | Alexander Alexeyuk [aut, cre] |
| Maintainer: | Alexander Alexeyuk <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.0.1 |
| Built: | 2026-05-23 07:39:45 UTC |
| Source: | https://github.com/cran/OdysseusCharacterizationModule |
A convenience wrapper around buildConceptSetQuery that
resolves every element of a named list of concept-set expressions to
SQL in a single call.
buildConceptSetQueries(
  conceptSetList,
  conceptSetNames = names(conceptSetList),
  vocabularyDatabaseSchema = "@vocabulary_database_schema"
)
conceptSetList |
A named list of concept-set expressions.
Each element must conform to the format accepted by
buildConceptSetQuery. |
conceptSetNames |
Character vector of concept-set labels, one
per element of conceptSetList. Defaults to names(conceptSetList). |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables.
Passed through to buildConceptSetQuery. |
The function iterates over conceptSetList and
conceptSetNames pairwise (element by element) using map2,
calling buildConceptSetQuery for each pair.
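The pairwise iteration can be sketched in base R. This is a minimal illustration and not the package's implementation: resolveOne is a hypothetical stand-in for buildConceptSetQuery, and Map is used in place of map2.

```r
# Hypothetical stand-in for buildConceptSetQuery(): returns a short label
# instead of real SQL so the sketch runs without the package installed.
resolveOne <- function(expression, name) {
  if (length(expression$items) == 0) {
    return("")  # empty concept sets resolve to an empty string
  }
  sprintf("-- query for '%s' with %d item(s)", name, length(expression$items))
}

csList <- list(
  diabetes = list(items = list(list(concept = list(CONCEPT_ID = 201820)))),
  empty_set = list(items = list())
)

# Map() walks both inputs in lockstep and takes the result names
# from its first list argument.
queries <- Map(resolveOne, csList, names(csList))
str(queries)
```

The result keeps the names of csList, and the empty concept set yields "", mirroring the return value described for buildConceptSetQueries.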
A named list of character strings, each containing the SQL
query for the corresponding concept set.
Names match those of conceptSetList.
Elements whose concept sets resolve to no included concepts are
returned as "" (empty strings).
buildConceptSetQuery for the single-expression
resolver and full documentation of the expression format.
csList <- list(
  diabetes = list(items = list(
    list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE)
  )),
  hypertension = list(items = list(
    list(concept = list(CONCEPT_ID = 316866), includeDescendants = TRUE)
  ))
)
queries <- buildConceptSetQueries(
  csList,
  vocabularyDatabaseSchema = "cdm_v5"
)
# Each element is a SQL string
cat(queries$diabetes)
cat(queries$hypertension)
Translates a single CIRCE-format concept-set expression into a
stand-alone SQL query that resolves to the set of
concept_id values implied by the expression.
The function handles the four combinations of per-item flags
(includeDescendants, includeMapped) as well as
concept exclusion (isExcluded), and does not require
Java or the CirceR package.
buildConceptSetQuery(
  conceptSetExpression,
  conceptSetName = "plug",
  vocabularyDatabaseSchema = "@vocabulary_database_schema"
)
conceptSetExpression |
A concept-set expression in one of two forms:
See Details for the full item specification. |
conceptSetName |
Character string.
A label embedded in the output SQL as the cs_name value. Defaults to "plug". |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables
( |
Each element of conceptSetExpression$items must be a list with
the following structure:
concept: Required list.
Must contain CONCEPT_ID, which must be a single, non-NA,
whole-number numeric value.
includeDescendants: Optional logical scalar.
When TRUE, all descendant concepts (via
CONCEPT_ANCESTOR) are included.
Defaults to FALSE when absent.
includeMapped: Optional logical scalar.
When TRUE, concepts reached through "Maps to"
relationships (via CONCEPT_RELATIONSHIP) are included.
Defaults to FALSE when absent.
isExcluded: Optional logical scalar.
When TRUE, the resolved concepts for this item are
removed from the final set via a LEFT JOIN …
WHERE … IS NULL anti-join pattern.
Defaults to FALSE when absent.
Items are first partitioned into included and excluded
groups based on isExcluded.
Within each group, items are further classified into four categories
by their flag combinations:
Plain — neither descendants nor mapped.
Descendants only — includeDescendants = TRUE.
Mapped only — includeMapped = TRUE.
Descendants and mapped — both flags TRUE.
A UNION of the appropriate SQL blocks is built for each group.
If excluded items exist, the excluded set is anti-joined against the
included set.
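The partitioning and classification step can be illustrated with a short base R sketch. The helper names classifyItem and partitionItems and the category labels are hypothetical; the package's internal code is not shown in this manual and may be organised differently.

```r
# Classify one item by its flags; absent flags are treated as FALSE.
classifyItem <- function(item) {
  desc <- isTRUE(item$includeDescendants)
  mapped <- isTRUE(item$includeMapped)
  if (desc && mapped) {
    "descendants_and_mapped"
  } else if (desc) {
    "descendants_only"
  } else if (mapped) {
    "mapped_only"
  } else {
    "plain"
  }
}

# Split items into included/excluded groups, then group concept IDs
# by flag category within each group.
partitionItems <- function(items) {
  excluded <- vapply(items, function(x) isTRUE(x$isExcluded), logical(1))
  byCategory <- function(group) {
    split(
      vapply(group, function(x) x$concept$CONCEPT_ID, numeric(1)),
      vapply(group, classifyItem, character(1))
    )
  }
  list(included = byCategory(items[!excluded]),
       excluded = byCategory(items[excluded]))
}

items <- list(
  list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE),
  list(concept = list(CONCEPT_ID = 433962)),
  list(concept = list(CONCEPT_ID = 201254), isExcluded = TRUE)
)
parts <- partitionItems(items)
str(parts)
```

Each category would then map to one SQL block, with the blocks of a group combined by UNION.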
The conceptSetName value is injected into the SQL via
render using the @cs_name token.
The vocabularyDatabaseSchema is interpolated directly via
sprintf.
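The two substitution mechanisms can be sketched as follows. The SQL template is invented for illustration, and gsub stands in for render here so that the example runs in base R; only the @cs_name token and the use of sprintf for the schema come from the description above.

```r
schema <- "cdm_v5"

# Step 1: the schema is interpolated directly with sprintf().
template <- sprintf(
  "SELECT '@cs_name' AS cs_name, c.concept_id FROM %s.concept c WHERE c.concept_id IN (201820)",
  schema
)

# Step 2: the @cs_name token is replaced with the concept-set label
# (gsub() is a base R stand-in for the render step).
sql <- gsub("@cs_name", "diabetes", template, fixed = TRUE)
cat(sql, "\n")
```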
A single character string containing a SQL SELECT
statement that produces two columns: cs_name (the concept-set
label) and concept_id.
Returns "" (an empty string) when items is an empty
list or when all items are excluded and no included items remain.
The function performs extensive validation of all inputs and stops with an informative error message if:
vocabularyDatabaseSchema is not a single non-empty
character string.
conceptSetExpression is neither a list nor a valid
JSON string.
The items element is missing or is not a list.
Any item lacks a concept element or a valid
CONCEPT_ID.
Any optional logical flag is present but not a single logical value.
A warning is issued when a logical flag is NA (treated as
FALSE).
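A minimal sketch of the per-item checks listed above, written in base R. The function name validateItem and the message wording are assumptions; the package's own messages may differ.

```r
# Validate one concept-set item and fill in absent flags with FALSE.
validateItem <- function(item) {
  id <- item$concept$CONCEPT_ID
  if (is.null(id) || !is.numeric(id) || length(id) != 1 ||
      is.na(id) || id != round(id)) {
    stop("Each item needs concept$CONCEPT_ID as a single, non-NA whole number.")
  }
  for (flag in c("includeDescendants", "includeMapped", "isExcluded")) {
    value <- item[[flag]]
    if (is.null(value)) {
      item[[flag]] <- FALSE                      # absent flag defaults to FALSE
    } else if (!is.logical(value) || length(value) != 1) {
      stop(sprintf("'%s' must be a single logical value.", flag))
    } else if (is.na(value)) {
      warning(sprintf("'%s' is NA; treating it as FALSE.", flag))
      item[[flag]] <- FALSE
    }
  }
  item
}

ok <- validateItem(list(concept = list(CONCEPT_ID = 201820)))
str(ok)
```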
buildConceptSetQueries for batch resolution of multiple
concept sets,
render for SQL parameterisation,
fromJSON for JSON parsing.
# A concept set with descendants
expr <- list(items = list(
  list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE),
  list(concept = list(CONCEPT_ID = 433962))
))
sql <- buildConceptSetQuery(
  conceptSetExpression = expr,
  conceptSetName = "diabetes",
  vocabularyDatabaseSchema = "cdm_v5"
)
cat(sql)

# From a JSON string
json <- '{"items":[{"concept":{"CONCEPT_ID":316866},"includeDescendants":true}]}'
sql2 <- buildConceptSetQuery(json, conceptSetName = "hypertension")
cat(sql2)

# Exclusion example
expr_excl <- list(items = list(
  list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE),
  list(concept = list(CONCEPT_ID = 201254), isExcluded = TRUE)
))
sql3 <- buildConceptSetQuery(expr_excl, conceptSetName = "diabetes_refined")
cat(sql3)
Takes the output of buildConceptSetQueries — a named
list of SQL SELECT statements — unions them together, and
writes the result into a single temporary database table with columns
cs_name and concept_id.
createConceptSetTempTable(
  connection,
  csQueries,
  vocabularyDatabaseSchema,
  tempTableName = "#concept_sets_c",
  tempEmulationSchema = NULL
)
connection |
A |
csQueries |
A named list of SQL query strings as returned by
buildConceptSetQueries. |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables.
This value is substituted for the
@vocabulary_database_schema parameter. |
tempTableName |
Character string.
Name of the temporary table to create.
Defaults to "#concept_sets_c". |
tempEmulationSchema |
Character string or |
The function performs the following steps:
Filters out any empty query strings from csQueries.
Unions all remaining queries into a single SELECT
statement.
Wraps the union in a SELECT * INTO <tempTableName>
statement.
Renders the @vocabulary_database_schema parameter
and translates the SQL for the target DBMS.
Executes the SQL via
renderTranslateExecuteSql.
Creates a non-unique index on (cs_name, concept_id)
for efficient downstream joins.
If all queries are empty after filtering, the function stops with an informative error.
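The first three steps (filter, union, wrap) can be sketched with plain string handling. The helper name buildTempTableSql, the subquery alias, and the exact union operator are assumptions; rendering, translation, execution, and index creation are omitted.

```r
# Assemble a SELECT * INTO statement from a named list of query strings.
buildTempTableSql <- function(csQueries, tempTableName = "#concept_sets_c") {
  # Step 1: drop empty query strings.
  keep <- vapply(csQueries, function(q) nzchar(trimws(q)), logical(1))
  nonEmpty <- csQueries[keep]
  if (length(nonEmpty) == 0) {
    stop("All concept-set queries are empty.")
  }
  # Step 2: union the remaining queries.
  unionSql <- paste(unlist(nonEmpty), collapse = "\nUNION\n")
  # Step 3: wrap the union in SELECT * INTO <tempTableName>.
  sprintf("SELECT * INTO %s FROM (\n%s\n) cs;", tempTableName, unionSql)
}

toyQueries <- list(
  a = "SELECT 'a' AS cs_name, 1 AS concept_id",
  b = "",
  c = "SELECT 'c' AS cs_name, 2 AS concept_id"
)
cat(buildTempTableSql(toyQueries), "\n")
```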
Invisibly returns the name of the created temporary table
(tempTableName).
Called primarily for its side effect of creating the table in the
database.
The created temporary table contains two columns:
cs_name: Character.
The concept-set label (derived from the names of
csQueries).
concept_id: Integer. A resolved OMOP standard concept identifier.
buildConceptSetQueries for generating the input,
buildConceptSetQuery for the single-expression builder,
renderTranslateExecuteSql.
## Not run:
library(DatabaseConnector)
conn <- connect(dbms = "postgresql", server = "localhost/ohdsi",
                user = "user", password = "pass")
csList <- list(
  diabetes = list(items = list(
    list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE),
    list(concept = list(CONCEPT_ID = 433962))
  )),
  hypertension = list(items = list(
    list(concept = list(CONCEPT_ID = 316866), includeDescendants = TRUE)
  ))
)
csQueries <- buildConceptSetQueries(csList, vocabularyDatabaseSchema = "cdm_v5")
createConceptSetTempTable(
  connection = conn,
  csQueries = csQueries,
  vocabularyDatabaseSchema = "cdm_v5",
  tempTableName = "#concept_sets"
)
# Query the resulting table
result <- querySql(conn, "SELECT * FROM #concept_sets;")
head(result)
disconnect(conn)
## End(Not run)
Creates a covariateSettings object that can be passed directly to
FeatureExtraction::getDbCovariateData() as a custom covariate
builder. The settings specify which OdysseusCharacterizationModule analyses
to run, including time windows, base features, cohort features, and
concept-set features.
createOcmCovariateSettings(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365, -180, -30, 0, 1, 30, 180, 365),
    endDays = c(-1, -1, -1, 0, 30, 180, 365, 700)
  ),
  useBaseFeatures = list(
    drug_exposure = list(include = FALSE, atc = FALSE, atcLevels = c(1L, 2L, 3L, 4L, 5L)),
    condition_occurrence = list(include = FALSE, type = "start"),
    condition_era = list(include = FALSE, type = "start"),
    drug_era = list(include = FALSE, type = "start", atc = FALSE, atcLevels = c(5L)),
    procedure_occurrence = list(include = FALSE),
    observation = list(include = FALSE),
    device_exposure = list(include = FALSE),
    visit_occurrence = list(include = TRUE, type = "start"),
    measurement = list(include = FALSE)
  ),
  useCohortFeatures = list(include = FALSE, type = "start", cohortIds = NULL,
                           cohortNames = NULL, cohortTable = NULL, covariateSchema = NULL),
  useConceptSetFeatures = list(conceptSets = NULL, include = FALSE, type = "binary")
)
analysisWindows |
An |
useBaseFeatures |
Named list of domain configurations, same structure
as in |
useCohortFeatures |
List specifying cohort-based feature extraction,
same structure as in |
useConceptSetFeatures |
List specifying concept-set-based feature
extraction, same structure as in |
An S3 object of class covariateSettings with an attribute
fun set to "getDbOcmCovariateData". This object can be
passed as the covariateSettings argument to
FeatureExtraction::getDbCovariateData(), or used standalone by
calling getDbOcmCovariateData directly.
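The role of the fun attribute can be illustrated with a toy settings object. This sketch only shows name-based lookup of a builder function; toyBuilder and the contents of the settings list are hypothetical, and the actual dispatch code inside FeatureExtraction is not reproduced here.

```r
# A toy settings object carrying the builder name in its "fun" attribute.
settings <- structure(
  list(analysisWindows = list(c(-365, -1))),
  class = "covariateSettings"
)
attr(settings, "fun") <- "toyBuilder"

# Hypothetical builder; a real one would query the database.
toyBuilder <- function(covariateSettings, ...) {
  sprintf("builder called with %d window(s)",
          length(covariateSettings$analysisWindows))
}

# The caller looks the builder up by name and invokes it with the settings.
out <- do.call(attr(settings, "fun"), list(covariateSettings = settings))
print(out)
```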
getDbOcmCovariateData for the corresponding builder function,
planAnalysis for parameter documentation.
settings <- createOcmCovariateSettings(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365, -30),
    endDays = c(-1, -1)
  ),
  useBaseFeatures = list(
    drug_exposure = list(include = TRUE, atc = FALSE),
    condition_occurrence = list(include = TRUE, type = "start"),
    condition_era = list(include = FALSE),
    drug_era = list(include = FALSE),
    procedure_occurrence = list(include = FALSE),
    observation = list(include = FALSE),
    device_exposure = list(include = FALSE),
    visit_occurrence = list(include = FALSE),
    measurement = list(include = FALSE)
  )
)
Define analysis windows
defineAnalysisWindows(
  startDays = c(-365, -180, -30, 0, 1, 30, 180, 365),
  endDays = c(-1, -1, -1, 0, 30, 180, 365, 700),
  windowNames = NULL
)
startDays |
Integer vector of start days relative to cohort start date. |
endDays |
Integer vector of end days relative to cohort start date. |
windowNames |
Optional character vector of names for each window. |
A list of analysisWindow objects.
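To make the pairing of startDays and endDays concrete, here is a base R sketch. The helper makeWindows, the element names, and the default naming scheme are all hypothetical; the structure of the real analysisWindow objects is not documented here.

```r
# Pair start and end offsets (in days relative to cohort start) into windows.
makeWindows <- function(startDays, endDays, windowNames = NULL) {
  stopifnot(length(startDays) == length(endDays),
            all(startDays <= endDays))
  if (is.null(windowNames)) {
    # Hypothetical default names built from the offsets.
    windowNames <- sprintf("day_%d_to_%d",
                           as.integer(startDays), as.integer(endDays))
  }
  Map(function(s, e, n) list(startDay = s, endDay = e, name = n),
      startDays, endDays, windowNames)
}

windows <- makeWindows(startDays = c(-365, -30), endDays = c(-1, -1))
str(windows)
```

With the documented defaults, the i-th start day is matched with the i-th end day, so the first default window covers days -365 to -1 before cohort start.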
Renders the SQL for a singleNodeSpec object, translates it to
the target DBMS dialect, executes it against an open
DatabaseConnector connection, and returns the
result set as a data.frame.
executeSpec(connection, spec, targetDialect = NULL, tempEmulationSchema = NULL)
connection |
A |
spec |
A |
targetDialect |
Character string. The target DBMS dialect for SQL
translation (e.g., |
tempEmulationSchema |
Character string or |
The function performs the following steps:
Renders all @placeholder tokens via
renderSpecSql.
Splits the rendered SQL into individual statements.
For aggregated specs, executes the INSERT statement(s) via
executeSql, then runs the final
SELECT (aggregation) via
querySql.
For non-aggregated specs, executes all statements via
executeSql, then queries the
temp table for rows matching the current analysis_id.
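The split-then-dispatch logic for aggregated specs can be sketched without a database. The splitter below is deliberately naive (it would break on semicolons inside string literals; SqlRender's splitSql is the more robust choice), and the SQL text is invented for illustration.

```r
# Naive statement splitter: split on ";" and drop empty fragments.
splitStatements <- function(sql) {
  parts <- trimws(strsplit(sql, ";", fixed = TRUE)[[1]])
  parts[nzchar(parts)]
}

rendered <- paste(
  "INSERT INTO #domain_raw_results SELECT 1;",
  "SELECT COUNT(*) AS sum_value FROM #domain_raw_results;"
)
statements <- splitStatements(rendered)

# For an aggregated spec: everything but the last statement is executed,
# and the final SELECT is run as a query that returns rows.
toExecute <- head(statements, -1)
toQuery <- tail(statements, 1)
print(toExecute)
print(toQuery)
```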
For aggregated specs: a data.frame with columns
cohort_definition_id, covariate_id,
covariate_name, concept_id, analysis_id,
sum_value, mean_value.
For non-aggregated specs: a data.frame with the raw
patient-level rows inserted into #domain_raw_results.
executeSpecs for batch execution of multiple specs,
renderSpecSql for SQL rendering,
singleNodeSetting for creating specs.
## Not run:
conn <- DatabaseConnector::connect(dbms = "postgresql", server = "localhost/ohdsi",
                                   user = "user", password = "pass")
result <- executeSpec(conn, specs[[1]])
head(result)
DatabaseConnector::disconnect(conn)
## End(Not run)
Iterates over a singleNodeSettingList and executes each spec
sequentially against the database. All specs share the same
#domain_raw_results temp table across executions.
executeSpecs(
  connection,
  specs,
  targetDialect = NULL,
  tempEmulationSchema = NULL,
  cleanTempTables = FALSE,
  stopOnError = TRUE
)
connection |
A |
specs |
A |
targetDialect |
Character string or |
tempEmulationSchema |
Character string or |
cleanTempTables |
Logical. If |
stopOnError |
Logical. If |
The function:
Logs a summary header.
Calls executeSpec for each spec in order.
Drops the shared #domain_raw_results temp table after
all specs have been executed (cleanup).
Returns all results as a named list.
A named list of data.frame objects, one per spec.
Names are the analysis IDs (as character strings).
When stopOnError = FALSE, failed specs produce a
data.frame with zero rows and an "error" attribute
containing the error message.
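The stopOnError behaviour can be sketched with tryCatch. Here runAll and fakeRun are hypothetical stand-ins for executeSpecs and executeSpec, and the analysisId field name is an assumption; no database is involved.

```r
# Run each spec in order; on failure either rethrow or record the error.
runAll <- function(specs, runOne, stopOnError = TRUE) {
  results <- lapply(specs, function(spec) {
    tryCatch(
      runOne(spec),
      error = function(e) {
        if (stopOnError) stop(e)
        empty <- data.frame()                     # zero-row placeholder
        attr(empty, "error") <- conditionMessage(e)
        empty
      }
    )
  })
  names(results) <- vapply(specs, function(s) as.character(s$analysisId),
                           character(1))
  results
}

# Stand-in executor that fails for analysis 2.
fakeRun <- function(spec) {
  if (spec$analysisId == 2) stop("simulated failure")
  data.frame(analysis_id = spec$analysisId, sum_value = 10)
}

specs <- list(list(analysisId = 1), list(analysisId = 2))
results <- runAll(specs, fakeRun, stopOnError = FALSE)
attr(results[["2"]], "error")
```

Checking the "error" attribute of each element is then enough to find which analyses failed.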
executeSpec for single-spec execution,
singleNodeSetting for creating specs.
## Not run:
conn <- DatabaseConnector::connect(dbms = "postgresql", server = "localhost/ohdsi",
                                   user = "user", password = "pass")
results <- executeSpecs(conn, specs)
lapply(results, head)
DatabaseConnector::disconnect(conn)
## End(Not run)
Builder function that implements the FeatureExtraction custom covariate
builder interface. It executes the OdysseusCharacterizationModule pipeline
and returns a CovariateData object (an Andromeda object with
covariates, covariateRef, and analysisRef tables).
getDbOcmCovariateData(
  connection,
  tempEmulationSchema = NULL,
  cdmDatabaseSchema,
  cdmVersion = "5",
  cohortTable = "#cohort_person",
  cohortIds = c(-1),
  rowIdField = "subject_id",
  covariateSettings,
  aggregated = FALSE,
  minCharacterizationMean = 0,
  ...
)
connection |
A |
tempEmulationSchema |
Character or |
cdmDatabaseSchema |
Character. Schema containing the OMOP CDM tables. |
cdmVersion |
Character. OMOP CDM version ( |
cohortTable |
Character. Fully qualified name of the cohort table
(e.g., |
cohortIds |
Integer vector. Cohort definition IDs to extract
covariates for. Use |
rowIdField |
Character. Column name in the cohort table used as the
row identifier. Typically |
covariateSettings |
An object created by
|
aggregated |
Logical. Currently only |
minCharacterizationMean |
Numeric. Minimum mean value for filtering (currently unused; present for interface compatibility). |
... |
Additional arguments passed by
|
This function is normally not called directly. Instead, create a settings
object with createOcmCovariateSettings and pass it to
FeatureExtraction::getDbCovariateData().
A CovariateData object (Andromeda) with:
covariates: Sparse table with columns rowId, covariateId,
covariateValue.
covariateRef: Reference table with columns covariateId,
covariateName, analysisId, conceptId.
analysisRef: Reference table with columns analysisId,
analysisName, domainId, startDay, endDay,
isBinary, missingMeansZero.
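How the sparse covariates table relates to covariateRef can be shown with toy in-memory data frames. The values below are invented, and plain data frames stand in for the Andromeda tables of a real CovariateData object; only the column names come from the description above.

```r
# Toy sparse covariate rows: one row per (person, covariate) with a value.
covariates <- data.frame(
  rowId = c(1, 1, 2),
  covariateId = c(1001, 1002, 1001),
  covariateValue = c(1, 1, 1)
)

# Toy reference table describing each covariate.
covariateRef <- data.frame(
  covariateId = c(1001, 1002),
  covariateName = c("toy covariate A", "toy covariate B"),
  analysisId = c(1, 2),
  conceptId = c(201820, 316866)
)

# Attach readable names to the sparse rows via covariateId.
labelled <- merge(covariates, covariateRef, by = "covariateId")
table(labelled$covariateName)
```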
createOcmCovariateSettings for creating the settings object.
Creates a comprehensive characterization analysis plan for patient-level feature extraction from an OMOP Common Data Model (CDM) database. The plan defines time windows, base clinical features, cohort-based features, and concept-set-based features to be extracted relative to a target cohort's index date.
planAnalysis(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365, -180, -30, 0, 1, 30, 180, 365),
    endDays = c(-1, -1, -1, 0, 30, 180, 365, 700)
  ),
  useBaseFeatures = list(
    drug_exposure = list(include = FALSE, atc = FALSE, atcLevels = c(1L, 2L, 3L, 4L, 5L)),
    condition_occurrence = list(include = FALSE, type = "start"),
    condition_era = list(include = FALSE, type = "start"),
    drug_era = list(include = FALSE, type = "start", atc = FALSE, atcLevels = c(5L)),
    procedure_occurrence = list(include = FALSE),
    observation = list(include = FALSE),
    device_exposure = list(include = FALSE),
    visit_occurrence = list(include = FALSE, type = "start"),
    measurement = list(include = FALSE)
  ),
  useCohortFeatures = list(include = FALSE, type = "start", cohortIds = NULL,
                           cohortNames = NULL, cohortTable = NULL, covariateSchema = NULL),
  useConceptSetFeatures = list(conceptSets = NULL, include = FALSE, type = "binary")
)
analysisWindows |
An object of class |
useBaseFeatures |
A named list of domain configurations. Each element
name must correspond to a supported OMOP CDM table (e.g.,
|
useCohortFeatures |
A list specifying cohort-based feature extraction with the following components:
|
useConceptSetFeatures |
A list specifying concept-set-based feature extraction with the following components:
|
This function assembles a characterizationSettings object that serves as a
blueprint for downstream feature extraction. It supports three complementary
feature extraction strategies:
Standard OMOP CDM domain tables are used to construct binary or count-based covariates. Supported domains include:
drug_exposure / drug_era: Drug concepts, optionally rolled up to ATC hierarchy levels 1–5.
condition_occurrence / condition_era: Condition concepts with configurable temporal logic.
procedure_occurrence: Procedure concepts.
observation: Observation concepts.
device_exposure: Device concepts.
visit_occurrence: Visit concepts.
measurement: Measurement concepts.
Each domain accepts:
include: Logical. Whether to extract features from this domain.
type: Character. Temporal logic: "start" uses the record
start date; "overlap" uses era-style overlap with the time window.
Applicable to era tables and visit_occurrence.
atc: Logical. Whether to roll up drug concepts to ATC hierarchy levels. Applicable to drug_exposure and drug_era only.
atcLevels: Integer vector. ATC hierarchy levels to include
(1–5). Applicable when atc = TRUE.
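The difference between the "start" and "overlap" temporal logic can be sketched as a date comparison. The helper inWindow is hypothetical, and treating both window boundaries as inclusive is an assumption; the package's SQL may handle edge cases differently.

```r
# Does a record fall into a window defined relative to the index date?
inWindow <- function(recordStart, recordEnd, indexDate,
                     startDay, endDay, type = "start") {
  windowStart <- indexDate + startDay
  windowEnd <- indexDate + endDay
  if (type == "start") {
    # Only the record start date has to lie inside the window.
    recordStart >= windowStart & recordStart <= windowEnd
  } else {
    # Era-style overlap: the record interval intersects the window.
    recordStart <= windowEnd & recordEnd >= windowStart
  }
}

index <- as.Date("2020-06-01")
eraStart <- as.Date("2019-01-01")  # begins before the window opens
eraEnd <- as.Date("2020-01-15")    # ends inside the window

startHit <- inWindow(eraStart, eraEnd, index, -365, -1, type = "start")
overlapHit <- inWindow(eraStart, eraEnd, index, -365, -1, type = "overlap")
print(c(start = startHit, overlap = overlapHit))
```

An era that began before the window but is still ongoing inside it is missed by "start" and captured by "overlap", which is why the latter suits era tables and visits.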
Pre-defined cohorts (stored in a cohort table) are used as binary covariates, indicating whether a patient belongs to each specified cohort within each time window.
User-defined concept sets (analogous to ATLAS concept sets) are used to create
targeted covariates. Each concept set specifies one or more concepts
(optionally including descendants) and the CDM tables to search. Output type
can be "binary" (presence/absence) or "counts" (frequency).
An S3 object of class characterizationSettings containing:
analysisWindows: The validated analysis windows.
useBaseFeatures: The validated base feature configuration.
useCohortFeatures: The validated cohort feature configuration.
useConceptSetFeatures: The validated concept set feature configuration.
defineAnalysisWindows for creating time window definitions.
# Minimal plan with default settings
plan <- planAnalysis()

# Custom plan: conditions and drugs only, two time windows
plan <- planAnalysis(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365, 1),
    endDays = c(-1, 365)
  ),
  useBaseFeatures = list(
    drug_exposure = list(
      include = TRUE, atc = TRUE, atcLevels = c(3L, 5L)
    ),
    condition_occurrence = list(
      include = TRUE, type = "start"
    ),
    condition_era = list(include = FALSE),
    drug_era = list(include = FALSE),
    procedure_occurrence = list(include = FALSE),
    observation = list(include = FALSE),
    device_exposure = list(include = FALSE),
    visit_occurrence = list(include = FALSE),
    measurement = list(include = FALSE)
  ),
  useCohortFeatures = list(include = FALSE),
  useConceptSetFeatures = list(include = FALSE)
)

# Plan with cohort features
plan <- planAnalysis(
  useCohortFeatures = list(
    include = TRUE,
    type = "start",
    cohortIds = c(101L, 102L, 103L),
    cohortNames = c("T2DM", "Hypertension", "CKD"),
    cohortTable = "my_cohort_table",
    covariateSchema = "results_schema"
  )
)

# Plan with custom concept sets
plan <- planAnalysis(
  useConceptSetFeatures = list(
    conceptSets = list(
      diabetes = list(
        items = list(
          list(concept = list(CONCEPT_ID = 201820L), includeDescendants = TRUE)
        ),
        tables = c("condition_occurrence")
      )
    ),
    include = TRUE,
    type = "counts"
  )
)
Prints a human-readable summary of a characterizationSettings object.
## S3 method for class 'characterizationSettings'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Invisibly returns x.
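The invisible return follows the usual S3 print convention. As a minimal sketch of that convention only, the block below defines a print method for a hypothetical class `demoSettings` (not part of this package) and checks that the object comes back invisibly, which is what lets a printed object still be assigned or piped.

```r
# Hypothetical class used only to illustrate the print convention;
# it is not the package's characterizationSettings implementation.
print.demoSettings <- function(x, ...) {
  cat("<demoSettings>\n")
  cat("  windows:", length(x$windows), "\n")
  # Return the object itself without triggering a second auto-print
  invisible(x)
}

settings <- structure(
  list(windows = list(c(-365, -1), c(1, 365))),
  class = "demoSettings"
)

# withVisible() exposes both the returned value and its visibility flag
result <- withVisible(print(settings))
```

Here `result$value` is the original object and `result$visible` is `FALSE`.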
Print Single Node Setting List
## S3 method for class 'singleNodeSettingList'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Invisibly returns x.
Print Single Node Spec
## S3 method for class 'singleNodeSpec'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Invisibly returns x.
Convenience wrapper that calls renderSpecSql on every
element of a singleNodeSettingList.
renderAllSpecSql(specs, targetDialect = NULL, tempEmulationSchema = NULL)
specs |
A |
targetDialect |
Character or |
tempEmulationSchema |
Character or |
A named character vector of rendered SQL statements, one per spec. Names are the analysis IDs (as character).
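The shape of that return value can be sketched in base R. The block below is an illustration under stated assumptions, not the package's code: the specs are plain lists rather than real `singleNodeSpec` objects, and `renderOne` is a hypothetical stand-in for `renderSpecSql` that simply returns the stored SQL.

```r
# Hypothetical specs: plain lists standing in for singleNodeSpec objects
specs <- list(
  list(analysisId = 1001L, sql = "SELECT 1001 AS analysis_id"),
  list(analysisId = 1002L, sql = "SELECT 1002 AS analysis_id")
)

# Hypothetical stand-in for renderSpecSql(); returns the SQL unchanged
renderOne <- function(spec) spec$sql

# One rendered statement per spec, collected into a character vector
rendered <- vapply(specs, renderOne, character(1))

# Name each statement by its analysis ID, converted to character
ids <- vapply(specs, function(s) s$analysisId, integer(1))
names(rendered) <- as.character(ids)
```

A statement can then be looked up by ID, for example `rendered[["1001"]]`.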
Takes a singleNodeSpec object whose sql field contains
a parameterised SQL template and resolves every @placeholder
using the spec's own fields.
renderSpecSql(spec, targetDialect = NULL, tempEmulationSchema = NULL)
spec |
A |
targetDialect |
Character string (optional).
When supplied, the rendered SQL is additionally translated to the
target DBMS dialect via |
tempEmulationSchema |
Character string or |
A single character string of executable SQL.
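To make the placeholder resolution concrete, here is a simplified base R sketch. It is not how the package renders SQL; it only mimics the idea of filling each `@placeholder` from a field of the same name. The template text, the camelCase placeholder names, and the helper `fillPlaceholders` are all assumptions made for illustration.

```r
# Hypothetical spec: a plain list with an SQL template and scalar fields
spec <- list(
  sql = "SELECT * FROM @cdmDatabaseSchema.@domainTable WHERE cohort_id = @cohortId",
  cdmDatabaseSchema = "cdm",
  domainTable = "condition_occurrence",
  cohortId = 1L
)

# Hypothetical helper: substitute every @field with the field's value
fillPlaceholders <- function(spec) {
  sql <- spec$sql
  fields <- setdiff(names(spec), "sql")
  # Longest names first, so a short name never overwrites part of a longer one
  fields <- fields[order(nchar(fields), decreasing = TRUE)]
  for (field in fields) {
    value <- spec[[field]]
    # Only scalar, non-list fields are substituted in this sketch
    if (length(value) != 1L || is.list(value)) next
    sql <- gsub(paste0("@", field), as.character(value), sql, fixed = TRUE)
  }
  sql
}

rendered <- fillPlaceholders(spec)
```

With the values above, `rendered` is `"SELECT * FROM cdm.condition_occurrence WHERE cohort_id = 1"`.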
Translates a characterizationSettings object (as returned by
planAnalysis) into a list of executable SQL-based analysis
specifications. Each specification ("node") pairs a feature domain, a time
window, and the appropriate SQL template with all placeholders resolved.
singleNodeSetting(
  plan,
  cohortId,
  cohortDatabaseSchema,
  cohortTable,
  cdmDatabaseSchema,
  vocabularyDatabaseSchema = cdmDatabaseSchema,
  aggregated = TRUE,
  rowIdField = "subject_id"
)
plan |
A |
cohortId |
Integer scalar. The target cohort definition ID. |
cohortDatabaseSchema |
Character scalar. Schema containing the target cohort table. |
cohortTable |
Character scalar. Name of the target cohort table. |
cdmDatabaseSchema |
Character scalar. Schema containing the OMOP CDM tables. |
vocabularyDatabaseSchema |
Character scalar. Schema containing the OMOP
vocabulary tables. Used for concept name lookups in aggregated output.
Defaults to |
aggregated |
Logical scalar. If |
rowIdField |
Character scalar. Name of the column in the cohort table
to use as the row identifier in the output. Defaults to
|
This function iterates over every enabled domain in useBaseFeatures,
useCohortFeatures, and useConceptSetFeatures, crosses each with
every analysis window, and produces a fully parameterised run specification.
The returned list can be passed directly to an execution engine that renders
and translates the SQL via SqlRender.
Each specification receives a unique analysisId constructed as:
domainIndex * 1000 + windowIndex, ensuring stable, reproducible
identifiers across runs.
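As a worked illustration of the numbering formula above, the sketch below crosses two domains with two windows and computes `domainIndex * 1000 + windowIndex` for each pair. Which index the package assigns to which domain is not stated here, so the ordering used below is an assumption.

```r
# Illustrative inputs; the domain ordering (and hence domainIndex) is assumed
domains <- c("condition_occurrence", "drug_exposure")
windows <- data.frame(startDay = c(-365L, 1L), endDay = c(-1L, 365L))

# Cross every domain with every window; windowIndex varies fastest
grid <- expand.grid(
  windowIndex = seq_len(nrow(windows)),
  domainIndex = seq_along(domains),
  KEEP.OUT.ATTRS = FALSE
)

# analysisId = domainIndex * 1000 + windowIndex
grid$analysisId <- as.integer(grid$domainIndex * 1000L + grid$windowIndex)

# Attach readable labels for each domain/window combination
grid$domain <- domains[grid$domainIndex]
grid$startDay <- windows$startDay[grid$windowIndex]
grid$endDay <- windows$endDay[grid$windowIndex]
```

The resulting IDs are 1001, 1002, 2001 and 2002. Because the ID depends only on the two indices, the same plan yields the same IDs on every run.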
A list of S3 objects of class singleNodeSpec. Each element
contains:
analysisId: Integer. Unique analysis identifier.
analysisName: Character. Human-readable analysis label.
domainTable: Character. CDM table name.
conceptIdCol: Character. Concept ID column in the domain table.
dateCol: Character. Start date column.
dateColEnd: Character or NULL. End date column (for overlap logic).
startDay: Integer. Window start day relative to index.
endDay: Integer. Window end day relative to index.
type: Character. Temporal logic ("start" or "overlap").
overlap: Logical. Whether overlap logic is used.
atc: Logical. Whether ATC roll-up is applied.
atcLevels: Integer vector or NULL. ATC levels.
conceptSet: Logical. Whether a concept set filter is applied.
conceptSetItems: List or NULL. Concept set items.
aggregated: Logical. Aggregation flag.
cohortId: Integer. Target cohort ID.
cohortDatabaseSchema: Character. Cohort schema.
cohortTable: Character. Cohort table name.
cdmDatabaseSchema: Character. CDM schema.
sql: Character. Parameterised SQL template.
source: Character. Origin: "base", "cohort", or "conceptSet".
See also planAnalysis, which creates the analysis plan.
# Plan restricted to condition occurrences
plan <- planAnalysis(
  useBaseFeatures = list(
    condition_occurrence = list(include = TRUE, type = "start"),
    drug_exposure = list(include = FALSE),
    condition_era = list(include = FALSE),
    drug_era = list(include = FALSE),
    procedure_occurrence = list(include = FALSE),
    observation = list(include = FALSE),
    device_exposure = list(include = FALSE),
    visit_occurrence = list(include = FALSE),
    measurement = list(include = FALSE)
  ),
  useCohortFeatures = list(include = FALSE),
  useConceptSetFeatures = list(include = FALSE)
)

# Translate the plan into per-window analysis specifications
specs <- singleNodeSetting(
  plan = plan,
  cohortId = 1L,
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  cdmDatabaseSchema = "cdm"
)