Customize and extend OMOP mappings in healthcare data solutions

This article explains how to customize the default OMOP (Observational Medical Outcomes Partnership) mappings in healthcare data solutions and extend the OMOP transformations capability. The data transformation toolkit (DTT) libraries, designed exclusively for use within the OMOP extensibility framework, help you modify mappings from the silver to gold lakehouse and adjust the default gold lakehouse tables. This transformation can include scenarios such as adding new tables or columns to the OMOP lakehouse. This functionality is currently in preview, and the DTT libraries can't be used outside this specified scope.

Prerequisites

Before you customize the mappings, deploy healthcare data solutions in your Microsoft Fabric workspace and set up the OMOP transformations capability.
Understand how the library works

The healthcare#_msft_omop_silver_gold_transformation notebook activity runs the DTT library. For each target table, the library queries the most recent SourceModifiedOn date for each SourceTable. This date is then used to filter the corresponding SourceTable. Here, the SourceTable serves as the source query, where a single query can combine multiple tables.

The library also maps source keys to integer target keys. When you provide a key mapping (such as a Patient FHIR ID to person.person_id), the library checks whether the source FHIR ID already has an assigned target ID. If it does, the library retrieves the existing target ID; otherwise, it assigns a new one. The library uses the same mapping tables for both primary and foreign keys. For example, when observation.subject is mapped to measurement.person_id, the foreign key measurement.person_id receives the same target ID that was assigned when patient.id was mapped to person.person_id. Because Delta tables don't enforce referential integrity, newly created measurements might reference person_id values that don't exist until you process the corresponding patient records. Using OneLake file explorer, you can view these key mappings under the following folder path: healthcare#.HealthDataManager\DMHCheckpoint\dtt\dtt_state_db\KEY_MAPPING.
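The assign-or-retrieve behavior described above can be illustrated with a minimal sketch. This is not the DTT implementation; the class and method names are hypothetical, and it only demonstrates why a primary key and a foreign key that reference the same source ID resolve to the same integer target ID.

```python
# Illustrative sketch only (not the actual DTT library code): assign a new
# integer target ID for an unseen source FHIR ID, or reuse the existing one.
class KeyMapper:
    def __init__(self):
        self._mapping = {}   # source FHIR ID -> integer target ID
        self._next_id = 1

    def get_target_id(self, source_id: str) -> int:
        # Reuse the assigned target ID if the source ID was seen before;
        # otherwise assign the next available integer.
        if source_id not in self._mapping:
            self._mapping[source_id] = self._next_id
            self._next_id += 1
        return self._mapping[source_id]

mapper = KeyMapper()
# Primary key mapping: Patient FHIR ID -> person.person_id
pid = mapper.get_target_id("patient-fhir-123")
# Foreign key mapping: observation.subject -> measurement.person_id resolves
# to the same integer because it references the same source ID.
fk = mapper.get_target_id("patient-fhir-123")
```

Because the foreign key lookup can run before the corresponding patient record is processed, the target ID can exist in the mapping table before the person row itself does, which is exactly the referential-integrity gap noted above.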

Supported scenarios

  • Table additions to the OMOP gold lakehouse: Add custom tables with a prefix to avoid conflicts with the default healthcare data solutions OMOP table names.
  • Column additions to OMOP tables in the OMOP lakehouse: Use a prefix with the columns to avoid naming conflicts.
  • Updates to existing mapping files: If the default healthcare data solutions mapping files are updated in the future, monitor and manually migrate the changes.
  • Table maintenance for managing and optimizing data storage and performance.

Unsupported scenarios

  • Deleting existing OMOP tables
  • Removing columns from existing OMOP tables

OMOP mapping configuration files

The following sections provide details about the OMOP mapping configuration files and the settings you can customize.

Important

The mapping files include the dmfAdapterSchema.json, dbSemanticsConfig.json, and dbSemantics.json files. Don't customize or edit them.

dbTargetSchema.json

This file defines the OMOP schema. By default, the healthcare data solutions environment creates all the tables during deployment. If any tables are missing, the library generates them based on the schema in this file, provided the source values exist. Customize this file if you need to add a column to an existing OMOP table or add a new table to the OMOP lakehouse.
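As an illustration, a custom table entry in dbTargetSchema.json might look like the following sketch. The table name (xyz_care_plan), its columns, and the column property names are hypothetical; copy an existing table object from your deployed file and adjust it rather than authoring an entry from scratch. Note the custom prefix (xyz_), which avoids conflicts with the default OMOP table names.

```json
{
  "name": "xyz_care_plan",
  "properties": {
    "path": "xyz_care_plan.cdm.json/xyz_care_plan",
    "primaryKeys": ["xyz_care_plan_id"]
  },
  "columns": [
    { "name": "xyz_care_plan_id", "dataType": "bigint" },
    { "name": "person_id", "dataType": "bigint" },
    { "name": "xyz_care_plan_title", "dataType": "string" }
  ]
}
```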

Here are a few key details about the properties in the table object collection:

  • columns: Defines the table's columns; specify each column's name and data type. Refer to the examples in the existing OMOP file, copy the values, and adjust them accordingly.
  • name: The name of the table.
  • properties.path: The value must be in the format customtable.cdm.json/customtable.
  • properties.primaryKeys: The primary key for the table.
  • relationships: Defines any relationships or foreign keys present in the table. For example, if you add a foreign key to provider on the person table, you must add the following value to the person table's relationships property: {\"joinPairs\": [{\"fromAttribute\": \"provider_id\", \"toAttribute\": \"provider_id\"}], \"fromEntity\": \"person.cdm.json/person\", \"toEntity\": \"provider.cdm.json/provider\"}. Here:
      • fromAttribute refers to a column in the person table.
      • fromEntity refers to the person table itself.
      • toAttribute refers to a column in the provider table.
      • toEntity refers to the provider table itself.

dmfAdapter.json

This file is the main source-to-target mapping file. If you only need to update existing mappings, you can edit this file without touching any of the others. Find the mapping you want to change in the sourceFields collection and edit its value.

If you're querying a new source table, make sure the table is added to the queryTables collection. Then, either add a new SourceTable or edit an existing SourceTable. The following sections explain key details of the properties in the sourceTables collection.

tableName

This property is the name of the source query. The value is written to the SourceTable column on the target.

query

This property is the query that runs against the source table. It supports all functions and expressions that Spark SQL supports. The tableName value serves as the name of this query when the library executes it, and also appears in the SourceTable column of the target table row.

modifiedOnField

Use this field in the source query to filter by date and ensure that you only retrieve new records. The property value is typically set to the msftModifiedDateTime value.

targetAnchorTables

This property lists the target tables to which the source query writes information.

sourceFields.fieldName

This value is either the fieldName returned by the query, or an alias if you used fieldCalculatedValue.

sourceFields.fieldCalculatedValue

This property functions like a withColumn operation applied to the resulting query dataframe. It supports all Spark SQL functions and expressions. When you use fieldCalculatedValue, the fieldName value serves as the alias for this column.

sourceFields.fieldType

This property is the sourceField data type.

sourceFields.isPrimaryKey

This property specifies which source field is considered the primary key of the source query. Make sure this value is unique.

sourceFields.targetFields

Use this field to specify which target field to write the information to. Following are the key properties of this field:

  • tableName: Required value; specifies which table is being written to.

  • fieldName: Required value; specifies which field is being written to.

  • condition: Optional value. If you define this value on the target table primary key, it dictates whether the whole record is written to the target table. If you define it on mappings to other fields (and not the target table primary key), it dictates whether transformation happens on this field. See the following example in the Patient to Person mapping:

    {
       "fieldName": "deceasedDateTime",
       "fieldType": "datetime",
       "enabled": true,
       "targetFields": {
         "fields": [
           {
             "tableName": "death",
             "fieldName": "death_datetime",
             "condition": "isnotnull(deceasedDateTime)"
           }
         ]
       }
    }
    
  • targetField: Optional value. If the mapping is to a relationship, then tableName and fieldName refer to the parent table and field, and targetField is the actual target field that gets populated. See the following example of the gender standard field mapping from Patient to Person:

    {
       "fieldName": "gender_standard",
       "fieldCalculatedValue": "concat_ws('<->', gender, 'http://hl7.org/fhir/ValueSet/administrative-gender')",
       "fieldType": "string",
       "enabled": true,
       "targetFields": {
         "fields": [
           {
             "tableName": "concept",
             "fieldName": "concept_id",
             "targetField": "gender_concept_id"
           }
         ]
       }
    }
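Putting the properties above together, a complete sourceTables entry might look like the following illustrative sketch. The query, field names, and overall shape are hypothetical (it reuses the documented deceasedDateTime mapping); treat the entries in your deployed dmfAdapter.json as the authoritative reference for the exact structure.

```json
{
  "tableName": "PatientQuery",
  "query": "SELECT id, gender, deceasedDateTime, msftModifiedDateTime FROM Patient",
  "modifiedOnField": "msftModifiedDateTime",
  "targetAnchorTables": ["person"],
  "sourceFields": [
    {
      "fieldName": "id",
      "fieldType": "string",
      "isPrimaryKey": true,
      "enabled": true,
      "targetFields": {
        "fields": [
          { "tableName": "person", "fieldName": "person_id" }
        ]
      }
    },
    {
      "fieldName": "deceasedDateTime",
      "fieldType": "datetime",
      "enabled": true,
      "targetFields": {
        "fields": [
          {
            "tableName": "death",
            "fieldName": "death_datetime",
            "condition": "isnotnull(deceasedDateTime)"
          }
        ]
      }
    }
  ]
}
```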
    

dbTargetSchemaConfig.json

If you add new tables to the OMOP lakehouse, update the tables array under the fields section to include the new tables. See the following example:

{
    "fields": [
        {
            "name": "SourceModifiedOn",
            "description": "Used for comparing with source system records to determine when a record was changed in the source.",
            "tables": [
                "person", "observation_period", "visit_occurrence", "visit_detail",
                "condition_occurrence", "drug_exposure", "procedure_occurrence",
                "device_exposure", "measurement", "observation", "death", "note",
                "specimen", "location", "provider", "care_site", "note_nlp",
                "image_occurrence"
            ],
            "type": "timestamp",
            "timestampFormat": "yyyy-MM-ddTHH:mm:ssZ",
            "enabled": true
        },
        {
            "name": "SourceTable",
            "description": "Used for comparing with source system records to determine when a record was changed in the source.",
            "tables": [
                "person", "observation_period", "visit_occurrence", "visit_detail",
                "condition_occurrence", "drug_exposure", "procedure_occurrence",
                "device_exposure", "measurement", "observation", "death", "note",
                "specimen", "location", "provider", "care_site", "note_nlp",
                "image_occurrence"
            ],
            "type": "string",
            "enabled": true
        }
    ]
}

Extend existing OMOP mappings

To modify existing OMOP mappings, follow these steps:

  1. Use OneLake file explorer or Azure Storage Explorer to copy the existing configuration files from the healthcare#.HealthDataManager\DMHConfiguration_internal\fhir4\transformation\omop folder to the OMOP lakehouse files folder healthcare#_msft_gold_omop.Lakehouse/Files/extend_omop_config. If needed, you can store these files in a different lakehouse.

  2. Update dmfAdapter.json with the mapping updates. These changes can include adjusting the FHIR to OMOP mappings, adding new mappings, or parsing extensions to map to OMOP.

  3. In the admin lakehouse files, update the deploymentParametersConfiguration.json file to set the omop_config_path parameter to the new path from step 1.
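For step 3, the parameter update might look like the following hypothetical fragment. The surrounding structure of deploymentParametersConfiguration.json depends on your deployment; locate the existing omop_config_path parameter in your file and update only its value to the path you chose in step 1.

```json
{
  "omop_config_path": "healthcare#_msft_gold_omop.Lakehouse/Files/extend_omop_config"
}
```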

For more information and examples, see OMOP transformations: Custom mapping examples.