Input Module
This page documents the Input Module step of the demo, where we will perform two processing steps on our new "homr_any_visit_10pct.csv" file before model training in the Learning Module.
The Input Module provides the key data processing tools needed for various tasks within the MEDomics platform. In this proof of concept, we will use two of them: the Column Tagging Tools and the Holdout Set Creation Tools. We will also use the MEDomics Editor to delete a column.
The MEDomics Editor refers to the dedicated workspace within our platform where users can visualize datasets, track and review applied transformations, and interactively edit the data.
Column Deletion
Before moving into the Input Module, we have to delete the column named "CSO" from our dataset file. This column is not used in the original study, so we don't need to keep it.
Double-click the homr_any_visit_10pct.csv file in your workspace to open it in the MEDomics Editor. Then click the bin icon above the "CSO" column to delete it.
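For readers who want to check this step outside the editor, the deletion amounts to removing one named column from the CSV. The following is a minimal sketch using only the Python standard library. The `drop_column` helper and the three-column sample are hypothetical illustrations and are not part of MEDomics.

```python
import csv
import io


def drop_column(csv_text: str, column: str) -> str:
    """Return the CSV text with one named column removed (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every header name except the one being deleted.
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped column's value.
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data; the real file has many more columns.
sample = "patient_id,CSO,gender\n1,x,F\n2,y,M\n"
cleaned = drop_column(sample, "CSO")
print(cleaned)
```

In the demo itself, the MEDomics Editor performs this deletion for you, so the sketch is only a way to reason about the result.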

Column Tagging
This tool is a core component of the MEDomics platform, as it enables the use of the MEDomics Standard data format.
Follow the steps in the figure below to access the Column Tagging tool in the Input Module.

In MEDomics, a tag represents a group of columns. Each tag corresponds to a coherent subset of features sharing a common meaning or role (e.g., administrative data, demographic variables, clinical diagnoses). Column selection for tags is defined by the user based on data understanding and domain knowledge.
The MEDomics Standard format is built on this tagging mechanism. Rather than relying on a fixed dataset schema, MEDomics allows users to define multiple semantic views over the same dataset through tags. This design provides flexibility while preserving consistency and traceability.
Dataset Structure
The predictors in our dataset include:
Demographics (age and sex at birth): 2 variables
Admission characteristics: 10 variables
Comorbidity diagnoses: 85 binary variables
Admission diagnoses: 147 binary variables
This results in a total of 244 predictors.
Predictor Sets in the POYM Study
The POYM study defines two predictor sets for model training and evaluation:
AdmDemo = Adm (Admission characteristics) + Demo (Demographics)
AdmDemoDx = Adm (Admission characteristics) + Demo (Demographics) + Dx (Comorbidity diagnoses + Admission diagnoses)
For this proof of concept, we represent these predictor sets using three tags:
Adm: Admission characteristics (10 variables)
Demo: Demographics (2 variables: age_original, gender)
Dx: Comorbidity diagnoses (85) + Admission diagnoses (147)
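The relationship between tags and predictor sets can be sketched as a small lookup. This is only an illustration of the arithmetic above, not MEDomics code. The names `TAG_SIZES` and `PREDICTOR_SETS` are hypothetical, and the counts are the ones stated on this page.

```python
# Number of variables carried by each tag, as stated in the study description.
TAG_SIZES = {
    "adm": 10,        # admission characteristics
    "demo": 2,        # age_original, gender
    "dx": 85 + 147,   # comorbidity diagnoses + admission diagnoses
}

# Each predictor set is a combination of tags.
PREDICTOR_SETS = {
    "AdmDemo": ("adm", "demo"),
    "AdmDemoDx": ("adm", "demo", "dx"),
}

# Size of each predictor set = sum of the sizes of its tags.
set_sizes = {
    name: sum(TAG_SIZES[tag] for tag in tags)
    for name, tags in PREDICTOR_SETS.items()
}
print(set_sizes)
```

Because AdmDemoDx combines all three tags, its size matches the total number of predictors in the dataset.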
To assign tags to variables:
Open the Input Module from the left navigation panel.
Under Data Organization, select Structuring & Tagging.
Click on Column Tagging Tools.
This tool allows you to assign the appropriate tag (adm, demo, or dx) to each variable according to the study definition.
Variable Mapping by Tag
Adm
Admission characteristics:
ed_visit_count
ho_ambulance_count
total_duration
flu_season
living_status
admission_group
is_ambulance
is_icu_start_ho
is_urg_readm
service_group
Simply copy and paste the following line into the tagging tool:
10
Demo
Demographics:
age_original
gender
Simply copy and paste the following line into the tagging tool:
2
Dx
Comorbidity diagnoses + Admission diagnoses (the rest of the columns)
Simply copy and paste the following line into the tagging tool:
232
The columns "patient_id", "visit_id" and "oym" should not be assigned to any tag.
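Since Dx is defined as "the rest of the columns", its list can be derived from the file header by excluding the Adm and Demo columns and the three untagged columns. The sketch below assumes that rule. The `dx_columns` helper, the placeholder diagnosis names, and the comma-separated output format are hypothetical, so check what the tagging tool actually accepts.

```python
ADM = [
    "ed_visit_count", "ho_ambulance_count", "total_duration", "flu_season",
    "living_status", "admission_group", "is_ambulance", "is_icu_start_ho",
    "is_urg_readm", "service_group",
]
DEMO = ["age_original", "gender"]
UNTAGGED = ["patient_id", "visit_id", "oym"]  # must not receive any tag


def dx_columns(header):
    """Return the columns left over once Adm, Demo and untagged ones are removed."""
    reserved = set(ADM) | set(DEMO) | set(UNTAGGED)
    return [name for name in header if name not in reserved]


# Hypothetical header: the two diagnosis names are placeholders, not real columns.
header = ["patient_id", "visit_id", *ADM, *DEMO,
          "dx_placeholder_1", "dx_placeholder_2", "oym"]
dx = dx_columns(header)

# Joining with commas is an assumption about the paste format.
print(",".join(dx))
```

On the real file, the header would come from the first row of homr_any_visit_10pct.csv, and the resulting list should contain 232 names.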
The figure below illustrates the process of assigning tags to dataset columns using the Column Tagging Tools.
Select the dataset (homr_any_visit_10pct.csv).
Create the three required tags (adm, demo, and dx) by entering their names one after the other and pressing Enter.
Copy and paste the column names corresponding to each tag from the Variable Mapping by Tag section above.
Choose the appropriate tag to apply.
Click Apply tags to validate the configuration.
The third step presents two alternative ways to assign columns to their corresponding tags.
This can be done either by:
Pasting the column names manually, or
Selecting the columns directly from the displayed dataset.
In this example, the variables age_original and gender are assigned to the demo tag.

You can visualize the tags within the dataset in the MEDomics editor.
If the dataset is already open in the MEDomics Editor, please close it and reopen it to update the view and ensure that the newly assigned tags are properly displayed.
Holdout Set Creation
After creating our tags, the final step is to split our data into a learning set and a holdout set.
For this task, we will use the Holdout Set Creation Tools. To access this tool, select Sampling under the Data Wrangling section in the Input Module.

After selecting the dataset (homr_any_visit_10pct.csv):
Enable Shuffle and Stratify.
Select oym as the target column.
Set the split percentage to 20%.
Choose "drop" as the empty cells cleaning method.
Activate the Keep tags toggle.
Click the Save icon to create the Learning and Holdout sets.
These steps are illustrated in the figure below.

This will create two new CSV datasets: Holdout_homr_any_visit_10pct.csv and Learning_homr_any_visit_10pct.csv.
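To make the settings above concrete, here is a minimal sketch of a shuffled, stratified 80/20 split written with the Python standard library. It is not the MEDomics implementation. The `stratified_holdout` helper and the synthetic rows are hypothetical, and the sketch assumes that the "drop" cleaning method discards every row containing an empty cell.

```python
import random
from collections import defaultdict


def stratified_holdout(rows, target, holdout_fraction=0.2, seed=0):
    """Split rows into (learning, holdout), preserving the class mix of `target`."""
    # Assumed meaning of "drop": discard rows that contain any empty cell.
    complete = [r for r in rows if all(v not in ("", None) for v in r.values())]

    # Stratify: group rows by their target value.
    by_class = defaultdict(list)
    for row in complete:
        by_class[row[target]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    learning, holdout = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)  # shuffle within each class
        n_holdout = round(len(group) * holdout_fraction)
        holdout.extend(group[:n_holdout])
        learning.extend(group[n_holdout:])
    return learning, holdout


# Synthetic data: 100 complete rows (20 positives) plus one row with an empty cell.
rows = [{"visit_id": str(i), "oym": "1" if i % 5 == 0 else "0"} for i in range(100)]
rows.append({"visit_id": "100", "oym": ""})

learning, holdout = stratified_holdout(rows, target="oym")
print(len(learning), len(holdout))
```

Splitting each class separately is what keeps the proportion of positive oym cases the same in the Learning and Holdout sets.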
With the creation of the Holdout and the Learning sets, we conclude our Input Module steps, and we can now start the machine learning phase.
This step ensures that the dataset is properly prepared for the demo and ready to be used in a complete end-to-end workflow within MEDomics, including the Learning, Evaluation and Application modules. In the next section, we will use the homr_any_visit_10pct.csv dataset (with the applied tags preserved, of course!) to run machine learning experiments and replicate the POYM study.
This concludes our Input Module section. Now our data is ready for model training!