medmodel
What is a MEDMODEL Object?
A .medmodel file is a custom extension used within the MEDomics platform to represent serialized and saved machine learning models generated from the platform's analytical scenes.
This object serves as a comprehensive container for all essential elements related to a trained model, including the model architecture, training parameters, preprocessing pipeline, selected features, and metadata.
Its purpose is to ensure traceability, reproducibility, and sharing across different MEDomics modules or institutions, which simplifies deployment and evaluation and opens the door to collaboration.
Structure of a MEDMODEL Object
Each MEDMODEL object is composed of two main components:
1. Serialized Scikit-learn Pipeline
The core of the MEDMODEL is the Scikit-learn Pipeline that encapsulates the entire machine learning workflow (see example below). Storing preprocessing steps within the pipeline ensures that input data is processed consistently between training and inference, eliminating discrepancies in data handling. The pipeline includes:
Preprocessing steps: Normalization, feature scaling, missing-value imputation, categorical encoding, etc.
Feature selection and transformation: Any dimensionality reduction or feature engineering steps applied before model fitting.
Trained estimator: The final classifier or regressor model trained on the selected data (e.g., XGBoost, RandomForest, Logistic Regression).
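As a rough illustration of the three parts listed above, the following sketch builds a small scikit-learn Pipeline and serializes it with pickle. The step names, the toy data, and the choice of imputer, scaler, selector, and estimator are assumptions made for this example; they are not the steps MEDomics itself generates.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data standing in for a real dataset (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[::10, 2] = np.nan  # inject missing values so the imputer has work to do

pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),  # preprocessing
        ("scaler", StandardScaler()),                   # preprocessing
        ("select", SelectKBest(f_classif, k=4)),        # feature selection
        ("estimator", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]
)
pipeline.fit(X, y)

# Serialize and restore; the restored pipeline carries the same fitted steps.
serialized_pipeline = pickle.dumps(pipeline)
restored = pickle.loads(serialized_pipeline)
predictions_match = bool((restored.predict(X) == pipeline.predict(X)).all())
```

Because the imputer, scaler, and selector are fitted and stored together with the estimator, calling `restored.predict` on raw input applies the same transformations that were used during training.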

Storage Details
If the serialized pipeline file (pickle format) is ≤ 16 MB, it is stored directly in MongoDB.
If it exceeds 16 MB, it is stored locally on the server, and the MEDMODEL entry in MongoDB references the absolute file path.
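The storage rule can be sketched as a size check on the pickled bytes. The 16 MB threshold coincides with MongoDB's maximum BSON document size. This is a minimal sketch, not the platform's code: the function names, the entry field names (`serialized_model`, `path`), and the `.pkl` file name are hypothetical, and a plain dictionary stands in for the MongoDB document so the example runs without a database.

```python
import os
import pickle
import tempfile

MAX_INLINE_BYTES = 16 * 1024 * 1024  # 16 MB threshold described above


def build_medmodel_entry(model, storage_dir, name):
    """Return a dict shaped like a hypothetical database entry for the model."""
    payload = pickle.dumps(model)
    if len(payload) <= MAX_INLINE_BYTES:
        # Small enough: keep the pickled bytes inline in the entry.
        return {"name": name, "serialized_model": payload, "path": None}
    # Too large: write the bytes to disk and reference the absolute path.
    file_path = os.path.abspath(os.path.join(storage_dir, f"{name}.pkl"))
    with open(file_path, "wb") as handle:
        handle.write(payload)
    return {"name": name, "serialized_model": None, "path": file_path}


def load_model(entry):
    """Load the model from inline bytes or from the referenced file."""
    if entry["serialized_model"] is not None:
        return pickle.loads(entry["serialized_model"])
    with open(entry["path"], "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as storage_dir:
    small_entry = build_medmodel_entry({"weights": [1, 2, 3]}, storage_dir, "small")
    # A 17 MB byte string stands in for a large pickled pipeline.
    large_entry = build_medmodel_entry(bytes(17 * 1024 * 1024), storage_dir, "large")
    small_model = load_model(small_entry)
    large_model_size = len(load_model(large_entry))
```

In a real deployment the inline payload would need to leave room for the other fields of the document, so a slightly lower threshold may be safer. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.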
2. Model Metadata Dictionary
A companion dictionary holds detailed information describing the model, its inputs, and training context. This metadata ensures reproducibility and facilitates understanding of the model's provenance and purpose.
The key metadata fields include:
model_variables: The final list of dataset columns (features) used during training.
target_variable: The dependent variable the model predicts.
ml_type: Specifies whether the model is for classification or regression.
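A metadata dictionary with these three fields might look like the sketch below. The column names and the `check_input_columns` helper are hypothetical and only show one way the metadata could be used to check a new dataset before inference.

```python
# Hypothetical metadata for a classification model (values are illustrative).
metadata = {
    "model_variables": ["age", "bmi", "systolic_bp"],
    "target_variable": "readmitted",
    "ml_type": "classification",
}


def check_input_columns(columns, metadata):
    """Return the model's features in training order, or raise if any are missing."""
    missing = [name for name in metadata["model_variables"] if name not in columns]
    if missing:
        raise ValueError(f"Missing model variables: {missing}")
    return list(metadata["model_variables"])


# Extra columns and a different column order are tolerated.
dataset_columns = ["patient_id", "bmi", "age", "systolic_bp", "readmitted"]
ordered_features = check_input_columns(dataset_columns, metadata)

# A dataset lacking a required feature is rejected.
try:
    check_input_columns(["age", "bmi"], metadata)
    missing_detected = False
except ValueError:
    missing_detected = True
```

Selecting columns in the order given by `model_variables` keeps the input layout identical to the one the pipeline saw during training.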
The following diagram summarizes the relationship between MEDMODEL components:
The following figure summarizes the creation process of a MEDMODEL object:
