Code Editor & MEDomics Terminal
This page covers the first steps of our proof of concept: creating, editing, and running the script that generates the homr_any_visit.csv dataset, which is used in the following steps.
Create your Workspace
Create the Extraction File
import pandas as pd
# Reproducibility seed
SEED = 54288
# Patient ID to extract
PATIENT_ID = 16
# Change this to your dataset path before running the script
path = "dataset.csv"
# 1) Load data
df = pd.read_csv(path)
# 2) Build homr_any_visit: exactly one visit selected randomly per patient
homr_any_visit = (
    df.groupby("patient_id", group_keys=False)
    .sample(n=1, random_state=SEED)
    .reset_index(drop=True)
)
# 3) Extract the selected patient from homr_any_visit
patient_df = homr_any_visit[homr_any_visit["patient_id"] == PATIENT_ID]
patient_df.to_csv(f"patient_{PATIENT_ID}.csv", index=False)
# 4) Remove this patient from homr_any_visit
homr_any_visit = homr_any_visit[homr_any_visit["patient_id"] != PATIENT_ID]
# 5) Save final homr_any_visit dataset
homr_any_visit.to_csv("homr_any_visit.csv", index=False)
# 5bis) Extract 1/10 of the final dataset
homr_any_visit_10pct = homr_any_visit.sample(frac=0.1, random_state=SEED)
homr_any_visit_10pct.to_csv("homr_any_visit_10pct.csv", index=False)
# 6) Log summary
print(
    f"homr_any_visit.csv saved with "
    f"{homr_any_visit['patient_id'].nunique()} unique patients "
    f"(rows={len(homr_any_visit)}), seed={SEED}\n"
    f"Patient {PATIENT_ID} extracted to patient_{PATIENT_ID}.csv\n"
    f"10% subset saved to homr_any_visit_10pct.csv "
    f"(rows={len(homr_any_visit_10pct)})"
)
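Before running the script on real data, it can help to check the sampling step on a small synthetic table. The sketch below is illustrative only: the toy DataFrame, the `visit_id` column, and the helper name `one_visit_per_patient` are assumptions and are not part of the actual dataset or script. It applies the same `groupby(...).sample(n=1, random_state=SEED)` pattern and checks that each patient appears exactly once and that the result is reproducible for a fixed seed.

```python
import pandas as pd

SEED = 54288

# Hypothetical toy data: three patients with one or more visits each.
# "visit_id" is an illustrative column, not necessarily present in the real dataset.
toy = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 2, 2, 3],
        "visit_id": [10, 11, 20, 21, 22, 30],
    }
)


def one_visit_per_patient(df: pd.DataFrame, seed: int) -> pd.DataFrame:
    """Select exactly one randomly chosen row per patient_id."""
    return (
        df.groupby("patient_id", group_keys=False)
        .sample(n=1, random_state=seed)
        .reset_index(drop=True)
    )


first = one_visit_per_patient(toy, SEED)
second = one_visit_per_patient(toy, SEED)

# Each patient appears exactly once in the sampled table.
assert first["patient_id"].is_unique
assert first["patient_id"].nunique() == toy["patient_id"].nunique()

# The same seed gives the same selection on the same input.
assert first.equals(second)

print(first)
```

If these assertions pass on the toy data, the same checks (`is_unique` on `patient_id`, and equality across two runs) can be applied to `homr_any_visit` itself before saving it.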
Visualize and Edit the Extraction File


Run the Extraction File

