Input Module

This page documents the Input Module step of the demo, where we perform the last processing steps on our PARIS data before using it in the Learning Module.

The Input Module provides several key data processing tools needed for various tasks within the MEDomics platform. In this Proof of Concept (PoC), we will use it for two main tasks: deleting highly associated features and creating a holdout set.

Columns deletion

As we saw in the previous step, several variables in our data are highly associated and must be removed. To do so, we will use the Drop Columns tool, which deletes multiple columns at once. First, open the Input Module and select your target CSV (PARIS_ML.csv), then scroll down to the Drop Columns tool. Next, select the following columns to be deleted:

ActivitiesPain7
DiscussionHealthcareProfessionals
RentMortgage12
HealthcareInvolvement
HealthcareConsideration
ComplexityHealthIssues

Once selected, choose a new name for the final set, then hit Create new dataset. All these steps are laid out in the accompanying figure.
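For readers who want to see what this step amounts to outside the graphical interface, here is a minimal sketch in plain Python. It is not the platform's implementation: the helper `drop_columns` and the sample columns `PatientID` and `Target` are hypothetical, and the sample values are made up purely for illustration.

```python
import csv
import io

# Columns listed in this step of the demo.
COLUMNS_TO_DROP = [
    "ActivitiesPain7",
    "DiscussionHealthcareProfessionals",
    "RentMortgage12",
    "HealthcareInvolvement",
    "HealthcareConsideration",
    "ComplexityHealthIssues",
]


def drop_columns(rows, fieldnames, to_drop):
    """Return (new_rows, kept_fieldnames) without the columns in to_drop.

    Hypothetical helper: names in to_drop that are absent from the data
    are simply ignored.
    """
    drop = set(to_drop)
    kept = [name for name in fieldnames if name not in drop]
    new_rows = [{name: row[name] for name in kept} for row in rows]
    return new_rows, kept


# Tiny made-up sample standing in for PARIS_ML.csv (illustrative values only).
sample_csv = "PatientID,ActivitiesPain7,RentMortgage12,Target\n1,3,0,1\n2,5,1,0\n"
reader = csv.DictReader(io.StringIO(sample_csv))
data = list(reader)
cleaned_rows, kept_columns = drop_columns(data, reader.fieldnames, COLUMNS_TO_DROP)
print(kept_columns)
```

The original rows are left untouched and a new list is returned, which mirrors the tool's behavior of saving the result as a new dataset under a new name.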

Holdout set creation

After cleaning our dataset, the final step is to split our data into a learning set and a holdout set. For this task, we will use the Holdout Set Creation tool. After selecting our final CSV (PARIS_FINAL.csv), keep the split percentage at 20%, select "drop" as the cleaning method for empty cells (feel free to test other options), and enter PARIS_ML as the new CSV name. Then hit the plus icon. This will create two new CSV datasets: Holdout_PARIS_ML.csv and Learning_PARIS_ML.csv. These steps are illustrated in the accompanying figure.
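The idea behind this step can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the tool's actual logic: it assumes that "drop" removes every row containing an empty cell and that the split is a simple random one without stratification. The function `split_holdout`, its `seed` parameter, and the sample rows are all hypothetical.

```python
import random


def split_holdout(rows, holdout_fraction=0.2, seed=42):
    """Drop rows with empty cells, then split into (learning, holdout).

    Assumptions: "drop" cleaning means discarding incomplete rows, and the
    split is a plain shuffled split with a fixed seed for reproducibility.
    """
    # Keep only rows in which every cell has a value.
    complete = [r for r in rows if all(v not in ("", None) for v in r.values())]
    shuffled = complete[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Ten complete made-up rows plus one incomplete row.
rows = [{"id": str(i), "target": str(i % 2)} for i in range(10)]
rows.append({"id": "10", "target": ""})  # removed by the "drop" cleaning
learning, holdout = split_holdout(rows)
print(len(learning), len(holdout))  # 8 2
```

With a 20% split, the ten complete rows yield eight learning rows and two holdout rows; the incomplete row appears in neither set.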

With the creation of the Holdout and the Learning sets, we conclude our Input Module step, and we can now start the machine learning phase.

Extra: Other use cases

Another key tool you should try before the machine learning step is the subset creation tool. This tool filters a dataset according to conditions you define. For example, it can be used to remove rows where the machine learning target variable is null or undefined.

After that, you can overwrite the current dataset or create a new filtered one under a new name.
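As a rough illustration of the filtering example above, the following sketch removes rows whose target value is missing. The function `keep_rows_with_target`, the set of markers treated as missing, and the column name `Target` are assumptions made for this example and may differ from what the subset creation tool does.

```python
# Markers treated as "missing" in this sketch (an assumption, adjust as needed).
MISSING_MARKERS = {"", "na", "nan", "null", "none"}


def keep_rows_with_target(rows, target):
    """Return only the rows whose target column holds a usable value."""
    kept = []
    for row in rows:
        value = row.get(target)
        if value is None:
            continue  # column absent or explicitly None
        if str(value).strip().lower() in MISSING_MARKERS:
            continue  # empty or placeholder value
        kept.append(row)
    return kept


# Made-up rows; "Target" is a hypothetical name for the target variable.
sample = [
    {"PatientID": "1", "Target": "1"},
    {"PatientID": "2", "Target": ""},
    {"PatientID": "3", "Target": "NaN"},
    {"PatientID": "4", "Target": "0"},
]
filtered = keep_rows_with_target(sample, "Target")
print([row["PatientID"] for row in filtered])  # ['1', '4']
```

Note that a target value of "0" is kept: it is a valid class label, not a missing value.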

This concludes the third step of this PoC. Now our data is ready to tackle the machine learning prediction problem!
