Analysis

This page explains the Analysis Box functionality, including the Analysis Mode, and how it enables comprehensive model performance evaluation through detailed metrics and diagnostic tools.

The Analysis Box

The Analysis Box (Figure below) is the final component in the learning module pipeline, positioned immediately after the Training Box. It serves as the dedicated component for model evaluation, accepting inputs from:

Key Characteristics:

  1. Node-Free Design: Unlike other boxes, this is a preconfigured analysis terminal that cannot contain additional nodes.

  2. PyCaret Integration: Implements plot_model() with the following parameter controls:

    • Plot Metric (plot parameter):

      • Sets the evaluation visualization type (default: 'auc')

      • Options include: confusion matrix, feature importance, ROC curve, etc.

    • Scale (scale parameter):

      • Adjusts output figure resolution (range: 0-1)

      • Higher values increase image quality and file size

The analysis box
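The two controls described above map onto keyword arguments of PyCaret's plot_model(). The sketch below is a minimal illustration under stated assumptions: the helper names build_plot_kwargs and run_analysis are hypothetical (not MEDomics code), SUPPORTED_PLOTS lists only a few PyCaret classification plot identifiers, and the scale check treats the documented 0-1 range as excluding 0. PyCaret is imported lazily so the validation logic can be used without it.

```python
# Hypothetical helpers mirroring the Analysis Box controls; not MEDomics source.

# A few PyCaret classification plot identifiers (not an exhaustive list).
SUPPORTED_PLOTS = {
    "auc": "ROC curve (area under the curve)",
    "confusion_matrix": "Confusion matrix",
    "feature": "Feature importance",
    "pr": "Precision-recall curve",
}


def build_plot_kwargs(plot: str = "auc", scale: float = 1.0) -> dict:
    """Validate the box settings and map them to plot_model() keyword arguments."""
    if plot not in SUPPORTED_PLOTS:
        raise ValueError(f"unsupported plot type: {plot!r}")
    # Assumption: the documented 0-1 range excludes 0 (a zero-size figure).
    if not 0 < scale <= 1:
        raise ValueError("scale must be in the range (0, 1]")
    return {"plot": plot, "scale": scale}


def run_analysis(model, plot: str = "auc", scale: float = 1.0):
    """Render the selected plot for a trained model.

    Requires PyCaret and a prior setup() call in the same session.
    """
    from pycaret.classification import plot_model  # lazy import

    kwargs = build_plot_kwargs(plot, scale)
    # save=True writes the figure to the working directory instead of only displaying it.
    return plot_model(model, save=True, **kwargs)
```

Validating before calling PyCaret keeps configuration errors out of a long-running experiment.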

The Analysis Box represents the "Analysis" section of the machine learning workflow:

The Analysis Mode

If you prefer a quick summary, jump to the Summary of the Analysis Mode section.

The Analysis Mode becomes available after a successful experiment execution. When activated, a results panel appears at the bottom of the interface, displaying results for all pipelines in the current scene. This mode provides a detailed breakdown of results organized by pipeline and node.

Pipeline Results Structure:

Each pipeline, identified by its customizable name, presents its results node by node:

Results for Dataset node
  • Clean Node: Shows the preprocessing parameters configured in PyCaret's setup.

Results for Clean node
  • Split Node: Presents detailed split statistics, including sample counts per fold/iteration and class distribution metrics.

Results for Split node
  • Model Node: Contains the complete set of performance metrics for the model.

Results for Model node
  • Combine Models Node: Provides evaluation metrics for the combined model output.

Results for Combine Models node
  • Analysis Node: Displays the plot selected in the Analysis Box.

Results for Analysis node

Finalize & Save Model

The 'Finalize & Save Model' button, available for each selected pipeline, performs two functions through PyCaret:

  1. Model Finalization: Retrains the selected model on the complete dataset using PyCaret's finalize_model() function.

  2. Model Saving: Saves the finalized model as a pickle file via PyCaret's save_model() function. The file is stored in the experiment's models subfolder and is named after the model's class name, or after the Model node's ID if that ID has been changed from the default ('Model').

How to finalize and save a model in a given pipeline

The process requires no parameter configuration: all training parameters from the original experiment are preserved automatically.
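The finalize-then-save sequence and the file-naming rule can be sketched in Python. This is a minimal sketch assuming a PyCaret classification experiment; saved_model_name and finalize_and_save are hypothetical helpers that only mirror the behaviour described above, and the experiment_dir argument is an assumed layout. finalize_model() and save_model() are real PyCaret functions, imported lazily.

```python
import os

DEFAULT_NODE_ID = "Model"  # default Model node ID, as described above


def saved_model_name(model, node_id: str = DEFAULT_NODE_ID) -> str:
    """Use the Model node's ID if it was renamed, otherwise the model's class name."""
    if node_id != DEFAULT_NODE_ID:
        return node_id
    return type(model).__name__


def finalize_and_save(model, experiment_dir: str, node_id: str = DEFAULT_NODE_ID) -> str:
    """Retrain on the full dataset and pickle the result into <experiment_dir>/models."""
    from pycaret.classification import finalize_model, save_model  # lazy import

    final_model = finalize_model(model)  # refit on the entire dataset, hold-out included
    models_dir = os.path.join(experiment_dir, "models")
    os.makedirs(models_dir, exist_ok=True)
    path = os.path.join(models_dir, saved_model_name(model, node_id))
    save_model(final_model, path)  # PyCaret appends the ".pkl" extension
    return path + ".pkl"
```

A saved model can later be reloaded with PyCaret's load_model(), passing the path without the ".pkl" extension.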

Additionally, this button represents the "Final Model" section of the machine learning workflow, as shown in the following figure:

The Generate Feature

The Generate feature exports the complete pipeline configuration as executable Python code in Jupyter Notebook format. To generate a notebook, click the Generate button next to a selected pipeline. The generated notebook mirrors the pipeline's structure and parameters and is saved in the experiment's notebooks subfolder, using the pipeline's current name as its filename.

How to generate a Notebook for a given pipeline

This feature enables:

  • Deeper investigation of the training process

  • Custom code modifications for performance optimization

  • Enhanced reproducibility
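A Jupyter notebook is a JSON file in the nbformat v4 layout, so the export step can be sketched with the standard library alone. This is a minimal sketch, assuming the notebook is written to <experiment>/notebooks/<pipeline name>.ipynb; write_pipeline_notebook and code_cell are hypothetical helpers, and the cell contents you pass in stand in for whatever code the Generate button actually emits.

```python
import json
import os


def code_cell(source: str) -> dict:
    """Build an nbformat v4 code cell from a source string."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source.splitlines(keepends=True),
    }


def write_pipeline_notebook(experiment_dir: str, pipeline_name: str, cells: list) -> str:
    """Write the cells to <experiment_dir>/notebooks/<pipeline_name>.ipynb and return the path."""
    notebook = {
        "cells": [code_cell(src) for src in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require cell IDs
    }
    notebooks_dir = os.path.join(experiment_dir, "notebooks")
    os.makedirs(notebooks_dir, exist_ok=True)
    path = os.path.join(notebooks_dir, f"{pipeline_name}.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
    return path
```

Because the filename comes from the pipeline name, renaming a pipeline changes where its next notebook is written.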

You can also open any generated notebook directly from the application by double-clicking the file. Alternatively, right-click the file and select "Open in..." to open the notebook in VS Code.

Different options to open a generated Jupyter Notebook

An example of a generated notebook, opened in VS Code, is shown below.

Example of a generated Jupyter Notebook

Manage Pipelines

The Manage Pipelines interface serves two primary purposes:

  1. Pipeline Overview: Displays a structured summary of all nodes comprising each pipeline and their connections.

  2. Naming Control: Allows pipeline renaming, which simultaneously updates:

    • The notebook filename in the Generate feature

    • All experiment tracking references

How to access and use the Manage Pipelines panel
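The Pipeline Overview lists each pipeline as a chain of connected nodes. One way to derive such a list is sketched below. It assumes, without confirmation from the application's source, that a scene is a directed acyclic graph and that each pipeline is one path from a start node (no incoming edges) to an end node (no outgoing edges), enumerated with a depth-first search. The function and node names are hypothetical.

```python
from collections import defaultdict


def enumerate_pipelines(edges):
    """Return every start-to-end path in the scene graph using depth-first search."""
    children = defaultdict(list)
    nodes, has_parent = set(), set()
    for src, dst in edges:
        children[src].append(dst)
        nodes.update((src, dst))
        has_parent.add(dst)

    pipelines = []

    def walk(node, path):
        path = path + [node]  # copy so sibling branches do not share state
        if not children[node]:  # reached an end node
            pipelines.append(path)
            return
        for child in children[node]:
            walk(child, path)

    for start in sorted(nodes - has_parent):
        walk(start, [])
    return pipelines
```

For example, a Dataset node feeding two Clean nodes that both connect to the same Model node yields two pipelines.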

The node's selection box

In both Analysis and Results modes, a checkbox is available at the top of each runnable node. Use this control to selectively display results for specific nodes, hiding the output of others. A green checkbox indicates that the node is a mandatory component of all pipelines; consequently, its results will always be displayed.

Selection box
Independent selection box

In the following example, only the results of the checked node Clean2 are displayed, while the other pipelines are hidden.

Illustration of the node's checkbox usage
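The filtering behaviour in this example can be expressed as a small rule. The sketch below assumes that a pipeline stays visible only when every node on it is checked, and that mandatory (green) nodes always count as checked. The documentation does not state the exact rule, so visible_pipelines and its semantics are hypothetical.

```python
def visible_pipelines(pipelines, checked, mandatory=()):
    """Keep the pipelines whose nodes are all checked; mandatory nodes always count as checked."""
    selected = set(checked) | set(mandatory)
    return [p for p in pipelines if selected.issuperset(p)]
```

With Clean1 unchecked and Clean2 checked, only the pipeline running through Clean2 remains, matching the example above.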

The highlighting feature

This feature enhances navigation in both Analysis and Results modes by dynamically applying distinct color codes to selected nodes and pipelines. It highlights the entire execution path of a chosen pipeline, making it easy to distinguish from others. The system uses the following color scheme to indicate status:

  • Orange: Used for non-executed nodes and the connecting edges of a non-executed pipeline.

  • Green: Indicates a selected and successfully executed node.

  • Blue: Highlights all elements (nodes and edges) of the currently selected pipeline.

This functionality is particularly valuable in complex scenes with multiple pipelines, as it simplifies the process of tracking and comparing results. The following figure illustrates these color codes in the context of different user interactions.

Illustration of the highlight feature for different interactions
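The color scheme can be summarized as a single decision function. This is a minimal sketch: node_color is hypothetical, and the precedence (orange, then green, then blue) and the None fallback for default styling are assumptions, since the documentation does not say how overlapping states are resolved.

```python
from typing import Optional


def node_color(executed: bool, selected: bool, in_selected_pipeline: bool) -> Optional[str]:
    """Return the highlight color for a node, or None for default styling."""
    if not executed:
        return "orange"  # non-executed node or edge
    if selected:
        return "green"  # selected node that ran successfully
    if in_selected_pipeline:
        return "blue"  # any element of the currently selected pipeline
    return None  # assumed: no highlight
```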

Summary of the Analysis Mode

A full breakdown of the Analysis Mode is presented in the following figure:

Breakdown of the Analysis Mode

On the next page, you will learn more about the new scene type 'Experimental' and how you can use it as a testing environment for your machine learning experiments.
