Configuration File

The user communicates with the toolbox through a configuration file that lists the tunable system parameters. The file is written in YAML, a simple and concise language that maps easily onto native data structures. Because it is easy to read, it is accessible to developers and non-developers alike, and it makes changes between experiments easy to track over time.

1. Defining the Task and Path

name: 'AIDE'
# Addressed task, choices: Classification, OutlierDetection, ImpactAssessment
task: ...
# Train from scratch (True) or load a previously saved model and skip the training phase (False)
from_scratch: ...
# Path to the best model, required if from_scratch: False
best_run_path: ''
# Directory to save model outputs and results
save_path: "experiments/"
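The comments above imply one consistency rule: a saved model path is needed whenever from_scratch is False. A minimal sketch of how such a rule could be checked before any training starts is shown below. It assumes the YAML file has already been parsed into a plain dictionary (for example with PyYAML's yaml.safe_load); the function name check_task_section is hypothetical and not part of the toolbox.

```python
# Allowed values for the `task` key, as listed in the configuration comments.
VALID_TASKS = {"Classification", "OutlierDetection", "ImpactAssessment"}


def check_task_section(cfg: dict) -> list[str]:
    """Return a list of problems found in the top-level task/path keys.

    Hypothetical helper: `cfg` is assumed to be the dict obtained by
    parsing the YAML configuration file.
    """
    problems = []
    if cfg.get("task") not in VALID_TASKS:
        problems.append(f"unknown task: {cfg.get('task')!r}")
    # Training is skipped only when from_scratch is False, and in that case
    # a previously saved model must be provided.
    if cfg.get("from_scratch") is False and not cfg.get("best_run_path"):
        problems.append("best_run_path is required when from_scratch is False")
    return problems


config = {
    "name": "AIDE",
    "task": "Classification",
    "from_scratch": False,
    "best_run_path": "",
    "save_path": "experiments/",
}
print(check_task_section(config))
# ['best_run_path is required when from_scratch is False']
```

Running such a check right after loading the file surfaces configuration mistakes before any data is read or any model is built.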

2. Defining the Dataset

This section points to the Dataset class and can be customized by adding more variables. See Section Database for details on how to create your own Dataset class.

# Database and DataLoader definition
data:
    name: ... # Dataset class name
    data_dim: ...  # Data dimension
    input_size: ...  # Number of features
    features: ...  # Names of the features in the database
    features_selected: ...  # Features selected from the whole set of features
    num_classes: ...  # Number of categories in the database (e.g., drought and non-drought)
    lon_slice_test: ...  # If 2D visualization is enabled, min/max longitude coordinates (test)
    lat_slice_test: ...  # If 2D visualization is enabled, min/max latitude coordinates (test)
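Several keys in the data section depend on each other. The sketch below shows one way to validate them and to turn the test coordinate ranges into slice objects, as one would pass to xarray's .sel(). It assumes that input_size equals the number of selected features and that the lon/lat values are [min, max] pairs; the helper prepare_data_section, the class name MyDroughtDataset, and the feature names are all hypothetical.

```python
def prepare_data_section(data_cfg: dict) -> dict:
    """Validate the `data` section and derive convenience values.

    Hypothetical helper; assumes input_size == len(features_selected)
    and that the lon/lat test slices are [min, max] pairs.
    """
    selected = data_cfg["features_selected"]
    unknown = [f for f in selected if f not in data_cfg["features"]]
    if unknown:
        raise ValueError(f"selected features not in the database: {unknown}")
    if data_cfg["input_size"] != len(selected):
        raise ValueError("input_size does not match the number of selected features")

    derived = {}
    # Turn [min, max] pairs into slice objects (e.g. for xarray's .sel()).
    for key in ("lon_slice_test", "lat_slice_test"):
        if data_cfg.get(key) is not None:
            lo, hi = data_cfg[key]
            derived[key] = slice(lo, hi)
    return derived


data = {
    "name": "MyDroughtDataset",          # hypothetical Dataset class name
    "data_dim": 2,
    "input_size": 2,
    "features": ["t2m", "tp", "swvl1"],  # hypothetical feature names
    "features_selected": ["t2m", "tp"],
    "num_classes": 2,
    "lon_slice_test": [-10.0, 5.0],
    "lat_slice_test": [35.0, 45.0],
}
print(prepare_data_section(data)["lon_slice_test"])  # slice(-10.0, 5.0, None)
```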

3. Defining the Model

Use the type parameter to specify the architecture to train. It can be a user-defined model or one of the models available in the toolbox (see Section Available models for more details).

# Architecture definition
arch:
    # Select a user-defined model (true/false)
    user_defined: ...
    # Type of architecture to be used (e.g., 'UNET')
    type: ...
    # Parameters to configure the architecture
    params:
        param_1: ...
    # Model input dimension (1: 1D, 2: 2D)
    input_model_dim: ...
    # Model output dimension (1: 1D, 2: 2D)
    output_model_dim: ...
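One plausible way to turn the arch section into a model object is to look type up in a registry and pass params as keyword arguments. The sketch below assumes exactly that. The registries TOOLBOX_MODELS and USER_MODELS, the stand-in classes, and the function build_model are hypothetical, and the toolbox's actual lookup mechanism may differ.

```python
class UNET:
    """Stand-in for a toolbox model (hypothetical)."""
    def __init__(self, **params):
        self.params = params


class MyModel:
    """Stand-in for a user-defined model (hypothetical)."""
    def __init__(self, **params):
        self.params = params


# Hypothetical registries mapping the `type` string to a class.
TOOLBOX_MODELS = {"UNET": UNET}
USER_MODELS = {"MyModel": MyModel}


def build_model(arch_cfg: dict):
    """Instantiate the architecture described by the `arch` section."""
    registry = USER_MODELS if arch_cfg.get("user_defined") else TOOLBOX_MODELS
    try:
        model_cls = registry[arch_cfg["type"]]
    except KeyError:
        raise ValueError(f"unknown architecture: {arch_cfg['type']!r}") from None
    # Input and output dimensions are restricted to 1 (1D) or 2 (2D).
    for key in ("input_model_dim", "output_model_dim"):
        if arch_cfg[key] not in (1, 2):
            raise ValueError(f"{key} must be 1 or 2")
    # Every entry under `params` becomes a keyword argument of the constructor.
    return model_cls(**arch_cfg.get("params", {}))


model = build_model({
    "user_defined": False,
    "type": "UNET",
    "params": {"param_1": 3},
    "input_model_dim": 2,
    "output_model_dim": 2,
})
print(type(model).__name__, model.params)  # UNET {'param_1': 3}
```

Passing params through as keyword arguments keeps the configuration file and the model constructor in one-to-one correspondence, so a new architecture parameter needs no extra parsing code.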

4. Defining the Training

This part of the configuration file specifies the parameters of:

  • loss function: Can be either custom or taken from a Python package. To use a loss from a Python package, set user_defined: False and specify the loss name and package (e.g., type: 'sigmoid_focal_loss' and package: 'torchvision.ops'). For custom losses, see Section Custom Loss in Advanced features.

  • optimizer: Defines the parameters used to initialize the optimizer. type can be any optimizer available in torch.optim (e.g., 'Adam').

  • trainer: Defines the parameters used to initialize the PyTorch Lightning trainer.

  • data_loader: Defines the number of workers for the PyTorch data loader.
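Selecting a loss by type and package amounts to importing the package and fetching the named attribute. The sketch below shows this with the standard library's importlib. The helper name resolve_from_package is hypothetical and not necessarily how the toolbox does it; the same lookup would also work for the optimizer with package 'torch.optim'.

```python
import importlib


def resolve_from_package(package: str, name: str):
    """Return the attribute `name` from the importable module `package`.

    Hypothetical helper illustrating how `type` and `package` could be
    combined when user_defined is False.
    """
    module = importlib.import_module(package)
    try:
        return getattr(module, name)
    except AttributeError:
        raise ValueError(f"{package!r} has no attribute {name!r}") from None


# With torchvision installed, this returns torchvision.ops.sigmoid_focal_loss:
#   loss_fn = resolve_from_package("torchvision.ops", "sigmoid_focal_loss")
# With PyTorch installed, this returns the torch.optim.Adam class:
#   optim_cls = resolve_from_package("torch.optim", "Adam")

# Standard-library example that runs anywhere:
sqrt = resolve_from_package("math", "sqrt")
print(sqrt(16.0))  # 4.0
```

The remaining entries under params (such as reduction: 'none') would then be passed as keyword arguments when the loss is called or instantiated.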

# Definition of the training stage
implementation:
    # Loss function
    loss:
        user_defined: ... # Select a user-defined loss (true/false)
        type: ...  # Python class or function name
        package: ...  # Python package, none for user-defined losses
        activation:
            type: ... # Activation before computing the loss function
        masked: ...  # Use masks to compute loss
        # Parameters for the loss function
        params:
            reduction: 'none'
            param_1: ...
    # Definition of the optimizer
    optimizer:
        type: ...  # Optimizer type
        lr: ...  # Learning rate
        weight_decay: ...  # Weight decay
        gclip_value: ...  # Gradient clipping value
    # Definition of the PyTorch Lightning trainer
    trainer:
        accelerator: ...  # Choices: gpu/cpu
        devices: ...  # Number (or list) of devices to use
        epochs: ...  # Number of epochs
        batch_size: ...  # Batch size
        monitor:  # Metric to be monitored during training
            split: ...  # Choices: train/val/test
            metric: ...  # Either loss or a metric's name to monitor for early stopping and checkpoints
        monitor_mode: ...  # Whether the monitored metric should increase or decrease
        early_stop: ...  # Number of steps without improvement before early stopping
    # Definition of PyTorch data loader
    data_loader:
        num_workers: ... # Number of worker processes that load data in parallel
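The trainer and optimizer keys correspond closely to arguments of PyTorch Lightning's Trainer (accelerator, devices, max_epochs, gradient_clip_val) and of its EarlyStopping callback (monitor, mode, patience). The sketch below shows one possible mapping into plain keyword dictionaries. The mapping itself is an assumption, not taken from the toolbox: it assumes the monitored quantity is logged as "<split>_<metric>" and that monitor_mode is written as "increase" or "decrease". The function name lightning_kwargs is hypothetical.

```python
def lightning_kwargs(impl_cfg: dict) -> tuple[dict, dict]:
    """Split the `implementation` section into Trainer and EarlyStopping kwargs.

    Assumptions (hypothetical): metrics are logged as "<split>_<metric>",
    and monitor_mode is either "increase" or "decrease".
    """
    trainer = impl_cfg["trainer"]
    monitor = f"{trainer['monitor']['split']}_{trainer['monitor']['metric']}"
    trainer_kwargs = {
        "accelerator": trainer["accelerator"],
        "devices": trainer["devices"],
        "max_epochs": trainer["epochs"],
        "gradient_clip_val": impl_cfg["optimizer"].get("gclip_value"),
    }
    early_stopping_kwargs = {
        "monitor": monitor,
        # Lightning expects "max" when higher is better and "min" otherwise.
        "mode": {"increase": "max", "decrease": "min"}[trainer["monitor_mode"]],
        "patience": trainer["early_stop"],
    }
    return trainer_kwargs, early_stopping_kwargs


implementation = {
    "optimizer": {"type": "Adam", "lr": 1e-3, "weight_decay": 0.0, "gclip_value": 1.0},
    "trainer": {
        "accelerator": "gpu",
        "devices": 1,
        "epochs": 50,
        "batch_size": 32,
        "monitor": {"split": "val", "metric": "loss"},
        "monitor_mode": "decrease",
        "early_stop": 10,
    },
    "data_loader": {"num_workers": 4},
}
trainer_kwargs, es_kwargs = lightning_kwargs(implementation)
print(es_kwargs)  # {'monitor': 'val_loss', 'mode': 'min', 'patience': 10}
```

In this sketch, batch_size and num_workers are not Trainer arguments: they belong to torch.utils.data.DataLoader. The two dictionaries would be used roughly as Trainer(**trainer_kwargs, callbacks=[EarlyStopping(**es_kwargs)]).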

5. Defining the Evaluation

The toolbox provides several modules for evaluation at inference: metrics, visualization, characterization, and XAI. The metrics module always runs, while the others can be activated or deactivated. For details on the capabilities of each module, refer to Section Evaluations.

# Types of chosen evaluations, choices: Visualization, Characterization, XAI
evaluation:
    metrics:
        Metric_1: {param_1: ...}  # Evaluation metric from torchmetrics; Metric_1 must match the metric's name in the torchmetrics documentation

    visualization:
        activate: ... # Choices: True/False
        params:
            param_1: ...

    characterization:
        activate: ... # Choices: True/False
        params:
            param_1: ...

    xai:
        activate: ... # Choices: True/False
        params:
            param_1: ...
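The rule described for this section (metrics always run, the other modules only when activate is True) can be sketched as a small filter over the parsed evaluation dictionary. The function modules_to_run and the fixed execution order are assumptions made for illustration. The metric entry F1Score with task: 'binary' is a real torchmetrics class and argument, used here only as sample data.

```python
# Optional modules, in the order this sketch runs them (an assumption).
OPTIONAL_MODULES = ("visualization", "characterization", "xai")


def modules_to_run(eval_cfg: dict) -> list[str]:
    """List the evaluation modules to execute.

    Metrics are always included; each optional module is included only
    when its `activate` flag is True.
    """
    selected = ["metrics"]
    for name in OPTIONAL_MODULES:
        module_cfg = eval_cfg.get(name) or {}
        if module_cfg.get("activate") is True:
            selected.append(name)
    return selected


evaluation = {
    "metrics": {"F1Score": {"task": "binary"}},
    "visualization": {"activate": True, "params": {}},
    "characterization": {"activate": False, "params": {}},
    "xai": {"activate": True, "params": {}},
}
print(modules_to_run(evaluation))  # ['metrics', 'visualization', 'xai']
```

A module whose section is missing entirely is treated as deactivated, so a minimal configuration only needs the metrics block.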