Configuration File

The user communicates with the toolbox through a configuration file that lists the tunable system parameters. The file is written in YAML, a simple and concise language that maps easily onto native data structures. Because it is easy to read, it is accessible to developers and non-developers alike and makes it easier to track experiment changes over time.
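As a rough illustration of how YAML maps onto native data structures, the sketch below parses a small fragment with PyYAML. It assumes PyYAML is installed; the `features_selected` values are made up for the example and are not defaults of the toolbox.

```python
import yaml  # PyYAML, assumed to be installed (pip install pyyaml)

# A small fragment in the style of the configuration file.
# The features_selected values are illustrative only.
CONFIG_TEXT = """
name: 'AIDE'
from_scratch: True
save_path: "experiments/"
data:
    features_selected: [0, 1, 2]
"""

config = yaml.safe_load(CONFIG_TEXT)

# Mappings become dicts, sequences become lists, scalars become str/bool/int.
print(type(config).__name__)                # dict
print(config["from_scratch"])               # True
print(config["data"]["features_selected"])  # [0, 1, 2]
```

Using `yaml.safe_load` rather than `yaml.load` avoids constructing arbitrary Python objects from the file, which is usually preferable for configuration that may be shared between users.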

1. Defining the Task and Path

name: 'AIDE'
# Addressed task, choices: Classification, OutlierDetection, ImpactAssessment
task: ...
# Train from scratch (True) or use a previously saved model to skip the training phase (False)
from_scratch: ...
# Path to the best model, required if from_scratch: False
best_run_path: ''
# Directory to save model outputs and results
save_path: "experiments/"
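The constraints stated in the comments above (the allowed values of task, and best_run_path being required when from_scratch is False) can be checked before launching a run. The following is a minimal sketch that operates on an already parsed configuration dictionary; `check_task_section` is a hypothetical helper, not part of the toolbox.

```python
from typing import List

# Task choices as listed in the configuration comments.
VALID_TASKS = {"Classification", "OutlierDetection", "ImpactAssessment"}


def check_task_section(config: dict) -> List[str]:
    """Return a list of problems found in the task/path fields (hypothetical helper)."""
    problems = []
    if config.get("task") not in VALID_TASKS:
        problems.append(f"task must be one of {sorted(VALID_TASKS)}")
    # A saved model is only needed when training is skipped.
    if config.get("from_scratch") is False and not config.get("best_run_path"):
        problems.append("best_run_path is required when from_scratch is False")
    return problems


example = {"task": "Classification", "from_scratch": False, "best_run_path": ""}
print(check_task_section(example))
# ['best_run_path is required when from_scratch is False']
```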

2. Defining the Dataset

This section points to the Dataset class and can be customized by adding more variables. See Section Database for details on how to create your own Dataset class.

# Database and DataLoader definition
data:
    name: ... # Dataset class name
    data_dim: ...  # Data dimension
    input_size: ...  # Number of features
    features:  # Name of the features of the database
    features_selected: ...  # Features selected from the whole set of features
    num_classes: ...  # Number of categories in the database (e.g., drought, non-drought)
    lon_slice_test: ...  # If 2D visualization is enabled, min/max longitude coordinates (test)
    lat_slice_test: ...  # If 2D visualization is enabled, min/max latitude coordinates (test)

3. Defining the Model

To specify the architecture to train, use the parameter type. This can be a user-defined model or a model available in the toolbox (see Section Available models for more details).

# Architecture definition
arch:
    # Select a user-defined model (true/false)
    user_defined: ...
    # Type of architecture to be used (e.g., 'UNET')
    type: ...
    # Parameters to configure the architecture
    params:
        param_1: ...
    # Model input dimension (1: 1D, 2: 2D)
    input_model_dim: ...
    # Model output dimension (1: 1D, 2: 2D)
    output_model_dim: ...

4. Defining the Training

This part of the configuration file allows for specifying the parameters of:

loss function: Can be either custom or taken from a Python package. To use a loss from a Python package, set user_defined: False and specify the loss name and package (e.g., type: 'sigmoid_focal_loss' and package: 'torchvision.ops'). For custom losses, see Section Custom Loss in Advanced features.

optimizer: Defines the parameters used to initialize the optimizer. type can be any optimizer class in torch.optim.

trainer: Defines the parameters used to initialize the PyTorch Lightning trainer.

dataloader: Defines the number of workers for the PyTorch Lightning dataloader.

# Definition of the training stage
implementation:
    # Loss function
    loss:
        user_defined: ... # Select a user-defined loss (true/false)
        type: ...  # Python class name
        package: ...  # Python package, none for user defined
        activation:
            type: ... # Activation before computing the loss function
        masked: ...  # Use masks to compute loss
        # Parameters for the loss function
        params:
            reduction: 'none'
            param_1: ...
    # Definition of the optimizer
    optimizer:
        type: ...  # Optimizer type
        lr: ...  # Learning rate
        weight_decay: ...  # Weight decay
        gclip_value: ...  # Gradient clipping values
    # Definition of the PyTorch Lightning trainer
    trainer:
        accelerator: ...  # Choices: gpu/cpu
        devices: ...  # Devices to use, as accepted by the PyTorch Lightning trainer
        epochs: ...  # Number of epochs
        batch_size: ...  # Batch size
        monitor:  # Metric to be monitored during training
            split: ...  # Choices: train/val/test
            metric: ...  # Either loss or a metric's name to monitor for early stopping and checkpoints
        monitor_mode: ...  # Monitor mode (increase or decrease monitored metric value)
        early_stop: ...  # Number of steps to perform early stopping
    # Definition of PyTorch data loader
    data_loader:
        num_workers: ... # Number of CPUs to read the data in parallel
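One plausible way to turn a type and package pair into a Python object is a dynamic import followed by an attribute lookup. The sketch below shows that idea only; `resolve_from_package` and `resolve_loss` are hypothetical names, and the toolbox may resolve these entries differently. The demonstration uses the standard library so that it runs without PyTorch installed.

```python
import importlib
from typing import Any


def resolve_from_package(package: str, type_name: str) -> Any:
    """Import `package` and return its attribute `type_name` (hypothetical helper)."""
    module = importlib.import_module(package)
    return getattr(module, type_name)


def resolve_loss(loss_cfg: dict) -> Any:
    """Resolve a packaged loss from the `loss` entry of the configuration."""
    if loss_cfg.get("user_defined"):
        # Custom losses follow a separate mechanism (see Custom Loss).
        raise NotImplementedError("user-defined losses are not handled in this sketch")
    return resolve_from_package(loss_cfg["package"], loss_cfg["type"])


# Stand-in demonstration with a standard-library module:
sqrt = resolve_from_package("math", "sqrt")
print(sqrt(9.0))  # 3.0
```

With torchvision installed, `resolve_loss({"user_defined": False, "type": "sigmoid_focal_loss", "package": "torchvision.ops"})` would return the `sigmoid_focal_loss` function. The same lookup with package set to `torch.optim` would cover the optimizer type.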

5. Defining the Evaluation

The toolbox provides several modules for evaluation at inference: metrics, visualizations, characterization and XAI. The metrics module is always run, while the others can be (de)activated. For more details on the capabilities of each module, please refer to Section Evaluations.

# Types of chosen evaluations, choices: Visualization, Characterization, XAI
evaluation:
    metrics:
        Metric_1: {param_1: ...}  # Metric for evaluation, from torchmetrics. Metric_1 has to be the name of the metric as in torchmetrics docs

    visualization:
        activate: ... # Choices: True/False
        params:
            param_1: ...

    characterization:
        activate: ... # Choices: True/False
        params:
            param_1: ...

    xai:
        activate: ... # Choices: True/False
        params:
            param_1: ...
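The activation rule described above (metrics always run, the other modules run only when activate is True) can be sketched as a small function over the parsed evaluation section. `active_evaluations` is a hypothetical helper and the example configuration values are illustrative.

```python
from typing import List

# Modules that can be switched on or off through their `activate` flag.
OPTIONAL_MODULES = ("visualization", "characterization", "xai")


def active_evaluations(evaluation_cfg: dict) -> List[str]:
    """Return the evaluation modules that would run (hypothetical helper)."""
    active = ["metrics"]  # the metrics module is always run
    for name in OPTIONAL_MODULES:
        module_cfg = evaluation_cfg.get(name) or {}
        if module_cfg.get("activate") is True:
            active.append(name)
    return active


example = {
    "metrics": {"Accuracy": {}},  # illustrative metric entry
    "visualization": {"activate": True, "params": {}},
    "characterization": {"activate": False, "params": {}},
    "xai": {"activate": False, "params": {}},
}
print(active_evaluations(example))
# ['metrics', 'visualization']
```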