databases package

Submodules

databases.DROUGHT_database module

databases.DroughtED_database module

class databases.DroughtED_database.DroughtED(config, period='train')

Bases: Dataset

This class is used to load the DroughtED dataset using PyTorch’s Dataset class.

Parameters:
  • config (dict) – The configuration dictionary

  • period (str, optional) – The period of the dataset to load, defaults to ‘train’

  • window_size (int, optional) – The window size of the dataset, defaults to 12

  • features_selected (list, optional) – The features to be used, defaults to [‘NDVI’, ‘precipitation’, ‘temperature’]

  • num_classes (int, optional) – The number of classes, defaults to 6

  • train_slice (dict, optional) – The time period of the training set, defaults to {‘start’: 1982, ‘end’: 2006}

  • val_slice (dict, optional) – The time period of the validation set, defaults to {‘start’: 2007, ‘end’: 2011}

  • test_slice (dict, optional) – The time period of the test set, defaults to {‘start’: 2012, ‘end’: 2016}

  • id2class (dict, optional) – The dictionary mapping class names to class IDs, defaults to {‘None’: 0, ‘D0’: 1, ‘D1’: 2, ‘D2’: 3, ‘D3’: 4, ‘D4’: 5}

  • weights (list, optional) – The weights of each class, defaults to [100-ones_percentage, ones_percentage]

  • dfs (pandas.core.frame.DataFrame, optional) – The dataframe of the dataset, defaults to None

  • X (numpy.ndarray, optional) – The input data, defaults to None

  • y (numpy.ndarray, optional) – The output data, defaults to None

  • class_bound (int, optional) – The boundary of the classes, defaults to 2

  • start (datetime, optional) – The start date of the dataset, defaults to None

  • end (datetime, optional) – The end date of the dataset, defaults to None

  • dates (pandas.core.indexes.datetimes.DatetimeIndex, optional) – The dates of the dataset, defaults to None

  • ones_percentage (float, optional) – The percentage of the positive class in the binarise function, defaults to None

binarize_data()

Binarises the data using the threshold set by the class_bound parameter.

Parameters:

class_bound (int, optional) – The boundary of the classes, defaults to 2

Returns:

The binarised data

Return type:

pandas.core.frame.DataFrame
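A minimal sketch of the thresholding described above. It assumes that labels greater than or equal to class_bound form the positive class (the source does not say whether the comparison is inclusive) and that ones_percentage is the share of positive labels in percent. The name binarize_labels is hypothetical, and the sketch works on a NumPy array rather than on the class's dataframe.

```python
import numpy as np


def binarize_labels(labels, class_bound=2):
    """Map multi-class drought labels to 0/1 around class_bound.

    Assumption: labels >= class_bound count as the positive class.
    """
    labels = np.asarray(labels)
    binary = (labels >= class_bound).astype(int)
    # Share of positive samples, in percent.
    ones_percentage = float(100.0 * binary.mean())
    # Same layout as the documented default for `weights`.
    weights = [100.0 - ones_percentage, ones_percentage]
    return binary, ones_percentage, weights


# Class IDs follow the documented id2class mapping (None=0, D0=1, ..., D4=5).
binary, ones_percentage, weights = binarize_labels([0, 1, 2, 3, 5, 0, 0, 4])
print(ones_percentage, weights)  # 50.0 [50.0, 50.0]
```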

interpolate_nans(padata, pkind='linear')

Interpolate over NaNs in a 1D array.

Parameters:
  • padata (numpy.ndarray) – 1D array with NaNs to interpolate over

  • pkind (str, optional) – Interpolation method, defaults to ‘linear’

Returns:

Interpolated array

Return type:

numpy.ndarray

see: https://stackoverflow.com/a/53050216/2167159
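As an illustration of the idea, the sketch below fills NaNs by linear interpolation with np.interp. It is not the documented implementation: it covers only the ‘linear’ case, and other pkind values would need a different interpolator such as SciPy's interp1d. The name interpolate_nans_linear is hypothetical.

```python
import numpy as np


def interpolate_nans_linear(padata):
    """Fill NaNs in a 1D array by linear interpolation between finite values."""
    padata = np.asarray(padata, dtype=float)
    indexes = np.arange(padata.shape[0])
    good = np.isfinite(padata)
    # np.interp holds the first/last finite value constant beyond the ends,
    # so leading or trailing NaNs are filled with the nearest finite value.
    return np.interp(indexes, indexes[good], padata[good])


filled = interpolate_nans_linear([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(filled)
```

The array must contain at least one finite value; an all-NaN input gives np.interp nothing to interpolate from.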

loadXY(period='train', random_state=42, normalize=True)

Load the data and split it into X and y, optionally normalising the input data.

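Parameters:
  • period (str, optional) – The period of the dataset to load, defaults to ‘train’

  • random_state (int, optional) – The random seed, defaults to 42

  • normalize (bool, optional) – Whether to normalise the input data, defaults to True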

Returns:

The input and output data

Return type:

numpy.ndarray, numpy.ndarray
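To illustrate how a window_size of 12 could turn a time series into X and y arrays, here is a minimal sketch. It assumes that each sample is a run of window_size consecutive time steps and that the target is the label at the step immediately after the window; the source specifies neither, and make_windows is a hypothetical helper.

```python
import numpy as np


def make_windows(features, labels, window_size=12):
    """Build (X, y) from a (time, n_features) array and a (time,) label array.

    Assumption: the target for a window is the label at the next time step.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    n_samples = features.shape[0] - window_size
    if n_samples <= 0:
        raise ValueError("series must be longer than window_size")
    # Stack overlapping windows that each start one step later.
    X = np.stack([features[i:i + window_size] for i in range(n_samples)])
    y = labels[window_size:window_size + n_samples]
    return X, y


# 20 time steps, 3 features (e.g. NDVI, precipitation, temperature).
features = np.arange(60, dtype=float).reshape(20, 3)
labels = np.arange(20) % 6
X, y = make_windows(features, labels)
print(X.shape, y.shape)  # (8, 12, 3) (8,)
```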

normalize(X_time)

Normalise the data using a standard scaler.

Parameters:

X_time (numpy.ndarray) – The input data

Returns:

The normalised data

Return type:

numpy.ndarray
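The arithmetic behind a standard scaler can be sketched with NumPy alone: subtract each feature's mean and divide by its standard deviation. The sketch assumes X_time has shape (samples, window, features) and that statistics are computed per feature over all samples and time steps; the source does not state the layout, and standardize is a hypothetical name.

```python
import numpy as np


def standardize(X_time):
    """Scale each feature to zero mean and unit variance.

    Assumption: the last axis of X_time indexes the features.
    """
    X_time = np.asarray(X_time, dtype=float)
    n_features = X_time.shape[-1]
    flat = X_time.reshape(-1, n_features)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0     # avoid division by zero for constant features
    return ((flat - mean) / std).reshape(X_time.shape)


rng = np.random.default_rng(42)
X_time = rng.normal(loc=5.0, scale=3.0, size=(10, 12, 3))
X_norm = standardize(X_time)
print(X_norm.shape)  # (10, 12, 3)
```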

read_database_files()

Read the database files.

Returns:

The dataframe of the dataset

Return type:

pandas.core.frame.DataFrame

select_slice(dfs)

Select the time period of the dataset.

Parameters:

dfs (pandas.core.frame.DataFrame) – The dataframe of the dataset

Returns:

Slice of the dataframe

Return type:

pandas.core.frame.DataFrame
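A minimal sketch of selecting a period from a date-indexed dataframe, using the documented default year ranges. The period keys ‘val’ and ‘test’, the inclusive treatment of the end year, and the helper name select_period are assumptions; only ‘train’ appears in the source.

```python
import pandas as pd

# Documented default periods; treated here as inclusive year ranges.
SLICES = {
    "train": {"start": 1982, "end": 2006},
    "val": {"start": 2007, "end": 2011},
    "test": {"start": 2012, "end": 2016},
}


def select_period(dfs, period="train", slices=SLICES):
    """Return the rows of a date-indexed dataframe that fall in the period."""
    bounds = slices[period]
    years = dfs.index.year
    return dfs[(years >= bounds["start"]) & (years <= bounds["end"])]


# One row per year on 1 January, 1982 through 2016.
dates = pd.date_range("1982-01-01", "2016-01-01", freq="YS")
dfs = pd.DataFrame({"NDVI": range(len(dates))}, index=dates)
train = select_period(dfs, "train")
val = select_period(dfs, "val")
test = select_period(dfs, "test")
print(len(train), len(val), len(test))  # 25 5 5
```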

databases.XAIDA_database module

Module contents