databases package

Submodules

databases.DROUGHT_database module

databases.DroughtED_database module

class databases.DroughtED_database.DroughtED(config, period='train')

Bases: Dataset

Loads the DroughtED dataset as a PyTorch Dataset.

Parameters:

config (dict) – The configuration file
period (str, optional) – The period of the dataset to load, defaults to ‘train’
window_size (int, optional) – The window size of the dataset, defaults to 12
features_selected (list, optional) – The features to be used, defaults to [‘NDVI’, ‘precipitation’, ‘temperature’]
num_classes (int, optional) – The number of classes, defaults to 6
train_slice (dict, optional) – The time period of the training set, defaults to {‘start’: 1982, ‘end’: 2006}
val_slice (dict, optional) – The time period of the validation set, defaults to {‘start’: 2007, ‘end’: 2011}
test_slice (dict, optional) – The time period of the test set, defaults to {‘start’: 2012, ‘end’: 2016}
id2class (dict, optional) – The mapping from class names to class IDs, defaults to {‘None’: 0, ‘D0’: 1, ‘D1’: 2, ‘D2’: 3, ‘D3’: 4, ‘D4’: 5}
weights (list, optional) – The weights of each class, defaults to [100-ones_percentage, ones_percentage]
dfs (pandas.core.frame.DataFrame, optional) – The dataframe of the dataset, defaults to None
X (numpy.ndarray, optional) – The input data, defaults to None
y (numpy.ndarray, optional) – The output data, defaults to None
class_bound (int, optional) – The boundary of the classes, defaults to 2
start (datetime, optional) – The start date of the dataset, defaults to None
end (datetime, optional) – The end date of the dataset, defaults to None
dates (pandas.core.indexes.datetimes.DatetimeIndex, optional) – The dates of the dataset, defaults to None
ones_percentage (float, optional) – The percentage of samples in the positive class after binarisation (see binarize_data), defaults to None

binarize_data()

Binarises the data using the threshold set by the class_bound parameter.

Parameters:

class_bound (int, optional) – The boundary of the classes, defaults to 2

Returns:

The binarised data

Return type:

pandas.core.frame.DataFrame
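The thresholding step can be illustrated with a minimal NumPy sketch. It is not the class's actual implementation: the helper name `binarize_labels` is hypothetical, it works on a plain label array rather than a DataFrame, and it assumes that classes greater than or equal to class_bound form the positive class. It also shows how ones_percentage and the default weights of [100-ones_percentage, ones_percentage] could be derived.

```python
import numpy as np


def binarize_labels(y, class_bound=2):
    """Hypothetical helper: map multi-class drought labels to 0/1.

    Assumption: labels >= class_bound are treated as the positive class.
    """
    y = np.asarray(y)
    y_bin = (y >= class_bound).astype(int)
    # Percentage of samples that fall in the positive class.
    ones_percentage = 100.0 * y_bin.mean()
    # Mirrors the documented default: [100 - ones_percentage, ones_percentage].
    weights = [100.0 - ones_percentage, ones_percentage]
    return y_bin, ones_percentage, weights


# Class IDs follow id2class: 0 = None, 1 = D0, ..., 5 = D4.
labels = np.array([0, 1, 2, 3, 5, 0, 1, 0])
y_bin, ones_percentage, weights = binarize_labels(labels)
```

With the default class_bound of 2, classes None and D0 map to 0 and classes D1 to D4 map to 1 under this assumption.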

interpolate_nans(padata, pkind='linear')

Interpolate over nans in a 1D array.

Parameters:

padata (numpy.ndarray) – 1D array with nans to interpolate over
pkind (str, optional) – Interpolation method, defaults to ‘linear’

Returns:

Interpolated array

Return type:

numpy.ndarray

see: https://stackoverflow.com/a/53050216/2167159
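As a rough illustration of the idea for the ‘linear’ case only, the sketch below fills NaNs from the neighbouring finite values with `numpy.interp`. This is a stand-in, not the method's code: the name `interpolate_nans_linear` is hypothetical, other pkind values are not handled, and leading or trailing NaNs are filled with the nearest finite value because `numpy.interp` does not extrapolate.

```python
import numpy as np


def interpolate_nans_linear(padata):
    """Hypothetical helper: linearly interpolate over NaNs in a 1D array."""
    padata = np.asarray(padata, dtype=float)
    indexes = np.arange(padata.shape[0])
    good = np.isfinite(padata)
    # Evaluate the piecewise-linear function through the finite samples
    # at every index, including the positions that held NaN.
    return np.interp(indexes, indexes[good], padata[good])


series = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
filled = interpolate_nans_linear(series)
```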

loadXY(period='train', random_state=42, normalize=True)

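Load the data and split it into X and y, optionally normalising the input data.

Parameters:

period (str, optional) – The period of the dataset to load, defaults to ‘train’
random_state (int, optional) – The random seed, defaults to 42
normalize (bool, optional) – Whether to normalise the input data, defaults to True

Returns:

The input and output data

Return type:

numpy.ndarray, numpy.ndarray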

normalize(X_time)

Normalise the data using a standard scaler.

Parameters:

X_time (numpy.ndarray) – The input data

Returns:

The normalised data

Return type:

numpy.ndarray
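Standard scaling subtracts the mean of each feature and divides by its standard deviation. The sketch below shows that computation in plain NumPy. The function name `standardize` and the input layout (samples, window, features) are assumptions, and the statistics are computed on the array passed in, whereas a real pipeline would usually reuse the training-set statistics for the validation and test sets.

```python
import numpy as np


def standardize(X_time):
    """Hypothetical helper: zero mean and unit variance per feature.

    Assumption: the last axis of X_time indexes the features.
    """
    X_time = np.asarray(X_time, dtype=float)
    n_features = X_time.shape[-1]
    # Pool samples and time steps so each feature gets one mean and one std.
    flat = X_time.reshape(-1, n_features)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X_time - mean) / std


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(10, 12, 3))
X_norm = standardize(X)
```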

read_database_files()

Read the database files.

Returns:

The dataframe of the dataset

Return type:

pandas.core.frame.DataFrame

select_slice(dfs)

Select the time period of the dataset.

Parameters:

dfs (pandas.core.frame.DataFrame) – The dataframe of the dataset

Returns:

Slice of the dataframe

Return type:

pandas.core.frame.DataFrame
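A year-based slice of this kind might look like the following pandas sketch. The helper `select_years` is hypothetical, the dataframe is assumed to carry a DatetimeIndex, and both the start and end years are assumed to be inclusive. The slice dictionary has the same shape as train_slice, val_slice and test_slice.

```python
import pandas as pd


def select_years(dfs, time_slice):
    """Hypothetical helper: keep rows whose year lies in [start, end]."""
    years = dfs.index.year
    mask = (years >= time_slice["start"]) & (years <= time_slice["end"])
    return dfs[mask]


# Toy monthly data covering 2005 through 2008.
dates = pd.date_range("2005-01-01", "2008-12-01", freq="MS")
dfs = pd.DataFrame({"NDVI": range(len(dates))}, index=dates)

# Same shape as the documented default for val_slice.
val = select_years(dfs, {"start": 2007, "end": 2011})
```

Only the 2007 and 2008 rows remain here, because the toy data ends before the end of the requested period.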

databases.XAIDA_database module

Module contents