databases package
Submodules
databases.DROUGHT_database module
databases.DroughtED_database module
- class databases.DroughtED_database.DroughtED(config, period='train')
Bases:
Dataset
This class is used to load the DroughtED dataset using PyTorch’s Dataset class.
- Parameters:
config (dict) – The configuration file
period (str, optional) – The period of the dataset to load, defaults to ‘train’
window_size (int, optional) – The window size of the dataset, defaults to 12
features_selected (list, optional) – The features to be used, defaults to [‘NDVI’, ‘precipitation’, ‘temperature’]
num_classes (int, optional) – The number of classes, defaults to 6
train_slice (dict, optional) – The time period of the training set, defaults to {‘start’: 1982, ‘end’: 2006}
val_slice (dict, optional) – The time period of the validation set, defaults to {‘start’: 2007, ‘end’: 2011}
test_slice (dict, optional) – The time period of the test set, defaults to {‘start’: 2012, ‘end’: 2016}
id2class (dict, optional) – The mapping from class names to class IDs, defaults to {‘None’: 0, ‘D0’: 1, ‘D1’: 2, ‘D2’: 3, ‘D3’: 4, ‘D4’: 5}
weights (list, optional) – The weights of each class, defaults to [100-ones_percentage, ones_percentage]
dfs (pandas.core.frame.DataFrame, optional) – The dataframe of the dataset, defaults to None
X (numpy.ndarray, optional) – The input data, defaults to None
y (numpy.ndarray, optional) – The output data, defaults to None
class_bound (int, optional) – The boundary of the classes, defaults to 2
start (datetime, optional) – The start date of the dataset, defaults to None
end (datetime, optional) – The end date of the dataset, defaults to None
dates (pandas.core.indexes.datetimes.DatetimeIndex, optional) – The dates of the dataset, defaults to None
ones_percentage (float, optional) – The percentage of samples in the positive class after binarisation, defaults to None
- binarize_data()
Binarises the data using the threshold set by the class_bound parameter.
- Parameters:
class_bound (int, optional) – The boundary of the classes, defaults to 2
- Returns:
The binarised data
- Return type:
pandas.core.frame.DataFrame
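A minimal sketch of the thresholding idea, not the package's actual code. It assumes class IDs at or above class_bound map to 1 and the rest to 0; the source does not say whether the bound is inclusive. The helper name `binarize_labels` is hypothetical. It also shows how the documented default weights, [100-ones_percentage, ones_percentage], could follow from the binarised labels.

```python
import numpy as np

def binarize_labels(y, class_bound=2):
    """Hypothetical helper: map class IDs to 0/1 around class_bound."""
    y = np.asarray(y)
    # Assumption: the bound is inclusive, so IDs >= class_bound become 1.
    y_bin = (y >= class_bound).astype(int)
    # Share of positive labels, expressed as a percentage.
    ones_percentage = 100.0 * y_bin.mean()
    # Class weights in the documented form [100 - ones_percentage, ones_percentage].
    weights = [100.0 - ones_percentage, ones_percentage]
    return y_bin, ones_percentage, weights

# Class IDs follow the documented id2class mapping (0 = None, ..., 5 = D4).
y_bin, ones_percentage, weights = binarize_labels([0, 1, 2, 3, 5, 0, 1, 4])
```

With the default bound of 2, classes None and D0 fall into the negative class and D1 to D4 into the positive class under this assumption.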
- interpolate_nans(padata, pkind='linear')
Interpolate over nans in a 1D array.
- Parameters:
padata (numpy.ndarray) – 1D array with nans to interpolate over
pkind (str, optional) – Interpolation method, defaults to ‘linear’
- Returns:
Interpolated array
- Return type:
numpy.ndarray
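The linear case can be sketched with NumPy alone. This is an illustration under stated assumptions, not the method's real implementation: the function name `interpolate_nans_sketch` is hypothetical, only pkind='linear' is covered, and NaNs at either end of the array are filled with the nearest valid value, which may differ from the package's behaviour.

```python
import numpy as np

def interpolate_nans_sketch(padata):
    """Hypothetical stand-in: linearly fill NaNs in a 1D array."""
    padata = np.asarray(padata, dtype=float)
    idx = np.arange(padata.size)
    good = ~np.isnan(padata)
    if not good.any():
        # Nothing to interpolate from; return the input unchanged.
        return padata.copy()
    # np.interp interpolates linearly between valid points and holds
    # the first/last valid value for NaNs at the edges.
    return np.interp(idx, idx[good], padata[good])

filled = interpolate_nans_sketch([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
```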
- loadXY(period='train', random_state=42, normalize=True)
Load the data, split it into X and y, and optionally normalise it.
- Parameters:
period (str, optional) – The period of the dataset to load, defaults to ‘train’
random_state (int, optional) – The random seed, defaults to 42
normalize (bool, optional) – Whether to normalise the data, defaults to True
- Returns:
The input and output data
- Return type:
numpy.ndarray, numpy.ndarray
- normalize(X_time)
Normalise the data using a standard scaler.
- Parameters:
X_time (numpy.ndarray) – The input data
- Returns:
The normalised data
- Return type:
numpy.ndarray
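For illustration, standard scaling subtracts each feature's mean and divides by its standard deviation. The sketch below does this with NumPy and assumes X_time has shape (samples, window, features), which the source does not state; the function name `standard_scale` is hypothetical. It uses the population standard deviation, as scikit-learn's StandardScaler does.

```python
import numpy as np

def standard_scale(X_time):
    """Hypothetical helper: zero-mean, unit-variance scaling per feature."""
    X_time = np.asarray(X_time, dtype=float)
    n_features = X_time.shape[-1]
    # Assumption: the last axis holds the features, so flatten the rest.
    flat = X_time.reshape(-1, n_features)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    # Leave constant features unscaled to avoid dividing by zero.
    std[std == 0] = 1.0
    return ((flat - mean) / std).reshape(X_time.shape)

X = np.arange(24, dtype=float).reshape(2, 4, 3)
X_scaled = standard_scale(X)
```

In practice the mean and standard deviation would normally be estimated on the training period only and reused for validation and test data; whether the package does so is not documented here.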
- read_database_files()
Read the database files.
- Returns:
The dataframe of the dataset
- Return type:
pandas.core.frame.DataFrame
- select_slice(dfs)
Select the time period of the dataset.
- Parameters:
dfs (pandas.core.frame.DataFrame) – The dataframe of the dataset
- Returns:
Slice of the dataframe
- Return type:
pandas.core.frame.DataFrame
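A hedged sketch of year-based slicing, using the documented slice format {‘start’: year, ‘end’: year}. It assumes the dataframe has a DatetimeIndex and that both bounds are inclusive; neither is confirmed by the source. The helper name `select_years` and the `score` column are hypothetical.

```python
import pandas as pd

def select_years(dfs, period_slice):
    """Hypothetical helper: keep rows whose index year lies in [start, end]."""
    years = dfs.index.year
    # Assumption: both the start and end years are inclusive.
    mask = (years >= period_slice["start"]) & (years <= period_slice["end"])
    return dfs[mask]

# Toy daily data covering 2005 through 2008.
dates = pd.date_range("2005-01-01", "2008-12-31", freq="D")
dfs = pd.DataFrame({"score": range(len(dates))}, index=dates)

# Documented default validation slice: 2007 to 2011.
val = select_years(dfs, {"start": 2007, "end": 2011})
```

Here only 2007 and 2008 survive, because the toy data ends before the slice does.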