Module Reference
class skedm.skedm.Classification(weights='uniform')
Classification using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by combining the k nearest neighbors (predict).
Parameters: weights (str) – Procedure to weight the near neighbors. Options:
- ‘uniform’ : uniform weighting
- ‘distance’ : weighted as 1/distance
Example
>>> X  # embedded series of shape (nsamples, embedding dimension)
>>> y  # future trajectory of each point, shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Classification()
>>> train_len = int(len(X) * .75)  # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [1, 10, 20])  # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:])  # calculate Klecka's tau
dist_calc(Xtest)
Calculates the distance from the testing set to the training set.
Parameters: Xtest (2d array) – Test features (nsamples, nfeatures).
dist_stats(nn_list)
Returns the mean and std of the distances for the given nn_list.
fit(Xtrain, ytrain)
Fit the training data. Can also be thought of as reconstructing the attractor.
Parameters: - Xtrain (2D array) – Features of shape (nsamples,nfeatures).
- ytrain (2D array) – Targets of shape (nsamples,ntargets).
predict(Xtest, nn_list)
Make a prediction for each number of near neighbors in nn_list.
Parameters: - Xtest (2d array) – Contains the test features.
- nn_list (1d array of ints) – Neighbors to be tested.
Returns: Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.
Return type: list
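A minimal sketch of the idea behind predict, assuming Euclidean distance, uniform weighting, and 1-D integer class labels. The function name knn_classify is hypothetical and this is not skedm's implementation; it only illustrates how one prediction array per entry of nn_list can arise.

```python
import numpy as np

def knn_classify(Xtrain, ytrain, Xtest, nn_list):
    """Majority vote over the k nearest training points, for each k in nn_list."""
    # Pairwise Euclidean distances, shape (ntest, ntrain)
    dist = np.linalg.norm(Xtest[:, None, :] - Xtrain[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)  # nearest training rows first
    preds = []
    for k in nn_list:
        neighbors = ytrain[order[:, :k]]  # (ntest, k) neighbor labels
        # Most common label among the k neighbors of each test row
        votes = [np.bincount(row).argmax() for row in neighbors]
        preds.append(np.array(votes))
    return preds  # same length as nn_list

Xtrain = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
ytrain = np.array([0, 0, 0, 1, 1, 1])  # assumption: non-negative integer labels
Xtest = np.array([[0.05], [1.15]])
preds = knn_classify(Xtrain, ytrain, Xtest, nn_list=[1, 3])
```

The sketch simplifies ytrain to a single target column; the documented fit accepts targets of shape (nsamples,ntargets).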
predict_individual(Xtest, nn_list)
Make a prediction for each neighbor.
Parameters: - Xtest (2d array) – Contains the test features.
- nn_list (1d array of ints) – Neighbors to be tested.
Returns: Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.
Return type: list
score(ytest, how='tau')
Evaluate the predictions.
Parameters: - ytest (2d array) – Contains the target values.
- how (str) – How to score the predictions. Possible values:
- ‘compare’ : Percent correctly predicted. For more info, see utilities.class_compare.
- ‘error’ : Percent correct scaled by the most common prediction of the series. See utilities.classification_error for more.
- ‘tau’ : Klecka's tau.
Returns: scores – Scores for the predicted values. Shape (len(nn_list),num_preds)
Return type: 2d array
class skedm.skedm.Embed(X)
Embed a 1d, 2d, or 3d array in n-dimensional space. Assists in choosing an embedding dimension and a lag value.
Parameters: X (1d, 2d, or 3d array) – Array to be embedded in n-dimensional space.
embed_vectors_1d(lag, embed, predict)
Embeds vectors from a one-dimensional array in m-dimensional space.
Parameters: - X (array) – A 1-D array representing the training or testing set.
- lag (int) – Lag value, typically taken from the first minimum of the mutual information.
- embed (int) – Embedding dimension. How many lag values to take.
- predict (int) – Distance to forecast (see example).
Returns: - features (array of shape [num_vectors,embed]) – A 2-D array containing all of the embedded vectors.
- targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.
Example
>>> import skedm as edm
>>> X = [0,1,2,3,4,5,6,7,8,9,10]
>>> embed = 3
>>> lag = 2
>>> predict = 3
>>> E = edm.Embed(X)
>>> features, targets = E.embed_vectors_1d(lag, embed, predict)
>>> features  # [[0,2,4], [1,3,5], [2,4,6], [3,5,7]]
>>> targets  # [[5,6,7], [6,7,8], [7,8,9], [8,9,10]]
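The lag embedding shown in the example can be sketched with plain NumPy. The helper embed_1d below is hypothetical and written only to reproduce the feature and target rows listed above; it is not the library's code.

```python
import numpy as np

def embed_1d(x, lag, embed, predict):
    """Build lagged feature vectors and the values that follow each of them."""
    x = np.asarray(x)
    span = (embed - 1) * lag               # offset of the last feature in a vector
    n_vectors = len(x) - span - predict    # vectors that still have `predict` future values
    features = np.array([x[i : i + span + 1 : lag] for i in range(n_vectors)])
    targets = np.array(
        [x[i + span + 1 : i + span + 1 + predict] for i in range(n_vectors)]
    )
    return features, targets

features, targets = embed_1d(np.arange(11), lag=2, embed=3, predict=3)
print(features)  # rows [0 2 4], [1 3 5], [2 4 6], [3 5 7]
print(targets)   # rows [5 6 7], [6 7 8], [7 8 9], [8 9 10]
```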
embed_vectors_2d(lag, embed, predict, percent=0.1)
Embeds vectors from a two-dimensional image in m-dimensional space.
Parameters: - X (array) – A 2-D array representing the training set or testing set.
- lag (tuple of ints (r,c)) – Row and column lag values (r,c); think of these as (height,width).
- embed (tuple of ints (r,c)) – Row and column embedding shape (r,c); think of these as (height,width). c must be odd.
- predict (int) – Distance in the space to forecast (see example).
- percent (float (default = 0.1)) – Percent of the space to embed. Used for computational efficiency.
Returns: - features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors.
- targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.
Example
>>> lag = (3,4)
>>> embed = (2,5)
>>> predict = 2
>>> features, targets = embed_vectors_2d(lag, embed, predict)
Notes
The embed space above looks like the following:
[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
 |         |         |         |         |
 |         |         |         |         |
[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
                    [t]
                    [t]
embed_vectors_3d(lag, embed, predict, percent=0.1)
Embeds vectors from a 3-dimensional matrix in n-dimensional space.
Parameters: - X (array) – A 3-D array representing the training set or testing set.
- lag (tuple of ints (r,c)) – Row and column lag values (r,c); think of these as (height,width).
- embed (tuple of ints (r,c,t)) – Row, column, and time embedding shape (r,c,t); think of these as (height,width,time). c must be odd.
- predict (int) – Distance in the space to forecast (see example).
- percent (float (default = 0.1)) – Percent of the space to embed. Used for computational efficiency.
Returns: - features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors
- targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors
Example
>>> lag = (3,4,2)  # height,width,time
>>> embed = (3,3)
>>> predict = 2
>>> features, targets = embed_vectors_3d(lag, embed, predict)
Notes
The above example would look like the following:
[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]
The targets would be directly below the center [f].
mutual_information(max_lag)
Calculates the mutual information between a time series and a shifted version of itself. Uses numpy’s mutual information for the calculation.
Parameters: max_lag (int) – Maximum amount to shift the time series.
Returns: mi – Mutual information values for every shift value. Shape (max_lag,).
Return type: 1d array
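As a rough illustration of mutual information between a series and a shifted copy of itself, here is a histogram-based sketch. The function name, the bin count, and the choice to evaluate shifts 1 through max_lag are all assumptions made for this example; the library's own binning and shift range may differ.

```python
import numpy as np

def lagged_mutual_information(x, max_lag, bins=8):
    """Histogram estimate of I(x_t; x_{t+lag}) for lag = 1..max_lag (in nats)."""
    x = np.asarray(x, dtype=float)
    mi = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        a, b = x[:-lag], x[lag:]                    # series and its shifted copy
        joint, _, _ = np.histogram2d(a, b, bins=bins)
        pxy = joint / joint.sum()                   # joint probabilities
        px = pxy.sum(axis=1, keepdims=True)         # marginal of a, shape (bins, 1)
        py = pxy.sum(axis=0, keepdims=True)         # marginal of b, shape (1, bins)
        nz = pxy > 0                                # skip empty cells to avoid log(0)
        mi[lag - 1] = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    return mi

t = np.linspace(0, 8 * np.pi, 400)
mi = lagged_mutual_information(np.sin(t), max_lag=20)
```

A common heuristic, consistent with the lag parameter description for embed_vectors_1d, is to pick the lag at the first local minimum of this curve.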
mutual_information_3d(max_lag, percent_calc=0.5, digitize=True)
Calculates the mutual information along the rows and down the columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.
Parameters: - M (3-D array) – Input three-dimensional array.
- max_lag (integer) – Maximum amount to shift the space.
- percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.
Returns: - R_mut (1-D array) – The mutual information averaged down the rows (vertical).
- C_mut (1-D array) – The mutual information averaged across the columns (horizontal)
- Z_mut (1-D array) – The mutual information averaged along the depth.
mutual_information_spatial(max_lag, percent_calc=0.5, digitize=True)
Calculates the mutual information along the rows and down the columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.
Parameters: - M (2-D array) – Input two-dimensional image.
- max_lag (integer) – Maximum amount to shift the space.
- percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.
Returns: - R_mut (1-D array) – The mutual information averaged down the rows (vertical).
- C_mut (1-D array) – The mutual information averaged across the columns (horizontal).
- r_mi (2-D array) – The mutual information down each row (vertical).
- c_mi (2-D array) – The mutual information across the columns (horizontal).
class skedm.skedm.Regression(weights='uniform')
Regression using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by averaging the k nearest neighbors (predict).
Parameters: weights (str) – How to weight the near neighbors. Options are:
- ‘uniform’ : uniform weighting
- ‘distance’ : weighted as 1/distance
Example
>>> X  # embedded time series of shape (nsamples, embedding dimension)
>>> y  # future trajectory of each point, shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Regression()
>>> train_len = int(len(X) * .75)  # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [1, 10, 20])  # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:])  # calculate coefficient of determination
dist_calc(Xtest)
Calculates the distance from the testing set to the training set.
Parameters: Xtest (2D array) – Test features (nsamples, nfeatures).
dist_stats(nn_list)
Calculates the mean and std of the distances for the given nn_list.
Parameters: nn_list (1d array of ints) – Neighbors to have their mean distance and std returned.
Returns: - mean (1d array) – Mean of all the test distances corresponding to the nn_list.
- std (1d array) – Std of all the test distances corresponding to the nn_list.
fit(Xtrain, ytrain)
Fit the training data. Can also be thought of as populating the phase space.
Parameters: - Xtrain (2D array) – Embedded training time series. Features of shape (nsamples,nfeatures).
- ytrain (2D array) – Future trajectory of the points. Targets of shape (nsamples,ntargets).
predict(Xtest, nn_list)
Make a prediction for each number of near neighbors in nn_list.
Parameters: - Xtest (2d array) – Testing samples of shape (nsamples,nfeatures).
- nn_list (1d array) – Numbers of near neighbors to use to make predictions.
Returns: ypred – Predictions for ytest of shape (nsamples,num predictions).
Return type: 2d array
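A minimal sketch of uniform k-nearest-neighbor regression over several neighbor counts, assuming Euclidean distance. The function knn_regress is hypothetical and stands in for the general technique, not for skedm's internals.

```python
import numpy as np

def knn_regress(Xtrain, ytrain, Xtest, nn_list):
    """Average the targets of the k nearest training points for each k in nn_list."""
    # Pairwise Euclidean distances, shape (ntest, ntrain)
    dist = np.linalg.norm(Xtest[:, None, :] - Xtrain[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)  # nearest training rows first
    # ytrain[order[:, :k]] has shape (ntest, k, ntargets); average over the k axis
    return [ytrain[order[:, :k]].mean(axis=1) for k in nn_list]

Xtrain = np.array([[0.0], [1.0], [2.0], [3.0]])
ytrain = np.array([[0.0], [10.0], [20.0], [30.0]])
Xtest = np.array([[1.25]])
preds = knn_regress(Xtrain, ytrain, Xtest, nn_list=[1, 2, 3])
# One (nsamples, num predictions) array per entry of nn_list
```

With one neighbor the prediction is the nearest target (10); with two it is the mean of 10 and 20; with three the point at 0 joins the average.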
predict_individual(Xtest, nn_list)
Make a prediction for each neighbor.
Parameters: - Xtest (2d array) – Contains the test features.
- nn_list (1d array of ints) – Neighbors to be tested.
score(ytest, how='score')
Score the predictions.
Parameters: - ytest (2d array) – Target values.
- how (str) – How to score the predictions. Options include:
- ‘score’ : Coefficient of determination.
- ‘corrcoef’ : Correlation coefficient.
Returns: score – Scores for the corresponding near neighbors.
Return type: 2d array
utilities
skedm.utilities.class_compare(preds, actual)
Percent correct between predicted values and actual values.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values from the testing set. Shape (num samples,).
Returns: cc – Percent of predicted values that match the actual values.
Return type: float
skedm.utilities.classification_error(preds, actual)
Percent correct between predicted values and actual values scaled to the most common prediction of the space.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values of shape (num samples,).
Returns: cc – Percent correct, scaled to the most common prediction of the space.
Return type: float
skedm.utilities.cohens_kappa(preds, actual)
Calculates Cohen's kappa.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values from the testing set of shape (num samples,).
Returns: c – Cohen's kappa.
Return type: float
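For reference, Cohen's kappa compares observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below uses that standard definition; the function name is hypothetical and the library's implementation may be organized differently.

```python
import numpy as np

def cohens_kappa_sketch(preds, actual):
    """Cohen's kappa from observed and chance agreement."""
    preds, actual = np.asarray(preds), np.asarray(actual)
    classes = np.union1d(preds, actual)
    p_observed = np.mean(preds == actual)  # fraction of exact matches
    # Chance agreement: product of the two marginal class frequencies, summed over classes
    p_chance = sum(np.mean(preds == c) * np.mean(actual == c) for c in classes)
    return (p_observed - p_chance) / (1.0 - p_chance)

kappa = cohens_kappa_sketch([0, 0, 1, 1], [0, 0, 1, 0])
print(kappa)  # 0.5: observed agreement 0.75, chance agreement 0.5
```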
skedm.utilities.corrcoef(preds, actual)
Correlation coefficient between predicted values and actual values.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values from the testing set of shape (num samples,).
Returns: cc – Returns the correlation coefficient.
Return type: float
skedm.utilities.keep_diversity(X, thresh=1.0)
Returns a boolean mask marking the rows that are not a single class.
Parameters: - X (2d array of ints) – Array to evaluate for diversity
- thresh (float) – Percent of species that need to be unique.
Returns: keep – Array where true means there is more than one class in that row.
Return type: 1d boolean array
Examples
>>> x = np.array([[1, 1, 1, 1],
...               [2, 1, 2, 3],
...               [2, 2, 2, 2],
...               [3, 2, 1, 4]])
>>> keep_diversity(x)
array([False,  True, False,  True])
skedm.utilities.kleckas_tau(preds, actual)
Calculates Klecka's tau.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values of shape (num samples,).
Returns: tau – Klecka's tau.
Return type: float
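Klecka's tau is commonly written as tau = (n_correct - sum_i p_i n_i) / (N - sum_i p_i n_i), where p_i is the prior probability of class i and n_i its count. The sketch below assumes equal priors (p_i = 1/K), which may not match the choice made in the library; the function name is hypothetical.

```python
import numpy as np

def kleckas_tau_sketch(preds, actual):
    """Klecka's tau under the assumption of equal class priors."""
    preds, actual = np.asarray(preds), np.asarray(actual)
    n = len(actual)
    n_classes = len(np.unique(actual))
    n_correct = np.sum(preds == actual)
    expected = n / n_classes  # sum_i p_i * n_i with p_i = 1/K
    # Undefined (division by zero) when `actual` contains a single class
    return (n_correct - expected) / (n - expected)

tau = kleckas_tau_sketch([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0])
print(tau)  # 0.75: 5 of 6 correct, 2 expected by chance
```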
skedm.utilities.klekas_tau_spatial(X, max_lag, percent_calc=0.5)
Calculates the Klecka's tau value between a shifted and unshifted slice of the space.
Parameters: - X (2D array) – Spatial image.
- max_lag (integer) – Maximum amount to shift the space.
- percent_calc (float) – Fraction of rows and columns to average over. Using the whole space is overkill.
Returns: - R_mut (1-D array) – Klecka's tau averaged down the rows (vertical).
- C_mut (1-D array) – Klecka's tau averaged across the columns (horizontal).
- r_mi (2-D array) – Klecka's tau down each row (vertical).
- c_mi (2-D array) – Klecka's tau across each column (horizontal).
skedm.utilities.mi_digitize(X)
Digitize a time series for mutual information analysis.
Parameters: X (1D array) – Array to be digitized of length m.
Returns: Y – Digitized array of length m.
Return type: 1D array
skedm.utilities.score(preds, actual)
Calculates the coefficient of determination.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values of shape (num samples,).
Returns: cc – Returns the coefficient of determination.
Return type: float
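The R^2 formula can be checked by hand with a few lines of NumPy. The function name r_squared and the sample values are illustrative only.

```python
import numpy as np

def r_squared(preds, actual):
    """Coefficient of determination, 1 - u/v."""
    preds = np.asarray(preds, dtype=float)
    actual = np.asarray(actual, dtype=float)
    u = np.sum((actual - preds) ** 2)          # residual sum of squares
    v = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    return 1.0 - u / v

r2 = r_squared(preds=[2.5, 0.0, 2.0, 8.0], actual=[3.0, -0.5, 2.0, 7.0])
print(round(r2, 4))  # 0.9486, since u = 1.5 and v = 29.1875
```

Predictions worse than always guessing the mean of the actual values give a negative score.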
skedm.utilities.variance_explained(preds, actual)
Explained variance between predicted values and actual values.
Parameters: - preds (1D array) – Predicted values of shape (num samples,).
- actual (1D array) – Actual values of shape (num samples,).
Returns: cc – Returns the explained variance.
Return type: float
skedm.utilities.weighted_mean(X, distances)
Calculates the weighted mean given a set of values and their corresponding distances.
Only 1/distance is implemented. This essentially is just a weighted mean down axis=1.
Parameters: - X (2d array) – Training values of shape(nsamples,number near neighbors).
- distances (2d array) – Sorted distances to the near neighbors for the indices. Shape(nsamples,number near neighbors).
Returns: w_mean – Weighted predictions.
Return type: 2d array
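A small sketch of a 1/distance weighted mean down axis=1. The function name and the eps guard against zero distances are assumptions added for this example, and the sketch returns one value per sample rather than a 2d array.

```python
import numpy as np

def inverse_distance_mean(X, distances, eps=1e-12):
    """Weighted mean across neighbors (axis=1) with weights 1/distance."""
    X = np.asarray(X, dtype=float)
    # eps avoids division by zero when a test point coincides with a neighbor
    w = 1.0 / np.maximum(np.asarray(distances, dtype=float), eps)
    return np.sum(X * w, axis=1) / np.sum(w, axis=1)

X = np.array([[10.0, 20.0], [1.0, 3.0]])   # neighbor values per sample
d = np.array([[0.25, 0.75], [1.0, 1.0]])   # sorted distances to those neighbors
w_mean = inverse_distance_mean(X, d)
print(w_mean)  # approximately [12.5, 2.0]
```

In the first row the closer neighbor (distance 0.25) carries three times the weight of the farther one, pulling the result toward 10; equal distances in the second row reduce to a plain mean.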
skedm.utilities.weighted_mode(a, w, axis=0)
This function is borrowed from scikit-learn’s extmath.py.
Returns an array of the weighted modal (most common) value in a.
If there is more than one such value, only the first is returned. The bin-count for the modal bins is also returned.
This is an extension of the algorithm in scipy.stats.mode.
Parameters: - a (array_like) – n-dimensional array of which to find mode(s).
- w (array_like) – n-dimensional array of weights for each value.
- axis (int, optional) – Axis along which to operate. Default is 0, i.e. the first axis.
Returns: - vals (ndarray) – Array of modal values.
- score (ndarray) – Array of weighted counts for each mode.
Examples
>>> from sklearn.utils.extmath import weighted_mode
>>> x = [4, 1, 4, 2, 4, 2]
>>> weights = [1, 1, 1, 1, 1, 1]
>>> weighted_mode(x, weights)
(array([ 4.]), array([ 3.]))
The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.
>>> weights = [1, 3, 0.5, 1.5, 1, 2]  # deweight the 4's
>>> weighted_mode(x, weights)
(array([ 2.]), array([ 3.5]))
The value 2 has the highest score: it appears twice, with weights of 1.5 and 2, and the sum of these is 3.5.
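The two examples can be reproduced for the 1-D case with a short NumPy sketch. The helper weighted_mode_1d is hypothetical and handles only flat inputs, unlike the n-dimensional function documented here.

```python
import numpy as np

def weighted_mode_1d(a, w):
    """Return the value with the largest summed weight, and that sum."""
    a = np.asarray(a)
    w = np.asarray(w, dtype=float)
    values, inverse = np.unique(a, return_inverse=True)
    scores = np.bincount(inverse, weights=w)  # total weight per distinct value
    best = np.argmax(scores)
    return values[best], scores[best]

x = [4, 1, 4, 2, 4, 2]
uniform = weighted_mode_1d(x, [1, 1, 1, 1, 1, 1])        # value 4, score 3.0
deweighted = weighted_mode_1d(x, [1, 3, 0.5, 1.5, 1, 2])  # value 2, score 3.5
```

In the second call the 4's sum to 2.5 and the single 1 to 3, so the 2's win with 3.5.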
See also
scipy.stats.mode()