Module Reference

class skedm.skedm.Classification(weights='uniform')

Classification using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by averaging the k nearest neighbors (predict).

Parameters:weights (str) –

Procedure to weight the near neighbors. Options:

  • ‘uniform’ : uniform weighting
  • ‘distance’ : weighted as 1/distance

Example

>>> X # embed series of shape (nsamples, embedding dimension)
>>> y # future trajectory of a point of shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Classification()
>>> train_len = int(len(X)*.75) # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [1,10,20]) # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:]) # Calculate Klecka's tau
dist_calc(Xtest)

Calculates the distance from the testing set to the training set.

Parameters:Xtest (2d array) – Test features (nsamples, nfeatures).
dist_stats(nn_list)

Returns the mean and standard deviation of the distances for the given nn_list.

fit(Xtrain, ytrain)

Fit the training data. Can also be thought of as reconstructing the attractor.

Parameters:
  • Xtrain (2D array) – Features of shape (nsamples,nfeatures).
  • ytrain (2D array) – Targets of shape (nsamples,ntargets).
predict(Xtest, nn_list)

Make predictions using each number of near neighbors given in nn_list.

Parameters:
  • Xtest (2d array) – Contains the test features.
  • nn_list (1d array of ints) – Neighbors to be tested.
Returns:

Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.

Return type:

list

predict_individual(Xtest, nn_list)

Make a prediction for each neighbor.

Parameters:
  • Xtest (2d array) – Contains the test features.
  • nn_list (1d array of ints) – Neighbors to be tested.
Returns:

Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.

Return type:

list
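The difference between predict and predict_individual can be sketched with plain NumPy. This is not skedm's implementation: the helper name knn_class_predictions, the Euclidean distance, the 1-D label array, and the use of a majority vote for the aggregate prediction are all assumptions made for illustration.

```python
import numpy as np

def knn_class_predictions(Xtrain, ytrain, Xtest, k):
    """Hypothetical helper: aggregate and individual predictions for k neighbors."""
    # Pairwise Euclidean distances, shape (ntest, ntrain)
    dist = np.linalg.norm(Xtest[:, None, :] - Xtrain[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)       # training indices, nearest first
    neigh_labels = ytrain[order[:, :k]]    # labels of the k nearest neighbors
    # Aggregate (predict-like): majority vote over the k nearest neighbors
    aggregate = np.array([np.bincount(row).argmax() for row in neigh_labels])
    # Individual (predict_individual-like): label of the k-th neighbor alone
    individual = neigh_labels[:, k - 1]
    return aggregate, individual

Xtrain = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
ytrain = np.array([0, 0, 1, 1, 1])
Xtest = np.array([[0.05], [1.05]])
agg, ind = knn_class_predictions(Xtrain, ytrain, Xtest, k=3)
```

For the first test point the three nearest labels are 0, 0, and 1, so the vote gives 0 while the third neighbor on its own gives 1.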

score(ytest, how='tau')

Evaluate the predictions.

Parameters:
  • ytest (2d array) – Contains the target values.
  • how (str) –

    How to score the predictions. Possible values:

    • ‘compare’ : Percent correctly predicted. For more info, see
      utilities.class_compare.
    • ‘error’ : Percent correct scaled by the most common prediction
      of the series. See utilities.classification_error for more.
    • ‘tau’ : Klecka’s tau
Returns:

scores – Scores for the predicted values. Shape (len(nn_list),num_preds)

Return type:

2d array

class skedm.skedm.Embed(X)

Embed a 1d, 2d, or 3d array in n-dimensional space. Assists in choosing an embedding dimension and a lag value.

Parameters:X (1d, 2d, or 3d array) – Array to be embedded in n-dimensional space.
embed_vectors_1d(lag, embed, predict)

Embeds vectors from a one dimensional array in m-dimensional space.

Parameters:
  • X (array) – A 1-D array representing the training or testing set.
  • lag (int) – Lag value, chosen as the first minimum of the mutual information.
  • embed (int) – Embedding dimension. How many lag values to take.
  • predict (int) – Distance to forecast (see example).
Returns:

  • features (array of shape [num_vectors,embed]) – A 2-D array containing all of the embedded vectors.
  • targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.

Example

>>> import skedm as edm
>>> X = [0,1,2,3,4,5,6,7,8,9,10]
>>> E = edm.Embed(X)
>>> embed = 3
>>> lag = 2
>>> predict = 3
>>> features, targets = E.embed_vectors_1d(lag, embed, predict)
>>> features # [[0,2,4], [1,3,5], [2,4,6], [3,5,7]]
>>> targets # [[5,6,7], [6,7,8], [7,8,9], [8,9,10]]
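The indexing behind a one-dimensional lagged embedding can be sketched in NumPy as follows. The function embed_1d is a hypothetical stand-in written for illustration, not the library's code; it is built only to reproduce the features and targets listed in the example above.

```python
import numpy as np

def embed_1d(x, lag, embed, predict):
    """Hypothetical sketch of a 1-D lagged embedding."""
    x = np.asarray(x)
    span = (embed - 1) * lag               # offset from first to last feature
    n_vectors = len(x) - span - predict    # windows that leave room for targets
    starts = np.arange(n_vectors)[:, None]
    feat_idx = starts + lag * np.arange(embed)[None, :]
    # Targets are the `predict` values that follow the last feature
    targ_idx = starts + span + 1 + np.arange(predict)[None, :]
    return x[feat_idx], x[targ_idx]

features, targets = embed_1d(np.arange(11), lag=2, embed=3, predict=3)
```

Each row of features samples the series every lag steps, and the matching row of targets holds the values immediately after the final sampled point.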
embed_vectors_2d(lag, embed, predict, percent=0.1)

Embeds vectors from a two dimensional image in m-dimensional space.

Parameters:
  • X (array) – A 2-D array representing the training set or testing set.
  • lag (tuple of ints (r,c)) – Row and column lag values (r,c) can think of as (height,width).
  • embed (tuple of ints (r,c)) – Row and column embedding shape (r,c) can think of as (height,width). c must be odd.
  • predict (int) – Distance in the space to forecast (see example).
  • percent (float (default = 0.1)) – Percent of the space to embed. Used for computation efficiency.
Returns:

  • features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors.
  • targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.

Example

>>> E = edm.Embed(X) # X is a 2-D array
>>> lag = (3,4)
>>> embed = (2,5)
>>> predict = 2
>>> features, targets = E.embed_vectors_2d(lag, embed, predict)

Notes

The embed space above looks like the following:

[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
 |         |         |         |         |
 |         |         |         |         |
[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
                    [t]
                    [t]
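The stencil drawn above can be written out as (row, column) offsets relative to its top-left feature cell. This is only a reading of the diagram: it assumes the targets sit in the rows directly beneath the bottom row of features, in the center column, and the variable names are made up for the sketch.

```python
import numpy as np

lag = (3, 4)      # (row lag, column lag)
embed = (2, 5)    # (rows, columns) of feature cells
predict = 2       # number of target cells

row_off = lag[0] * np.arange(embed[0])    # rows 0 and 3
col_off = lag[1] * np.arange(embed[1])    # columns 0, 4, 8, 12, 16

# One offset per [f] cell, r*c in total
feature_offsets = [(int(r), int(c)) for r in row_off for c in col_off]

# Assumed placement of the [t] cells: below the bottom row, center column
center_col = int(col_off[embed[1] // 2])
target_offsets = [(int(row_off[-1]) + step, center_col)
                  for step in range(1, predict + 1)]
```

The requirement that c be odd is what makes a single center column available for the targets.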
embed_vectors_3d(lag, embed, predict, percent=0.1)

Embeds vectors from a 3-dimensional matrix in n-dimensional space.

Parameters:
  • X (array) – A 3-D array representing the training set or testing set.
  • lag (tuple of ints (r,c)) – Row and column lag values (r,c) can think of as (height,width).
  • embed (tuple of ints (r,c,t)) – Row and column, and time embedding shape (r,c,t) can think of as (height,width,time). c must be odd.
  • predict (int) – Distance in the space to forecast (see example).
  • percent (float (default = 0.1)) – Percent of the space to embed. Used for computation efficiency.
Returns:

  • features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors
  • targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors

Example

>>> E = edm.Embed(X) # X is a 3-D array
>>> lag = (3,4,2) #height,width,time
>>> embed = (3,3)
>>> predict = 2
>>> features, targets = E.embed_vectors_3d(lag, embed, predict)

Notes

The above example would look like the following:

[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]

The targets would be directly below the center [f].

mutual_information(max_lag)

Calculates the mutual information between a time series and a shifted version of itself. Uses scikit-learn’s mutual information implementation for the calculation.

Parameters:max_lag (int) – Maximum amount to shift the time series.
Returns:mi – Mutual information values for every shift value. Shape (max_lag,).
Return type:1d array
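A lagged mutual information curve can be sketched with a histogram estimate in NumPy. This is an illustrative stand-in rather than the library's routine: the function names, the fixed bin count of 8, and the choice to start the shifts at 0 are assumptions.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of the mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def lagged_mutual_info(x, max_lag, bins=8):
    """Mutual information between x and x shifted by 0 .. max_lag - 1."""
    x = np.asarray(x, dtype=float)
    return np.array([mutual_info(x[:len(x) - lag], x[lag:], bins)
                     for lag in range(max_lag)])

t = np.linspace(0, 20 * np.pi, 2000)
mi = lagged_mutual_info(np.sin(t), max_lag=60)
```

The lag for an embedding is then usually read off at the first minimum of such a curve.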
mutual_information_3d(max_lag, percent_calc=0.5, digitize=True)

Calculates the mutual information along the rows and down columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.

Parameters:
  • M (3-D array) – Input three-dimensional array.
  • max_lag (integer) – Maximum amount to shift the space.
  • percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.
Returns:

  • R_mut (1-D array) – The mutual information averaged down the rows (vertical).
  • C_mut (1-D array) – The mutual information averaged across the columns (horizontal)
  • Z_mut (1-D array) – The mutual information averaged along the depth.

mutual_information_spatial(max_lag, percent_calc=0.5, digitize=True)

Calculates the mutual information along the rows and down columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.

Parameters:
  • M (2-D array) – Input two-dimensional image.
  • max_lag (integer) – Maximum amount to shift the space.
  • percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.
Returns:

  • R_mut (1-D array) – The mutual information averaged down the rows (vertical).
  • C_mut (1-D array) – The mutual information averaged across the columns (horizontal).
  • r_mi (2-D array) – The mutual information down each row (vertical).
  • c_mi (2-D array) – The mutual information across the columns (horizontal).

class skedm.skedm.Regression(weights='uniform')

Regression using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by averaging the k nearest neighbors (predict).

Parameters:weights (str) –

How to weight the near neighbors. Options are:

  • ‘uniform’ : uniform weighting
  • ‘distance’ : weighted as 1/distance

Example

>>> X # embed time series of shape (nsamples, embedding dimension)
>>> y # future trajectory of a point of shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Regression()
>>> train_len = int(len(X)*.75) # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [1,10,20]) # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:]) # Calculate coefficient of determination
dist_calc(Xtest)

Calculates the distance from the testing set to the training set.

Parameters:Xtest (2D array) – Test features (nsamples, nfeatures).
dist_stats(nn_list)

Calculates the mean and std of the distances for the given nn_list.

Parameters:nn_list (1d array of ints) – Neighbors to have their mean distance and std returned.
Returns:
  • mean (1d array) – Mean of all the test distances corresponding to the nn_list.
  • std (1d array) – Std of all the test distances corresponding to the nn_list.
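One plausible reading of these statistics is the mean and standard deviation, across all test points, of the distance to the k-th nearest training point for each k in nn_list. The sketch below follows that reading; it is an assumption, not the library's code, and the name neighbor_distance_stats is hypothetical.

```python
import numpy as np

def neighbor_distance_stats(Xtrain, Xtest, nn_list):
    """Hypothetical sketch: mean/std of the distance to the k-th neighbor."""
    # Pairwise Euclidean distances, shape (ntest, ntrain)
    dist = np.linalg.norm(Xtest[:, None, :] - Xtrain[None, :, :], axis=2)
    dist.sort(axis=1)                   # each row sorted nearest first
    cols = np.asarray(nn_list) - 1      # the k-th neighbor sits in column k - 1
    kth = dist[:, cols]                 # shape (ntest, len(nn_list))
    return kth.mean(axis=0), kth.std(axis=0)

Xtrain = np.array([[0.0], [1.0], [3.0]])
Xtest = np.array([[0.0], [1.0]])
mean, std = neighbor_distance_stats(Xtrain, Xtest, [1, 3])
```

Growing means as k increases indicate how far the method has to reach to gather more neighbors.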
fit(Xtrain, ytrain)

Fit the training data. Can also be thought of as populating the phase space.

Parameters:
  • Xtrain (2D array) – Embed training time series. Features shape (nsamples, nfeatures).
  • ytrain (2D array) – Future trajectory of the points. Targets Shape (nsamples,ntargets).
predict(Xtest, nn_list)

Make predictions using each number of near neighbors given in nn_list.

Parameters:
  • Xtest (2d array) – Testing samples of shape (nsamples,nfeatures)
  • nn_list (1d array) – Values of Near Neighbors to use to make predictions
Returns:

ypred – Predictions for ytest of shape (nsamples, num predictions).

Return type:

2d array

predict_individual(Xtest, nn_list)

Make a prediction for each neighbor.

Parameters:
  • Xtest (2d array) – Contains the test features.
  • nn_list (1d array of ints) – Neighbors to be tested.
score(ytest, how='score')

Score the predictions.

Parameters:
  • ytest (2d array) – Target values.
  • how (str) –

    How to score the predictions. Options include:

    • ‘score’ : Coefficient of determination.
    • ‘corrcoef’ : Correlation coefficient.
Returns:

score – Scores for the corresponding near neighbors.

Return type:

2d array

utilities

skedm.utilities.class_compare(preds, actual)

Percent correct between predicted values and actual values.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values from the testing set. Shape (num samples,).
Returns:

cc – Returns the percent of predictions that match the actual values.

Return type:

float

skedm.utilities.classification_error(preds, actual)

Percent correct between predicted values and actual values scaled to the most common prediction of the space.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values of shape (num samples,).
Returns:

cc – Returns the percent correct scaled by the most common prediction.

Return type:

float

skedm.utilities.cohens_kappa(preds, actual)

Calculates Cohen’s kappa.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values from the testing set of shape (num samples,).
Returns:

c – Returns Cohen’s kappa.

Return type:

float
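Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following NumPy sketch uses that textbook definition; the function name is hypothetical and the library's own routine may differ in detail.

```python
import numpy as np

def cohens_kappa_sketch(preds, actual):
    """Textbook Cohen's kappa for two 1-D label arrays."""
    preds, actual = np.asarray(preds), np.asarray(actual)
    labels = np.union1d(preds, actual)
    p_o = np.mean(preds == actual)      # observed agreement
    # Chance agreement: product of the two marginal frequencies per label
    p_e = sum(np.mean(preds == c) * np.mean(actual == c) for c in labels)
    return float((p_o - p_e) / (1 - p_e))

kappa = cohens_kappa_sketch([0, 0, 1, 0], [0, 0, 1, 1])
perfect = cohens_kappa_sketch([0, 1, 1], [0, 1, 1])
```

The ratio is undefined when p_e equals 1, which happens if both arrays contain a single identical label.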

skedm.utilities.corrcoef(preds, actual)

Correlation coefficient between predicted values and actual values.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values from the testing set of shape (num samples,).
Returns:

cc – Returns the correlation coefficient.

Return type:

float

skedm.utilities.keep_diversity(X, thresh=1.0)

Returns a boolean mask marking the rows that are not a single class.

Parameters:
  • X (2d array of ints) – Array to evaluate for diversity
  • thresh (float) – Percent of species that need to be unique.
Returns:

keep – Array where true means there is more than one class in that row.

Return type:

1d boolean array

Examples

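>>> import numpy as np
>>> from skedm.utilities import keep_diversity
>>> x = np.array([[1, 1, 1, 1],
...               [2, 1, 2, 3],
...               [2, 2, 2, 2],
...               [3, 2, 1, 4]])
>>> keep_diversity(x)
array([False,  True, False,  True])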
skedm.utilities.kleckas_tau(preds, actual)

Calculates Klecka’s tau.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values of shape (num samples,).
Returns:

tau – Returns Klecka’s tau.

Return type:

float
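Klecka's tau measures how much better a classifier does than chance assignment: tau = (n_c - sum(p_i * n_i)) / (n - sum(p_i * n_i)), where n_c is the number of correct predictions, n_i is the number of cases in class i, and p_i is the prior probability of class i. The sketch below takes the priors from the class frequencies in actual; that choice and the function name are assumptions, and the library may set the priors differently.

```python
import numpy as np

def kleckas_tau_sketch(preds, actual):
    """Klecka's tau with priors taken from the class frequencies (assumed)."""
    preds, actual = np.asarray(preds), np.asarray(actual)
    n = actual.size
    n_correct = np.sum(preds == actual)
    _, counts = np.unique(actual, return_counts=True)   # n_i per class
    priors = counts / n                                  # p_i = n_i / n
    expected = np.sum(priors * counts)   # correct assignments expected by chance
    return float((n_correct - expected) / (n - expected))

tau = kleckas_tau_sketch([0, 0, 1, 0], [0, 0, 1, 1])
tau_perfect = kleckas_tau_sketch([0, 1, 1, 0], [0, 1, 1, 0])
```

A value of 1 means every case was classified correctly and 0 means no improvement over chance. The ratio is undefined when actual contains a single class.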

skedm.utilities.klekas_tau_spatial(X, max_lag, percent_calc=0.5)

Calculates the Klecka’s tau value between a shifted and unshifted slice of the space.

Parameters:
  • X (2D array) – Spatial image.
  • max_lag (integer) – Maximum amount to shift the space.
  • percent_calc (float) – Percent of rows and columns to average over. Using the whole space is overkill.
Returns:

  • R_mut (1-D array) – Klecka’s tau averaged down the rows (vertical).
  • C_mut (1-D array) – Klecka’s tau averaged across the columns (horizontal).
  • r_mi (2-D array) – Klecka’s tau down each row (vertical).
  • c_mi (2-D array) – Klecka’s tau across each column (horizontal).

skedm.utilities.mi_digitize(X)

Digitize a time series for mutual information analysis

Parameters:X (1D array) – Array to be digitized of length m.
Returns:Y – Digitized array of length m.
Return type:1D array
skedm.utilities.score(preds, actual)

Calculates the coefficient of determination.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,)
  • actual (1D array) – Actual values of shape (num samples,).
Returns:

cc – Returns the coefficient of determination.

Return type:

float
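The definition 1 - u/v can be checked directly in NumPy, with u the sum of squared prediction errors and v the sum of squared deviations of the actual values from their mean. The function name r2_sketch is hypothetical and the code is a direct transcription of the formula, not the library's source.

```python
import numpy as np

def r2_sketch(preds, actual):
    """Coefficient of determination, 1 - u/v."""
    preds = np.asarray(preds, dtype=float)
    actual = np.asarray(actual, dtype=float)
    u = ((actual - preds) ** 2).sum()            # squared prediction errors
    v = ((actual - actual.mean()) ** 2).sum()    # squared deviations from the mean
    return float(1 - u / v)

actual = np.array([1.0, 2.0, 3.0, 4.0])
r2 = r2_sketch([1.1, 1.9, 3.2, 3.8], actual)
r2_mean = r2_sketch(np.full(4, actual.mean()), actual)
```

Predicting the mean of the actual values for every sample scores 0, and predictions worse than that score below 0. The ratio is undefined when the actual values are constant.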

skedm.utilities.variance_explained(preds, actual)

Explained variance between predicted values and actual values.

Parameters:
  • preds (1D array) – Predicted values of shape (num samples,).
  • actual (1D array) – Actual values of shape (num samples,).
Returns:

cc – Returns the explained variance.

Return type:

float
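Explained variance is commonly defined as 1 - Var(actual - preds) / Var(actual). The sketch below assumes that definition, which may not match the library's implementation exactly, and uses a hypothetical function name.

```python
import numpy as np

def variance_explained_sketch(preds, actual):
    """Common definition: 1 - Var(residuals) / Var(actual)."""
    preds = np.asarray(preds, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(1 - np.var(actual - preds) / np.var(actual))

actual = np.array([1.0, 2.0, 3.0, 4.0])
ev_biased = variance_explained_sketch(actual + 1.0, actual)  # constant offset
ev_mean = variance_explained_sketch(np.full(4, 2.5), actual)  # predicts the mean
```

Under this definition a constant offset in the predictions is not penalized, because the residuals have zero variance. The coefficient of determination would penalize the same offset.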

skedm.utilities.weighted_mean(X, distances)

Calculates the weighted mean given a set of values and their corresponding distances.

Only 1/distance is implemented. This essentially is just a weighted mean down axis=1.

Parameters:
  • X (2d array) – Training values of shape(nsamples,number near neighbors).
  • distances (2d array) – Sorted distances to the near neighbors for the indices. Shape(nsamples,number near neighbors).
Returns:

w_mean – Weighted predictions.

Return type:

2d array
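A 1/distance weighted mean down axis=1 can be sketched as follows. The function name is hypothetical, and unlike the documented return type this simplified version returns one value per sample as a 1-D array.

```python
import numpy as np

def weighted_mean_sketch(X, distances):
    """Inverse-distance weighted mean across the neighbor axis (axis=1)."""
    X = np.asarray(X, dtype=float)
    w = 1.0 / np.asarray(distances, dtype=float)    # closer neighbors weigh more
    return (X * w).sum(axis=1) / w.sum(axis=1)

X = np.array([[1.0, 3.0],
              [1.0, 3.0]])
distances = np.array([[1.0, 1.0],     # equal distances: plain mean
                      [1.0, 3.0]])    # second neighbor is three times farther
w_mean = weighted_mean_sketch(X, distances)
```

A distance of exactly zero would cause a division by zero here, so a test point that coincides with a training point needs separate handling.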

skedm.utilities.weighted_mode(a, w, axis=0)

This function is borrowed from scikit-learn’s extmath.py.

Returns an array of the weighted modal (most common) value in a.

If there is more than one such value, only the first is returned. The bin-count for the modal bins is also returned.

This is an extension of the algorithm in scipy.stats.mode.

Parameters:
  • a (array_like) – n-dimensional array of which to find mode(s).
  • w (array_like) – n-dimensional array of weights for each value.
  • axis (int, optional) – Axis along which to operate. Default is 0, i.e. the first axis.
Returns:

  • vals (ndarray) – Array of modal values.
  • score (ndarray) – Array of weighted counts for each mode.

Examples

>>> from sklearn.utils.extmath import weighted_mode
>>> x = [4, 1, 4, 2, 4, 2]
>>> weights = [1, 1, 1, 1, 1, 1]
>>> weighted_mode(x, weights)
(array([ 4.]), array([ 3.]))

The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.

>>> weights = [1, 3, 0.5, 1.5, 1, 2] # deweight the 4's
>>> weighted_mode(x, weights)
(array([ 2.]), array([ 3.5]))

The value 2 has the highest score: it appears twice, with weights of 1.5 and 2, and the sum of these is 3.5.

See also

scipy.stats.mode()