Module Reference¶

class skedm.skedm.Classification(weights='uniform')¶

Classification using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by averaging the k nearest neighbors (predict).

Parameters:

weights (str) –

Procedure to weight the near neighbors. Options:

‘uniform’ : uniform weighting
‘distance’ : weighted as 1/distance

Example

>>> X # embed series of shape (nsamples, embedding dimension)
>>> y # future trajectory of a point of shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Classification()
>>> train_len = int(len(X)*.75) # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [0,10,20]) # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:]) # calculate Klecka's tau

dist_calc(Xtest)¶

Calculates the distance from the testing set to the training set.

Parameters:	Xtest (2d array) – Test features (nsamples, nfeatures).

dist_stats(nn_list)¶

Returns the mean and std of the distances for the given nn_list.

fit(Xtrain, ytrain)¶

Fit the training data. Can also be thought of as reconstructing the attractor.

Parameters:	Xtrain (2D array) – Features of shape (nsamples,nfeatures). ytrain (2D array) – Targets of shape (nsamples,ntargets).

predict(Xtest, nn_list)¶

Make a prediction for each number of near neighbors given in nn_list.

Parameters:	Xtest (2d array) – Contains the test features. nn_list (1d array of ints) – Neighbors to be tested.
Returns:	Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.
Return type:	list

predict_individual(Xtest, nn_list)¶

Make a prediction for each neighbor.

Parameters:	Xtest (2d array) – Contains the test features. nn_list (1d array of ints) – Neighbors to be tested.
Returns:	Ypred – Predictions returned for each nn value in nn_list. It is the same length as nn_list.
Return type:	list

score(ytest, how='tau')¶

Evaluate the predictions.

Parameters:	ytest (2d array) – Contains the target values. how (str) – How to score the predictions. Possible values:

‘compare’ : Percent correctly predicted. For more info, see utilities.class_compare.
‘error’ : Percent correct scaled by the most common prediction of the series. See utilities.classification_error for more.
‘tau’ : Klecka’s tau.
Returns:	scores – Scores for the predicted values. Shape (len(nn_list),num_preds)
Return type:	2d array

class skedm.skedm.Embed(X)¶

Embed a 1d, 2d, or 3d array in n-dimensional space. Assists in choosing an embedding dimension and a lag value.

Parameters:	X (1d, 2d, or 3d array) – Array to be embedded in n-dimensional space.

embed_vectors_1d(lag, embed, predict)¶

Embeds vectors from a one dimensional array in m-dimensional space.

Parameters:

X (array) – A 1-D array representing the training or testing set. It is supplied when the Embed object is created rather than passed to this method.
lag (int) – Lag values as calculated from the first minimum of the mutual info.
embed (int) – Embedding dimension. How many lag values to take.
predict (int) – Distance to forecast (see example).

Returns:

features (array of shape [num_vectors,embed]) – A 2-D array containing all of the embedded vectors.
targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.

Example

>>> import numpy as np
>>> import skedm as edm
>>> X = np.array([0,1,2,3,4,5,6,7,8,9,10])
>>> E = edm.Embed(X)
>>> embed = 3
>>> lag = 2
>>> predict = 3
>>> features, targets = E.embed_vectors_1d(lag, embed, predict)
>>> features # [[0,2,4], [1,3,5], [2,4,6], [3,5,7]]
>>> targets # [[5,6,7], [6,7,8], [7,8,9], [8,9,10]]
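
The indexing behind the example above can be reproduced in plain Python. This is a minimal sketch of lag embedding and not skedm's implementation; the helper name lag_embed_1d is hypothetical and is used here only to show which samples end up in each feature and target vector.

```python
def lag_embed_1d(series, lag, embed, predict):
    """Hypothetical helper: lagged feature vectors and the values that follow them."""
    span = (embed - 1) * lag                  # offset from first to last feature sample
    n_vectors = len(series) - span - predict  # vectors that still have a full target
    features, targets = [], []
    for start in range(n_vectors):
        last = start + span
        # Take `embed` samples spaced `lag` apart.
        features.append([series[start + k * lag] for k in range(embed)])
        # The next `predict` samples after the last feature sample.
        targets.append([series[last + step] for step in range(1, predict + 1)])
    return features, targets


X = list(range(11))
features, targets = lag_embed_1d(X, lag=2, embed=3, predict=3)
print(features)
print(targets)
```

Each feature vector spans (embed - 1) * lag samples, so a series of length 11 yields 11 - 4 - 3 = 4 complete feature/target pairs.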

embed_vectors_2d(lag, embed, predict, percent=0.1)¶

Embeds vectors from a two dimensional image in m-dimensional space.

Parameters:

X (array) – A 2-D array representing the training set or testing set.
lag (tuple of ints (r,c)) – Row and column lag values; (r,c) can be thought of as (height,width).
embed (tuple of ints (r,c)) – Row and column embedding shape; (r,c) can be thought of as (height,width). c must be odd.
predict (int) – Distance in the space to forecast (see example).
percent (float (default = 0.1)) – Percent of the space to embed. Used for computational efficiency.

Returns:

features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors.
targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors.

Example

>>> lag = (3,4)
>>> embed = (2,5)
>>> predict = 2
>>> features, targets = E.embed_vectors_2d(lag, embed, predict) # E is an Embed instance built from a 2-D array

Notes

The embed space above looks like the following:

[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
 |         |         |         |         |
 |         |         |         |         |
[f] _ _ _ [f] _ _ _ [f] _ _ _ [f] _ _ _ [f]
                    [t]
                    [t]
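
The layout in the diagram can be written down as (row, column) offsets from the top-left [f] cell. The sketch below is only an illustration of that geometry; feature_offsets is a hypothetical helper, and the target placement (center column, rows immediately below the last feature row) is an assumption read off the diagram rather than taken from skedm's code.

```python
def feature_offsets(lag, embed):
    """Hypothetical helper: (row, col) offset of every [f] cell from the top-left one."""
    r_lag, c_lag = lag
    r_em, c_em = embed
    return [(r * r_lag, c * c_lag) for r in range(r_em) for c in range(c_em)]


lag, embed, predict = (3, 4), (2, 5), 2
offsets = feature_offsets(lag, embed)

# Assumption: targets sit in the center column, just below the last feature row.
last_row = (embed[0] - 1) * lag[0]
center_col = (embed[1] // 2) * lag[1]
target_offsets = [(last_row + step, center_col) for step in range(1, predict + 1)]

print(offsets)
print(target_offsets)
```

An odd column count guarantees that a single center column exists, which is consistent with the requirement that c be odd.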

embed_vectors_3d(lag, embed, predict, percent=0.1)¶

Embeds vectors from a 3-dimensional matrix in n-dimensional space.

Parameters:

X (array) – A 3-D array representing the training set or testing set.
lag (tuple of ints (r,c)) – Row and column lag values; (r,c) can be thought of as (height,width).
embed (tuple of ints (r,c,t)) – Row, column, and time embedding shape; (r,c,t) can be thought of as (height,width,time). c must be odd.
predict (int) – Distance in the space to forecast (see example).
percent (float (default = 0.1)) – Percent of the space to embed. Used for computational efficiency.

Returns:

features (array of shape [num_vectors,r*c]) – A 2-D array containing all of the embedded vectors
targets (array of shape [num_vectors,predict]) – A 2-D array containing the evolution of the embedded vectors

Example

>>> lag = (3,4,2) #height,width,time
>>> embed = (3,3)
>>> predict = 2
>>> features, targets = E.embed_vectors_3d(lag, embed, predict) # E is an Embed instance built from a 3-D array

Notes

The above example would look like the following:

[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]
 |         |         |
 |         |         |
[f] _ _ _ [f] _ _ _ [f]

The targets would be directly below the center [f].

mutual_information(max_lag)¶

Calculates the mutual information between a time series and a shifted version of itself. Uses numpy’s mutual information for the calculation.

Parameters:	max_lag (int) – Maximum amount to shift the time series.
Returns:	mi – Mutual information values for every shift value. Shape (max_lag,).
Return type:	1d array
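
As a rough illustration of the quantity being computed, the sketch below estimates mutual information for an already-discretized series from co-occurrence counts. It is written in plain Python and is not skedm's implementation; the function names and the choice of shifts 1 through max_lag are assumptions made for the example.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # sum over pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def lagged_mutual_info(series, max_lag):
    """MI between the series and itself shifted by 1..max_lag samples (assumed range)."""
    return [mutual_info(series[:-lag], series[lag:]) for lag in range(1, max_lag + 1)]


series = [0, 1, 2, 3] * 10   # toy series with period 4
mi = lagged_mutual_info(series, max_lag=4)
print(mi)
```

In practice the lag is usually chosen at the first minimum of this curve, as noted in the description of the lag parameter for embed_vectors_1d.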

mutual_information_3d(max_lag, percent_calc=0.5, digitize=True)¶

Calculates the mutual information along the rows and down the columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.

Parameters:

M (3-D array) – Input three-dimensional array.
max_lag (integer) – Maximum amount to shift the space.
percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.

Returns:

R_mut (1-D array) – The mutual information averaged down the rows (vertical).
C_mut (1-D array) – The mutual information averaged across the columns (horizontal)
Z_mut (1-D array) – The mutual information averaged along the depth.

mutual_information_spatial(max_lag, percent_calc=0.5, digitize=True)¶

Calculates the mutual information along the rows and down the columns at a certain number of indices (percent_calc) and returns the sum of the mutual information along the columns and along the rows.

Parameters:

M (2-D array) – Input two-dimensional image.
max_lag (integer) – Maximum amount to shift the space.
percent_calc (float) – Percent of rows and columns to use for the mutual information calculation.

Returns:

R_mut (1-D array) – The mutual information averaged down the rows (vertical).
C_mut (1-D array) – The mutual information averaged across the columns (horizontal).
r_mi (2-D array) – The mutual information down each row (vertical).
c_mi (2-D array) – The mutual information across the columns (horizontal).

class skedm.skedm.Regression(weights='uniform')¶

Regression using a k-nearest neighbors method. Predictions can be made for each nearest neighbor (predict_individual) or by averaging the k nearest neighbors (predict).

Parameters:

weights (str) –

How to weight the near neighbors. Options are:

‘uniform’ : uniform weighting

‘distance’ : weighted as 1/distance

Example

>>> X # embed time series of shape (nsamples, embedding dimension)
>>> y # future trajectory of a point of shape (nsamples, num predictions)
>>> import skedm as edm
>>> R = edm.Regression()
>>> train_len = int(len(X)*.75) # train on 75 percent
>>> R.fit(X[0:train_len], y[0:train_len])
>>> preds = R.predict(X[train_len:], [0,10,20]) # test at 1, 10, and 20 nn
>>> score = R.score(y[train_len:]) # calculate coefficient of determination

dist_calc(Xtest)¶

Calculates the distance from the testing set to the training set.

Parameters:	Xtest (2D array) – Test features (nsamples, nfeatures).

dist_stats(nn_list)¶

Calculates the mean and std of the distances for the given nn_list.

Parameters:	nn_list (1d array of ints) – Neighbors to have their mean distance and std returned.
Returns:	mean (1d array) – Mean of the all the test distances corresponding to the nn_list. std (1d array) – Std of all the test distances corresponding to the nn_list.

fit(Xtrain, ytrain)¶

Fit the training data. Can also be thought of as populating the phase space.

Parameters:	Xtrain (2D array) – Embed training time series. Features shape (nsamples, nfeatures). ytrain (2D array) – Future trajectory of the points. Targets Shape (nsamples,ntargets).

predict(Xtest, nn_list)¶

Make a prediction for each number of near neighbors given in nn_list.

Parameters:	Xtest (2d array) – Testing samples of shape (nsamples,nfeatures). nn_list (1d array) – Numbers of near neighbors to use to make predictions.
Returns:	ypred – Predictions for ytest of shape(nsamples,num predictions).
Return type:	2d array
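
To make the averaging step concrete, here is a minimal brute-force sketch of uniform-weight nearest-neighbor forecasting in plain Python. It is an illustration under stated assumptions, not skedm's code: knn_predict is a hypothetical helper, distances are Euclidean, and the result is returned as one list of predictions per entry in nn_list.

```python
from math import dist


def knn_predict(Xtrain, ytrain, Xtest, nn_list):
    """Hypothetical helper: uniform-weight k-NN forecasts for each k in nn_list."""
    n_targets = len(ytrain[0])
    all_preds = []
    for k in nn_list:
        preds_k = []
        for x in Xtest:
            # Indices of the k training points closest to x.
            nearest = sorted(range(len(Xtrain)), key=lambda i: dist(x, Xtrain[i]))[:k]
            # Average each target column over those neighbors.
            preds_k.append([sum(ytrain[i][t] for i in nearest) / k
                            for t in range(n_targets)])
        all_preds.append(preds_k)
    return all_preds


Xtrain = [[0.0], [1.0], [2.0], [10.0]]
ytrain = [[0.0], [1.0], [5.0], [10.0]]
preds = knn_predict(Xtrain, ytrain, Xtest=[[1.1]], nn_list=[1, 3])
print(preds)
```

With one neighbor the forecast copies the closest training target; with three it averages the targets of the three closest points, which is why the two entries differ.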

predict_individual(Xtest, nn_list)¶

Make a prediction for each neighbor.

Parameters:	Xtest (2d array) – Contains the test features. nn_list (1d array of ints) – Neighbors to be tested.

score(ytest, how='score')¶

Score the predictions.

Parameters:	ytest (2d array) – Target values. how (str) – How to score the predictions. Options include:

‘score’ : Coefficient of determination.
‘corrcoef’ : Correlation coefficient.
Returns:	score – Scores for the corresponding near neighbors.
Return type:	2d array

utilities¶

skedm.utilities.class_compare(preds, actual)¶

Percent correct between predicted values and actual values.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values from the testing set. Shape (num samples,).
Returns:	cc – Returns the percent of predictions that match the actual values.
Return type:	float

skedm.utilities.classification_error(preds, actual)¶

Percent correct between predicted values and actual values scaled to the most common prediction of the space.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values of shape (num samples,).
Returns:	cc – Returns the percent correct, scaled by the most common prediction.
Return type:	float

skedm.utilities.cohens_kappa(preds, actual)¶

Calculates Cohen’s kappa.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values from the testing set, of shape (num samples,).
Returns:	c – Returns Cohen’s kappa.
Return type:	float
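
For reference, Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The plain-Python sketch below follows that textbook definition; it is not taken from skedm's source, and the name cohens_kappa_sketch is hypothetical.

```python
from collections import Counter


def cohens_kappa_sketch(preds, actual):
    """Textbook Cohen's kappa for two label sequences of equal length."""
    n = len(actual)
    p_observed = sum(p == a for p, a in zip(preds, actual)) / n
    pred_counts, actual_counts = Counter(preds), Counter(actual)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_expected = sum(pred_counts[c] * actual_counts[c] for c in actual_counts) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


preds = [0, 0, 1, 1, 1, 0]
actual = [0, 0, 1, 1, 0, 1]
kappa = cohens_kappa_sketch(preds, actual)
print(kappa)
```

Here four of six labels agree (p_o = 2/3) and the balanced marginals give p_e = 1/2, so kappa is 1/3.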

skedm.utilities.corrcoef(preds, actual)¶

Correlation coefficient between predicted values and actual values.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values from the testing set of shape (num samples,).
Returns:	cc – Returns the correlation coefficient.
Return type:	float
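
Assuming the correlation coefficient meant here is Pearson's r, it can be sketched in plain Python as the covariance divided by the product of the standard deviations. The helper name pearson_r is hypothetical and the code is illustrative only.

```python
from math import sqrt


def pearson_r(preds, actual):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(actual)
    mean_p, mean_a = sum(preds) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(preds, actual))
    spread_p = sqrt(sum((p - mean_p) ** 2 for p in preds))
    spread_a = sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (spread_p * spread_a)


r_up = pearson_r([1, 2, 3], [2, 4, 6])     # perfectly correlated
r_down = pearson_r([1, 2, 3], [3, 2, 1])   # perfectly anti-correlated
print(r_up, r_down)
```

Because r only measures linear association, predictions that are a scaled or shifted copy of the actual values still score 1.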

skedm.utilities.keep_diversity(X, thresh=1.0)¶

Returns a boolean mask marking the rows that contain more than one class.

Parameters:	X (2d array of ints) – Array to evaluate for diversity. thresh (float) – Percent of species that need to be unique.
Returns:	keep – Array where True means there is more than one class in that row.
Return type:	1d boolean array

Examples

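>>> import numpy as np
>>> from skedm.utilities import keep_diversity
>>> x = np.array([[1, 1, 1, 1],
...               [2, 1, 2, 3],
...               [2, 2, 2, 2],
...               [3, 2, 1, 4]])
>>> keep_diversity(x)
array([False,  True, False,  True])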

skedm.utilities.kleckas_tau(preds, actual)¶

Calculates Klecka’s tau.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values of shape (num samples,).
Returns:	tau – Returns Klecka’s tau.
Return type:	float
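
Klecka's tau measures how much better a classifier does than chance assignment. The sketch below uses the equal-prior form, tau = (n_correct - N/k) / (N - N/k) for N samples and k classes. Whether skedm uses equal priors is an assumption, and kleckas_tau_sketch is a hypothetical name for this illustration.

```python
def kleckas_tau_sketch(preds, actual):
    """Klecka's tau assuming every class is equally likely a priori."""
    n = len(actual)
    n_correct = sum(p == a for p, a in zip(preds, actual))
    n_classes = len(set(actual))
    chance = n / n_classes            # expected number of hits from random assignment
    return (n_correct - chance) / (n - chance)


preds = [0, 1, 2, 2, 1, 0]
actual = [0, 1, 2, 0, 1, 2]
tau = kleckas_tau_sketch(preds, actual)
print(tau)
```

A value of 1 means every prediction is correct, 0 means no better than chance, and negative values mean worse than chance.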

skedm.utilities.klekas_tau_spatial(X, max_lag, percent_calc=0.5)¶

Calculates Klecka’s tau between a shifted and unshifted slice of the space.

Parameters:

X (2D array) – Spatial image.
max_lag (integer) – Maximum amount to shift the space.
percent_calc (float) – Percent of rows and columns to average over. Using the whole space is overkill.

Returns:

R_mut (1D array) – Klecka’s tau averaged down the rows (vertical).
C_mut (1-D array) – Klecka’s tau averaged across the columns (horizontal).
r_mi (2-D array) – Klecka’s tau down each row (vertical).
c_mi (2-D array) – Klecka’s tau across each column (horizontal).

skedm.utilities.mi_digitize(X)¶

Digitize a time series for mutual information analysis

Parameters:	X (1D array) – Array to be digitized of length m.
Returns:	Y – Digitized array of length m.
Return type:	1D array

skedm.utilities.score(preds, actual)¶

Calculates the coefficient of determination.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.

Parameters:	preds (1D array) – Predicted values of shape (num samples,). actual (1D array) – Actual values of shape (num samples,).
Returns:	cc – Returns the coefficient of determination.
Return type:	float
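
The definition can be checked by hand with a few values. The following plain-Python sketch computes 1 - u/v, with u the sum of squared prediction errors and v the sum of squared deviations of the actual values from their mean. The function name r_squared is hypothetical and the code only illustrates the formula.

```python
def r_squared(preds, actual):
    """Coefficient of determination, 1 - u/v."""
    mean_actual = sum(actual) / len(actual)
    u = sum((a - p) ** 2 for a, p in zip(actual, preds))   # squared prediction errors
    v = sum((a - mean_actual) ** 2 for a in actual)        # squared deviations from mean
    return 1 - u / v


actual = [1.0, 2.0, 3.0, 4.0]
r2_partial = r_squared([1.0, 2.0, 3.0, 2.0], actual)   # one poor prediction
r2_mean = r_squared([2.5, 2.5, 2.5, 2.5], actual)      # always predict the mean
print(r2_partial, r2_mean)
```

Always predicting the mean of the actual values scores 0, and predictions worse than that give a negative score.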

skedm.utilities.variance_explained(preds, actual)¶

Explained variance between predicted values and actual values.

Parameters:	preds (1D array) – Predict values of shape (num samples,). actual (1D array) – Actual values of shape (num samples,).
Returns:	cc – Returns the explained variance.
Return type:	float

skedm.utilities.weighted_mean(X, distances)¶

Calculates the weighted mean given a set of values and their corresponding distances.

Only 1/distance is implemented. This essentially is just a weighted mean down axis=1.

Parameters:	X (2d array) – Training values of shape(nsamples,number near neighbors). distances (2d array) – Sorted distances to the near neighbors for the indices. Shape(nsamples,number near neighbors).
Returns:	w_mean – Weighted predictions.
Return type:	2d array
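
A small worked example may help with the 1/distance weighting. The sketch below is a plain-Python illustration rather than skedm's code; inverse_distance_mean is a hypothetical helper, it returns one value per row, and it does not handle a distance of exactly zero.

```python
def inverse_distance_mean(values, distances):
    """Hypothetical helper: mean of each row of values, weighted by 1/distance."""
    out = []
    for row, d_row in zip(values, distances):
        weights = [1.0 / d for d in d_row]      # closer neighbors weigh more
        out.append(sum(w * v for w, v in zip(weights, row)) / sum(weights))
    return out


values = [[1.0, 3.0], [2.0, 4.0]]        # neighbor values for two test samples
distances = [[1.0, 1.0], [1.0, 3.0]]     # matching neighbor distances
weighted = inverse_distance_mean(values, distances)
print(weighted)
```

In the second row the neighbor at distance 3 receives a third of the weight of the neighbor at distance 1, so the result (2.5) is pulled toward the closer value.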

skedm.utilities.weighted_mode(a, w, axis=0)¶

This function is borrowed from scikit-learn’s extmath.py.

Returns an array of the weighted modal (most common) value in a.

If there is more than one such value, only the first is returned. The bin-count for the modal bins is also returned.

This is an extension of the algorithm in scipy.stats.mode.

Parameters:

a (array_like) – n-dimensional array of which to find mode(s).
w (array_like) – n-dimensional array of weights for each value.
axis (int, optional) – Axis along which to operate. Default is 0, i.e. the first axis.

Returns:

vals (ndarray) – Array of modal values.
score (ndarray) – Array of weighted counts for each mode.

Examples

>>> from sklearn.utils.extmath import weighted_mode
>>> x = [4, 1, 4, 2, 4, 2]
>>> weights = [1, 1, 1, 1, 1, 1]
>>> weighted_mode(x, weights)
(array([ 4.]), array([ 3.]))

The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.

>>> weights = [1, 3, 0.5, 1.5, 1, 2] # deweight the 4's
>>> weighted_mode(x, weights)
(array([ 2.]), array([ 3.5]))

The value 2 has the highest score: it appears twice with weights of 1.5 and 2: the sum of these is 3.5.
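
The same arithmetic can be followed in a one-dimensional plain-Python sketch that sums the weights per value and picks the largest total. This is an illustration only, not the scikit-learn routine; weighted_mode_1d is a hypothetical name, and breaking ties in favor of the smallest value is an assumption of the sketch.

```python
from collections import defaultdict


def weighted_mode_1d(values, weights):
    """Hypothetical helper: value with the largest total weight, and that total."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[v] += w
    best = max(totals.values())
    # Assumption: ties are broken in favor of the smallest value.
    mode = min(v for v, total in totals.items() if total == best)
    return mode, best


x = [4, 1, 4, 2, 4, 2]
uniform = weighted_mode_1d(x, [1, 1, 1, 1, 1, 1])
deweighted = weighted_mode_1d(x, [1, 3, 0.5, 1.5, 1, 2])
print(uniform, deweighted)
```

With the second set of weights the totals are 2.5 for the value 4, 3 for the value 1, and 3.5 for the value 2, so 2 is returned.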