crunchers.sklearn_helpers package

Submodules

crunchers.sklearn_helpers.assessment module

Provide helper functions for working with scikit-learn based objects.

crunchers.sklearn_helpers.assessment.confusion_matrix_to_pandas(cm, labels)

Return the confusion matrix as a pandas dataframe.

It is created from the confusion matrix stored in cm with rows and columns labeled with labels.
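As a rough illustration of the behaviour described above, the following sketch wraps a confusion matrix in a DataFrame. It is not the package's implementation: `cm_to_frame` is a hypothetical stand-in, and using the same `labels` sequence for both axes is assumed from the description.

```python
import numpy as np
import pandas as pd


def cm_to_frame(cm, labels):
    """Hypothetical stand-in: wrap a square confusion matrix in a DataFrame."""
    # Assumption: the same labels name both the rows and the columns.
    return pd.DataFrame(np.asarray(cm), index=labels, columns=labels)


cm = [[5, 1],
      [2, 7]]
cm_df = cm_to_frame(cm, labels=["cat", "dog"])
```

Cells can then be looked up by label, for example `cm_df.loc["cat", "dog"]`.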

crunchers.sklearn_helpers.assessment.normalize_confusion_matrix(cm)

Return the confusion matrix with values expressed as fractions of outcomes rather than raw counts.
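One common way to turn a confusion matrix into fractions is to divide each row by its total. The sketch below assumes that row-wise convention, which the description does not confirm; `normalize_rows` is a hypothetical name, not the package function.

```python
import numpy as np


def normalize_rows(cm):
    """Hypothetical stand-in: convert cell values to per-row fractions."""
    cm = np.asarray(cm, dtype=float)
    # Assumption: each row is divided by its own total, so rows sum to 1.
    return cm / cm.sum(axis=1, keepdims=True)


cm_norm = normalize_rows([[5, 5], [2, 6]])
```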

crunchers.sklearn_helpers.assessment.plot_confusion_matrix(cm, labels=None, cmap='Blues', title=None, norm=False, context=None, annot=True)

Plot and return the confusion matrix heatmap figure.
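For orientation, a confusion matrix heatmap can be drawn with plain matplotlib along the following lines. This is only a sketch under assumptions: `plot_cm_sketch` is a hypothetical function, the plotting backend the package actually uses is not stated here, and the `norm` and `context` parameters are not modelled.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_cm_sketch(cm, labels, cmap="Blues", title=None, annot=True):
    """Hypothetical stand-in: draw a confusion matrix as a heatmap."""
    cm = np.asarray(cm)
    fig, ax = plt.subplots()
    image = ax.imshow(cm, cmap=cmap)
    fig.colorbar(image, ax=ax)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    if title is not None:
        ax.set_title(title)
    if annot:
        # Write each cell's value at its centre.
        for i in range(cm.shape[0]):
            for j in range(cm.shape[1]):
                ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    return fig


fig = plot_cm_sketch([[5, 1], [2, 7]], labels=["cat", "dog"], title="Example")
```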

crunchers.sklearn_helpers.exploration module

Provide functions that help quickly explore datasets with sklearn.

class crunchers.sklearn_helpers.exploration.KMeansReport(data, n_clusters, seed=None, n_jobs=-1, palette='deep')

Bases: object

Manage KMeans Clustering and exploration of results.

cluster()

Fit each estimator.

eval_silhouette(verbose=True)

Evaluate each estimator via silhouette score.

init_estimators()

Set up and return dictionary of estimators with key = n_clusters.
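The three steps described above (build one estimator per cluster count, fit each, score each by silhouette) can be sketched directly with scikit-learn. This does not use KMeansReport itself; the toy data, the variable names, and the reading of `n_clusters` as an iterable of candidate counts are all assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data; the blob settings are illustrative only.
data, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Assumption: n_clusters is an iterable of candidate cluster counts.
candidate_ks = [2, 3, 4]
estimators = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0) for k in candidate_ks
}

scores = {}
for k, estimator in estimators.items():
    labels = estimator.fit_predict(data)        # fit each estimator
    scores[k] = silhouette_score(data, labels)  # mean silhouette coefficient
```

Comparing the values in `scores` is one way to choose among the candidate cluster counts.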

plot_silhouette_results(feature_names=None, feature_space=None)

Plot silhouette results in the style of the scikit-learn silhouette analysis example linked below.

http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

class crunchers.sklearn_helpers.exploration.PCAReport(data, pca=None, n_components=None, data_labels=None, color_palette=None, label_colors=None, name=None)

Bases: object

Manage PCA and exploration of results.

filter_by_loadings(kind, column, hi_thresh, lo_thresh)

Return the index of row names whose loadings pass the thresholds.

kind (str): either 'pearsonr' or 'spearmanr'

column (str): which PC column to filter

hi_thresh (float): retain rows with values >= hi_thresh

lo_thresh (float): retain rows with values <= lo_thresh
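To make the threshold logic concrete, here is a minimal sketch on a hand-made loadings table. `filter_rows`, the table contents, and the choice to combine the two conditions with OR are assumptions; the description lists the two retention rules without saying how they interact.

```python
import pandas as pd

# Hypothetical loadings table: rows are features, columns are PCs.
loadings = pd.DataFrame(
    {"PC1": [0.9, -0.8, 0.1], "PC2": [0.2, 0.3, -0.7]},
    index=["height", "weight", "age"],
)


def filter_rows(loadings, column, hi_thresh, lo_thresh):
    """Hypothetical stand-in: keep rows at or beyond either threshold."""
    values = loadings[column]
    # Assumption: the two conditions are combined with OR.
    keep = (values >= hi_thresh) | (values <= lo_thresh)
    return values[keep].index


selected = filter_rows(loadings, "PC1", hi_thresh=0.5, lo_thresh=-0.5)
```

Here `selected` holds the row names with a strongly positive or strongly negative value in the chosen PC column.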

get_loading_corr(kind='pearsonr')

Return a dataframe of correlation-based "loadings" computed with the correlation given by kind.
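One plausible reading of correlation-based loadings is the correlation between each original feature and each principal component. The sketch below follows that reading as an assumption; `loading_corr`, the random data, and the column names are hypothetical, and Spearman correlation is computed as Pearson correlation on ranks.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

pcs = pd.DataFrame(
    PCA(n_components=2).fit_transform(data),
    index=data.index,
    columns=["PC1", "PC2"],
)


def loading_corr(data, pcs, kind="pearsonr"):
    """Hypothetical stand-in: correlate each feature with each PC."""
    if kind == "spearmanr":
        # Spearman correlation is Pearson correlation computed on ranks.
        data, pcs = data.rank(), pcs.rank()
    elif kind != "pearsonr":
        raise ValueError("kind must be 'pearsonr' or 'spearmanr'")
    return pd.DataFrame({pc: data.corrwith(pcs[pc]) for pc in pcs.columns})


loadings = loading_corr(data, pcs, kind="pearsonr")
```

The result has one row per feature and one column per PC.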

get_pcs(rerun=True)

Fit and transform the data with the local PCA object; store the results in self.pcs.

n_components

Provide access to the number of PCs.

plot_pcs(components=None, label_colors=None, diag='kde', diag_kws=None, **kwargs)

Plot scatter-plots below the diagonal and density plots on the diagonal.

components (list): list of components to plot

label_colors (dict): mapping from label to color, e.g. {'label1': 'g', 'label2': 'r', 'label3': 'b'}

plot_variance_accumulation(thresh=6, verbose=False)

Plot variance accumulation over PCs.

plot_variance_decay(thresh=6, verbose=False)

Plot variance decay over PCs.
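The quantities behind the two variance plots can be computed from a fitted scikit-learn PCA object, as in the sketch below. It assumes that "decay" refers to the per-component explained variance ratio and "accumulation" to its running sum. The `thresh` and `verbose` parameters are not documented here, so they are not modelled, and the random data is illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))

pca = PCA().fit(data)

# Assumption: "decay" is the share of variance explained by each PC.
variance_decay = pca.explained_variance_ratio_

# Assumption: "accumulation" is the running total of that share over PCs.
variance_accumulation = np.cumsum(variance_decay)
```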

crunchers.sklearn_helpers.misc module

Collect misc sklearn helpers here.

crunchers.sklearn_helpers.misc.repandasify(array, y_names, X_names=None)

Convert numpy array into pandas dataframe using provided index and column names.
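A minimal sketch of this kind of conversion follows. `to_frame` is a hypothetical stand-in rather than the package function, and treating `y_names` as the row index and `X_names` as the column names is an assumption based on the parameter names.

```python
import numpy as np
import pandas as pd


def to_frame(array, y_names, X_names=None):
    """Hypothetical stand-in: label a 2-D array with row and column names."""
    # Assumption: y_names label the rows and X_names label the columns.
    return pd.DataFrame(np.asarray(array), index=y_names, columns=X_names)


frame = to_frame(
    [[1.0, 2.0], [3.0, 4.0]],
    y_names=["s1", "s2"],
    X_names=["f1", "f2"],
)
```

This is useful after scikit-learn transformers that return bare numpy arrays, since the row and column labels are otherwise lost.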

Module contents