crunchers.sklearn_helpers package


crunchers.sklearn_helpers.assessment module

Provide helper functions for working with scikit-learn based objects.

crunchers.sklearn_helpers.assessment.confusion_matrix_to_pandas(cm, labels)

Return the confusion matrix as a pandas dataframe.

The dataframe is built from the confusion matrix in cm, with both rows and columns labeled by labels.
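As a rough illustration of the labeling described above, the following sketch wraps a small matrix in a pandas dataframe. The function name cm_to_frame and the sample values are hypothetical; this is not the package's own implementation.

```python
import numpy as np
import pandas as pd


def cm_to_frame(cm, labels):
    """Hypothetical stand-in: wrap a square confusion matrix in a labeled DataFrame."""
    return pd.DataFrame(np.asarray(cm), index=labels, columns=labels)


# Rows are true classes and columns are predicted classes (scikit-learn's convention).
cm = np.array([[5, 1],
               [2, 7]])
cm_df = cm_to_frame(cm, labels=["cat", "dog"])
print(cm_df)
```

Looking up cm_df.loc["cat", "dog"] then gives the count of true "cat" samples predicted as "dog".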


Return confusion matrix with values as fractions of outcomes instead of specific cases.
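One plausible reading of "fractions of outcomes" is that each row is divided by its row total, so every true class sums to 1. The sketch below assumes that reading; the helper name normalize_rows is hypothetical and the package may normalize differently.

```python
import numpy as np


def normalize_rows(cm):
    """Hypothetical helper: divide each row by its total so each row sums to 1."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1, keepdims=True)
    # Leave all-zero rows as zeros instead of dividing by zero.
    return np.divide(cm, row_totals, out=np.zeros_like(cm), where=row_totals != 0)


cm = np.array([[8, 2],
               [1, 9]])
cm_frac = normalize_rows(cm)
print(cm_frac)
```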

crunchers.sklearn_helpers.assessment.plot_confusion_matrix(cm, labels=None, cmap='Blues', title=None, norm=False, context=None, annot=True)

Plot and return the confusion matrix heatmap figure.

crunchers.sklearn_helpers.exploration module

Provide functions that help quickly explore datasets with sklearn.

class crunchers.sklearn_helpers.exploration.KMeansReport(data, n_clusters, seed=None, n_jobs=-1, palette='deep')

Bases: object

Manage KMeans Clustering and exploration of results.


Fit each estimator.


Evaluate each estimator via silhouette score.


Set up and return dictionary of estimators with key = n_clusters.
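The three steps above (build a dictionary of estimators keyed by cluster count, fit each one, score each with the silhouette score) can be sketched with plain scikit-learn. This example assumes n_clusters is an iterable of candidate counts and uses synthetic data; it does not reproduce KMeansReport itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs in 2-D.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

n_clusters = [2, 3, 4]  # assumption: candidate cluster counts
# One estimator per candidate cluster count, keyed by that count.
estimators = {k: KMeans(n_clusters=k, n_init=10, random_state=0) for k in n_clusters}

scores = {}
for k, est in estimators.items():
    cluster_labels = est.fit_predict(data)                # fit each estimator
    scores[k] = silhouette_score(data, cluster_labels)    # evaluate each estimator

# The count with the highest silhouette score is the best-supported choice.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```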

plot_silhouette_results(feature_names=None, feature_space=None)

Plot silhouette results in a style similar to the scikit-learn silhouette analysis example.

class crunchers.sklearn_helpers.exploration.PCAReport(data, pca=None, n_components=None, data_labels=None, color_palette=None, label_colors=None, name=None)

Bases: object

Manage PCA and exploration of results.

filter_by_loadings(kind, column, hi_thresh, lo_thresh)

Return the index of row names whose loadings pass the thresholds.

kind (str): either 'pearsonr' or 'spearmanr'
column (str): which PC column to filter
hi_thresh (float): retain rows with values >= hi_thresh
lo_thresh (float): retain rows with values <= lo_thresh
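The thresholding described by these parameters might look like the sketch below. It assumes a row is kept when it meets either threshold (a union of both tails); the helper name filter_rows and the loadings table are made up for illustration.

```python
import pandas as pd

# Made-up loadings: rows are features, columns are principal components.
loadings = pd.DataFrame(
    {"PC1": [0.9, -0.8, 0.1, 0.4], "PC2": [0.0, 0.3, -0.7, 0.95]},
    index=["f1", "f2", "f3", "f4"],
)


def filter_rows(loadings, column, hi_thresh, lo_thresh):
    """Hypothetical helper: row names whose value is >= hi_thresh or <= lo_thresh."""
    values = loadings[column]
    keep = (values >= hi_thresh) | (values <= lo_thresh)
    return values[keep].index


selected = filter_rows(loadings, "PC1", hi_thresh=0.5, lo_thresh=-0.5)
print(list(selected))
```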


Return a dataframe of correlation-based "loadings" computed according to kind.
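Correlation-based loadings are commonly computed by correlating each original feature with each principal-component score. The sketch below assumes that definition and uses pandas' own method names ("pearson", "spearman") rather than the 'pearsonr' and 'spearmanr' values documented above; correlation_loadings is a hypothetical helper, not the package's method.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
# Make feature "b" strongly dependent on "a" so one component dominates.
data["b"] = data["a"] * 2.0 + rng.normal(scale=0.1, size=50)

pca = PCA(n_components=2)
pcs = pd.DataFrame(pca.fit_transform(data), index=data.index, columns=["PC1", "PC2"])


def correlation_loadings(data, pcs, method="pearson"):
    """Hypothetical helper: correlate every feature with every PC score column."""
    return pd.DataFrame(
        {pc: data.corrwith(pcs[pc], method=method) for pc in pcs.columns}
    )


loadings = correlation_loadings(data, pcs)
print(loadings)
```

The result has one row per original feature and one column per component, which is the shape a threshold filter on a single PC column would expect.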


Fit and transform the data via the local PCA object; store the results in self.pcs.


Provide access to the number of PCs.

plot_pcs(components=None, label_colors=None, diag='kde', diag_kws=None, **kwargs)

Plot scatter-plots below the diagonal and density plots on the diagonal.

components (list): list of components to plot

label_colors (dict): mapping of label to color, e.g. {'label1': 'g', 'label2': 'r', 'label3': 'b'}

plot_variance_accumulation(thresh=6, verbose=False)

Plot variance accumulation over PCs.

plot_variance_decay(thresh=6, verbose=False)

Plot variance decay over PCs.
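The quantities behind the variance decay and variance accumulation plots can be sketched from a fitted scikit-learn PCA: decay is the per-component explained variance ratio, and accumulation is its running sum. The data here is synthetic, and the meaning of the thresh argument is not covered because it is not documented above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Scale the columns differently so the components explain unequal variance.
data = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA().fit(data)
decay = pca.explained_variance_ratio_   # per-PC share of variance, largest first
accumulation = np.cumsum(decay)         # running total across PCs

for i, (d, a) in enumerate(zip(decay, accumulation), start=1):
    print(f"PC{i}: explains {d:.3f}, cumulative {a:.3f}")
```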

crunchers.sklearn_helpers.misc module

Collect misc sklearn helpers here.

crunchers.sklearn_helpers.misc.repandasify(array, y_names, X_names=None)

Convert numpy array into pandas dataframe using provided index and column names.
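A conversion of this kind is useful after scikit-learn transformers, which return bare numpy arrays and drop the original labels. The sketch below assumes y_names supplies the row index and X_names the column names, which the description does not state explicitly; array_to_frame is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd


def array_to_frame(array, row_names, column_names=None):
    """Hypothetical stand-in: label a 2-D array with row and column names."""
    return pd.DataFrame(np.asarray(array), index=row_names, columns=column_names)


scaled = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
frame = array_to_frame(scaled, row_names=["s1", "s2", "s3"], column_names=["x", "y"])
print(frame)
```

When column_names is omitted, pandas falls back to integer column labels.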

Module contents