classifiers Module

Classification algorithms for supervised learning tasks.

@author: drusk

class pml.supervised.classifiers.AbstractClassifier(training_set)

This is the base class which classification algorithms should extend. It provides the common functionality for each classifier.

__init__(training_set)

Constructs the classifier. Subclasses may have additional parameters in their constructors.

Args:

training_set: A labelled DataSet object used to train the classifier.

Raises:

UnlabelledDataSetError if the training set is not labelled.

classify(sample)

Predicts a sample’s classification based on the training set.

Args:

sample: The sample or observation to be classified.

Returns:

The sample’s classification.

Raises:

InconsistentFeaturesError if the sample doesn’t have the same features as the training data.
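
The contract described above (validate the training set in the constructor, check the sample's features in classify, and leave the actual prediction to subclasses) can be illustrated with a minimal, self-contained sketch. It does not import pml: `SketchClassifier`, `NearestNeighbourSketch`, the `_classify` hook, the dict-based samples, and the locally defined exception classes are hypothetical stand-ins that only borrow names from this page.

```python
class UnlabelledDataSetError(Exception):
    """Local stand-in for the exception named in the docs (assumption)."""


class InconsistentFeaturesError(Exception):
    """Local stand-in for the exception named in the docs (assumption)."""


class SketchClassifier:
    """Hypothetical base class mirroring the documented contract."""

    def __init__(self, samples, labels):
        # Assumption: each sample is a dict mapping feature name -> number,
        # and labels is a parallel list (None means "unlabelled").
        if labels is None:
            raise UnlabelledDataSetError("training set must be labelled")
        if not samples or len(samples) != len(labels):
            raise ValueError("need one label per training sample")
        self.samples = samples
        self.labels = labels
        self.features = set(samples[0])

    def classify(self, sample):
        # Reject samples whose features differ from the training data.
        if set(sample) != self.features:
            raise InconsistentFeaturesError("sample features do not match")
        return self._classify(sample)

    def _classify(self, sample):
        # Subclasses supply the actual algorithm.
        raise NotImplementedError


class NearestNeighbourSketch(SketchClassifier):
    """Toy subclass: returns the label of the closest training sample."""

    def _classify(self, sample):
        def squared_distance(other):
            return sum((sample[f] - other[f]) ** 2 for f in self.features)

        best = min(range(len(self.samples)),
                   key=lambda i: squared_distance(self.samples[i]))
        return self.labels[best]


training = [{"x": 0.0, "y": 0.0}, {"x": 5.0, "y": 5.0}]
clf = NearestNeighbourSketch(training, ["a", "b"])
prediction = clf.classify({"x": 4.0, "y": 4.5})  # closest to the second sample
```

Keeping the feature check in the base class means every subclass gets the same error behaviour without repeating it.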

classify_all(dataset)

Predicts the classification of each sample in a dataset.

Args:

dataset: DataSet compatible object (see DataSet constructor): the dataset whose samples (observations) will be classified.

Returns:

A ClassifiedDataSet which contains the classification results for each sample. It also contains the original data.
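
Conceptually, classifying a whole dataset amounts to applying the single-sample prediction to every sample and keeping the results alongside the original data. The sketch below shows that idea in plain Python. `ClassifiedSketch`, `classify_all_sketch`, and `threshold_rule` are hypothetical names for illustration, not part of pml, and the real method returns a ClassifiedDataSet rather than a named tuple.

```python
from typing import NamedTuple


class ClassifiedSketch(NamedTuple):
    """Hypothetical container: original samples plus their predictions."""
    samples: list
    classifications: list


def classify_all_sketch(classify, samples):
    # Apply the single-sample classifier to each sample in order and
    # keep the original data next to the results.
    samples = list(samples)
    return ClassifiedSketch(samples, [classify(s) for s in samples])


def threshold_rule(sample):
    # Toy single-sample classifier used only for this example.
    return "high" if sample["x"] >= 2.0 else "low"


result = classify_all_sketch(threshold_rule, [{"x": 1.0}, {"x": 3.0}])
```

After this runs, `result.classifications` holds one prediction per sample in the original order, and `result.samples` still holds the input data.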

class pml.supervised.classifiers.ClassifiedDataSet(dataset, classifications)

A collection of data which has been analysed by a classification algorithm. It contains both the original DataSet and the results of the classification. It provides methods for analysing these classification results.

__init__(dataset, classifications)

Creates a new ClassifiedDataSet.

Args:

dataset: model.DataSet: A dataset which has been classified but does not hold the results.
classifications: pandas.Series: A Series with the classification results.

compute_accuracy()

Calculates the accuracy of the classification results.

Returns: The accuracy of the classification results as a fraction, i.e. the number of samples correctly classified divided by the total number of samples. A floating point number between 0 and 1.
Raises: UnlabelledDataSetError if the dataset is not labelled.
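
The ratio described above (correctly classified samples divided by total samples) can be written out as a short standalone function. This is a sketch under stated assumptions, not the library's implementation: `accuracy_sketch` is a hypothetical name, it works on plain lists instead of a pandas Series, and it raises ValueError where the real method raises UnlabelledDataSetError.

```python
def accuracy_sketch(true_labels, classifications):
    # Assumption: both arguments are equal-length sequences; None for
    # true_labels stands in for an unlabelled dataset.
    if true_labels is None:
        raise ValueError("dataset must be labelled to compute accuracy")
    if len(true_labels) != len(classifications):
        raise ValueError("need one classification per labelled sample")
    if len(true_labels) == 0:
        raise ValueError("cannot compute accuracy of an empty dataset")

    # Count the positions where the prediction matches the true label.
    correct = sum(1 for truth, predicted in zip(true_labels, classifications)
                  if truth == predicted)
    return correct / len(true_labels)


# Three of the four predictions match the true labels.
accuracy = accuracy_sketch(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```

Here `accuracy` is 0.75, which lies in the documented range of 0 to 1.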

get_classifications()

Retrieves the classifications computed for this dataset.

Returns: A pandas Series containing each sample’s classification.
