classifiers Module
Classification algorithms for supervised learning tasks.
@author: drusk
-
class pml.supervised.classifiers.AbstractClassifier(training_set)
This is the base class which classification algorithms should extend. It
provides the common functionality for each classifier.
-
__init__(training_set)
Constructs the classifier. Subclasses may have additional parameters
in their constructors.
- Args:
- training_set:
- A labelled DataSet object used to train the classifier.
- Raises:
- UnlabelledDataSetError if the training set is not labelled.
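Here is a minimal sketch of the extension pattern described above. It assumes nothing about pml's internals. SketchDataSet, SketchAbstractClassifier, NearestNeighboursClassifier and its k parameter are hypothetical stand-ins written in plain Python. UnlabelledDataSetError is redefined locally rather than imported. The subclass adds its own constructor parameter and calls the base constructor, so the labelled-data check still runs.

```python
import math
from collections import Counter


class UnlabelledDataSetError(Exception):
    """Local stand-in for the error named in the docs (not imported from pml)."""


class SketchDataSet:
    """Hypothetical dataset: rows of feature values plus optional labels."""

    def __init__(self, rows, labels=None):
        self.rows = [dict(row) for row in rows]
        self.labels = list(labels) if labels is not None else None

    def is_labelled(self):
        return self.labels is not None


class SketchAbstractClassifier:
    """Hypothetical base class mirroring the documented constructor contract."""

    def __init__(self, training_set):
        # Reject unlabelled training data up front, as the docs describe
        if not training_set.is_labelled():
            raise UnlabelledDataSetError("training set must be labelled")
        self.training_set = training_set

    def classify(self, sample):
        raise NotImplementedError("subclasses implement classify()")


class NearestNeighboursClassifier(SketchAbstractClassifier):
    """Hypothetical subclass with an extra constructor parameter, k."""

    def __init__(self, training_set, k=1):
        super().__init__(training_set)  # keeps the labelled-data check
        self.k = k

    def classify(self, sample):
        features = sorted(self.training_set.rows[0])  # fixed feature order
        point = [sample[f] for f in features]
        # (Euclidean distance, label) for every training row, nearest first
        scored = sorted(
            ((math.dist(point, [row[f] for f in features]), label)
             for row, label in zip(self.training_set.rows, self.training_set.labels)),
            key=lambda pair: pair[0],
        )
        # Majority vote among the k nearest training rows
        votes = Counter(label for _, label in scored[: self.k])
        return votes.most_common(1)[0][0]
```

Suppose the training rows are x = 0.0, 1.0 and 5.0, labelled a, a and b. With k=1, the sample x = 4.2 is assigned b, because its nearest row is 5.0. With k=3, all three rows vote and the majority label a wins.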
-
classify(sample)
Predicts a sample’s classification based on the training set.
- Args:
- sample:
- the sample or observation to be classified.
- Returns:
- The sample’s classification.
- Raises:
- InconsistentFeaturesError if the sample doesn’t have the same
features as the training data.
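The InconsistentFeaturesError check could be implemented as shown below. This sketch assumes that "the same features" means the same set of feature names. The docs do not say whether feature order or value types also matter. check_features and the locally defined exception are hypothetical stand-ins, not pml's code.

```python
class InconsistentFeaturesError(Exception):
    """Local stand-in for the error named in the docs (not imported from pml)."""


def check_features(sample, training_features):
    """Raise if the sample's feature names differ from the training data's.

    sample: mapping of feature name -> value (hypothetical representation).
    training_features: iterable of the feature names seen during training.
    """
    sample_features = set(sample)
    expected = set(training_features)
    if sample_features != expected:
        missing = sorted(expected - sample_features)
        extra = sorted(sample_features - expected)
        raise InconsistentFeaturesError(
            f"missing features {missing}, unexpected features {extra}"
        )
```

Reporting both the missing and the unexpected names makes a mismatch easier to diagnose than a bare failure would.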
-
classify_all(dataset)
Predicts the classification of each sample in a dataset.
- Args:
- dataset: DataSet-compatible object (see the DataSet constructor)
- The dataset whose samples (observations) will be classified.
- Returns:
- A ClassifiedDataSet which contains the classification results for
each sample. It also contains the original data.
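Conceptually, classify_all calls classify once per sample and keeps each result aligned with its sample. The sketch below uses plain dicts keyed by a sample id as hypothetical stand-ins for the DataSet rows and the pandas Series index. The function name and signature are illustrative only. With pandas, the same per-row mapping is commonly written as DataFrame.apply(func, axis=1), which returns a Series indexed like the DataFrame when func returns a scalar.

```python
def classify_all(classify, samples):
    """Classify every sample, keeping results keyed like the input.

    classify: function mapping one sample's features to a class.
    samples: mapping of sample id -> feature dict (hypothetical stand-in
        for the rows of a DataSet).
    Returns (samples, classifications): the original data, unchanged, and
    a mapping from each sample id to its predicted class.
    """
    classifications = {sample_id: classify(features)
                       for sample_id, features in samples.items()}
    return samples, classifications


# Usage with a toy threshold rule standing in for a trained classifier
data, results = classify_all(
    lambda features: "big" if features["x"] > 1 else "small",
    {"s1": {"x": 0.5}, "s2": {"x": 3.0}},
)
```

Keying the results by sample id, rather than relying on list position, mirrors how a pandas Series keeps each classification attached to its sample's index label.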
-
class pml.supervised.classifiers.ClassifiedDataSet(dataset, classifications)
A collection of data which has been analysed by a classification
algorithm. It contains both the original DataSet and the results of
the classification. It provides methods for analysing these
classification results.
-
__init__(dataset, classifications)
Creates a new ClassifiedDataSet.
- Args:
- dataset: model.DataSet
- The dataset that was classified; it does not itself hold the classification results.
- classifications: pandas.Series
- A Series with the classification results.
-
compute_accuracy()
Calculates the accuracy of the classification results.
- Returns:
- The accuracy as a fraction, i.e. the number of samples correctly
classified divided by the total number of samples. This is a floating
point number between 0 and 1 (for example, 0.75 means 75% of the
samples were classified correctly).
- Raises:
- UnlabelledDataSetError if the dataset is not labelled.
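The accuracy computation reduces to correct / total. For instance, if 3 of 4 samples are classified correctly, the accuracy is 3 / 4 = 0.75. The function below is a hypothetical stand-in that takes the predicted and true labels as plain lists. Passing None for the labels plays the role of an unlabelled dataset. Rejecting mismatched lengths and empty input are added assumptions that the docs do not specify.

```python
class UnlabelledDataSetError(Exception):
    """Local stand-in for the error named in the docs (not imported from pml)."""


def compute_accuracy(classifications, labels):
    """Return the fraction of samples whose classification equals the true label."""
    if labels is None:
        # No true labels means there is nothing to compare against
        raise UnlabelledDataSetError("accuracy needs the true labels")
    if len(classifications) != len(labels):
        raise ValueError("classifications and labels must align")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(1 for predicted, actual in zip(classifications, labels)
                  if predicted == actual)
    return correct / len(labels)
```

For example, compute_accuracy(["a", "b", "b", "a"], ["a", "b", "a", "a"]) returns 0.75. Only the third sample is misclassified.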
-
get_classifications()
Retrieves the classifications computed for this dataset.
- Returns:
- A pandas Series containing each sample’s classification.
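The sketch below is a hypothetical stand-in for the container side of ClassifiedDataSet. It pairs the original rows with one classification per sample and returns the results through get_classifications. Plain dicts keyed by sample id replace the DataSet and the pandas Series. The check that every sample has exactly one classification is an assumption, not documented pml behaviour.

```python
class SketchClassifiedDataSet:
    """Hypothetical stand-in: original rows plus one classification per row."""

    def __init__(self, rows, classifications):
        # rows: mapping of sample id -> feature dict (stand-in for a DataSet)
        # classifications: mapping of sample id -> predicted class
        #     (stand-in for a pandas Series indexed by sample)
        if set(rows) != set(classifications):
            raise ValueError("every sample needs exactly one classification")
        self.rows = dict(rows)
        self._classifications = dict(classifications)

    def get_classifications(self):
        # Return a copy so callers cannot mutate the stored results
        return dict(self._classifications)
```

Returning a copy keeps the stored results intact even if a caller edits what it gets back. A pandas-based implementation might make the equivalent choice with Series.copy().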