Machine Learning Process
Overview
Dataset Preparation
- Feature Extraction : is the process of computing derived features (new columns or properties) from the existing columns (properties) in the dataset.
- Annotation : is the process, used in supervised learning, of attaching to each example in a dataset the label that the model is expected to learn to predict.
- Data Cleansing : is the process of pre-processing a dataset to deal with anomalies in the data that the model is not built to handle.
For example, some datasets contain missing or corrupted values. Depending on the model, one may need to either filter
the data or apply a data cleansing procedure before feeding it to the model.
- Data Partitioning : splits a dataset into a learning set and one or more test sets.
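
As a rough illustration of the preparation steps listed above, here is a minimal sketch in plain Python. The toy dataset, the column names, the derived BMI feature, and the helper names (`cleanse`, `add_bmi`, `partition`) are all assumptions made up for this example. Cleansing is done by simple filtering, which is only one of several possible approaches. Annotation is left out because it is usually a manual step.

```python
import random

# Hypothetical toy dataset: each row is one example; None marks a missing value.
rows = [
    {"height_m": 1.80, "weight_kg": 81.0},
    {"height_m": 1.65, "weight_kg": None},  # missing value
    {"height_m": 1.70, "weight_kg": 65.0},
    {"height_m": 1.75, "weight_kg": 70.0},
    {"height_m": 1.60, "weight_kg": 58.0},
]

def cleanse(rows):
    """Data cleansing by filtering: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def add_bmi(row):
    """Feature extraction: derive a new column (BMI) from existing columns."""
    return {**row, "bmi": row["weight_kg"] / row["height_m"] ** 2}

def partition(rows, test_fraction=0.25, seed=0):
    """Data partitioning: shuffle a copy, then split into learning and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

clean = cleanse(rows)
featured = [add_bmi(r) for r in clean]
learning_set, test_set = partition(featured)
```

Shuffling before the split avoids a test set that merely reflects the order in which the data was collected. The fixed seed keeps the split reproducible.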
Choosing and Applying an Algorithm
The first step is to understand the type of learning response that we seek from the algorithm. That is, is the label
that we are trying to predict one of a finite set of labels (a classification), or is it more like a forecast, that is,
a real number drawn from a continuous range of values (a regression)?
Often, a given problem could be viewed as either a classification or a regression.
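
To make the distinction concrete, the sketch below treats the same hypothetical temperature forecast first as a regression target and then as a classification target. The sample values, the class names, and the cut points are arbitrary assumptions chosen for illustration.

```python
# Hypothetical forecast targets: tomorrow's temperature in degrees Celsius.
temperatures = [-3.5, 4.0, 12.5, 21.0, 30.5]

def to_class(temp_c):
    """Map a continuous value onto one of a finite set of labels.

    The cut points (10 and 25 degrees) are arbitrary assumptions.
    """
    if temp_c < 10:
        return "cold"
    if temp_c < 25:
        return "mild"
    return "hot"

# Regression view: the label is the real number itself.
regression_targets = temperatures

# Classification view: the label is one of {"cold", "mild", "hot"}.
classification_targets = [to_class(t) for t in temperatures]
```

Binning discards information (21.0 and 12.5 both become "mild"), so which view to use depends on how the prediction will be consumed.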
Once the type of response is known, the set of algorithms that can produce that type of response is
narrowed. If more than one algorithm applies, it is possible to build an
ensemble out of all the applicable models.
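
One simple way to combine several classifiers is majority voting, sketched below. The three rule-based "models" are hypothetical stand-ins for trained classifiers, and majority voting is only one of several ensembling strategies. For a regression, one would typically average the predictions instead.

```python
from collections import Counter

# Three hypothetical classifiers; each maps a message to "spam" or "ham".
def by_length(text):
    return "spam" if len(text) > 20 else "ham"

def by_keyword(text):
    return "spam" if "free" in text.lower() else "ham"

def by_exclamation(text):
    return "spam" if text.count("!") >= 2 else "ham"

def majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

models = [by_length, by_keyword, by_exclamation]
prediction_a = majority_vote(models, "WIN a FREE prize now!!!")
prediction_b = majority_vote(models, "See you at lunch tomorrow!")
```

With an odd number of models and two classes a tie cannot occur. Other configurations need an explicit tie-breaking rule.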
Evaluation
Evaluation is the process of measuring the effectiveness of a machine learning model, typically on the test set held out during data partitioning.
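
The text does not name specific metrics. As an assumption for illustration, the sketch below uses two common ones: accuracy for a classification and mean squared error for a regression. The true and predicted values are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (classification)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test-set labels and model predictions.
acc = accuracy(["cold", "mild", "hot", "mild"], ["cold", "mild", "mild", "mild"])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Higher accuracy is better, while lower mean squared error is better. Both should be computed on data the model did not see during learning.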