Machine Learning Process

Overview


Dataset Preparation


  • Feature Extraction : is the process of computing derived features (new columns or properties) from the existing columns (properties) in the dataset.
  • Annotation : is the process used in supervised learning to attach to each example in a dataset the label that the model should learn to predict.
  • Data Cleansing : is the process of pre-processing a dataset to handle anomalies that the model is not designed to deal with. For example, some datasets contain missing or corrupted values. Depending on the model, one may need to either filter out the affected data or apply a data cleansing protocol before feeding the data to the model.
  • Data Partitioning : is the process of splitting a dataset into a learning set and one or more test sets.
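
The preparation steps above can be sketched on a toy dataset. This is only an illustration under stated assumptions: the rows, the column names (height_m, weight_kg, label), the derived bmi feature, and the helper functions cleanse, add_bmi, and partition are all hypothetical, and cleansing is done by simple filtering, which is just one of the options mentioned above. The label column stands in for the result of annotation.

```python
import random

# Hypothetical toy dataset: each row is a dict; None marks a missing value.
# The "label" column represents the annotation of each example.
rows = [
    {"height_m": 1.80, "weight_kg": 81.0, "label": "adult"},
    {"height_m": 1.20, "weight_kg": 25.0, "label": "child"},
    {"height_m": None, "weight_kg": 70.0, "label": "adult"},  # missing value
    {"height_m": 1.65, "weight_kg": 60.0, "label": "adult"},
    {"height_m": 1.10, "weight_kg": 20.0, "label": "child"},
]


def cleanse(rows):
    """Data cleansing by filtering: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_bmi(row):
    """Feature extraction: derive a new column from two existing columns."""
    return {**row, "bmi": row["weight_kg"] / row["height_m"] ** 2}


def partition(rows, test_fraction=0.25, seed=0):
    """Data partitioning: shuffle a copy, then split into learning and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


clean = cleanse(rows)
featured = [add_bmi(r) for r in clean]
learning_set, test_set = partition(featured)
```

Shuffling before the split avoids a test set that merely reflects the order in which the rows were collected; the fixed seed is there only to make the sketch repeatable.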

Choosing and Applying an Algorithm


The first step is to understand the type of response that we seek from the algorithm. That is, is the label that we are trying to predict one of a finite set of labels (a classification), or is it more like a forecast, that is, a real number drawn from a continuous range of values (a regression)?

Often, a given problem could be viewed as either a classification or a regression.
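
As a small illustration of how one problem can be framed both ways, the sketch below takes a continuous target and turns it into a finite set of labels. The temperature example, the thresholds, and the function name to_class_label are assumptions made up for this sketch.

```python
def to_class_label(temperature_c):
    """Map a continuous temperature onto one of three labels (thresholds are arbitrary)."""
    if temperature_c < 10:
        return "cold"
    if temperature_c < 25:
        return "mild"
    return "hot"


# Regression view: the target is a real number.
regression_targets = [3.5, 18.0, 31.2]

# Classification view: the same targets, bucketed into a finite label set.
classification_targets = [to_class_label(t) for t in regression_targets]
```

Which framing to choose depends on what the prediction is used for: the bucketed labels discard information but may be all that a downstream decision needs.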

Once the type of response is known, the set of algorithms that can produce that type of response is narrowed down. If more than one algorithm applies, it is possible to build an ensemble out of all the applicable models.
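
One simple way to build such an ensemble for classification is majority voting, sketched below. The three rule-based models and the message fields (links, exclamations, length) are hypothetical stand-ins for trained classifiers; majority voting is only one of several possible ways to combine models.

```python
from collections import Counter


def majority_vote(models, x):
    """Ask every model for a label and return the most common answer."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained classifiers: simple threshold rules.
def model_a(x):
    return "spam" if x["links"] > 3 else "ham"


def model_b(x):
    return "spam" if x["exclamations"] > 5 else "ham"


def model_c(x):
    return "spam" if x["length"] < 20 else "ham"


message = {"links": 5, "exclamations": 2, "length": 10}
prediction = majority_vote([model_a, model_b, model_c], message)
```

With two classes and an odd number of models there can be no tie. For regression, the analogous combination would be to average the models' numeric outputs.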

Evaluation


Evaluation is the process of measuring how well a trained machine learning model performs, typically by comparing its predictions against the known labels of a test set.
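
Which measure is appropriate depends on the type of response. The sketch below shows two common choices, accuracy for classification and mean squared error for regression; the function names and the sample values are assumptions for illustration, and many other metrics exist.

```python
def accuracy(y_true, y_pred):
    """Classification: fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Regression: average squared difference between prediction and truth."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up test-set labels and predictions.
class_score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
regression_error = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Higher accuracy is better, whereas lower mean squared error is better, so the two scores are not comparable with each other.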
