Principal component analysis (PCA)

What is principal component analysis?

Machine learning techniques need large volumes of data to produce accurate, efficient models. Often, however, training datasets contain a lot of irrelevant data or data that provides little information. Feature selection algorithms analyse the input data, group the features into subsets and define a metric that can be used to assess the relevance of the information each one provides. They then remove from the working dataset the features or fields that contribute the least information, which saves storage and execution time and leads to a more efficient model.

Principal component analysis (PCA) is one of the most common techniques used for this purpose, although strictly speaking it builds new features from combinations of the original ones rather than selecting a subset of them.

It is a dimensionality reduction technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a smaller set of variables that are no longer correlated, known as principal components.
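
The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: it assumes PCA is computed from the eigenvectors of the covariance matrix, and the function name `pca` and the synthetic data are hypothetical, chosen only for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its leading principal components."""
    # Centre each feature so the covariance matrix describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # The eigenvectors are orthonormal, so this projection is an orthogonal transformation.
    components = eigvecs[:, :n_components]
    return X_centered @ components, eigvals

# Synthetic data (an assumption for the example): the second column is almost
# a multiple of the first, so the two are strongly correlated.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=500), rng.normal(size=500)])

Z, eigvals = pca(X, n_components=2)
corr = np.corrcoef(Z, rowvar=False)   # off-diagonal entries should be close to zero
```

The correlation matrix of the projected data `Z` is diagonal up to floating-point error, which is what "no longer correlated" means in practice.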

The main question that helps us understand this method is: "How many parameters of the dataset are necessary to explain a significant amount of its variation?" In other words, discarding parameters or variables always means losing some information. The issue is to assess how much information we can afford to "lose" by discarding certain parameters in order to obtain a faster and more efficient model.
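
One common way to answer this question is to look at the share of total variance explained by each principal component and keep the smallest number of components whose cumulative share reaches a chosen threshold. The sketch below assumes a 95% threshold, which is a convention rather than a rule; the function name `components_needed` and the synthetic dataset are hypothetical.

```python
import numpy as np

def components_needed(X, threshold=0.95):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`, plus the per-component ratios."""
    X_centered = X - X.mean(axis=0)
    # Eigenvalues of the covariance matrix are the variances along each component.
    eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]
    ratio = eigvals / eigvals.sum()
    cumulative = np.cumsum(ratio)
    # First position where the cumulative ratio is >= threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio)), ratio

# Synthetic data (an assumption for the example): five observed features
# driven by only two independent latent factors, plus a little noise.
rng = np.random.default_rng(1)
a, b = rng.normal(size=1000), rng.normal(size=1000)
noise = rng.normal(scale=0.05, size=(1000, 5))
X = np.column_stack([a, b, a + b, a - b, 2 * a]) + noise

k, ratio = components_needed(X, threshold=0.95)
print(f"components kept: {k} of {X.shape[1]}")
print("explained variance ratio:", np.round(ratio, 4))
```

Lowering the threshold keeps fewer components and gives a smaller, faster model at the cost of discarding more of the original variation; raising it does the opposite.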