“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel, 1959
This article gives a brief overview of machine learning, its different types, and some examples of well-known machine learning algorithms.
The goal of machine learning is to develop methods that can automatically detect patterns in data and then to use the uncovered patterns to predict future data or other outcomes of interest.
Throughout this article, we use the term training data set to refer to the historical data in which the machine learning algorithm detects hidden patterns, and the term new observations to refer to the future data for which the algorithm will be used to predict certain outcomes.
Machine learning algorithms fall into three main types, depending on the available input data and the learning style: supervised, unsupervised, and reinforcement learning.
Below is a diagram from the scikit-learn website that helps users of the library choose a machine learning algorithm based on the type of training data set.
Supervised learning algorithms work with labelled training data sets. They apply what they learn from those labels to new observations in order to predict the labels of those observations.
Supervised learning algorithms can be further categorised into classification algorithms and regression algorithms.
What is Classification?
Classification is used when the label to be predicted is discrete, which means the set of possible values for that label is finite. We call that label the class or the dependent attribute. The other known pieces of information about the data set entries are called the independent variables.
As an analogy, think of a toddler who is learning about colours. You might teach them by showing pens of different colours and naming each colour. Later, once the toddler is well trained, they will be able to tell the colour of a car, although no cars were used during the training.
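The colour analogy can be sketched in a few lines of Python. This is only a minimal illustration, assuming a nearest-neighbour rule (predict the label of the most similar training example); the RGB values, the `training_data` list, and the `classify` function are all made up for this sketch and do not come from any library.

```python
import math

# Hypothetical training set: RGB values of pens, each labelled with a colour name.
training_data = [
    ((255, 0, 0), "red"),
    ((200, 30, 30), "red"),
    ((0, 0, 255), "blue"),
    ((30, 30, 200), "blue"),
    ((0, 255, 0), "green"),
    ((40, 200, 40), "green"),
]


def classify(observation, training_data):
    """Return the label of the training example closest to the observation."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], observation),
    )
    return nearest_label


# A new observation: the colour of a car, which never appeared in training.
car_colour = (180, 20, 40)
predicted_label = classify(car_colour, training_data)
print(predicted_label)
```

The RGB values play the role of the independent variables and the colour name is the class. The set of possible answers is finite, which is what makes this a classification problem.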
How is Regression Different?
Regression is much the same problem as classification, with one significant difference: the dependent attribute is a continuous value, not a class.
An analogy for this kind of learning is a child who is asked to order his classmates by weight without using a scale. The child might use what he already knows, relating the height and build of his friends to decide which of them weighs more.
What are Some Use Cases?
One use case for classification is labelling a medical scan as positive or negative for a medical condition, using a historical repository of other patients' images whose classifications are already stored in the data set.
Another everyday example is your email server deciding whether an email is spam, based both on the content of that email and on the history of other emails, and maybe on how you marked similar emails as spam in the past.
An example use case for regression is estimating how much your house would sell for, using information about the house such as the number of rooms and the land area, and comparing it to a training data set that has the same information about other houses and the prices they sold for. Because price, the target attribute, is a continuous value, this is considered a regression problem.
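The house price example can be sketched as a simple linear regression. The snippet below is a rough illustration under several assumptions: it uses a single independent variable (land area) instead of several, it fits a straight line by ordinary least squares, and the figures in `training_data` are invented so that they lie exactly on a line and the arithmetic is easy to follow.

```python
# Hypothetical training set: (land area in square metres, sale price).
# The numbers are made up and lie exactly on a straight line.
training_data = [(100, 200_000), (150, 275_000), (200, 350_000), (250, 425_000)]


def fit_line(points):
    """Fit price = slope * area + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Unnormalised covariance of x and y, and unnormalised variance of x.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sum_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(training_data)

# A new observation: a house with 180 square metres of land.
predicted_price = slope * 180 + intercept
print(round(predicted_price))
```

Unlike the classification case, the prediction is not picked from a fixed set of labels. It can be any number on the fitted line.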
In unsupervised learning, the computer is given data that has no labels. It is on its own to find hidden patterns and categorise the data.
An analogy for this type of learning is organising your DVD shelf to make finding a title easier. After looking at the DVDs and the attributes you could sort them by, you might decide to group titles of the same genre together.
What are Some Real Life Examples?
An example of this kind of learning is a pants manufacturer that has a database of its customers' waist sizes and heights. The company cannot produce every combination of the two attributes. A machine learning algorithm could be used to find where the clusters in the data are centred, so that manufacturing those sizes would fit as many customers as possible.
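One common way to find cluster centres is the k-means algorithm, which the sketch below implements in plain Python. It is an illustration only: the `customers` measurements are invented, the number of clusters (three) is assumed to be known in advance, and the starting centres are picked by hand to keep the result deterministic, whereas practical implementations usually choose them at random.

```python
import math

# Hypothetical customer measurements: (waist in cm, height in cm).
customers = [
    (70, 160), (72, 162), (71, 158),
    (85, 175), (87, 178), (86, 176),
    (100, 185), (102, 188), (101, 186),
]


def k_means(points, initial_centres, iterations=10):
    """Alternate between assigning each point to its nearest centre and
    moving each centre to the mean of the points assigned to it."""
    centres = list(initial_centres)
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for point in points:
            nearest = min(
                range(len(centres)),
                key=lambda i: math.dist(point, centres[i]),
            )
            clusters[nearest].append(point)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centre if no point was assigned to it
                centres[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centres


# Starting centres chosen by hand for this sketch.
centres = k_means(customers, initial_centres=[customers[0], customers[3], customers[6]])
print(centres)
```

Each returned centre is a (waist, height) pair that could serve as one size to manufacture. No labels were supplied at any point; the grouping comes from the data alone.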
In reinforcement learning, the computer dynamically interacts with its environment and learns while trying to achieve a certain goal. The program receives feedback in the form of rewards and punishments as it tries to reach its goal in the problem space.
An analogy for this type of learning is pet training: a pet is rewarded every time it does the right thing and punished otherwise. Bit by bit, the pet does the right thing more often than the wrong one.
What is This Useful For?
As an example, reinforcement learning is used to train computer-based game players. The program tries to play the moves that maximise a heuristic function that evaluates the player's position or chances of winning. The value of the heuristic function represents the reward in this case.
Classification, regression, and clustering are problems, and for each of them there are existing algorithms that solve it in different ways. The list below is by no means a complete list of known machine learning algorithms, but it names some of the best-known ones. Hopefully, in the future, there will be an article addressing each algorithm separately.
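To make the idea of learning from rewards concrete, here is a small sketch that uses tabular Q-learning, one well-known reinforcement learning technique. It is not a game player: the environment is a hypothetical five-position corridor in which only reaching the last position earns a reward. The constants, the `step` function, and the choice to explore purely at random during training are simplifying assumptions made for this illustration.

```python
import random

N_STATES = 5   # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = ("left", "right")
GAMMA = 0.9    # discount factor: how much future rewards count
ALPHA = 1.0    # learning rate; 1.0 is acceptable here because moves are deterministic


def step(state, action):
    """Environment: move one position and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reached_goal = next_state == N_STATES - 1
    return next_state, (1.0 if reached_goal else 0.0), reached_goal


def train(episodes=50, seed=0):
    """Learn a table of action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [{action: 0.0 for action in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore at random while learning
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate towards the reward plus
            # the discounted value of the best action in the next state.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# After training, act greedily: pick the action with the highest learned value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right from every position, because the learned values reflect that stepping towards the goal leads to the reward sooner. The table `q_table` plays the role of the evaluation that tells the program how promising each move is.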
- Naïve Bayes
- Decision Trees
- K-Nearest Neighbors
- Support Vector Machines
- Linear regression
- Brute Force
This Wikipedia page has sample data sets that are suitable for developing and testing machine learning algorithms.