What is Machine Learning?

Machine learning (ML) is a form of programming that uses data to train a computer to perform tasks. Unlike traditional programming, in which rules are written explicitly, machine learning uses data to induce rulesets automatically. At a high level, machine learning is a collection of techniques borrowed from disciplines such as statistics, probability theory, and information theory, combined with novel ideas, for the purpose of gaining insight through data and computation. Machine learning is further broken down into subcategories based on how the learners are trained and the tasks they handle.

Supervised Learning

Supervised learning is a type of machine learning that incorporates a training signal in the form of human annotations called labels. A label is the desired output of a learner for the sample it is shown. For this reason, you can think of supervised learning as learning by example. There are two types of supervised learning to consider in Rubix ML.

Classification

For classification problems, a learner is trained to differentiate samples among a set of k possible discrete classes. In this type of problem, the training labels are the classes that each sample belongs to. Examples of class labels include cat, dog, human, etc. Classification problems range from simple to very complex and include image recognition, text sentiment analysis, and Iris flower classification.
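
As a rough illustration of learning by example, the sketch below implements a small k-nearest neighbors classifier in plain Python. It is not Rubix ML code. The function name `knn_predict` and the toy measurements are assumptions made up for this example.

```python
from collections import Counter
import math


def knn_predict(samples, labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training samples."""
    # Rank the training samples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], query))[:k]
    # Count the class labels of those neighbors and return the most common one.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [weight in kg, height in cm].
samples = [[4.0, 25.0], [5.0, 23.0], [30.0, 60.0], [25.0, 55.0], [70.0, 175.0], [80.0, 180.0]]
labels = ["cat", "cat", "dog", "dog", "human", "human"]

prediction = knn_predict(samples, labels, [4.5, 24.0], k=3)
print(prediction)  # cat
```

The learner never receives a rule such as "small animals are cats". The prediction comes entirely from the labeled examples.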

Regression

Regression is a learning problem that aims to predict a continuous-valued outcome. In this case, the training labels are continuous values represented as integers or floating point numbers. Unlike a classifier, which chooses from a finite set of classes, a regressor can make an infinite range of predictions. Regression problems include estimating the sale price of a home, credit scoring, and determining the steering angle of a self-driving vehicle.
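
A minimal sketch of regression, assuming a single feature and an ordinary least squares line fit, might look like the following. It is plain Python rather than Rubix ML code, and the function `fit_line` and the area and price figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area in square meters and sale price in thousands.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)

# The prediction is a continuous value, not one of a fixed set of classes.
predicted_price = slope * 90.0 + intercept
print(predicted_price)
```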

Unsupervised Learning

Unsupervised learning is a form of learning that does not require training labels. Unsupervised learners focus on discovering patterns within raw samples. There are three types of unsupervised learning to consider in Rubix ML.

Clustering

Clustering takes a dataset and assigns each sample a discrete cluster number based on its similarity to other samples from the training set. It can be viewed as a weaker form of classification in which the class labels are unknown. Clustering is used for tissue differentiation in PET scan images, market segmentation of customer databases, and discovering communities within social networks.
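
To make the idea of assigning cluster numbers concrete, here is a small k-means sketch in plain Python. It is one clustering technique among many and is not taken from Rubix ML. Seeding the centroids with the first k samples is a simplifying assumption that keeps the example deterministic, and the sample values are made up.

```python
import math


def k_means(samples, k, iterations=10):
    """Group samples into k clusters and return (assignments, centroids)."""
    # Assumption: seed the centroids with the first k samples.
    centroids = [list(sample) for sample in samples[:k]]
    assignments = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: give each sample the number of its closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(sample, centroids[c]))
            for sample in samples
        ]
        # Update step: move each centroid to the mean of its assigned samples.
        for c in range(k):
            members = [s for s, a in zip(samples, assignments) if a == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return assignments, centroids


# Hypothetical unlabeled samples that form two loose groups.
samples = [[1.0, 1.0], [9.0, 9.0], [1.5, 2.0], [8.0, 8.5], [1.0, 1.5], [9.5, 8.0]]

assignments, centroids = k_means(samples, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied at any point. The cluster numbers are arbitrary identifiers and carry no meaning beyond group membership.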

Anomaly Detection

Anomalies are samples that were generated by a different process than the normal samples, or that do not conform to the expected distribution of the training data. Samples can either be flagged as anomalous or ranked by their anomaly score. Anomaly detection is used in information security for intrusion and denial of service detection, and in the financial industry to detect fraud.
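
The difference between flagging and ranking can be sketched with a simple z-score detector in plain Python. This is only one possible scoring method and is not Rubix ML code. The threshold of 3.0 and the sample values are assumptions chosen for illustration.

```python
import statistics


def anomaly_scores(training, samples):
    """Score each sample by its distance from the training mean, in standard deviations."""
    mean = statistics.fmean(training)
    std_dev = statistics.pstdev(training)
    return [abs(x - mean) / std_dev for x in samples]


# Hypothetical measurements of normal behavior, such as requests per second.
training = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0]
new_samples = [10.5, 25.0, 9.0]

scores = anomaly_scores(training, new_samples)

# Flagging: mark every sample whose score exceeds an assumed threshold.
flagged = [x for x, score in zip(new_samples, scores) if score > 3.0]

# Ranking: order the samples from most to least anomalous.
ranked = [x for score, x in sorted(zip(scores, new_samples), reverse=True)]

print(flagged)  # [25.0]
print(ranked)   # [25.0, 9.0, 10.5]
```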

Manifold Learning

Manifold learning is a type of unsupervised nonlinear dimensionality reduction used for embedding datasets into dense feature representations. Embedders are used for visualizing high-dimensional (3 or more) datasets in low (1 to 3) dimensions, and for compressing samples before they are input to a learning algorithm.

Deep Learning

Deep learning is a subset of machine learning that stacks layers of computation to form feature representations of increasing complexity. It represents a paradigm shift from human-engineered features to letting the learner construct its own features from the raw data. Deep learning is used in image recognition, natural language processing, and other tasks involving very high-dimensional data.
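
The notion of layers building on each other's features can be illustrated with a tiny two-layer network in plain Python. This is a sketch, not Rubix ML code. The weights below are set by hand purely for illustration, whereas a real network would learn them from data. The hidden layer computes two intermediate features, and the output layer combines them to produce the XOR of the inputs, which a single linear layer cannot represent.

```python
def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


# Hand-picked weights (an assumption for this example, not learned values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(x):
    # Layer 1 turns the raw inputs into two intermediate features.
    hidden = relu(dense(x, HIDDEN_WEIGHTS, HIDDEN_BIASES))
    # Layer 2 combines those features into the final output.
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


outputs = [forward(x) for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0])]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```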

Other Forms of ML

Although the supervised and unsupervised learning framework covers a substantial number of problems, there are other types of machine learning that the library does not support out of the box.

Reinforcement Learning

Reinforcement learning (RL) is a type of machine learning that aims to learn the optimal control of an agent within an environment by maximizing cumulative reward. The data used to train an RL learner are the states obtained by performing an action and then observing the environment's response. If supervised learning is learning by example, then reinforcement learning is learning from mistakes. Reinforcement learning is used to train AIs to play games such as Go, chess, and StarCraft II, as well as in robotics for learning movements such as walking or grasping.
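
As a rough sketch of learning from action and response, the example below runs tabular Q-learning on a made-up five-state corridor in plain Python. The environment, the reward of 1.0 at the goal, the hyperparameters, and the purely random exploration policy are all assumptions chosen to keep the example short. Practical agents usually balance exploration against exploiting what they have already learned.

```python
import random

N_STATES = 5        # States 0..4 in a corridor; state 4 is the goal and ends the episode.
ACTIONS = (-1, +1)  # Action index 0 moves left, action index 1 moves right.


def step(state, action):
    """Apply an action and return the environment's response: (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the cumulative discounted reward.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore by acting at random (Q-learning can learn from any behavior policy).
            a = rng.randrange(2)
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward the reward plus the discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()

# The greedy policy picks the action with the highest estimated value in each state.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

No example of a correct move is ever given. The agent only observes where each action leads and what reward follows, and the value estimates gradually come to favor moving toward the goal.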