A Cursory Look At Machine Learning


Since a large set of posts on this blog deal with Machine Learning (ML), I decided to write up a cursory look at the field. This post can then be referenced from other posts as a deeper introduction to the topic.

ML is the process of training a model on a training dataset with an optimization algorithm such that a certain task is fulfilled. There are two major categories of ML algorithms: supervised and unsupervised ones. In supervised ML, the dataset contains both input data X and target data Y. The ML algorithm then learns a function that maps input to output such that the error of the output is minimized, i.e. given a certain input, the output of the model shall be as close as possible to the corresponding target. Usually, the dataset is split in two: a training dataset and a test dataset. The ML algorithm trains on the training dataset only and uses the test dataset to evaluate whether it generalizes to unseen input data. Exemplary tasks for supervised ML are classification and regression.
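
To make this concrete, the following minimal sketch runs the supervised workflow end to end: split a dataset in two, train on one part, evaluate on the other. It assumes scikit-learn is installed and uses synthetic data; the specific model (logistic regression) is just an illustrative placeholder.

```python
# A minimal sketch of the supervised workflow, assuming scikit-learn is
# installed; X (inputs) and Y (targets) are synthetic toy data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # input data X
Y = (X[:, 0] + X[:, 1] > 0).astype(int)   # target data Y

# Split the dataset in two: train on one part, evaluate on the other.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, Y_train)
print("test accuracy:", model.score(X_test, Y_test))  # generalization estimate
```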

Classification consists of finding the class of the input data, e.g. given a picture of a digit between 0 and 9 written by a human, classifying which of the 10 digits is shown. This dataset of handwritten digits is well known in the ML community as the MNIST dataset.
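
As an illustration, the sketch below trains a classifier on handwritten digits. To stay self-contained it uses scikit-learn's small built-in digits dataset (8x8 pixel images) rather than the full 28x28 MNIST images, but the task is the same: map an image to one of the 10 classes. The choice of a support vector classifier is an assumption, not a recommendation.

```python
# A short digit classification sketch using scikit-learn's built-in
# digits dataset (8x8 images, flattened to 64 features per image).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 1797 images of digits 0-9
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_train, y_train)    # support vector classifier
print("test accuracy:", clf.score(X_test, y_test))
```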

Regression, on the other hand, consists of predicting not a class but numerical data of any kind. An easy example is predicting tomorrow's temperature given the daily temperatures of the last 10 years. Another is training a neural network to multiply two numbers: this, too, is a regression task. The target may also be high-dimensional, i.e. a vector of numbers. One classical approach to modeling regression problems is Linear Regression.
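
The following sketch fits a Linear Regression model to noisy synthetic data (again assuming scikit-learn); the true slope and intercept are known here, so it is easy to check that the model approximately recovers them.

```python
# A minimal Linear Regression sketch: fit a line y = a*x + b to noisy
# synthetic data with known ground truth a=3, b=2.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

reg = LinearRegression().fit(x, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("prediction at x=5:", reg.predict([[5.0]])[0])
```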

As for unsupervised learning, the dataset now contains only input data and no target data. The idea is to let the ML algorithm find structure in the data without specifying that structure in detail. Well-known tasks are clustering and outlier detection.

Clustering consists of grouping the data by similarity into a pre-defined number of clusters. One of the most popular algorithms here is k-means. Some algorithms, such as the density-based DBSCAN or OPTICS, do not require the number of clusters up front, which is convenient because, depending on the data, that number may not be easy to define.
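
The sketch below contrasts the two kinds of algorithms on synthetic data (assuming scikit-learn): k-means needs the number of clusters up front, while DBSCAN derives the clusters from density and labels sparse points as noise. The parameter values are illustrative.

```python
# A clustering sketch contrasting k-means (cluster count given up front)
# with DBSCAN (density-based, no cluster count required).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # -1 marks noise

print("k-means clusters:", set(kmeans_labels))
print("DBSCAN clusters: ", set(dbscan_labels))
```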

Clustering has been very useful for us, cf. the FBC downstream SNR visualization blog post and the pre-equalization grouping blog post.

For outlier detection, exemplary algorithms are the Local Outlier Factor and autoencoders. Autoencoders are based on neural networks, a very versatile structure that is available for both supervised and unsupervised learning. Neural networks come in many shapes and can be decomposed into two parts: the structure itself, i.e. a specific type of graph consisting of layers, where nodes carry biases and edges carry weights, and an optimizer that iteratively updates the weights and biases using the available training data.
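
As a small illustration of outlier detection, the following sketch applies the Local Outlier Factor to a dense synthetic cluster plus two far-away points (assuming scikit-learn); an autoencoder-based detector would instead flag points that the network reconstructs poorly.

```python
# An outlier detection sketch with Local Outlier Factor: points in sparse
# regions receive the label -1 (outlier), points in dense regions +1 (inlier).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))  # dense cluster
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])           # far-away points
X = np.vstack([inliers, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)
print("detected outliers:", X[labels == -1])
```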

The figure shows the structure of a fully connected neural network (a specific type of neural network), in which each node is connected to every node of the next layer. It has three layers of sizes 3, 4 and 3: the first is the input layer, the middle one is the hidden layer and the last one is the output layer. After each layer there can be an activation function, which is non-linear most of the time. For a deep introduction into Machine Learning, the interested reader is referred to Bishop's Pattern Recognition and Machine Learning. Lastly, the neural network shown here is in fact a deep learning model, since it contains a hidden layer. For a deep introduction into Deep Learning, the interested reader is referred to Deep Learning by Goodfellow et al.
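
To make that structure concrete, here is a sketch of a single forward pass through the 3-4-3 network described above, in plain NumPy: the matrices hold the edge weights, the vectors hold the node biases, and a non-linear activation (ReLU, an illustrative choice) follows the hidden layer. The weights are random, as they would be before an optimizer has updated them.

```python
# A forward pass through the 3-4-3 fully connected network: weights on
# edges, biases on nodes, ReLU activation after the hidden layer. The
# parameters are random, i.e. the network is untrained.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input layer (3) -> hidden layer (4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)  # hidden layer (4) -> output layer (3)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    hidden = relu(W1 @ x + b1)  # weighted sum plus bias, then activation
    return W2 @ hidden + b2     # output layer, no activation here

print(forward(np.array([1.0, -0.5, 2.0])))
```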