Review Article

Gastroenterology Meets Machine Learning: Status Quo and Quo Vadis

Table 1

Overview of ML techniques.

Algorithm Overview

Artificial neural networks (ANN)ANN is inspired by interconnections between neurons in biological neural networks. It consists of a set of nodes configured in layers (input, hidden, and output), connected to one another via weighted edges. Input feature vectors are processed sequentially by every layer in the net via non-linear transformations, before an output is generated upon reaching the final layer. During the training process, if the output of the ANN is incorrect, an algorithm known as backpropagation distributes the error term back up through the layers, by modifying the weights at each edge. ANN can be supervised or unsupervised. More recently, there has been a resurgence of interest in multi-layered ANNs or Deep Learning (DL), given their ability to work well with complex and high-dimensional data sets. Convolutional Neural Network (CNN), a variation of DL, is a useful technique used in image classification.

Support vector machine (SVM)SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data, the algorithm outputs an optimal hyperplane that categorizes new examples.

Decision Tree (DT)DT is the simplest tree-based supervised ML model. The aim is to recursively construct a tree structure, in which each internal node represents a condition based on which the tree splits into branches/ edges. The end of the branch that does not split anymore is the decision/leaf.
Importantly, trees can be combined using ensemble learning to yield potent classifiers such as Random Forests (RF) and Boosted Trees.

k-Nearest neighbours (KNN)KNN is supervised algorithm that classifies new data by a majority vote of its neighbors, with the data being assigned to the class most common amongst its K nearest neighbors measured by a distance function.

Logistic regression (LR)LR is a traditional statistical method for solving binary classification problems (problems with two class values). It predicts the probability of occurrence of an event by fitting data to a logistic function.

K-mean clustering (KM)KM is a popular unsupervised ML algorithm. The algorithm works iteratively to partition data into k clusters in which each object belongs to the cluster with the nearest mean. This technique produces exactly k different clusters of greatest possible distinction. The best number of clusters k leading to the greatest separation (distance) is not known a priori and must be computed from the data.