Dr Terry Windeatt
Academic and research departments: Centre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering.
Terry Windeatt received the BSc degree in Applied Science from the University of Sussex, UK, followed by an MSc in Electronic Engineering from the University of California, a BA (CNAA) in Theology and a PhD degree from the University of Surrey, UK. After lecturing in Control Engineering at Kingston University, UK, he went to live and work in the USA for eight years. He worked on Intelligent Systems in the Research and Development Departments of General Motors and Xerox Corporation in Rochester, NY. His industrial R&D experience is in modelling/simulation for intelligent automotive and office-copying applications. He worked on early versions of closed-loop control systems for car emissions and the xerographic process. He returned from the United States to join the Department of Electrical and Electronic Engineering at the University of Surrey, where he now lectures in Machine Intelligence. He has worked on various research projects in the Centre for Vision, Speech and Signal Processing in the areas of Pattern Recognition, Neural Nets and Computer Vision.
eee3005 Control Engineering
eeem005 AI and AI Programming
Bias and variance (B&V) decomposition is frequently used as a tool for analysing classification performance. However, the standard B&V terminologies were originally defined for the regression setting, and their extensions to classification have led to several different models and definitions in the literature. Although the relation between some of these models has previously been explored, their links to the standard terminology in terms of Bayesian statistics have not been established. In this paper, we aim to provide this missing link by employing the frameworks of Tumer & Ghosh (T&G) and James. By unifying the two approaches, we relate the classification B&V defined for the 0/1 loss to the standard B&V of the boundary distributions given for the squared error loss. The closed-form relationships derived in this study provide a deeper understanding of classification performance, and their example uses in predictor design and analysis are demonstrated in two case studies.
A spectral approximation of a Boolean function is proposed for approximating the decision boundary of an ensemble of Deep Neural Networks (DNNs) solving two-class pattern recognition problems. The Walsh combination of relatively weak DNN classifiers is shown experimentally to be capable of detecting adversarial attacks. By observing the difference in Walsh coefficient approximation between clean and adversarial images, it appears that transferability of attack may be used for detection. Approximating the decision boundary may also aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area. Index Terms: Adversarial robustness, Boolean functions, ensemble, deep neural networks, machine learning, pattern analysis, spectral analysis.
Ensemble learning is a method of combining learners to obtain more reliable and accurate predictions in supervised and unsupervised learning. However, ensemble sizes are sometimes unnecessarily large, which leads to additional memory usage, computational overhead and decreased effectiveness. To overcome such side effects, pruning algorithms have been developed; since this is a combinatorial problem, finding the exact subset of ensembles is computationally infeasible. Different types of heuristic algorithms have been developed to obtain an approximate solution, but they lack a theoretical guarantee. Error Correcting Output Code (ECOC) is one of the well-known ensemble techniques for multiclass classification, which combines the outputs of binary base learners to predict the classes for multiclass data. In this paper, we propose a novel approach for pruning the ECOC matrix by utilizing accuracy and diversity information simultaneously. All existing pruning methods need the size of the ensemble as a parameter, so the performance of the pruning methods depends on the size of the ensemble. Our unparametrized pruning method is novel in that it is independent of the size of the ensemble. Experimental results show that our pruning method is mostly better than other existing approaches.
We compare experimentally the performance of three approaches to ensemble-based classification on general multi-class datasets. These are the methods of random forest, error-correcting output codes (ECOC) and ECOC enhanced by the use of bootstrapping and class-separability weighting (ECOC-BW). These experiments suggest that ECOC-BW yields better generalisation performance than either random forest or unmodified ECOC. A bias-variance analysis indicates that ECOC benefits from reduced bias, when compared to random forest, and that ECOC-BW benefits additionally from reduced variance. One disadvantage of ECOC-based algorithms, however, when compared with random forest, is that they impose a greater computational demand leading to longer training times.
We outline a design for a FACS-based facial expression recognition system and describe in more detail the implementation of two of its main components. Firstly we look at how features that are useful from a pattern analysis point of view can be extracted from a raw input image. We show that good results can be obtained by using the method of local binary patterns (LBP) to generate a large number of candidate features and then selecting from them using fast correlation-based filtering (FCBF). Secondly we show how Platt scaling can be used to improve the performance of an error-correcting output code (ECOC) classifier.
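As a rough illustration of the Platt scaling step mentioned above, the sketch below fits a sigmoid p(y=1|s) = 1 / (1 + exp(a*s + b)) to raw classifier scores. It is a simplified version under stated assumptions: plain gradient descent replaces the optimiser used in Platt's original method, and the function names `fit_platt` and `platt_prob`, the learning rate and the toy scores are all hypothetical rather than taken from the paper.

```python
import math

def platt_prob(score, a, b):
    """Map a raw classifier score to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def fit_platt(scores, labels, lr=0.1, iters=2000):
    """Fit a and b by gradient descent on the mean log loss (simplified)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_prob(s, a, b)
            # For this parameterisation d(loss)/da = (y - p) * s, d(loss)/db = (y - p)
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Toy, non-separable scores (hypothetical data); label 1 tends to have higher scores.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
```

With scores that increase with the positive class, the fitted `a` comes out negative, so larger scores map to larger probabilities.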
Error Correcting Output Coding (ECOC) is a multi-class classification technique in which multiple binary classifiers are trained according to a preset code matrix such that each one learns a separate dichotomy of the classes. While ECOC is one of the best solutions for multi-class problems, one issue which makes it suboptimal is that the training of the base classifiers is done independently of the generation of the code matrix. In this paper, we propose to modify a given ECOC matrix to improve its performance by reducing this decoupling. The proposed algorithm uses beam search to iteratively modify the original matrix, using validation accuracy as a guide. It does not involve further training of the classifiers and can be applied to any ECOC matrix. We evaluate the accuracy of the proposed algorithm (BeamECOC) using 10-fold cross-validation experiments on 6 UCI datasets, using random code matrices of different sizes, and base classifiers of different strengths. Compared to the random ECOC approach, BeamECOC increases the average cross-validation accuracy in 83.3% of the experimental settings involving all datasets, and gives better results than the state-of-the-art in 75% of the scenarios. By employing BeamECOC, it is also possible to reduce the number of columns of a random matrix down to 13% and still obtain comparable or even better results at times.
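The basic ECOC mechanism described above (one binary classifier per code-matrix column, then decoding to the nearest codeword) can be sketched as follows. This shows only standard Hamming decoding, not the BeamECOC matrix-modification step; the 4-class, 7-column `CODE_MATRIX` and the function names are hypothetical choices made for illustration.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, outputs):
    """Return the index of the class whose codeword (row) is nearest to the outputs."""
    distances = [hamming_distance(row, outputs) for row in code_matrix]
    return distances.index(min(distances))

# Hypothetical code matrix: rows are class codewords, columns define the
# dichotomy each binary base classifier is trained on. Every pair of rows
# differs in 4 positions, so any single base-classifier error is corrected.
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
]

clean = ecoc_decode(CODE_MATRIX, [1, 1, 0, 0, 1, 1, 0])      # exact codeword of class 2
one_error = ecoc_decode(CODE_MATRIX, [1, 1, 0, 0, 1, 1, 1])  # last classifier is wrong
```

Both calls decode to class 2, which illustrates the error-correcting property: one wrong base classifier leaves the output closer to the correct codeword than to any other.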
By performing experiments on publicly available multi-class datasets we examine the effect of bootstrapping on the bias/variance behaviour of error-correcting output code ensembles. We present evidence to show that the general trend is for bootstrapping to reduce variance but to slightly increase bias error. This generally leads to an improvement in the lowest attainable ensemble error; however, this is not always the case, and bootstrapping appears to be most useful on datasets where the non-bootstrapped ensemble classifier is prone to overfitting.
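To make the bias/variance terminology concrete, the sketch below computes one common 0/1-loss formulation for a single test point, in which the "main prediction" is the modal label across classifiers trained on different bootstrap replicates. The abstract does not state which decomposition the authors used, so this definition, the function name `bias_variance_01` and the example labels are assumptions for illustration only.

```python
from collections import Counter

def bias_variance_01(predictions, true_label):
    """Bias and variance at one test point under 0/1 loss.

    predictions: labels predicted for the same test point by classifiers
    trained on different bootstrap replicates of the training set.
    """
    # Main prediction: the label predicted most often across replicates.
    main_prediction = Counter(predictions).most_common(1)[0][0]
    # Bias: 1 if the main prediction is wrong, otherwise 0.
    bias = int(main_prediction != true_label)
    # Variance: fraction of replicates that disagree with the main prediction.
    variance = sum(p != main_prediction for p in predictions) / len(predictions)
    return main_prediction, bias, variance

# Hypothetical predictions from four bootstrap replicates.
result = bias_variance_01(["a", "a", "b", "a"], "a")
```

Averaging these per-point values over a test set gives dataset-level bias and variance estimates under this definition.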
A spectral analysis of a Boolean function is proposed for approximating the decision boundary of an ensemble of classifiers, and an intuitive explanation of computing Walsh coefficients for the functional approximation is provided. It is shown that the difference between first and third order coefficient approximation is a good indicator of optimal base classifier complexity. When combining Neural Networks, experimental results on a variety of artificial and real two-class problems demonstrate under what circumstances ensemble performance can be improved. For tuned base classifiers, first order coefficients provide performance similar to majority vote. However, for weak/fast base classifiers, higher order coefficient approximation may give better performance. It is also shown that higher order coefficient approximation is superior to the Adaboost logarithmic weighting rule when boosting weak Decision Tree base classifiers.
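The Walsh coefficients referred to above can be illustrated on a small, fully specified Boolean function. The sketch assumes the usual ±1 coding, in which the coefficient for a subset S of inputs is the average over all input patterns of f(x) times the product of x_i for i in S. It is a textbook computation, not the estimation procedure of the paper (which works from training patterns); `walsh_coefficients` and `majority` are hypothetical names.

```python
from itertools import combinations, product
from math import prod

def walsh_coefficients(f, n):
    """Return {subset of input indices: Walsh coefficient} for f on {-1, +1}^n."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for order in range(n + 1):
        for subset in combinations(range(n), order):
            # Correlate f with the parity (product) of the inputs in the subset.
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs

def majority(x):
    """Majority vote of an odd number of +/-1 base-classifier outputs."""
    return 1 if sum(x) > 0 else -1

coeffs = walsh_coefficients(majority, 3)
```

For the three-input majority vote, each first order coefficient is 0.5, the second order coefficients are 0 and the single third order coefficient is -0.5. The first order terms alone therefore reproduce the vote's sign, which is consistent with the abstract's remark that first order coefficients behave like majority vote.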
Ensemble learning is a method of combining learners; however, ensemble sizes are sometimes unnecessarily large, which causes extra memory usage and a decrease in effectiveness. Error Correcting Output Code (ECOC) is one of the well-known ensemble techniques for multiclass classification, which combines the outputs of binary base learners to predict the classes for multiclass data. We formulate the ECOC ensemble selection problem using difference of convex functions (DC) programming and a zero-norm approximation to the cardinality constraint. Experiments show that it outperforms the standard ECOC.
PC and TPDA algorithms are robust and well-known prototype algorithms, incorporating constraint-based approaches for causal discovery. However, neither algorithm can scale up to deal with high-dimensional data, that is, more than a few hundred features. This chapter presents hybrid correlation and causal feature selection for ensemble classifiers to deal with this problem. Redundant features are removed by correlation-based feature selection and then irrelevant features are eliminated by causal feature selection. The number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC) and false negative rate (FNR) of the proposed algorithms are compared with correlation-based feature selection (FCBF and CFS) and causal-based feature selection algorithms (PC, TPDA, GS, IAMB).
A new surface-based approach to implicit surface polygonisation is introduced. This is applied to the reconstruction of 3D surface models of complex objects from multiple range images. Geometric fusion of multiple range images into an implicit surface representation was presented in previous work. This paper introduces an efficient algorithm to reconstruct a triangulated model of a manifold implicit surface. A local 3D constraint is derived which defines the Delaunay surface triangulation of a set of points on a manifold surface in 3D space. The 'marching triangles' algorithm uses the local 3D constraint to reconstruct a Delaunay triangulation of an arbitrary topology manifold surface. Computational and representational costs are both a factor of 3-5 lower than previous volumetric approaches such as marching cubes.
This paper concentrates on the comparison of systems that are used for the recognition of expressions generated by six upper face action units (AUs) using the Facial Action Coding System (FACS). Haar wavelet, Haar-like and Gabor wavelet coefficients are compared, using Adaboost for feature selection. The binary classification results using Support Vector Machines (SVM) for the upper face AUs have been observed to be better than the current results in the literature, for example 96.5% for AU2 and 97.6% for AU5. In the multi-class classification case, Error Correcting Output Coding (ECOC) has been applied. Although for a large number of classes the results are not as accurate as in the binary case, ECOC has the advantage of solving all problems simultaneously; and for large numbers of training samples and a small number of classes, error rates are improved.
Error Correcting Output Coding (ECOC) is a multiclass classification technique, in which multiple base classifiers (dichotomizers) are trained using subsets of the training data, determined by a preset code matrix. While it is one of the best solutions to multiclass problems, ECOC is suboptimal, as the code matrix and the base classifiers are not learned simultaneously. In this paper, we show an iterative update algorithm that reduces this decoupling. We compare the algorithm with the standard ECOC approach, using Neural Networks (NNs) as the base classifiers, and show that it improves the accuracy for some well-known data sets under different settings.
A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.
We consider a multiple classifier system which combines the hard decisions of experts by voting. We argue that the individual experts should not set their own decision thresholds. The respective thresholds should be selected jointly as this will allow compensation of the weaknesses of some experts by the relative strengths of the others. We perform the joint optimization of decision thresholds for a multiple expert system by a systematic sampling of the multidimensional decision threshold space. We show the effectiveness of this approach on the important practical application of video shot cut detection.
Irrelevant features may lead to degradation in the accuracy and efficiency of classifier performance. In this paper, the Bootstrap Causal Feature Selection (BCFS) algorithm is proposed. BCFS uses bootstrapping with a causal discovery algorithm to remove irrelevant features. The results are evaluated by the number of selected features and classification accuracy. According to the experimental results, BCFS is able to remove irrelevant features and provides slightly higher average accuracy than using the original features and causal feature selection. Moreover, BCFS also reduces complexity in causal graphs, which provides more comprehensibility for the causal discovery system. © 2013 IEEE.
Within the context of face expression classification using the facial action coding system (FACS), we address the problem of detecting facial action units (AUs). The method adopted is to train a single error-correcting output code (ECOC) multiclass classifier to estimate the probabilities that each one of several commonly occurring AU groups is present in the probe image. Platt scaling is used to calibrate the ECOC outputs to probabilities and appropriate sums of these probabilities are taken to obtain a separate probability for each AU individually. Feature extraction is performed by generating a large number of local binary pattern (LBP) features and then selecting from these using fast correlation-based filtering (FCBF). The bias and variance properties of the classifier are measured and we show that both these sources of error can be reduced by enhancing ECOC through the application of bootstrapping and class-separability weighting.
An approach to approximating the decision boundary of an ensemble of two-class classifiers is proposed. Spectral coefficients are used to approximate the discrete probability density function of a Boolean Function. It is shown that the difference between first and third order coefficient approximation is a good indicator of optimal base classifier complexity. A theoretical analysis is supported by experimental results on a variety of Artificial and Real two-class problems.
Existing ensemble pruning algorithms in the literature have mainly been defined for unweighted or weighted voting ensembles, and their extensions to the Error Correcting Output Coding (ECOC) framework have not been successful. This paper presents a novel pruning algorithm for ECOC that uses a new accuracy measure together with diversity and Hamming distance information. The results show that the novel method outperforms existing state-of-the-art methods.
To improve the performance of computer-aided systems for breast cancer diagnosis, an ensemble classifier is proposed for classifying the histological structures in breast cancer microscopic images into three region types: positive cancer cells, negative cancer cells and non-cancer cells (stromal cells and lymphocyte cells). The bagging and boosting ensemble techniques are used with the decision tree (DT) learner, and are compared with a single DT classifier. The feature used as input to the classifiers is the fractal dimension (FD) based on 12 color channels. It is computed from image datasets that are manually cropped into small images with three window sizes: 128×128, 192×192 and 256×256 pixels. The results show that the boosting ensemble classifier gives the best accuracy, about 80%, at a window size of 256, although accuracy is lowest when using the single DT as classifier. The results indicate that the ensemble method is capable of improving classification accuracy compared to the single classifier. The classification model using FD and the ensemble classifier could be applied to develop computer-aided systems for breast cancer diagnosis in the future.
A dynamic method of selecting a pruned ensemble of predictors for regression problems is described. The proposed method enhances the prediction accuracy and generalization ability of pruning methods that change the order in which ensemble members are combined. Ordering heuristics attempt to combine accurate yet complementary regressors. The proposed method enhances the performance by modifying the order of aggregation through distributing the regressor selection over the entire dataset. This paper compares four static ensemble pruning approaches with the proposed dynamic method. The experimental comparison is made using MLP regressors on benchmark datasets and on an industrial application of radio frequency source calibration. © 2014 Springer International Publishing Switzerland.
Facial action unit (AU) classification is an approach to face expression recognition that decouples the recognition of expression from individual actions. In this paper, upper face AUs are classified using an ensemble of MLP (Multi-layer perceptron) base classifiers with feature ranking based on PCA components. This approach is compared experimentally with other popular feature-ranking methods applied to Gabor features. Experimental results on the Cohn-Kanade database demonstrate that the MLP ensemble is relatively insensitive to the feature-ranking method but optimized PCA features achieve the lowest error rate. When posed as a multi-class problem using Error-Correcting-Output-Coding (ECOC), error rates are comparable to two-class problems (one-versus-rest) when the number of features and base classifier are optimized.
PC and TPDA algorithms are robust and well-known prototype algorithms, incorporating constraint-based approaches for causal discovery. However, neither algorithm can scale up to deal with high-dimensional data, that is, more than a few hundred features. This paper presents hybrid correlation and causal feature selection for ensemble classifiers to deal with this problem. The number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC) and false negative rate (FNR) of the proposed algorithms are compared with correlation-based feature selection (FCBF and CFS) and causal-based feature selection algorithms (PC, TPDA, GS, IAMB).
High-dimensional data can lead to low classification accuracy and long computation times because it contains irrelevant and redundant features. To overcome this problem, the dimension of the data has to be reduced. Causal feature selection is one of the methods for feature reduction, but it cannot identify redundant features. This paper presents the Parent-Children based for Causal Redundant Feature Identification (PCRF) algorithm to identify and remove redundant features. The classification accuracy and number of features removed by the PCRF algorithm are compared with correlation feature selection. According to the results, the PCRF algorithm can identify redundant features but has lower classification accuracy than correlation feature selection.
One of the methods used to evaluate the performance of ensemble classifiers is bias and variance analysis. In this chapter, we analyse bootstrap aggregating (bagging) and Error Correcting Output Coding (ECOC) ensembles using a bias-variance framework, and make comparisons with single classifiers, with Neural Networks (NNs) as base classifiers. As the performance of the ensembles depends on the individual base classifiers, it is important to understand the overall trends when the parameters of the base classifiers (nodes and epochs for NNs) are changed. We show experimentally on 5 artificial and 4 UCI MLR datasets that there are some clear trends in the analysis that should be taken into consideration while designing NN classifier systems.
A method for applying weighted decoding to error-correcting output code ensembles of binary classifiers is presented. This method is sensitive to the target class in that a separate weight is computed for each base classifier and target class combination. Experiments on 11 UCI datasets show that the method tends to improve classification accuracy when using neural network or support vector machine base classifiers. It is further shown that weighted decoding combines well with the technique of bootstrapping to improve classification accuracy still further.
This paper addresses the problem of reconstructing an integrated 3D model from multiple 2.5D range images. A novel integration algorithm is presented based on a continuous implicit surface representation. This is the first reconstruction algorithm to use operations in 3D space only. The algorithm is guaranteed to reconstruct the correct topology of surface features larger than the range image sampling resolution. Reconstruction of triangulated models from multi-image data sets is demonstrated for complex objects. Performance characterization of existing range image integration algorithms is addressed in the second part of this paper. This comparison defines the relative computational complexity and geometric limitations of existing integration algorithms.
There are a variety of methods for inducing predictive systems from observed data. Many of these methods fall into the field of study of machine learning. Some of the most effective algorithms in this domain succeed by combining a number of distinct predictive elements to form what can be described as a type of committee. Well known examples of such algorithms are AdaBoost, bagging and random forests. Stochastic discrimination is a committee-forming algorithm that attempts to combine a large number of relatively simple predictive elements in an effort to achieve a high degree of accuracy. A key element of the success of this technique is that its coverage of the observed feature space should be uniform in nature. We introduce a new uniformity enforcement method, which on benchmark datasets, leads to greater predictive efficiency than the currently published method.
Within the context of facial expression classification using the facial action coding system (FACS), we address the problem of detecting facial action units (AUs). Feature extraction is performed by generating a large number of multi-resolution local binary pattern (MLBP) features and then selecting from these using fast correlation-based filtering (FCBF). The need for a classifier per AU is avoided by training a single error-correcting output code (ECOC) multi-class classifier to generate occurrence scores for each of several AU groups. A novel weighted decoding scheme is proposed with the weights computed using first order Walsh coefficients. Platt scaling is used to calibrate the ECOC scores to probabilities and appropriate sums are taken to obtain separate probability estimates for each AU individually. The bias and variance properties of the classifier are measured and we show that both these sources of error can be reduced by enhancing ECOC through bootstrapping and weighted decoding.
We investigate the effects, in terms of a bias-variance decomposition of error, of applying class-separability weighting plus bootstrapping in the construction of error-correcting output code ensembles of binary classifiers. Evidence is presented to show that bias tends to be reduced at low training strength values whilst variance tends to be reduced across the full range. The relative importance of these effects, however, varies depending on the stability of the base classifier type.
Two-class supervised learning in the context of a classifier ensemble may be formulated as learning an incompletely specified Boolean function, and the associated Walsh coefficients can be estimated without knowledge of the unspecified patterns. Using an extended version of the Tumer-Ghosh model, the relationship between Added Classification Error and second order Walsh coefficients is established. In this paper, the ensemble is composed of Multi-layer Perceptron (MLP) base classifiers, with the number of hidden nodes and epochs systematically varied. Experiments demonstrate that the mean second order coefficients peak at the same number of training epochs as ensemble test error reaches a minimum.
There are two approaches to automating the task of facial expression recognition, the first concentrating on what meaning is conveyed by facial expression and the second on categorising deformation and motion into visual classes. The latter approach has the advantage that the interpretation of facial expression is decoupled from individual actions as in FACS (Facial Action Coding System). In this chapter, upper face action units (AUs) are classified using an ensemble of MLP base classifiers with feature ranking based on PCA components. When posed as a multi-class problem using Error-Correcting-Output-Coding (ECOC), experimental results on the Cohn-Kanade database demonstrate that error rates comparable to two-class problems (one-versus-rest) may be obtained. The ECOC coding and decoding strategies are discussed in detail, and a novel weighted decoding approach is shown to outperform conventional ECOC decoding. Furthermore, base classifiers are tuned using the ensemble Out-of-Bootstrap estimate, for which purpose ECOC decoding is modified. The error rates obtained for six upper face AUs around the eyes are believed to be among the best for this database.
T. Windeatt, C. Zor and N.C. Camgoz, Approximation of Ensemble Boundary using Spectral Coefficients, IEEE Trans Neural Networks and Learning Systems, Volume 30, Issue 4, April 2019.
T. Windeatt, Optimising Ensemble of Two-Class Classifiers using Spectral Analysis, 24th International Conference on Pattern Recognition (ICPR), (2018), to pp. 1051-4651