Publications | Martin Ferianc

Youssef Abdalla, Martin Ferianc, Haya Alfassam, Atheer Awad, Ruochen Qiao, Miguel Rodrigues, Mine Orlu, Abdul W. Basit, David Shorthouse

May, 2025 Advanced Intelligent Systems

A Novel Semi-Automated Pipeline for Optimizing 3D-Printed Drug Formulations

3D printing offers a promising approach to creating personalized medicines. However, costly, expertise-dependent trial-and-error methods hinder efficient drug formulation, posing challenges for tailoring treatments to individual patients. To address this, a novel pipeline is developed for 3D printing using selective laser sintering (SLS), replacing laborious steps with advanced computational methods. A differential evolution-based optimizer generates formulations for the desired drugs, while a deep learning ensemble predicts the optimal printing parameters along with associated confidence intervals. Manual handling is only required for the final formulation preparation and printing processes. The pipeline successfully generates diverse formulations, composed of a wide variety of materials and with high printability probabilities. This was validated by successfully printing 80% of the generated drug formulations and achieving 92% accuracy in predicting printing parameters. Notably, the time required to develop and print a new drug formulation is decreased to a single day. This study is the first to demonstrate a semiautomated, 3D printing drug formulation design and printing parameter selection pipeline. Furthermore, the pipeline is not limited to SLS printing but can also be adapted for the optimization of other 3D printing technologies or formulation platforms.

Reem Masoud, Ziquan Liu, Martin Ferianc, Philip C Treleaven, Miguel Rodrigues

January, 2025 Proceedings of the 31st International Conference on Computational Linguistics

Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede’s Cultural Dimensions

The deployment of large language models (LLMs) raises concerns regarding their cultural misalignment and potential ramifications on individuals and societies with diverse cultural backgrounds. While the discourse has focused mainly on political and social biases, our research proposes a Cultural Alignment Test (Hofstede’s CAT) to quantify cultural alignment using Hofstede’s cultural dimension framework, which offers an explanatory cross-cultural comparison through latent variable analysis. We apply our approach to quantitatively evaluate LLMs, namely Llama 2, GPT-3.5, and GPT-4, against the cultural dimensions of regions like the United States, China, and Arab countries, using different prompting styles and exploring the effects of language-specific fine-tuning on the models’ behavioural tendencies and cultural values. Our results quantify the cultural alignment of LLMs and reveal the differences between LLMs in explanatory cultural dimensions. Our study demonstrates that while all LLMs struggle to grasp cultural values, GPT-4 shows a unique capability to adapt to cultural nuances, particularly in Chinese settings. However, it faces challenges with American and Arab cultures. The research also highlights that fine-tuning Llama 2 models with different languages changes their responses to cultural questions, emphasizing the need for culturally diverse development in AI for worldwide acceptance and ethical use.

Vittorio Casagrande, Martin Ferianc, Miguel Rodrigues, Francesca Boem

November, 2024 IEEE Transactions on Control Systems Technology

Online End-to-End Learning-Based Predictive Control for Microgrid Energy Management

This article proposes an innovative Online Learning (OL) algorithm designed for efficient microgrid energy management, integrating Recurrent Neural Networks (RNNs) and Model Predictive Control (MPC) in an End-to-End (E2E) learning-based control architecture. The algorithm leverages the RNN capabilities to predict uncertain and possibly evolving profiles of electricity price, load demand, and renewable generation. These are then exploited in an integrated MPC optimization problem to minimize the overall microgrid electricity consumption cost while guaranteeing operation constraints. The proposed methodology incorporates a specifically designed online version of the Stochastic Weight Averaging (O-SWA) and Experience Replay (ER) methods to enhance OL capabilities, ensuring more robust and adaptive learning in real-time scenarios. In addition, to address the challenge of model uncertainty, a task-based loss approach is proposed by integrating the MPC optimization as a differentiable optimization layer within the Neural Network (NN), allowing the OL architecture to jointly optimize prediction and control performance. The performance of the proposed methodology is evaluated through extensive simulation results, showcasing its Transfer Learning (TL) capabilities across different microgrid sites, which are crucial for deployment in real microgrids. We finally show that our OL algorithm can be used to estimate the prediction uncertainty of the unknown profiles.

Martin Ferianc

July, 2024

Making Neural Networks Confidence-Calibrated and Practical

Neural networks (NNs) have become powerful tools due to their predictive accuracy. However, NNs’ real-world applicability depends on accuracy and the alignment between confidence and accuracy, known as confidence calibration. Bayesian NNs (BNNs) and NN ensembles achieve good confidence calibration but are computationally expensive. In contrast, pointwise NNs are computationally efficient but poorly calibrated. Addressing these issues, this thesis proposes methods to enhance confidence calibration while maintaining or improving computational efficiency. For users preferring pointwise NNs, we propose methodology for regularising the NNs’ training by using single or multiple artificial noises to improve confidence calibration and accuracy relative to standard training up to 12% without additional operations at runtime. For users able to modify the NN architecture, we propose the Single Architecture Ensemble (SAE) framework, which generalises multi-input and multi-exit architectures to embed multiple predictors into a single NN, emulating an ensemble, maintaining or improving confidence calibration and accuracy while reducing the number of compute operations or parameters by 1.5 to 3.7 times. For users who already trained an NN ensemble, we propose knowledge distillation to transfer the ensemble’s predictive distribution to a single NN, marginally improving confidence calibration and accuracy, while halving the number of parameters or compute operations. We proposed uniform quantisation for BNNs, and benchmarked its impact on confidence calibration of pointwise NNs and BNNs, showing that e.g. 8-bit quantisation does not harm confidence calibration, but it reduces the memory footprint by 4 times in comparison to 32-bit floating-point precision. 
Lastly, we proposed an optimisation framework and a Dropout block to enable BNNs on existing field-programmable gate array-based accelerators, improving their inference latency or energy efficiency 2 to 100 times and algorithmic performance across tasks. This thesis presents methods to reduce NNs’ computational costs while maintaining or improving their algorithmic performance, making confidence-calibrated NNs practical in real-world applications.

Youssef Abdalla, Martin Ferianc, Atheer Awad, Jeesu Kim, Moe Elbadawi, Abdul W. Basit, Mine Orlu, Miguel Rodrigues

July, 2024 International Journal of Pharmaceutics

Smart Laser Sintering: Deep Learning-Powered Powder Bed Fusion 3D Printing in Precision Medicine

Medicines remain ineffective for over 50% of patients due to conventional mass production methods with fixed drug dosages. Three-dimensional (3D) printing, specifically selective laser sintering (SLS), offers a potential solution to this challenge, allowing the manufacturing of small, personalized batches of medication. Despite its simplicity and suitability for upscaling to large-scale production, SLS was not designed for pharmaceutical manufacturing and necessitates a time-consuming, trial-and-error adaptation process. In response, this study introduces a deep learning model trained on a variety of features to identify the best feature set to represent drugs and polymeric materials for the prediction of the printability of drug-loaded formulations using SLS. The proposed model demonstrates success by achieving 90% accuracy in predicting printability. Furthermore, explainability analysis unveils materials that facilitate SLS printability, offering invaluable insights for scientists to optimize SLS formulations, which can be expanded to other disciplines. This represents the first study in the field to develop an interpretable, uncertainty-optimized deep learning model for predicting the printability of drug-loaded formulations. This paves the way for accelerating formulation development, propelling us into a future of personalized medicine with unprecedented manufacturing precision.

Vittorio Casagrande, Martin Ferianc, Miguel Rodrigues, Francesca Boem

July, 2024 2024 IFAC Safeprocess

Learning-based MPC with uncertainty estimation for resilient microgrid energy management

To enhance fault resilience in microgrid systems at the energy management level, this paper introduces a novel proactive scheduling algorithm based on uncertainty modelling with a specifically designed neural network. The algorithm is trained and deployed online, and it estimates uncertainties in predicting future load demands and other relevant profiles. We integrate the novel learning algorithm with stochastic model predictive control, enabling the microgrid to store sufficient energy to adaptively deal with possible faults. Experimental results show that a reliable estimation of the unknown profiles’ mean and variance is obtained, improving the robustness of proactive scheduling strategies against uncertainties.

Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues

April, 2024 Transactions on Machine Learning Research

Navigating Noise: A Study of How Noise Influences Generalisation and Calibration of Neural Networks

Enhancing the generalisation abilities of neural networks (NNs) through integrating noise such as MixUp or Dropout during training has emerged as a powerful and adaptable technique. Despite the proven efficacy of noise in NN training, there is no consensus regarding which noise sources, types and placements yield maximal benefits in generalisation and confidence calibration. This study thoroughly explores diverse noise modalities to evaluate their impacts on NN’s generalisation and calibration under in-distribution or out-of-distribution settings, paired with experiments investigating the metric landscapes of the learnt representations, across a spectrum of NN architectures, tasks, and datasets. Our study shows that AugMix and weak augmentation exhibit cross-task effectiveness in computer vision, emphasising the need to tailor noise to specific domains. Our findings emphasise the efficacy of combining noises and successful hyperparameter transfer within a single domain but the difficulties in transferring the benefits to other domains. Furthermore, the study underscores the complexity of simultaneously optimising for both generalisation and calibration, emphasising the need for practitioners to carefully consider noise combinations and hyperparameter tuning for optimal performance in specific tasks and datasets.

Martin Ferianc, Hongxiang Fan, Miguel R. D. Rodrigues

January, 2024 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024

SAE: Single Architecture Ensemble Neural Networks

Ensembles of separate neural networks (NNs) have shown superior accuracy and confidence calibration over a single NN across tasks. To improve the hardware efficiency of ensembles of separate NNs, recent methods create ensembles within a single network by adding early exits or considering multi-input multi-output approaches. However, it is unclear which of these methods is the most effective for a given task, so a manual and separate search through each method is needed. Our novel Single Architecture Ensemble (SAE) framework enables an automatic and joint search through the early-exit and multi-input multi-output configurations and their previously unobserved in-between combinations. SAE consists of two parts: a scalable search space that generalises the previous methods and their in-between configurations, and an optimisation objective that allows learning the optimal configuration for a given task. Our image classification and regression experiments show that with SAE we can automatically find diverse configurations that fit the task, achieving competitive accuracy or confidence calibration to baselines while reducing the compute operations or parameter count by up to 1.5∼3.7×.

Reem I Masoud, Martin Ferianc, Philip Colin Treleaven, Miguel RD Rodrigues

January, 2024 Workshop on Socially Responsible Language Modelling Research

LLM Alignment Using Soft Prompt Tuning: The Case of Cultural Alignment

Large Language Model (LLM) alignment traditionally relies on supervised fine-tuning or alignment frameworks such as Kullback-Leibler (KL) regularization and reward models. These methods typically require labeled or preference datasets and involve updating model weights to align the LLM with the training objective or reward model. In the realm of cultural alignment, the non-differentiable nature of cultural dimensions renders these methods infeasible. To overcome this, we propose a scalable strategy that combines soft prompt tuning—which freezes the model parameters while modifying the input prompt embeddings—with Differential Evolution (DE), a black-box optimization method for cases where a differentiable objective is unattainable. This strategy ensures alignment consistency without the need for preference data or model parameter updates, significantly enhancing efficiency and mitigating overfitting. Our empirical findings indicate marked advancements in aligning LLM behavior within intricate cultural contexts, demonstrating the proposed method’s practicality and effectiveness. This work contributes to closing the gap between computational models and the complexities of human culture, offering a significant step forward in the nuanced alignment of LLMs across diverse human contexts.

Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O Cohen, Valentina Borghesani, Anton Pashkov, et al.

January, 2024 Nature Human Behaviour

Large language models surpass human experts in predicting neuroscience results

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries. Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.

Martin Wistuba, Martin Ferianc, Lukas Balles, Cédric Archambeau, Giovanni Zappella

January, 2023 2023 Continual Learning in Computer Vision (CLCV) workshop at CVPR 2023

Renate: A library for real-world continual learning

Continual learning enables the incremental training of machine learning models on non-stationary data streams. While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTorch models. We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate. We give a high-level description of the library components and interfaces. Finally, we showcase the strengths of the library by presenting experimental results. Renate may be found at https://github.com/awslabs/renate.

Martin Ferianc, Miguel Rodrigues

January, 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

MIMMO: Multi-Input Massive Multi-Output Neural Network

Neural networks (NNs) have achieved superhuman accuracy in multiple tasks, but the certainty of NNs’ predictions is often debatable, especially when confronted with data outside the training distribution. Averaging predictions of an ensemble of NNs can recalibrate the certainty of the predictions, but an ensemble is computationally expensive to deploy in practice. Recently, a new hardware-efficient multi-input multi-output (MIMO) NN was proposed to fit an ensemble of independent NNs into a single NN. In this work, we propose the addition of early exits to the MIMO architecture with inferred depth-wise weightings to produce multiple predictions for the same input, giving a more diverse ensemble. We denote this combination as MIMMO: a multi-input, massive multi-output NN, and we show that it can achieve better accuracy and calibration compared to the MIMO NN, simultaneously fit more NNs, and be similarly hardware efficient as MIMO or the early-exit ensemble.

Martin Ferianc, Ondrej Bohdal, Timothy Hospedales, Miguel Rodrigues

January, 2023 ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability

Impact of Noise on Calibration and Generalisation of Neural Networks

Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN’s training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under what conditions. More specifically we evaluate various noise-injection strategies in both in-distribution (ID) and out-of-distribution (OOD) scenarios. The findings highlight that activation noise was the most transferable and effective in improving generalisation, while input augmentation noise was prominent in improving calibration on OOD but not necessarily ID data.

Vittorio Casagrande, Martin Ferianc, Miguel Rodrigues, Francesca Boem

January, 2023 2023 31st Mediterranean Conference on Control and Automation (MED)

An Online Learning Method for Microgrid Energy Management Control

We propose a novel Model Predictive Control (MPC) scheme based on online-learning (OL) for microgrid energy management, where the control optimisation is embedded as the last layer of the neural network. The proposed MPC scheme deals with uncertainty on the load and renewable generation power profiles and on electricity prices, by employing the predictions provided by an online trained neural network in the optimisation problem. In order to adapt to possible changes in the environment, the neural network is online trained based on continuously received data. The network hyperparameters are selected by performing a hyperparameter optimisation before the deployment of the controller, using a pretraining dataset. We show the effectiveness of the proposed method for microgrid energy management through extensive experiments on real microgrid datasets. Moreover, we show that the proposed algorithm has good transfer learning (TL) capabilities among different microgrids.

Martin Ferianc, Miguel Rodrigues

May, 2022 ICML 2022 Workshop on Distribution-Free Uncertainty Quantification

Simple Regularisation for Uncertainty-Aware Knowledge Distillation

Considering uncertainty estimation of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems to meaningful real-world applications such as in medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state-of-the-art in both accuracy and uncertainty estimation in different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.

Hongxiang Fan, Martin Ferianc, Zhiqiang Que, Shuanglong Liu, Xinyu Niu, Miguel Rodrigues, Wayne Luk

April, 2022 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

FPGA-based Acceleration for Bayesian Convolutional Neural Networks

Neural networks (NNs) have demonstrated their potential in a variety of domains ranging from computer vision to natural language processing. Among various NNs, two-dimensional (2D) and three-dimensional (3D) convolutional neural networks (CNNs) have been widely adopted for a broad spectrum of applications such as image classification and video recognition, due to their excellent capabilities in extracting 2D and 3D features. However, standard 2D and 3D CNNs are not able to capture their model uncertainty which is crucial for many safety-critical applications including healthcare and autonomous driving. In contrast, Bayesian convolutional neural networks (BayesCNNs), as a variant of CNNs, have demonstrated their ability to express uncertainty in their prediction via a mathematical grounding. Nevertheless, BayesCNNs have not been widely used in industrial practice due to their compute requirements stemming from sampling and subsequent forward passes through the whole network multiple times. As a result, these requirements significantly increase the amount of computation and memory consumption in comparison to standard CNNs. This paper proposes a novel FPGA-based hardware architecture to accelerate both 2D and 3D BayesCNNs based on Monte Carlo Dropout. Compared with other state-of-the-art accelerators for BayesCNNs, the proposed design can achieve up to 4 times higher energy efficiency and 9 times better compute efficiency. An automatic framework capable of supporting partial Bayesian inference is proposed to explore the trade-off between algorithm and hardware performance. Extensive experiments are conducted to demonstrate that our framework can effectively find the optimal implementations in the design space.

Hongxiang Fan, Martin Ferianc, Zhiqiang Que, Xinyu Niu, Miguel Rodrigues, Wayne Luk

April, 2022 IEEE Transactions on Parallel and Distributed Systems

Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations

Bayesian neural networks (BayesNNs) have demonstrated their advantages in various safety-critical applications, such as autonomous driving or healthcare, due to their ability to capture and represent model uncertainty. However, standard BayesNNs must be run repeatedly because of the Monte Carlo sampling needed to quantify their uncertainty, which puts a burden on their real-world hardware performance. To address this performance issue, this paper systematically exploits the extensive structured sparsity and redundant computation in BayesNNs. Different from the unstructured or structured sparsity existing in standard convolutional NNs, the structured sparsity of BayesNNs is introduced by Monte Carlo Dropout and its associated sampling required during uncertainty estimation and prediction, which can be exploited through both algorithmic and hardware optimizations. We first classify the observed sparsity patterns into three categories: dropout sparsity, layer sparsity and sample sparsity. On the algorithmic side, a framework is proposed to automatically explore these three sparsity categories without sacrificing algorithmic performance. We demonstrated that structured sparsity can be exploited to accelerate CPU designs by up to 49 times, and GPU designs by up to 40 times. On the hardware side, a novel hardware architecture is proposed to accelerate BayesNNs, which achieves a high hardware performance using the runtime adaptable hardware engines and the intelligent skipping support. Upon implementing the proposed hardware design on an FPGA, our experiments demonstrated that the algorithm-optimized BayesNNs can achieve up to 56 times speedup when compared with unoptimized Bayesian nets. Compared with the optimized GPU implementation, our FPGA design achieved up to 7.6 times speedup and up to 39.3 times higher energy efficiency.

Hongxiang Fan, Martin Ferianc, Wayne Luk

January, 2022 Proceedings of the 59th ACM/IEEE Design Automation Conference

Enabling Fast Uncertainty Estimation: Accelerating Bayesian Transformers via Algorithmic and Hardware Optimizations

Quantifying the uncertainty of neural networks (NNs) is required by many safety-critical applications such as autonomous driving or medical diagnosis. Recently, Bayesian transformers have demonstrated their capabilities in providing high-quality uncertainty estimates paired with excellent accuracy. However, their real-time deployment is limited by the compute-intensive attention mechanism that is core to the transformer architecture, and the repeated Monte Carlo sampling to quantify the predictive uncertainty. To address these limitations, this paper accelerates Bayesian transformers via both algorithmic and hardware optimizations. On the algorithmic level, an evolutionary algorithm (EA)-based framework is proposed to exploit the sparsity in Bayesian transformers and ease their computational workload. On the hardware level, we demonstrate that the sparsity brings hardware performance improvement on our optimized CPU and GPU implementations. An adaptable hardware architecture is also proposed to accelerate Bayesian transformers on an FPGA. Extensive experiments demonstrate that the EA-based framework, together with hardware optimizations, reduces the latency of Bayesian transformers by up to 13, 12 and 20 times on CPU, GPU and FPGA platforms respectively, while achieving higher algorithmic performance.

Hongxiang Fan, Martin Ferianc, Zhiqiang Que, He Li, Shuanglong Liu, Xinyu Niu, Wayne Luk

January, 2022 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)

Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator

Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs. Nevertheless, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation, which makes the exploration on the vast design space of neural architecture and hardware design intractable. In this paper, we demonstrate that our proposed approach is capable of locating designs on the Pareto frontier. This capability is enabled by a novel three-phase co-design framework, with the following new features: (a) decoupling DNN training from the design space exploration of hardware architecture and neural architecture, (b) providing a hardware-friendly neural architecture space by considering hardware characteristics in constructing the search cells, (c) adopting Gaussian process to predict accuracy, latency and power consumption to avoid time-consuming synthesis and place-and-route processes. In comparison with the manually-designed ResNet101, InceptionV2 and MobileNetV2, we can achieve up to 5% higher accuracy with up to 3× speed up on the ImageNet dataset. Compared with other state-of-the-art co-design frameworks, our found network and hardware configuration can achieve 2%∼6% higher accuracy, 2×∼26× smaller latency and 8.5× higher energy efficiency.

Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk, Miguel Rodrigues

December, 2021 2021 International Conference on Field-Programmable Technology (ICFPT)

Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator

Neural networks have demonstrated their outstanding performance in a wide range of tasks. Specifically, recurrent architectures based on long short-term memory (LSTM) cells have manifested excellent capability to model time dependencies in real-world data. However, standard recurrent architectures cannot estimate their uncertainty, which is essential for safety-critical applications such as in medicine. In contrast, Bayesian recurrent neural networks (RNNs) are able to provide uncertainty estimation with improved accuracy. Nonetheless, Bayesian RNNs are computationally and memory demanding, which limits their practicality despite their advantages. To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs. To further improve the overall algorithmic-hardware performance, a co-design framework is proposed to explore the most fitting algorithmic-hardware configurations for Bayesian RNNs. We conduct extensive experiments on healthcare applications to demonstrate the improvement of our design and the effectiveness of our framework. Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency. To the best of our knowledge, this is the first work targeting acceleration of Bayesian RNNs on FPGAs.

Hongxiang Fan, Martin Ferianc, Miguel Rodrigues, Hongyu Zhou, Xinyu Niu, Wayne Luk

December, 2021 2021 58th ACM/IEEE Design Automation Conference (DAC)

High-Performance FPGA-based Accelerator for Bayesian Neural Networks

Neural networks (NNs) have demonstrated their potential in a wide range of applications such as image recognition, decision making or recommendation systems. However, standard NNs are unable to capture their model uncertainty, which is crucial for many safety-critical applications including healthcare and autonomous vehicles. In comparison, Bayesian neural networks (BNNs) are able to express uncertainty in their prediction via a mathematical grounding. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their expensive computational cost and limited hardware performance. This work proposes a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo Dropout. Compared with other state-of-the-art BNN accelerators, the proposed accelerator can achieve up to 4 times higher energy efficiency and 9 times better compute efficiency. Considering partial Bayesian inference, an automatic framework is proposed, which explores the trade-off between hardware and algorithmic performance. Extensive experiments are conducted to demonstrate that our proposed framework can effectively find the optimal points in the design space.

Shuanglong Liu, Hongxiang Fan, Martin Ferianc, Xinyu Niu, Huifeng Shi, Wayne Luk

February, 2021 IEEE Transactions on Neural Networks and Learning Systems

Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

Due to the huge success and rapid development of convolutional neural networks (CNNs), there is a growing demand for hardware accelerators that accommodate a variety of CNNs to improve their inference latency and energy efficiency, in order to enable their deployment in real-time applications. Among popular platforms, field-programmable gate arrays (FPGAs) have been widely adopted for CNN acceleration because of their capability to provide superior energy efficiency and low-latency processing, while supporting high reconfigurability, making them favorable for accelerating rapidly evolving CNN algorithms. This article introduces a highly customized streaming hardware architecture that focuses on improving the compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs. The proposed accelerator maps most computational functions, that is, convolutional and deconvolutional layers into a singular unified module, and implements the residual and concatenative connections between the functions with high efficiency, to support the inference of mainstream CNNs with different topologies. This architecture is further optimized through exploiting different levels of parallelism, layer fusion, and fully leveraging digital signal processing blocks (DSPs). The proposed accelerator has been implemented on Intel’s Arria 10 GX1150 hardware and evaluated with a wide range of benchmark models. The results demonstrate a high performance of over 1.3 TOP/s of throughput, up to 97% of compute [multiply-accumulate (MAC)] efficiency, which outperforms the state-of-the-art FPGA accelerators.

Martin Ferianc, Hongxiang Fan, Miguel Rodrigues

January, 2021 arXiv pre-print

VINNAS: Variational Inference-based Neural Network Architecture Search

In recent years, neural architecture search (NAS) has received intensive scientific and industrial interest due to its capability of finding a neural architecture with high accuracy for various artificial intelligence tasks such as image classification or object detection. In particular, gradient-based NAS approaches have become one of the more popular approaches thanks to their computational efficiency during the search. However, these methods often experience a mode collapse, where the quality of the found architectures is poor due to the algorithm resorting to choosing a single operation type for the entire network, or stagnating at a local minimum for various datasets or search spaces. To address these defects, we present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks. Our approach finds the optimal neural architecture by dropping out candidate operations in an over-parameterised supergraph using variational dropout with an automatic relevance determination prior, which makes the algorithm gradually remove unnecessary operations and connections without risking mode collapse. The evaluation is conducted through searching two types of convolutional cells that shape the neural network for classifying different image datasets. Our method finds diverse network cells, while showing state-of-the-art accuracy with up to almost 2 times fewer non-zero parameters.

Martin Ferianc, Partha Maji, Matthew Mattina, Miguel Rodrigues

January, 2021 Uncertainty in Artificial Intelligence

On the effects of quantisation on model uncertainty in Bayesian neural networks

Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for understanding when the model is over-/under-confident, and hence BNNs are attracting interest in safety-critical applications, such as autonomous driving, healthcare, and robotics. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their increased memory and compute costs. In this work, we investigate quantisation of BNNs by compressing 32-bit floating-point weights and activations to their integer counterparts, an approach that has already been successful in reducing the compute demand in standard pointwise neural networks. We study three types of quantised BNNs, evaluate them under a wide range of different settings, and empirically demonstrate that a uniform quantisation scheme applied to BNNs does not substantially decrease their quality of uncertainty estimation.

Martin Ferianc, Anush Sankaran, Olivier Mastropietro, Ehsan Saboori, Quentin Cappart

January, 2021 The AAAI-22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery

On Causal Inference for Data-free Structured Pruning

Neural networks (NNs) are making a large impact both on research and industry. Nevertheless, as NNs’ accuracy increases, so do their size, the number of compute operations they require and their energy consumption. This increase in resource consumption reduces NNs’ adoption rate and makes their real-world deployment impractical. Therefore, NNs need to be compressed to make them available to a wider audience and at the same time decrease their runtime costs. In this work, we approach this challenge from a causal inference perspective, and we propose a scoring mechanism to facilitate structured pruning of NNs. The approach is based on measuring mutual information under a maximum entropy perturbation, sequentially propagated through the NN. We demonstrate the method’s performance on two datasets and various NN sizes, and we show that our approach achieves competitive performance under challenging conditions.

Martin Ferianc, Hongxiang Fan, Divyansh Manocha, Hongyu Zhou, Shuanglong Liu, Xinyu Niu, Wayne Luk

January, 2021 Electronics

Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators

Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as image classification, object detection or natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time-consuming because of the slow performance evaluation for different configurations. Therefore, there is a demand for an accurate and fast performance prediction method to speed up design space exploration. This work introduces a novel method for fast and accurate estimation of different metrics that are of importance when performing design space exploration. The method is based on a Gaussian process regression model parametrised by the features of the accelerator and the target NN to be accelerated. We evaluate the proposed method together with other popular machine-learning-based methods in estimating the latency and energy consumption of our implemented accelerator on two different hardware platforms targeting convolutional neural networks. We demonstrate improvements in estimation accuracy, without the need for significant implementation effort or tuning.

Martin Ferianc, Divyansh Manocha, Hongxiang Fan, Miguel Rodrigues

January, 2021 Artificial Neural Networks and Machine Learning – ICANN 2021

ComBiNet: Compact Convolutional Bayesian Neural Network for Image Segmentation

Fully convolutional U-shaped neural networks have largely been the dominant approach for pixel-wise image segmentation. In this work, we tackle two defects that hinder their deployment in real-world applications: 1) predictions lack uncertainty quantification, which may be crucial to many decision-making systems; 2) large memory storage and computational consumption demand extensive hardware resources. To address these issues and improve their practicality, we demonstrate a few-parameter compact Bayesian convolutional architecture that achieves a marginal improvement in accuracy in comparison to related work using significantly fewer parameters and compute operations. The architecture combines parameter-efficient operations such as separable convolutions, bilinear interpolation, multi-scale feature propagation and Bayesian inference for per-pixel uncertainty quantification through Monte Carlo Dropout. The best performing configurations required fewer than 2.5 million parameters on diverse challenging datasets with few observations.

Hongxiang Fan, Martin Ferianc, Shuanglong Liu, Zhiqiang Que, Xinyu Niu, Wayne Luk

October, 2020 2020 IEEE 38th International Conference on Computer Design (ICCD)

Optimizing FPGA-Based CNN Accelerator Using Differentiable Neural Architecture Search

Neural architecture search (NAS) aims to find the optimal neural network automatically for different scenarios. Among various NAS methods, the differentiable NAS (DNAS) approach has demonstrated its effectiveness in terms of searching cost and final accuracy. However, most previous efforts focus on applying DNAS to GPU or CPU platforms, and its potential is less exploited on the FPGA. In this paper, we first propose a novel FPGA-based CNN accelerator. An accurate performance model of the proposed hardware design is also introduced. To improve accuracy as well as hardware performance, we then apply DNAS and encapsulate the proposed performance model into the objective function. Based on our FPGA design and NAS method, the experiments demonstrate that the network generated by NAS achieves nearly 95% accuracy on CIFAR-10, while decreasing latency by nearly 12 times compared with existing work.

Martin Ferianc, Hongxiang Fan, Ringo S. W. Chu, Jakub Stano, Wayne Luk

January, 2020 Applied Reconfigurable Computing. Architectures, Tools, and Applications

Improving Performance Estimation for FPGA-Based Accelerators for Convolutional Neural Networks

Field-programmable gate array (FPGA)-based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space, and accurate performance prediction plays an important role during the exploration. This work introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. The experiments conducted on three different CNNs on an FPGA-based accelerator on Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy with respect to the mean absolute error in comparison to a standard analytic method in leave-one-out cross-validation.

Hongxiang Fan, Gang Wang, Martin Ferianc, Xinyu Niu, Wayne Luk

December, 2019 2019 International Conference on Field-Programmable Technology (ICFPT)

Static Block Floating-Point Quantization for Convolutional Neural Networks on FPGA

Convolutional neural networks (CNNs) have been widely applied in various computer vision and speech processing applications. However, the algorithmic complexity of CNNs hinders their deployment in embedded systems with limited memory and computational resources. This paper proposes static block floating-point (BFP) quantization, an effective approach involving Kullback-Leibler divergence, to determine the static shared exponents. Without the need for retraining, the proposed approach is able to quantize CNNs to 8 bits with negligible accuracy loss. An FPGA-based hardware design with static BFP quantization is also proposed. Compared with 8-bit integer linear quantization, our experiments show that the hardware kernel based on static BFP quantization can achieve over 50% reduction in logic resources on an FPGA. Based on static BFP quantization, a tool implemented in the PyTorch framework is developed, which can automatically generate an optimised configuration according to user requirements for given CNN models, where the entire optimization process takes only a few minutes on an Intel Xeon Silver 4110 CPU.
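
To make the idea of a shared exponent in block floating-point more concrete, the sketch below quantizes a block of values to 8-bit signed mantissas that all share one power-of-two exponent. It is only an illustration under assumptions: the exponent here is derived from the largest magnitude in the block, which is a simplification and not the Kullback-Leibler-divergence-based static selection described in the abstract above; the function names `bfp_quantize` and `bfp_dequantize` are hypothetical.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas plus one shared exponent.

    Assumption: the shared exponent comes from the block's largest magnitude,
    not from the KL-divergence procedure used in the paper.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp returns max_abs = m * 2**exp with 0.5 <= m < 1, so every |v| < 2**exp.
    _, exp = math.frexp(max_abs)
    # One mantissa bit is reserved for the sign.
    shared_exp = exp - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for v in block:
        q = round(v / 2.0 ** shared_exp)
        # Clip to the representable signed range, e.g. [-128, 127] for 8 bits.
        mantissas.append(max(-limit - 1, min(limit, q)))
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]


if __name__ == "__main__":
    values = [0.75, -0.5, 0.1, 0.0]
    mantissas, shared_exp = bfp_quantize(values)
    print(mantissas, shared_exp)
    print(bfp_dequantize(mantissas, shared_exp))
```

Because the whole block stores a single exponent, arithmetic on the mantissas reduces to integer operations, which is the property a hardware kernel can exploit; the reconstruction error per value is bounded by one quantization step of size `2 ** shared_exp`.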

Hongxiang Fan, Cheng Luo, Chenglong Zeng, Martin Ferianc, Zhiqiang Que, Shuanglong Liu, Xinyu Niu, Wayne Luk

July, 2019 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

F-E3D: FPGA-based Acceleration of an Efficient 3D Convolutional Neural Network for Human Action Recognition

Three-dimensional convolutional neural networks (3D CNNs) have demonstrated their outstanding classification accuracy for human action recognition (HAR). However, the large number of computations and parameters in 3D CNNs limits their deployability in real-life applications. To address this challenge, this paper adopts an algorithm-hardware co-design method by proposing an efficient 3D CNN building unit called 3D-1 bottleneck residual block (3D-1 BRB) at the algorithm level, and a corresponding FPGA-based hardware architecture called F-E3D at the hardware level. Based on 3D-1 BRB, a novel 3D CNN model called E3DNet is developed, which achieves nearly 37 times reduction in model size and 5% improvement in accuracy compared to standard 3D CNNs on the UCF101 dataset. Together with several hardware optimizations, including 3D fused BRB, online blocking and kernel reuse, the proposed F-E3D is nearly 13 times faster than a previous FPGA design for 3D CNNs, with performance and accuracy comparable to other state-of-the-art 3D CNN models on GPU platforms while requiring only 7% of their energy consumption.

Hongxiang Fan, Shuanglong Liu, Martin Ferianc, Ho-Cheung Ng, Zhiqiang Que, Shen Liu, Xinyu Niu, Wayne Luk

December, 2018 2018 International Conference on Field-Programmable Technology (FPT)

A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA

Convolutional neural network (CNN)-based object detection has been widely employed in various applications such as autonomous driving and intelligent video surveillance. However, the computational complexity of conventional convolution hinders its application in embedded systems. Recently, a mobile-friendly CNN model SSDLite-MobileNetV2 (SSDLiteM2) has been proposed for object detection. This model consists of a novel layer called bottleneck residual block (BRB). Although SSDLiteM2 contains far fewer parameters and computations than conventional CNN models, its performance on embedded devices still cannot meet the requirements of real-time processing. This paper proposes a novel FPGA-based architecture for SSDLiteM2 in combination with hardware optimizations including fused BRB, processing element (PE) sharing and load-balanced channel pruning. Moreover, a novel quantization scheme called partial quantization has been developed, which partially quantizes SSDLiteM2 to 8 bits with only 1.8% accuracy loss. Experiments show that the proposed design on a Xilinx ZC706 device can achieve up to 65 frames per second with 20.3 mean average precision on the COCO dataset.