Energy-aware FPGA architecture enables parallel fuzzy hierarchical clustering for real-time embedded intelligence.

The Challenge of Embedding Complex Clustering in Low-Power Devices
A recurring challenge in embedded intelligent systems is bringing complex machine learning algorithms to low-power devices without sacrificing performance or accuracy. Two-dimensional hierarchical clustering is a powerful tool for applications such as computer vision, medical imaging, image retrieval, and smart sensing, yet its computational complexity makes efficient hardware implementation difficult.
Re-Engineering Fuzzy Hierarchical Clustering for Parallel Hardware
This work addresses that tension directly: it takes an unconstrained hierarchical clustering algorithm based on fuzzy logic and membership functions and completely rethinks it from a hardware perspective. The central idea is to exploit the intrinsic parallel structure of the algorithm to design an architecture capable of massively concurrent operation, reducing both computation time and energy consumption.
Grid-Based Fuzzy Membership and Persistent Hierarchical Structures
At the core of the method is a hierarchical clustering process operating on a normalised two-dimensional grid. Each grid point is evaluated with respect to dataset patterns through fuzzy membership functions. These functions are overlapped, thresholded at multiple levels, and analysed using connected components to generate a hierarchical cluster structure. The final result emerges from analysing cluster persistence across thresholds, providing robustness even in the presence of non-convex shapes and outliers.
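The grid-based flow described above can be illustrated with a minimal software sketch. It is not the authors' implementation: the cone-shaped membership function, the use of the maximum as the overlap operator, the 4-connectivity rule, and all names (`overlapped_membership`, `count_components`, `persistence_profile`, `width`) are assumptions chosen for illustration.

```python
from collections import deque

def overlapped_membership(points, n=21, width=0.2):
    """Overlap (here: maximum) of cone-shaped memberships on an n x n grid over [0, 1]^2.

    The cone shape and the max operator are illustrative assumptions.
    """
    grid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gx, gy = j / (n - 1), i / (n - 1)
            for px, py in points:
                d = ((gx - px) ** 2 + (gy - py) ** 2) ** 0.5
                mu = max(0.0, 1.0 - d / width)
                grid[i][j] = max(grid[i][j], mu)
    return grid

def count_components(grid, threshold):
    """Count 4-connected regions whose membership is >= threshold."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    count = 0
    for i in range(n):
        for j in range(n):
            if grid[i][j] >= threshold and not seen[i][j]:
                count += 1
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one region
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n and 0 <= cc < n
                                and not seen[rr][cc]
                                and grid[rr][cc] >= threshold):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
    return count

def persistence_profile(grid, levels):
    """Number of connected components at each threshold level."""
    return [count_components(grid, t) for t in levels]

# Hypothetical data: two nearby patterns and one distant pattern.
points = [(0.2, 0.2), (0.4, 0.2), (0.8, 0.8)]
grid = overlapped_membership(points, n=21, width=0.2)
print(persistence_profile(grid, [0.1, 0.4, 0.7]))  # [2, 2, 3]
```

In this toy example, the two-cluster structure survives across the lower thresholds and only splits at the highest one, which is the kind of persistence information the method uses to select the final clustering.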
Mathematical Simplifications for Hardware Efficiency
The main contribution is the hardware adaptation. To fit the algorithm onto resource-constrained devices, targeted mathematical modifications are introduced: quantisation of the two-dimensional space, replacement of Euclidean distance with Manhattan distance, normalisation through binary shifts, and factorisation of the skewness parameter so that multiplications can be implemented with barrel shifters. Each choice aims to reduce logic complexity and eliminate costly divisions.
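The effect of these simplifications can be sketched in integer arithmetic. The example below is an illustration under stated assumptions, not the paper's exact formulation: it assumes a 64 x 64 grid, a triangular membership function, and a skewness factor restricted to a power of two (one plausible reading of the factorisation), so that the multiplication becomes a left shift. The names `GRID_BITS`, `skew_shift`, `out_bits`, and `normalise_sum` are hypothetical.

```python
GRID_BITS = 6  # assumed grid resolution: 2**6 = 64 points per axis

def quantise(x, bits=GRID_BITS):
    """Map x in [0, 1] to an integer grid coordinate in [0, 2**bits - 1]."""
    top = (1 << bits) - 1
    return min(top, max(0, int(x * top + 0.5)))

def manhattan(ax, ay, bx, by):
    """Manhattan distance: only subtractions, absolute values, and one addition."""
    return abs(ax - bx) + abs(ay - by)

def membership_fixed(gx, gy, px, py, skew_shift=2, out_bits=8):
    """Integer triangular membership, clamped at zero.

    The slope is 2**skew_shift (assumed), so the multiplication is a shift,
    which is what a barrel shifter would perform in hardware.
    """
    full = (1 << out_bits) - 1
    d = manhattan(gx, gy, px, py)
    return max(0, full - (d << skew_shift))

def normalise_sum(values, log2_n):
    """Divide an accumulated sum by 2**log2_n with a right shift instead of a divider."""
    return sum(values) >> log2_n

# A pattern at (0.5, 0.5) lands on grid coordinate 32 along each axis.
p = quantise(0.5)
print(p, membership_fixed(p, p, p, p))  # 32 255
```

Every operation here is an addition, comparison, or shift, so no multiplier or divider is needed, at the cost of the approximation error introduced by quantisation and by the change of distance metric.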
Massively Parallel Architecture for Real-Time Processing
The resulting architecture evaluates membership functions in parallel across an entire row of the 2D grid. Multiple evaluation modules operate simultaneously, while registers and counters coordinate the traversal of the space. The design is fully pipeline-friendly and achieves low latency, making it suitable for real-time scenarios.
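A behavioural model can make the row-wise organisation more concrete. This is a sketch under assumptions rather than a description of the actual RTL: each loop iteration over `col` stands in for one evaluation module that would run concurrently in hardware, the outer row loop stands in for the row counter, and the integer membership function and max-based overlap are illustrative choices. The names `evaluate_row` and `scan_grid` are hypothetical.

```python
def evaluate_row(row, n_cols, patterns, skew_shift=2, full=255):
    """Model one step of the architecture: one output per column of `row`.

    In hardware, each column would have its own evaluation module and all
    columns would be computed at the same time; here they run sequentially.
    """
    out = []
    for col in range(n_cols):
        best = 0
        for px, py in patterns:
            d = abs(col - px) + abs(row - py)               # Manhattan distance
            best = max(best, max(0, full - (d << skew_shift)))  # assumed overlap: max
        out.append(best)
    return out

def scan_grid(n_rows, n_cols, patterns):
    """Traverse the grid row by row, as a row counter would."""
    return [evaluate_row(r, n_cols, patterns) for r in range(n_rows)]

# Hypothetical 4 x 4 grid with a single pattern at column 1, row 1.
for line in scan_grid(4, 4, [(1, 1)]):
    print(line)
```

Under this model, the number of steps needed to cover the grid scales with the number of rows rather than the number of grid points, which is the source of the latency reduction that row-level parallelism provides.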
FPGA Validation on Complex Two-Dimensional Datasets
Experimental validation is performed on multiple two-dimensional datasets with heterogeneous characteristics: concave clusters, nested structures, outliers, and varying densities. The FPGA implementation on a Xilinx Zynq platform shows that, despite the introduced approximations, the discrepancy between the hardware and original versions remains small. RMSE and correlation metrics indicate high fidelity in reconstructing the overlapped membership functions, and the clustering results match those of the reference algorithm.
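For reference, the two fidelity metrics can be computed as follows when the reference and hardware membership maps are flattened into equal-length sequences. This is a generic sketch: the source does not state which correlation coefficient is used, so Pearson correlation is assumed here, and the function names are hypothetical.

```python
import math

def rmse(reference, approx):
    """Root-mean-square error between two equal-length sequences."""
    n = len(reference)
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, approx)) / n)

def pearson(reference, approx):
    """Pearson correlation coefficient (assumed choice of correlation metric)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_a = sum(approx) / n
    cov = sum((r - mean_r) * (a - mean_a) for r, a in zip(reference, approx))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_a = sum((a - mean_a) ** 2 for a in approx)
    return cov / math.sqrt(var_r * var_a)

# Hypothetical flattened membership maps: software reference vs. fixed-point output.
reference = [0.0, 0.25, 0.5, 0.75, 1.0]
hardware = [0.0, 0.24, 0.52, 0.74, 1.0]
print(rmse(reference, hardware), pearson(reference, hardware))
```

A low RMSE together with a correlation close to 1 indicates that the approximated map preserves both the values and the shape of the reference map.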
Authors
G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Panella, M. Re, A. Rosato
October 21, 2020









