A decentralized algorithm for distributed ensemble clustering

Agents cluster local data and reach global consensus by sharing prototypes, not data, enabling private distributed learning.

Learning Without Centralisation: How Distributed Clustering Becomes a Collective Intelligence Process

In many real-world scenarios, data is not stored in a single location but distributed across multiple nodes, devices, or organisations. Each of these entities observes only a portion of the overall information, often with constraints related to communication, computation, or privacy. We address this setting by proposing a fully decentralised approach to unsupervised learning in which clustering emerges from collaboration rather than centralisation. Instead of aggregating all data into a single repository, the system is structured as a network of agents, each operating on its own local dataset. These agents are connected through a communication graph and can only exchange limited information with their neighbours. This setup reflects practical environments such as sensor networks, distributed IoT infrastructures, or multi-organisational data ecosystems, where moving raw data is either impractical or undesirable.
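The communication constraint described above can be pictured with a small sketch. It assumes an undirected graph stored as an adjacency dictionary and a hypothetical helper, `exchange_round`, that delivers each agent's compact message to its direct neighbours only. The agent names and message contents are illustrative and not taken from the original work.

```python
# Hypothetical sketch: agent names and the `exchange_round` helper are
# illustrative assumptions, not part of the original method description.

def exchange_round(graph, outbox):
    """Deliver each agent's message to its direct neighbours only.

    graph:  {agent: set of neighbouring agents} (undirected communication graph)
    outbox: {agent: compact message that the agent is willing to share}
    Returns {agent: {neighbour: message received from that neighbour}}.
    """
    return {
        agent: {nb: outbox[nb] for nb in sorted(graph[agent])}
        for agent in graph
    }


# Three agents on a line: A - B - C. A and C are not directly connected.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
outbox = {"A": "summary of A", "B": "summary of B", "C": "summary of C"}
inbox = exchange_round(graph, outbox)
```

In this toy network, agent A hears only from B, so anything it learns about C would have to arrive through later rounds rather than by direct access.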

From Local Views to a Shared Structure

Each agent begins by independently applying a clustering algorithm to its local data. At this stage, the results are inherently partial and often inconsistent, since each node observes only a fragment of the global distribution. The key challenge is therefore not only to cluster data, but to reconcile multiple, potentially conflicting interpretations of the same underlying structure. To address this, the methodology introduces a collaborative phase in which agents exchange only compact representations of their local clusters, specifically the cluster prototypes. This design choice drastically reduces communication overhead and avoids sharing raw data, preserving privacy while enabling coordination. Through this exchange, each agent evaluates how its own clusters relate to those identified by others. Similarity measures between clusters are computed, allowing agents to detect inconsistencies and misalignments across the network. This process transforms clustering into a negotiation mechanism, where local models are continuously refined in response to neighbouring perspectives.
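A minimal sketch of the prototype exchange, under stated assumptions: prototypes are taken to be cluster centroids, similarity is measured by Euclidean distance, and the function names (`prototypes`, `prototype_distances`, `nearest_match`) are hypothetical. The source does not specify which clustering algorithm or similarity measure is used, so this is only one plausible reading.

```python
from math import dist


def prototypes(points, labels):
    """Return {label: centroid} for one agent's local clustering."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def prototype_distances(own, other):
    """Pairwise Euclidean distances between two agents' prototypes."""
    return {(a, b): dist(pa, pb) for a, pa in own.items() for b, pb in other.items()}


def nearest_match(own, other):
    """For each local cluster, the label of the closest neighbouring prototype."""
    return {a: min(other, key=lambda b: dist(pa, other[b])) for a, pa in own.items()}


# Two agents observe different samples and use unrelated label names.
protos_a = prototypes([(0, 0), (0, 2), (10, 10), (10, 12)], [0, 0, 1, 1])
protos_b = prototypes([(1, 1), (1, 3), (9, 9), (11, 11)], ["x", "x", "y", "y"])
match = nearest_match(protos_a, protos_b)
```

Only `protos_a` and `protos_b` would travel over the network in this sketch; the raw points stay with the agent that collected them. If two local clusters pointed at the same neighbouring prototype, that would be a sign of the kind of misalignment the text describes.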

Resolving Conflicts Through Iterative Adaptation

A central aspect of the approach lies in how disagreements between agents are handled. When clusters do not align across nodes, two types of conflict may arise: one-to-many conflicts, where a single cluster at one node corresponds to several clusters at another, and many-to-one conflicts, where several clusters at one node map to a single counterpart elsewhere. To resolve these inconsistencies, the system iteratively merges and splits clusters. These transformations are not arbitrary: they are guided by internal validation criteria that assess the quality of the clustering configuration, and each modification is accepted only if it improves the overall structure according to well-defined metrics of compactness and separation. The refinement continues until a stable configuration is reached in which all agents agree on a coherent partitioning of the data. Importantly, this agreement is achieved without ever sharing the original datasets, relying solely on the exchange of abstract representations.
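The accept-only-if-better rule can be sketched as follows. The validity index here is an assumed stand-in (mean distance of points to their centroid, divided by the smallest distance between centroids, lower being better); the source does not name the specific index, and `validity` and `try_merge` are hypothetical helpers.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def validity(clusters):
    """Assumed index: compactness / separation, lower is better."""
    cents = {label: centroid(members) for label, members in clusters.items()}
    if len(cents) < 2:
        return float("inf")  # separation is undefined for a single cluster
    total = sum(len(members) for members in clusters.values())
    compactness = sum(
        dist(p, cents[label]) for label, members in clusters.items() for p in members
    ) / total
    labels = list(cents)
    separation = min(
        dist(cents[a], cents[b]) for i, a in enumerate(labels) for b in labels[i + 1:]
    )
    return compactness / separation


def try_merge(clusters, a, b):
    """Merge clusters a and b, keeping the result only if the index improves."""
    candidate = {k: list(v) for k, v in clusters.items() if k not in (a, b)}
    candidate[a] = clusters[a] + clusters[b]
    if validity(candidate) < validity(clusters):
        return candidate, True
    return clusters, False


clusters = {0: [(0, 0), (0, 1)], 1: [(1, 0), (1, 1)], 2: [(10, 10), (10, 11)]}
merged, accepted = try_merge(clusters, 0, 1)       # two nearby groups
unchanged, rejected = try_merge(clusters, 0, 2)    # two distant groups
```

With this toy data, merging the two nearby groups lowers the index and is kept, while merging two distant groups raises it and is discarded. A split operation could be gated by the same comparison.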

Achieving Consensus Without Data Sharing

Once conflicts are resolved, the system proceeds to a consensus phase. At this point, all agents hold clustering solutions that differ only in how the clusters are labelled. By applying a final alignment step based on the shared prototypes, each node independently converges to a consistent global interpretation. The result is a distributed clustering model that closely approximates the outcome of a centralised approach, despite operating under strict constraints on communication and data access. Experimental evaluations confirm that the method achieves performance comparable to that of traditional centralised algorithms, while significantly reducing communication costs and preserving data privacy.
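One way to picture the final alignment step is a relabelling against a common set of reference prototypes. The sketch below assumes that such a reference set is available to every agent and uses a greedy one-to-one assignment by ascending distance; both the reference set and the `align_labels` helper are assumptions, since the source does not detail the alignment procedure.

```python
from math import dist


def align_labels(local, reference):
    """Map each local label to a distinct reference label, closest pairs first.

    local, reference: {label: prototype coordinates}
    """
    pairs = sorted(
        ((dist(p, q), a, b) for a, p in local.items() for b, q in reference.items()),
        key=lambda item: item[0],
    )
    mapping, used = {}, set()
    for _, a, b in pairs:
        if a not in mapping and b not in used:
            mapping[a] = b
            used.add(b)
    return mapping


# Same structure, different label names at this agent.
reference = {0: (0.0, 0.0), 1: (10.0, 10.0)}
local = {"p": (10.2, 9.9), "q": (0.1, 0.0)}
mapping = align_labels(local, reference)
relabelled = [mapping[label] for label in ["p", "q", "q"]]
```

Because every agent would apply the same rule to the same reference prototypes, each one can relabel its own points locally and arrive at matching labels without further negotiation.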

Toward Scalable and Privacy-Preserving Learning

The proposed framework demonstrates that effective unsupervised learning does not require centralisation. By combining local computation, minimal information exchange, and iterative consensus mechanisms, it is possible to construct a global understanding of data that remains inherently distributed. This paradigm opens the door to new applications in domains where data fragmentation is the norm and privacy is a critical concern. From environmental monitoring to biomedical analysis and large-scale IoT systems, the ability to learn collaboratively without sharing raw data represents a significant step toward more scalable and responsible artificial intelligence.

Authors

A. Rosato, R. Altilio, M. Panella
July 27, 2021

GRID+ Copyright © 2026. All Rights Reserved.
VAT No. 17387741006 | Share capital fully paid up: €10,000 | RM – 1715269