Agents cluster local data and reach global consensus by sharing prototypes, not data, enabling private distributed learning.

Learning Without Centralisation: How Distributed Clustering Becomes a Collective Intelligence Process
In many real-world scenarios, data is not stored in a single location but distributed across multiple nodes, devices, or organisations. Each of these entities observes only a portion of the overall information, often with constraints related to communication, computation, or privacy. We address this setting by proposing a fully decentralised approach to unsupervised learning in which clustering emerges from collaboration rather than centralisation. Instead of aggregating all data into a single repository, the system is structured as a network of agents, each operating on its own local dataset. These agents are connected through a communication graph and can only exchange limited information with their neighbours. This setup reflects practical environments such as sensor networks, distributed IoT infrastructures, or multi-organisational data ecosystems, where moving raw data is either impractical or undesirable.
From Local Views to a Shared Structure
Each agent begins by independently applying a clustering algorithm to its local data. At this stage, the results are inherently partial and often inconsistent, since each node observes only a fragment of the global distribution. The key challenge is therefore not only to cluster data, but to reconcile multiple, potentially conflicting interpretations of the same underlying structure. To address this, the methodology introduces a collaborative phase in which agents exchange only compact representations of their local clusters, specifically the cluster prototypes. This design choice drastically reduces communication overhead and avoids sharing raw data, preserving privacy while enabling coordination. Through this exchange, each agent evaluates how its own clusters relate to those identified by others. Similarity measures between clusters are computed, allowing agents to detect inconsistencies and misalignments across the network. This process transforms clustering into a negotiation mechanism, where local models are continuously refined in response to neighbouring perspectives.
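The local clustering and prototype exchange described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text names neither the clustering algorithm nor the similarity measure, so plain k-means and Euclidean distance are assumed here, and the function names `local_prototypes` and `prototype_distances` are hypothetical.

```python
import math
import random

def nearest(point, prototypes):
    """Index of the prototype closest to `point` (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda j: math.dist(point, prototypes[j]))

def local_prototypes(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means on one agent's data (assumed algorithm); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def prototype_distances(own, received):
    """Pairwise distances: rows are own prototypes, columns are received ones."""
    return [[math.dist(a, b) for b in received] for a in own]

# Two agents cluster locally, then exchange only their prototypes.
agent_a = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
agent_b = [(0.1, 0.0), (4.9, 5.0), (5.1, 5.2), (0.0, 0.2)]
protos_a = local_prototypes(agent_a, k=2)
protos_b = local_prototypes(agent_b, k=2)
dist = prototype_distances(protos_a, protos_b)  # small entries suggest matching clusters
```

Only `protos_a` and `protos_b` cross the network in this sketch, so the message size depends on the number of clusters and the data dimension, not on the number of local samples.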
Resolving Conflicts Through Iterative Adaptation
A central aspect of the approach lies in how disagreements between agents are handled. When clusters do not align across nodes, two types of conflict may arise: one cluster at a node may correspond to several clusters elsewhere, or several clusters at a node may correspond to a single counterpart. To resolve these inconsistencies, the system performs iterative operations such as merging and splitting clusters. These transformations are not arbitrary but guided by internal validation criteria that assess the quality of the clustering configuration. Each modification is accepted only if it improves the overall structure according to well-defined metrics of compactness and separation. This iterative refinement continues until a stable configuration is reached, where all agents agree on a coherent partitioning of the data. Importantly, this agreement is achieved without ever sharing the original datasets, relying solely on the exchange of abstract representations.
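A rough sketch of conflict detection and a validated merge is given below. The text does not name the validation criterion, so the Davies-Bouldin index (lower is better, combining compactness and separation) is used here as a stand-in. The matching `threshold` and the helper names `find_conflicts` and `try_merge` are likewise assumptions made for illustration.

```python
import math

def find_conflicts(dist, threshold):
    """Flag conflicts in an own-by-received prototype distance matrix."""
    match = [[d < threshold for d in row] for row in dist]
    # One own cluster close to several received clusters.
    one_to_many = [i for i, row in enumerate(match) if sum(row) > 1]
    # Several own clusters close to the same received cluster.
    many_to_one = [j for j, col in enumerate(zip(*match)) if sum(col) > 1]
    return one_to_many, many_to_one

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i)
        for i in range(len(clusters))
    ]
    return sum(worst) / len(worst)

def try_merge(clusters, a, b):
    """Merge clusters a and b only if the index improves; returns (clusters, accepted)."""
    if len(clusters) < 3:
        return clusters, False  # the index needs two clusters after merging
    merged = [c for i, c in enumerate(clusters) if i not in (a, b)]
    merged.append(clusters[a] + clusters[b])
    if davies_bouldin(merged) < davies_bouldin(clusters):
        return merged, True
    return clusters, False

# Clusters 0 and 1 are two halves of one group; cluster 2 is far away.
clusters = [[(0.0,), (1.0,)], [(2.0,), (3.0,)], [(100.0,), (101.0,)]]
after, accepted = try_merge(clusters, 0, 1)
```

A split step could follow the same accept-if-improved pattern: propose dividing a cluster, recompute the index, and keep the change only when the score is lower.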
Achieving Consensus Without Data Sharing
Once conflicts are resolved, the system proceeds to a consensus phase. At this point, all agents possess clustering solutions that differ only in labelling. By applying a final alignment step based on shared prototypes, each node independently converges to a consistent global interpretation. The result is a distributed clustering model that closely approximates the outcome of a centralised approach, despite operating under strict constraints on communication and data access. Experimental evaluations confirm that the method performs comparably to traditional centralised algorithms, while significantly reducing communication costs and preserving data privacy.
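The label alignment step might look like the sketch below. It assumes all agents already hold the same number of clusters and agree on a common reference set of prototypes (how that reference is chosen is not specified in the text). It uses a simple greedy nearest-pair matching rather than the authors' actual procedure, and `align_labels` is a hypothetical name.

```python
import math

def align_labels(own, reference):
    """Greedy one-to-one matching of own prototypes to reference prototypes.

    Returns a dict mapping each local label to a shared label.
    """
    pairs = sorted(
        (math.dist(p, r), i, j)
        for i, p in enumerate(own)
        for j, r in enumerate(reference)
    )
    mapping, used = {}, set()
    for _, i, j in pairs:
        # Take the closest still-unmatched pair first.
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping

# The same three clusters, but numbered differently by this agent.
reference = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
own = [(10.1, 9.9), (0.1, 9.8), (-0.1, 0.2)]
mapping = align_labels(own, reference)
local_labels = [0, 0, 1, 2]
shared_labels = [mapping[label] for label in local_labels]
```

Because each agent runs the matching against the same reference, no further communication is needed for the relabelling itself.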
Toward Scalable and Privacy-Preserving Learning
The proposed framework demonstrates that effective unsupervised learning does not require centralisation. By combining local computation, minimal information exchange, and iterative consensus mechanisms, it is possible to construct a global understanding of data that remains inherently distributed. This paradigm opens the door to new applications in domains where data fragmentation is the norm and privacy is a critical concern. From environmental monitoring to biomedical analysis and large-scale IoT systems, the ability to learn collaboratively without sharing raw data represents a significant step toward more scalable and responsible artificial intelligence.
Authors
A. Rosato, R. Altilio, M. Panella
July 27, 2021