Sergio Gómez   
Departament d'Enginyeria Informàtica i Matemàtiques
Universitat Rovira i Virgili

Software

mdendro

mdendro is an R package for agglomerative hierarchical clustering (AHC). Its function linkage extends the functionality of the standard functions hclust (in package stats) and agnes (in package cluster) in several significant ways, which makes it a convenient replacement for them.

Among the advantages of mdendro are the following: native handling of both similarity and dissimilarity (distance) matrices; calculation of pair-group dendrograms and variable-group multidendrograms; implementation of the most common AHC methods in both weighted and unweighted forms; implementation of two additional parametric families of methods, versatile linkage and beta flexible; calculation of the cophenetic (or ultrametric) matrix; calculation of five descriptors of the final dendrogram; and plots of the descriptors for the parametric methods.
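
For readers less familiar with the terminology above, here is a minimal Python sketch of pair-group AHC on a dissimilarity matrix. It recomputes distances after each merge with the classical Lance-Williams update formula and records the cophenetic matrix along the way. It is not mdendro's code or API (mdendro is an R package whose entry point is linkage): the names ahc and lance_williams, the method labels and the default beta = -0.25 are our own illustrative assumptions. Here "average" is the unweighted form (UPGMA) and "wpgma" the weighted one; versatile linkage, similarity matrices, multidendrograms and the dendrogram descriptors are not covered, and ties are broken by simply taking the first minimum found.

```python
from itertools import combinations


def lance_williams(method, n_i, n_j, beta=-0.25):
    """Coefficients (alpha_i, alpha_j, b, gamma) of the Lance-Williams update
    d(k, i+j) = alpha_i d(k,i) + alpha_j d(k,j) + b d(i,j) + gamma |d(k,i) - d(k,j)|."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # unweighted average (UPGMA): weights by cluster size
        return n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    if method == "wpgma":  # weighted average (WPGMA): both subclusters weigh the same
        return 0.5, 0.5, 0.0, 0.0
    if method == "beta_flexible":
        a = (1.0 - beta) / 2.0
        return a, a, beta, 0.0
    raise ValueError(f"unknown method: {method}")


def ahc(D, method="average", beta=-0.25):
    """Naive pair-group AHC on a symmetric dissimilarity matrix D (list of lists).

    Returns (merges, coph): merges is a list of (cluster_i, cluster_j, height),
    where new clusters get ids n, n+1, ...; coph[x][y] is the height at which
    objects x and y first fall into the same cluster.
    """
    n = len(D)
    members = {i: [i] for i in range(n)}  # cluster id -> original objects
    dist = {(i, j): D[i][j] for i, j in combinations(range(n), 2)}  # keys: (small id, large id)
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    next_id = n
    while len(members) > 1:
        # Closest pair of clusters; ties are broken by taking the first minimum found.
        (i, j), h = min(dist.items(), key=lambda kv: kv[1])
        for x in members[i]:
            for y in members[j]:
                coph[x][y] = coph[y][x] = h
        a_i, a_j, b, g = lance_williams(method, len(members[i]), len(members[j]), beta)
        new = next_id
        next_id += 1
        for k in members:
            if k not in (i, j):
                d_ki = dist[(min(k, i), max(k, i))]
                d_kj = dist[(min(k, j), max(k, j))]
                dist[(k, new)] = a_i * d_ki + a_j * d_kj + b * h + g * abs(d_ki - d_kj)
        # Drop every distance that involves one of the two merged clusters.
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        members[new] = members.pop(i) + members.pop(j)
        merges.append((i, j, h))
    return merges, coph


points = [0, 1, 3, 7]
D = [[abs(p - q) for q in points] for p in points]
merges, coph = ahc(D, "single")
```

With the dissimilarities |x_i - x_j| between the points 0, 1, 3 and 7, single linkage merges at heights 1, 2 and 4, and the cophenetic entry for objects 0 and 3 is 4. The naive search is cubic in the number of objects, which is fine for illustration but not for large data sets.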

Go to the mdendro page.

MultiDendrograms

MultiDendrograms is a program for the hierarchical clustering of real data. It implements variable-group algorithms that solve the non-uniqueness problem found in the standard pair-group algorithms and their implementations. This problem arises when two or more minimum distances between different clusters are equal during the amalgamation process. The standard approach consists of choosing one pair, thereby breaking the ties between distances, and proceeding in the same way until the final hierarchical classification is obtained. However, different clusterings are possible depending on the criterion used to break the ties (usually a pair is simply chosen at random!).
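
The tie problem can be reproduced with just three objects. The minimal Python sketch below runs plain pair-group complete linkage and lets a caller-supplied function pick one pair among the tied minima; complete_linkage and its pick argument are hypothetical names for illustration, not part of MultiDendrograms. With d(A,B) = d(B,C) = 1 and d(A,C) = 3, merging A and B first leaves B and C joined only at height 3, whereas merging B and C first joins them at height 1.

```python
from itertools import combinations


def complete_linkage(D, pick):
    """Pair-group complete linkage; `pick` chooses one pair among the tied minima.

    Returns the cophenetic matrix: the height at which each pair of objects joins.
    """
    n = len(D)
    clusters = [frozenset([i]) for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def d(a, b):
        # Complete linkage: the largest dissimilarity between members of a and b.
        return max(D[x][y] for x in a for y in b)

    while len(clusters) > 1:
        pairs = list(combinations(clusters, 2))
        h = min(d(a, b) for a, b in pairs)
        tied = [(a, b) for a, b in pairs if d(a, b) == h]
        a, b = pick(tied)  # the arbitrary tie-breaking decision
        for x in a:
            for y in b:
                coph[x][y] = coph[y][x] = h
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return coph


# Objects A, B, C = 0, 1, 2 with a tie: d(A,B) = d(B,C) = 1, d(A,C) = 3.
D = [[0, 1, 3],
     [1, 0, 1],
     [3, 1, 0]]
first = complete_linkage(D, lambda tied: tied[0])  # merges A and B first
last = complete_linkage(D, lambda tied: tied[-1])  # merges B and C first
```

Here first[1][2] is 3 while last[1][2] is 1: the same data yield two different hierarchies depending only on the tie-breaking rule.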

The variable-group algorithms group more than two clusters at the same time when ties occur, giving rise to a graphical representation called a multidendrogram. Their main properties are: when there are no ties, the variable-group algorithms give the same results as the pair-group ones; they always give a uniquely determined solution; and in the multidendrogram representation of the results one can explicitly observe the occurrence of ties during the agglomerative process, while the height of any fusion interval (the bands in the program) indicates the degree of heterogeneity inside the corresponding cluster.
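
The core of the variable-group idea can be sketched as a single merging step: link every pair of clusters at the minimum distance and merge each connected group of tied clusters at once. The Python sketch below does this with a small union-find; variable_group_step, its tol parameter and the complete-linkage distance used in the example are our own illustrative assumptions, not the MultiDendrograms implementation, and the heights of the fusion bands drawn by the program are not computed here.

```python
from itertools import combinations


def variable_group_step(clusters, dist, tol=0.0):
    """One variable-group merge: every pair of clusters at the minimum distance
    (within tol) is linked, and each connected component of links becomes one
    new cluster. Returns (new_clusters, height)."""
    idx_pairs = list(combinations(range(len(clusters)), 2))
    h = min(dist(clusters[a], clusters[b]) for a, b in idx_pairs)
    parent = list(range(len(clusters)))  # union-find over cluster positions

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in idx_pairs:
        if dist(clusters[a], clusters[b]) <= h + tol:
            parent[find(a)] = find(b)
    groups = {}
    for a, c in enumerate(clusters):
        groups.setdefault(find(a), set()).update(c)
    return [frozenset(g) for g in groups.values()], h


def complete(D):
    # Complete-linkage distance between two clusters of object indices.
    return lambda x, y: max(D[i][j] for i in x for j in y)


D = [[0, 1, 3],
     [1, 0, 1],
     [3, 1, 0]]
singletons = [frozenset([0]), frozenset([1]), frozenset([2])]
# The tied pairs (A,B) and (B,C) are merged together into one group at height 1.
groups, h = variable_group_step(singletons, complete(D))
```

Because all tied pairs are linked before anything is merged, the result does not depend on the order in which the ties are examined, and with no ties the step merges exactly one pair, as a pair-group step would.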

Go to the MultiDendrograms page.

Radatools

Radatools is a set of freely distributed programs to analyze Complex Networks. In particular, it includes programs for Community Detection, Mesoscales Determination, calculation of Network Properties, and general tools for the manipulation of Networks and Partitions. There are also several programs not strictly related to networks, most notably one for Agglomerative Hierarchical Clustering using MultiDendrograms and Binary Dendrograms.
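
Community detection of the kind listed above is often posed as maximizing Newman-Girvan modularity, Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The Python sketch below only evaluates Q for a given partition of an undirected, unweighted graph; it does not search for the best partition, and the function name modularity and the edge-list input format are our own choices, not the Radatools interface.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    inside = {}  # community -> number of edges with both ends inside it
    degree = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
together = {node: "A" for node in range(6)}
q_split = modularity(edges, split)
q_together = modularity(edges, together)
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while putting all nodes in one community gives Q = 0; a community detection program searches the space of partitions for high values of such an objective.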

Radatools is just a set of binary executable programs whose source code is available in Radalib. Radalib is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License version 2.1 as published by the Free Software Foundation.

Go to the Radatools page.

Radalib

Radalib is a library that we have been continuously updating since 2004 for our research within the Alephsys research group, led by Alex Arenas, at Universitat Rovira i Virgili (URV), Tarragona. Previous experience showed that the continuous reuse of code was a painful task, so we decided to be more structured and to separate general-purpose code (e.g. manipulation of networks and partitions) from the specific details of particular applications (e.g. Monte Carlo simulation of epidemic spreading). The result was a general-purpose library, mostly devoted to complex networks and developed around abstract data types. This means that the types are declared "private", with public subprograms operating on them and encapsulating their implementations, which allows future enhancements without having to modify the programs that already use them. We could have used object-oriented programming, but we believe polymorphism and inheritance are basically useless for this kind of application.
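
As a rough illustration of the abstract-data-type style described above, here is a hypothetical Partition type sketched in Python (Radalib itself is written in Ada, and none of these names come from it). Client code such as same_cluster uses only the public operations, so the hidden label list could later be replaced, for instance by a union-find structure, without touching the client. Python's leading underscore is only a convention, whereas Ada's private types enforce the same separation at compile time.

```python
class Partition:
    """Hypothetical partition of the elements 0..n-1 into labeled clusters."""

    def __init__(self, n):
        # Hidden representation: one cluster label per element, initially singletons.
        self._label = list(range(n))

    def size(self):
        return len(self._label)

    def cluster_of(self, e):
        return self._label[e]

    def move(self, e, target):
        """Move element e into the cluster that currently contains element target."""
        self._label[e] = self._label[target]

    def number_of_clusters(self):
        return len(set(self._label))

    def clusters(self):
        groups = {}
        for e, c in enumerate(self._label):
            groups.setdefault(c, []).append(e)
        return list(groups.values())


def same_cluster(p, x, y):
    # Client code: relies only on the public operations, never on _label.
    return p.cluster_of(x) == p.cluster_of(y)


p = Partition(4)
p.move(1, 0)  # clusters {0, 1}, {2}, {3}
p.move(3, 2)  # clusters {0, 1}, {2, 3}
```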

The language we selected was Ada, for several reasons: performance (it is a compiled language, not an interpreted one), readable code, support for abstract data types, a strict type system (which catches many errors at compile time), advanced support for generics, high-level support for concurrent programming (just in case it is needed), availability of high-quality compilers for the main platforms (Windows, Linux, MacOS), and the confidence it gives you in your own code. The main drawback was the lack of code from other people that we could reuse, but that was not a problem since we wanted full control and detailed understanding of every line of code used in our research.

Radalib is structured in three parts: source (the library itself), test, and tools. Tools are programs that solve a certain problem, e.g. community detection, partition comparison, network properties, connected components, file format conversion, etc., and are basically mere interfaces to functionalities provided by the library. Requests to make public the implementations of some of the algorithms presented in our scientific papers led to the publication of Radatools, which are just executables of some of the Radalib tools for Windows, Linux and MacOS.

Go to the Radalib page.