Package 'BLSM' reference manual

Title:	Bayesian Latent Space Model
Description:	Provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://www.stat.washington.edu/people/pdhoff/Code/hoff_raftery_handcock_2002_jasa/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
Authors:	Alberto Donizetti [aut, cre], Francesca Ieva [ctb]
Maintainer:	Alberto Donizetti <albe.donizetti@gmail.com>
License:	GPL (>= 2)
Version:	0.1.0
Built:	2025-03-06 03:49:51 UTC
Source:	https://github.com/adonizetti/blsm

Update step for the $\alpha$ variable

Description

Accept/reject the proposal for the $\alpha$ model variable

Usage

alpha_up(Y, lpz, W, alpha, adelta, a_a, a_b)
alpha_up(Y, lpz, W, alpha, adelta, a_a, a_b)

Arguments

`Y`	Adjacency matrix of the observed network
`lpz`	Matrix containing the negative square root of the Euclidean distances between latent positions
`W`	BLSM Weights matrix of the observed network
`alpha`	Model variable $\alpha$
`adelta`	The uniform proposal for $\alpha$ is defined on the $[-adelta,+adelta]$ interval
`a_a`	Shape parameter of the Gamma prior distribution for $\alpha$ . The value is usually set to 1, so the prior is actually an exponential distribution.
`a_b`	Rate parameter of the Gamma prior distribution for $\alpha$ .

Value

Updated value of the $\alpha$ variable

Bayesian Latent Space Model

Description

R package allowing the computation of a Bayesian Latent Space Model for complex networks.

Latent Space Models are characterized by the presence of unobservable variables (latent coordinates) that are used to compute the likelihood of the observed networks. Their goal is to map the observed network in the latent space by meeting specific probabilistic requirements, so that the estimated latent coordinates can then be used to describe and characterize the original graph.

In the BSLM package framework, given a network characterized by its adjacency $Y$ matrix, the model assigns a binary random variable to each tie: $Y_ij$ is related to the tie between nodes $i$ and $j$ and its value is 1 if the tie exists, 0 otherwise.

The model assumes the independence of $Y_ij | x_i,x_j, \alpha$ , where $x_i$ and $x_j$ are the coordinates of the nodes in the multidimensional latent space and $\alpha$ is an additional parameter such that $logit(P(Y_ij = 1)) = \alpha - ||x_i -x_j||$ .

The latent space coordinates are estimated by following a MCMC procedure that is based on the overall likelihood induced by the above equation.

Due to the symmetry of the distance, the model leads to more intuitive outputs for undirected networks, but the functions can also deal with directed graphs.

If the network is weighted, i.e. to each tie is associated a positive coefficient, the model's probability equation becomes $logit(P(Y_ij = 1)) = \alpha - W_ij||x_i -x_j||$ , where $W_ij$ denotes the weight related to link existing between $x_i$ and $x_j$ . This means that even non existing links should have a weight, therefore the matrix used in the computation isn't the original weights matrix but actually a specific "BLSM weights" matrix that contains positive coefficients for all the possible pairs of nodes. When dealing with weighted networks, please be careful to pass a "BLSM weights" matrix as input #' (please refer to example_weights_matrix for more detailed information and a valid example).

The output of the model allows the user to inspect the MCMC simulation, create insightful graphical representations or apply clustering techniques to better describe the latent space. See estimate_latent_positions or plot_latent_positions for further information.

References

P. D. Hoff, A. E. Raftery, M. S. Handcock, Latent Space Approaches to Social Network Analysis, Journal of the American Statistical Association, Vol. 97, No. 460, (2002), pp. 1090-1098.

A. Donizetti, A Latent Space Model Approach for Clustering Complex Network Data, Master's Thesis, Politecnico di Milano, (2017).

Geodesic distance

Description

Evaluate geodesic distance (shortest path) between all pairs of nodes in the network.

Usage

dst(M)
dst(M)

Arguments

`M`	Input adjacency matrix

Value

Matrix containing all the pairwise geodesic distances

Examples

dst(example_adjacency_matrix)
dst(example_adjacency_matrix)

BLSM simulation

Description

Core function of the BLSM package: run a simulation to obtain the positions of the network nodes in the latent space for each sampled iteration.

The positions are simulated accordingly to the model assumptions, please refer to BLSM for further information. The output of the function can be used to retrieve and compare specific iterations, observe their evolution or simply compute the average positions (more details in the descriptions and examples below).

Usage

estimate_latent_positions(Y, W, procrustean = TRUE, k = 2, alpha = 2,
  nscan = 8 * 10^5, burn_in = 5 * 10^5, odens = 10^3, zdelta = 1,
  z_norm_prior_mu = 0, z_norm_prior_sd = 10, adelta = 0.3,
  a_exp_prior_a = 1, a_exp_prior_b = 1, dynamic_plot = FALSE,
  dynamic_circles = FALSE, ...)
estimate_latent_positions(Y, W, procrustean = TRUE, k = 2, alpha = 2,
  nscan = 8 * 10^5, burn_in = 5 * 10^5, odens = 10^3, zdelta = 1,
  z_norm_prior_mu = 0, z_norm_prior_sd = 10, adelta = 0.3,
  a_exp_prior_a = 1, a_exp_prior_b = 1, dynamic_plot = FALSE,
  dynamic_circles = FALSE, ...)

Arguments

`Y`	Adjacency matrix of the network
`W`	(Optional) BLSM Weight matrix of the network
`procrustean`	Boolean to include/exclude (`TRUE/FALSE`) the Procrustean Transform step in the algorithm. Set `TRUE` by default.
`k`	Space dimensionality
`alpha`	Starting value of the $\alpha$ variable
`nscan`	Number of iterations
`burn_in`	Burn-in value (starting iterations to be discarded)
`odens`	Thinning: only 1 iteration every `odens` will be sampled and stored in the output
`zdelta`	Standard deviation of the Gaussian proposal for latent positions
`z_norm_prior_mu`	Mean of the Gaussian prior distribution for latent positions
`z_norm_prior_sd`	Standard deviation of the Gaussian prior distribution for latent positions
`adelta`	The uniform proposal for $\alpha$ is defined on the $[-adelta,+adelta]$ interval
`a_exp_prior_a`	Shape parameter of the Gamma prior distribution for $\alpha$ . As the value is usually set to 1 the prior is an exponential distribution.
`a_exp_prior_b`	Rate parameter of the Gamma prior distribution for $\alpha$ .
`dynamic_plot`	Boolean to plot dynamically the simulated positions (one update every `odens` iterations)
`dynamic_circles`	Boolean to add circles of radius $\alpha$ to the dynamic plots
`...`	Additional parameters that can be passed to plot_latent_positions

Value

Returns a "BLSM object" (blsm_obj), i.e. a list containing:

Alpha $\alpha$ values from the sampled iterations
Likelihood Log-likelihood values from the sampled iterations
Iterations Latent space coordinates from the sampled iterations. Latent positions are stored in a 3D array whose dimensions are given by (1: number of nodes, 2: space dimensionality, 3: number of iterations). In the non-Procrustean framework the latent distances are given instead of the positions: another 3D array is returned, whose dimensions are given by (1: number of nodes, 2: number of nodes, 3: number of iterations). The command needed in order to get the average values over the iterations for either the positions or the distances is rowMeans(blsm_obj$Iterations, dims=2) (see example below).
StartingPositions Latent space coordinates right after the initialization step. In the non-Procrustean framework starting distances are given instead.
Matrix Original matrices of the network (adjacency and BLSM weights)
Parameters List of parameters specified during the call to estimate_latent_positions

Examples


## Not run: 
 # Procrustean version followed by clustering
 blsm_obj = estimate_latent_positions(example_adjacency_matrix,  
                          burn_in = 3*10^4, nscan = 10^5, dynamic_plot = TRUE)
                          
 avg_latent_positions = rowMeans(blsm_obj$Iterations, dims=2)                   
 h_cl = hclust(dist(avg_latent_positions), method="complete")
 n = 3
 latent_space_clusters = cutree(h_cl, k=n)
 print(latent_space_clusters)
 plot(avg_latent_positions, col=rainbow(n)[latent_space_clusters], pch=20)
 
 # Non-Procrustean version followed by clustering                    
 blsm_obj_2 = estimate_latent_positions(example_adjacency_matrix, procrustean=FALSE,
                          burn_in = 3*10^4, nscan = 10^5)
 avg_latent_distances = rowMeans(blsm_obj_2$Iterations, dims=2)
 h_cl = hclust(as.dist(avg_latent_distances), method="complete")
 n = 3
 latent_space_clusters_2 = cutree(h_cl, k=n)
 print(latent_space_clusters_2)
                           
 # Weighted network 
 blsm_obj_3 = estimate_latent_positions(example_adjacency_matrix, example_weights_matrix, 
                          burn_in = 10^5, nscan = 2*10^5, dynamic_plot = TRUE)

## End(Not run)
## Not run: 
 # Procrustean version followed by clustering
 blsm_obj = estimate_latent_positions(example_adjacency_matrix,  
                          burn_in = 3*10^4, nscan = 10^5, dynamic_plot = TRUE)
                          
 avg_latent_positions = rowMeans(blsm_obj$Iterations, dims=2)                   
 h_cl = hclust(dist(avg_latent_positions), method="complete")
 n = 3
 latent_space_clusters = cutree(h_cl, k=n)
 print(latent_space_clusters)
 plot(avg_latent_positions, col=rainbow(n)[latent_space_clusters], pch=20)
 
 # Non-Procrustean version followed by clustering                    
 blsm_obj_2 = estimate_latent_positions(example_adjacency_matrix, procrustean=FALSE,
                          burn_in = 3*10^4, nscan = 10^5)
 avg_latent_distances = rowMeans(blsm_obj_2$Iterations, dims=2)
 h_cl = hclust(as.dist(avg_latent_distances), method="complete")
 n = 3
 latent_space_clusters_2 = cutree(h_cl, k=n)
 print(latent_space_clusters_2)
                           
 # Weighted network 
 blsm_obj_3 = estimate_latent_positions(example_adjacency_matrix, example_weights_matrix, 
                          burn_in = 10^5, nscan = 2*10^5, dynamic_plot = TRUE)

## End(Not run)

Example Adjacency Matrix

Description

Adjacency matrix of a 10 nodes random network for testing purposes

Usage

example_adjacency_matrix
example_adjacency_matrix

Format

A binary adjacency matrix representing links between nodes.

Example BLSM object

Description

BLSM object obtained by applying the Procrustean version of the latent space model to the unweighted network whose adjacency matrix is example_adjacency_matrix. Further details concerning the simulation are contained in the BLSM object itself.

Usage

example_blsm_obj
example_blsm_obj

Format

BLSM object (blsm_obj), i.e. a list containing:

Alpha $\alpha$ values from the sampled iterations
Likelihood Log-likelihood values from the sampled iterations
Iterations Latent space coordinates from the sampled iterations. Latent positions are stored in a 3D array whose dimensions are given by (1: number of nodes, 2: space dimensionality, 3: number of iterations). In the non-Procrustean framework the latent distances are given instead of the positions: another 3D array is returned, whose dimensions are given by (1: number of nodes, 2: number of nodes, 3: number of iterations). The command needed in order to get the average values over the iterations for either the positions or the distances is rowMeans(blsm_obj$Iterations, dims=2) (see example below).
StartingPositions Latent space coordinates right after the initialization step. In the non-Procrustean framework starting distances are given instead.
Matrix Original matrices of the network (adjacency and BLSM weights)
Parameters List of parameters specified during the call to estimate_latent_positions

Example Weights Matrix

Description

"BLSM weights" matrix of a 10 nodes random network for testing purposes

Usage

example_weights_matrix
example_weights_matrix

Format

A matrix containing positive weights for all pairs of nodes.

Given a couple of nodes, a weight expresses the importance of the distance between the coordinates associated to the two nodes in the latent space in terms of the overall likelihood of the graph. For this reason, even missing links must have a coefficient, otherwise the relative positioning of disconnected nodes would have no effect at all on the graph likelihood.

The exact probability equation is described in BLSM, as well as the notation used.

A few examples:

for unweighted networks, the "BLSM weights" matrix has all the values set to 1.
if two nodes share a strong connection, then the weight coefficient should be greater than 1 so that their positions in the latent space will be closer than they would be in an unweighted framework.
if two nodes share a weak connection, a coefficient smaller than 1 will allow the latent coordinates to be pretty far from each other even though the nodes are connected.

Network log-likelihood

Description

Compute the log-likelihood of the whole observed network based on the latent positions estimates and the model assumptions. See BLSM for more information.

Usage

lpY(Y, lpz, alpha, W)
lpY(Y, lpz, alpha, W)

Arguments

`Y`	Adjacency matrix of the observed network
`lpz`	Matrix containing the negative square root of the Euclidean distances between latent positions (output of lpz_dist)
`alpha`	Model variable $\alpha$
`W`	BLSM Weights matrix of the observed network

Value

Log-likelihood of the observed network

Network log-likelihood for individual updates

Description

Compute the log-likelihood of the whole observed network based on the latent positions estimates and the model assumptions. The function follows almost the same approach as lpY, but it is more suitable for the individual updates occurring during the simulation.

Usage

lpYNODE(Y, Z, alpha, node, diag, W)
lpYNODE(Y, Z, alpha, node, diag, W)

Arguments

`Y`	Adjacency matrix of the observed network
`Z`	Latent positions matrix
`alpha`	Model variable $\alpha$
`node`	Specific node in the network corresponding to the latent coordinate which will be used as reference
`diag`	Diagonal from `t(Z)%*%Z` matrix, passed to speed up the process.
`W`	BLSM Weights matrix of the observed network

Value

Log-likelihood of the observed network

Distance between latent positions

Description

Compute the square root of the Euclidean distances between latent positions and return them with a negative sign.

Usage

lpz_dist(Z)
lpz_dist(Z)

Arguments

`Z`	Latent positions matrix. The matrix size must be `(n,k)`, where `n` and `k` denote respectively the number of nodes in the network and the latent space dimensionality.

Value

Matrix containing the negative square root of the Euclidean distances between latent positions

Examples

pos = matrix(rnorm(20), ncol=2)
lpz_dist(pos)
pos = matrix(rnorm(20), ncol=2)
lpz_dist(pos)

lpz_dist optimized for individual updates

Description

Compute the square root of the Euclidean distances between a specific coordinate in the latent space and all the others. The function follows almost the same approach as lpz_dist, but it is more suitable for the individual updates occurring during the simulation.

Usage

lpz_distNODE(Z, node, diag)
lpz_distNODE(Z, node, diag)

Arguments

`Z`	Latent positions matrix
`node`	Specific node in the network corresponding to the latent coordinate which will be used as reference
`diag`	Diagonal from `t(Z)%*%Z` matrix, passed to speed up the process.

Value

Vector containing the negative square root of the Euclidean distances between latent positions

Network (positive) log-likelihood

Description

Compute the (positive) log-likelihood of the whole observed network based on the latent positions estimates and the model assumptions. The inputs are slightly different from those of lpY, so the function basically applies some preprocessing before calling lpY and returning its value with the opposite sign.

Usage

mlpY(avZ, Y, W)
mlpY(avZ, Y, W)

Arguments

`avZ`	Vector containing the $\alpha$ value and the latent positions
`Y`	Adjacency matrix of the observed network
`W`	BLSM Weights matrix of the observed network

Value

Log-likelihood of the observed network

Base BLSM plot function

Description

Plot latent positions from a Procrustean simulation.

Usage

plot_latent_positions(blsm_obj, colors, points_size = 0.1,
  labels_point_size = 5, labels_point_color = "yellow",
  labels_text_size = 1, labels_text_color = "blue", circles_2D = FALSE)
plot_latent_positions(blsm_obj, colors, points_size = 0.1,
  labels_point_size = 5, labels_point_color = "yellow",
  labels_text_size = 1, labels_text_color = "blue", circles_2D = FALSE)

Arguments

`blsm_obj`	BLSM object obtained through estimate_latent_positions
`colors`	(Optional) Colors of the simulated coordinate points in the latent space. Internal default colors are used if the argument is missing.
`points_size`	Size of the coordinate points
`labels_point_size`	Size of the label points
`labels_point_color`	Color of the label points
`labels_text_size`	Text size in the label points
`labels_text_color`	Text color in the label points
`circles_2D`	Plot circles of radius $\alpha$ (see the model's main variables) centered around the label points

Examples

plot_latent_positions(example_blsm_obj, labels_point_color = "black", labels_text_color = "black")

plot_latent_positions(example_blsm_obj, circles_2D = TRUE)
plot_latent_positions(example_blsm_obj, labels_point_color = "black", labels_text_color = "black")

plot_latent_positions(example_blsm_obj, circles_2D = TRUE)

BLSM traceplots and ACF

Description

Traceplots and autocorrelation functions for the $\alpha$ variable and a selected node (or pair of nodes in the non-Procrustean framework).

Usage

plot_traceplots_acf(blsm_obj, chosen_node = 1, coordinate = 1,
  chosen_pair = c(1, 2))
plot_traceplots_acf(blsm_obj, chosen_node = 1, coordinate = 1,
  chosen_pair = c(1, 2))

Arguments

`blsm_obj`	BLSM object obtained through estimate_latent_positions
`chosen_node`	Specified node for traceplot and autocorrelation function (Procrustean framework)
`coordinate`	Specified coordinate dimension from the n-dimensional latent space
`chosen_pair`	Specified pair of nodes for traceplot and autocorrelation function (non-Procrustean framework)

Examples

plot_traceplots_acf(example_blsm_obj, chosen_node=3, coordinate=1)

## Not run: 
 # Run the simulation without Procrustean step
 blsm_obj = estimate_latent_positions(example_adjacency_matrix, procrustean = FALSE, 
                          burn_in = 3*10^4, nscan = 10^5)
 
 # Plot 
 plot_traceplots_acf(blsm_obj, chosen_pair=c(2,5))

## End(Not run)
plot_traceplots_acf(example_blsm_obj, chosen_node=3, coordinate=1)

## Not run: 
 # Run the simulation without Procrustean step
 blsm_obj = estimate_latent_positions(example_adjacency_matrix, procrustean = FALSE, 
                          burn_in = 3*10^4, nscan = 10^5)
 
 # Plot 
 plot_traceplots_acf(blsm_obj, chosen_pair=c(2,5))

## End(Not run)

Procrustean corresponding positions

Description

Given a set of starting coordinates, the function returns the Procrustean Transform of the initial points that minimizes the sum of squared positional difference from a set of reference coordinates. The (Euclidean) distances between a candidate configuration and the reference are evaluated by considering the couples of corresponding points.

The reference configuration must be centered at the origin.

Usage

proc_crr(Z, Z0)
proc_crr(Z, Z0)

Arguments

`Z`	set of initial coordinates to be transformed
`Z0`	set of reference coordinates centered at the origin

Value

Set of coordinates minimizing the distance between the initial configuration and the reference one

Examples

# Create configuration and center it at the origin
pos_ref = matrix(runif(20), ncol=2)
pos_ref = t(t(pos_ref)-colMeans(pos_ref))

# Create a new configuration by adding a perturbation to the previous one
pos = pos_ref + matrix(rnorm(20, mean=1, sd=0.1), ncol=2)

# Compute the Procrustean Transform and inspect the results
proc_pos = proc_crr(pos, pos_ref)
plot(pos_ref, col="blue", pch=20, xlim=c(-1,3), ylim=c(-1,3))
points(pos, col="red", pch=20)
points(proc_pos, col="purple", pch=20)
# Create configuration and center it at the origin
pos_ref = matrix(runif(20), ncol=2)
pos_ref = t(t(pos_ref)-colMeans(pos_ref))

# Create a new configuration by adding a perturbation to the previous one
pos = pos_ref + matrix(rnorm(20, mean=1, sd=0.1), ncol=2)

# Compute the Procrustean Transform and inspect the results
proc_pos = proc_crr(pos, pos_ref)
plot(pos_ref, col="blue", pch=20, xlim=c(-1,3), ylim=c(-1,3))
points(pos, col="red", pch=20)
points(proc_pos, col="purple", pch=20)

Update step for the latent positions

Description

Accept/reject the proposals for the latent positions

Usage

Z_up(Y, Z, W, alpha, zdelta, mu_z, sd_z)
Z_up(Y, Z, W, alpha, zdelta, mu_z, sd_z)

Arguments

`Y`	Adjacency matrix of the observed network
`Z`	Latent positions matrix
`W`	BLSM Weights matrix of the observed network
`alpha`	Model variable $\alpha$
`zdelta`	Standard deviation of the Gaussian proposal for latent positions
`mu_z`	Mean of the Gaussian prior distribution for latent positions
`sd_z`	Standard deviation of the Gaussian prior distribution for latent positions

Value

Updated latent positions matrix

Package 'BLSM'

Help Index

Update step for the α\alphaα variable

Description

Usage

Arguments

Value

Bayesian Latent Space Model

Description

References

Geodesic distance

Description

Usage

Arguments

Value

Examples

BLSM simulation

Description

Usage

Arguments

Value

Examples

Example Adjacency Matrix

Description

Usage

Format

Example BLSM object

Description

Usage

Format

Example Weights Matrix

Description

Usage

Format

Network log-likelihood

Description

Usage

Arguments

Value

Network log-likelihood for individual updates

Description

Usage

Arguments

Value

Distance between latent positions

Description

Usage

Arguments

Value

Examples

lpz_dist optimized for individual updates

Description

Usage

Arguments

Value

Network (positive) log-likelihood

Description

Usage

Arguments

Value

Base BLSM plot function

Description

Usage

Arguments

Examples

BLSM traceplots and ACF

Description

Usage

Arguments

Examples

Procrustean corresponding positions

Description

Usage

Arguments

Value

Examples

Update step for the latent positions

Description

Usage

Arguments

Update step for the $\alpha$ variable