iglm: Regression under Interference in Connected Populations

Cornelius Fritz and Michael Schweinberger

Roadmap

Introduction

Motivation

Standard GLMs (lm, glm) assume independent units:
- Attributes (X_i, Y_i) of unit i are unaffected by unit j
Problem: Violated in connected population because of spillover and contagion!
Solution: A regression framework for dependent attributes (\mathbf{X}, \mathbf{Y}) and network connections \mathbf{Z} implemented in R Package iglm

Publications

Software Paper

Available on ArXiv (describing the iglm package):

Theory Paper

Published in the Journal of the American Statistical Association:

Advantages of R Package `iglm`

Comprehensive: Binary, count, and real-valued attributes \mathbf{X}, \mathbf{Y} & network connections \mathbf{Z}
Interpretable: Extension of standard GLMs with micro-behavioral foundation
Local Dependence: Neighborhoods \mathcal{N}_i to control dependencies
Scalable: Convex optimization via blockwise optimization
Guarantees: Error of O(N) weights decays at rate \sqrt{\log N / N}

Architecture & Setup

Install from CRAN (Fritz & Schweinberger, 2025):

install.packages("iglm")

R6 class architecture (Chang, 2025):
- iglm manages formula, sampler, and execution
- iglm.data manages variables
Back-end implemented in C++ for high-performance computing

R6 Classes in R

Standard R (e.g., `ergm`)

Apply external functions to data/objects.

Syntax: Pass the object as an argument:

# Fit the model:
fit <- ergm(nw ~ edges + mutual)
# Run external functions:
summary(fit)
sims <- simulate(fit, nsim = 10)

Copy-on-Modify: Modifying settings or data copies the entire object in memory

R6 Classes (e.g., `iglm`)

Data and functions live together in the object.

Syntax: Call functions/methods on the object:

model <- iglm(nw ~ edges + mutual)
model$estimate()  # Fits in-place
model$simulate()  # Simulates in-place

In-Place Modification: Modifying the object in-place without copy overhead:
```
model$set_control(max_it = 300)
```

Data Representation

Data Structure in `iglm.data`

Predictors \mathbf{X} = (X_i): binary, count, or real; fixed or random
Outcomes \mathbf{Y} = (Y_i): binary, count, or real
Connections \mathbf{Z} = (Z_{i,j}): binary Z_{i,j} \in \{0, 1\}, directed or undirected
Neighborhoods \mathcal{N}_i (exogenous): Overlap indicated by c_{i,j} = \mathbb{I}(\mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset)

Running Example: Hate Speech on X

Is there a spillover effect from interacting with Republican politicians to employing offensive language in online communications?

Dataset: N = 495 state legislators across 6 states from Kim et al. (2022)
Outcome Y_i: Use of hate speech (1 = Yes, 0 = No)
Predictor X_i: Party affiliation (1 = Republican, 0 = Other) [fixed]
Network Z_{i,j}: Directed mention/repost network on X (Twitter)
Covariates: Gender (v_{i,1}), Race (v_{i,2}), and State (v_{i,3})

Running Example: Hate Speech on X

Is there a spillover effect from interacting with Republican politicians to employing offensive language in online communications?

Data Representation: `iglm.data`

Load the pre-packaged dataset directly:

data("state_twitter")
data.object <- state_twitter$iglm.data$clone()

Exogenous Covariates gender_attribute, white_attribute, match_gender, match_race, match_state are available via package

Show code

# Extracting covariates from state_twitter object
match_gender     <- state_twitter$match_gender
match_race       <- state_twitter$match_race
match_state      <- state_twitter$match_state
white_attribute  <- state_twitter$white_attribute
gender_attribute <- state_twitter$gender_attribute

Generally: Import from raw files (vectors or matrices)

Show code

z_network <- read.csv("z_network.csv")
x_attribute <- read.csv("x_attribute.csv")
y_attribute <- read.csv("y_attribute.csv")
neighborhood <- as.matrix(read.csv("neighborhood.csv"))

data.object <- iglm.data(
  x_attribute = x_attribute,  type_x = "binomial",
  y_attribute = y_attribute,  type_y = "binomial",
  z_network = z_network,      directed = TRUE,
  neighborhood = neighborhood
)

Descriptive Analysis: Data Object Summary

data.object

iglm.data object
  units                       : 495
  directed                    : TRUE
  edges (fixed = FALSE)       : 9218
  neighborhood edges          : 24398

Attribute summaries
  x_attribute (fixed = TRUE)  : binomial 1s=195, 0s=300, P(1)=0.394
  y_attribute                 : binomial 1s=204, 0s=291, P(1)=0.412

Descriptive Analysis: Visualizing the Network

The package provides a plotting function based on igraph
Node colors \rightarrow outcome variables (\mathbf{Y})
Node size \rightarrow predictor variables (\mathbf{X})
Grey lines \rightarrow directed connections (\mathbf{Z})
Orange lines \rightarrow overlapping neighborhoods (\mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset)

set.seed(123)
data.object$plot(
  edge.width = 0.5, 
  edge.arrow.size = 0.2, 
  legend_size = 0.5, 
  legend_size_n_levels = 2
)

Descriptive Analysis: Different Statistics

# Plot spillover degree distribution
data.object$spillover_degree_distribution()

Spillover degree distribution: The degree in the subnetwork of connections that can enable spillover (i.e., connections where c_{i,j} x_i y_j = 1 or c_{i,j} x_j y_i = 1)

Back to Simulation

# Plot outcome distribution
data.object$y_distribution()

Outcome distribution (y_distribution): The empirical distribution of the node-level outcomes Y (density or bar plot depending on the outcome type)

Back to Simulation

# Plot traditional degree distribution
data.object$degree_distribution()

Degree distribution: Distribution of number of outgoing and incoming connections

Back to Simulation

# Plot geodesic distances distribution
data.object$geodesic_distances_distribution()

Geodesic distance distribution: The distribution of shortest path distances (geodesic distances) between all pairs of connected actors in the symmetrized network

Back to Simulation

# Plot edgewise shared partner distribution
data.object$edgewise_shared_partner_distribution()

Edgewise shared partner distribution: The distribution of edgewise shared partner (ESP) counts (i.e., the number of mutual neighbors that two connected actors share in the network)

Back to Simulation

Framework

Joint Probability Model

\begin{aligned} f_{\theta}(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \dfrac{1}{c(\theta)} &\left[ \prod_{i \in \mathcal{P}} a_{\mathbf{X}}(x_i) a_{\mathbf{Y}}(y_i) \exp(\theta_g^\top g_i(x_i, y_i)) \right] \\ \times &\left[ \prod_{(i,j) \in \mathcal{D}} a_{\mathbf{Z}}(z_{i,j}) \exp(\theta_h^\top h_{i,j}(x_i, x_j, y_i, y_j, \mathbf{z})) \right] \end{aligned}

\mathcal{P} is the set of units, \mathcal{D} is the set of dyads
a_{\mathbf{X}}, a_{\mathbf{Y}}, a_{\mathbf{Z}} are reference measures
g_i(\cdot) describes unit relationships
h_{i,j}(\cdot) describes network connections and spillover

Design Configurations

Fixed vs. random designs are specified via:

data.object$set_fix_x(TRUE)          # Fixed predictors
data.object$set_fix_z(TRUE)          # Fixed connections
data.object$set_fix_z_alocal(TRUE)   # Fixes non-overlapping network

Can also all be done when defining the data object with iglm.data()

iglm.data( ...,
  fix_x = TRUE,
  set_fix_z_alocal = TRUE)

Random designs allow generalization to superpopulations
Fixed designs limit inferences to observed states

Local Dependence Assumption

Dependencies are localized using neighborhoods: h_{i,j}(x_i, x_j, y_i, y_j, \mathbf{z}) = h_{i,j}(z_{i,j}, z_{j,i}) \quad \text{for } \mathcal{N}_i \cap \mathcal{N}_j = \emptyset
Attributes of units with non-overlapping neighborhoods are conditionally independent.
Directed networks capture reciprocity via z_{j,i}.

Model Specification: Basics

data.object ~ term_1 + ...

Attribute Terms (g_i(x_i, y_i)):
- attribute_y (\sum_i y_i), attribute_xy (\sum_i x_i y_i)
Network Terms (h_{i,j}(z_{i,j})):
- edges (\sum_{i,j} z_{i,j}), mutual (\sum_{i,j} z_{i,j} z_{j,i})
Joint Terms (h_{i,j}(z_{i,j}, x_i, y_j)):
- spillover_yy (\sum_{i,j} y_i y_j z_{i,j}), spillover_yx (\sum_{i,j} y_i x_j z_{j,i})

All terms

Model Specification: Attribute Terms — g_i(x_i, y_i)

formula.iglm <- data.object ~
  attribute_y +                      # intercept for Y_i
  attribute_xy +                     # effect of X_i on Y_i
  cov_y(data = gender_attribute) +   # covariate effect: gender -> Y
  cov_y(data = white_attribute)      # covariate effect: race -> Y

attribute_y — baseline of the outcome Y_i
attribute_xy — direct effect of predictor X_i on Y_i
cov_y(data = v) — additive effect of covariate v_i on Y_i

Model Specification: Network Terms — h_{i,j}(z_{i,j})

formula.iglm <- data.object ~
  edges(mode = "alocal") +                      # baseline tie propensity (alocal)
  cov_z(data = match_gender, mode = "local") +  # gender homophily within N
  cov_z(data = match_race,   mode = "local") +  # race homophily within N
  cov_z(data = match_state,  mode = "local") +  # state homophily within N
  degrees +                                     # in/out-degree node parameters
  mutual(mode = "local") +                      # reciprocity: z_{i,j} z_{j,i} within N
  transitive                                    # transitivity: shared partners

edges(mode = "alocal") — baseline for non-neighbourhood dyads
cov_z(data = m) — homophily: tie more likely when m_{i,j} = 1
degrees — \theta_{\mathcal{O},i} and \theta_{\mathcal{I},j} per node (depends if directed or undirected)
mutual / transitive — structural network dependence within \mathcal{N}

Model Specification: Spillover Terms — h_{i,j}(z_{i,j})

formula.iglm <- data.object ~
  spillover_yx(mode = "local") +   # X_j -> Y_i via z_{j,i}  (treatment spillover)
  spillover_yy(mode = "local")     # Y_j -> Y_i via z_{i,j} + z_{j,i}  (outcome contagion)

spillover_yx — being connected to a Republican (X_j = 1) may increase hate speech in i
spillover_yy — neighbour’s outcome Y_j influences Y_i
mode = "local" — spillovers restricted to overlapping neighbourhoods \mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset

Model Specification: Full Model Formula

formula.iglm <- data.object ~
  # -- Attribute terms --
  attribute_y + attribute_xy +
  cov_y(data = gender_attribute) + cov_y(data = white_attribute) +
  # -- Network terms --
  edges(mode = "alocal") +
  cov_z(data = match_gender, mode = "local") +
  cov_z(data = match_race,   mode = "local") +
  cov_z(data = match_state,  mode = "local") +
  degrees + mutual(mode = "local") + transitive +
  # -- Spillover/joint terms --
  spillover_yx(mode = "local") + spillover_yy(mode = "local")

Model Specification: Scaling and Localization

Localization restricts spillovers to overlapping neighborhoods (mode = "local")
Scaling controls spillover in large neighborhoods (important for count outcomes or high degree heterogeneity):

Unscaled (spillover_yy) \left( y_i y_j + y_j y_i \mathbb{I}_U(\mathbf{z}) \right) e_{i,j}^{(\mathtt{s})}

Scaled (spillover_yy_scaled) \left( \frac{y_i y_j}{\operatorname{deg}(i, \mathtt{s})} + \frac{y_j y_i}{\operatorname{deg}(j, \mathtt{s})} \mathbb{I}_U(\mathbf{z}) \right) e_{i,j}^{(\mathtt{s})}

Estimation

Estimation: Pseudo-loglikelihood

Maximum likelihood intractable due to normalizing constant
- Maximize the pseudo-loglikelihood via an MM algorithm: \ell(\theta; \mathbf{x}, \mathbf{y}, \mathbf{z}) = \sum_{i \in \mathcal{P}} \ell_{\mathbf{X},i}(\theta) + \sum_{i \in \mathcal{P}} \ell_{\mathbf{Y},i}(\theta) + \sum_{(i,j) \in \mathcal{D}} \ell_{\mathbf{Z},i,j}(\theta)

Optimization: Partitioned MM Algorithm

High-dimensional degree parameters (2N) require partitioned optimization:

Step 1 (MM): Update degree weights \theta_1 using a Minorization-Maximization algorithm (Quasi-Newton accelerated)
Step 2 (Newton-Raphson): Update parameters \theta_2

# Initialize model from formula
model.iglm <- iglm(formula = formula.iglm)

# Customize optimizer and UQ parameters
model.iglm$set_control(control.iglm(
  max_it = 300,              
  tol = 1e-4,                
  offset_nonoverlap = 0,     
  var_method = "Mean-value" 
))

# Fit model parameters (runs in a few minutes)
model.iglm$estimate()

# Initialize model from formula
model.iglm <- iglm(formula = formula.iglm)

# Use analytical Hessian standard errors
model.iglm$set_control(control.iglm(
  max_it = 100,              
  tol = 1e-3,                
  offset_nonoverlap = 0,     
  var_method = "Hessian" 
))

# Fit model parameters (runs in ~5 seconds)
model.iglm$estimate()

max_it & tol: Optimizer budget & stopping threshold
offset_nonoverlap: Penalty for sparse connections outside neighborhoods
var_method: Method for standard error calculation (MCMC-based mean-value vs. analytical Hessian)

Estimation: Diagnostics

Plot convergence diagnostic paths via $plot(trace = TRUE) method:

model.iglm$plot(trace = TRUE)

Simulation

Draw samples from the joint distribution of (X,Y,Z) under \theta via MCMC:

Simulating Outcomes and Predictor (Y \& X):
- Gibbs sampler to cycle over each unit
Simulating Connections (Z):
- Metropolis-Hastings or TNT sampler for local connections
- Binomial sampling for non-overlapping connections

Simulation: Specification

# Define samplers for Y and Z only since X is fixed in this example
sampler.y.obj <- sampler.net.attr(n_proposals = data.object$n_actor * 10)
sampler.z.obj <- sampler.net.attr(n_proposals = nrow(data.object$overlap) * 10, tnt = TRUE)
sampler.obj <- sampler.iglm(
  n_burn_in = 1, n_simulation = 1, seed = 123,
  sampler_y = sampler.y.obj, sampler_z = sampler.z.obj
)

n_proposals: MCMC proposals per iteration
n_burn_in: Discarded initial iterations to reach stationary distribution
n_simulation: Number of simulated datasets to generate and store
seed: Random seed to ensure reproducibility

Simulation: Example

model.iglm$set_sampler(sampler.obj)
model.iglm$simulate() 
set.seed(123)
model.iglm$results$samples[[1]]$plot(
  edge.width = 0.5, 
  edge.arrow.size = 0.2, 
  legend_size = 2, 
  legend_size_n_levels = 2
)

Simulation: Applications

MCMC simulations are used for two main tasks:

Uncertainty Quantification (UQ):
- Approximation of standard errors \mathbb{V}_{\widehat{\theta}}(\widehat{\theta}) via simulations
Model Assessment (Goodness-of-Fit):
- Evaluates whether simulations from the fitted model replicate auxiliary network features
- Via the $assess() method: Statistics from the Descriptive Analysis slide may be used

Goodness-of-Fit: Network Stats

# Other available functions include: degree_distribution, spillover_degree_distribution,
# dyadwise_shared_partner_distribution, y_distribution, and x_distribution
model.iglm$assess(formula = ~ edgewise_shared_partner_distribution +
                              geodesic_distances_distribution)

Goodness-of-Fit: Spillover Degrees

model.iglm$assess(formula = ~ spillover_degree_distribution)

Application

Application: Coefficient Table

Use the $summary() method on the fitted model to inspect results:

model.iglm$summary()

iglm object
----------------------------------------------------------------------------
Results: 

                                           Estimate    S.E. t-value Pr(>|t|)
attribute_y                                 -1.8587  0.3303   -5.63   <1e-04
attribute_xy                                -0.0533  0.2560   -0.21     0.84
cov_y(data = gender_attribute)              -0.1703  0.2242   -0.76     0.45
cov_y(data = white_attribute)                0.0806  0.2630    0.31     0.76
edges(mode = 'alocal')                      -1.0531  0.0035 -296.73   <1e-04
cov_z(data = match_gender, mode = 'local')   0.2226  0.0134   16.64   <1e-04
cov_z(data = match_race, mode = 'local')     0.6343  0.0095   66.50   <1e-04
cov_z(data = match_state, mode = 'local')    3.8760  0.0241  161.16   <1e-04
mutual(mode = 'local')                       2.5847  0.0507   51.03   <1e-04
transitive                                   0.6546  0.0159   41.24   <1e-04
spillover_yx(mode = 'local')                 0.0325  0.0375    0.87     0.39
spillover_yy(mode = 'local')                 0.0626  0.0133    4.70   <1e-04
                                              
attribute_y                                ***
attribute_xy                                  
cov_y(data = gender_attribute)                
cov_y(data = white_attribute)                 
edges(mode = 'alocal')                     ***
cov_z(data = match_gender, mode = 'local') ***
cov_z(data = match_race, mode = 'local')   ***
cov_z(data = match_state, mode = 'local')  ***
mutual(mode = 'local')                     ***
transitive                                 ***
spillover_yx(mode = 'local')                  
spillover_yy(mode = 'local')               ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Time for estimation: 19 mins

Degree Parameters:
  Outdegrees:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -11.2    -7.9    -6.9    -7.1    -6.2    -3.8 

  Indegrees:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -3.985  -1.104  -0.563  -0.613  -0.027   3.069

Interpretation: Outcome Equation (Y)

Y_i \mid (X, Y_{-i}, Z) \sim \text{Bernoulli}(\mu_{\mathbf{Y},i}(\widehat\eta_{\mathbf{Y},i,i}))\\ \mu_{\mathbf{Y},i}(\widehat\eta_{\mathbf{Y},i,i}) = \dfrac{\exp(\widehat\eta_{\mathbf{Y},i})}{1 + \exp(\widehat\eta_{\mathbf{Y},i})}, Linear predictor for Hate Speech (Y_i \in \{0, 1\}): \begin{aligned} \widehat\eta_{\mathbf Y,i} &\coloneqq -1.859 \;-.053\, x_{i} \;-.170\, v_{i,1} \;+.081\, v_{i,2} \\ &+ .033 \sum_{j \in \mathcal{P} \setminus \{i\}} c_{i,j}\, x_j\, z_{j,i} \;+ .063 \sum_{j \in \mathcal{P} \setminus \{i\}} c_{i,j}\, y_j\, (z_{i,j} + z_{j,i}) \end{aligned}

Contagion Effect: The outcome spillover coefficient (spillover_yy = .063) is positive and significant (p < .0001)

Interpretation: Network Equation (Z)

Z_{i,j} \mid (X, Y, Z_{-i,j}) \sim \text{Bernoulli}(\mu_{\mathbf{Z},i,j}(\widehat\eta_{\mathbf{Z},i,i}))\\ \mu_{\mathbf{Z},i,j}(\widehat\eta_{\mathbf{Z},i,j}) = \dfrac{\exp(\widehat\eta_{\mathbf{Z},i,j})}{1 + \exp(\widehat\eta_{\mathbf{Z},i,j})},

Empirical Equation for Connections (Z_{i,j} \in \{0, 1\}): \begin{aligned} \widehat{\eta}_{\mathbf{Z},i,j} &= \widehat{\theta}_{\mathcal{O},i} + \widehat{\theta}_{\mathcal{I},j} - 1.053(1 - c_{i,j}) \\ &\quad + \Big[ 3.876 \text{state} + 2.585 \text{mutual} + .655 \text{trans} + .063 y_i y_j \Big] c_{i,j} \end{aligned}

Strong state homophily, reciprocity, and transitivity

Model Comparison

How can be compare different specifications of the model?
- Compare full IGLM model containing spillover terms, reciprocity, and transitivity with a GLM ignoring interference

formula.glm <- data.object ~
  attribute_y + attribute_xy +
  cov_y(data = white_attribute) + cov_y(data = gender_attribute) +
  edges(mode = "alocal") + edges(mode = "local") +
  cov_z(data = match_gender, mode = "local") +
  cov_z(data = match_race,   mode = "local") +
  cov_z(data = match_state,  mode = "local")

model.glm <- iglm(formula = formula.glm, name = "glm")
model.glm$estimate()

Predictive Performance: IGLM vs GLM

Show code

pred.iglm <- model.iglm$predict(variant = "marginal")
pred.glm  <- model.glm$predict(variant = "marginal")
roc.iglm <- pROC::roc(pred.iglm$y$target, pred.iglm$y$prediction, quiet = TRUE)
roc.glm  <- pROC::roc(pred.glm$y$target, pred.glm$y$prediction, quiet = TRUE)

par(mar = c(4, 4, 1, 1))
plot(roc.iglm$specificities, roc.iglm$sensitivities, col = "#08696b",
     type = "l", lwd = 3, bty = "l", las = 1,
     xlab = "Specificity", ylab = "Sensitivity")
lines(roc.glm$specificities, roc.glm$sensitivities, col = "#e64173",
      type = "l", lwd = 3, bty = "l")
legend("bottomleft", legend = c("iglm", "glm"), lty = c(1,1), lwd = c(3,3),
       col = c("#08696b", "#e64173"), bty = "n")

IGLM (teal) outperforms GLM (red) in ROC-AUC
Capturing transitivity and degree heterogeneity also improves network fit
Takeaway: Incorporating spillover and structural network effects yields vastly superior predictions

Goodness-of-Fit: IGLM vs GLM

Show code

# 1. Run simulations (plot = FALSE)
model.iglm$assess(
  formula = ~ spillover_degree_distribution, 
  plot = FALSE
)
model.glm$assess(
  formula = ~ spillover_degree_distribution, 
  plot = FALSE
)
# 2. Plot overlaid comparison
model.glm$results$plot(
  model_assessment = TRUE, 
  "iglm" = model.iglm$results$model_assessment
)

Comparison of IGLM and GLM via Goodness-of-Fit (GOF) simulations:

Run $assess() on both models with plot = FALSE to compute simulations
Call the plot method with model_assessment = TRUE, passing the comparison results

Custom Terms: Recipe

iglm uses a fast C++ backend.
Users can create their own custom terms:

create_userterms_skeleton()

Four Step Recipe:
1. Decompose model terms g_i(\cdot) and h_{i,j}(\cdot).
2. Specify change statistics \Delta_{\mathbf{X},i}, \Delta_{\mathbf{Y},i}, \Delta_{\mathbf{Z},i,j}.
3. Write C++ code using XYZ_class methods and register.
4. Write R wrapper InitIglmTerm.my_term.

Custom Terms: C++ Change Statistic

double my_stat_spillover(const XYZ_class &object, const int &unit_i, const int &unit_j,
                         const arma::mat &data, const double &type, const std::string &mode,
                         const bool &is_full_neighborhood) {
  double res = 0.0;
  if (mode == "x") { // x_i from 0 -> 1
    for (const int &k : object.adj_list_nb.at(unit_i)) res += object.y_attribute.get_val(k);
  } else if (mode == "y") { // y_i from 0 -> 1
    for (const int &k : object.adj_list_nb.at(unit_i)) res += object.x_attribute.get_val(k);  
  } else { // z_ij from 0 -> 1
    if (object.get_val_overlap(unit_i, unit_j)) {
      res = (object.x_attribute.get_val(unit_i) * object.y_attribute.get_val(unit_j)) +
            (object.x_attribute.get_val(j) * object.y_attribute.get_val(unit_i));
    }
  }
  return res;
}
EFFECT_REGISTER("my_spillover", ::my_stat_spillover, "my_spillover", 0);

Custom Terms: R Wrapper Function

Define wrapper in the R folder of iglm.userterms:

InitIglmTerm.my_spillover <- function(data_object, arglist, ...) {
  arglist <- iglm:::check.IglmTerm(data_object, arglist, directed = FALSE)
  list(
    term_name = "my_spillover",
    coef_name = arglist$label
  )
}

Compile the generated package and load it:

library(iglm)
library(iglm.userterms)

Exercise

Exercise: Copenhagen Network Study

Dataset: N = 409 students tracked over 28 days (Sapiezynski, Stopczynski, Lassen, & Lehmann, 2019)
Outcome Y_i \in \mathbb{R}: Log-transformed phone call duration \log(t_i)
Predictor X_i: Gender (1 = Female, 0 = Male) [Fixed]
Network Z_{i,j}: Undirected friendship network
Neighborhood \mathcal{N}_i: Bluetooth proximity scans (\ge 24 hours)

Hints: Continuous Outcomes in `iglm`

Since Y_i is continuous (real-valued), we must use features not needed in the binary case:

Outcome Scale Parameter: Before estimation, you must set the scale parameter (variance) on the data object using $set_scale_y():
```
data("copenhagen")
copenhagen$set_scale_y(var(copenhagen$y_attribute))
```

Model Assessment: To inspect the continuous outcome distribution fit, pass y_distribution to the $assess() method:

# Note: assess() accepts: degree_distribution, spillover_degree_distribution,
# geodesic_distances_distribution, edgewise_shared_partner_distribution,
# dyadwise_shared_partner_distribution, y_distribution, x_distribution
model$assess(formula = ~ y_distribution + geodesic_distances_distribution)

Exercise: Setup & Data Preparation

Load the Copenhagen network study dataset
Compute and set the outcome scale parameter (variance of y_attribute) using $set_scale_y()

Load the Copenhagen network study dataset
Compute and set the outcome scale parameter (variance of y_attribute) using $set_scale_y()

# 1. Load data and set scale
data("copenhagen")
copenhagen$set_scale_y(var(copenhagen$y_attribute))

Exercise: Model Specification

Formulate the joint model formula for continuous outcomes
Include the following terms: attribute_y, attribute_xy, degrees, edges(mode = "alocal"), transitive, spillover_xy, and spillover_yy

Formulate the joint model formula for continuous outcomes.
Include the following terms: attribute_y, attribute_xy, degrees, edges(mode = "alocal"), transitive, spillover_xy, and spillover_yy

# 2. Specify formula
formula.norm <- copenhagen ~
  attribute_y + attribute_xy + degrees +
  edges(mode = "alocal") + transitive +
  spillover_xy + spillover_yy

Exercise: Model Estimation

Initialize the model from the formula using iglm()
Estimate the model parameters using the $estimate() method
Output the coefficient summary table using $summary()

Initialize the model from the formula using iglm()
Estimate the model parameters using the $estimate() method
Output the coefficient summary table using $summary()

# 3. Fit model
model.norm <- iglm(formula = formula.norm)
model.norm$estimate()
model.norm$summary()

iglm object
--------------------------------------------------------------
Results: 

                       Estimate    S.E. t-value Pr(>|t|)    
attribute_y              1.0879  0.1245     8.7   <1e-04 ***
attribute_xy             0.4205  0.2253     1.9    0.062 .  
edges(mode = 'alocal')  -2.7644  0.0013 -2075.0   <1e-04 ***
transitive               1.9037  0.1864    10.2   <1e-04 ***
spillover_xy            -0.2327  0.0985    -2.4    0.018 *  
spillover_yy             0.3269  0.0713     4.6   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Time for estimation: 5.7 secs

Degree Parameters:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -3.30   -1.24   -0.76   -0.74   -0.19    1.53

Exercise: Model Assessment

Evaluate the goodness-of-fit via MCMC simulations
Include the continuous outcome distribution (y_distribution)
Hint: Consult the “Joint Modeling of Networks and Attributes with iglm” vignette (Goodness-of-Fit section) for an explanation of all available assessment statistics

Evaluate the goodness-of-fit via MCMC simulations
Include the continuous outcome distribution (y_distribution)
Hint: Consult the “Joint Modeling of Networks and Attributes with iglm” vignette (Goodness-of-Fit section) for an explanation of all available assessment statistics

model.norm$assess(formula = ~ y_distribution)

Exercise: Interpretation

Interpret the demographic, network structure, and treatment/outcome contagion effects

Outcome Contagion (spillover_yy): Significant (\widehat{\theta} = 0.327, p <1e-04)
Gender Effect: Females show a higher baseline call duration

Conclusion

What `iglm` Does

Joint regression for dependent outcomes \mathbf{Y}, predictors \mathbf{X}, and connections \mathbf{Z} under interference with support for binary, count, and continuous outcomes
Provides MCMC simulation for uncertainty quantification and goodness-of-fit
Fast C++ backend for simulation and optimization

How to Use & Contribute

CRAN Release: Install package via install.packages("iglm")
GitHub Repository: View source & report issues to contribute
Vignettes: Read Joint Modeling of Networks and Attributes
Extending terms: Use create_userterms_skeleton() to build custom terms

Publications

Software Paper

Available on ArXiv (describing the iglm package):

Theory Paper

Published in the Journal of the American Statistical Association:

Literature

Chang, W. (2025). R6: Encapsulated classes with reference semantics. https://doi.org/10.32614/CRAN.package.R6

Fritz, C., & Schweinberger, M. (2025). Iglm: Regression under interference in connected populations. Retrieved from https://CRAN.R-project.org/package=iglm

Kim, T., Nakka, N., Gopal, I., Desmarais, B. A., Mancinelli, A., Harden, J. J., … Boehmke, F. J. (2022). Attention to the COVID-19 pandemic on Twitter: Partisan differences among U.S. state legislators. Legislative Studies Quarterly, 47, 1023–1041. https://doi.org/10.1111/lsq.12367

Sapiezynski, P., Stopczynski, A., Lassen, D. D., & Lehmann, S. (2019). Interaction data from the Copenhagen Networks Study. Scientific Data, 6, 315.

iglm: Regression under Interference in Connected Populations

Roadmap

Introduction

Motivation

Publications

Software Paper

Theory Paper

Advantages of R Package iglm

Architecture & Setup

R6 Classes in R

Standard R (e.g., ergm)

R6 Classes (e.g., iglm)

Data Representation

Data Structure in iglm.data

Running Example: Hate Speech on X

Running Example: Hate Speech on X

Data Representation: iglm.data

Descriptive Analysis: Data Object Summary

Descriptive Analysis: Visualizing the Network

Descriptive Analysis: Different Statistics

Framework

Joint Probability Model

Design Configurations

Local Dependence Assumption

Model Specification: Basics

Model Specification: Attribute Terms — g_i(x_i, y_i)

Model Specification: Network Terms — h_{i,j}(z_{i,j})

Model Specification: Spillover Terms — h_{i,j}(z_{i,j})

Model Specification: Full Model Formula

Model Specification: Scaling and Localization

Estimation

Estimation: Pseudo-loglikelihood

Optimization: Partitioned MM Algorithm

Estimation: Execution

Estimation: Diagnostics

Simulation

Simulation

Simulation: Specification

Simulation: Example

Simulation: Applications

Goodness-of-Fit: Network Stats

Goodness-of-Fit: Spillover Degrees

Application

Application: Coefficient Table

Interpretation: Outcome Equation (Y)

Interpretation: Network Equation (Z)

Model Comparison

Model Comparison

Predictive Performance: IGLM vs GLM

Goodness-of-Fit: IGLM vs GLM

Custom Terms: Recipe

Custom Terms: C++ Change Statistic

Custom Terms: R Wrapper Function

Exercise

Exercise: Copenhagen Network Study

Hints: Continuous Outcomes in iglm

Exercise: Setup & Data Preparation

Exercise: Model Specification

Exercise: Model Estimation

Exercise: Model Assessment

Exercise: Interpretation

Conclusion

What iglm Does

How to Use & Contribute

Publications

Software Paper

Theory Paper

Literature

Advantages of R Package `iglm`

Standard R (e.g., `ergm`)

R6 Classes (e.g., `iglm`)

Data Structure in `iglm.data`

Data Representation: `iglm.data`

Hints: Continuous Outcomes in `iglm`

What `iglm` Does