ADMM for Generalized Lasso Inference
$\ell_1$-penalized inference is central to the field of compressed sensing. I derive an ADMM solver for the generalized Lasso problem and discuss three important special cases: the standard Lasso, variable fusion, and the fused Lasso. Further, I provide Python code and demonstrate it on a simulated dataset.
Sparsity plays an important role in machine learning, statistics, and signal processing. In the basic setup, we are given a noisy observation
$$y = Dw + \epsilon,$$
where $y \in \mathbb{R}^m$ is the observation, $D \in \mathbb{R}^{m \times n}$ is a dictionary, $w \in \mathbb{R}^n$ are the weights we want to recover, and $\epsilon$ is observation noise. The generalized Lasso [1] estimates $w$ by solving
$$\min_w \; \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda \lVert Fw \rVert_1,$$
where $F \in \mathbb{R}^{k \times n}$ is a penalty matrix that encodes the structure we expect in $w$, and $\lambda \geq 0$ controls the strength of the penalty.
Deriving an ADMM algorithm
An overview of the ADMM framework can be found here. We can rewrite the generalized Lasso into ADMM form by introducing a variable $z = Fw$:
$$\min_{w, z} \; \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda \lVert z \rVert_1 \quad \text{subject to} \quad Fw - z = 0.$$
The corresponding scaled augmented Lagrangian is given by
$$L_\rho(w, z, u) = \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda \lVert z \rVert_1 + \tfrac{\rho}{2}\lVert Fw - z + u \rVert_2^2 - \tfrac{\rho}{2}\lVert u \rVert_2^2,$$
where $u$ is the scaled dual variable and $\rho > 0$ is the augmented Lagrangian parameter. Note that introducing $z$ decouples the smooth quadratic loss from the non-differentiable $\ell_1$ penalty, so that each block of variables can be updated efficiently.
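Each of the updates below minimizes this Lagrangian over one block of variables with the others held fixed, so it can be handy to evaluate it directly when checking a derivation or debugging an implementation. A minimal sketch (the function name and argument order are my own, not part of the solver below):

import numpy as np

def scaled_augmented_lagrangian(y, D, F, w, z, u, rho, lam):
    # quadratic data-fit term
    fit = 0.5 * np.sum((y - D @ w) ** 2)
    # l1 penalty on the split variable z
    penalty = lam * np.sum(np.abs(z))
    # scaled augmentation term minus the constant term in u
    augment = 0.5 * rho * np.sum((F @ w - z + u) ** 2) - 0.5 * rho * np.sum(u ** 2)
    return fit + penalty + augment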
Updating $w$
The standard ADMM update for $w$ minimizes the augmented Lagrangian with $z$ and $u$ held fixed:
$$w \leftarrow \arg\min_w \; L_\rho(w, z, u).$$
To obtain the RHS, we set the partial derivative of the augmented Lagrangian with respect to $w$ to zero:
$$-D^\top (y - Dw) + \rho F^\top (Fw - z + u) = 0.$$
Now rearrange to collect terms with $w$ on the left-hand side:
$$D^\top D\, w + \rho F^\top F\, w = D^\top y + \rho F^\top (z - u).$$
Factorize and solve for $w$:
$$w \leftarrow \left(D^\top D + \rho F^\top F\right)^{-1}\left(D^\top y + \rho F^\top (z - u)\right).$$
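In code, the $w$-update is a single linear solve; a minimal sketch, assuming the system matrix is nonsingular (the full implementation below uses np.linalg.lstsq instead, which also handles the rank-deficient case):

import numpy as np

def w_update(y, D, F, z, u, rho):
    # solve (D^T D + rho F^T F) w = D^T y + rho F^T (z - u)
    A = D.T @ D + rho * F.T @ F
    b = D.T @ y + rho * F.T @ (z - u)
    return np.linalg.solve(A, b)

Since $D$ and $F$ are fixed across iterations, the matrix $D^\top D + \rho F^\top F$ (or its factorization) can be precomputed once and reused.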
Updating $z$
Next, we find the ADMM update for $z$:
$$z \leftarrow \arg\min_z \; \lambda \lVert z \rVert_1 + \tfrac{\rho}{2}\lVert Fw - z + u \rVert_2^2.$$
Similarly, we set the partial derivative of this objective with respect to $z$ to zero. Immediately, we encounter the non-differentiable $\lvert z_i \rvert$ terms, so we work with the subgradient instead: $\partial \lvert z_i \rvert = \operatorname{sign}(z_i)$ for $z_i \neq 0$ and $\partial \lvert z_i \rvert = [-1, 1]$ at $z_i = 0$.
For $z_i \neq 0$, the coordinate-wise optimality condition reads
$$\lambda \operatorname{sign}(z_i) - \rho\left((Fw + u)_i - z_i\right) = 0.$$
Collecting all terms with $z_i$ on one side gives
$$z_i = (Fw + u)_i - \tfrac{\lambda}{\rho}\operatorname{sign}(z_i).$$
When $z_i > 0$, this is only consistent if $(Fw + u)_i > \tfrac{\lambda}{\rho}$, and then $z_i = (Fw + u)_i - \tfrac{\lambda}{\rho}$.
When $z_i < 0$, it requires $(Fw + u)_i < -\tfrac{\lambda}{\rho}$, and then $z_i = (Fw + u)_i + \tfrac{\lambda}{\rho}$,
which can be rearranged to get the form
$$z_i = \operatorname{sign}\big((Fw + u)_i\big)\left(\lvert (Fw + u)_i \rvert - \tfrac{\lambda}{\rho}\right) \quad \text{whenever } \lvert (Fw + u)_i \rvert > \tfrac{\lambda}{\rho}.$$
In words, each coordinate of $Fw + u$ is shrunk towards zero by $\tfrac{\lambda}{\rho}$, and any coordinate whose magnitude is at most $\tfrac{\lambda}{\rho}$ is set exactly to zero (the $z_i = 0$ case of the subgradient condition).
More compactly this can be written using the soft-threshold operator $S_\kappa$:
$$z \leftarrow S_{\lambda/\rho}(Fw + u), \qquad S_\kappa(x) = \operatorname{sign}(x)\max(\lvert x \rvert - \kappa, 0).$$
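As a quick numeric illustration of the soft-threshold operator (the values here are arbitrary):

import numpy as np

def soft_threshold(x, thresh):
    # shrink each entry towards zero by `thresh`; entries smaller than `thresh` in magnitude become exactly zero
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.1, 1.5]), 0.5))
# -> [-1.5, -0., 0., 1.]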
Updating $u$
Given the updated $w$ and $z$, the scaled dual variable $u$ follows the standard ADMM dual ascent step on the residual of the constraint $Fw - z = 0$:
$$u \leftarrow u + Fw - z.$$
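Putting the three updates together gives the whole algorithm. A condensed sketch below (zero initialization and np.linalg.solve are my choices here; the full implementation in the Code section uses random initialization and np.linalg.lstsq):

import numpy as np

def generalized_lasso_admm_sketch(y, D, F, rho, lam, n_iters=100):
    n, k = D.shape[1], F.shape[0]
    w, z, u = np.zeros(n), np.zeros(k), np.zeros(k)
    A = D.T @ D + rho * F.T @ F  # fixed across iterations
    for _ in range(n_iters):
        w = np.linalg.solve(A, D.T @ y + rho * F.T @ (z - u))                    # w-update
        z = np.sign(F @ w + u) * np.maximum(np.abs(F @ w + u) - lam / rho, 0.0)  # z-update (soft-threshold)
        u = u + F @ w - z                                                        # u-update (dual ascent)
    return w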
Special Cases of the Generalized Lasso
For different choices of the penalty matrix $F$, the generalized Lasso recovers a number of well-known models.
Standard Lasso
When $F = I_n$ is the identity matrix, we obtain the standard Lasso [2], which corresponds to the problem
$$\min_w \; \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda \lVert w \rVert_1.$$
This model is commonly used in compressed sensing where we are interested in sparse signal recovery, and can be used as a convex relaxation of the combinatorial $\ell_0$-penalized problem.
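As a sanity check, the $F = I_n$ case can be compared against scikit-learn's coordinate-descent Lasso. Note that scikit-learn scales the quadratic term by $1/m$, so its alpha corresponds to $\lambda/m$ in the formulation above. A sketch, assuming y, D, and lam are defined as in the demo below:

from sklearn.linear_model import Lasso

# sklearn minimizes (1/(2m)) ||y - D w||^2 + alpha ||w||_1,
# so alpha = lam / m matches the objective above
m = len(y)
skl_lasso = Lasso(alpha=lam / m, fit_intercept=False, max_iter=10000).fit(D, y)
# skl_lasso.coef_ should roughly match the ADMM solution with F = np.eye(n),
# up to solver tolerance and the finite number of ADMM iterations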
Variable Fusion
When $F$ is the $(n-1) \times n$ first-order difference matrix, i.e. $(Fw)_i = w_{i+1} - w_i$,
we obtain the variable fusion model [3], which corresponds to the problem
$$\min_w \; \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda \sum_{i=1}^{n-1} \lvert w_{i+1} - w_i \rvert.$$
This model is used when we expect the weights to vary smoothly with their index.
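A convenient way to build this matrix with NumPy (equivalent to the F_fusion construction used in the demo below; the helper name is mine):

import numpy as np

def first_difference_matrix(n):
    # (n-1) x n matrix with rows (..., -1, 1, ...) so that (F w)_i = w_{i+1} - w_i
    return np.eye(n, k=1)[:-1] - np.eye(n)[:-1]

print(first_difference_matrix(4))
# [[-1.  1.  0.  0.]
#  [ 0. -1.  1.  0.]
#  [ 0.  0. -1.  1.]]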
Fused Lasso
When $F$ stacks a scaled identity matrix on top of the first-order difference matrix,
$$F = \begin{pmatrix} \tfrac{\lambda_2}{\lambda} I_n \\ F_{\text{fusion}} \end{pmatrix},$$
we obtain the fused Lasso model [4], which combines both a sparsity and a smoothness penalty:
$$\min_w \; \tfrac{1}{2}\lVert y - Dw \rVert_2^2 + \lambda_2 \lVert w \rVert_1 + \lambda \sum_{i=1}^{n-1} \lvert w_{i+1} - w_i \rvert.$$
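Since the solver takes a single $\lambda$, the second multiplier $\lambda_2$ can be folded into $F$ by scaling the identity block, which is exactly what the demo below does. A sketch (the helper name is mine):

import numpy as np

def fused_lasso_F(n, lam, lam2):
    # stack (lam2 / lam) * I on top of the first-difference matrix so that
    # lam * ||F w||_1 = lam2 * ||w||_1 + lam * sum_i |w_{i+1} - w_i|
    F_fusion = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]
    return np.concatenate([(lam2 / lam) * np.eye(n), F_fusion])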
Demo
I illustrate the impact of the penalties discussed above on the inference results. Noisy observations are generated from a random dictionary using smooth and sparse weights. For a given set of hyperparameters, we obtain the ADMM solutions plotted by the code below.
The standard Lasso problem encourages shrinkage towards zero of each weight independently. While it accurately captures certain regions with active support, adjacent weights can have large variance. The variable fusion model, on the other hand, prioritizes smooth patterns and effectively identifies distinct areas of smoothness. However, since its penalty does not promote zero shrinkage, it incorrectly concludes that the entire support is active. The fused Lasso model excels at identifying both the smoothness and the active support set. Importantly, note that the solutions of all models are biased towards lower magnitudes than the actual weights. This is because the $\ell_1$ penalty shrinks every coefficient towards zero, a well-known bias of Lasso-type estimators.
Code
This experiment can be replicated with the code below and is also available here.
import numpy as np
import numpy.random as npr

def soft_threshold(x, thresh):
    # element-wise soft-thresholding: sign(x) * max(|x| - thresh, 0)
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ADMM_generalized_lasso(y, D, F, rho, lam, n_iters=100):
    # y: (m,) array. observation
    # D: (m, n) array. dictionary
    # F: (k, n) array. constraint matrix
    # rho: augmented Lagrangian parameter
    # lam: regularization strength of the L1 penalty
    n = D.shape[1]
    # random initialization
    w = npr.randn(n)
    u = npr.randn(len(F))
    z = npr.randn(len(F))
    FtF = F.T @ F
    for i in range(n_iters):
        # w-update: regularized least-squares solve
        w = np.linalg.lstsq(D.T @ D + rho * FtF, D.T @ y + rho * F.T @ (z - u), rcond=None)[0]
        # z-update: soft-threshold Fw + u
        z = soft_threshold(F @ w + u, lam / rho)
        # u-update: scaled dual ascent
        u = u + F @ w - z
    return w
from sklearn.datasets import make_sparse_coded_signal
import matplotlib.pyplot as plt

n, m = 50, 100
n_nonzero_coefs = 10

# generate dictionary
_, D, _ = make_sparse_coded_signal(
    n_samples=1,
    n_components=n,
    n_features=m,
    n_nonzero_coefs=n_nonzero_coefs,
    random_state=1,
)
D = D.T  # shape (m, n)

# generate structured coefficients: a few piecewise-constant blocks
np.random.seed(1)
w_true = np.zeros(n)
for i in range(5):
    ix = np.random.choice(n)
    length = np.random.randint(5, 10)
    w_true[ix:ix + length] = npr.randn()

# generate noisy observations
y_true = D @ w_true
y = y_true + npr.randn(m) * 0.2

# define hyperparameters
rho = 0.3   # augmented Lagrangian parameter
lam = 0.5   # general multiplier for the L1 penalty
lam2 = 0.3  # multiplier for the sparsity term in the fused Lasso

# construct F matrices
F_Lasso = np.eye(n)  # standard Lasso
F_fusion = (np.diag(np.ones(n), k=1)[:-1, :-1] + np.diag(np.ones(n) * -1, k=0))[:-1]  # first-order differences
F_fusedLasso = np.concatenate([np.diag(np.ones(n) / lam * lam2), F_fusion])  # fused Lasso: scaled identity stacked on differences

# compute ADMM solutions
w_lasso = ADMM_generalized_lasso(y, D, F_Lasso, rho, lam)
w_fusion = ADMM_generalized_lasso(y, D, F_fusion, rho, lam)
w_fusedLasso = ADMM_generalized_lasso(y, D, F_fusedLasso, rho, lam)

# plot
plt.figure(figsize=(6, 3))
plt.plot(w_lasso, label="Lasso")
plt.plot(w_fusion, label="Fusion")
plt.plot(w_fusedLasso, label="Fused Lasso")
plt.plot(w_true, 'k--', label="true")
plt.ylabel("Weight")
plt.xlabel("index")
plt.legend()
plt.show()
[1] Ryan J. Tibshirani and Jonathan Taylor. "The solution path of the generalized lasso". The Annals of Statistics, 2011.
[2] Robert Tibshirani. "Regression shrinkage and selection via the lasso". Journal of the Royal Statistical Society Series B: Statistical Methodology, 1996.
[3] S. Land and J. Friedman. "Variable fusion: a new method of adaptive signal regression". Technical Report, Department of Statistics, Stanford University, 1996.
[4] Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. "Sparsity and smoothness via the fused lasso". Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005.