When doing the nested BIC search, do we need to update both U and V? (The current code holds V constant while optimizing over U, and vice versa.) Formally, in the BIC expression, `v` is a function of `u_hat`, so it seems weird not to change `v`. On the flip side, we're just optimizing a regression (with a constraint). My concern is that, if we're just solving the penalized regression problem and `Xv` is a sub-unit vector, then we can always choose `u` to be exactly `Xv`, and then we get zero variance and stumble into a `log(0)` problem. (Or at least, that's what I got out of a quick skim. Please correct me if I'm wrong @Banana1530.)
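To spell the worry out: as I read it, the criterion the code below computes is, up to an additive constant,

$$
\mathrm{BIC}(\lambda_u) \;=\; \log\!\left(\frac{1}{n}\,\lVert X\hat{v} - \hat{u}\rVert_2^2\right) \;+\; \frac{\log n}{n}\,\lVert \hat{u}\rVert_0 .
$$

If the constraint set allows $\lVert u \rVert_2 \le 1$ and it happens that $\lVert X\hat{v} \rVert_2 \le 1$, then $\hat{u} = X\hat{v}$ is feasible, the residual is exactly zero, and the first term is $\log(0) = -\infty$, so the criterion is minimized by a degenerate fit no matter what the penalty term does.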
For well-initialized V this doesn't make a big difference, so we can usually get away with it: in particular, if V is well initialized (maybe because we're already using a full grid, or the tuning parameters are small enough that the SVD is very close to correct), then `V_init` and the stationary point are close, so `BIC(U, V_init)` and `BIC(U, V_stationary)` will be similar. That seems like too strong an assumption to make in practice, though.
The experiment below illustrates what I mean. If we set `UPDATE_V = 0` but leave `INITIALIZE_SVD = 1`, we get saved by the fact that `v_star` and `V1` are pretty close; but if we set both flags to 0, corresponding to a random initialization of `v_hat` without updating, things go poorly. If we have `INITIALIZE_SVD = 0` but keep `UPDATE_V = 1`, we do better, but not nearly as well as with the SVD initialization.
```matlab
UPDATE_V = 1;        % Set to 0 to fix V at its initialization value
INITIALIZE_SVD = 1;  % Set to 0 to initialize U, V to random unit vectors
n = 50; p = 25; s = 5;  % dimensions and sparsity level
u_star = [ones(s, 1); zeros(n - s, 1)]; v_star = [zeros(p - s, 1); ones(s, 1)]; d = 3;
N = randn(n, p);        % noise
S = d * u_star * v_star';  % rank-1 signal
X = S + N;
[U, ~, V] = svd(X);
st = @(x, lambda) sign(x) .* max(abs(x) - lambda, 0);  % soft-thresholding operator
% Initialize SFPCA
U1 = U(:, 1); V1 = V(:, 1);
% SFPCA search on lambda_U
% Keep v parameters fixed for now...
lambda_u_range = linspace(0, 5, 51);
n_lambda_u = length(lambda_u_range);
sigma_hat_holder = zeros(size(lambda_u_range));
bic_holder = zeros(size(lambda_u_range));
df_holder = zeros(size(lambda_u_range));
for lu_ix = 1:n_lambda_u
    lu = lambda_u_range(lu_ix);
    % Quick and dirty SFPCA - only doing sparsity in U
    if INITIALIZE_SVD
        u_hat = U1;
        v_hat = V1;
    else
        u_hat = randn(n, 1); u_hat = u_hat / norm(u_hat);
        v_hat = randn(p, 1); v_hat = v_hat / norm(v_hat);
    end
    % Alternate until both factors stabilize. Each update has a closed form
    % given the other factor, so one update per pass suffices.
    u_hat_old = u_hat + 5000; v_hat_old = v_hat + 5000;  % offset forces loop entry
    while norm(u_hat - u_hat_old) + norm(v_hat - v_hat_old) > 1e-6
        u_hat_old = u_hat; v_hat_old = v_hat;
        u_hat = st(X * v_hat, lu);
        u_hat = u_hat / norm(u_hat);  % NaN if st() zeros everything (large lu, bad v_hat)
        if UPDATE_V
            v_hat = st(X' * u_hat, 0);  % lambda_v = 0: no sparsity in V
            v_hat = v_hat / norm(v_hat);
        end
    end
    % Residual variance of the constrained regression of X * v_hat on u_hat
    sigma_hat_sq = mean((X * v_hat - u_hat).^2);
    sigma_hat_holder(lu_ix) = sigma_hat_sq;
    df_holder(lu_ix) = sum(u_hat ~= 0);
    % sigma_hat_sq is already RSS / n, so the extra / n inside the log is a
    % constant shift that does not change the argmin over lambda_u
    bic_holder(lu_ix) = log(sigma_hat_sq / n) + log(n) / n * sum(u_hat ~= 0);
end
[min_bic, min_bic_ind] = min(bic_holder);
lambda_u_optimal = lambda_u_range(min_bic_ind);
% Refit (quick and dirty SFPCA, sparsity in U only) at the BIC-optimal lambda_u
if INITIALIZE_SVD
    u_hat = U1;
    v_hat = V1;
else
    u_hat = randn(n, 1); u_hat = u_hat / norm(u_hat);
    v_hat = randn(p, 1); v_hat = v_hat / norm(v_hat);
end
u_hat_old = u_hat + 5000; v_hat_old = v_hat + 5000;
while norm(u_hat - u_hat_old) + norm(v_hat - v_hat_old) > 1e-6
    u_hat_old = u_hat; v_hat_old = v_hat;
    u_hat = st(X * v_hat, lambda_u_optimal);
    u_hat = u_hat / norm(u_hat);
    if UPDATE_V
        v_hat = st(X' * u_hat, 0);  % lambda_v = 0: no sparsity in V
        v_hat = v_hat / norm(v_hat);
    end
end
snr = max(svd(X)) / max(svd(N));  % operator-norm SNR of this draw
u_hat_supp = u_hat ~= 0;
u_star_supp = u_star ~= 0;
u_star_nonsupp = u_star == 0;
tpr = mean(u_hat_supp(u_star_supp));     % true positive rate over supp(u_star)
fpr = mean(u_hat_supp(u_star_nonsupp));  % false positive rate off supp(u_star)
```
Running 1000 replicates, I see: *(results table not preserved)*

My hunch is that the `UPDATE_V = 0, INITIALIZE_SVD = 1` case would suffer more in harder situations than this one: the angle between `V1` (the leading right singular vector of `S + N`) and `v_star` isn't very large for this problem.
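Roughly, the replicate driver would look like this. This is just a sketch: `run_replicate` is a hypothetical wrapper around the script above that regenerates `X` and returns `tpr` and `fpr` for one draw under the given flags.

```matlab
% Sketch of the 1000-replicate driver (run_replicate is hypothetical).
n_reps = 1000;
configs = [1 1; 0 1; 1 0; 0 0];  % rows: [UPDATE_V, INITIALIZE_SVD]
for c = 1:size(configs, 1)
    tprs = zeros(n_reps, 1); fprs = zeros(n_reps, 1);
    for rep = 1:n_reps
        [tprs(rep), fprs(rep)] = run_replicate(configs(c, 1), configs(c, 2));
    end
    fprintf('UPDATE_V=%d INIT_SVD=%d: mean TPR=%.3f, mean FPR=%.3f\n', ...
            configs(c, 1), configs(c, 2), mean(tprs), mean(fprs));
end
```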