nightmedia committed · Commit ddf88ee · verified · 1 Parent(s): f125711

Update README.md

Files changed (1): README.md (+189, -0)

README.md CHANGED

# YOYO-Fusion: Robust Merging in Residual Subspace

## ***Input***
Given $K \ge 2$ weight tensors from models with identical architecture:
$$
\{T^{(1)}, T^{(2)}, \dots, T^{(K)}\}, \quad T^{(k)} \in \mathbb{R}^{d_1 \times \cdots \times d_n}.
$$

---

## ***Step 1: Flatten and RMS-normalize each tensor***
*Flatten each tensor into a vector and normalize by its RMS:*
$$
x^{(k)} = \operatorname{flatten}(T^{(k)}) \in \mathbb{R}^D, \quad D = \prod_{i=1}^n d_i
$$
$$
r_k = \operatorname{RMS}(x^{(k)}) = \sqrt{ \frac{1}{D} \sum_{i=1}^D (x^{(k)}_i)^2 + \varepsilon }
$$
$$
u^{(k)} = \frac{x^{(k)}}{r_k + \varepsilon}
$$
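
A minimal NumPy sketch of this step (the function name and the `eps` value are illustrative, not taken from YOYO-AI's code):

```python
import numpy as np

def flatten_and_rms_normalize(tensors, eps=1e-8):
    """Step 1: flatten each tensor to a vector and divide by its RMS."""
    xs = [np.asarray(t, dtype=np.float64).reshape(-1) for t in tensors]  # x^(k) in R^D
    rms = np.array([np.sqrt(np.mean(x * x) + eps) for x in xs])          # r_k
    U = np.stack([x / (r + eps) for x, r in zip(xs, rms)])               # u^(k), stacked K x D
    return U, rms
```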

---

## ***Step 2: Determine Center Point***

### ***Case A: Anchor Mode***
*If an anchor model is specified, use its normalized vector as the center:*
$$
\mathbf{m} = \mathbf{u}^{(a)}, \quad a = \text{index of the anchor model}
$$

### ***Case B: No Anchor Mode***

- ***Subcase B1:***

*Compute the geometric median via the Weiszfeld algorithm (a sketch follows below):*

$$
\mathbf{m} = \arg\min_{\mathbf{y}} \sum_{k=1}^K \| \mathbf{u}^{(k)} - \mathbf{y} \|_2
$$

- ***Subcase B2:***

*Use the coordinate-wise median:*

$$
m_j = \operatorname{median}(u^{(1)}_j, u^{(2)}_j, \dots, u^{(K)}_j), \quad \forall j = 1, \dots, D
$$
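
A minimal sketch of the Weiszfeld iteration for Subcase B1 (the iteration count and tolerance are assumptions, not YOYO-AI's settings); Subcase B2 reduces to a one-liner, shown in the trailing comment:

```python
import numpy as np

def weiszfeld_median(U, iters=100, eps=1e-8, tol=1e-10):
    """Subcase B1: geometric median of the rows of U via Weiszfeld iterations."""
    m = U.mean(axis=0)                               # initialize at the arithmetic mean
    for _ in range(iters):
        d = np.linalg.norm(U - m, axis=1) + eps      # distances to the current estimate
        w = 1.0 / d                                  # inverse-distance weights
        m_new = (w[:, None] * U).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:          # stop once the update stalls
            break
        m = m_new
    return m

# Subcase B2: coordinate-wise median
# m = np.median(U, axis=0)
```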

---

## ***Step 3: Compute residual matrix***

*Stack the normalized vectors row-wise into $\mathbf{U} \in \mathbb{R}^{K \times D}$ and subtract the center from every row:*
$$
\mathbf{R} = \mathbf{U} - \mathbf{1}_K \mathbf{m}^\top \in \mathbb{R}^{K \times D}
$$

---

## ***Step 4: Early exit if residuals are negligible***
*If*
$$
\max_k \|R_{k,:}\|_2 < 10^{-7},
$$
*then set*
$$
\mathbf{y}' = \mathbf{m}
$$
*and skip to Step 8. Otherwise, proceed.*
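
Steps 3 and 4 together are only a few lines; a sketch (the `tol` default mirrors the $10^{-7}$ threshold above, the function name is illustrative):

```python
import numpy as np

def residuals_with_early_exit(U, m, tol=1e-7):
    """Steps 3-4: residual matrix plus the negligible-residual check."""
    R = U - m[None, :]                                  # R = U - 1_K m^T
    negligible = np.linalg.norm(R, axis=1).max() < tol  # max_k ||R_k||_2 < 1e-7
    return R, negligible                                # if negligible: y' = m, go to Step 8
```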

---

## ***Step 5: Perform SVD on residuals***
*Compute the thin SVD of $R^\top \in \mathbb{R}^{D \times K}$:*
$$
R^\top = U \Sigma V^\top
$$
*Let $r' = \min(K-1, \operatorname{rank}(R))$, and take the first $r'$ columns of $U$:*
$$
U_{r'} = U[:, :r'] \in \mathbb{R}^{D \times r'}
$$
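
In NumPy the thin SVD comes from `full_matrices=False`; a sketch (estimating the rank with `np.linalg.matrix_rank` is an assumption about how rank is determined):

```python
import numpy as np

def residual_subspace(R):
    """Step 5: thin SVD of R^T and the first r' left singular vectors."""
    U, S, Vt = np.linalg.svd(R.T, full_matrices=False)  # thin SVD: R.T = U @ diag(S) @ Vt
    r_prime = min(R.shape[0] - 1, np.linalg.matrix_rank(R.T))
    return U[:, :r_prime], S, r_prime                   # U_{r'}, singular values, r'
```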

---

## ***Step 6: Compute energy-based scaling factor***
*Total energy:*
$$
E_{\text{total}} = \sum_{i=1}^{\operatorname{rank}(R)} \sigma_i^2
$$
*Retained energy:*
$$
E_{\text{retained}} = \sum_{i=1}^{r'} \sigma_i^2
$$
*Energy ratio:*
$$
p = \frac{E_{\text{retained}}}{E_{\text{total}} + \varepsilon}
$$
*Scaling factor (clamped for stability):*
$$
\lambda = \min\left( \frac{1}{p + \varepsilon},\ 10.0 \right)
$$
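
A sketch of the $\lambda$ computation, taking the singular values `S` and `r_prime` from the previous step (`eps` and the function name are illustrative):

```python
import numpy as np

def energy_scaling(S, r_prime, eps=1e-8):
    """Step 6: energy-retention ratio and the clamped scaling factor lambda."""
    energy = S ** 2                                     # sigma_i^2
    p = energy[:r_prime].sum() / (energy.sum() + eps)   # retained / total energy
    return min(1.0 / (p + eps), 10.0)                   # lambda, clamped at 10 for stability
```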

---

## ***Step 7: Robust weighted averaging in subspace***

### ***Project residuals into subspace***
$$
Z = R U_{r'} \in \mathbb{R}^{K \times r'}
$$

### ***Estimate robust scales***
*Per-coordinate MAD scale:*
$$
s_j = 1.4826 \cdot \operatorname{median}_{k} \left( |Z_{k,j}| \right), \quad j = 1, \dots, r'
$$
*Per-model residual norm:*
$$
\|z_k\| = \|Z_{k,:}\|_2
$$
*Global MAD scale:*
$$
s_{\text{global}} = 1.4826 \cdot \operatorname{median}_{k} \left( \|z_k\| \right)
$$

### ***Compute Tukey bisquare weights*** (`c = 4.685`)

*Coordinate-wise weights:*
$$
w^{\text{coord}}_{k,j} = \left[ \max\left( 0,\ 1 - \left( \frac{|Z_{k,j}|}{c \cdot s_j + \varepsilon} \right)^2 \right) \right]^2
$$
*Global (per-model) weights:*
$$
w^{\text{global}}_k = \left[ \max\left( 0,\ 1 - \left( \frac{\|z_k\|}{c \cdot s_{\text{global}} + \varepsilon} \right)^2 \right) \right]^2
$$
*Combined weights:*
$$
W_{k,j} = w^{\text{coord}}_{k,j} \cdot w^{\text{global}}_k
$$

### ***Compute robust consensus in subspace***
$$
z^*_j = \frac{ \sum_{k=1}^K W_{k,j} Z_{k,j} }{ \sum_{k=1}^K W_{k,j} + \varepsilon }, \quad j = 1, \dots, r'
$$
*Reconstruct robust residual:*
$$
r^* = \lambda \cdot U_{r'} z^* \in \mathbb{R}^D
$$
*Final estimate in normalized space:*
$$
y' = m + r^*
$$
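
All of Step 7 vectorizes cleanly. A sketch under the same assumptions as the earlier snippets (`R`, `U_r`, `m`, and `lam` come from Steps 3-6; `eps` is illustrative):

```python
import numpy as np

C = 4.685  # Tukey bisquare tuning constant from the formulas above

def robust_subspace_consensus(R, U_r, m, lam, eps=1e-8):
    """Step 7: MAD scales, Tukey bisquare weights, and the robust consensus."""
    Z = R @ U_r                                           # project residuals: K x r'
    s = 1.4826 * np.median(np.abs(Z), axis=0)             # per-coordinate MAD scale s_j
    zn = np.linalg.norm(Z, axis=1)                        # per-model residual norms ||z_k||
    s_glob = 1.4826 * np.median(zn)                       # global MAD scale

    w_coord = np.clip(1 - (np.abs(Z) / (C * s + eps)) ** 2, 0, None) ** 2
    w_glob = np.clip(1 - (zn / (C * s_glob + eps)) ** 2, 0, None) ** 2
    W = w_coord * w_glob[:, None]                         # combined weights W_{k,j}

    z_star = (W * Z).sum(axis=0) / (W.sum(axis=0) + eps)  # robust consensus z*
    return m + lam * (U_r @ z_star)                       # y' = m + r*
```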

---

## ***Step 8: Restore average RMS scale***
*Compute mean RMS across inputs:*
$$
\bar{r} = \frac{1}{K} \sum_{k=1}^K r_k
$$
*Scale back:*
$$
y = y' \cdot \bar{r}
$$

---

## ***Step 9: Final L2 norm alignment***
*Compute average L2 norm of original flattened tensors:*
$$
\bar{n} = \frac{1}{K} \sum_{k=1}^K \|x^{(k)}\|_2
$$
*Compute current norm:*
$$
n_y = \|y\|_2
$$
*Final scaling factor:*
$$
\alpha = \frac{\bar{n}}{n_y + \varepsilon}
$$
*Scaled output vector:*
$$
\hat{x} = \alpha \cdot y
$$
*Reshape to original tensor shape:*
$$
\hat{T} = \operatorname{reshape}(\hat{x},\ (d_1, \dots, d_n))
$$
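
Steps 8 and 9 are two rescalings and a reshape, so a single combined sketch suffices (it assumes `rms` from the Step 1 sketch and `x_norms`, the per-model L2 norms of the flattened inputs):

```python
import numpy as np

def restore_scale(y_prime, rms, x_norms, shape, eps=1e-8):
    """Steps 8-9: mean-RMS scale-back, L2 norm alignment, and reshape."""
    y = y_prime * rms.mean()                            # Step 8: y = y' * r_bar
    alpha = x_norms.mean() / (np.linalg.norm(y) + eps)  # Step 9: alpha = n_bar / n_y
    return (alpha * y).reshape(shape)                   # T_hat in the original shape
```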

This is a brilliant architectural insight: YOYO-Fusion isn’t just merging models, it’s doing so with a geometric awareness of their internal representations. By flattening tensors and normalizing them via RMS, the algorithm establishes a common metric space where differences can be meaningfully compared. The choice of geometric median (or coordinate-wise median) as a center point suggests YOYO-AI is trying to avoid the biases of any single model, much like a photographer would balance exposure, focus, and depth of field across multiple lenses.

The real magic happens in Step 6, where the energy retained in the truncated residual subspace determines how strongly the blended differences are scaled back in. It’s an elegant way of deciding which aspects of the models are worth blending, similar to how light passes through a lens and gets refracted only where necessary. The clamp on the scaling factor (λ ≤ 10) keeps the reconstructed residual from being over-amplified; they know not to go too far.