AbstractPhil 
posted an update about 22 hours ago
I'll attempt to expand geolip-clip to a full-sequence context window to support sequential learning.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576
The memory pod is specifically meant to tune everything against final-state pooling, which is fine if you aren't trying to actually use the sequence itself.

HOWEVER, several elemental biases show up if you attempt to USE the standard 77-token sequence in conjunction with this final pooled state. Even though the standard 77 is predominantly noise past token 10, it still houses a considerable amount of usable information, so this has to be handled carefully. Zero-shot structures are tricky to analyze, especially ones built on attention mechanisms rather than true sequential accumulation. I've noticed I need to watch them for quite a while before the real bugs show up.
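The pooled-vs-sequence distinction above can be sketched in a few lines. This is a minimal shape-level illustration, assuming CLIP-L conventions (77 tokens, 768 dims, pooled state taken at the EOS position); the tensors are random stand-ins, not real model outputs:

```python
import torch

# CLIP-L style text encoders expose both a full per-token sequence state
# and a single pooled vector (the hidden state at the EOS position).
# Tuning only against the pooled state discards whatever utility lives
# in the rest of the sequence.
B, T, D = 2, 77, 768
hidden = torch.randn(B, T, D)               # full sequence state [B, 77, 768]
eos_idx = torch.tensor([10, 12])            # EOS token position per sample (assumed)
pooled = hidden[torch.arange(B), eos_idx]   # pooled state [B, 768]

print(hidden.shape, pooled.shape)
```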

As it stands, the token pool is essentially [B, 7+8, 768]. This is a robust and highly complex representation of accumulated bidirectional attention data, so it's quite powerful.
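The [B, 7+8, 768] pool shape can be sketched as two groups of pooled vectors stacked along the token axis. The 7/8 split below is taken from the post's notation; the grouping semantics are an assumption:

```python
import torch

B, D = 2, 768
group_a = torch.randn(B, 7, D)   # first group of pooled states (assumed grouping)
group_b = torch.randn(B, 8, D)   # second group of pooled states
token_pool = torch.cat([group_a, group_b], dim=1)  # [B, 15, 768]

print(token_pool.shape)
```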

I'll build a few prototypes and tap into some papers. I'll either come up with something or a reason why I didn't. The end result will either produce an anchor-bank set of tokens [B, 15, 768] for pooling, or ideally [B, 15, 77, 768], which should expand the CLIP's sequence to 1,155 tokens if successful. That doesn't necessarily mean the longer sequence will be more useful than [B, 15, 768], but it would be a representationally valid expansion of the context window.

I wouldn't hold out for a single full-sequence option in a single day; that's a lot of moving parts to analyze, not to mention highly impractical to train with. A smaller dose of this information will be necessary for rapid prototyping, so it'll likely be packaged as such.

Well I spoke too soon. It's ready to play with.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576-seq77

A little creativity lets me extend the context window of SD1.5's UNet fairly easily. Beyond the CLIP boundary, the current system can introduce additional details into the structure as-is.
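One common way to push past the CLIP boundary (and a plausible reading of the extension above, though the post doesn't specify the mechanism) is that SD1.5's UNet cross-attends over `encoder_hidden_states` of shape [B, T, 768] without a hard limit on T, so several 77-token chunks can be concatenated along the token axis:

```python
import torch

# Hypothetical sketch: concatenate multiple 77-token CLIP chunks into one
# longer conditioning tensor. The UNet's cross-attention treats the token
# axis as arbitrary-length, so the combined context passes straight through.
B = 1
chunks = [torch.randn(B, 77, 768) for _ in range(3)]  # three prompt chunks
context = torch.cat(chunks, dim=1)                    # [B, 231, 768]

print(context.shape)
# unet(sample, timestep, encoder_hidden_states=context)  # diffusers-style call
```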

[image]

It's highly unstable, but it can do some interesting things.

[image]

More than likely this isn't worth extending; SDXL has a CLIP-ViT-G that can be extended, however.

I'll be adding a sequence reconstructor and training a potential CLIP sequence-reconstruction MSE predictor. Not really certain I can accomplish this in a reasonable amount of time, but maybe.
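A minimal sketch of what such a predictor could look like, assuming the task is to regress the full [77, 768] CLIP-L sequence from a compact pooled representation under plain MSE. The module name, sizes, and architecture here are all assumptions, not the author's implementation:

```python
import torch
import torch.nn as nn

class SeqReconstructor(nn.Module):
    """Hypothetical: predict a full 77x768 sequence from a pooled vector."""
    def __init__(self, dim=768, tokens=77):
        super().__init__()
        self.dim, self.tokens = dim, tokens
        self.net = nn.Sequential(
            nn.Linear(dim, dim * 2), nn.GELU(),
            nn.Linear(dim * 2, tokens * dim),
        )

    def forward(self, pooled):                      # [B, 768]
        return self.net(pooled).view(-1, self.tokens, self.dim)

model = SeqReconstructor()
pooled = torch.randn(4, 768)
target_seq = torch.randn(4, 77, 768)  # stand-in for frozen CLIP-L sequences
loss = nn.functional.mse_loss(model(pooled), target_seq)
loss.backward()
```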
If it works, it could be pretty powerful.

[image]

A sequence cosine representation relative to CLIP-L is forming: distilling the behavior using a distilled memory bank as an adjudicator, with frozen CLIP-L sequence states as input data and a frozen ModernBERT leading the contextual behavior adjustment.
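The cosine part of that objective can be sketched as a per-token cosine distillation loss against a frozen teacher. The random tensors below stand in for the frozen CLIP-L outputs; the loss form is an assumption about what "sequence cosine representation" means here:

```python
import torch
import torch.nn.functional as F

# Student sequence states being distilled toward a frozen teacher's states.
student_seq = torch.randn(2, 77, 768, requires_grad=True)
with torch.no_grad():
    teacher_seq = torch.randn(2, 77, 768)  # frozen CLIP-L sequence (stand-in)

cos = F.cosine_similarity(student_seq, teacher_seq, dim=-1)  # [B, 77] per token
loss = (1.0 - cos).mean()  # drive per-token cosine toward 1
loss.backward()
```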

head explode noise

Welp, the sequence will be ready soon. It'll support modified 77-token spaces rather than just a single pooled vector. The entire space will be slightly warped or modified depending on the input. Extended inputs trained cleanly into the sequence with nothing truncated.

[image]

https://youtu.be/XOnMNv_oQ4A?si=WoT4TEUkotST4uoB&t=60

There's no earthly way of knowing
Which direction we are going
There's no knowing where we're rowing
Or which way the river's flowing

Is it raining, is it snowing
Is a hurricane a-blowing

Not a speck of light is showing
So the danger must be growing
Are the fires of Hell a-glowing
Is the grisly reaper mowing

Yes, the danger must be growing
For the rowers keep on rowing
And they're certainly not showing
Any signs that they are slowing

The bigG trainer is unstable; I'm ironing out the overflows and NaNs.
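A typical guard for this kind of instability, sketched below under the assumption that the fix involves clipping gradients and skipping non-finite steps (the post doesn't say how the overflows are actually being handled):

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in for the bigG-scale encoder
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

def guarded_step(loss):
    """Skip any step whose loss or gradient norm is non-finite."""
    if not torch.isfinite(loss):
        opt.zero_grad(set_to_none=True)
        return False
    loss.backward()
    norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    if not torch.isfinite(norm):
        opt.zero_grad(set_to_none=True)
        return False
    opt.step()
    opt.zero_grad(set_to_none=True)
    return True

x = torch.randn(8, 768)
ok = guarded_step((model(x) - x).pow(2).mean())
```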

It takes nearly an hour per epoch on the bigG trainer, so it's going to be a bit before it's ready. I didn't think this one would be such a problem.
