WiggleGPT
A 124M parameter transformer that challenges a 56-year-old assumption in neural network design.
What Makes It Different?
Since Minsky and Papert's Perceptrons (1969), neural networks have relied on monotonic activation functions (Sigmoid, ReLU, GELU). Because a monotonic neuron can carve out only a single decision boundary, such networks need multiple hidden layers to solve non-linearly separable problems like XOR.
WiggleGPT replaces monotonic activations with learnable oscillating functions, enabling single neurons to create multiple decision boundaries:
f(x) = sin(ωx + φ) · tanh(x) + baseline
Where ω (frequency) and φ (phase) are learnable per-neuron parameters.
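As a concrete illustration, here is a minimal PyTorch sketch of such an activation. The class and parameter names (`OscillatingActivation`, `omega`, `phi`, `baseline`) and the initialization scheme are illustrative assumptions, not the repository's actual code; see `model_bio` in the repo for the real definition.

```python
import torch
import torch.nn as nn

class OscillatingActivation(nn.Module):
    """Illustrative per-neuron activation: sin(omega * x + phi) * tanh(x) + baseline."""

    def __init__(self, num_features: int):
        super().__init__()
        # Initialized to roughly match the reported starting statistics (mean 1.0, std 0.1).
        self.omega = nn.Parameter(1.0 + 0.1 * torch.randn(num_features))
        self.phi = nn.Parameter(torch.zeros(num_features))
        self.baseline = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., num_features); the per-neuron parameters broadcast over leading dims.
        return torch.sin(self.omega * x + self.phi) * torch.tanh(x) + self.baseline
```

Dropped into an MLP in place of GELU, each hidden unit can then cross zero several times over its input range rather than once, which is what allows a single neuron to form multiple decision boundaries.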
Results
| Model | Parameters | Val Loss | Notes |
|---|---|---|---|
| WiggleGPT | 124M | 3.1621 | Oscillating activation |
| GPT-2 | 124M | ~3.12 | Standard GELU baseline |
Validation loss lands within about 1.3% of the GPT-2 baseline, showing that oscillating activations work as a functional drop-in replacement at this scale.
The Model Actually Learned to Oscillate
| Parameter | Init | After Training | Change |
|---|---|---|---|
| ω mean | 1.0 | 1.096 | +9.6% |
| ω std | 0.1 | 0.602 | 6× increase |
| ω range | [0.8, 1.2] | [-0.19, 5.17] | Massive expansion |
- 95% of neurons retained active oscillation (ω > 0.1)
- Some neurons learned frequencies up to ω = 5.17, roughly five times the initialization mean
- Full phase coverage [-π, +π] after training (see the inspection sketch below)
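Statistics like these can be collected from a checkpoint along the lines of the sketch below. The assumption that the frequency tensors can be found by matching 'omega' in their parameter names is mine; adjust the filter to whatever names `model_bio` actually uses.

```python
import torch

# Inspect learned oscillation frequencies in the pretrained checkpoint.
checkpoint = torch.load('ckpt_pretrain.pt', map_location='cpu')
state_dict = checkpoint['model']

# NOTE: 'omega' is an assumed naming convention for the per-neuron frequency parameters.
omegas = torch.cat([t.flatten() for name, t in state_dict.items() if 'omega' in name])

print(f"omega mean : {omegas.mean().item():.3f}")
print(f"omega std  : {omegas.std().item():.3f}")
print(f"omega range: [{omegas.min().item():.2f}, {omegas.max().item():.2f}]")
print(f"active oscillation (omega > 0.1): {(omegas > 0.1).float().mean().item():.1%}")
```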
Checkpoints
| File | Description |
|---|---|
| ckpt_pretrain.pt | Base model trained on OpenWebText (~600k iterations) |
| ckpt_finetune.pt | Fine-tuned on SmolTalk2 (instruction following) |
Architecture
| Component | Specification |
|---|---|
| Parameters | 123,697,920 |
| Layers | 12 |
| Attention Heads | 12 |
| Embedding Dimension | 768 |
| Oscillating Neurons | 36,864 (each with learnable ω, φ, baseline) |
| Normalization | RMSNorm |
| Position Encoding | RoPE (Rotary) |
| Attention | Flash Attention (when available) |
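The oscillating-neuron count is consistent with the activation sitting in each block's MLP hidden layer (an inference from the numbers, not something stated in the table): 12 layers × 4 × 768 = 36,864 hidden units, each carrying three extra scalars. A quick arithmetic check:

```python
# Sanity check of the counts implied by the architecture table.
n_layer = 12              # transformer blocks
n_embd = 768              # embedding dimension
mlp_hidden = 4 * n_embd   # conventional 4x MLP expansion -> 3072 units per block (assumed)

oscillating_neurons = n_layer * mlp_hidden
extra_params = 3 * oscillating_neurons   # learnable omega, phi, baseline per neuron

print(oscillating_neurons)  # 36864, matching the table
print(extra_params)         # 110592 additional parameters contributed by the activation
```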
Usage
See the GitHub repository for full training, inference, and chat scripts.
```python
# Quick inference example
import torch

from model_bio import GPT, GPTConfig

# Load checkpoint and rebuild the model from its stored config
checkpoint = torch.load('ckpt_pretrain.pt', map_location='cuda')
config = GPTConfig(**checkpoint['config'])
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.to('cuda')   # move the parameters onto the GPU before generating
model.eval()

# Generate text (see sample_bio.py for full implementation)
```
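Continuing the snippet above, generation might look roughly like this. The `tiktoken` GPT-2 encoding and the `model.generate(idx, max_new_tokens, temperature, top_k)` signature are assumed from the usual nanoGPT-style layout; `sample_bio.py` is the authoritative reference.

```python
import tiktoken

enc = tiktoken.get_encoding('gpt2')  # GPT-2 BPE, assumed to match the training tokenizer
prompt = "The strangest thing about neural networks is"
idx = torch.tensor([enc.encode(prompt)], dtype=torch.long, device='cuda')

with torch.no_grad():
    # Assumed nanoGPT-style signature; see sample_bio.py for the actual interface.
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=200)

print(enc.decode(out[0].tolist()))
```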
Training Details
Pretraining:
- Dataset: OpenWebText (~9B tokens)
- Iterations: 600,000
- Hardware: RTX 3070 (steps 0–354k) → RTX 5060 Ti 16GB (steps 354k–600k)
- Time: Roughly 20 days total (~15 days on 3070, ~5 days on 5060 Ti)
Fine-tuning:
- Dataset: SmolTalk2 (406K examples)
- Oscillation parameters (ω, φ) remained stable: 0.0% of neurons shifted by more than 0.1 (see the comparison sketch below)
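That stability figure can be checked by comparing the two checkpoints directly, with the same caveat as earlier that the 'omega' name filter is an assumption about the parameter naming:

```python
import torch

def load_omegas(path: str) -> torch.Tensor:
    # Collect every tensor whose name suggests per-neuron frequencies (assumed naming).
    state_dict = torch.load(path, map_location='cpu')['model']
    return torch.cat([t.flatten() for name, t in state_dict.items() if 'omega' in name])

omega_pre = load_omegas('ckpt_pretrain.pt')
omega_ft = load_omegas('ckpt_finetune.pt')

shifted = (omega_ft - omega_pre).abs() > 0.1
print(f"neurons with omega shift > 0.1 after fine-tuning: {shifted.float().mean().item():.1%}")
```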
Citation
```bibtex
@software{wigglegpt2025,
  author = {O'Brien, Phillip C.},
  title  = {WiggleGPT: Revisiting the Monotonicity Assumption in Neural Networks via Oscillating Activation Functions},
  year   = {2025},
  url    = {https://github.com/Eden-Eldith/WiggleGPT}
}
```
Author
Eden (Phillip C. O'Brien)
Independent AI Researcher | ORCID: 0009-0007-3961-1182
Built in a garage lab in Gosport, UK. No academic affiliation, no institutional funding — just curiosity and an RTX 3070.
License
GPL-3.0 — if you build on this, keep it open source.
