The standard SFT loss path materializes the full `[batch × seq × vocab]` logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new `loss_type="chunked_nll"` path drops ignored-label tokens before the `lm_head` matmul and computes cross-entropy in checkpointed chunks of 256 tokens, unlocking sequence lengths that don't fit at all under the standard path. Enable it with `SFTConfig(loss_type="chunked_nll")`.

The `trl.experimental.openreward` adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an `environment_factory`. One string (a catalog name or a URL) wires the `train_dataset`, `environment_factory`, and `reward_funcs` slots; tools are bound dynamically from JSON Schema, with no per-environment wrapper code:

```python
from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)
trainer = GRPOTrainer(
    ...,
    train_dataset=spec.train_dataset,
    environment_factory=spec.environment_factory,
    reward_funcs=spec.reward_funcs,
)
```
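The chunked-NLL idea described above can be sketched as follows. This is a minimal illustration, not TRL's actual implementation: the function name `chunked_nll`, the flattened `hidden`/`labels` layout, and the explicit `lm_head_weight` argument are all assumptions made for the sake of a self-contained example.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def chunked_nll(hidden, labels, lm_head_weight, chunk_size=256, ignore_index=-100):
    """Cross-entropy without materializing the full [tokens x vocab] logits.

    hidden: [num_tokens, hidden_dim] final hidden states (batch and seq flattened)
    labels: [num_tokens] target ids, with ignore_index marking prompt/padding tokens
    lm_head_weight: [vocab, hidden_dim] output-projection weight
    """
    # Drop ignored-label tokens *before* the lm_head matmul,
    # so no logits are ever computed for them.
    keep = labels != ignore_index
    hidden, labels = hidden[keep], labels[keep]

    def chunk_loss(h, y):
        # Logits for one chunk only: [<=chunk_size, vocab]
        logits = h @ lm_head_weight.T
        return F.cross_entropy(logits, y, reduction="sum")

    total = hidden.new_zeros(())
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        # Checkpointing recomputes the chunk's logits in the backward pass
        # instead of keeping them alive, so peak memory is bounded by one chunk.
        total = total + checkpoint(chunk_loss, h, y, use_reentrant=False)

    # Sum over chunks divided by kept-token count == mean over kept tokens.
    return total / labels.numel()
```

The key trade is recomputation for memory: the backward pass re-runs each chunk's `lm_head` matmul, but the live logits never exceed `chunk_size × vocab` instead of `batch × seq × vocab`.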