- Diffusion Twigs with Loop Guidance for Conditional Graph Generation We introduce a novel score-based diffusion framework named Twigs that incorporates multiple co-evolving flows for enriching conditional generation tasks. Specifically, a central or trunk diffusion process is associated with a primary variable (e.g., graph structure), and additional offshoot or stem processes are dedicated to dependent variables (e.g., graph properties or labels). A new strategy, which we call loop guidance, effectively orchestrates the flow of information between the trunk and the stem processes during sampling. This approach allows us to uncover intricate interactions and dependencies, and unlock new generative capabilities. We provide extensive experiments to demonstrate strong performance gains of the proposed method over contemporary baselines in the context of conditional graph generation, underscoring the potential of Twigs in challenging generative tasks such as inverse molecular design and molecular optimization. 4 authors · Oct 31, 2024
- Growth of cancer stem cell driven tumors: staged invasion, linear determinacy, and the tumor invasion paradox We study growth of solid tumors in a partial differential equation model introduced by Hillen et al. for the interaction between tumor cells (TCs) and cancer stem cells (CSCs). We find that invasion into the cancer-free state may be separated into two regimes, depending on the death rate of tumor cells. In the first, staged invasion regime, invasion into the cancer-free state is led by tumor cells, which are subsequently invaded at a slower speed by cancer stem cells. In the second, TC extinction regime, cancer stem cells directly invade the cancer-free state. Relying on recent results establishing front selection propagation under marginal stability assumptions, we use geometric singular perturbation theory to establish existence and selection properties of front solutions which describe both the primary and secondary invasion processes. With rigorous predictions for the invasion speeds, we are then able to heuristically predict how the total cancer mass as a function of time depends on the TC death rate, finding in some situations a tumor invasion paradox, in which increasing the TC death rate leads to an increase in the total cancer mass. Our methods give a general approach for verifying linear determinacy of spreading speeds of invasion fronts in systems with fast-slow structure. 1 author · Oct 26, 2023
10 Axiomatic Preference Modeling for Longform Question Answering The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these reward models (RMs) often lack direct knowledge of why, or under what principles, the preference annotations were made. In this study, we identify principles that guide RMs to better align with human preferences, and then develop an axiomatic framework to generate a rich variety of preference signals to uphold them. We use these axiomatic signals to train a model for scoring answers to longform questions. Our approach yields a Preference Model with only about 220M parameters that agrees with gold human-annotated preference labels more often than GPT-4. The contributions of this work include: training a standalone preference model that can score human- and LLM-generated answers on the same scale; developing an axiomatic framework for generating training data pairs tailored to certain principles; and showing that a small amount of axiomatic signals can help small models outperform GPT-4 in preference scoring. We release our model on huggingface: https://huggingface.co/corbyrosset/axiomatic_preference_model 5 authors · Dec 2, 2023 1
- SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning) often incur substantial computational overhead due to auxiliary models and overthinking. In this paper, we empirically reveal that incorrect answers partially stem from verbose reasoning processes that lack proper self-correction, where errors accumulate across multiple reasoning steps. To this end, we propose Self-traced Step-wise Preference Optimization (SSPO), a pluggable RL process supervision framework that enables fine-grained optimization of each reasoning step. Specifically, SSPO requires neither auxiliary models nor stepwise manual annotations. Instead, it leverages step-wise preference signals generated by the model itself to guide the optimization process for reasoning compression. Experiments demonstrate that the generated reasoning sequences from SSPO are both accurate and succinct, effectively mitigating overthinking behaviors without compromising model performance across diverse domains and languages. 8 authors · Aug 18, 2025