research
What we are learning.
Field notes, experiments, and working theories on how models behave inside real systems.
A matched four-run comparison of `Qwen 3.5 27B` and `Gemma 4 31B` shows that both families recover the same two colour axes under controlled short-response extraction, but they organise that control differently across depth: Qwen is late-stack and prompt-stable, while Gemma keeps a mid-stack band active and sharpens the colour axes during rollout.
A research note on whether inference-time persona steering can push Qwen 3.5 9B into colour-matched working modes that perform better on the tasks they are matched to. In a 40-task cross-colour benchmark, matched personas outperformed both the base model and mismatched personas on every task family, with the clearest steering gains in red and green.