Discussion about this post

User's avatar
Per Kraulis's avatar

An idea: Our Adaptive Layer may be more fragile than the deeper layers, since it was created most recently. If the brain experiences some external insult (alcohol, drugs, etc), then that will be the part of our mind that fails first. "In vino veritas", i.e. the "Press Secretary" is no longer responding to the questions, but the President is.

Neural Foundry's avatar

Fascinating paralell between LoRA and cognitive architecture. The catastrophic forgetting angle is especially relevant - I've worked with finetuning models and that exact problem of distributed knowledge makes selective unlearning basicly impossible. What's intresting is how this frame might inform alignment research when we're trying to add safety layers without breaking core capabilities.

5 more comments...

No posts

Ready for more?