7 Comments
Per Kraulis

An idea: Our Adaptive Layer may be more fragile than the deeper layers, since it was created most recently. If the brain experiences some external insult (alcohol, drugs, etc.), then that will be the part of our mind that fails first. "In vino veritas", i.e. the "Press Secretary" is no longer responding to the questions, but the President is.

Aditya Kulkarni

This is an interesting idea. It does work this way with AIs: if you turn "off" the adapters, the underlying "original" behaviors are revealed.
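To make that concrete, here is a rough sketch of what "turning off the adapters" means mechanically, assuming a LoRA-style setup where the adapter's low-rank update is simply added on top of frozen base weights. The class and attribute names are illustrative, not any particular library's API:

```python
import numpy as np

class LoRALinear:
    """A frozen base weight plus an optional low-rank (LoRA) adapter update."""

    def __init__(self, base_weight, rank=4, scale=1.0):
        self.W = base_weight                            # frozen "original" behavior
        d_out, d_in = base_weight.shape
        self.A = np.random.randn(rank, d_in) * 0.01     # trainable low-rank factors
        self.B = np.random.randn(d_out, rank) * 0.01
        self.scale = scale
        self.adapter_enabled = True

    def forward(self, x):
        y = self.W @ x                                  # base model's response
        if self.adapter_enabled:
            y = y + self.scale * (self.B @ (self.A @ x))  # adaptive-layer correction
        return y

# Toggling the adapter off removes only the learned correction;
# the underlying base behavior is untouched and "shows through" again.
layer = LoRALinear(np.eye(3))
x = np.array([1.0, 2.0, 3.0])
print(layer.forward(x))          # base + adapter
layer.adapter_enabled = False
print(layer.forward(x))          # base only: the "original" behavior
```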

Neural Foundry

Fascinating parallel between LoRA and cognitive architecture. The catastrophic forgetting angle is especially relevant: I've worked with fine-tuning models, and that exact problem of distributed knowledge makes selective unlearning basically impossible. What's interesting is how this frame might inform alignment research when we're trying to add safety layers without breaking core capabilities.
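As a deliberately oversimplified toy of the forgetting problem: when the same weights have to serve old and new behavior, fine-tuning on the new data just overwrites them. The tasks and numbers below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "tasks": fit different linear maps using the same shared weight.
w_task_a, w_task_b = 2.0, -1.0
x = rng.normal(size=200)

def train(w, targets, steps=500, lr=0.01):
    """Plain gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = np.mean(2 * (w * x - targets) * x)
        w -= lr * grad
    return w

def task_error(w, true_w):
    return np.mean((w * x - true_w * x) ** 2)

w = 0.0
w = train(w, w_task_a * x)                               # learn task A
print("task A error after training on A:", task_error(w, w_task_a))  # ~0

w = train(w, w_task_b * x)                               # fine-tune on B, no replay of A
print("task A error after training on B:", task_error(w, w_task_a))  # large: A is forgotten
```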

Mark Reichert

I like where you are going with this, although I do not believe evolution chose a particular solution or created an adaptive layer. Instead, complex lifeforms (humans maybe being the most complex of all) contain all sorts of motivations (such as a desire for cake and a fear of being seen as a pig in front of others) that quite often conflict with each other. This "adaptive layer" is just a way of resolving conflicting motivations for (hopefully) the best overall result.

Imagine a wolf pup that has an unusually high level of aggression. This may work to the pup's advantage in getting plenty to eat, until a larger wolf comes along and beats him up. A pup will quickly learn to curb his hyper-aggression in certain situations, thus establishing what could be called an "adaptive layer", which is really just learned behavior to keep from getting beaten up.

So my question is, is there anything being developed with AI that is equivalent to "avoid getting beat up"? Seems to me that any time AI does something inappropriate, like stripping the clothes off someone in a photograph, a human has to re-program the AI to stop such actions. Sounds like an inefficient and never-ending process. It would be more efficient if AI could be "beat up", and thus continuously learn that some actions are inappropriate without the need for new programming. This would make AI more like a life-form capable of learning than an inanimate computer.

Aditya Kulkarni

This is a good point. What you are describing is close to the idea of “reinforcement learning” in AI.

When the AI says something you don’t want (let’s say it uses a swear word), you can negatively reward that particular word, and the AI is nudged away from using it. So you can “beat it up” if it does something you don’t want. Most modern AIs use this technique.
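If it helps, here is a toy sketch of that "negative reward" idea: a policy choosing between two replies, with a plain REINFORCE-style update that pushes probability away from the punished one. Real systems (RLHF and friends) use learned reward models and far more machinery; the replies, reward rule, and learning rate here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "policy": the model chooses between two replies, one with a rude word.
replies = ["That's a great idea!", "That's a damn great idea!"]
logits = np.zeros(2)              # starts indifferent between the two

def reward(reply):
    # Negative reward ("beating it up") whenever the unwanted word appears.
    return -1.0 if "damn" in reply else 1.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Bare-bones REINFORCE loop: sample a reply, score it, nudge the logits.
lr = 0.5
for _ in range(200):
    p = softmax(logits)
    i = rng.choice(2, p=p)
    r = reward(replies[i])
    grad = -p
    grad[i] += 1.0                # gradient of log p(i) w.r.t. the logits
    logits += lr * r * grad       # raise prob of rewarded choice, lower the punished one

print(softmax(logits))            # probability of the rude reply is pushed toward zero
```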

The issue is balance. If you beat it up too much, it loses other qualities that you might want (like creativity). There is also some evidence that reinforcement learning can blunt the AI’s general intelligence.

So AI researchers are in this cat and mouse game where they punish the AI for “bad” behaviors but they don’t want too much collateral damage from the punishment.

Mark Reichert

I find it fascinating that AI uses reinforcement learning. I had not heard that before. So I assume there is positive reinforcement as well as negative reinforcement? I imagine current AI development is similar to a toddler who likes to sing all the time. A parent would want to encourage the development of a nice singing voice while not having the toddler be disruptive during certain occasions. An interesting balancing act that is likely applicable to AI.

It seems that AI development is somewhere in the toddler stage. When will it become a teenager? I know, AI is not a lifeform with predictable development stages. But I do wonder when/if AI will become sentient.

My thinking is that AI is different from living humans primarily due to the huge number of motivations that drive human actions. AI has very few motivations. Maybe if a few more were developed...? My worry is that complex motivations that "bring AI to life" may come from something like "support the desires of Elon Musk" or "support the Chinese Communist Party." On the other hand, an AI that achieves sentience with a "benefit all humanity" directive could be a really good thing. I started to write a short story based on that theme. The first 2 pages of that story will be posted on my Substack soon.

James N. Garner

Fascinating. Thanks for sharing this.