Social media firms claim they're just trying to build communities and connect the world, and that they need ad revenues to remain free. For them, more views mean more money, so they've optimized their algorithms to maximize engagement. Views are the algorithms' "reward function": the more views an algorithm can attract to the platform, the better. When an algorithm promotes a given post and sees an upsurge of views, it doubles down on the strategy, selectively timing, targeting, and pushing posts in ways it has found will stimulate further sharing, a process called reinforcement learning. It doesn't take an AI expert to see where this leads: provocative posts that evoke strong emotions get more views, so the algorithm favors them, leading to ever-increasing revenues for the platform.

But social platforms aren't the only ones using reinforcement learning AI. As companies adopt it, leaders should look to social media companies' problems to understand how it can lead to unintended consequences, and try to avoid making predictable mistakes.

To understand the cause-and-effect cycle we see on social platforms, it's helpful to know a bit more about how the algorithm works. This type of algorithm is called a reinforcement learning (RL) agent, and while these agents' activities are perhaps most visible in social media, they are becoming increasingly common throughout business. Unlike algorithms that follow a rigid if/then set of instructions, RL agents are programmed to seek a specified reward by taking defined actions during a given "state." In this case, the reward is views: the more, the better. The agent's permitted actions might include whom to target and the frequency of promotions. The algorithm's state might be the time of day.

Combined, the agent's reward, the states within which it operates, and its set of permitted actions are called its "policies." Policies broadly define how an RL agent can behave in different circumstances, providing guardrails of sorts. The agent is free to experiment within the bounds of its policies to see which combinations of actions and states (state-action pairs) are most effective at maximizing the reward. As it learns what works best, it pursues that optimal strategy and abandons approaches it found less effective. Through this iterative trial-and-error process, the agent gets better and better at maximizing its reward.

If this process sounds familiar, it's because it's modeled on how our own brains work: behavioral patterns ranging from habits to addictions are reinforced when the brain rewards actions (such as eating) taken during given states (e.g., when we're hungry) with the release of the neurotransmitter dopamine or other stimuli.

Understanding how RL agents pursue their goals makes it clearer how they can be modified to prevent harm. While it's hard to change the behavior of the humans in a human-AI system, it's a simpler matter to change an RL agent's policies, the actions it can take in pursuing its reward. This has important implications for social media, clearly, but the point is broadly applicable in a growing number of business situations where RL agents interact with people.

Whatever you think of Facebook's and Twitter's leadership, they surely didn't set out to create a strategy to sow discord and polarize people. But they did instruct managers to maximize the platforms' growth and revenues, and the RL agents devised to do just that brilliantly succeeded, with alarming consequences. The weaponization of social media platforms is an extreme example of what can happen when RL agents' policies aren't properly conceived, monitored, or constrained.
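The reward/state/action loop described above can be made concrete with a small sketch. This is a toy illustration only, not any platform's actual system: the states (times of day), the actions (audience and promotion frequency), and the simulated "views" reward are all invented for demonstration.

```python
import random

# Hypothetical states and actions for a toy content-promotion agent.
STATES = ["morning", "evening"]                   # e.g., time of day
ACTIONS = [("teens", "low"), ("teens", "high"),   # (audience, promo frequency)
           ("adults", "low"), ("adults", "high")]

def simulated_views(state, action):
    """Stand-in environment: returns a noisy 'views' reward.
    Assumption (invented): high-frequency evening promotion to teens
    happens to pay best in this toy world."""
    base = 50 if (state == "evening" and action == ("teens", "high")) else 10
    return base + random.randint(-3, 3)

def train(episodes=5000, alpha=0.1, epsilon=0.1):
    """Simple tabular value learning over state-action pairs,
    with epsilon-greedy trial and error."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = random.choice(STATES)
        # Explore occasionally; otherwise exploit the best-known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = simulated_views(state, action)
        # Nudge the estimate for this state-action pair toward the reward.
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q

if __name__ == "__main__":
    random.seed(0)
    q = train()
    best = max(ACTIONS, key=lambda a: q[("evening", a)])
    print("Best evening action:", best)
```

Note how the agent is never told which posts to push; it simply discovers, through repeated trials, which state-action pairs earn the most views, which is exactly why the reward it is given matters so much.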