Don't Believe the H(AI)pe

This part of the year is a time of new beginnings, looking forward, and thinking about Spring. Some people use this time to start new regimens, others begin new projects. Often, people take time to develop new hobbies or plan for their summer holidays. What do I do in the bleak midwinter? I wait for the inevitable Boston Dynamics robot videos so I can dunk on them.

What can I say? Tradition matters.

This year, we’re treated to a new hominid-robot skillset video (but, sadly, without any upbeat pop musical overlay). Once again, you have to hand it to BD: they’ve managed to create a video where you can’t quite explain why, but the vibes are way off. Take a look:

https://youtu.be/-e1_QhJ1EhQ

My favourite part is where the robot throws the bag at the human.

There are a number of reasons why we should read more into this video than meets the eye. First of all, why are they showing this to us? Why make these videos publicly available and show off to the public at large? When Northrop Grumman develops a new supervillain weapon, they don’t post it to Instagram and say “#blessed.” It's not like the average person is ever going to buy one of these robots, so what gives?

Let me answer that question with another question. What kind of response do these videos normally get? Unsurprisingly, there's lots of engagement and debate. Most importantly, public awareness grows. Once again, why does that matter? If, as we said, the average person isn't going to buy a robot, what's the benefit of all of this PR? Is there any organisation who would be incentivised to buy Atlas robots and would want the public to be aware of the type of things the robots can do?

We really can't be shocked that law enforcement and the military are the primary customers for these robots -- after all, they've always been BD's most important clients. Did we mention that the entire Atlas robot program began at DARPA, the same group that both started the Internet (yay!) and researches remote-controlled insects (oh no!).

(Nobody's) Watching the Watchers

Now you might be saying that "surely they're building in appropriate controls" and that I'm just being excessively nervous. Maybe. But let's think about a parallel track in the automated behaviour space and conduct a thought experiment. These robots are, certainly, just a short way away from having an integrated speech and command function. That is, rather than the hard-coded acts you see in BD's carefully choreographed videos, robots would instead respond to spoken commands through specialised prompts.

The best example of what this might look like is ChatGPT -- the now-famous generative text tool that responds to queries with human-like speech and has become an instant meme-able media darling. There are two primary facts that you have to understand about large language models (LLMs) like ChatPT.

1. They can only respond based on the data they contain

2. They are not people, and so their answers come either from their training or because they were hard coded.

Those two facts are what underlay some of the justification for how we use AI/ML systems or robots, or how we pair them together. If you have good governance or controls, the robots/AI can't go off the rails and make bad choices, or contradict itself, or lie.

To test this out, the litigator in me couldn't resist subjecting ChatGPT to a little deposition questioning to see how it would respond to a directed, difficult line of inquiry. Ideally, the answers would have been consistent, or would have included "I don't know" when the answer was "I don't know." The results are...interesting.

So here we see a few problems. First, the supposedly closed-set universe of data that ChatGPT relies on isn't quite so closed. Otherwise, how could it have known that Elon Musk was, today, the CEO of Twitter when in 2021, he wasn't. You can see the response when I pressed this point, which amounts to "uh..yes...I, uh...yes it's not him, it's uh...Jack Dorsey!" Please let me tell you that, in real life, giving that kind of an answer to a lawyer taking a deposition or cross examining you is like waving a red flag at a bull and then insulting the bull's mother.

When I pressed, the contradictions and excuses piled up until it was obvious that someone had not only allowed additional data into the ChatGPT dataset, but that someone had also hard coded the response that I eventually got. At that point, it was clear that there would be some value to investigating the integrity of the training data, hence the Wikipedia question. Once again, suspicions confirmed: there's access to Wikipedia, or at least there had been at one point. Then, when I followed up, the responses were varying degrees of contradiction, denials, and misstatement. Lies, damned lies, and statistical models.

You Made the Chatbot Cry. Are You Happy Now?

What was the point of all this? To demonstrate that, even when there are putative limits and rules that restrict what an AI-based system can do, you can never be sure. And that indeterminacy, that doubt, is what's often missing from discussions about advanced technologies and systems. That a chatbot can sound convincing in the individual instance does not mean it is reliable across cases, nor that it has the ability to understand when it is being untruthful. And that a robot can safely toss a bag of tools to a human once does not imply that the robot will not engage in unanticipated activities based on uncontrolled circumstances. Put those together and the opportunities for mistakes or, let's be honest, tragedy, compound logarithmically.

The solution? There's no solution. There is no answer to this problem that will solve for all cases. If there were, we'd have invented it by now. Instead, the only way to address this problem is a consistent, critical inquiry into what's being done, how, why, and on whose behalf. It's only the critical approach that yields the kind of revelatory answers you saw in my questioning above, and that's precisely the kind of intensity we need when talking about deploying automated systems at scale.

The stakes, for now, are just about laughs and snarky blog posts. But in the real world, when these systems are deployed and at large (and they're going to be, soon), the stakes couldn't be higher.

Don't Believe the H(AI)pe

Join the Waitlist

Join the Conversation