SingularityU South Africa Summit 2021 Highlights: Benjamin Rosman - Lessons from training my Robot
Robotics and machine learning researcher Benjamin Rosman presented a keynote talk at the SingularityU South Africa Summit Online 2021. Here are a few key highlights from his presentation.
Lessons from training my robot
Benjamin Rosman
I've been working as a robotics and machine learning researcher for almost a decade. In that time, I have spent a lot of time thinking about how to get robots to make better decisions. This includes, for example: how does your robot move around in the world? How does it learn new skills, learn to make decisions, and solve interesting problems?
Really the important aspect here is that the decisions come in sequences, and this could be at various levels. We could think at a high level, where your robot is trying to decide: do I make coffee or do I make the bed? Which door should I open? How do I interact with a certain person? Or it could be low-level decisions, such as: I should move my elbow joint by 10 degrees. Historically, a lot of this was done via extensive engineering and thinking about the problems, coding precisely and exactly what the robot should be doing.
Obviously this isn't scalable to complex modern robots, which have so many joints, are flexible and dynamic, and are able to enact complex behaviours. It takes extensive time to design and code a behaviour, so we can't easily adapt this to new environments and new settings. Instead we rely on a branch of machine learning called reinforcement learning, and this is the way we get our robots to do all sorts of interesting and exciting things.
In this way, they learn through trial and error. The researchers working with the robots specify goals, and build these algorithms and methods that are based on the idea that the robot should try and figure out how to achieve the goals that we've specified. There is psychology involved. As the robot does different things, it is rewarded for good outcomes, and effectively punished for bad outcomes.
Essentially, we're trying to reinforce the behaviours that are desirable, so that the robots work towards the goals that we've set for them. The time I have spent thinking about how to get robots to make better decisions has led me to think about how we make better decisions in our own lives as humans.
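The reward-and-punishment loop described above can be sketched with tabular Q-learning, one of the standard reinforcement learning algorithms. This is a minimal illustrative example, not the speaker's actual setup: the corridor world, the +1 goal reward, and the small step penalty are all invented for the sketch.

```python
import random

# Illustrative tabular Q-learning sketch (all numbers invented).
# States 0..4 form a corridor; reaching state 4 is the "goal" and
# earns +1 (the reward), every other step costs -0.01 (the punishment).
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                 # move left or move right
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == GOAL else -0.01
    return s2, reward

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Explore occasionally; otherwise exploit the best known action
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # Reinforce: nudge the estimate toward reward + discounted future value
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy action per state after training; states 0-3 learn to head right
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
print(policy[:4])
```

The robot is never told *how* to reach the goal; it only experiences rewards and gradually reinforces the action sequence that earns them, which is the trial-and-error idea from the talk.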
If a robot is trying to figure out how to do something, such as picking up a coffee cup, what it needs to do is try this in a whole lot of different ways. The reason, since it might not know what the best way to do this would be, is that it needs to explore. It needs to try. It needs to experiment, very much in the same way that a child does.
This is called the exploration-exploitation trade-off. Essentially, what it means is that when you're making a decision, you need to balance between two things. One is exploiting any knowledge you already have.
At the same time, we need to think about exploration, which is trying new things, or things that we're uncertain about. As I say, this is a balance: you've got to trade off these two different concepts, always weighing exploration, trying new things, against exploitation, leveraging the knowledge we already have.
The crux of this idea is that we need to keep exploring to learn the best options for any problem we're thinking about as humans. Do you keep ordering the same pizza at your favourite restaurant, or do you try new things and perhaps find something even tastier? You'll never really know unless you keep going with exploration. In robotics, we think about this problem all the time, trading off between these two factors.
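The pizza dilemma maps directly onto an epsilon-greedy bandit, a standard way to balance exploration and exploitation. The menu, the "tastiness" scores, and the noise model below are all made up for illustration:

```python
import random

# Epsilon-greedy sketch of the pizza dilemma (menu and scores invented).
# Each order: with probability eps, explore a random pizza; otherwise
# exploit the pizza with the best average enjoyment observed so far.
MENU = {"margherita": 0.70, "quattro formaggi": 0.85, "diavola": 0.60}
eps = 0.2
totals = {p: 0.0 for p in MENU}
counts = {p: 0 for p in MENU}

def order(pizza):
    # Noisy "enjoyment" centred on the pizza's true tastiness
    return MENU[pizza] + random.uniform(-0.1, 0.1)

random.seed(1)
for _ in range(500):
    if random.random() < eps or not any(counts.values()):
        choice = random.choice(list(MENU))  # explore: try something new
    else:                                   # exploit: best average so far
        choice = max(MENU, key=lambda p: totals[p] / max(counts[p], 1))
    totals[choice] += order(choice)
    counts[choice] += 1

best = max(MENU, key=lambda p: totals[p] / max(counts[p], 1))
print(best)
```

Pure exploitation would lock you into whichever pizza you happened to try first; the occasional exploratory order is what lets the averages reveal the genuinely tastier option.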
Robots also make a whole lot of decisions based on the information that they have available to them. They need the right kinds of information, and the right amount of information, to make useful decisions.
Do you keep using the same route to go to the airport, or do you ever check if there are new routes available? Or something happening on the route that may affect your trip today? Making your decision based on this additional information could actually give you more efficient decision-making with a more efficient solution to the problem.
This begs the question: do we collect as much information as we should about any decision we need to make, particularly paying attention to the variables that are important?
Another example is measuring your daily steps. Everyone's got smart watches and phones that do this all the time. Without that information, can you really quantify exactly how active or sedentary you are? And can you really take interventions to solve that? You might have a vague idea but you can only do this in a very detailed fine-grained way if you've collected the appropriate data. This becomes more and more relevant as technology advances, where you can diagnose various medical conditions or allergies to foods and all sorts of things like this.
You need data. One of the most critical aspects of decision-making in robots is balancing the goals that you're aiming for. Specifically, when a robot makes a decision, it's got to balance what's happening in the short term versus what's happening in the long term. Again, you can equally ask yourself: are you making decisions based on short-term outcomes, or are you looking at the long-term benefits?
For example, do you stay in bed and watch TV? Or do you get up and go to university to get qualified, and ultimately get educated? One way that we deal with this problem is by having society push you. We build these norms that you go to school every day to put yourself in this better position. We're aware of this in some situations, but generally humans as decision-makers are very bad at thinking about the long-term payoffs of the decisions that they make. What can we do about this?
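The bed-versus-university trade-off can be made concrete with a discount factor, the knob reinforcement learning agents use to weigh short-term against long-term rewards. The reward numbers below are invented purely for illustration:

```python
# Discounting sketch (all reward values invented). An agent scores each
# option by summing its rewards, each discounted by gamma^t, so a low
# gamma is myopic and a high gamma is far-sighted.
def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

watch_tv = [5] + [0] * 9        # fun right now, nothing afterwards
study    = [-1] * 4 + [10] * 6  # effort for four steps, big payoff later

for gamma in (0.5, 0.95):
    tv = discounted_return(watch_tv, gamma)
    uni = discounted_return(study, gamma)
    # Myopic agent (gamma=0.5) prefers the couch; far-sighted agent
    # (gamma=0.95) prefers studying, despite the upfront cost.
    print(gamma, round(tv, 2), round(uni, 2))
```

The point of the sketch is that the same two options rank differently depending on how heavily you discount the future, which is exactly the bias the talk says humans get wrong.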
There are a few strategies. We do this by building pros and cons lists for certain kinds of decisions we have to make, but this isn't quite the full story. It's no good saying, well, maybe this thing will happen, but maybe that thing will happen. What you really have to do is think about how likely these different events are to happen. However, we tend not to do that.
When we build our pros and cons lists, we should weigh each item by some notion of its probability, and also the relative benefits or costs of the different options. There's more to these lists than we usually put into them. Perhaps an easier strategy is to build our habits beforehand. This is very much in line with the way we think about going to school or going to the gym. It is useful to think about the value of creating a habit. Some very long-term goal might be difficult to start on immediately, just because of the upfront costs of doing so.
If you can think about it as a habit and then kickstart this habit on a regular basis, this can actually help you get over that hurdle in a similar way to the way we think about going to school and we have societal pressures around making that happen.
Understanding the different problems that appear in robot decision-making helps us understand human decision-making. Additionally, this gives us an interesting insight into where AI can take us. We often think of AI, artificial intelligence, as a race to automate as much as possible or to build cool new technologies. Actually, some of the roots of artificial intelligence lie in an attempt by science to figure out what's happening in the human brain.
This incredibly complicated organ governs everything we do in our lives. We can think of robots as sandboxes to explore some of the fundamentals of our own intelligence in simulated environments, where we can run experiments. By looking at some of the problems robots have, we can analyse them quantitatively and use those insights to help us understand our own minds. This can give us a deeper insight into how our own minds work and ultimately help us to live better lives.