A team of European researchers explores how to bring artificial neural networks closer to the energy efficiency of biological brains, according to a study published in the Journal of Statistical Mechanics: Theory and Experiment.
We all know that the more lottery tickets we buy, the higher our chances of winning; still, spending more than we could win is not a clever move. AI-powered deep learning follows a similar principle: the larger a neural network is, the better it can learn a new task. But making ever-larger networks is extremely inefficient.
Researchers have tried to imitate the highly efficient way biological brains operate. The aim is to train machines starting with simple examples and gradually progressing to more complex ones, an approach known as curriculum learning. This works well for small networks, but the seemingly sensible strategy appears to make no difference for very large ones.
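To make the idea concrete, here is a minimal sketch, assuming a toy dataset and a hand-picked difficulty score (neither comes from the study): curriculum learning simply means ranking the training examples from easy to hard and presenting them in that order rather than at random.

```python
# Minimal curriculum-learning sketch (illustrative; not the study's exact protocol).
# Examples are sorted by an assumed "difficulty" score and presented easiest-first.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: 1000 points in 20 dimensions with noisy labels.
# Assumption: difficulty = closeness to the decision boundary (noisier points are harder).
X = torch.randn(1000, 20)
w_true = torch.randn(20)
margin = X @ w_true
y = (margin + 0.5 * torch.randn(1000) > 0).float()
difficulty = -margin.abs()            # small |margin| = close to boundary = hard

order = torch.argsort(difficulty)     # easiest examples first (curriculum)
# order = torch.randperm(1000)        # baseline: random order, no curriculum

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for i in order:                       # one pass over the data, one example at a time
    opt.zero_grad()
    loss = loss_fn(model(X[i]), y[i].unsqueeze(0))
    loss.backward()
    opt.step()
```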
The researchers behind the new study decided to investigate why this “failure” occurs. They speculate that very large networks are so “rich” in parameters that, even when the input is organised by increasing difficulty, they learn by leaning on the sheer quantity of their resources rather than on the quality of that ordering. This could be good news: curriculum learning may become effective once the initial size of the network is adjusted, promising more resource-efficient, and therefore less energy-consuming, neural networks.
Neural networks are computational models made up of many “nodes” that perform calculations, loosely resembling the networks of neurons in biological brains. They are capable of learning autonomously from the input they receive: for example, a network can “see” images and learn to recognise their content without direct instruction.
It is well known that the larger a neural network is during the training phase, the more precisely it can perform the required task. This observation is linked to the Lottery Ticket Hypothesis: the idea that a very large network is more likely to contain, among its many parameters, a “winning” subnetwork able to solve the task. Training at that scale, however, demands massive computing resources and energy.
Curiously, our brains can perform tasks that require vast resources and energy from supercomputers, yet do so with only a small fraction of both. The order in which we learn things may be the answer. “If someone has never played the piano and you put them in front of a Chopin piece, they’re unlikely to make much progress learning it,” explained Dr Luca Saglietti, a physicist at Bocconi University in Milan. “Normally, there’s a whole learning path spanning years, starting from playing ‘Twinkle Twinkle Little Star’ and eventually leading to Chopin.”
In contrast, the most common way to train neural networks is to feed input in random order into very large networks. Once the network has learned, its parameters can be pruned, to less than 10% of the initial amount, because most of them are no longer needed. However, a network that starts out with only that 10% of the parameters fails to learn. So, while a trained AI might eventually fit on our phones, training it requires massive servers.
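The pruning step described above can be sketched, very roughly, with generic magnitude pruning: after training, only the largest-magnitude weights are kept. The layer sizes and the 90% pruning fraction below are illustrative assumptions (the 10% kept echoes the figure in the article), and this is not the procedure used in the study.

```python
# Rough magnitude-pruning sketch (illustrative, not the study's method):
# after training a large network, keep only the ~10% largest-magnitude weights.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(100, 1000), nn.ReLU(), nn.Linear(1000, 10))
# ... the model would be trained here ...

# Prune 90% of the weights in every linear layer by magnitude.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)

kept = sum(int(m.weight.count_nonzero()) for m in model if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"weights kept after pruning: {kept}/{total} ({kept / total:.0%})")
```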
Scientists have wondered whether gradual learning, as in the human brain, could save resources. Yet previous research suggests that curriculum learning makes no difference for very large networks: it does not improve their training.
Saglietti and his team were keen to understand why. “What we’ve seen is that an overparameterized neural network doesn’t need this path because, instead of being guided through learning by examples, it’s guided by the fact that it has so many parameters—resources that are already close to what it needs,” explained Dr Saglietti. In other words, even if you offer it optimized learning data, the network prefers to rely on its vast processing resources, finding parts within itself that, with a few tweaks, can already perform the task.
This is actually good news for AI development: it means networks could take advantage of curriculum learning if they start with the right number of parameters. In theory, it should be possible to begin with smaller networks and adopt curriculum learning. “This is one part of the hypothesis explored in our study,” concluded Dr Saglietti. “At least within the experiments we conducted, we observed that if we start with smaller networks, the effect of the curriculum—showing examples in a curated order—begins to show improvement in performance compared to when the input is provided randomly. This improvement is greater than when you keep increasing the parameters to the point where the order of the input no longer matters.”
Minnelli S, et al. (2024) Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks. J. Stat. Mech. (2024) 114001. DOI: 10.1088/1742-5468/ad864b