Adding More Data Isn’t the Only Way to Improve AI
Link: https://hbr.org/2022/07/adding-more-data-isnt-the-only-way-to-improve-ai
Sometimes an AI-based system can't decipher the physical world with sufficient accuracy, and simply adding more data isn't an option. In many such cases, however, the deficiency can be addressed by using four techniques to help AI better understand the physical world: synergize AI with scientific laws, augment data with expert human insights, employ devices to explain how AI makes decisions, and use other models to predict behavior.
Artificial intelligence (AI) gains "intelligence" by analysing and detecting patterns in a given dataset. It has no concept of the world outside of this dataset, which poses a number of risks.
Its reliance on datasets introduces a serious security vulnerability: malicious agents can spoof an AI algorithm by making minor, nearly undetectable changes to the data. A single changed pixel could fool the system into thinking a horse is a frog or, even scarier, lead to an incorrect medical diagnosis or machine operation. Finally, an AI system does not know what it does not know, and it can confidently make incorrect predictions.
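The single-pixel fragility described above can be illustrated with a toy example. This is a minimal sketch, not a real image classifier: the weights, input, and labels are all hypothetical, chosen so the input sits near the decision boundary.

```python
import numpy as np

# Illustrative linear classifier: predicts "horse" if w.x + b > 0, else "frog".
w = np.array([0.2, -0.5, 1.0, 0.3])   # assumed learned weights (hypothetical)
b = -0.1

x = np.array([0.4, 0.1, 0.05, 0.2])   # an input near the decision boundary

def predict(v):
    return "horse" if w @ v + b > 0 else "frog"

print(predict(x))        # "horse"

# Perturb a single "pixel" just enough to cross the decision boundary.
x_adv = x.copy()
x_adv[2] -= 0.12         # small, nearly undetectable change
print(predict(x_adv))    # "frog"
```

The perturbation is tiny relative to the input, yet the label flips, which is the essence of an adversarial attack: real attacks compute such perturbations systematically from the model's gradients.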
Adding more data will not always solve these issues because practical business and technical constraints will always exist.
1. Combine AI and Scientific Laws
We can combine available data with relevant laws of physics, chemistry, and biology to capitalise on the strengths of each while overcoming their weaknesses. One example is a current project with Komatsu in which we are investigating how to use artificial intelligence to guide the autonomous, efficient operation of heavy excavation equipment. AI is good at running machines but not so good at understanding their surroundings.
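One common way to combine data with scientific laws is to add a physics-based penalty to the data-fitting loss, so that scarce, noisy data cannot pull the model away from known behaviour. The sketch below is illustrative only: the free-fall setup, noise level, and penalty weight are all assumptions, not details from the Komatsu project.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.5, 2.0, 8)                  # measurement times (s)
d_true = 0.5 * 9.81 * t**2                    # physics law: d = 1/2 g t^2
d_obs = d_true + rng.normal(0, 2.0, t.size)   # scarce, very noisy data

def loss(g, lam):
    data_term = np.mean((0.5 * g * t**2 - d_obs) ** 2)  # fit the observations
    physics_term = (g - 9.81) ** 2                       # respect known physics
    return data_term + lam * physics_term

# Simple grid search over candidate values of g.
gs = np.linspace(5, 15, 1001)
g_data_only = gs[np.argmin([loss(g, lam=0.0) for g in gs])]
g_hybrid    = gs[np.argmin([loss(g, lam=50.0) for g in gs])]
print(g_data_only, g_hybrid)   # hybrid estimate is pulled toward 9.81
```

The data-only estimate drifts with the noise; the hybrid loss keeps the estimate anchored to the physical law while still letting the data speak.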
2. Add Expert Human Insights to Data
When data is scarce, human intuition can be used to supplement and improve AI's "intelligence." In the field of advanced manufacturing, for example, developing novel "process recipes" required to build a new product is extremely expensive and difficult. Data on novel processes is scarce or non-existent, and generating it would necessitate numerous trial-and-error attempts that could take months and cost millions of dollars.
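One standard way to fold expert intuition into a model when trials are expensive is a Bayesian prior: the expert's belief acts as a starting estimate that a handful of observations then refine. The process parameter, prior, and measurements below are hypothetical, chosen only to show the mechanics of a conjugate normal-normal update.

```python
import numpy as np

# Expert intuition (assumed): the optimal furnace temperature is
# around 600 C, give or take about 20.
prior_mean, prior_var = 600.0, 20.0**2

# Only three expensive trial runs are available (noisy measurements).
obs = np.array([615.0, 622.0, 609.0])
obs_var = 15.0**2  # assumed measurement noise variance

# Conjugate normal-normal update: the posterior blends prior and data,
# each weighted by its precision (1 / variance).
n = obs.size
post_var = 1.0 / (1.0 / prior_var + n / obs_var)
post_mean = post_var * (prior_mean / prior_var + obs.sum() / obs_var)
print(round(post_mean, 1))  # lands between the expert's 600 and the data mean
```

With only three runs, the estimate stays close to the expert's belief; as more trials accumulate, the data term dominates and the prior fades, which is exactly the behaviour you want when generating data costs months and millions.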
3. Use Devices to Demonstrate How AI Makes Decisions
The smartest computer in the science-fiction novel The Hitchhiker's Guide to the Galaxy famously answered "42" to the question of "life, the universe, and everything," to the amusement of many readers. For businesses, however, this is no laughing matter, because AI is frequently used as a black box, making confident recommendations without explaining why. If the process by which AI makes a decision cannot be explained, the decision is usually not actionable. A doctor should not make a medical diagnosis, nor should a utility engineer shut off a critical piece of infrastructure, on the basis of an AI recommendation they cannot intuitively explain.
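A simple example of moving beyond the black box is breaking a model's score down into per-feature contributions, so the engineer sees *why* the alert fired. The sketch below uses a linear model, where contributions are just weight times input; the feature names, weights, and readings are all illustrative.

```python
import numpy as np

features = ["temperature", "vibration", "pressure", "runtime_hours"]
w = np.array([0.8, 2.5, -0.4, 0.1])   # assumed learned weights (hypothetical)
x = np.array([1.2, 3.0, 0.5, 2.0])    # standardised sensor readings

contributions = w * x                 # each feature's share of the score
score = contributions.sum()

print(f"alert score: {score:.2f}")
for name, c in sorted(zip(features, contributions),
                      key=lambda pair: -abs(pair[1])):
    print(f"  {name:>14}: {c:+.2f}")
```

Here the breakdown shows that vibration drives the alert, giving the engineer something to verify physically before acting. For nonlinear models, techniques such as permutation importance or SHAP values play the same explanatory role.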
4. Use Other Models to Predict Behavior
Data-driven AI works well within the confines of the dataset it has processed, interpolating between actual observations. However, to extrapolate, that is, to predict behavior in operating modes outside the available data, we have to incorporate knowledge of the domain in question. Indeed, many applications that use "digital twins" to simulate the operation of a complex system, such as a jet engine, take this approach. A digital twin is a dynamic model that mirrors the current state of an actual system, using sensor readings to keep the model up to date in real time.
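The interpolation-versus-extrapolation gap is easy to demonstrate. In this sketch, a polynomial fitted to observations tracks the system well inside its training range but diverges badly outside it, while the domain law remains valid everywhere; the square-root "system law" is illustrative, not a real engine model.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_law(load):
    # Assumed domain knowledge (illustrative): output grows as sqrt(load).
    return 10.0 * np.sqrt(load)

load_train = np.linspace(1, 16, 20)    # observed operating range
out_train = true_law(load_train) + rng.normal(0, 0.3, 20)

# Data-driven surrogate: a cubic polynomial fitted to the observations.
poly = np.poly1d(np.polyfit(load_train, out_train, 3))

err_in = abs(poly(9.0) - true_law(9.0))        # inside the training range
err_out = abs(poly(100.0) - true_law(100.0))   # far outside it
print(err_in, err_out)   # interpolation error is small; extrapolation error is not
```

A digital twin avoids this failure mode by embedding the domain model itself and using live sensor data only to keep its state current, so predictions in unseen operating modes still respect the underlying physics.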
We humans comprehend the world around us by combining our senses. When presented with a steaming cup, we immediately recognise it as tea due to its colour, smell, and taste. Connoisseurs may go so far as to call it a "Darjeeling first-flush" tea. AI algorithms are trained — and limited — by a specific dataset, and they do not have access to all of the "senses" that we do. An AI algorithm trained solely on images of coffee cups may "see" this steaming cup of tea and conclude it is coffee. Worse, it may do so with complete certainty!
Any available dataset will always be incomplete, and processing increasingly large datasets is frequently neither practical nor environmentally sustainable. Instead, adding other types of domain understanding can help make data-driven AI safer and more efficient, allowing it to address challenges that it could not otherwise.