It’s something else.
Recently, there’s been a lot of talk about AI, or artificial intelligence. With the rise of online services to generate images or content in response to a simple textual prompt, the news has been full of both breathless reports and samples that run the range of impressive to cringe-worthy.
The problem? What most people are referring to as AI isn’t really AI at all.
Yes, I’m being somewhat pedantic, but I think it’s important to understand what’s really happening so as to set expectations — and perhaps blame — appropriately.
What artificial intelligence isn't.
Most of what’s being labeled as AI, or artificial intelligence, isn’t AI at all, but rather ML, for machine learning. ML is nothing more than software that analyzes huge collections of data for characteristics that can then be used by other software to perform specific tasks. While those tasks can look artificially intelligent, in some ways, they’re nothing more than the result of a lot of data processing.
AI is ML
Most of what we’re seeing discussed as artificial intelligence is really something called Machine Learning, or ML.
Machine learning is nothing more than collecting lots of data, analyzing the heck out of it for patterns, and then using the results of that analysis to perform other tasks.
For example, some so-called “assistive driving” technologies use data collected from thousands and thousands of cars on the road to “learn” what roads are, what objects are, what signs and lights are, etc. Having learned (and continually refining that knowledge), ML uses that information to steer an automobile safely down the road.
Of course, it’s nowhere near that simple. Particularly when it comes to something as critical as driving a car, there are boundaries on what can be “learned” and what requires innate knowledge, along with governors and other safety measures.
Let’s choose a simpler example. Dogs.
How ML works
With apologies to the pedants and ML researchers, this is a gross oversimplification to keep it somewhat understandable.
It all starts with people writing software. They design the software to analyze data fed to it for commonalities and differences and then record those characteristics as data of some sort.
They then feed the software data: lots and lots of data.
For our example, let’s give the software 5,000,000 images. Half of those images contain dogs and half do not. There is information called meta-data (data about the data) with each image that says “This is a dog” or “This is not a dog.” The software creates a large database of characteristics of what it means to be a dog. Or not.
People write more software. This software uses the database to analyze more photos. This time, the goal is not to update the database, but rather to determine whether the photo being analyzed is or is not a photo of a dog. The goal of this software, then, is to respond “Yes, this is a dog” or “No, this is not a dog.”
Since nothing is absolute, most such analysis software includes a measure of certainty. A response might be “I’m 99% certain this is a dog.”
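The two steps above (analyze labeled data, then judge a new photo with a measure of certainty) can be sketched in a few lines of Python. This is only an illustration under heavy assumptions: each "image" is reduced to two made-up numbers, the "database" is just the average of those numbers per label, and the names `build_database` and `classify` are hypothetical. Real systems are vastly more elaborate.

```python
import math

# Hypothetical, hand-made "images": each is reduced to two made-up numbers
# (say, snout length and ear floppiness), plus meta-data saying dog / not dog.
TRAINING_DATA = [
    ((0.9, 0.8), "dog"),
    ((0.8, 0.9), "dog"),
    ((0.7, 0.7), "dog"),
    ((0.1, 0.2), "not dog"),
    ((0.2, 0.1), "not dog"),
    ((0.3, 0.3), "not dog"),
]

def build_database(examples):
    """Step 1: analyze labeled data and record the average features per label."""
    sums = {}
    counts = {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total)
            for label, total in sums.items()}

def classify(database, features):
    """Step 2: compare a new image to the database; return (label, certainty)."""
    # Being closer to a label's average earns a higher score for that label.
    scores = {label: math.exp(-math.dist(features, center) * 5)
              for label, center in database.items()}
    best = max(scores, key=scores.get)
    certainty = scores[best] / sum(scores.values())
    return best, certainty

database = build_database(TRAINING_DATA)
label, certainty = classify(database, (0.85, 0.75))
print(f"I'm {certainty:.0%} certain this is: {label}")
```

The certainty here is simply the winning label's share of the total score, which is one arbitrary way to turn "how close is it?" into a percentage.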
People write still more software. This software now uses the information in the database to create images of dogs.
The ability to accurately analyze images and the ability to create realistic images are highly dependent on the quality of data from the original analysis. That data’s quality depends in turn on how much was collected — the more the better — and how accurate or error-free the initial meta-data was.
And, of course, how well all the software was written.
So far, though, no artificial intelligence was involved; just a lot of data collection, analysis, and processing.
ML and writing
Another area that’s gotten a lot of attention of late is the ability of “AI” to write text. In reality, it’s the same process:
- Software analyzes lots and lots (and lots and lots) of text to learn what the written word means.
- Software uses the result to generate written words based on a prompt.
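To make those two steps concrete, here is a toy sketch in Python. It is not how ChatGPT works; it assumes a much simpler technique (recording which word tends to follow which) and a tiny made-up sample of text, and the names `analyze` and `generate` are hypothetical. It only shows the shape of the process: analyze text, then generate from a prompt.

```python
import random
from collections import defaultdict

# A tiny stand-in for "lots and lots of text" (hypothetical sample).
CORPUS = (
    "the dog chased the ball and the dog caught the ball "
    "and the cat watched the dog"
)

def analyze(text):
    """Record, for every word, which words were seen following it."""
    followers = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        followers[current].append(following)
    return followers

def generate(followers, prompt, length=8, seed=0):
    """Extend the prompt one word at a time using the recorded patterns."""
    rng = random.Random(seed)
    output = prompt.split()
    for _ in range(length):
        options = followers.get(output[-1])
        if not options:  # nothing ever followed this word in the corpus
            break
        output.append(rng.choice(options))
    return " ".join(output)

model = analyze(CORPUS)
print(generate(model, "the cat"))
```

Every word the sketch produces is one it has already seen following the previous word, which is why the output can look plausible without the software understanding anything.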
As an example, I asked the popular ChatGPT to “Describe the difference between artificial intelligence and machine learning.” Here’s what it came up with.
Artificial intelligence (AI) and machine learning (ML) are often used interchangeably, but they are not the same thing. AI refers to the ability of a computer or machine to mimic the cognitive functions of the human mind, such as learning and problem-solving. In other words, AI is the broader concept of machines being able to carry out tasks in a way that we would consider “smart.”
Machine learning, on the other hand, is a specific type of AI that involves the use of algorithms and statistical models to enable a system to improve its performance on a specific task over time. In other words, machine learning involves the use of algorithms to enable a machine to learn from data and improve its performance on a specific task without being explicitly programmed to do so.
In summary, AI is the broader concept of machines being able to carry out tasks in a way that is intelligent, while machine learning is a specific type of AI that involves the use of algorithms to enable a machine to learn from data and improve its performance on a specific task.
It’s not wrong.
On the other hand, when I asked it to describe the process to download and install Windows 11, it started by telling me that Windows 11 isn’t available yet.
It appears that my job is safe. For now.
Poisoning the well: the risks of ML
One of the things I hope you noticed above is that each step of the machine-learning journey started with people writing software.
There’s nothing magical about ML; it’s just very complex software written by humans. Imperfect humans.
Much of the so-called AI you see today, including my examples from DALL-E and ChatGPT above, is considered to be in “beta”. The software is being tested and presumably improved over time. But, as my Windows 11 example showed, it’s not perfect by any means.
Even with perfect software, machine learning suffers from the “garbage in – garbage out” phenomenon. It’s only as good as the data you feed it. If you feed it poor data, you can expect poor results.
Perhaps more concerning is that if you feed it intentionally misleading data, you can poison the entire dataset. Consider telling the analysis software above that several thousand pictures of gorillas are really dogs. Results could be… disturbing.
Yikes.
Now consider something more serious, like poisoning the data used for automobile assistance, and you can see that not only are there pragmatic issues (“That dog just ain’t right!”) but also safety and security issues (“Why is my car suddenly veering to the left?”).
But once again, there’s nothing “intelligent” about this. It all comes down to computer software processing data. Lots and lots of data.
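The gorilla scenario can be sketched with a deliberately tiny Python example. Everything here is an assumption made for illustration: each picture is reduced to a single made-up number, the "analysis" is just an average per label, and `train` and `classify` are hypothetical names.

```python
def train(examples):
    """Average the single made-up feature value seen for each label."""
    totals, counts = {}, {}
    for value, label in examples:
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}

def classify(averages, value):
    """Pick the label whose average is closest to the new value."""
    return min(averages, key=lambda label: abs(averages[label] - value))

# Hypothetical one-number "images": dogs score near 0.2, gorillas near 0.9.
clean = [(0.1, "dog"), (0.2, "dog"), (0.3, "dog"),
         (0.8, "not dog"), (0.9, "not dog"), (1.0, "not dog")]

# Poisoned meta-data: many gorilla pictures deliberately labeled "dog".
poisoned = clean + [(0.9, "dog")] * 20

gorilla = 0.82  # a new gorilla picture the software has not seen
print("clean data:   ", classify(train(clean), gorilla))
print("poisoned data:", classify(train(poisoned), gorilla))
```

With the clean data the new gorilla picture comes back as "not dog"; with the poisoned data the same picture comes back as "dog". The software did exactly what it was written to do in both cases. Only the data changed.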
Do this
Honestly, these are exciting times. I drive one of those cars, and it’s truly impressive what it can do and how it responds to the surrounding environment. Do I trust it completely? Of course not! But it’s a harbinger of great things to come, of that I’m certain.
As with any technology, it’s worth becoming familiar with at least the basic terms and concepts. Understanding that it’s just people and software and not (yet) HAL 9000 should make things a little more concrete and a little less scary.
Is it possible that ChatGPT said that Windows 11 was not available yet because its data only goes up to a few years ago and, therefore, is not current?
Among other things, sure.
I remember when the Captchas used distorted words and the second word was not used to verify if you are human. It was used to get data to improve letter recognition software. I figured, “Why should I do their work for free?” so I would type in wrong letters. So much for the validity of ML.
Now, that was just MEAN, Mark!
Besides, you weren’t doing it “for free;” your “payment” was being permitted to access whatever site or service you were attempting to use.
I just don’t want Google to have too much information. 🙂
That ChatGPT quote is amazing, even dumbfounding!
That said, however, its use of the phrase, “…machines being able to carry out tasks in a way that we would consider ‘smart.’” makes me rather nervous. Did you catch that, by using the word “we,” it identifies itself as human…? Creepy!
When I worked at Texas Instruments, people referred to the computer as “He”. I found that creepy.
People worry about AI (or ML) committing crimes. But part of the data load that you would need for a functioning AI would be a list of what is a crime and what is not. Just like with people, the rules of behavior have to be loaded in the brain. It does not matter if the rules of behavior are loaded by programming or by schooling; they have to be taught.
Actually, an AI or ML that has legal codes loaded or learned would be less likely to commit a crime because it would not be able to forget what is loaded. An ML would have to have programming loaded for everything it does. An AI would have to have teaching; an AI, just like people, would not know things without learning them.
I agree with Leo that most software claimed to be “AI” is not AI. Even legitimate AI is not actually intelligent. About a year ago I wrote a brief paper on AI in order to explain some of its inner workings in ordinary English. I hope I can share this paper here. It’s a bit long, but it can add to the conversation and supplement what Leo has already said:
——————————————————————-
Someone once said AI is neither artificial nor intelligent. The term AI is more of a marketing term which is resurrected every decade or so to promote software sales or hype techniques that seem to show promise. This has been going on since the 1950s when the term AI was first used. We are now in such a period of AI exuberance (2020s). Today, software tools that can more legitimately be called AI are the neural networks (NN). The term NN was coined in the mid 1940s, so people have been trying to do this AI thing for a long time. It sounds like it has something to do with the human brain, but it doesn’t – not at all. The analogy that gave rise to NN’s associations with brain functionality is that its mathematical model can be represented pictorially as a bunch of nodes (bubbles or circles) connected by lines. The nodes represent brain neurons and the lines represent nerves connecting the neurons. In neural networks, the nodes are the locations where data processing occurs and the lines are the information pathways that pass data between the nodes. The visual similarity to brain neurons can be further extended by having millions of computational nodes arranged in layers, exchanging data back and forth. For better or worse, the NN community has adopted many of the biological terms associated with the brain, but NN is not a simulation of the brain.
Before getting to a description of NN, a brief history of software techniques that have called themselves “AI” will help bring some perspective. Before the current NN techniques, AI methods depended on traditional software logic algorithms, that is, extensive “if-then-else” formulations to account for most situations that might be encountered in the solution of a problem. For example, the various expert systems, such as IBM’s chess playing Deep Blue relied on such logic techniques. By using the brute force of hundreds of processors, Deep Blue computed millions of permutations of chess moves for each step and selected the one it thought might have the best outcome for a game move. By contrast, NN does not entirely depend on traditional software logic to do its job.
NN software refers to an assortment of algorithms that have been shown to be good at finding patterns in large data sets, in some cases much better than humans. The reason for the “assortment of algorithms” is that each specific NN tool is customized to a specific problem or application domain. Although NN doesn’t depend on if-then-else logic to do its job, it is not devoid of traditional logic because the latter is needed to moderate, fine tune, correct or steer the results toward what humans expect.
An interesting distinguishing feature of NN, compared to traditional logic, is that mathematicians don’t entirely understand the theoretical reasons for why NN algorithms work as well as they do. On the programming side, NN algorithms are not amenable to traditional code debugging techniques to find a reason a NN doesn’t produce good results. Algorithms may misbehave because of the size or quality of data fed into them, or the basic design of the NN algorithm, or because the real world problem it is supposed to solve is far more nuanced and complex than expected.
AI or NN research is now centered on three areas: understanding and formulating a mathematical theory for why it works, finding if NN algorithms can be generalized to apply to a wide range of applications, and designing custom computing hardware to speed up processing. Almost 60% of AI money comes from the big tech companies and most of it is focused on developing commercial applications with a quick payback. Running large NN systems also requires deep pockets because of high power consumption costs, which rival those of cryptocurrency processing.
To put NN into perspective within the AI arena, NN is just one approach and currently the most popular (because it has the most money influx). Other AI techniques exist and have preceded NN, each with its own strengths and weaknesses. The telephone answering voice, expert systems and natural language recognition all started using other techniques. None of these, including NN, is “intelligent”, although they can be very impressive in what they do. Any exhibited “intelligence” must be specifically designed in. Thus far NN tools need a lot of hand-holding from humans, along with complex pre-processing and post-processing before they can be applied to solving a real world problem. The pre-processing phase is referred to as “machine learning” or “training”. The post-processing is to make use of the “learned” knowledge, along with other classical software logic, to do something useful. So, a NN system is not a real time software package that computes solutions on the fly. If you buy an off-the-shelf NN application to perform a function, you’re buying software that’s already been trained to do a very particular task, with some specific constraints and limitations.
The underlying principle on which the NN learning is based is calculus, and more specifically derivatives and using derivatives to minimize a function. A brief review of calculus may be helpful: a “function” is the problem to be solved or analyzed. A function can be represented pictorially by a line graph or curve. The “derivative of the function” is the up or down slope along the curve at any given point. If the derivative is positive, the curve slopes up. If the derivative is negative, the curve slopes down. And if the derivative is zero, the curve goes through a peak or trough at the point where the derivative is zero. For example, imagine a curve that looks like the letter U. At the very bottom of the U, where the line turns from going downward to going up, the derivative is zero. At that point we say that the function is at its minimum or it’s minimized. Let’s say that the curve represents the difference between two numbers, or the difference between a computed (predicted) number and a number known to be correct or accurate. In that sense, the curve represents the error or divergence between a computed number and correct number. If we find the location on this curve where the derivative is zero and the curve is at a trough, then this error is at its minimum, ideally zero.
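As a small illustration of finding the bottom of a U-shaped curve, the following Python sketch repeatedly steps "downhill" using the derivative. The particular curve, its minimum at 3, the step size, and the function names are all arbitrary choices made for this example.

```python
def error(x):
    """A U-shaped curve whose lowest point is at x = 3 (chosen arbitrarily)."""
    return (x - 3) ** 2

def derivative(x):
    """Slope of the curve at x: negative left of the bottom, positive right."""
    return 2 * (x - 3)

def minimize(start, step_size=0.1, iterations=100):
    """Repeatedly step downhill, against the sign of the slope."""
    x = start
    for _ in range(iterations):
        x -= step_size * derivative(x)
    return x

x_min = minimize(start=10.0)
print(f"bottom of the U is near x = {x_min:.4f}, "
      f"slope there = {derivative(x_min):.4f}")
```

Where the slope is positive the step moves left, and where it is negative the step moves right, so the search settles where the slope is essentially zero. This stepping procedure is usually called gradient descent.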
What does this have to do with NN? When formulating an NN algorithm, the working function used to determine how well the algorithm works is an error function. That is, a function that represents the error (or difference, or divergence) between a computed solution and the known or desired solution. If we have such a function and we minimize that error then presumably the computed solution will be reasonably close and acceptable to within a tolerance. This is a very simplistic view of the principle working in NN. This approach is not new, nor unique, to NN. There is another field of math and engineering known as “optimization” (of which “linear programming” is one branch), with methods that date back to the 1800s. Optimization techniques have many proven algorithms used in just about every industry, for example in optimizing delivery truck routes, airline schedules, traffic light sequences, business profit and loss, etc.
How the learning happens:
In order for NN to learn it has to be given a large database of information as input from which to learn. Let’s say the NN is tasked to do character recognition, in which case it’s given a large database of pixelated alphanumeric characters (… and by the way, NN character recognition is not the same as OCR logic). The NN algorithm iterates through this input database and finds patterns of lines and light and dark pixels. It compares its evaluated characters against known (correct) characters using the error function. This process occurs thousands of times, iteratively, operating on each pixel or set of pixels (for this example of character recognition). For each iteration the computational paths are evaluated for their success. Paths that are more successful (correct) are scored or weighted to emphasize their contribution in subsequent iterations. Similarly, computations that stray from a desired result are weighted down to minimize their subsequent contribution. This iterative process goes on until the algorithm is able to correctly identify an input character. The final determination of all the weights is, in essence, the learning process. Conceptually, if the algorithm determines weights for every computational path that gives an overall minimum error and reliably determines acceptable results, then the NN has learned.
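The weighting process described above can be sketched with a single computational node in Python. This is a simplified stand-in (the classic perceptron rule on one node), not a full multi-layer NN, and the 3x3 characters, learning rate, and function names are all made up for illustration.

```python
# Hypothetical 3x3 "pixelated characters": 1 is a dark pixel, 0 is a light one.
T_SHAPE = [1, 1, 1,
           0, 1, 0,
           0, 1, 0]
L_SHAPE = [1, 0, 0,
           1, 0, 0,
           1, 1, 1]

# Training data with known (correct) answers: 1 means "T", 0 means "L".
TRAINING = [(T_SHAPE, 1), (L_SHAPE, 0)]

def predict(weights, bias, pixels):
    """Weighted sum of the pixels, thresholded into a yes/no answer."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0

def train(examples, rate=0.1, max_passes=100):
    """Nudge weights up or down after each mistake until none remain."""
    weights, bias = [0.0] * 9, 0.0
    for _ in range(max_passes):
        mistakes = 0
        for pixels, target in examples:
            error = target - predict(weights, bias, pixels)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                bias += rate * error
                weights = [w + rate * error * p
                           for w, p in zip(weights, pixels)]
        if mistakes == 0:  # every training character identified correctly
            break
    return weights, bias

weights, bias = train(TRAINING)
smudged_t = [1, 1, 1,
             0, 1, 0,
             0, 0, 0]  # a "T" with one pixel missing
print("T" if predict(weights, bias, smudged_t) else "L")
```

After training, pixels that appear in the "T" but not the "L" end up with positive weights, and pixels unique to the "L" end up with negative ones. The final list of weights is all the node has "learned".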
Thus far, all of the above description has been about how the NN learns, and not about how a trained NN is used to do useful work. Once the set of weights is determined (i.e. training is complete), the NN can be used as an application to process new input data. Unfortunately, this NN methodology is not always deterministic. A given set of weights that have shown a high accuracy with the training data may not work well on real world data when operating on new inputs. This overall AI methodology can require lots of tweaking and customizing to work for a given application on random inputs. How quickly and accurately a NN converges to a correct solution depends on many factors. For example, the number of computational nodes used, the number of layers of nodes, the error function, or the size and quality of the training data, etc.
I started out saying that AI is not intelligent. Neither is NN. All any NN can do is find patterns and compute numbers as the designer programmed it. It may correctly identify the letter “A”, but it doesn’t know if that’s an alphanumeric character or a piece of lint. Or, if NN identified a pencil it would have no idea as to how it’s used, what it’s made of, what size it is, etc. In other words, all those attributes of human knowledge and intelligence that bring contextual usefulness to the game are missing. In order to give a NN more substance or “intelligence” it needs to be combined with other software logic and algorithms. Even then, that additional software processing would only be able to add more information within the limits of its programming. For example, it could not tell you that an alternative to a pencil may be a pen, unless someone specifically programmed that in the software.
Any AI algorithm is subject to the profound software adage of “garbage in, garbage out”. NN systems can be infused with bias that’s in the training data and fail to work on real world data. A couple of examples of such bias are noteworthy: In one case the purpose of a NN was to be able to write prose text that mimics human writing. The resulting passages produced were readable and grammatically correct, although not necessarily meaningful, and they also exhibited a harsh, divisive and mean tone. This was traced to the fact that the training data was compiled from writings on the Internet. In another case a facial recognition system failed to identify faces of African Americans because the training data predominantly consisted of white faces. Another application of NN is in driverless car software. One prominent car manufacturer is known to rely mostly on NN trained decisions and not so much on software logic using sensor inputs (LIDAR, radar, sonar, maps). Other car makers do use NN, but supplement and cross-check the decisions with sensor and map information to bring more “good sense” into the processing. As in any new technology, NN will have problems, but will evolve and improve. We just need to put it into perspective and be cautious in how we apply it or rely on it before it’s fully cooked.
Wow, very informative paper, thank you.
I had never before considered how modern AI and NN systems were our real life experiments with Plato’s Allegory of the Cave.
It will never be true AI until they can tell which photos contain traffic lights. 😉