The generative AI paradox

Note: After publishing this post, I made some changes to my thinking. I will leave the original below, but want to mention:

  • tension is a better word than paradox
  • hubristic determinacy is a better term than over-determinacy

These changes can be found in the subsequent post.

Original Post

I have been working on a longer, forthcoming post (article?) about LLMs doing math. I cut a bunch (because I am verbose!), but thought the following two excerpts were worth sharing on their own. They demonstrate a key paradox about generative AI, of which LLMs are one type:

Generative AI feels human and so is attributed human qualities (anthropomorphized), but is not human and so is attributed deterministic qualities (over-determinacy).

Each of these excerpts captures, in a single written breath, the paradox of generative AI.

The first excerpt comes from a course created by Khan Academy for teachers, students, administrators, and parents grappling with the role of AI in the classroom. In a section advising students about the importance of critical thinking skills when using AI for educational purposes, we find this excerpt:

“In addition to the obvious potential for letting an AI do your thinking for you, we know that AIs aren’t always right! They can give you inaccurate information, and they don’t have any judgment! […] They sometimes calculate math incorrectly, or in an inefficient way.” [source: Khan Academy]

The second excerpt comes from a tutorial about the LangChain framework published by the company Pinecone. The tutorial explains how an LLM handles a math question:

“LLMs are generally bad at math, but that doesn’t stop them from trying to do math. The problem is due to the LLM’s overconfidence in its mathematical ability.” [source: Pinecone]

These excerpts demonstrate the two sides of the paradox. We see anthropomorphization in the words calculate and overconfidence. The presumption here is that LLMs respond to questions the way a human would if presented with the same questions. As such, asking an LLM to answer a math problem does not necessarily result in the LLM doing calculations, even though we ourselves may do precisely that when asked the same question. Without getting into the debate over whether machines can think, this side of the paradox equates the question with the approach, and so overlooks that the answer can be obtained by other means.

We see over-determinacy in the use of the words inaccurate, bad, and incorrect. The output, and so the LLM, is evaluated on the wrong criteria. Better words to describe the results would be unexpected or unanticipated. LLMs produce unexpected or unanticipated responses to prompts because they are non-deterministic. Although we might say “2+2=5” is incorrect, doing so doesn’t capture the important nuance that the 5 is not calculated but generated as a prediction when an LLM is in the mix. Characterizing LLMs as wrong is, in my opinion, a category error akin to saying a possibility is incorrect, inaccurate, or bad. Instead, a possibility has a degree of likelihood. The issue, then, is confusing a prediction with a certainty, and that confusion falls on the observer rather than the prediction itself. As such, the objects and processes in question are held to the wrong standards.
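To make the distinction concrete, here is a minimal, purely illustrative sketch (not a real model; the token list and probabilities are made up for the example): a calculator computes 2+2 deterministically, while an LLM-like sampler draws an answer from a probability distribution over tokens, so a “5” is a low-probability prediction rather than a miscalculation.

```python
import random

def calculator(a, b):
    # Deterministic arithmetic: the same inputs always yield the same answer.
    return a + b

def toy_llm(prompt, temperature=1.0):
    # Hypothetical stand-in for an LLM: it does not compute anything.
    # It samples a continuation from a made-up distribution over tokens,
    # so the same prompt can yield different answers on different calls.
    candidates = {"4": 0.90, "5": 0.07, "22": 0.03}  # invented probabilities
    tokens = list(candidates.keys())
    weights = [p ** (1.0 / temperature) for p in candidates.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

print(calculator(2, 2))                      # always 4
print([toy_llm("2+2=") for _ in range(5)])   # mostly "4", occasionally "5" or "22"
```

In this toy framing, “incorrect” is the right word for a broken calculator, but “unexpected” better describes the sampler: the “5” was always a possibility with some likelihood, and the mistake lies in treating the prediction as a certainty.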

This paradox makes talking and writing about LLMs complicated, as these excerpts illustrate. More broadly, I see the paradox underscoring the voracious appetite many companies and people have to apply LLMs everywhere possible and then work to constrain their innate non-deterministic qualities. I view the impulse to apply and constrain LLMs as a contemporary version of Foucault’s idea of “docile bodies.” In the case of generative AI, actual human bodies are transmuted into data and subsumed into a model. The application and fine-tuning of that model constrains outputs to make a predictable worker (or assistant, or task-completer). As such, generative AI makes bodies docile twice over: first in the model creation and then in the model application, and it makes docile both human bodies and anthropomorphic machines.

I’ll be publishing the larger piece from which this was cut soon (this week, maybe next), and I will probably write more about the docile bodies idea another time (maybe when I am not just wedging it into a scrap of paper).

I would love feedback or thoughts. I’m sure I am off base with some of my thinking or explanations. Reach out!

Thomas Lodato @deptofthomas