Language Games: How Chatbots & Language Models Could Impact Healthcare


Key Takeaways

  • The practice of medicine itself is the area of healthcare where LLMs are least likely to have an impact, due to the complexity of working directly with patients and to regulatory hurdles.
  • Repetitive, standardized administrative tasks offer an entry point for LLMs to help reduce costs and save time.
  • Care must be taken when incorporating LLMs into communication and patient-education systems, as the risk of misinformation remains.

A computer’s strength is grounded in its ability to do math quickly. Whenever a problem has regular, mathematically structured rules, you can bet that computers will exceed humans’ ability to solve it. 

Most board, card, and video games have such rules, and because people love to play them, they’ve long served as a popular way to test the latest advances in artificial intelligence and to dramatize its potential consequences.  Consider the impact of moments like WarGames (1983), Deep Blue’s chess victory (1997), and the unexpected arrival of AlphaGo (2015). In turn, these contests drive funding and further breakthroughs in artificial intelligence.

The Evolution of Language Models

Early game-playing AIs, constrained by tiny working memories, were built from manually encoded rules. The next generation ran on faster machines and added brute-force look-ahead, at exponentially increasing computational cost. For a fixed hardware configuration, a computer will generally play worse at games with more possible moves; in other words, tic-tac-toe is trivial, chess is hard, and Go was long believed to be nearly impossible.
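To make “brute-force look-ahead” concrete, here is a minimal minimax search for tic-tac-toe. This is a toy illustration, not any historical engine – yet even this tiny game requires visiting hundreds of thousands of positions, which hints at why the same approach explodes for chess and Go.

```python
# Minimal minimax for tic-tac-toe: exhaustive look-ahead over the full game tree.
# Illustrative only -- real chess engines prune heavily, but the exponential
# growth in positions is the same phenomenon described above.

WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
        (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WINS:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Best achievable score for 'X' with perfect play: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0  # board full: draw
    scores = []
    for i in moves:
        board[i] = player
        scores.append(minimax(board, "O" if player == "X" else "X"))
        board[i] = None  # undo the move
    return max(scores) if player == "X" else min(scores)

# With perfect play from both sides, tic-tac-toe is a draw.
print(minimax([None] * 9, "X"))  # 0
```

The search is exact but exhaustive; adding even one more row and column to the board would push the same algorithm far beyond practical limits.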

What changed was the arrival of deep reinforcement learning – a neural-network technique that lets the computer approach a game without preconceived notions of good strategy. By playing billions upon billions of games against themselves, systems like AlphaZero generate the vast datasets of positions and outcomes they learn from.

It is precisely the availability of this big data that allows reinforcement learning algorithms to find sophisticated patterns buried deep in the gameplay – the structure that “naturally” emerges from the rules after iterating a process of competitive play + strategy updating + more competition ad infinitum – revealing techniques that an individual human might require hundreds of lifetimes to learn through their own trial and error.

When it comes to language, then, the promise is clear. Syntax, grammar, and usage follow regular rules. Exceptions are learned by rote repetition (“i before e except after c, as in receive, deceit, ceiling…”). The perfectly right word (le mot juste) doesn’t usually exist, but statistical patterns emerge, such that the patterns of human speech can be replicated (“the adjectives used to describe ‘sunset’ are most commonly ‘beautiful’ (7%), ‘stunning’ (2%), ‘crimson’ (1.1%), … ‘gooey’ (0.000002%), …”). These statistics can become very complex, because the context of surrounding (and sometimes distant) words affects each word choice.
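The “statistical patterns” idea can be illustrated with nothing more than counting. The sketch below uses an invented four-sentence corpus (not real frequency data) to tally which adjectives precede “sunset” – a miniature version of the conditional statistics a language model learns at scale:

```python
from collections import Counter

# Toy corpus -- invented sentences, purely to illustrate conditional word statistics.
corpus = [
    "the beautiful sunset faded",
    "the stunning sunset glowed",
    "the beautiful sunset lingered",
    "the crimson sunset burned",
]

# Count which word immediately precedes "sunset".
before_sunset = Counter()
for sentence in corpus:
    words = sentence.split()
    for prev, word in zip(words, words[1:]):
        if word == "sunset":
            before_sunset[prev] += 1

total = sum(before_sunset.values())
for adjective, count in before_sunset.most_common():
    print(f"{adjective}: {count / total:.0%}")
# beautiful: 50%
# stunning: 25%
# crimson: 25%
```

A real model conditions on far more than the previous word – the whole surrounding context – but the underlying object is still a probability distribution over the next token.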

Still, given enough data, a sophisticated neural network – one with billions or even trillions of parameters – should in principle be able to untangle the highly interdependent nature of human language. 

We’ve seen explosive growth in users of and uses for large language models (LLMs) in the mere 9 months since OpenAI released the first ChatGPT model. Is human language the next game that AI will come to dominate?

A Review Of Language and Life

The philosopher Ludwig Wittgenstein coined the phrase “language-game” – but he was referring to something quite unlike the games we’ve discussed so far. Wittgenstein’s point was that the words and sentences we utter have consequences outside the conversation, in contrast to the self-contained nature of a typical game.

In other words, language is a move in the larger game we call “human life.” Different “moves” have different meanings and outcomes in different contexts. And the complexity of the human experience – all the different senses, physics, biology, sociology, and so on, that form the “playing field” of life – means that the state space that emerges for language is large indeed.

Even when limited to the written word, language is complex. So far, this essay contains about 900 tokens. If, on average, only 1% of possible tokens would make sense in any given position, that still gives 10^2438 valid essays of this length. The number is beyond human comprehension. And GPT-3’s token capacity per generated sequence is larger yet, at 2,048. The computer scientist John Tromp came up with an estimate of the number of valid chess configurations, fewer than 10^46, and, impressively, an exact calculation of the number of legal Go configurations, 2.1 × 10^170. Remember, these are exponential scales, so the language-game’s “900-token essay” state space isn’t just 14 times larger than Go’s; it’s roughly 10^2268 times larger.
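These exponents are easy to sanity-check. Assuming GPT-3’s vocabulary of about 50,257 tokens with 1% valid at each of 900 positions (the exact exponent shifts slightly with the assumed vocabulary size, which is why this sketch lands a touch below the essay’s figure):

```python
import math

vocab = 50_257          # GPT-3's BPE vocabulary size
valid = vocab * 0.01    # assume ~1% of tokens make sense at each position
positions = 900         # length of the essay so far, in tokens

# log10 of the number of valid 900-token essays
essay_exponent = positions * math.log10(valid)

# log10 of Tromp's exact count of legal Go positions, 2.1e170
go_exponent = math.log10(2.1) + 170

print(round(essay_exponent))                # 2431 -- i.e., ~10^2431 valid essays
print(round(essay_exponent - go_exponent))  # 2261 -- the gap in orders of magnitude
```

Working in log space is the only practical way to compare numbers like these; the raw values overflow any native numeric type.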

“…computers will, for the foreseeable future, be somewhat amateurish in the way they play the language game.”

Steven Steinke

This is not meant to dismiss the amazing progress of language models thus far. It is simply meant to point out that computers will, for the foreseeable future, be somewhat amateurish in the way they play the language game. 

The question for anyone wanting to integrate LLMs into their business is, “In what ways could an enthusiastic ‘amateur’ linguist contribute to healthcare?”

We can divide the practice of healthcare into three broad facets: medicine, including testing, diagnosis, treatment, and follow-up; business, or billing, note-taking, and administration; and patient communication, incorporating both specific patient-doctor correspondence and general health education.

LLMs have very different potential roles in each of these facets.


Medicine

Medicine is the least promising arena for LLMs within healthcare. Patient intake forms already use symptom checklists, and asking patients to write this information in essay format seems unlikely to improve its accuracy or ease of analysis.

The imagination wanders to clues about the early stages of neurodegenerative diseases like Parkinson’s or Alzheimer’s hidden in the text of writing samples, but it currently appears that physical handwriting is a more reliable marker for such conditions than semantic content. 

The potential of LLMs as “conversational encyclopedias” and, hence, “diagnosis savants” is limited precisely by the probabilistic nature of their training. Rare diseases are still rare, and the rarer the answer to a question is, the more likely an LLM is to “hallucinate” one.

This isn’t to say that AI in general won’t have a big role in this facet of healthcare, just that the most useful tools won’t be centered on language itself. Computer vision is already proving transformative in radiology, and as sensors improve, providing richer and more complex data, computer analysis will only become more valuable; smart monitoring on wearable sensors can provide precision responses where only blunt tools were available before; and computer-assisted surgery is becoming progressively integrated into various specialties. None of these technologies, however, is powered by LLMs.

Regulatory hurdles in the actual practice of medicine add a second barrier to the widespread adoption of LLMs – one unlikely to be cleared without a significant upside, which has not yet clearly manifested and may never do so.

Business & Administration

Communications related to business and administration tend to be highly regular and standardized, which narrows the state space considerably. Big speedups are possible in the fields of billing and record-keeping, freeing doctors and their administrative assistants from rote repetition. 

Administrative costs associated with healthcare in the United States are estimated to be on the order of $1T annually. A conservative reduction of 10% in these costs opens up a potential $100B industry for automated language services in the administrative sector of healthcare. To look at one activity within this broad umbrella, doctors spend about 15 minutes per patient, averaging 2.5 hours per 10-hour day, just completing electronic health records. Cutting that 15 minutes to 12 would save the healthcare industry about $10B per year in the US.
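The back-of-envelope arithmetic behind those figures can be made explicit. Every input below is the essay’s own rough estimate, not authoritative data:

```python
# Back-of-envelope arithmetic behind the market estimates above.
# All inputs are the essay's own rough figures, not authoritative data.

admin_spend = 1e12        # ~$1T annual US healthcare administrative cost
conservative_cut = 0.10   # a conservative 10% reduction

market = admin_spend * conservative_cut
print(f"Potential market: ${market / 1e9:.0f}B")  # Potential market: $100B

ehr_minutes = 15          # minutes per patient spent on electronic health records
reduced_minutes = 12      # hypothetical figure after LLM-assisted drafting
fraction_saved = (ehr_minutes - reduced_minutes) / ehr_minutes
print(f"EHR time saved: {fraction_saved:.0%}")    # EHR time saved: 20%
```

Converting that 20% of documentation time into the essay’s ~$10B annual figure additionally requires assumptions about physician headcount and hourly cost, which the essay leaves implicit.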

Of course, we might anticipate an arms race, as doctors pursue the most thorough LLMs to extract billing codes from their notes, while insurance companies deploy competing LLMs to limit their liability as much as possible. From the patient’s perspective, standardization and indexing of notes should improve doctors’ quick access to information – a likely net benefit.

Epic, the electronic medical record vendor, recently made a big commitment to integrating language models into its services – a move most major players in this space are likely to follow in the coming months.


Patient Communication

We might politely note that “bedside manner” is an unevenly distributed resource in medical practice. When it comes to keeping up correspondence with hundreds of patients with evolving needs under circumstances that are often literally life or death, it’s possible to be sympathetic to the difficulty of maintaining a warm and patient demeanor.

Language models are quite good at imitating tone, so perhaps a personalized touch could be added from a quite impersonal source. It’s a little difficult to value such a service, but on the other hand, it shouldn’t cost much to develop, either. Keep an eye out for formerly curt and clinical doctors suddenly becoming a little more engaging and personable in their email correspondence!
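In practice, this kind of tone adjustment is mostly prompt construction. A sketch of what the wrapping might look like – the function name and template are invented for illustration, and the call to an actual LLM service is deliberately omitted:

```python
def build_tone_prompt(draft: str, tone: str = "warm and reassuring") -> str:
    """Wrap a clinician's terse draft in an instruction to rewrite it in a given tone.

    Illustrative only: the template is invented, and the resulting prompt
    would be sent to whatever LLM service the practice uses. Any rewritten
    message should still be reviewed by the clinician before sending.
    """
    return (
        f"Rewrite the following message to a patient in a {tone} tone. "
        "Preserve every medical fact and instruction exactly; do not add "
        "new clinical claims.\n\n"
        f"Message:\n{draft}"
    )

draft = "Labs normal. Continue current meds. Follow up 3 months."
print(build_tone_prompt(draft))
```

The key design point is the constraint clause: tone is fungible, but the clinical content must pass through unchanged.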

Incorporating LLMs into patient education is something of a double-edged sword. Until they can demonstrate a greater fidelity to objective and factual information, the risk for accidental spread of misinformation remains. However, this risk should be compared to the current situation, not a hypothetical ideal. “Doctor Google” already has a tendency to lead patients down the wrong rabbit hole, so perhaps a centralized LLM consumer-facing medical reference would represent an overall improvement despite the persistence of occasional factual errors. 

The commercial case for such a service is a bit dubious; this may be a project best created by an academic team funded by public grants, rather than a private sector service paid for by subscriptions or advertisements.


It’s not likely that language itself will be the mechanism by which artificial intelligence provides its greatest benefit to either human health or the healthcare industry. However, the sheer scale of paperwork and text generated in administering the medical bureaucracy still provides healthy opportunities – to the tune of billions of dollars – for LLMs to break into healthcare.

There are additional niche opportunities to improve patient outcomes through indirect means like improved communication and follow-up. However, optimism should be tempered by extreme pragmatism – companies with concrete objectives will contribute far more than moonshot projects in this space.

Steven Steinke, Ph.D.

Steven K. Steinke, PhD, is building a generative AI platform for drug development at a startup he founded (currently in stealth mode). Before taking the plunge into entrepreneurship, Steven worked at Nvidia. He developed training programs in deep learning for the biomedical field and then improved large language models as an applied scientist. He has a Ph.D. in physics from the University of Arizona, specializing in quantum measurement theory and hybrid quantum systems.
