LLMs in a Vacuum Are Useless

“What hath God wrought?”

“What hath God wrought?” That was the first official message delivered via Samuel Morse’s telegraph. The four-word phrase was sent by the inventor on May 24th, 1844 at 8:45 am, and traveled from Washington, D.C., to Baltimore, Maryland in the blink of an eye, a journey which would previously have taken 4-8 hours on horseback, even in ideal conditions.

Samuel Morse, whose name you may recognize immortalized in Morse code, was aware of the gravity of this event. He was a smart man. He knew that telecommunications were going to change the world in some way or another. Hence the melodramatic message.

And it seems that we’re in the middle of yet another such moment. Some very smart people have come out and said that this is the biggest invention since the internet.

I’m referring to ChatGPT, of course, which needs no introduction.

Winter Is Coming?

So, we’re coming out of the peak heat of another sizzling-hot AI summer, and the world might never be the same.

“The world might never be the same…” I know how that sounds. Maybe a little bit over the top? Let me explain.

If you were to ask a person disillusioned by the new advances in artificial intelligence, they might tell you that LLMs are a fad, a passing trend, and that it’s just a matter of time before they go the way of Bitcoins and Ethereums and Google+ and such. And if you ask an AI evangelist, or doomsayer, they might tell you that your job’s in danger, or that your company should restructure, or that the foundation of the education system is at risk, and we’d all better adapt or become obsolete.

Wherever you land on that spectrum, let’s put that aside for now and assume that the community as a whole has run up on, and perhaps even passed, an inflection point in the hype cycle: the point sometimes called the “Peak of Inflated Expectations”. This is where expectations that grew to unreachable, unrealistic heights in the heat of the moment finally top out, and the hype subsequently begins to dissipate.

Of course, we can argue about whether or not we’re actually at this point. There are people smarter than me out there studying such trends very closely, but for the sake of this blog post, let’s carry on with the hypothetical that we have passed the Peak of Inflated Expectations.

The thing about the hype cycle, though, is that it’s not all just hot, inflated air. It’s inflated, yes, but there is usually some smaller, consistent flame burning beneath, carrying the metaphorical hot air balloon of LLMs, with its Basket of Usefulness™ up and down through the Sky of Uncertainty™. All that to say—LLMs are actually useful.

And as the summer of AI comes to an end, two things are clear:

  1. LLMs are here to stay (in some capacity); and
  2. GPT-4 is the heavyweight champion.

At this point, you’ve probably heard of GPT-4. It’s widely available, it’s usable and accessible through ChatGPT or the GPT-4 API endpoints, and it’s mostly affordable. Even after the inflated hype, we’re seeing ChatGPT used for many tasks, from tutoring, to learning, to pair programming, to accelerating administrative tasks, and so on.

I joke about ChatGPT not needing an introduction, but—

Even though ChatGPT broke the shortest-time-to-100-million-users record, and even though openai.com gets over 1 billion visits per month, it’s still true that most people don’t use ChatGPT!

I talk about ChatGPT with my coworkers, but have you talked about ChatGPT with your close friends, family? Your neighbors, your Amazon driver? Oh, what, you don’t discuss software with your friends?

Well, it seems that whenever I mention it to people outside of work, they haven’t even tried it!

I take this as a side-effect of the Peak of Inflated Expectations. No matter how excited the tech community gets about a new invention, the rest of the world takes much longer to adapt. Yes, even in 2023.

So what would it take for ChatGPT to really break into the public consciousness? Or, let’s say, how long will it be until it reaches the same level of ubiquity as something like the Google search engine?

(I know. Comparing the ubiquity of an AI chatbot to a search engine is not a perfect apples-to-apples comparison, but we don’t have much else to go on.)

ChatGPT, anecdotally, is already creeping into some of Google’s search engine territory. People are querying the chatbot with questions they would have asked the search bar just a year ago, at least for certain types of searches.

But if ChatGPT becomes as commonly used as Google search, it won’t just be because it’s used in conjunction with search engines. It will mostly be because ChatGPT is being used in novel ways, in areas that were previously untouchable by the cold, metallic hands of artificial intelligence. These are areas like teaching, tutoring, math assistance, cheating on homework(?), brainstorming, code generation, writing, secretarial work, and completing administrative tasks.

So where are we now? As the hype comes to an end—and as the dust still settles—where have we landed?

It looks to me like people aren’t using Large Language Models in the epic, extraordinary ways that were imagined at the peak of the hype cycle. No, they’re using them in much more reserved, basic ways: brainstorming, code completion, or replacing a Google search here and there.

So how much further can LLMs go?

In the remainder of this post, I’ll delve into the true value of Large Language Models (LLMs) and attempt to back up the idea that the usefulness and ubiquity of LLMs will ultimately depend on the capabilities of their supporting software.

The Robots Are Going to Take Our Jobs

So is ChatGPT going to take your job?

Probably not.

But someone using ChatGPT might take your job.

I’ve heard Marc Andreessen talk around this sentiment, and I recently heard Damien Riehl say it on the Practical AI podcast as well.

Here’s the quote, referring specifically to lawyers.

I’d say to lawyers that are worried about AI, that AI will not take a lawyer’s job, but a lawyer that uses AI will take the job of a lawyer that does not use AI.

(Quote by Damien Riehl, Practical AI, Episode 232, around the 38:27 mark)

So, if you’re a programmer, you should probably be using AI tools like GitHub Copilot, or starting to learn how to incorporate them into your workflow.

That’s my recommendation.

Good programmers are going to be a lot more productive because of tools like GitHub Copilot and Sourcegraph’s Cody. They’re really good at this stuff already, and they’re just going to get better.

But even without some of the code-specific tools, programmers are also getting more productive by having ChatGPT as a pair programmer. Figuring out a path around or through a roadblock can sometimes be tough, and could potentially take hours, or, so help us, days. We’ve all been there. I’ve found ChatGPT extremely helpful in these situations.

Now, the counterpoint to all of this is that you don’t have to adapt. There are still mainframes, and COBOL programmers, and those who eat machine code all day long. And that’s fine too! There are different paths that people can take, and still make money, and have a very fulfilling career—and at no step are you required to use artificial intelligence.

Should My Business Be Using ChatGPT?

Large Language Models are excellent at certain tasks.

If your business leans heavily on chat-based systems, if you do a lot of customer service, receive a lot of emails, service requests, or anything which involves a lot of short, unstructured or semi-structured text, then GPT-4 might just revolutionize your business and you should absolutely begin figuring out how to incorporate artificial intelligence.

There are many solid use-cases for LLMs. But LLMs are not good for everything. Let me restate—you might not need ChatGPT.

LLMs might make your marketing department 10x more productive, and they might make your web developers 10x, 20x, or 100x more productive, but you do not need to put together your own custom web interface that interacts with the GPT-4 API, or to build a vector database with all of your company data, or to buy an on-premises machine learning cluster to power your business. Yes, there are some instances where it might make sense for a business to build these solutions, but for most people these complex solutions are not going to be worth it.

If you’re a home renovation business, you might just have ChatGPT help you draft some responses to customer reviews, or navigate the county or city permitting systems. It’ll be helpful, but if you mostly install windows and build decks, your life probably isn’t going to be flipped upside down.

And let’s not forget that machine learning and artificial intelligence are bigger fields than just language models. Machine learning algorithms have revolutionized fault analysis, fraud detection, and protein modeling, just to name a few. Pick the right tool for the job—it’s not always going to be a large language model.

Other times, when you find yourself hankering for ChatGPT, you might just need a Python script. Need to reformat a 100,000-line CSV? Use a Python script! Due to context window restrictions, ChatGPT can’t parse your 100,000-line CSV in one go. Does that mean you need to build a complex system to break the 100,000 lines into digestible chunks for an LLM to decipher, then rebuild the CSV? No! Just use a Python script! Now, do you find yourself writing a short Python script? ChatGPT can definitely help you with that.
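To make that concrete, here is a minimal sketch of the kind of script I have in mind. The cleanup rules (snake_case headers, trimmed cells) and the function name `reformat_csv` are made up for illustration, so swap in whatever your file actually needs. The point is that rows stream through one at a time, which means 100,000 lines is no harder than 10.

```python
import csv
import io


def reformat_csv(src, dst):
    """Stream rows from src to dst, tidying headers and cell whitespace.

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return 0  # empty input, nothing to write

    # Hypothetical cleanup rule: lowercase headers, spaces become underscores.
    writer.writerow([h.strip().lower().replace(" ", "_") for h in header])

    count = 0
    for row in reader:
        # Rows are handled one at a time, so the file size does not matter.
        writer.writerow([cell.strip() for cell in row])
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for a 100,000-line file.
    sample = io.StringIO("First Name, Last Name\n Ada , Lovelace \n")
    out = io.StringIO()
    reformat_csv(sample, out)
    print(out.getvalue(), end="")
```

For a real file, you would pass in handles from `open("input.csv", newline="")` and `open("output.csv", "w", newline="")` instead of the in-memory buffers (the file names here are placeholders).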

We might see game-changing productivity boosts. We might already be seeing such productivity changes in programming. We will continue to see improvement among some administrative tasks, and we might see some less important decision-making in some industries be changed forever by large language models. But most of the other stuff? Well, that’s not going to change very much.

What Makes an LLM Useful?

Thus far I’ve talked about large language models, specifically GPT-4, which we interact with through ChatGPT or GitHub Copilot. I’ve talked about how popular these tools are and how much they might affect your work and your life at large.

Now I want to focus on what actually matters, and that’s everything outside of the large language model.

LLMs in a vacuum are useless.

You see, if there were a model which was 100x more powerful than GPT-4, but you had to interact with it using Morse code via a telegraph, it wouldn’t be very useful, now would it?

(Just imagine what Samuel Morse might’ve written to GPT-4 via Morse code if that had happened in 2023. “New phone, who dis?”)

[Image]
(By Mathew Benjamin Brady – Christies, Public Domain)

ChatGPT’s success has been based not only in its secret sauce (GPT-4), but in its novelty.

It was the first readily available and good chatbot, and it’s only $20 per month. That’s an amazing value. But as we go forward, it becomes clearer and clearer that there just aren’t that many instances where interacting with ChatGPT via a chatbot-style interface is that useful.

I won’t continue to hammer on the real use-cases for LLMs. Now what I want to focus on is how GPT-4 might continue to grow, in a steadier, more functional fashion. On the hype cycle graph, this is what we might call the “Slope of Enlightenment”.

GitHub Copilot is useful not just because of the LLM that backs it, but because I don’t have to leave my code editor to use it. And because they figured out how to recommend the perfect amount of code without losing the context. And because it’s so easy to insert the suggested code with a single press of the Tab key.

We are still discovering how useful LLMs are. We’re only seeing the nascent fruits of these early applications. Who knows how many hundreds of venture-backed startups are building products and searching for these other niches (the note-taking applications, the train-AI-on-your-data apps), of which only a few will succeed. But it will be these more specific, more integrated use-cases which drive LLMs to the same level of ubiquity as something like a Google search.

Another way to think about what I mean by “LLMs in a vacuum are useless” is that there are two distinct problems: building the LLMs, and figuring out how to use them. You have OpenAI, Anthropic, Meta, and a few others, along with the open source world trailing behind just a bit, and they’re all working on making the best language models possible. But then you have the product people and the consumers who are taking those models and figuring out how to use them.

Slapping a ChatGPT window on top of your product probably isn’t going to be very useful. But if your users have a specific need and you have an elegant way to incorporate the interface, then you might be onto something.

ChatGPT as a tutor, as a tool to help students with homework, as a writing partner, and as a pair programmer: these roles will probably continue to exist. But we’ve seen ChatGPT usage drop in some months. I’d posit that those users are not disappearing completely, though. They’re still using GPT-4, just via the API or, more specifically, through products which have integrated GPT-4.

The Difficulties Faced by LLM Products

After playing around with many of the open source LLMs, I must say that it would be difficult for me to accept the drop in quality after using GPT-4.

Running privately and locally is certainly a benefit for some applications, and/or some companies, but getting the open source models to work at the same level as GPT-4 is difficult. And it’s not just about the language model. Again, LLMs don’t exist in a vacuum!

I’m not trying to minimize the work that these open-source folks are doing. The democratization of LLMs is very important. But the point I want to emphasize is that it’s not just about getting a model closer and closer to GPT-4’s accuracy. These local, open-source systems can be slow, can be difficult to set up, difficult to deploy, and they tend to require specific hardware, which is expensive.

That being said, these challenges might be conquerable for some larger companies, and depending on the size of the company or product, it might save you a lot of money to develop a custom LLM pipeline which uses an open source model. But for most of the population, if you want to save money by using an open source LLM, you might end up paying much more in an even more valuable resource: time.

Of course, you could always use Amazon SageMaker or Google’s Vertex AI to run the open source LLM of your choice, but then it’s not so local and not so private anymore. You could rent cloud GPUs to fine-tune the model on your data, but that can be difficult and expensive as well.

At the end of the day, many larger companies might just pay one of the old-heads, like IBM, who just recently released their generative AI product, watsonx.

Even then, if you do choose to pay for one of the hosted, and/or proprietary solutions, there are still plenty of challenges that you’re going to face, data integrations you’re going to have to spend developer time on, and so on.

We’re still at the tip of the iceberg when it comes to solving all of the problems, technical and otherwise, that surround large language models, but if you’re interested in the difficulties that you’ll face actually using an LLM in production, check out this article from Honeycomb. It’s a really good read.

Design Is King

A sentiment that has been going around is:

Wait a second, that’s not an AI startup! That’s just a UI on top of the GPT-4 API…

To that, I say, good! GitHub Copilot could be overly simplified into being called a UI on top of GPT-4. Just as easily, an iPhone could be called a UI on top of a processor.

These language models need to find the right use-cases to unlock their potential. They need great design and skilled application of those designs.

Big, great inventions, like electricity, the internet, or large language models, often get built out more quickly than the rest of us can keep up with. Liken it to the Field of Dreams motif, “Build it and they will come”. The products will come. I guess the baseball players (or ghosts, or whatever happens in that movie) are designers in this metaphor.

I pause here just to emphasize the value that a design practice provides to the world of software and digital products.

This idea of taking a more holistic approach to product design is by no means a new idea. At least not in the last 10-15 years of software. It’s all about figuring out what the product is, what it does or should do, doing product and user research, putting together the right pieces, and finding the right fit for the product. But when a technology gets as hyped-up as ChatGPT and LLMs, sometimes it becomes difficult to see straight, and we start throwing it at everything. Maybe it’s a fear of falling behind the curve. Or maybe it’s the thrill of potentially being ahead of the curve, and, you know, gaining a competitive edge.

It can be dizzying when it feels like all that people are talking or writing about is ChatGPT. So, I guess, sorry to contribute to that. But hopefully you took away something worthwhile about the post-ChatGPT world we live in today.

Thanks for Reading!

If you want custom software, Simple Thread can help you. If you want help integrating ChatGPT or other LLM technology into your business, let’s talk. We’ve been making products for a long time, and we’re really good at all of the stuff around the model.
