Optimising LLM Implementation: Focusing on Word-Wrangling Tasks for Long-Term Utility

The Future of GenAI: Strategic Implementation of Large Language Models

ChatGPT is great. But generative AI - the behavioural family of models it belongs to - is not the kind of AI that researchers are capitalizing on to build the next generation of more powerful models.

The Evolution Towards AGI

The next models on the path to artificial general intelligence (AGI) will include agentic behaviour, meaning models that experience streams of data and act autonomously upon them. Put simplistically: “AI agents are like generative AI (GenAI) in an eternal for loop”.

Further down the timeline, AGI (as predicted by an OpenAI researcher to be in full force by 2028) will automate learning itself, creating artificial superhuman intelligence (ASI). In other words, the relatively hands-on training done to create today’s models will be automated and scaled by many orders of magnitude in the future, making the creation of AI models exponentially cheaper.

The Corporate Dilemma

Say you work at a large corporation. It is safe to assume that the genuinely cutting-edge GenAI use case you’re working on right now may become obsolete - gathering dust in the attic before it’s even fully adopted.

So is it nonsensical to work on GenAI projects? Should we sit out the current technology trends and wait for agents or AGI to arrive and take care of all our problems? I think there is a better option. We should ask the following question: “How can we minimize the risk of building a GenAI system that turns into mostly sunk costs within a couple of years?”

LLMs: Word-Wranglers and Rapid Commoditization

Unbeknownst to many GenAI enthusiasts, LLMs are prosaic and specialized rather than imaginative and generalized. They are fundamentally great at predicting the next word from the words before it. Scale and tweak this, and you get something impressive. However, they will always be limited in their intelligence because at their core they are word-prediction machines constrained by their training parameters.

The future for our soon-to-be-legacy LLMs might look bleak if it weren’t for them commoditizing rapidly. Meta, in particular, has been open-sourcing its LLMs to undercut the margins of its closed-source competitors: OpenAI, Microsoft, Google and co. Add to that the current glut of tiny LLMs - called small language models (SLMs) - that are less capable but run well on consumer hardware, and we have rock-bottom inference and fine-tuning prices for LLMs.

Optimal Use Cases for LLMs

Based on this, one should focus on implementing LLMs for highly repetitive, monotonous tasks in which words must be wrangled - a subset of the tasks usually given to LLMs. I define word-wrangling as relatively simple, natural-language tasks such as:

  • “Give a profanity-score between 0.00 and 1.00 for the following text”
  • “Strip the following text of patient identifiable data”
  • “Categorize the following text into fitting categories”
  • “Create embeddings for each chunk of text so I can save them for semantic search”
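Each of these tasks has a narrow, verifiable output, which keeps the surrounding glue code simple. Here is a minimal sketch of the first task - `profanity_score_prompt` and `parse_score` are hypothetical helper names, and the actual model call is left out:

```python
import re

def profanity_score_prompt(text: str) -> str:
    """Build a narrowly scoped word-wrangling prompt (illustrative wording)."""
    return (
        "Give a profanity-score between 0.00 and 1.00 for the following text. "
        "Reply with the number only.\n\n" + text
    )

def parse_score(model_reply: str) -> float:
    """Extract the first decimal in [0, 1] from the model's reply; fail loudly otherwise."""
    match = re.search(r"\b([01](?:\.\d{1,2})?)\b", model_reply)
    if match is None:
        raise ValueError(f"unparseable reply: {model_reply!r}")
    score = float(match.group(1))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score
```

Because the output contract is so narrow (“a number between 0.00 and 1.00”), validating it takes a few lines - something an open-ended chatbot interface can never offer.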

Because GenAI inference is cheap and getting cheaper, we can chain word-wrangling tasks together. This combines simple word-wrangling tasks into a potentially powerful iterative reasoning chain - one that might even achieve LLM-tier results using only a chain of SLMs.
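The chaining idea above amounts to plain function composition. In this sketch the two steps are deterministic stand-ins for what would each be a single-purpose SLM call in a real pipeline:

```python
from typing import Callable

# Each step is a small, single-purpose word-wrangling function.
# In a real system each would wrap an SLM call; these are stand-ins.
def strip_whitespace(text: str) -> str:
    return " ".join(text.split())

def redact_digits(text: str) -> str:
    # Crude placeholder for "strip patient identifiable data".
    return "".join("#" if c.isdigit() else c for c in text)

def chain(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose word-wrangling steps into one pipeline, applied left to right."""
    def pipeline(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return pipeline

clean = chain(strip_whitespace, redact_digits)
```

The appeal of this shape is that each link in the chain stays individually testable and swappable - you can replace one SLM with a better one without touching the rest.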

Non-Optimal Use Cases

One should not implement LLMs for tasks that they aren’t intelligent enough for - even if many erroneously expect them to be soon. Tasks that depend on some non-language reasoning, and are therefore not a good fit for LLMs, include:

  • Open-ended chatbots
  • “Tell me if the following advertisement is of high quality”
  • “Generate creative content”
  • “Explain this topic in-depth” (user doesn’t provide any context)

These use cases expose an interface to the user that is too broad. The user will expect things from the LLM that it is not intelligent or knowledgeable enough to provide - today as well as in the future.

Industry Best Practices for LLM Systems

I see many end users who are pessimistic about today’s AI apps because of bad experiences. Usually the experience was bad because they chatted with a chatbot (a poor application of LLMs) running on a model far too underpowered for the task (a cheap or open-source model), and they realized it was a waste of time. This beautifully exemplifies an LLM implementation that is bad today and will stay bad.

Solutions that use LLMs effectively are often harder to spot because they usually don’t expose a broad interface to the user. An example is search. An e-commerce website might have a search interface so users can find the most relevant products for their needs. The pre-LLM version would have been a mix of string matching, BM25 and full-text search. Then came LLMs, and by integrating search with an LLM you could search by meaning (semantic search). Now think about implementing that in your enterprise knowledge search system and how powerful that could be.
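At its core, the semantic-search upgrade described above is a nearest-neighbour lookup over embedding vectors. A toy sketch, assuming the embeddings have already been produced by some model (the vectors below are hand-made placeholders, not real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, docs):
    """docs: list of (doc_id, embedding) pairs. Returns pairs ranked by similarity."""
    return sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
```

Production systems would swap the linear scan for a vector index, but the contract stays the same: embed once, rank by similarity at query time.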

The still-young, emerging industry best practices for LLMs (be aware that no authoritative body on LLM industry standards exists yet) point at exactly the examples above. One of the main takeaways from Hamel H.’s and Dan Becker’s conference on implementing LLMs in enterprises is the endorsement of constraining LLMs: relying on what I call the word-wrangling capabilities of LLMs, and not necessarily on their reasoning and wisdom. If you want to learn more about this golden path of LLM implementation, I recommend reading Hamel’s blog.
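One common way to constrain an LLM, in the spirit of the best practices above, is to force its output into a closed set and validate it. In this sketch the category set, the retry count and the `ask_model` callable are all illustrative assumptions, not anything prescribed by the conference:

```python
CATEGORIES = {"billing", "shipping", "returns", "other"}  # hypothetical set

def categorize(text: str, ask_model) -> str:
    """Constrain the model to a closed category set; retry once, then fall back.

    `ask_model` is any callable taking a prompt string and returning a string -
    a stand-in for an actual LLM/SLM call.
    """
    prompt = (
        "Categorize the following text. Answer with exactly one of: "
        + ", ".join(sorted(CATEGORIES)) + ".\n\n" + text
    )
    for _ in range(2):  # one retry on an out-of-set answer
        answer = ask_model(prompt).strip().lower()
        if answer in CATEGORIES:
            return answer
    return "other"  # safe fallback keeps the interface narrow
```

The validation loop is the point: a constrained task lets you detect and contain bad outputs, whereas an open-ended chat interface gives you nothing to check against.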

Conclusion

By taking these ideas to heart while designing an LLM system, you insure it against premature deprecation by technologically more advanced successors. I am not saying that, e.g., a chatbot is doomed to rapid obsolescence; that depends much more on the alternative solutions available in your environment. I am pointing to optimization gradients you can keep in mind when designing an LLM system - gradients I only hear world-leading ML engineers talk about, not the front-line developers building for consumer or business end users.

In the rapidly evolving landscape of AI, strategic implementation of large language models is crucial for long-term value. By focusing on LLMs’ core strength - word-wrangling tasks - organizations can build systems that remain relevant even as more advanced AI technologies emerge. Constraining LLMs to specific, language-based tasks and chaining those tasks for more complex operations aligns with emerging industry best practices, and it lets businesses harness today’s LLM technology while minimizing the risk of rapid obsolescence.

As we move towards a future potentially dominated by agentic AI and AGI, those who have built robust, task-specific LLM systems will have a solid foundation for adaptation. The key is to remain optimistic about the potential of LLMs and realistic about their limitations, always keeping an eye on the horizon of AI advancement. By doing so, organizations can navigate the exciting yet challenging waters of AI implementation, ensuring their investments in today’s technology pave the way for tomorrow’s innovations.