Is Building Your Own LLM Worth It? Probably Not


November 13, 2024 | Andrew Lawlor

I’ve spoken with many technology leaders in the past few years, and there’s a palpable excitement in boardrooms about generative AI.

Your competitors are exploring it, your employees are experimenting with it, and you’re right to take it seriously.

But here’s where I see companies making a critical mistake: many are seriously exploring the proposition of building their own large language models (LLMs).

Let me be direct: unless you’re sitting on billions in AI research funding, building your own LLM is likely a shortcut to draining your tech budget with little to show for it.

You might think, “But our industry is unique; we need our own model.”

I hear this often, and I understand the impulse. There’s an AI arms race mentality pushing companies to seek a competitive edge through custom AI solutions.

But what I’ve observed is that this drive to build proprietary LLMs often stems from misconceptions about what it really takes to build, train, and maintain AI models.

 

[Image: statistics on LLM adoption across industries]

 

“Proprietary” sounds good, but only in theory

I understand the allure of building your own LLM.

  • Your data is highly sensitive, and you want complete control over how it’s processed and stored
  • Your industry has unique terminology and specialized knowledge that general-purpose AIs might misunderstand
  • You want to build a competitive moat through proprietary AI technology
  • You’re wary of depending on major tech companies and want to own your AI destiny
  • You believe a custom model will produce more accurate results for your specific use cases

These concerns aren’t irrational.

Companies like Anthropic, Microsoft, and Google have shown what’s possible with custom LLMs. They’ve built powerful models that understand context, generate human-like responses, and tackle complex tasks.

Their success makes it tempting to follow their path.

OpenAI’s got the capital, but have you?

But here’s what I’ve noticed in my conversations with leaders: they look at these tech giants as role models without considering the vast gulf in resources.

These companies pour millions into AI research, infrastructure, and talent.

Some estimates put the cost of training OpenAI’s GPT-3.5 model, with roughly 150 billion parameters, at about $5 million. GPT-4’s training costs are estimated to be roughly 20 times higher. Even BloombergGPT, a comparatively small model at 50 billion parameters, is estimated to have cost Bloomberg more than $1 million to train.

 

[Image: OpenAI operating ChatGPT at a loss]

 

These companies maintain massive data centers, employ thousands of AI researchers, and can absorb the costs of multiple failed iterations. And more importantly, AI is their core business—not a tool to support their core business.

Your company likely has different constraints and priorities. Before committing to building an LLM, you need to understand the true scope of what you’re considering. The resources required aren’t just substantial but are major barriers to entry.

Time and costs are prohibitive, to say the least

Training and serving an LLM isn’t like typical software development. Looking at companies that have successfully built LLMs reveals a sobering picture of resource requirements.

  • You need massive computational power. We’re talking about multiple GPU clusters running continuously for extended periods. That’s expensive, especially if you’re considering Nvidia GPUs (the market leader)
  • The training data requirements are staggering. You’ll need vast amounts of high-quality, carefully curated data. OpenAI’s GPT models are trained on hundreds of billions of tokens
  • Infrastructure costs spiral quickly. Beyond raw computing power, building the infrastructure for training demands scarce and costly resources—especially Nvidia GPUs, which dominate the industry
  • Access to these GPUs is limited, and assembling enough to run multiple training iterations is a resource drain for most organizations. You’ll also need robust data pipelines, monitoring systems, and scalable architecture to handle both training and inference
  • Every training run is a significant investment, and most models need multiple iterations to achieve acceptable performance
  • Even after successful training, serving the model requires substantial ongoing computational resources

For context, even relatively modest LLMs with fewer parameters than leading models require immense resources to train and maintain.

The costs scale exponentially with model size. Leading models like GPT-4 required investments that dwarf the entire technical budgets of most organizations.
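
To make that concrete, here’s a rough back-of-envelope sketch. Every number in it is a hypothetical assumption for illustration, not a quote from any vendor; the point is how quickly the multiplication becomes uncomfortable.

```python
# Back-of-envelope training cost estimate.
# All figures below are illustrative assumptions, not vendor pricing.

def training_cost(num_gpus: int, hours: float, usd_per_gpu_hour: float,
                  iterations: int = 1) -> float:
    """Rough cost of one or more full training runs."""
    return num_gpus * hours * usd_per_gpu_hour * iterations

# Hypothetical mid-sized effort: 512 GPUs running for 30 days at $2.50
# per GPU-hour, with 3 full iterations before the model is acceptable.
cost = training_cost(num_gpus=512, hours=30 * 24,
                     usd_per_gpu_hour=2.50, iterations=3)
print(f"Estimated training spend: ${cost:,.0f}")  # roughly $2.8M, before staff or serving costs
```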

 

[Image: Gartner’s approximate cost estimates for different LLM integration use cases]

 

Think of it this way: every time you need to update the model with new data or capabilities, you’re looking at another full training cycle. Even if you start small, the resource requirements quickly become unsustainable for most organizations.

Many AI project failures stem from exactly this: underestimating these fundamental resource requirements. Teams often discover midway through that they’ve burned through their budget without achieving the results they wanted.

Costs beyond training and serving

The computational costs of building an LLM are prohibitive on their own. But here’s what many tech leaders also forget:

  • Your top engineering talent will be consumed by this project. These are the same people who could be building revenue-generating features or improving your core products
  • You’ll need specialized AI talent, i.e., data scientists, ML engineers, and AI researchers. These roles command premium salaries and are incredibly difficult to retain
  • Your legal and compliance teams will need to navigate the complex copyright and privacy implications of training data
  • Quality assurance for AI systems requires specialized testing frameworks and ongoing monitoring—it’s not like traditional software QA
  • Documentation and maintenance become exponentially more complex. Every training iteration needs to be tracked, every data source documented, and every model behavior understood
  • Security considerations multiply. You’re not just securing code; you’re securing training data, model weights, and inference endpoints
  • When things go wrong, debugging AI systems requires specialized tools and expertise. The complexity of large language models makes root cause analysis particularly challenging

And there’s the opportunity cost.

While your team struggles with the complexities of LLM development, your competitors might be rapidly deploying solutions built on existing models.

Every month spent on custom LLM development is a month you’re not spending on actual business problems.

When does building your own LLM make sense?

Almost never.

Consider it only if you have unique requirements that existing models absolutely cannot meet, can justify the massive investment through clear revenue gains, and have both the technical expertise and financial resources for a multi-year commitment.

Even then, I’d recommend you reconsider.

The companies I see succeeding with AI aren’t building models from scratch. They’re focusing their resources on smart implementation, effective integration, and solving actual business problems.

One highly effective approach is the RAG (retrieval-augmented generation) pattern, which creatively leverages existing LLMs to meet specialized needs.

By retrieving relevant information, augmenting it with specific context, and generating targeted outputs, companies can deploy off-the-shelf models that align precisely with their business requirements—without the overhead of building from scratch.
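
Here’s a minimal sketch of that flow. The `search_knowledge_base` helper is a stand-in for whatever retrieval layer you use, and the model name is only an example; swap in your own provider and knowledge base.

```python
# Minimal RAG sketch: retrieve, augment, generate.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def search_knowledge_base(query: str, k: int = 3) -> list[str]:
    """Hypothetical helper: return the k passages most relevant to the query."""
    raise NotImplementedError("Plug in your vector store or search index here.")

def answer(query: str) -> str:
    # 1. Retrieve: pull the passages most relevant to the question.
    passages = search_knowledge_base(query)
    context = "\n\n".join(passages)

    # 2. Augment: place that context into the prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

    # 3. Generate: let a general-purpose model produce the answer.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```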

Most companies need general-purpose LLMs

General-purpose LLMs grant you access to advanced AI capabilities while someone else handles the expensive, complicated parts. The costs for access are minimal compared to building and maintaining your own LLM.

Think about what you’re getting.

  • Models trained on vast datasets that most organizations could never gather
  • Enterprise-grade security and compliance frameworks
  • Scalable infrastructure that handles load spikes and ensures reliability

With companies like OpenAI, Anthropic, and Google continually pushing boundaries, your subscription includes automatic updates and improvements.

This way, your engineering team can focus on adding business value—building applications that solve real problems, integrating AI into workflows, and creating competitive advantages.

That’s where real innovation happens—not in training models, but in applying them cleverly to business problems.

Accuracy concerns with general-purpose LLMs

“But what about accuracy? Won’t a custom-trained model better understand our specific needs?”

I hear this concern constantly, and it’s based on a fundamental misconception about how modern LLMs work.

The real key to accuracy isn’t having your own model—it’s optimizing how you work with existing ones. This optimization happens along two critical axes: context and model behavior.

 

[Image: OpenAI’s illustration of context optimization versus LLM optimization]

 

Context optimization

Context optimization becomes your primary focus when your use cases extend beyond the model’s general knowledge.

For instance, a financial services company might need the model to understand proprietary trading strategies. A healthcare provider might need it to work with the latest clinical guidelines. A manufacturing firm might need it to understand specific operational procedures.

In each case, the challenge isn’t the model’s fundamental capabilities; it’s ensuring the model has access to the right information at the right time.

RAG to improve output accuracy

This is where RAG (retrieval-augmented generation) patterns and prompt engineering shine.

Instead of training a new model, RAG lets you give a general-purpose LLM dynamic access to your specific knowledge base.

 

[Image: a vector database connecting a user’s query to an LLM]

 

Vector databases store text passages as embeddings, numerical representations of their meaning, so relevant information can be retrieved by semantic similarity rather than exact wording. This makes it possible to pinpoint the data relevant to each query.
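
Under the hood the idea is straightforward: every passage is stored as an embedding vector, and retrieval means finding the stored vectors closest to the query’s vector. Here’s a toy sketch of that similarity search using plain cosine similarity; a real vector database does the same thing at scale.

```python
# Toy semantic retrieval: cosine similarity over embedding vectors.
# A production vector database performs this search at scale; the embeddings
# themselves would come from whatever embedding model you choose.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k stored passages whose embeddings are closest in meaning to the query."""
    ranked = sorted(store, key=lambda text: cosine_similarity(query_vec, store[text]),
                    reverse=True)
    return ranked[:k]
```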

Prompt engineering then structures this information optimally for the model, essentially teaching it to speak your language and understand your context in real-time.

Model optimization

Model behavior refinement focuses on getting consistent, properly formatted outputs that align with your needs:

  • Your organization might need responses in a specific format for seamless system integration
  • Your unique brand voice might require a particular tone and style
  • Your compliance requirements might demand specific reasoning patterns or decision frameworks

This is where fine-tuning proves invaluable. It reinforces these patterns and expectations, making a general-purpose model behave like a specialized one without the overhead of full model training. More on this in a bit.

Getting the most out of general-purpose LLMs

The path to effective AI implementation doesn’t run through building your own LLM.

I’ve watched companies exhaust their resources trying to build custom models when they should have been focusing on clever applications of existing technology.

Instead, you could optimize general-purpose LLMs to align outcomes with your goals. There are three ways to do this: in-context learning, model fine-tuning, and prompt engineering.

 


 

In-context learning

This is the most efficient approach available.

Modern LLMs are astonishingly adaptable. They can understand and apply new information on the fly. You provide context in your prompts, guide the model with examples, and get remarkably accurate outputs. No special training is required.

The beauty of this approach is its flexibility—you can adjust your prompts and context as your needs evolve.
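
Here’s what that looks like in practice: a handful of labeled examples placed directly in the prompt, with no training step at all. The task and categories below are invented for illustration.

```python
# In-context learning: teach the model a task with examples inside the prompt.
# The ticket categories and wording are made up for illustration.
few_shot_prompt = """Classify each support ticket as BILLING, OUTAGE, or FEATURE_REQUEST.

Ticket: "I was charged twice for my March invoice."
Category: BILLING

Ticket: "The dashboard has been unreachable since 9 a.m."
Category: OUTAGE

Ticket: "It would be great if reports could be exported to CSV."
Category: FEATURE_REQUEST

Ticket: "Your API has returned 500 errors for the last hour."
Category:"""

# Send few_shot_prompt to any general-purpose LLM; the in-prompt examples
# are usually enough for it to continue with the right category.
```

Change the examples or the instructions and the model’s behavior changes immediately, which is exactly the flexibility described above.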

Fine-tuning

Fine-tuning offers another powerful middle ground.

Instead of building a model from scratch, you’re customizing an existing one for your specific needs.

The resource requirements are orders of magnitude smaller than full model training. You need thousands of examples rather than billions, weeks rather than months, and reasonable computing power rather than massive clusters.

Fine-tuning shines when you need consistent behavior or a deep understanding of specialized terminology.
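
As a rough illustration, the training data for fine-tuning is often just a modest file of example conversations demonstrating the behavior you want. The sketch below uses the JSONL chat format OpenAI documents for its fine-tuning API at the time of writing; the conversations themselves are invented.

```python
# A couple of fine-tuning examples in chat JSONL form (one JSON object per line).
# The goal is to teach consistent structure and tone, not new world knowledge.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are the claims assistant. Always reply in the house format."},
        {"role": "user", "content": "Customer reports hail damage to roof, policy HX-2291."},
        {"role": "assistant", "content": "Summary: Hail damage to roof.\nPolicy: HX-2291\nNext step: Schedule adjuster visit."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are the claims assistant. Always reply in the house format."},
        {"role": "user", "content": "Water leak in basement, policy KD-1174, tenant already notified."},
        {"role": "assistant", "content": "Summary: Basement water leak.\nPolicy: KD-1174\nNext step: Request photos from tenant."},
    ]},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```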

Prompt engineering

Let me emphasize something that often gets overlooked: prompt engineering.

This is your most powerful tool when working with existing LLMs, yet many organizations rush to build custom models without mastering this crucial skill.

Effective prompt engineering can make a general-purpose LLM perform like a specialized one at a fraction of the cost. With well-crafted prompts, you can:

  • Guide the model to understand industry-specific terminology
  • Enforce consistent formatting and response structures
  • Implement complex decision trees and logical flows
  • Maintain specific tones and writing styles
  • Control for biases and ensure compliance with your guidelines

The key is understanding that prompts are essentially programming; they’re how you tell the model exactly what you want and how you want it.
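
Treated that way, a good system prompt reads less like a request and more like a small specification. Here’s a sketch of the idea; the product, terminology, and rules are all invented.

```python
# A system prompt written as a specification: terminology, decision rules,
# and output format are spelled out explicitly. Everything here is invented
# for illustration.
SYSTEM_PROMPT = """You are the support assistant for Acme Ledger.

Terminology:
- "Workspace" means a customer account, never a physical location.
- "Sync" refers only to the nightly bank-feed import.

Rules:
1. If the user mentions a payment problem, ask for the invoice number first.
2. If the question involves legal or tax advice, decline and refer to a human agent.
3. Otherwise, answer in at most three sentences.

Output format:
Answer: <your reply>
Confidence: <low | medium | high>"""
```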

Also, prompt engineering is infinitely adaptable. Unlike a custom model that’s expensive to retrain, you can modify your prompts instantly as your needs change.

You can A/B test different approaches, refine your instructions, and optimize for different scenarios without any additional computing costs.
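
A lightweight way to do that is to run each prompt variant over the same set of test questions and score the outputs. In the sketch below, `complete` and `passes` are placeholders for your model call and your acceptance criteria.

```python
# Sketch of A/B testing prompt variants against a fixed set of test cases.
from typing import Callable

def ab_test(variants: dict[str, str], test_cases: list[str],
            complete: Callable[[str, str], str],
            passes: Callable[[str], bool]) -> dict[str, float]:
    """Return each prompt variant's pass rate over the test cases.

    `complete(prompt, case)` should call your model of choice and return its
    output; `passes(output)` should encode whatever acceptance criteria matter
    to you (format, tone, correctness checks, and so on).
    """
    results = {}
    for name, prompt in variants.items():
        wins = sum(passes(complete(prompt, case)) for case in test_cases)
        results[name] = wins / len(test_cases)
    return results
```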

Reconsider building from scratch

The allure of building your own LLM is understandable, but it’s a path paved with hidden costs and complexity.

Through proper application of existing LLMs, vector databases, and RAG patterns, you can build sophisticated AI systems without the expense and risk of custom model development.

Your competitive advantage in AI won’t come from owning a model. It will come from how effectively you apply existing technology to solve real business problems.

As I mentioned, the organizations I see succeeding aren’t the ones building models from scratch but are the ones focusing their resources on smart implementation and effective integration.

This approach is inherently future-proof. As general-purpose models continue to improve, your applications automatically benefit from these advances. While others are stuck maintaining their custom models, you can focus on what really matters: creating value for your business.
