The future of artificial intelligence lies not in grandiose, resource-draining systems but in leaner, more efficient models. For AI to truly flourish, the barriers of high hardware costs and exorbitant usage fees must come down. In 2025, a wave of AI-powered apps is set to reshape industries, delivering on the long-awaited promise of generative AI for both consumers and businesses.
Right now, however, the pursuit of artificial general intelligence (AGI) dominates the AI landscape. Companies like OpenAI, Google, and Elon Musk’s xAI are locked in an expensive arms race to create the most powerful large language models (LLMs). These endeavors require staggering investments. For instance, Musk’s xAI reportedly spent over $3 billion on 100,000 Nvidia H100 GPUs to train its LLM, Grok. At such costs, only the wealthiest tech leaders can afford to compete.
This high-stakes competition has led to a top-heavy ecosystem, where cutting-edge LLMs are too costly for widespread adoption. While these models deliver exceptional performance, their inference costs—the price of processing user queries—are prohibitively high for most app developers. It’s as though everyone owned 5G smartphones but couldn’t afford the data to watch a video or browse social media. Developers are left with a tough choice: either use cheaper, underperforming models that fail to impress users, or risk financial ruin by relying on premium LLMs.
By 2025, the narrative will shift. Drawing lessons from earlier tech revolutions, like the rise of PCs and mobile devices, the industry will prioritize efficiency and accessibility. Just as Moore’s Law drove down costs and improved performance in computing, a similar principle is beginning to emerge in AI.
Inference costs are already plummeting, thanks to advances in AI algorithms, hardware, and architecture. Consider the cost of using OpenAI's top-tier models: in May 2023, running an AI-powered search query cost about $10, versus roughly $0.01 for a non-generative Google search, a stark 1,000-fold gap. By May 2024, OpenAI had cut this cost to around $1 per query. If the trend continues, developers will soon have access to high-quality, affordable models, paving the way for a surge in innovative AI applications.
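As a back-of-the-envelope check on the trend described above, here is a minimal Python sketch using the article's figures. The dollar amounts are taken as exact for illustration, and the extrapolation to cost parity assumes the observed 10x-per-year decline simply continues; that is an illustrative assumption, not a forecast.

```python
import math

# Per-query costs cited in the article (treated as exact for illustration).
google_cost = 0.01    # non-generative Google search, May 2023
ai_cost_2023 = 10.00  # AI-powered search on a top-tier LLM, May 2023
ai_cost_2024 = 1.00   # the same query one year later, May 2024

# The May 2023 gap: roughly a 1,000-fold difference.
gap_2023 = ai_cost_2023 / google_cost

# Observed decline over one year: about 10x cheaper.
annual_decline = ai_cost_2023 / ai_cost_2024

# If that 10x-per-year decline held (an assumption), AI search would
# reach cost parity with non-generative search in this many more years:
years_to_parity = math.log(ai_cost_2024 / google_cost, annual_decline)

print(f"{gap_2023:.0f}x gap in 2023; parity in ~{years_to_parity:.0f} more years")
```

Under these assumptions the gap closes in roughly two more years, which is the arithmetic behind the article's optimism about affordable models arriving soon.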
This paradigm shift will inspire a new approach to building AI. Instead of chasing the elusive AGI crown, entrepreneurs will focus on creating “good enough” models that are lightweight, fast, and inexpensive. These purpose-built systems will be tailored for specific commercial applications, achieving remarkable efficiency without the astronomical costs associated with top-tier LLMs.
A Silicon Valley startup, Rhymes.ai, offers a glimpse into this future. By vertically integrating the development of its model, inference engine, and application, the company trained a model nearly as capable as OpenAI's best for just $3 million, compared with the reported $100 million cost of training GPT-4. The resulting inference cost for its AI search app, BeaGo, is a mere $0.03 per query, roughly 3 percent of GPT-4's. This was accomplished by a team of five engineers in just two months, highlighting the potential of streamlined, integrated approaches.
Generative AI has already demonstrated its transformative potential across education, work, and everyday life. However, for AI to become a truly universal tool, the ecosystem must overcome its cost barriers and strike a balance between performance and affordability. By focusing on leaner models and scalable solutions, we can create a sustainable, thriving AI landscape that works for everyone.