When are AI Products Good Enough to Release?

In a new video, Silverchair CTO Stuart Leitch offers a recap of the talk he gave at last month’s Platform Strategies Satellite Event on AI in Berlin. He was joined by 45 other industry thought leaders for a day of interactive discussions on the key challenges in applying AI techniques or technologies in scholarly publishing.

Stuart Leitch speaking at the Berlin AI event

Discussion topics included:

What ethical rules and normative practices do we need to encode in generative AI features or products?
Given the inherent limitations of GenAI, and cost to quality tradeoffs, how do we determine when a product or feature is good enough to release?
Practical advice for getting started – what should organizations think about?

In the first part of the talk, he offers an overview of capability advancements, economic influence, and possible projections. He also discusses the strengths and limitations of AI applications in scholarly publishing (as laid out in his recent Scholarly Kitchen article). The last part of the talk, linked and summarized below, shares where he sees the technology going and some of the quality tradeoffs currently involved in deploying it for content discovery and consumption.

As discussed in the Scholarly Kitchen article, the capabilities of a RAG system are heavily impacted by cost, leading to major tensions in solution designs. Cost-quality tensions can generally be bucketed as follows:

LLM strength: quality and cost are highly correlated. Claude Opus, arguably the strongest model currently, costs around 150 times as much as Mistral’s Mixtral 8x22B, a fast and cheap but respectable model: $15 per million input tokens for Opus compared to 10¢ for Mixtral, with 100 tokens equating to around 75 words.
Depth and breadth of material injected into context window: sophisticated analysis, where the answer can’t just be found from a single authoritative source, often requires a lot of information to be injected into the LLM’s context window. This can require a lot of tokens.
Sophistication of content preparation: preparing content before ingestion, to weight it for authority and to suppress material that could lead to misleading results, adds up-front cost. For example, reference sections often yield very high semantic similarity scores but do not contain enough for an LLM to draw conclusions from (although it will happily do so).
Retrieval strategy: techniques like query rewriting, HyDE, hybrid search, ReAct and FLARE all improve quality at significant additional cost.
Validation strategy: additional steps to check the quality of responses will catch many outliers but add engineering complexity and incur higher token consumption costs.
Sophistication of ranking strategy: adds engineering effort and the compute costs of cross-encoding.
Agentic strategies: as frontier models become better at following precise instructions, it is possible to create recursive loops where an agent (or many agents) builds a research plan and forages for information until satisfied. Sophisticated agentic techniques such as FLARE typically incur 10x the token cost of simpler approaches. Within a few generations, it is likely models will have the reasoning abilities and be sufficiently reliable at instruction following to perform comprehensive literature reviews, at potentially unbounded cost.
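To make the arithmetic behind these figures concrete, here is a minimal sketch using only the numbers quoted above: $15 and 10¢ per million input tokens, 100 tokens per 75 words, and a rough 10x multiplier for agentic techniques. The 7,500-word context size and the function name are hypothetical, chosen for illustration, and output-token costs are ignored.

```python
# Prices quoted in the article (USD per million input tokens).
OPUS_PRICE_PER_M = 15.00
MIXTRAL_PRICE_PER_M = 0.10

# Article's rule of thumb: 100 tokens is roughly 75 words.
TOKENS_PER_WORD = 100 / 75


def input_cost_usd(words: int, price_per_million: float, multiplier: float = 1.0) -> float:
    """Estimate the input-token cost of sending `words` words to a model.

    `multiplier` scales token use, e.g. the article's rough 10x figure
    for agentic techniques. Output tokens are not modeled.
    """
    tokens = words * TOKENS_PER_WORD
    return tokens / 1_000_000 * price_per_million * multiplier


price_ratio = OPUS_PRICE_PER_M / MIXTRAL_PRICE_PER_M

# Hypothetical query: 7,500 words of retrieved context (about 10,000 tokens).
CONTEXT_WORDS = 7_500
opus_simple = input_cost_usd(CONTEXT_WORDS, OPUS_PRICE_PER_M)
mixtral_simple = input_cost_usd(CONTEXT_WORDS, MIXTRAL_PRICE_PER_M)
opus_agentic = input_cost_usd(CONTEXT_WORDS, OPUS_PRICE_PER_M, multiplier=10)

print(f"Opus vs. Mixtral price ratio: {price_ratio:.0f}x")
print(f"Opus, single pass:    ${opus_simple:.4f} per query")
print(f"Mixtral, single pass: ${mixtral_simple:.4f} per query")
print(f"Opus, agentic (10x):  ${opus_agentic:.4f} per query")
```

Under these assumptions, a single query costs about 15¢ in input tokens on Opus and about a tenth of a cent on Mixtral, while the agentic variant on Opus comes to about $1.50. Multiplied across many queries, those differences are what create the design tensions listed above.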

Consider the risks and tradeoffs of using AI in products through the example of self-driving cars. Whereas driving is a highly bounded context, conversational interfaces are inherently multidimensional, and usage of these features across various situations and physical conditions will defy neat classification. Highly engaged users of AI tools will be more familiar with which aspects are more reliable and will therefore know where human judgement calls need to be made. But when releasing an AI product, there’s a distinct need for user education around the degree to which a human should be in the loop.

Leaders join a cocktail reception following the Berlin event for continued discussions

As leaders in your companies, you should be thinking about AI applications in terms of technology S-Curves. Too early and you waste a lot of money. Too late and your competitors have the advantage.

This is not a technology cycle we can sit out, but where we are on the S-Curve depends on the application. For practitioners using beginner or intermediate reference material, the curve is going vertical in this timeframe. Other cases will vary. If we do nothing, most of the value will accrue to larger companies, publishers will simply become licensers, and there will be significant disintermediation. There’s a strong current of efforts happening right now in our industry to get ahead of that, which is important given the quality of research content and the value it will bring to those systems.

Ultimately, deciding when a product is good enough to release comes down to a combination of two factors: 1) the holistic cost of quality, which is heavily contingent on managing user expectations; and 2) the recognition that disruption is coming and that, if we wait for the optimal quality-cost profile, audiences will likely have already substantially migrated to competing services.

The main point is to get closer to the technology and get a better understanding of how it currently performs and how that performance is steadily ticking up while costs are quickly coming down.

There’s no magical answer, but we as an industry are in a very interesting time. At Silverchair, we tend to be optimistic—there’s a lot of progress happening around benchmarking, around multi-modality, around safeguards and policies, and around overall industry literacy. We expect significant advancements in sophisticated reasoning in the coming months and years, as the various models have caught up to one another and continue to improve behind the scenes.

We don’t know the path these products will take either globally or in our industry, but we remain committed to learning together as a community and sharing our insights.

Want to join the conversation? Join us at Platform Strategies 2024 on September 25th in Washington, DC. Register here by September 1st.
