TL;DR: LLMs commoditize transformation. They take information in one form and reshape it into another. They work best as a single “mushy step” in an otherwise deterministic workflow. The constraint for many white-collar tasks has shifted from creation to judgment. Organizations that understand this will build remarkable things.
In 1947, a pair of Bell Labs physicists invented the first transistor. Transistors had several advantages over vacuum tubes: they were faster, smaller, and more energy-efficient by orders of magnitude. While the significance of the invention was clear, the transition from vacuum tubes to transistors did not happen overnight. It required years of experimentation and design, proving out new applications one by one. Over time, the transistor fundamentally changed what was possible by commoditizing signal switching and amplification.
Large language models (LLMs) will similarly commoditize something. The disappointment many people feel with AI today comes from misapplying the technology, and that misapplication stems from a misunderstanding of what LLMs actually commoditize.
What LLMs Actually Commoditize
The transistor did not think. It did not replace engineers. It performed a specific function (switching and amplifying electrical signals) faster, smaller, and cheaper than what came before. Engineers who understood this built remarkable things that birthed the world we live in today. The transistor would have seemed useless if engineers had tried to use it for something it couldn’t do.
LLMs have a similar core utility: they commoditize transformation—the work of taking information in one form and reshaping it into another. Summarizing, translating, reformatting, synthesizing. These tasks previously required significant human labor. Now they don’t.
Synthesizing a 50-page technical document into a 2-page executive summary takes a human hours at best; an LLM can do it in a minute flat. Analyzing trends in customer feedback across twelve languages might take a team weeks; now it takes a machine an hour. API documentation that used to be updated annually, because updating it was an undesirable and low-priority chore, is now always up to date with minimal human effort.
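To make this concrete, here is a minimal sketch of transformation as a single function. It uses the OpenAI Python SDK as one example client; any chat-completion API works the same way, and the model name and prompt are placeholders, not recommendations.

```python
# A minimal sketch of an LLM used as a pure transformer: a long
# technical document in, a short executive summary out.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(document: str) -> str:
    """Transform a long technical document into a short executive summary."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you have access to
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize the following technical document into a "
                    "two-page executive summary for a non-technical "
                    "audience. Preserve figures and dates exactly."
                ),
            },
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content
```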
LLMs are built on an architecture called the transformer, and that name is not accidental. These tools are meant for transformation.
The analogy isn’t perfect. Transistors are deterministic: Input A always produces Output B. LLMs are probabilistic: Input A produces Output B-ish, most of the time. That’s a significant difference! You can’t build reliable systems on unreliable foundations. You can, however, insert a single unreliable step into an otherwise reliable system and get remarkable-yet-predictable results.
The Mushy Step in a Rigid Workflow
LLMs work best as a single “mushy step” in an otherwise deterministic workflow.
Traditional software is rigid. Input A produces Output B, every time. That rigidity is what makes software reliable; it is also why software struggles with tasks requiring flexibility, ambiguity, or natural language understanding.
LLMs are the opposite: probabilistic, flexible, capable of handling ambiguity. That flexibility comes with a trade-off, though: LLMs are unreliable when chained together without oversight.
The math is straightforward. A fully agentic workflow with five LLM-powered steps has five opportunities to fail, and those failure rates compound. A five-step workflow with one AI-powered step and four deterministic steps has one opportunity for AI-related failure. Five LLM-powered steps, each with a 95% success rate, have a combined 77% success rate (0.95^5 ≈ 0.77).
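The arithmetic is easy to check yourself; the 95% per-step success rate here is illustrative, not measured:

```python
# Back-of-envelope: end-to-end success when every step is an LLM step.
per_step_success = 0.95

for steps in range(1, 6):
    end_to_end = per_step_success ** steps
    print(f"{steps} LLM step(s): {end_to_end:.0%} end-to-end success")

# 1 LLM step(s): 95% end-to-end success
# ...
# 5 LLM step(s): 77% end-to-end success
#
# One mushy step among four deterministic ones stays at ~95%,
# assuming the deterministic steps (approximately) never fail.
```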
If you have spent any time debugging complex systems, you recognize this pattern. It is the same reason we do not put five developers in a room and ask them to read project requirements from one to the next like a game of telephone. Each handoff introduces error. The errors compound. By the end, you are nowhere near where you started.
The successful deployments I have seen treat LLMs as a process component, not a process replacement. Often, these deployments have clear boundaries and human oversight. They place deterministic rails around probabilistic capabilities, and they bring in human judgment to handle the ambiguity the model cannot resolve.
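In code, those rails can be as simple as a schema check, a bounded retry, and an escalation path. The sketch below is hypothetical: `llm_complete` stands in for whatever client you actually use, and ticket extraction is just an example task.

```python
# A hypothetical sketch of deterministic rails around one mushy step.
import json

MAX_ATTEMPTS = 2
REQUIRED_FIELDS = {"title", "severity", "summary"}

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def escalate_to_human(raw_output: str) -> dict:
    raise NotImplementedError("route to a human review queue here")

def extract_ticket(report: str) -> dict:
    """One probabilistic step, fenced by deterministic validation."""
    prompt = (
        "Extract a JSON object with keys title, severity, and summary "
        f"from this bug report:\n{report}"
    )
    raw = ""
    for _ in range(MAX_ATTEMPTS):
        raw = llm_complete(prompt)
        try:
            ticket = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry rather than pass it downstream
        if isinstance(ticket, dict) and REQUIRED_FIELDS <= ticket.keys():
            return ticket  # validated; deterministic steps take over from here
    return escalate_to_human(raw)  # unresolved ambiguity goes to a person
```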
The Slop Problem Is a Misuse Problem
The frustration with AI-generated “slop” flooding the internet is not evidence that the technology does not work. It is evidence of pervasive misuse.
When someone prompts an LLM with a sentence or two and publishes whatever comes out, they are asking the model to create from nothing. That is not its strength. LLMs generate without intent, judgment, or care about the outcome. (You may be just as disappointed giving a human author the same meager context for your task!)
Hand that same model a technical specification and ask it to identify inconsistencies? Provide customer interview transcripts and ask it to surface recurring themes? You get usable output, because you gave the LLM the context it needed to do something useful.
Research from Harvard Business School confirms this. When consultants at Boston Consulting Group used GPT-4 for tasks within the model’s capability frontier, they completed more tasks, faster, and with higher quality. When they applied the same tool to tasks outside that frontier, they performed significantly worse than consultants who worked without AI. The researchers called it a “jagged technological frontier.” There is a capability boundary; you just may not notice it until you’re deep in a failure state.
Judgment Is the New Constraint
When any technology commoditizes a function, adjacent functions become relatively more valuable. The transistor made raw switching speed cheap; system architecture became the constraint. Word processors made document production cheap; thinking clearly about what to write became the differentiator.
If you think about this through the lens of the Theory of Constraints, the bottleneck moves. Transformation used to be a constraint in knowledge work. Skilled people spent hours reshaping information from one format to another. Now that constraint is lifting. Throughput becomes limited by something else.
The new constraint is judgment, and the evidence is everywhere: people share AI-generated content with no concern for how it will be received. LLMs lack intent and have no stake in outcomes. Only people can understand why something matters to their organization at this moment, and that discernment comes from deep knowledge (context!) of their work.
What Judgment Means in Practice
If transformation is cheap and judgment is the constraint, organizations will ask new questions across every aspect of their work:
Hiring: Are we selecting for people who can create a work product, or for people who can evaluate and guide one? Humans are needed to guide and evaluate work, not necessarily to produce all of it.
Process: Are we using the right tools in the right sequence? Where is transformation needed, and where can errors be tolerated? Are we inserting human or automated verification checkpoints? The compounding failure rate of sequential LLM steps makes oversight an architectural necessity.
Strategy: What decisions require deep context that can’t be fully captured in a prompt? Those decisions are where human attention should concentrate.
Cheap transformation changes the skills we value in people, but does not replace the need for skilled people.
Economists say that technology transforms tasks, not jobs. Jobs are bundles of tasks. Some tasks get automated, some get augmented, and new tasks emerge. The bundle (“the job”) is changed, not eliminated. A Stanford study tracking 5,000+ customer service agents found AI assistance improved average productivity by 14%, but the gains were uneven. Novice workers improved 34%. Highly skilled workers saw minimal change. The AI compressed the skill distribution by giving everyone access to patterns that previously required years to develop. Tasks shifted. Jobs remained.
For jobs whose tasks include many “mushy” transformation steps, the strategic aspects of the work will become more valuable relative to execution. In software development, straightforward requirements can be transformed into code, so the code-writing task diminishes. The prominent tasks become specifying requirements, understanding goals and needs, designing architecture, solving complex problems, optimizing performance, ensuring the product fulfills its requirements, and guiding adjustments as real-world use generates feedback.
One Google engineering leader calls this the “70% problem.” Non-engineers using AI tools can quickly reach 70% of a solution. That final 30% (handling edge cases, ensuring security, building maintainable systems) still requires human expertise. Debugging that 70% when it breaks requires understanding that the AI never had.
Anyone who has inherited a codebase from a “hero” developer knows this feeling. The code works until it does not. When it breaks, nobody understands why.
Where We’re Seeing Results
For Twin Sun and our clients, the pattern is consistent: workflows with a single, well-bounded LLM step outperform both fully manual processes and fully LLM-driven ones.
Requirements synthesis takes time, especially with new clients. We often spend hours reviewing manually typed notes, meeting transcripts, and requirements documents before providing a simple high-level quote for a new project. Now, we use an LLM to itemize product requirements, which a human can quickly review and refine to produce a final quote.
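The shape of that flow looks roughly like the sketch below. It is hypothetical rather than our production code: helper names and prompts are illustrative, and `llm_complete` again stands in for your actual client.

```python
# A rough sketch of a requirements-synthesis flow: the LLM itemizes,
# a person reviews and refines before anything becomes a quote.
from pathlib import Path

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def itemize_requirements(sources: list[Path]) -> str:
    """Mushy step: transform raw notes and transcripts into a draft list."""
    corpus = "\n\n---\n\n".join(p.read_text() for p in sources)
    return llm_complete(
        "Itemize every product requirement in these notes as a markdown "
        "checklist, one requirement per line. Do not merge or invent any.\n\n"
        + corpus
    )

def draft_for_review(sources: list[Path], out: Path) -> None:
    """Deterministic steps: gather inputs, write a draft a human will edit."""
    out.write_text(
        "# Draft requirements (LLM-generated; review before quoting)\n\n"
        + itemize_requirements(sources)
    )
```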
Specification work is diminished. The need for good judgment remains.
Software upgrades are also changing. In the past, it would take us 2-3 days of mostly manual work to upgrade software libraries, fix regressions, and validate that software is ready for deployment. Now, we have workflows that feed release notes, upgrade instructions, and security tool scan results to an LLM to devise an upgrade plan. A combination of internal tooling, automated tests, and visual inspection steps is then used to validate the success of software upgrades before manual testing begins. A significant codebase upgrade can be validated with confidence in a few hours, versus days of manual work.
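In outline, the workflow looks something like this. Commands, paths, and the client stub are illustrative, not our actual tooling: one mushy planning step, then deterministic gates before any human time is spent.

```python
# A hypothetical outline of an upgrade workflow with deterministic gates.
import subprocess
from pathlib import Path

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def plan_upgrade(release_notes: Path, scan_results: Path) -> str:
    """Mushy step: turn vendor notes and scanner output into an upgrade plan."""
    return llm_complete(
        "Produce a step-by-step library upgrade plan from these inputs.\n\n"
        f"Release notes:\n{release_notes.read_text()}\n\n"
        f"Security scan:\n{scan_results.read_text()}"
    )

def validate_upgrade() -> bool:
    """Deterministic gates: the upgrade earns manual testing only if these pass."""
    checks = [
        ["npm", "audit", "--audit-level=high"],  # illustrative commands
        ["npm", "test"],
    ]
    return all(subprocess.run(check).returncode == 0 for check in checks)
```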
The code writing is diminished. The need for good judgment remains.
We are also recognizing that the task of generating code is increasingly solvable with LLMs and supporting tools. Code generation has rarely been the constraint on moving faster with software development; we estimate that only 30% of a developer’s time is devoted purely to typing (brainstorming, planning, executing, and iterating take the majority). That typing time will increasingly be replaced with time devoted to judgment: reviewing generated software, iterating on the specification, and focusing on the more innovative aspects of building software. Our team is already adopting generative AI tools for software development and naturally shifting its focus to discernment. Is this tool doing the right thing? Is this the right tool? Is the specification clear enough for the tool to succeed?
In short, we are assuming that AI capabilities will continue improving, but that human oversight and orchestration will remain necessary for organizations to achieve their software-driven goals. With the capabilities LLMs give us today, there is ample room for most companies to materially benefit from the commoditization of transformation.
A Note of Humility
I want to be careful not to oversell this framing. The strongest counterargument is temporal: the current limitations may be temporary. Scaling laws have continued to hold for newer generations of language models. If investment in AI research and infrastructure continues (a big “if”), I have seen no convincing reason to expect progress to stop.
However, my predictions about the future may well be wrong. What I describe is the situation today, based on my experience: LLMs are great transformers, excelling at reshaping information. They are task-takers, not job-takers.
The Transistor Lesson
The transistor did not replace electrical engineers. It gave them a new building block that made previously impossible things possible. The engineers who understood what transistors did—and did not pretend they could do more—built the modern world.
We are in the early years of a similar transition. LLMs are reducing the cost of information transformation toward zero. The organizations and individuals who understand that LLMs transform rather than think will build remarkable things. Those expecting LLMs to think for them will be disappointed.
The transistor required human engineers who understood its purpose. So do LLMs.
Dave Lane is CEO of Twin Sun, a software development agency finding better ways to build software. If you’re trying to figure out where AI fits in your software strategy, reach out to our team.
