AI productivity is not a shortcut to competence
/ Bridgekeeper team
A new paper from Anthropic, “How AI Impacts Skill Formation”, reports a clean randomized experiment that should settle one question and open a harder one.
Fifty-two experienced Python developers were asked to learn Trio, an asynchronous library none of them had used before. Half had AI assistance. Half did not. Everyone took the same quiz afterward.
The headline number: the AI group scored 17% lower on the comprehension quiz, with a Cohen’s d of 0.738 and p = 0.010. The gap was largest on debugging questions, where the no-AI group had been forced to read more error messages and trace more code. The no-AI developers were slower to finish, but they understood more of what they had built.
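For readers who do not work with effect sizes every day, Cohen’s d is the difference between two group means expressed in units of their pooled standard deviation. The form below is the standard textbook definition; the symbols are our own labels, and we are assuming the paper uses this common pooled variant rather than a corrected one.

```latex
d = \frac{\bar{x}_{\text{no-AI}} - \bar{x}_{\text{AI}}}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

Here $n_1, n_2$ are the group sizes and $s_1, s_2$ the group standard deviations. Read this way, d = 0.738 says the two groups' average quiz scores sat roughly three quarters of a pooled standard deviation apart. By Cohen's conventional benchmarks (0.5 medium, 0.8 large), that is a medium-to-large effect.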
That is the easy finding. AI shortcuts comprehension. The harder finding is that not all AI use produced the same outcome.
The paper’s authors, Shen and Tamkin, clustered AI users into six interaction patterns. Three of them produced quiz scores below 40%:
- AI delegation. Hand the whole task to the model.
- Progressive AI reliance. Start independent, slide into delegation.
- Iterative AI debugging. Ask the model to fix every error.
Three patterns kept scores above 65%:
- Generation-then-comprehension. Ask for code, then ask follow-up questions about what it does.
- Hybrid code-and-explanation. Request code with an explanation in the same prompt.
- Conceptual inquiry. Ask the model questions about the concepts. Write the code yourself.
The dividing line is whether the human ever forms an independent mental model of the change. The patterns that preserved learning all involve the human doing some explanation work, either by asking, by reading, or by writing. The patterns that destroyed learning all involve handing the model a problem and accepting whatever it returns.
This is exactly what Bridgekeeper tries to enforce at review time. We ask the reviewer to explain a salient change in their own words before we show them what the model thinks it does. It is the conceptual-inquiry pattern, applied to code review. The author already wrote the code; the question is whether anyone, including the author, can articulate what it does.
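The explain-before-reveal idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Bridgekeeper's actual implementation: the names `ReviewGate` and `ExplanationRequired`, the `min_words` threshold, and the sample strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class ExplanationRequired(Exception):
    """Raised when the model summary is requested before the reviewer explains."""


@dataclass
class ReviewGate:
    """Hypothetical explain-before-reveal gate for one salient change."""

    model_summary: str                       # what the model thinks the change does
    min_words: int = 8                       # arbitrary floor, chosen for illustration
    reviewer_explanation: Optional[str] = None

    def submit_explanation(self, text: str) -> None:
        # Reject explanations too short to reflect any real mental model.
        if len(text.split()) < self.min_words:
            raise ValueError(f"explanation needs at least {self.min_words} words")
        self.reviewer_explanation = text.strip()

    def reveal_model_summary(self) -> str:
        # The model's view stays hidden until the reviewer has committed to their own.
        if self.reviewer_explanation is None:
            raise ExplanationRequired("explain the change in your own words first")
        return self.model_summary


if __name__ == "__main__":
    gate = ReviewGate(model_summary="Replaces the polling loop with a nursery of worker tasks.")
    try:
        gate.reveal_model_summary()
    except ExplanationRequired as err:
        print(f"blocked: {err}")
    gate.submit_explanation("The change spawns one task per job instead of polling in a loop.")
    print(gate.reveal_model_summary())
```

The ordering is the whole point: the reviewer's explanation is recorded before the model's summary becomes visible, so the reviewer cannot simply paraphrase it. A word count is a crude proxy for understanding. A real gate would need a better check, but the sequencing would stay the same.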
If your team is shipping AI-generated code without articulating what it does, your team’s competence is depreciating faster than the codebase is appreciating.