The pre-flight check I run before shipping any AI feature

When AI features fail in production, it is usually for reasons the team could have caught in 30 minutes.

Here is the pre-flight check I run before shipping any AI feature. Seven questions. If you cannot answer all of them in plain language, you are not ready.

1. What happens when the model returns nonsense?

Not when it returns the wrong answer, but when it returns syntactically broken output: malformed JSON, hallucinated function calls, a refusal mid-stream. Your pipeline has to survive this. “It usually works” is not an answer.
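Here is a minimal Python sketch of surviving broken output. It assumes the feature expects a JSON object with `summary` and `tags` fields, which is a hypothetical schema. It also assumes `call_model` is whatever function wraps your provider. Parsing and validation act as a single gate. Anything that is not valid JSON of the right shape counts as a failure and gets one retry. After that, the code returns an explicit degraded result instead of raising.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for this feature: field name -> expected Python type.
REQUIRED_FIELDS = {"summary": str, "tags": list}

def parse_model_output(raw: Optional[str]) -> Optional[dict]:
    """Return the parsed payload, or None if the output is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed or truncated JSON, a plain-text refusal, or no text at all.
        return None
    if not isinstance(data, dict):
        return None
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None  # missing field or wrong type
    return data

def generate_with_guard(call_model: Callable[[], str], max_attempts: int = 2) -> dict:
    """Call the model up to max_attempts times; never raise on bad output."""
    for _ in range(max_attempts):
        parsed = parse_model_output(call_model())
        if parsed is not None:
            return parsed
    # Every attempt was unusable: return an explicit degraded result.
    return {"summary": "", "tags": [], "degraded": True}
```

A production pipeline would probably use a schema library and log every rejected output. The principle stays the same: raw model text never reaches downstream code unchecked.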

2. What is the cost ceiling per user per day?

Token costs compound silently. A power user with a runaway loop can burn $80 in tokens overnight. If you do not have a hard cap with graceful degradation, you have an outage waiting to happen.
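One way to enforce a hard cap with graceful degradation, sketched in Python. Several parts are assumptions for illustration: the soft and hard limits, the mode names, and the in-memory counter. A real deployment would need shared storage so the cap holds across servers. The check runs before each call using an estimated token count, so one large request cannot push a user past the ceiling.

```python
from datetime import date
from typing import Dict, Tuple

class DailyTokenBudget:
    """Per-user, per-day token ceiling with a soft 'degraded' tier below the hard cap."""

    def __init__(self, soft_cap: int, hard_cap: int) -> None:
        if not 0 < soft_cap <= hard_cap:
            raise ValueError("expected 0 < soft_cap <= hard_cap")
        self.soft_cap = soft_cap
        self.hard_cap = hard_cap
        # In-memory counter keyed by (user_id, day); use shared storage in production.
        self._used: Dict[Tuple[str, date], int] = {}

    def mode_for(self, user_id: str, day: date, estimated_tokens: int = 0) -> str:
        """Decide how to serve the next request before calling the model."""
        projected = self._used.get((user_id, day), 0) + estimated_tokens
        if projected > self.hard_cap:
            return "blocked"   # no model call; tell the user the daily limit is reached
        if projected > self.soft_cap:
            return "degraded"  # e.g. a cheaper model, shorter context, or cached answers
        return "full"

    def record(self, user_id: str, day: date, tokens: int) -> None:
        """Add the tokens a completed call actually consumed."""
        key = (user_id, day)
        self._used[key] = self._used.get(key, 0) + tokens
```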

3. Which decisions are reversible and which are not?

Sending an email is irreversible. Saving a draft is reversible. Deleting a record is irreversible. Showing a recommendation is reversible. Your AI’s autonomy budget should map directly to this — high for reversible, near-zero for irreversible.
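The reversibility mapping can live in an explicit policy table. This sketch takes hypothetical action names from the examples above. It treats "near-zero autonomy" as always requiring human approval. Unknown actions default to irreversible, so a hallucinated tool call fails closed.

```python
from enum import Enum

class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"

# Hypothetical action names, classified as in the examples above.
ACTION_POLICY = {
    "save_draft": Reversibility.REVERSIBLE,
    "show_recommendation": Reversibility.REVERSIBLE,
    "send_email": Reversibility.IRREVERSIBLE,
    "delete_record": Reversibility.IRREVERSIBLE,
}

def decide(action: str) -> str:
    """Return 'execute' or 'needs_approval' for an action the model proposes."""
    # Anything not in the table is treated as irreversible (fail closed).
    reversibility = ACTION_POLICY.get(action, Reversibility.IRREVERSIBLE)
    if reversibility is Reversibility.REVERSIBLE:
        return "execute"
    return "needs_approval"
```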

4. What does the audit log look like?

If a user complains the AI did something they did not authorise, can you reconstruct exactly what happened? Inputs, model output, tool calls, final action — all traceable. If not, you cannot debug, cannot comply, and cannot trust your own system.
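Here is a minimal sketch of a traceable audit log. It writes one JSON line per step and gives every request a shared `trace_id`. Both are illustrative choices, not a standard format. Reconstructing an incident then comes down to filtering the log by that ID. The example request and its payloads are made up.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import List

def audit_event(trace_id: str, step: str, payload: dict) -> str:
    """Serialise one step of an AI interaction as a single JSON line."""
    record = {
        "trace_id": trace_id,  # ties every step of one request together
        "step": step,          # "input", "model_output", "tool_call" or "final_action"
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return json.dumps(record, sort_keys=True)

def reconstruct(lines: List[str], trace_id: str) -> List[dict]:
    """Rebuild the ordered history of one request from an append-only log."""
    events = (json.loads(line) for line in lines)
    return [event for event in events if event["trace_id"] == trace_id]

# Example: one request that ends in a tool call (all values are illustrative).
trace = str(uuid.uuid4())
log = [
    audit_event(trace, "input", {"user_id": "u1", "text": "Move my 3pm to 4pm"}),
    audit_event(trace, "model_output", {"model": "example-model", "raw": "..."}),
    audit_event(trace, "tool_call", {"name": "move_meeting", "args": {"to": "16:00"}}),
    audit_event(trace, "final_action", {"status": "meeting_moved"}),
]
```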

5. What is the fallback when the model is unavailable?

Provider outages happen. Rate limits happen. Your AI feature should degrade to “useful without AI” or “clearly unavailable” — never silently broken.
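A sketch of the "useful without AI" path in Python. `ModelUnavailable` is a hypothetical name standing in for whatever your client raises on outages or rate limits. `TimeoutError` and `ConnectionError` are built-in Python exceptions. The fallback carries a visible notice so the UI can tell the user the AI is unavailable instead of failing silently.

```python
from typing import Callable

class ModelUnavailable(Exception):
    """Hypothetical error your model client raises on outages or rate limits."""

FALLBACK_TEXT = "Thanks for your message. We will get back to you shortly."

def suggest_reply(message: str, call_model: Callable[[str], str]) -> dict:
    """Return an AI suggestion, or a clearly labelled non-AI fallback."""
    try:
        return {"source": "ai", "text": call_model(message), "notice": None}
    except (ModelUnavailable, TimeoutError, ConnectionError):
        # Degrade visibly: a canned reply the user can still use, plus a notice for the UI.
        return {
            "source": "fallback",
            "text": FALLBACK_TEXT,
            "notice": "AI suggestions are temporarily unavailable.",
        }
```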

6. How do you know if it is getting worse?

Models drift. Providers update the models behind their APIs, and the inputs your users send change over time, so prompts that worked six months ago can quietly degrade. If you do not have ongoing evals, you are flying blind and will not know until users complain.
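Ongoing evals can start very small. This sketch assumes a hypothetical year-extraction feature with a hand-written golden set of inputs and pass/fail checks. It flags a regression when the score falls more than 0.05 (five percentage points) below the last accepted baseline. The idea is to run it on a schedule and on every prompt or model change.

```python
from typing import Callable, List, Tuple

# Hypothetical golden set for a year-extraction feature: (input, check on output).
GOLDEN_SET: List[Tuple[str, Callable[[str], bool]]] = [
    ("Extract the year: 'Founded in 1998'", lambda out: out.strip() == "1998"),
    ("Extract the year: 'Since 2012, we have grown'", lambda out: out.strip() == "2012"),
    ("Extract the year: 'no date here'", lambda out: out.strip().lower() == "none"),
]

def pass_rate(run_prompt: Callable[[str], str]) -> float:
    """Fraction of golden cases the current prompt and model get right."""
    passed = sum(1 for prompt, check in GOLDEN_SET if check(run_prompt(prompt)))
    return passed / len(GOLDEN_SET)

def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True if the score dropped more than `tolerance` below the last accepted baseline."""
    return current < baseline - tolerance
```

A real golden set would be larger and drawn from actual user traffic. What matters is that the baseline is recorded, so "worse" is a number rather than a feeling.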

7. What is the human override path?

Every AI decision should have a way for a human to overrule it without escalating to engineering. If your support team cannot undo what the AI did, you do not have a feature — you have a liability.
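An override path can be as simple as two pieces. First, register an undo handler for every action type the AI may take. Second, expose an `override` function in your support tooling. The action kinds, the handler registry, and the ticket example are all hypothetical. When an action has no registered undo, such as a sent email, `override` returns `False`. That is a sign the action needs a different safeguard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class AIAction:
    action_id: str
    kind: str                            # e.g. "archive_ticket"
    data: dict
    overridden_by: Optional[str] = None  # who overruled the AI, if anyone

# Hypothetical registry: action kind -> function that reverses it.
UNDO_HANDLERS: Dict[str, Callable[[dict], None]] = {}

def undo_handler(kind: str):
    """Decorator that registers an undo function for one action kind."""
    def register(fn: Callable[[dict], None]) -> Callable[[dict], None]:
        UNDO_HANDLERS[kind] = fn
        return fn
    return register

def override(action: AIAction, operator: str) -> bool:
    """Reverse an AI action from a support tool. Returns False if it cannot be undone."""
    handler = UNDO_HANDLERS.get(action.kind)
    if handler is None or action.overridden_by is not None:
        return False
    handler(action.data)
    action.overridden_by = operator  # record the human decision for the audit trail
    return True
```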

The teams that ship AI reliably are not smarter. They just answer these seven questions before writing the first line of code.

There is an eighth question

It did not make the original list, but it matters more at month twelve than it does on launch day:

Who owns the prompt when the engineer who wrote it leaves?

Prompts are code. They have business logic embedded in them. They get tweaked, optimised, A/B tested. But almost no team treats them like code — no review process, no documentation, no clear owner. One year in, the prompts that power your product are tribal knowledge held by whoever happened to write them.

This is not a launch-day problem. It is a twelve-month problem, which is why most teams have not felt it yet.

Which of these eight has bitten your team hardest?