
Critiq team

  • philosophy

The trust gap in AI-assisted coding, and what inspectable feedback looks like

Developers use AI assistants daily but often distrust review feedback. Inspectable rules, evidence, and local checks close that gap.

Most teams now write code with an assistant at the keyboard. Autocomplete, refactors, test stubs, and “explain this diff” are normal. The same developers who accept generated patches often ignore automated review comments on the pull request. That is not hypocrisy. It is a trust gap, and it will not close with louder bots or vaguer summaries.

If feedback cannot be traced to a rule, reproduced locally, and argued about on the merits, it reads like opinion. Opinion from a tool you did not configure feels like noise. Opinion from a black box feels like risk. The way out is not “more AI review.” It is feedback that behaves like engineering: inspectable, severitized, and tied to evidence you can verify before you merge.

Why the trust gap shows up in PRs

AI-assisted coding increases throughput. It also increases the number of changes a human skimmed instead of authored line by line. Reviewers compensate by leaning on automation, but only when the automation earns credibility. A comment that says “this might be insecure” without a stable identifier, a severity you can compare across files, or a fix you can evaluate in context does not reduce cognitive load. It adds another voice to an already noisy thread.

Generic PR bots amplify the problem. They summarize diffs in fluent prose, flag “potential issues” without reproducible steps, and rarely point to a definition you can open in the repo. When the model changes or the prompt shifts, the same code might pass on Tuesday and fail on Thursday. Teams learn to treat those comments as suggestions from an intern who never files a ticket: sometimes useful, never authoritative.

  • No rule ID: you cannot look up what was violated or tune policy
  • No evidence chain: the bot does not show the fact the engine matched on
  • No local reproduction: you cannot run the same check before you push
  • No stable severity: “consider fixing” does not map to merge policy

Developers are right to be skeptical. Merge decisions need accountability. If a check cannot be named, versioned, and re-run on a laptop, it does not belong in the same category as a failing unit test or a compiler error.

What inspectable feedback looks like

Inspectable feedback starts with identity. Every finding should cite a rule ID, a stable string such as ts.security.hardcoded-credentials, that maps to a rule file in a catalog you can read. The ID is the contract between the comment on the PR, the JSON from your CI job, and the documentation your team trusts. When someone asks “why did this fire?”, the answer is not a paraphrase. It is a path to the rule.

Next comes severity and scope. Severity should be coarse enough to gate merges and fine enough to prioritize fixes. Scope should say which language, which paths, and which fact triggered the match, for example a literal secret in source, an unsafe SQL interpolation, or a floating promise you never awaited. The reviewer should see the same structured fields in the terminal, in SARIF, and in the inline comment.
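As a rough illustration of those structured fields, here is a minimal TypeScript sketch. The field names ruleId, severity, location, and message are the ones this post mentions; the nesting, the three-level severity scale, and the language.category.name reading of the rule ID are assumptions inferred from the example ID, not Critiq's documented schema. The helper parseRuleId is hypothetical.

```typescript
// Assumed severity scale; the post only shows "high" explicitly.
type Severity = "low" | "medium" | "high";

// Hypothetical finding record. Field names come from the post;
// the exact nesting of `location` is an assumption.
interface Finding {
  ruleId: string; // e.g. "ts.security.hardcoded-credentials"
  severity: Severity;
  location: { file: string; line: number };
  message: string;
}

interface ParsedRuleId {
  language: string;
  category: string;
  name: string;
}

// Assumed convention: lowercase, dot-separated, at least three segments
// (language, category, rule name), matching the example ID in the text.
function parseRuleId(ruleId: string): ParsedRuleId | null {
  const match = /^([a-z0-9]+)\.([a-z0-9-]+)\.([a-z0-9-]+(?:\.[a-z0-9-]+)*)$/.exec(ruleId);
  if (match === null) return null;
  const [, language = "", category = "", name = ""] = match;
  return { language, category, name };
}

const sample: Finding = {
  ruleId: "ts.security.hardcoded-credentials",
  severity: "high",
  location: { file: "src/auth/signing.ts", line: 14 },
  message: "Literal signing key in source.",
};

// A stable ID parses into its parts; free-form prose does not.
const parsedSample = parseRuleId(sample.ruleId);
const parsedProse = parseRuleId("this might be insecure");
```

The point of the sketch is that a stable ID is something a program can parse and look up, while a prose warning is not.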

A concrete comment shape

Compare a vague bot note with a rule-backed finding. The first is prose you cannot audit. The second is a small record you can diff against policy.

Rule: ts.security.hardcoded-credentials
Severity: high
File: src/auth/signing.ts:14

Finding: Literal signing key in source.

Fix: Load the key from environment or your secrets manager; do not commit material secrets.

References:
- Rule: @critiq/rules (catalog)
- CWE-798: Use of Hard-coded Credentials

That shape is deliberately boring. Boring is good. It means your security lead can allowlist a rule, your tech lead can downgrade severity in config, and a new hire can read the rule text without asking Slack whether the bot “meant it.” The fix line is actionable; the references tie the finding to standards you already track.
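To show how little machinery that shape needs, the sketch below renders a finding record into the same plain-text layout. The ReviewFinding type and renderComment function are hypothetical names used for illustration; they are not part of the Critiq CLI or critiq-action.

```typescript
// Hypothetical record holding the fields shown in the comment above.
interface ReviewFinding {
  ruleId: string;
  severity: string;
  file: string;
  line: number;
  finding: string;
  fix: string;
  references: string[];
}

// Render the record as plain text, one labelled field per line.
function renderComment(f: ReviewFinding): string {
  const lines: string[] = [
    `Rule: ${f.ruleId}`,
    `Severity: ${f.severity}`,
    `File: ${f.file}:${f.line}`,
    "",
    `Finding: ${f.finding}`,
    "",
    `Fix: ${f.fix}`,
  ];
  // The references section is omitted entirely when there is nothing to cite.
  if (f.references.length > 0) {
    lines.push("", "References:");
    for (const ref of f.references) {
      lines.push(`- ${ref}`);
    }
  }
  return lines.join("\n");
}

const exampleFinding: ReviewFinding = {
  ruleId: "ts.security.hardcoded-credentials",
  severity: "high",
  file: "src/auth/signing.ts",
  line: 14,
  finding: "Literal signing key in source.",
  fix: "Load the key from environment or your secrets manager; do not commit material secrets.",
  references: ["Rule: @critiq/rules (catalog)", "CWE-798: Use of Hard-coded Credentials"],
};

const renderedComment = renderComment(exampleFinding);
```

Because the output is a pure function of the record, the terminal, the CI log, and the inline comment can all be generated from the same data.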

Deterministic checks vs generic AI review

Critiq is built on deterministic, rule-backed analysis, not on shipping an AI reviewer today. The open source CLI loads the @critiq/rules catalog, evaluates your tree locally, and emits findings you can inspect. The same rule that fires in CI is the rule you read in the repo. That is a different product promise than “we read your PR and wrote a paragraph.”

Hosted AI review and premium packs are part of the longer roadmap, but they are not the foundation of trust. Trust starts when every result is explainable without a model card: rule ID, engine version, file, line, severity, and remediation. AI-assisted coding can stay in your editor; the merge gate should behave like infrastructure you operate.

  • Generic AI PR bots: fluent, adaptive, hard to reproduce locally
  • Rule-backed scanners: stable IDs, a catalog you can fork, the same output in CI and on your machine
  • Critiq today: OSS CLI, @critiq/rules, critiq-action for GitHub; findings tied to inspectable rules

We are not arguing assistants are bad. They are excellent for exploration. The gap is review feedback that inherits their opacity. When only the cloud can explain why a comment appeared, you have traded one black box for another. Local-first checks reverse that: the evidence runs where your code lives.

Teams that adopt this habit report a quieter PR thread: fewer drive-by opinions, more threads that end with “disabled rule X until we migrate” or “fixed per ts.security.*.” That is the bar for merge automation: not whether the comment sounds confident, but whether a second engineer can verify it without a vendor dashboard.

Run the check, then read the rule

Closing the trust gap is a habit, not a purchase. Run critiq check from the repo root before you open the PR. When something fires, open the rule in the catalog (or your fork) and read what it actually matches. If the finding is wrong, you have a specific rule to fix or disable, not a mystery model mood.

npx critiq check .
critiq check --format json .

JSON output is useful for CI and for comparing runs across branches. The fields you care about (ruleId, severity, location, message) should stay stable so critiq-action can post inline comments that match what you saw locally. If local and CI disagree, you debug versions and config, not prompt drift.
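Comparing two runs can be as simple as keying each finding and taking set differences. The sketch below assumes each run has already been parsed into an array of findings with the four fields named above; the shape of location (file plus line) and the helper names findingKey and diffRuns are assumptions for illustration, not the CLI's documented JSON schema.

```typescript
// Assumed finding shape; only the four field names come from the post.
interface Finding {
  ruleId: string;
  severity: string;
  location: { file: string; line: number };
  message: string;
}

// Identify a finding by rule and position. The message is left out on
// purpose so a reworded message does not look like a new finding.
function findingKey(f: Finding): string {
  return `${f.ruleId}|${f.location.file}|${f.location.line}`;
}

// Build a lookup of the keys present in one run.
function indexByKey(findings: Finding[]): Record<string, boolean> {
  const index: Record<string, boolean> = {};
  for (const f of findings) {
    index[findingKey(f)] = true;
  }
  return index;
}

// Findings only in `head` were introduced; findings only in `base` were resolved.
function diffRuns(base: Finding[], head: Finding[]): { introduced: Finding[]; resolved: Finding[] } {
  const inBase = indexByKey(base);
  const inHead = indexByKey(head);
  return {
    introduced: head.filter((f) => !inBase[findingKey(f)]),
    resolved: base.filter((f) => !inHead[findingKey(f)]),
  };
}

const mainRun: Finding[] = [
  { ruleId: "ts.security.hardcoded-credentials", severity: "high", location: { file: "src/auth/signing.ts", line: 14 }, message: "Literal signing key in source." },
];
const branchRun: Finding[] = []; // the branch removed the literal key

const runDiff = diffRuns(mainRun, branchRun);
```

One caveat with this keying: a finding whose line number shifts because of unrelated edits shows up as one resolved plus one introduced. Keying on a snippet hash instead of the line would avoid that, at the cost of more bookkeeping.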

In GitHub Actions, wire the same check with critiq-action so reviewers see rule IDs on the diff. Treat high-severity security findings like any other failing policy you own. Document exceptions in code or config with the rule ID in the commit message so the exception is searchable six months later.
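A merge gate along these lines can be sketched in a few lines of TypeScript. Everything here is hypothetical: the GatePolicy shape, the three-level severity scale, and the evaluateGate function are illustrations of the idea, not configuration or behavior that critiq-action is known to provide.

```typescript
// Assumed severity scale, ordered from least to most serious.
type Severity = "low" | "medium" | "high";

interface Finding {
  ruleId: string;
  severity: Severity;
  location: { file: string; line: number };
  message: string;
}

// Hypothetical policy: block at or above `failAt`, unless the rule has a
// documented exception. The reason keeps the exception searchable later.
interface GatePolicy {
  failAt: Severity;
  exceptions: { ruleId: string; reason: string }[];
}

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

function evaluateGate(
  findings: Finding[],
  policy: GatePolicy
): { pass: boolean; blocking: Finding[]; waived: Finding[] } {
  const blocking: Finding[] = [];
  const waived: Finding[] = [];
  for (const f of findings) {
    // Findings below the threshold are reported elsewhere but never gate.
    if (SEVERITY_RANK[f.severity] < SEVERITY_RANK[policy.failAt]) continue;
    const excepted = policy.exceptions.some((e) => e.ruleId === f.ruleId);
    if (excepted) {
      waived.push(f);
    } else {
      blocking.push(f);
    }
  }
  return { pass: blocking.length === 0, blocking, waived };
}

const gateFindings: Finding[] = [
  { ruleId: "ts.security.hardcoded-credentials", severity: "high", location: { file: "src/auth/signing.ts", line: 14 }, message: "Literal signing key in source." },
];

const strictResult = evaluateGate(gateFindings, { failAt: "high", exceptions: [] });
const waivedResult = evaluateGate(gateFindings, {
  failAt: "high",
  exceptions: [{ ruleId: "ts.security.hardcoded-credentials", reason: "disabled until we migrate" }],
});
```

Keeping waived findings in the result, rather than dropping them, means the exception stays visible on every run instead of silently disappearing.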

What we are optimizing for

Critiq’s public surface today is the OSS CLI, the rules catalog, and Action integration. We optimize for developers who want review feedback that is as inspectable as the code it comments on, and that is privacy-preserving, local-first, and honest about what shipped. Thoughtful AI-assisted coding in the editor can coexist with merge gates that do not ask you to trust a summary you cannot reproduce.

The trust gap closes when feedback looks like engineering artifacts: named rules, severities you can map to policy, fixes you can implement, and evidence you can re-run. Everything else is commentary.

Next steps

  • OSS CLI and rules: https://critiq.dev/products/oss
  • Documentation and install: https://docs.critiq.dev/
  • GitHub Action: https://critiq.dev/integrations/github-actions