Critiq is an open source code review toolchain. The public CLI (`@critiq/cli`) and rules catalog (`@critiq/rules`) scan your repository locally and return deterministic findings you can use in the terminal or CI.

Does Critiq send my code to the cloud?

The OSS CLI runs locally on your machine or in your CI runner. It analyzes files in your checkout and does not upload your source to Critiq by default. Hosted Critiq Cloud features are optional and separate from the open source CLI.

How is this different from AI review tools?

Critiq findings come from explicit, testable rules, not probabilistic model output. You can inspect each rule, reproduce results, and tune what runs in CI. That makes feedback consistent and auditable.

Is Critiq open source?

Yes. The core engine, public CLI, and OSS rules catalog are open source under the Critiq GitHub organization. Premium rule packs and hosted product features are separate offerings.

Back to blog

March 14, 2026Critiq team

rules

Change-risk signals: what testing hygiene rules are really catching

Testing hygiene and change-risk signals: merge-confidence gaps, silent skips, flaky timers, and diff-scoped coverage holes before ship.

A green CI run tells you the pipeline finished. It does not always tell you whether the change you are about to merge still has the safety net you think it does. Testing hygiene rules exist to close that gap. They read test files and diffs the way a careful reviewer would, looking for signals that confidence is leaking out of the suite, not for stylistic perfection.

Critiq ships these as change-risk signals, not lecture notes. Each finding ties to a rule you can open, inspect, and tune. When a pull request touches auth, billing, or another critical path, the question is not “did someone write tests?” but “did this change preserve the evidence we rely on at merge time?” The rules below answer that question in specific, repeatable ways.

What “change-risk” means here

Change-risk is the distance between what a diff actually does and what your test suite still proves afterward. A refactor can leave behavior intact while removing the only assertion that would catch a regression. A hotfix can add a skip with good intentions and no ticket, which becomes permanent coverage debt three sprints later. A unit test that hits a live API can pass locally and flake in CI, which teaches teams to retry instead of fix.

Security and correctness rules catch dangerous code. Testing hygiene rules catch dangerous gaps in how we verify code. Both belong in the same review pass because both affect merge confidence. For syntax-level correctness gaps, see /blog/correctness-bugs-in-review. The difference is tone: these findings assume you want tests; they flag when the suite stops being trustworthy for the change at hand.

Skips without traceability

Rule: ts.testing.no-skipped-test-without-ticket (and polyglot equivalents such as go.testing.t-skip-without-ticket-reference, py.testing.pytest-skip-without-ticket-reference, java.testing.disabled-without-ticket-reference, rust.testing.ignore-without-ticket-reference, ruby.testing.skip-without-ticket-reference, and php.testing.mark-test-skipped-without-ticket-reference).

A skipped test is a conscious decision to stop checking something. That is sometimes correct, flaky upstream, environment missing, work tracked elsewhere. The risk signal is the skip with no adjacent traceability: no issue key, no GitHub URL, no TODO with an expiry, no accepted suppression comment. Those skips decay silently. The suite still reports green while a branch of behavior goes un exercised every run.

From a merge-confidence perspective, an untraceable skip is a hole with no owner. Reviewers cannot tell if it is a five-minute fix or a quarter-long deferral. Linking the skip to a tracker id or a dated note keeps the debt visible and gives CI a fair story: we know this is uncovered, we know why, and we know when to revisit it.

// Risk signal: skip with no ticket or expiry
it.skip('rejects expired sessions', () => {
  expect(revokeSession(token)).toBe(false);
});

// Traceable: reviewers know where to look
it.skip('rejects expired sessions, PROJ-482', () => {
  expect(revokeSession(token)).toBe(false);
});

Flaky timers and wall-clock tests

Rule: ts.testing.no-flaky-timer-test, plus sleep-style rules across Go, Java, PHP, Python, Ruby, and Rust (for example go.testing.time-sleep-in-unit-test, java.testing.thread-sleep-in-unit-test, py.testing.time-sleep-in-unit-test, ruby.testing.sleep-in-unit-test, rust.testing.thread-sleep-in-unit-test, php.testing.sleep-in-unit-test).

Real timers and sleeps in unit tests race CI load, parallel workers, and laptop battery settings. A test that waits 100ms might pass nine times and fail on the tenth, which erodes trust in the whole pipeline. The change-risk signal is not “never use time.” It is “this file asserts timing behavior without fake timers or a clock abstraction, so the result is non-deterministic under merge pressure.”

When a diff adds setTimeout, Date.now, or thread sleep inside a unit path, reviewers should ask whether the behavior under test is time logic or incidental delay. Fake timers (jest.useFakeTimers, vi.useFakeTimers, injected clocks) keep the assertion about the code, not about the scheduler on the runner that day.

// Risk signal: wall-clock wait in a unit file
await new Promise((r) => setTimeout(r, 200));
expect(cache.get('key')).toBeNull();

// Stabilized: advance fake time instead of sleeping
vi.useFakeTimers();
vi.advanceTimersByTime(200);
expect(cache.get('key')).toBeNull();

Real network in unit tests

Rule: ts.testing.no-network-call-in-unit-test and path-scoped polyglot rules (go.testing.real-network-in-unit-test, py.testing.real-network-in-unit-test, ruby.testing.real-network-in-unit-test, rust.testing.real-network-in-unit-test, java.testing.http-client-in-unit-test, php.testing.curl-in-unit-test). Integration and e2e paths are excluded by design; the signal targets unit-shaped files that still open real sockets.

Outbound HTTP in a unit test couples merge confidence to external uptime, DNS, rate limits, and data drift. A PR can be logically correct and still fail CI because a staging endpoint hiccuped. That noise trains teams to rerun jobs instead of reading failures, the opposite of deterministic review.

The rule does not argue against integration tests. It marks when a file labeled as unit work still depends on the network. Doubles, recorded fixtures, and in-memory servers preserve the scenario while keeping the merge signal about your diff, not about someone else’s API.

// Risk signal: live fetch in *.test.ts
const res = await fetch('https://api.example.com/v1/status');
expect(res.ok).toBe(true);

// Local confidence: mock or MSW handler
server.use(http.get('/v1/status', () => HttpResponse.json({ ok: true })));
const res = await client.status();
expect(res.ok).toBe(true);

Snapshot intent

Rule: ts.testing.no-snapshot-without-intent.

Snapshots can be excellent guardrails when reviewers know what behavior they freeze. Anonymous expect(...).toMatchSnapshot() calls churn in diffs without saying what regressed. The change-risk signal is a snapshot matcher with no string name and no // snapshot: intent comment on the preceding line, hard to review, easy to rubber-stamp.

Named snapshots and one-line intent comments turn a blob diff into a contract: “this protects the empty-state markup,” “this locks error copy.” That is merge confidence you can discuss in review instead of wave through because the snapshot file grew.

// Risk signal: unnamed snapshot, no intent
expect(renderPage()).toMatchSnapshot();

// Reviewable: name + intent
// snapshot: empty cart shows continue-shopping CTA
expect(renderPage({ cart: [] })).toMatchSnapshot('empty-cart');

Diff-scoped missing edge cases

Rule: ts.testing.no-missing-edge-case-tests.

This rule looks at project shape, not a single line. When a diff touches branch-heavy logic on a critical service path, many conditionals, guards, error returns, and the paired test file does not change in the same PR, that is a change-risk signal. The code moved; the evidence did not.

It is a heuristic, not a verdict. Small typo fixes should not demand new cases. Large control-flow edits without any test update are exactly when reviewers worry about un exercised boundaries. The finding invites a concrete question: which new branches, failures, or guards does this PR introduce, and where would we see them fail if they broke?

Used well, this rule complements diff-scoped CI (run tests touched by the change) by highlighting when the touched production file and its test file diverged in scope. Green on narrowed test runs plus a missing-edge-case signal means you may be proving less than the diff warrants.

Test-only imports in production

Rule: ts.testing.no-test-only-code-in-production.

Production modules that import test doubles, mock servers, or helpers from __tests__ trees carry a high-severity change-risk signal. So do NODE_ENV === "test" branches in shipping paths. Test harness coupling can ship dead code, accidental behavior splits, or dependencies that bundlers and tree-shaking were never meant to carry.

This is less about test style and more about boundary integrity. Shared neutral modules belong in src; test-only utilities stay behind test entrypoints. When a merge introduces a production import from a test folder, confidence in what actually runs in prod drops, even if unit tests pass.

// Risk signal: production imports test helper
import { buildMockUser } from '../__tests__/fixtures/user';

// Safer: shared factory in neutral module
import { buildUser } from '../factories/user';

Polyglot coverage without a separate playbook

The TypeScript rules above are the deepest set today: snapshots, diff-scoped edge cases, and production import boundaries are TS/JS-first. Across Go, Java, PHP, Python, Ruby, and Rust, Critiq ships parallel *.testing.* rules for the same merge-confidence themes, skips without ticket references, sleeps in unit paths, and real outbound calls in unit-shaped files.

You do not need a different mental model per language. Skip traceability, deterministic unit isolation, and network doubles answer the same question everywhere: does this change still have verifiable evidence when we merge? Polyglot rules let monorepos enforce that consistently without waiting for every language to reach feature parity.

Testing hygiene signals in pull request review

Treat findings as prompts, not failures on sight. A skip with PROJ-123 and an expiry is information. An untraceable skip on a payment path is a merge stop until someone owns the gap. Low-severity timer and snapshot findings might batch into a hygiene pass; high-severity production import findings deserve the same attention as a correctness bug.

In CI: run critiq check on the PR diff via /integrations/github-actions so testing rules surface next to security and quality findings.
In review: read the rule id and remediation summary, each maps to a catalog entry on docs.critiq.dev.
In triage: fix, suppress with traceability, or move the scenario to an explicit integration suite, but do not leave silent holes.

npx @critiq/cli check .
critiq check --format json --staged

Merge confidence is the product

Testing hygiene rules will never replace judgment about what to test. They make judgment easier by naming the gaps that most often slip through green CI: untracked skips, flaky isolation, anonymous snapshots, diffs that outrun their tests, and production paths that still depend on test-only code.

That is the thread through ts.testing.* and polyglot *.testing.*, not scolding about test purity, but readable signals about whether this merge still has the evidence you are betting on. Run the checks, open the rules, and decide from the output the same way you would for any other Critiq finding.