
What Insurance Auditors Look For in AI-Generated Therapy Notes
A practical guide for therapists using AI documentation tools who want their notes to survive insurance audits. Covers what auditors actually review, how AI-generated notes fail, specific red flags auditors look for, and step-by-step strategies to ensure your AI-assisted notes meet payer standards.
The Problem with AI Notes That Look Like AI Notes
The therapists who get audited are not usually the ones writing fraudulent notes. They are often the ones who adopted AI documentation tools without adjusting their review process, and whose notes ended up reading like they were generated from a template, because they were.
Insurance auditors are experienced reviewers who read hundreds of therapy records. They know what a real session with a real client in ongoing treatment looks like. They also know what copy-paste looks like, what auto-generated language looks like, and what a note that was never actually tied to a specific clinical moment looks like.
This guide is not about avoiding AI tools. Most of the tools on the market can produce compliant notes. The problem is that the output of those tools requires clinical judgment to finalize, and many therapists are treating AI-generated drafts as final documents.
Here is what auditors actually look for, where AI notes commonly fail, and how to close the gap.
What Insurance Auditors Actually Review
When a payer audits your therapy records, the review is rarely random. Retrospective audits are triggered by statistical outliers (you bill 90837 for more than 90% of sessions when the regional average is 60%), by complaints, or by probe audits that sample newly credentialed providers. Prepayment reviews are increasingly common for certain CPT codes and provider types.
Regardless of trigger, auditors are looking at the same five dimensions across every record they pull.
1. Medical Necessity
Medical necessity is the foundational requirement. Every session you bill must be clinically necessary for treatment of a diagnosed condition. That means your note needs to show:
- The diagnosis is still active and clinically supported
- The presenting symptoms or functional impairment documented this session justify the level of care
- Treatment is not maintenance, social support, or general wellness (unless the payer covers those)
Vague AI-generated language like "client continues to struggle with anxiety symptoms" does not establish medical necessity. What does establish it: specific functional impairments, recent symptom escalation, or documentation of why the client has not yet met their treatment goals.
2. Treatment Plan Alignment
The interventions you document in your session note must match what is in the treatment plan. If your treatment plan lists CBT for panic disorder with a goal of reducing avoidance behaviors, and your session note describes "supportive counseling and psychoeducation," an auditor will flag the mismatch.
AI tools that generate notes from session summaries are particularly prone to this error, because they describe what happened in the session without checking whether that matches the documented treatment approach.
3. Session-Specific Clinical Detail
Auditors are trained to distinguish individualized notes from templated ones. A session note that could apply to any client with the same diagnosis on any given week is not a good session note.
Session-specific detail includes:
- The specific content the client brought to the session
- Concrete behavioral observations (not "client appeared anxious" but what that looked like)
- What the client said or disclosed (in paraphrase, not verbatim unless clinically relevant)
- The therapist's clinical reasoning, not just the intervention name
4. CPT Code Accuracy
The CPT code you bill must match the documented session time. 90834 (individual therapy, 45 minutes) requires face-to-face time of 38-52 minutes. 90837 (individual therapy, 60 minutes) requires 53 or more minutes. Billing 90837 when your note documents a 45-minute session, or when you have multiple sessions with no documented start and end time, creates an audit flag that is easy for reviewers to find and hard for you to defend.
AI tools that do not prompt you to enter session duration can produce notes with no time documentation at all, or with generic language about session length.
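The time ranges above are mechanical enough to check before a claim goes out. Here is a minimal sketch of that check; the function name is illustrative, and the 30-minute code (90832, 16-37 minutes) is included alongside the two codes discussed above per the standard CPT time rules:

```python
def cpt_code_for_minutes(minutes):
    """Return the individual psychotherapy CPT code supported by the
    documented face-to-face time, or None if no timed code applies."""
    if 16 <= minutes <= 37:
        return "90832"  # 30-minute code: 16-37 minutes
    if 38 <= minutes <= 52:
        return "90834"  # 45-minute code: 38-52 minutes
    if minutes >= 53:
        return "90837"  # 60-minute code: 53 or more minutes
    return None         # under 16 minutes: no timed psychotherapy code

# A documented 45-minute session supports 90834, not 90837.
print(cpt_code_for_minutes(45))  # prints: 90834
```

Running a check like this against documented session times before billing catches the easiest-to-find audit flag described above: a 90837 claim backed by a note that documents fewer than 53 minutes.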
5. Timely Documentation
Most payers require notes to be completed within 24-72 hours of the session. Backdated documentation creates a chain of problems: it is often detectable from EHR metadata, it suggests notes were not written contemporaneously, and it opens questions about whether the note reflects what actually happened or was reconstructed later.
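Timeliness is likewise a timestamp comparison, which is exactly how payer systems evaluate it from EHR metadata. A hedged sketch, assuming the 72-hour outer bound mentioned above (individual payers may require 24 or 48 hours):

```python
from datetime import datetime, timedelta

def note_is_timely(session_end, note_signed, max_hours=72):
    """True if the note was signed after the session and within the
    payer's window. A negative delta (signature before session end)
    also fails, since it indicates a timestamp problem."""
    delta = note_signed - session_end
    return timedelta(0) <= delta <= timedelta(hours=max_hours)

session = datetime(2024, 5, 6, 14, 0)
signed = datetime(2024, 5, 8, 10, 0)  # 44 hours later
print(note_is_timely(session, signed))                # prints: True
print(note_is_timely(session, signed, max_hours=24))  # prints: False
```

The same 44-hour signature clears a 72-hour payer window and fails a 24-hour one, which is why it is worth knowing each payer's specific requirement rather than assuming the loosest standard.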
How AI-Generated Notes Fail Audits
AI documentation tools vary significantly in architecture. Ambient recording tools generate notes from transcribed session audio. Post-session text-entry tools, such as template-based systems, generate notes from your written inputs. Both types can produce compliant notes, and both can fail audits in predictable ways.
Generic Language That Lacks Individualization
The most common AI audit failure is a note that sounds plausible but contains no specific clinical detail. Here is an example that would fail medical necessity review:
"Client attended 60-minute individual therapy session. Client reported ongoing symptoms of depression and anxiety. Clinician provided CBT interventions. Client tolerated session well. Plan: continue therapy."
This note contains nothing that establishes this specific client's need for this specific session. An auditor reading this in a 20-note sample will see the same language across multiple sessions, or across multiple clients, and flag the entire record for review.
Interventions That Do Not Match the Treatment Plan
Consider a fictional therapist, Dr. Amara, who uses an AI ambient scribe for her outpatient practice. Her client Miguel has a treatment plan with goals around trauma processing using prolonged exposure. In a recent session, she and Miguel spent most of the session doing psychoeducation about the trauma response before doing any in-vivo work.
Her AI tool generated a note that described "psychoeducation and supportive counseling provided." Her treatment plan listed "Phase 2 PE protocol: in-vivo exposure exercises." An auditor reviewing her chart sees a note describing a supportive session against a treatment plan describing an active exposure protocol, with no explanation for the deviation.
The documentation failure is not the AI's fault. The AI described what happened. But a compliant note would document why the session deviated from the treatment plan, what the clinical rationale was, and what the plan is to return to the protocol.
Identical Language Across Multiple Sessions
AI tools that use the same template and receive similar inputs will produce notes that look nearly identical across sessions. This is one of the clearest signals to an auditor that the clinician is not reviewing or individualizing their notes.
An audit reviewer at a regional BCBS plan described this pattern in a 2024 industry compliance newsletter: when the Assessment section of ten consecutive notes contains word-for-word identical language, she treats that as evidence that the clinician is not documenting the actual session.
Session Time Documentation Errors
AI tools that generate notes without prompting for session duration create CPT code vulnerability. A note that says "60-minute session" without a documented start and end time is weaker than one with timestamps. Some payers require clock times in the progress note itself, not just in the billing record.
Progress Contradictions
AI tools working from session summaries can generate progress language that contradicts earlier notes in the chart. If session 12 documents "significant improvement in anxiety symptoms" and session 17 documents "client continues to present with severe anxiety," an auditor reviewing the chart for medical necessity will question whether treatment is working and whether continued billing is justified.
This is not always an AI error. It is an error that AI tools make more likely because they generate progress language from a single session's input without cross-referencing the longitudinal record.
Specific Red Flags Auditors Look For
These are the patterns that move a record from routine review to active investigation.
Identical or near-identical assessment language across three or more consecutive sessions. This suggests template reuse without individualization.
Intervention described in the note does not appear in the treatment plan and no deviation rationale is documented. This is the treatment plan alignment failure described above.
Progress described as "good" or "improving" while treatment plan goals remain unchanged for 6+ months. If progress is documented but goals are never updated or discharged, the clinical logic does not hold.
90837 billed consistently for clients with one presenting issue in stable outpatient treatment. This is a statistical outlier flag. 60-minute sessions are medically justified for complex presentations. Billing them uniformly suggests a billing pattern rather than clinical necessity.
Session note completed more than 72 hours after the session. EHR metadata makes this visible to auditors from large payers who request system-generated audit logs.
No documented response to interventions. Listing interventions without documenting how the client responded is insufficient. "CBT techniques were used" is not enough. "Client engaged in thought records and identified three automatic thoughts related to failure; expressed ambivalence about challenging these beliefs" is.
Mental status exam sections that are copied from session to session. Mental status exam (MSE) documentation must reflect the client's presentation on that specific date. Identical MSE language across sessions is a red flag, especially when the client has a condition that should show variability.
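Several of these flags are ratio checks a practice can run on its own billing data before a payer does. A minimal sketch of the 90837-share check; the 65% threshold is the review trigger one major payer is known to use, and other payers' thresholds vary:

```python
def code_share(billed_codes, code="90837"):
    """Fraction of billed sessions using the given CPT code."""
    if not billed_codes:
        return 0.0
    return billed_codes.count(code) / len(billed_codes)

# 7 of 10 sessions billed as 90837 -> 70%, above a 65% review threshold.
year = ["90837"] * 7 + ["90834"] * 3
share = code_share(year)
print(f"{share:.0%}", share > 0.65)  # prints: 70% True
```

A share above the threshold is not itself a compliance problem; it is a signal to confirm that each 60-minute session's note documents the clinical rationale for the longer session.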
Audit Scenarios: Pass and Fail
These fictional examples illustrate the difference between notes that clear an audit and notes that do not.
Scenario 1: GAD Client, Session 14
Fail version (AI-generated, unedited):
"Client attended individual therapy session. Client reported symptoms of anxiety including worry and physical tension. Clinician implemented cognitive-behavioral therapy techniques. Client showed motivation for change. Treatment goals continue to be addressed. Next session scheduled."
This note fails medical necessity because it contains no specific clinical content. It fails treatment plan alignment because no specific intervention is named. It fails session-specific documentation because nothing in it is tied to this client's actual session content.
Pass version (AI draft, clinician reviewed and individualized):
"Session 14, 60 minutes (1:00-2:00 PM). Client presented with elevated anxiety (self-rated 7/10) related to a performance review scheduled at work this week. Discussed automatic thoughts around catastrophizing the outcome (e.g., 'I'll be fired'). Applied cognitive restructuring using the thought record format from the treatment plan; client was able to generate two alternative explanations with moderate belief (50-60%). Also identified physiological escalation pattern (shallow breathing, muscle tension) and practiced diaphragmatic breathing. Client's belief in catastrophic outcome decreased from 80% to 50% by end of session. Plan: client will complete thought record for one worry event before next session. Review work outcome at next session."
This note establishes medical necessity (active GAD with functional impact), demonstrates treatment plan alignment (cognitive restructuring is the documented approach), contains session-specific detail, documents client response to the intervention, and sets a measurable next step.
Scenario 2: MDD Client, Deviation from Treatment Plan
Fail version:
"Client attended session. Discussed current mood. Client reports low motivation and sleep difficulties. Supportive therapy provided. Client engaged throughout. Continue treatment."
Pass version:
"Session 9, 50 minutes (10:00-10:50 AM). Treatment plan identifies behavioral activation as primary intervention for MDD F33.0. Today's session deviated from activation protocol at client's request: client disclosed a significant family conflict this week involving sibling estrangement that was acutely distressing and needed immediate processing. Clinical decision: addressed immediate stressor using supportive reflection and problem-solving framework given acute presentation. Behavioral activation protocol will resume next session. PHQ-9 this session: 16 (was 14 at session 6). Increase in score reflects current situational stressor; overall trajectory since intake (PHQ-9 = 22) remains positive. Safety: no SI/HI."
This note documents a treatment plan deviation with explicit clinical rationale, a plan to return to the protocol, and longitudinal context for an apparently worsened score.
How to Ensure Your AI-Assisted Notes Pass Audits
Review Every AI Draft Before Signing
This sounds obvious but is the single most important step. AI-generated drafts are starting points. The clinician's signature on a note is a legal attestation that the note accurately represents the session. Treat the AI output as a first draft that requires your clinical judgment before it becomes a record.
Add at Least Three Session-Specific Details
After reviewing the AI draft, add a minimum of three details that are specific to this client, this session:
- What the client said or raised (specific content, not category)
- A concrete behavioral observation from the session
- The client's response to the primary intervention used
These three additions will take two to three minutes and will transform a generic note into an individualized one.
Verify CPT Code Against Documented Session Time
Before submitting, confirm that the CPT code you are billing matches the session duration you have documented. If you bill 90837, make sure your note contains a start and end time, or at minimum a documented total session length of 53+ minutes.
Check Treatment Plan Alignment Quarterly
Set a calendar reminder every 90 days to review active clients' treatment plans against the interventions you have been documenting. If you have been describing interventions that differ from the plan, either update the plan to reflect the current clinical approach or document the rationale for deviation in your notes.
Document Response to Interventions, Not Just Interventions
Every note should answer: what did you do, and how did the client respond? The response is the clinical evidence that the intervention was appropriate and that treatment is progressing. It is also what distinguishes individualized documentation from a billing checklist.
Use Outcome Measures to Anchor Progress Language
Standardized outcome measures like the PHQ-9, GAD-7, or PCL-5 provide objective data that anchors your progress descriptions. Rather than writing "client reports improvement," write "PHQ-9 decreased from 18 to 11 over the past four sessions, consistent with clinician observation of increased engagement and reduced psychomotor slowing." This kind of documentation is difficult to challenge on audit because it is data-driven and specific.
Tools like NotuDocs include template fields that prompt you to enter outcome measure scores so they are captured in the note structure rather than left to after-the-fact documentation. The template architecture ensures these fields appear every time, reducing the likelihood they are skipped under caseload pressure.
Payer-Specific Documentation Standards
Different payers have specific requirements that go beyond baseline standards. These are the major ones to know.
Medicare: Requires documented medical necessity in every note, with functional impairment language directly connected to the diagnosis. G-codes and functional limitation ratings were phased out but the underlying requirement to document functional impact remains. Session time must be documented with sufficient specificity to support the CPT code billed.
Aetna: Requires treatment plan reviews every 90 days and expects documented progress toward goals in every note. Aetna auditors specifically look for notes where progress is documented as "good" without corresponding goal revision or discharge planning.
UnitedHealthcare: Has a practice of requesting records when 90837 is billed for more than 65% of sessions in a given year. Notes should document clinical rationale for 60-minute sessions, particularly for clients with stable presentations.
BCBS (varies by plan): Increasingly conducting retrospective audits with EHR metadata requests, where late documentation is a common metadata-derived flag. Complete notes within 24 hours of the session when possible.
Medicaid: Medicaid requirements vary significantly by state but universally require medical necessity documentation with diagnosis specificity, treatment plan updates at defined intervals, and service delivery documentation that matches what was prior-authorized. Many state Medicaid plans require prior authorization for sessions beyond a defined annual limit, and session notes must specifically document why continued treatment is clinically necessary.
Audit-Ready Documentation Checklist
Use this before signing any AI-assisted therapy note.
Medical necessity
- Active diagnosis is documented with ICD-10 code
- Note includes at least one specific functional impairment or symptom that justifies this session
- Treatment is not described as maintenance or social support unless clinically justified
Treatment plan alignment
- Intervention described in the note matches the treatment approach in the active treatment plan
- If session deviated from the treatment plan, clinical rationale is documented
- Treatment plan was reviewed within the payer's required interval (usually 90 days)
Session-specific detail
- Note includes specific content from this session (not generic symptom categories)
- At least one concrete behavioral observation is included
- Client's response to the primary intervention is documented
CPT code accuracy
- Session duration is documented (start and end time, or total minutes)
- CPT code billed matches the documented session length
- Modality matches the code (individual vs group, psychotherapy vs E/M)
Progress documentation
- Progress language is specific and data-anchored where possible
- If outcome measure was administered, score is recorded in the note
- Progress language is consistent with prior notes in the chart
Administrative
- Note was completed within 24-72 hours of the session
- No copy-paste from prior notes without updating for this session's content
- Mental status exam reflects this session's presentation, not a standing template
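For practices that want to operationalize the checklist above, it can be expressed as a simple pre-signing gate. A sketch under stated assumptions: the item names and data structure are illustrative and not drawn from any specific EHR:

```python
# Hypothetical pre-signing checklist; item names are illustrative.
def failing_items(checklist):
    """Return the checklist items that are not yet satisfied."""
    return [item for item, passed in checklist.items() if not passed]

note_checklist = {
    "active diagnosis with ICD-10 code": True,
    "functional impairment documented this session": True,
    "intervention matches treatment plan": False,
    "session duration documented": True,
    "client response to intervention documented": True,
    "completed within payer window": True,
}

for item in failing_items(note_checklist):
    print("Hold signature:", item)
```

Whether the check lives in software or on a printed card next to your keyboard matters less than running it consistently: the audit failures described in this guide are almost all items that a thirty-second pass through this list would catch.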


