Grounded AI That Knows Its Ground: A New OCT-Powered Coach Elevates PCI Planning Beyond General Models
Percutaneous coronary intervention (PCI) is a life-saving procedure, but getting it right hinges on reading inside the artery with optical coherence tomography (OCT) and translating that image into precise device choices. That’s a tall order even for seasoned operators, and it’s exactly where artificial intelligence (AI) is stepping in as a real-time decision helper. A new study introduces a domain-specific, RAG-augmented AI-OCT system named CA-GPT, designed to plan and assess OCT-guided PCI. The headline: CA-GPT outperforms a general-purpose model (ChatGPT-5) and junior operators across a range of decision tasks, especially in complex scenarios.
If you’ve ever wondered how AI could be tamed to work with the fast-moving, high-stakes world of interventional cardiology, this study is a compelling read. Here’s what it means, in plain language, and what it could mean for the practice and training of PCI going forward.
What’s the big idea here?
OCT gives a detailed, high-resolution view of plaque, calcium, and stent fit inside the artery. But interpreting OCT isn’t trivial. There’s notable variability among readers, especially for less-experienced clinicians, and that variability can affect the quality of PCI, from device sizing to ensuring the stent sits squarely and fully expands.
Enter CA-GPT: a purpose-built AI-OCT system that combines two key ideas:
- A small, specialized OCT analysis layer that does concrete measurement tasks (lumen segmentation, plaque characterization, stent apposition, OCT-based FFR computation, etc.).
- A large-language-model (DeepSeek-R1) layer that reasons over OCT outputs and guidelines, but with a crucial twist: it uses retrieval-augmented generation (RAG). RAG grounds AI reasoning in a curated knowledge base that includes current guidelines and tens of thousands of annotated PCI cases, reducing “hallucinations” and keeping outputs evidence-based.
In short: CA-GPT pairs a purpose-built OCT measurement engine with a grounded knowledge backbone that keeps recommendations aligned with real-world guidelines and data.
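To make the grounding idea concrete, here’s a minimal Python sketch of the RAG pattern. Everything in it is illustrative: the knowledge-base entries are paraphrased rules of thumb, and the retrieval step is naive keyword overlap rather than CA-GPT’s actual (unpublished) retriever and knowledge base.

```python
# Minimal RAG sketch: retrieve knowledge snippets relevant to the OCT
# findings, then build a grounded prompt for the LLM. All entries below
# are illustrative, not CA-GPT's actual knowledge base or logic.

KNOWLEDGE_BASE = [
    {"source": "Expert consensus (illustrative)",
     "text": "Superficial calcium with arc >180 degrees and thickness "
             ">0.5 mm may warrant plaque modification before stenting."},
    {"source": "Expert consensus (illustrative)",
     "text": "Stent diameter is commonly sized to the distal reference "
             "lumen, and stent length should cover the full lesion."},
    {"source": "Annotated case library (illustrative)",
     "text": "Post-PCI, stent expansion of at least 80% of the reference "
             "lumen area is a frequently used optimization target."},
]

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank knowledge-base entries by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda e: len(terms & set(e["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(oct_findings: str) -> str:
    """Assemble a prompt that forces the LLM to reason over retrieved evidence."""
    evidence = retrieve(oct_findings)
    cited = "\n".join(f"- [{e['source']}] {e['text']}" for e in evidence)
    return (
        f"OCT findings: {oct_findings}\n"
        f"Evidence retrieved from the knowledge base:\n{cited}\n"
        "Task: recommend a pretreatment strategy and stent sizing, citing "
        "only the evidence above. If the evidence is insufficient, say so."
    )

print(build_grounded_prompt(
    "calcium arc 270 degrees, thickness 0.7 mm, distal reference lumen 3.0 mm"
))
```

The point of the pattern is the last instruction: by restricting the model to retrieved, citable evidence, the system trades some fluency for traceability.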
How the study was set up (the quick version)
- Setting: A single center (Tangdu Hospital, Fourth Military Medical University, China) analyzed 96 patients who underwent OCT-guided PCI, covering 160 lesions.
- Comparators:
- CA-GPT: The domain-specific AI-OCT system.
- ChatGPT-5: A general-purpose AI model used as a baseline comparator.
- Junior physicians: Interventional cardiologists with 1–5 years of PCI experience, interpreting OCT on their own.
- Reference standard: The actual, expert-operated procedural records adjudicated by senior PCI experts.
- What was measured: Ten predefined decision metrics split into pre-PCI (planning) and post-PCI (assessment) phases, each scored 0 or 1 for agreement with the expert record, giving a total agreement score of 0–5 per phase for each case and method (a toy scoring sketch appears just below).
- Primary endpoint: Overall agreement score against the expert standard.
- What was compared: The agreement scores across CA-GPT, ChatGPT-5, and junior physicians, plus performance on individual metrics and subgroup analyses (e.g., lesion location, ischemia by OCT-FFR, ACS vs SCAD, calcium severity).
Key point: This is retrospective and single-center, so it’s a strong signal about potential, not a definitive multi-center validation yet.
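For the curious, here’s that 0-to-5 scoring as a toy Python sketch. Four of the five pre-PCI metric names come from the results reported later in this piece; the fifth is a placeholder, and the real adjudication was done by senior PCI experts, not by string matching.

```python
# Toy sketch of the per-case agreement scoring: five pre-PCI metrics,
# each scored 0/1 against the expert record, summed to a 0-5 score.
# The fifth metric name is a placeholder; the full list is the authors'.

PRE_PCI_METRICS = [
    "pretreatment_device_type",
    "pretreatment_device_size",
    "stent_diameter",
    "stent_length",
    "fifth_metric_placeholder",  # full metric list not reproduced here
]

def agreement_score(ai_plan: dict, expert_plan: dict) -> int:
    """One point for each metric where the AI matches the expert record."""
    return sum(int(ai_plan.get(m) == expert_plan.get(m)) for m in PRE_PCI_METRICS)

expert = {
    "pretreatment_device_type": "NC balloon",
    "pretreatment_device_size": "3.0 mm",
    "stent_diameter": "3.0 mm",
    "stent_length": "28 mm",
    "fifth_metric_placeholder": "n/a",
}
ai_plan = dict(expert, stent_length="33 mm")  # one disagreement

print(agreement_score(ai_plan, expert))  # -> 4
```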
The headline results (what actually happened)
Pre-PCI planning (the “let’s plan this case” phase)
- CA-GPT led the pack: median total pre-PCI agreement score of 5 (IQR 3.75–5).
- ChatGPT-5: median 3 (IQR 2–4).
- Junior physicians: median 4 (IQR 3–4).
- Statistically, CA-GPT outperformed both comparators (P < 0.001 for CA-GPT vs both others); a sketch of this kind of paired comparison appears just after this list.
- Specific metrics where CA-GPT shone:
- Pretreatment device type: 73.6% agreement for CA-GPT vs 37.5% for ChatGPT-5 and 61.1% for juniors.
- Pretreatment device sizing: 70.8% vs 40.3% (ChatGPT-5) and 61.1% (juniors).
- Stent diameter: 90.3% agreement for CA-GPT vs 63.9% (ChatGPT-5) and 72.2% (juniors).
- Stent length: 80.6% vs 54.2% (ChatGPT-5) and 52.8% (juniors).
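The write-up reports medians, IQRs, and P-values but not the specific test; a paired nonparametric comparison such as the Wilcoxon signed-rank test is one standard choice for per-lesion scores like these. A sketch on fabricated scores (not the study’s data):

```python
# Illustrative paired comparison of per-lesion agreement scores between
# two methods using a Wilcoxon signed-rank test. Scores are fabricated
# for demonstration only; the paper's exact statistical methods are not
# reproduced here.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n = 100                                   # illustrative number of lesions
ca_gpt_scores = rng.integers(3, 6, n)     # toy 0-5 agreement scores
comparator_scores = rng.integers(1, 5, n)

stat, p = wilcoxon(ca_gpt_scores, comparator_scores)
print(f"median CA-GPT = {np.median(ca_gpt_scores):.1f}, "
      f"median comparator = {np.median(comparator_scores):.1f}, P = {p:.3g}")
```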
Post-PCI assessment (the “how did we do after the work is done?” phase)
- CA-GPT again led in overall agreement: median 5 (IQR 4.75–5).
- ChatGPT-5: median 4 (IQR 4–5).
- Junior physicians: median 5 (IQR 4–5) — still solid, but CA-GPT was significantly higher than ChatGPT-5 and showed superiority over juniors in certain metrics.
- Notable metrics:
- Minimum stent area (MSA): 100% agreement across CA-GPT and ChatGPT-5; juniors were at 95.5%.
- Stent expansion: CA-GPT 78.4% vs ChatGPT-5 33.0% (significant difference) vs juniors 84.1% (comparable to CA-GPT for this metric).
- Stent apposition: CA-GPT 93.2% vs juniors 76.1% (CA-GPT outperformed juniors; no difference vs ChatGPT-5).
- Severe dissection and significant tissue prolapse: CA-GPT performed at very high levels, comparable to or better than both comparators.
Subgroups (where CA-GPT’s strengths were most evident)
- Across subgroups, CA-GPT’s superiority over ChatGPT-5 persisted in pre-PCI planning.
- Compared to juniors, CA-GPT’s advantage was more pronounced in LCx/RCA lesions, ischemia-defined lesions (OCT-FFR ≤ 0.80), ACS presentations, and mildly calcified lesions.
- In post-PCI assessments, CA-GPT’s edge over juniors was most evident in LCx/RCA lesions; in LAD or ACS subgroups, the difference was smaller or not statistically significant, but CA-GPT still outperformed ChatGPT-5.
Representative case
- The paper includes a case where CA-GPT’s integrated plan (pre-treatment strategy, device sizing, and post-dilation steps) matched the expert procedure across all metrics, illustrating how the system can coordinate OCT findings with guideline-grounded decision logic in a real-world scenario.
What this all means in plain terms: CA-GPT isn’t just faster; it consistently aligns with expert choices on a range of important PCI decisions, and it does so more reliably than a general AI model and better than junior clinicians in several challenging situations.
Why CA-GPT seems to perform better (the guts of the approach)
Three core ideas power this performance:
1) Domain-specific design
- CA-GPT isn’t a generic “talking brain.” It has a dedicated OCT analysis layer that executes 13 core tasks, including precise lumen segmentation and stent appraisal, plus an OCT-FFR calculation. This is the hands-on, image-reading side of things.
2) RAG grounding
- The retrieval-augmented generation framework ties the AI’s reasoning to a knowledge base that includes current guidelines and a large library of annotated PCI cases. This grounding helps prevent the AI from making up facts or misapplying guidelines, a known risk with plain LLMs.
3) End-to-end, workflow-aligned decision support
- The system is designed to produce structured, evidence-backed recommendations for each PCI stage, not just a vague summary. It’s built to slot into the real-time decision-making rhythm of a cath lab: read the OCT, compare to guideline anchors, and suggest concrete device choices and optimization steps.
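To picture what a structured, evidence-backed recommendation could look like, here’s a deliberately simplified rule-based sketch. The heuristics in it (sizing to the distal reference lumen, adding landing-zone margins, flagging heavy calcium for plaque modification) are common OCT rules of thumb from the literature, not CA-GPT’s actual decision logic.

```python
# Deliberately simplified pre-PCI recommendation sketch. Thresholds are
# common OCT heuristics from the literature, not CA-GPT's actual rules.
from dataclasses import dataclass

@dataclass
class OctFindings:
    distal_ref_diameter_mm: float   # distal reference lumen diameter
    lesion_length_mm: float
    calcium_arc_deg: float
    calcium_thickness_mm: float

def plan_pci(f: OctFindings) -> dict:
    plan = {
        # Size the stent to the distal reference lumen (a common heuristic).
        "stent_diameter_mm": round(f.distal_ref_diameter_mm, 1),
        # Cover the lesion with a small landing-zone margin on each side.
        "stent_length_mm": f.lesion_length_mm + 2 * 2.5,
        "pretreatment": "standard balloon predilation",
        "rationale": [],
    }
    # Heavy superficial calcium (arc >180 deg, thickness >0.5 mm) is often
    # cited as a trigger for plaque modification before stenting.
    if f.calcium_arc_deg > 180 and f.calcium_thickness_mm > 0.5:
        plan["pretreatment"] = "consider plaque modification (e.g., IVL or atherectomy)"
        plan["rationale"].append("calcium arc >180 deg and thickness >0.5 mm")
    return plan

print(plan_pci(OctFindings(3.0, 24.0, 270.0, 0.7)))
```

The real system layers guideline retrieval and case evidence on top of measurements like these; the sketch only shows why structured inputs make structured, auditable outputs possible.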
In contrast, ChatGPT-5, while linguistically fluent, is a general-purpose model that isn’t tuned to OCT specifics or anchored to a live, domain-specific knowledge base. The study also notes that its Western guideline emphasis and English-language focus could create gaps in a setting guided by Chinese expert consensus and locally practiced decision paths.
The take-home here is not that “AI is bad.” It’s that for high-stakes, image-driven procedures like OCT-guided PCI, a grounded, domain-specialized AI with a reliable knowledge backbone can do a better job translating data into reliable, guideline-consistent decisions.
What this could mean for practice, training, and workflow
1) Enhanced consistency and reduced learning curve
- For junior operators, a system like CA-GPT can serve as an authoritative guide, offering transparent reasoning tied to guidelines and real cases. This could shorten the steep learning curve of OCT interpretation and PCI decision-making.
2) Time savings and smoother workflows
- The study notes that AI-driven outputs could reduce interpretation time substantially, from minutes to seconds in some settings. That speed matters in busy cath labs and can free up clinicians to focus on patient-specific nuance and immediate procedural steps.
3) Safer, more standardized care in complex cases
- The strongest gains appeared in complex lesion contexts (e.g., significant calcification, multivessel considerations, ACS presentations). In these settings, AI-grounded guidance can help align decisions with evidence when human readers might diverge due to experience gaps or cognitive load.
4) Educational value and feedback loops
- CA-GPT’s explainable chain of reasoning, supported by RAG, offers a transparent basis for feedback. Trainees can see why a recommendation was made and how it aligns with guidelines, which is valuable for competency-based training.
5) Potential roadmap for integration
- The authors frame this as an end-to-end decision support system, which means future steps could include real-time integration into cath-lab information systems, multicenter validation, and prospective outcome studies to see if standardized decisions translate into fewer complications or better long-term results.
Of course, it’s important to temper enthusiasm with realism: this study is retrospective and single-center. External validation across diverse patient populations, imaging systems, and operator teams is needed before broad clinical rollout. Long-term outcomes (MACE, mortality) weren’t reported here, as the focus was on decision-making consistency.
A few practical prompt-design tips (for those curious about prompting AI in this space)
If you’re a clinician or a researcher thinking about using RAG-based AI for OCT-guided PCI or similar tasks, here are ideas inspired by how CA-GPT is structured:
Grounded-task prompts
- “Given the OCT outputs [list specific measurements], provide a pretreatment plan including device type, sizing, and justification anchored to the Chinese Expert Consensus on OCT in PCI.”
- “Assess post-PCI images for MSA, expansion, and apposition. Flag any parameters that fall outside guideline targets and propose a post-dilation strategy with rationale.”
Evidence-backed outputs
- “Cite the guideline clause or evidence source for each recommendation.” Encourage the model to return sources from the knowledge base used in RAG.
- “If a discrepancy arises with the expert record, explain the difference and how guideline anchors would resolve it.”
Uncertainty handling
- “If the data are borderline or imaging quality is suboptimal, provide a confidence score and suggest additional imaging or tests to resolve uncertainty.”
Subgroup awareness
- “Stratify recommendations by lesion location (LAD vs LCx/RCA), calcification severity, and ACS vs SCAD presentation to tailor guidance to the patient’s context.”
Educational mode
- “Explain each decision in plain language and show how it aligns with a specific guideline—intended for trainee review.”
Prompt hygiene
- Keep prompts concise, structured, and anchored to quantifiable outputs (numbers, thresholds, and clearly defined metrics) to minimize ambiguity.
These are not “one-size-fits-all” prescriptions, but they illustrate how robust, explainable, guideline-grounded AI outputs can be designed to be useful in the cath lab or training room.
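Pulling those tips together, here’s a minimal sketch of a structured prompt builder for the post-PCI phase. The field names and measurement keys are illustrative, not taken from CA-GPT.

```python
# Minimal structured-prompt builder reflecting the tips above: explicit
# metrics, a named guideline anchor, required citations, and uncertainty
# handling. All field names are illustrative.

def build_post_pci_prompt(measurements: dict, guideline: str) -> str:
    metrics = "\n".join(f"- {k}: {v}" for k, v in measurements.items())
    return (
        "You are assisting with post-PCI OCT assessment.\n"
        f"Measured values:\n{metrics}\n"
        f"Anchor every recommendation to: {guideline}.\n"
        "For each of MSA, stent expansion, and apposition:\n"
        "1. State whether the value meets the guideline target.\n"
        "2. Cite the specific clause or evidence source.\n"
        "3. If imaging quality or borderline values create uncertainty, "
        "give a confidence estimate and suggest further imaging.\n"
        "Propose a post-dilation strategy only if a target is unmet."
    )

print(build_post_pci_prompt(
    {"MSA_mm2": 5.4, "stent_expansion_pct": 76, "malapposition_max_um": 420},
    "Chinese Expert Consensus on OCT in PCI",
))
```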
Limitations to keep in mind
- Single-center, retrospective design: The findings are promising but may not generalize to all centers, systems, or patient populations.
- Not long-term outcomes yet: The study focused on decision agreement, not on clinical endpoints like MACE or mortality.
- OCT-imaging gaps: Not every patient had both pre- and post-PCI OCT imaging, which can influence the completeness of the metrics.
- Real-world integration: How CA-GPT performs in real-time, daily practice across multiple operators and equipment will require broader testing and workflow integration.
The authors themselves call for multicenter, prospective studies to validate these results and to explore long-term outcomes and broader implementation strategies.
Real-world implications: what changes, if any, could we expect?
- If validated broadly, RAG-grounded, domain-specific AI-OCT systems could become a standard assistant in OCT-guided PCI, particularly helping less experienced operators reach expert-level decision quality more consistently.
- Training programs might incorporate AI-driven feedback loops, letting trainees compare their decisions to guideline-based AI recommendations and learn from discrepancies.
- Hospitals could see more standardized PCI planning and post-procedural assessment, potentially reducing variability in device sizing and post-PCI optimization.
All of this would be aimed at safer, faster, and more consistent care for patients with complex coronary disease.
Key Takeaways
- Domain-specific, grounded AI (CA-GPT) paired with OCT analysis and a retrieval-augmented generation framework outperformed a general-purpose AI (ChatGPT-5) and junior operators in pre-PCI planning and post-PCI assessment across ten PCI decision metrics in a retrospective single-center study.
- The CA-GPT system combines a dedicated OCT analysis layer with a knowledge base anchored to current guidelines and tens of thousands of annotated PCI cases, reducing AI hallucinations and keeping recommendations evidence-based.
- In pre-PCI planning, CA-GPT showed higher agreement with expert records, particularly for device type, device sizing, stent diameter, and stent length. In post-PCI assessment, CA-GPT achieved higher agreement on stent expansion and apposition, with uniformly strong MSA performance across all three readers.
- Subgroup analyses suggest CA-GPT’s advantages are most pronounced in LCx/RCA lesions, ischemia-defined lesions, ACS presentations, and mildly calcified lesions; gains in other subgroups were still evident but more modest.
- While the results are promising, they come from a single center and retrospective design without long-term outcomes. Multicenter validation and prospective studies are needed before broad clinical adoption.
- Practical implications include potential improvements in consistency, efficiency, and training for OCT-guided PCI, with careful attention to integration into clinical workflows and ongoing validation.
- If you’re designing prompts for similar AI-augmented imaging tools, grounding outputs in guidelines, providing explicit metrics, and citing sources can help improve reliability and educational value.
And finally: for clinicians and educators, the big idea here is not to replace expertise with machines, but to use a grounded AI system as a high-fidelity, explainable partner that standardizes core decisions, highlights uncertainties, and accelerates learning. That combination could help ensure that more patients receive precise, guideline-consistent PCI planning and evaluation — even in busy, high-pressure cath lab environments.