AI in Intro Python Over 3 Years: Student Use Shifts

A new 3-year study follows the same intro Python course across successive AI-supported cohorts (2023–2025). See how students’ AI awareness, “write code” habits, and help-seeking with GenAI shift over time.



Introduction: What Happens After GenAI Becomes Normal?

If you teach (or are learning) intro programming, you’ve probably felt the shift: students don’t wait for help as long, they try fewer “mystery debugging” stretches, and they start asking an AI for answers the moment something breaks. But here’s the big question most classrooms still can’t answer with confidence: what changes over time once GenAI stops being a novelty and becomes routine?

That’s exactly what a new three-year study from Japan digs into. Based on the original paper, researchers followed the same introductory Python course across three successive AI-supported cohorts (2023–2025). Instead of just checking what students do on day one (or week one), they tracked how students’ awareness, interaction patterns with an AI tool, and course outcomes evolve as more students arrive already familiar with GenAI.

The findings are a bit surprising: students absolutely change how they use AI—but not in a simple “they ask more for answers” way. Their help-seeking becomes more iterative and more aligned with real programming workflows. And despite those interaction shifts, grades and weekly assignment performance stay fairly stable at the cohort level.

Why This Matters

Right now, we're living through a classroom-level experiment, whether we planned it or not. GenAI is being adopted rapidly, but most policies and teaching designs are still based on short-term assumptions: "Will students cheat?" "Will they over-rely?" "Will they learn less?" Those are real concerns—but this research suggests something more nuanced: the key variable isn't whether students use AI. It's how courses shape "productive" AI use over time.

A scenario you can apply today

Picture an intro Python course where students have AI access through a tool integrated into the learning platform. Early on, you notice a predictable behavior: students paste the assignment prompt and ask for a complete solution (“just write the code for me”). After a few weeks, you see a second pattern emerge among stronger students: they start asking AI to explain errors, verify outputs, and suggest smaller fixes.

This study suggests that this “workflow maturation” can happen at the cohort level—especially as students become more familiar with AI before the course starts. That means instructors shouldn’t only design “AI usage rules.” They should design AI interaction habits: how to prompt, how to iterate, and when to stop using AI and test/reflect.

How this builds on previous AI research

Earlier work often focused on GenAI’s capabilities (can it generate correct code?) or short deployments (what happens in a week?). This paper builds on that by adding longitudinal evidence: it shows not just that students use AI, but that their expectations and interaction sequences change in structured ways across years.

In other words, it moves the conversation from “AI as a tool” to “AI as a teammate in learning practices”—and it raises a more actionable instructional challenge: maintain student agency while redefining what “help” means.

How Students’ Awareness of GenAI Changed

One of the clearest shifts across cohorts is how students arrive at the course already understanding what GenAI is for.

Familiarity became mainstream

Students were surveyed at the beginning of each course offering. In the 2023 cohort, only 5.2% reported high familiarity with GenAI, while 26.3% reported low familiarity. By 2024, the "high familiarity" group jumped to 47.2%, and by 2025, 100% reported at least medium familiarity—with 60% in the high category.

That's a huge warning sign for anyone teaching with "novice AI" assumptions. By 2025, students aren't coming in blind; they arrive with expectations and habits already formed from everyday use.

Usage followed the same upward trend

The story repeats for actual use. In 2023, only 36.8% said they actively used GenAI; a sizable chunk reported no use. By 2024, active usage rose to 63.9%, and by 2025, active usage was 91.7%—with just 8.3% not using it.

Perceived benefits got stronger and more uniform

Students’ views on whether GenAI helps learning were overwhelmingly positive—and became even more so over time:
- 2023: split between positive/strongly positive/neutral
- 2024: more strongly positive, fewer neutral responses
- 2025: 100% rated it positive or strongly positive; no negative responses

This matters because it suggests rising confidence isn’t only about newer AI models. It’s also about normalization: students learn what works, what doesn’t, and what kinds of responses to trust (or at least compare against tests).

How students described AI shifted, too

When students described GenAI in their own words, their vocabulary changed:
- 2023: terms like “AI,” “ChatGPT,” “Conversation” (high-level novelty)
- 2024: more functional words like “Answer,” “Question,” “Information”
- 2025: more learning-oriented language like “Help,” “Understand,” “Learned”

So by 2025, students didn’t just think “AI gives answers.” Many thought “AI helps me understand,” which likely changes how they prompt and evaluate responses.

How Students Interacted with AI: From “Write Code” to “Debug Together”

Awareness is only half the story. The researchers analyzed logged student–AI conversations from a web-based GPT interface. There were 10,632 interactions total, and they manually coded 2,782 sampled instances (~26%) to categorize prompt types and response quality.

Interaction frequency increased after 2023—but then leveled

They measured “requests per student-week.” The mean requests were:
- 2023: 3.05
- 2024: 4.68
- 2025: 4.75

The jump from 2023 to later years was statistically significant, but 2024 and 2025 were similar. That suggests a “new normal” phase: once most students already know how to use AI reasonably well, they don’t keep increasing usage forever.

Week-by-week usage also tracked curriculum difficulty—peaking during conceptually demanding topics and multi-step tasks.

The biggest shift: prompt types changed dramatically

Here’s where the longitudinal story becomes really interesting.

In 2023, student prompts were dominated by Code Implementation (42.67%)—basically, “write or complete the code.”

In 2024, this flips: Code Verification becomes the most common prompt type (52.25%), while Code Implementation drops to 7.87%.

By 2025, the system becomes more “debugging-native.” Error Message Interpretation rises to 36.18%, with Code Verification (23.27%) and Code Implementation (17.28%) still present.

Think of it like a kitchen workflow:
- 2023: “Just cook the whole dish for me.”
- 2024: “Check whether my cooking is correct.”
- 2025: “When the recipe fails, interpret the error, adjust the ingredients, verify again.”

Sequential workflow changed too (not just “what they asked”)

They didn’t stop at counting prompt types. They also used Transition Network Analysis (TNA) to study the sequence of prompts—what came right after what within a task episode.

- 2023 network was sparse: fewer strong transitions; interactions looked more like one-shot queries.
- 2024 network became connected and verification-centered: Code Verification became a hub linking to implementation, correction, and explanation.
- 2025 network was the densest and most interconnected: students more often moved between problem understanding, conceptual questions, debugging, correction, implementation, and verification in multi-stage loops.

This is crucial: it suggests students weren’t only using AI more confidently—they were using it as part of an iterative problem-solving cycle.
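To make the idea concrete, here is a minimal sketch of what a transition network boils down to: counting how often one prompt type follows another within an episode, then normalizing into per-state probabilities. This is an illustration, not the paper's implementation, and the episode below is a hypothetical 2025-style debugging loop.

```python
# Illustrative sketch of a transition network: count adjacent prompt-type
# pairs in an episode, then normalize into edge probabilities.
from collections import Counter

# Hypothetical episode (not from the study's logs).
episode = [
    "problem_understanding", "implementation", "verification",
    "error_interpretation", "correction", "verification",
    "error_interpretation", "concept_question", "correction",
    "verification",
]

# Count each (source -> target) pair of consecutive prompts.
transitions = Counter(zip(episode, episode[1:]))

# Total outgoing transitions from each source state.
out_totals = Counter()
for (src, _), count in transitions.items():
    out_totals[src] += count

# Transition probabilities: the edge weights of the network.
probs = {(a, b): c / out_totals[a] for (a, b), c in transitions.items()}

for (a, b), p in sorted(probs.items()):
    print(f"{a:>22} -> {b:<22} {p:.2f}")
```

A denser, more looped network (verification feeding back into error interpretation and correction) is exactly the 2025-style pattern the study describes.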

Response quality stayed high, but "correct but not helpful" responses existed

To see whether AI outputs themselves were degrading or improving, the researchers coded responses along:
- Correctness (technically accurate?)
- Helpfulness (does it actually help the student progress in the course task?)

Overall, Correct & Helpful stayed very high:
- 2023: 94.67% (Correct & Helpful)
- 2024: 89.39%
- 2025: 91.01%

The main “quality loss” wasn’t widespread wrong answers. It was more like: the AI response might be correct in isolation, but not aligned to the course constraints or the student’s immediate need. For example, it may propose a technique beyond what they’ve learned, skip key reasoning steps, or not address the exact failure mode.

That’s a practical reminder: even high-quality AI can still be mismatched to a student’s current learning stage.

Did Performance Improve (or Stay the Same)?

Given all this shifting behavior, you might expect grades to change. But the cohort-level results are pretty steady.

Weekly assignment performance: no significant differences across cohorts

They compared each student’s weekly assignment scores and ran a Kruskal–Wallis test:
- no statistically significant differences across cohorts (Kruskal–Wallis H = 3.23, p = 0.357)
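For readers who want to see what that test actually computes, here is a minimal pure-Python sketch of the Kruskal–Wallis H statistic (midranks for ties, no tie correction). The three score lists are made-up illustrative data, not the paper's.

```python
# Minimal Kruskal-Wallis H statistic for k independent samples.
# The score lists below are hypothetical, not the study's data.

def kruskal_wallis_h(*groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with midranks for ties."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Map each value to its average (mid) rank; ranks are 1-based.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sq_term = sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    )
    return 12.0 / (n * (n + 1)) * rank_sq_term - 3 * (n + 1)

scores_2023 = [72, 85, 90, 64, 78, 88, 91, 70]
scores_2024 = [75, 82, 89, 68, 80, 86, 93, 71]
scores_2025 = [74, 84, 87, 66, 79, 90, 92, 73]

h = kruskal_wallis_h(scores_2023, scores_2024, scores_2025)
print(f"H = {h:.3f}")
```

With two groups of similar score distributions like these, H stays far below the chi-square cutoff of 5.99 (df = 2, α = 0.05), mirroring the study's non-significant result (H = 3.23, p = 0.357).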

Final course grades: also broadly similar

Final grade means were “broadly similar” across the three years. So even with changing interaction patterns—more verification, more debugging loops—performance didn’t dramatically shift at the group level.

Why might grades have stayed stable?

The paper offers plausible reasons:
1. Assignment grading allowed resubmission. Performance reflected eventual correctness more than a student’s first attempt.
2. Students could already seek help elsewhere, even in the baseline (pre-GenAI) offering: TAs, peers, online resources.

So GenAI access changed interaction styles, but it didn’t completely transform outcomes—at least not in this course setup.

This is important for instructors: behavior changes don’t automatically produce higher or lower grades. Sometimes AI changes how students work without changing what they ultimately achieve, especially when assessments are structured to allow revision and when other support channels exist.

What Instructors Should Do Next

So what should a course designer take from all this?

1) Stop asking “Should students use AI?”

This research strongly suggests a better question is: “How will we redefine productive AI use?”

In 2023, students used AI more like a one-shot answer engine. By 2025, they used it more like a verification-and-debugging collaborator. Your course design likely steers that trajectory.

A practical implication: build prompts and workflows into instruction. For instance, teach students to:
- ask AI to explain their reasoning, not just generate code
- request verification steps (“What tests should I run for this function?”)
- interpret error messages before asking for a full fix
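The "request verification steps" habit can be shown concretely. Below is a hypothetical course-style task (`count_vowels` is invented for illustration, not from the paper): before asking an AI to rewrite anything, the student writes small checks that pin down what "correct" means.

```python
# Hypothetical intro-course function plus the kind of verification
# checks a student might run (or ask an AI to suggest) before
# requesting a full fix.

def count_vowels(text):
    """Count vowels in a string (case-insensitive)."""
    return sum(1 for ch in text.lower() if ch in "aeiou")

# Verification steps: edge cases first, then typical inputs.
assert count_vowels("") == 0        # empty input
assert count_vowels("PYTHON") == 1  # uppercase handled
assert count_vowels("aeiou") == 5   # all vowels counted
print("all checks passed")
```

If a check fails, the student now has a precise failure mode to bring to the AI—which is exactly the error-interpretation-first workflow the 2025 cohort gravitated toward.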

2) Design AI literacy as a learning objective

Students’ prompt habits matured across cohorts as familiarity increased. That doesn’t mean you should assume maturity will happen automatically. You can accelerate it by treating AI literacy like a skill.

Possible course moves:
- require students to show the reasoning chain they used with AI (what they checked, what they corrected)
- use short “prompt practice” exercises early in the term
- provide examples of “good” AI prompts tailored to course tasks

3) Align AI with course scope to reduce “correct but not helpful”

A recurring issue was responses that were technically correct but not helpful for the specific assignment context. That can happen when AI suggests approaches beyond what’s been taught.

Practical fix:
- make course-aligned templates and constraints explicit (“Use only loops we covered,” “Don’t use libraries beyond X”)
- encourage students to request explanations tied to taught concepts
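One lightweight way to operationalize this is a shared prompt template that bakes the course constraints into every AI request. The template below is a hypothetical sketch, not something from the paper:

```python
# Hypothetical course-aligned prompt template. Making constraints
# explicit in every request reduces "correct but not helpful" answers
# that reach beyond what the course has covered.
COURSE_PROMPT = (
    "You are helping in an intro Python course.\n"
    "Constraints:\n"
    "- Use only constructs covered so far: {covered}\n"
    "- Do not import libraries beyond: {allowed_libs}\n"
    "- Explain fixes in terms of the error message first.\n\n"
    "Student question:\n{question}\n"
)

prompt = COURSE_PROMPT.format(
    covered="variables, if/else, for loops, functions",
    allowed_libs="math",
    question="Why does my loop print the last item twice?",
)
print(prompt)
```

Instructors could distribute a template like this in week one and iterate on it as the syllabus advances.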

4) Assess learning processes, not just final correctness

Because assignments allow resubmission and focus on passing tests, grades may not reflect the deeper reasoning changes. If you care about concept mastery, you may need assessments that capture:
- debugging strategy
- explanation quality
- reflection on why a fix works

This lines up with the study’s conclusion: the central challenge isn’t just AI access—it’s maintaining student agency while shaping productive help-seeking.

And if you want to go deeper, the original paper, "Three Years with Classroom AI in Introductory Programming: Shifts in Student Awareness, Interaction, and Performance," is where all the detailed methods and coding decisions live.

Key Takeaways

- Student GenAI awareness rose sharply from 2023 to 2025, with high familiarity going from 5.2% → 60%.
- GenAI use became nearly universal by 2025 (active use 91.7%).
- Students' help-seeking evolved:
  - 2023: mostly Code Implementation prompts (42.67%)
  - 2024: mostly Code Verification (52.25%)
  - 2025: more debugging workflows, especially Error Message Interpretation (36.18%)
- Interaction workflows also changed structurally (from sparse, one-shot exchanges to dense, multi-stage iterative collaboration).
- Response quality from AI stayed high, with Correct & Helpful around 89–95%.
- Cohort-level assignment scores and final grades were broadly stable, suggesting that changes in AI interaction style don't automatically translate into major grade shifts in courses with resubmission and other support channels.
- The main instructional challenge in the AI era is how to redefine productive learning practices, not whether students use AI.
