MedScribe

Updated: 2/19/2026

Can models support doctors with their administrative work?

Key Takeaways

  • GPT 5.1 takes the lead, exceeding 88% accuracy on this task. The latest Claude family also demonstrates strong performance, while Gemini 3 Pro (11/25) continues to struggle significantly.
  • Results suggest that models can produce structurally sound, clinically relevant medical notes and meaningfully assist scribes in their workflow.

Background

Every clinical visit requires documentation. Clinicians typically record encounters using SOAP notes, a structured format that divides information into Subjective, Objective, Assessment, and Plan sections. Unfortunately, this lengthy process often takes twice as much time as clinicians spend in direct patient care [1] and remains a leading cause of burnout [2]. In response, many healthcare organizations have integrated AI scribes into their workflows, with about 30% of physician practitioners using some form of AI technology [3]. However, healthcare organizations lack methods to compare the real-world performance [4] of these documentation systems.

Through our MedScribe benchmark, we aim to fill this evaluation gap by assessing whether current AI systems can reduce the documentation burden without sacrificing accuracy or compliance. In collaboration with Protege, we created a dataset with 100 rubrics that score SOAP notes on documentation quality, providing an objective framework for AI scribing assessment.

Results

GPT 5.1 leads the field, maintaining a clear gap over other models. The Claude family follows closely, occupying most of the next top positions.

Accuracy tends to improve as model outputs get longer: GPT 5.1 scores markedly higher than other models while also producing markedly longer responses.

MedScribe Performance

Performance varies only slightly across SOAP categories, with most models performing marginally worse on the P (Plan) section. This section includes concrete actions such as orders, prescriptions, referrals, and follow-ups. Even small errors here can affect care continuity, patient safety, and billing accuracy.
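As a rough illustration of how per-section pass rates like these can be computed, the sketch below aggregates rubric checks by SOAP section. The data layout (a list of section and pass/fail pairs) and the function name are hypothetical and are not taken from the benchmark's actual pipeline.

```python
from collections import defaultdict

def pass_rates_by_section(checks):
    """Return the fraction of passed rubric checks for each SOAP section.

    `checks` is a hypothetical list of (section, passed) pairs, where
    section is one of "S", "O", "A", "P" and passed is a bool.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for section, passed in checks:
        totals[section] += 1
        passes[section] += int(passed)
    return {section: passes[section] / totals[section] for section in totals}

# Made-up example data, for illustration only.
example_checks = [
    ("S", True), ("S", True),
    ("O", True), ("O", False),
    ("A", True),
    ("P", False), ("P", True), ("P", False),
]
rates = pass_rates_by_section(example_checks)
```

Comparing the values in `rates` across sections for each model would surface the kind of gap described above for the Plan section.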

SOAP Category Pass Rates by Model

Looking more closely at pass rates across specific subcategories, some tasks appear universally harder. For example, “General Survey” and “HPI” sections were especially difficult for models.

Top SOAP Subcategory Pass Rates by Model

Methodology

Evaluating the quality of SOAP notes from real clinical conversations is challenging due to patient privacy constraints. Thus, we adapted the NoteChat framework [5] to generate synthetic transcripts from real, de-identified SOAP notes from our partner Protege. We conducted A/B testing for realism with our experts (medical scribes and a physician assistant) and found that our synthetic transcripts were indistinguishable from our 10 real transcript samples. Across 80 transcripts (40 real, 40 simulated), the experts identified the true transcript type with 39% accuracy (chance = 50%, p > 0.05).
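The realism check can be read as a test of whether experts beat chance at telling real from synthetic transcripts. The source does not say which statistical test was used. The sketch below assumes a one-sided exact binomial test and assumes that 39% of 80 corresponds to 31 correct identifications. Both are illustrative assumptions.

```python
from math import comb

def binomial_p_at_least(k, n, p=0.5):
    """Probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_transcripts = 80
# Assumption: 39% accuracy over 80 transcripts is about 31 correct calls.
n_correct = round(0.39 * n_transcripts)

# One-sided test: how likely is a result at least this good under pure guessing?
p_value = binomial_p_at_least(n_correct, n_transcripts)
```

Because the observed accuracy is below 50%, this one-sided p-value is well above 0.05, so under these assumptions there is no evidence that the experts could tell the two transcript types apart.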

Our experts developed gold-standard rubrics for each of our transcripts. They first created a standardized template designating the required content and section structure of the SOAP note, then independently annotated each transcript. Any conflicting rubrics were reviewed collaboratively to ensure reliability, quality, and consistency for our documentation standards.

Each model received the transcripts and was prompted to output the corresponding SOAP notes using the same template our experts outlined. All models were evaluated at temperature 1 with a maximum output of 30k tokens.

We observed significant variability depending on which evaluation model was used. Model judges penalized minor formatting issues, such as placing details in the wrong SOAP section, more harshly than human reviewers did. To quantify alignment with human judgment, we ran an alignment study on 3 samples, covering a total of 180 checks. By modifying the judge system prompt, we achieved an 80.5% agreement rate with the majority human opinion, and the resulting model pass rates closely matched those assigned by trained medical scribes.
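One way to compute an agreement rate with the majority human opinion is sketched below. The verdict format (one boolean per check from the judge, plus a list of human votes per check) and the function names are assumptions for illustration, and the example data is made up.

```python
from collections import Counter

def majority(votes):
    """Return the most common verdict; assumes an odd number of voters (no ties)."""
    return Counter(votes).most_common(1)[0][0]

def agreement_rate(judge_verdicts, human_votes):
    """Fraction of checks where the judge matches the human majority verdict."""
    matches = sum(
        judge == majority(votes)
        for judge, votes in zip(judge_verdicts, human_votes)
    )
    return matches / len(judge_verdicts)

# Hypothetical data: four rubric checks, three human reviewers each.
judge = [True, False, True, True]
humans = [
    [True, True, False],   # majority: pass
    [False, False, True],  # majority: fail
    [False, False, True],  # majority: fail (judge disagrees)
    [True, True, True],    # majority: pass
]
rate = agreement_rate(judge, humans)
```

In this toy example the judge matches the human majority on three of the four checks. Applied to the 180 checks of the alignment study, the same calculation would yield a figure comparable to the reported agreement rate.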

Sample Doctor-Patient Transcript
Doctor: Good morning! Um, I'm, I'm Doctor Axon. How are you, how are you doing today?
Patient: Um, not great honestly. I've been feeling pretty sick for like, like four days now, you know.
Doctor: Oh, oh I'm sorry to hear that. Um, what's, what's been going on?
Patient: Well, um, my throat's been really sore and, and both my ears hurt. And I'm just, you know, really stuffed up and, and tired all the time, like...
Doctor: Mmm, mmm, that sounds, that sounds uncomfortable. Um, is this, is this your first time coming in for, for these symptoms?
Patient: Yeah, yeah, I mean, I've been trying to, to tough it out but my grandma - she's, she's the one who brought me in today - um, she's really worried because, because my brother had mono about, about a month ago, so...
Doctor: Oh, oh I see. So, so she's concerned you might have, might have caught it from him?
Patient: Yeah, yeah exactly. She's been like, like super worried about it, you know.
Doctor: That's, that's understandable. Um, tell me, how, how long have you had the, the sore throat?
Patient: Um, about, about four days now. It really, it really hurts to swallow, even, even just my spit, you know.
Doctor: And, and you mentioned both, both ears are hurting?
Patient: Yeah, yeah they both hurt. But honestly, the, the worst thing is my nose - it's, it's so stuffy I can barely, I can barely breathe through it.
Doctor: Um, have you, have you had any fever with, with this?
Patient: No, no I haven't had any, any fever at all.
Doctor: What about, what about a cough?
Patient: Nope, no, no cough either.
Doctor: Okay, okay. And, and you said you're feeling tired - have you, have you noticed any body aches or, or muscle pain?
Patient: Oh yeah, yeah definitely. My, my whole body just aches, you know?
Doctor: Um, have you been taking anything for, for these symptoms?
Patient: Well, um, I take Benadryl every, every morning for my allergies. But, but nothing else really for, for being sick.
Doctor: I see, I see. And, and have you been able to go to school this, this week?
Patient: No, no I've been missing school all, all week. I just, I just feel too crummy to go, you know.
Doctor: That's, that's tough. Um, have you had any, any nausea with this?
Patient: Yeah, yeah actually I have been feeling a little, a little nauseous.
Doctor: Any, any vomiting?
Patient: No, no thankfully no vomiting.
Doctor: Good, good. What about any, any rash or skin changes?
Patient: No, no nothing like that.
Doctor: Headaches?
Patient: No, no headaches either.
Doctor: Okay, okay. Now, your, your grandmother mentioned something about weight gain when, when she called in. Can you, can you tell me about that?
Patient: Oh yeah, um, I guess I've, I've gained some weight recently. My, my mom said it's been like, like 23 pounds in the last two months or, or something.
Doctor: That's, that's quite a bit. Have you, have you noticed your appetite changing?
Patient: Yeah, I mean, I guess I have been, been eating more than usual, you know.
Doctor: Alright, alright. Let me, let me take a look at your medical history here... I see you're, you're taking quite a few medications. Can you, can you tell me about your medical conditions?
Patient: Um, well, I have, I have asthma - that's, that's why I use the inhaler and, and sometimes need the nebulizer when, when it gets bad. And I take, I take medication for anxiety too.
Doctor: I see you're on both, both sertraline and escitalopram?
Patient: Yeah, yeah I've been on those for, for a while now for my, my anxiety.
Doctor: And I notice you have some, some acne medications too - adapalene-benzoyl peroxide?
Patient: Yeah, yeah I use those, those creams on my face for, for acne.
Doctor: What about the, the omeprazole? Are you, are you still taking that?
Patient: Oh, um, actually no. I, I stopped taking that back in, in September. I just, I just didn't feel like I needed it anymore, you know.
Doctor: Okay, okay that's good to know. And, and you mentioned the Benadryl for allergies - any, any other allergy medications?
Patient: Well, I have, I have Zyrtec on my list but I haven't been, been taking it. Just, just the Benadryl in the mornings.
Doctor: I see, I see. Do you have any, any allergies to medications or, or foods?
Patient: Yeah, yeah I'm allergic to, to wheat.
Doctor: Alright, alright. Let me, let me check your vital signs now. Your blood pressure is 124 over 78, which is, which is a little elevated for your age. Your pulse is 142 - that's, that's quite fast.
Patient: Is, is that bad?
Doctor: Well, it could, it could just be because you're not feeling well. Your, your temperature is normal at 97.6, which is, which is good - confirms you don't have a fever. And your, your oxygen level is perfect at 99%.
Patient: That's, that's good at least.
Doctor: Let me, let me examine you now. Can you, can you open your mouth and say "ahh"?
Patient: Ahhhh.
Doctor: Okay, okay, your throat looks a bit red but I don't see any, any white patches or pus on your tonsils, which is, which is good. Let me, let me look in your ears... Right ear looks normal, no, no signs of infection. Left ear too - both, both look fine, no fluid behind the eardrums.
Patient: Really? They, they hurt so much though.
Doctor: Sometimes, sometimes ear pain can be referred from, from throat inflammation. Let me, let me check your nose... I can see why you're congested, your, your turbinates are a bit swollen but, but nothing too concerning.
Patient: Yeah, yeah it's really stuffed up.
Doctor: Let me, let me feel your neck for any, any swollen lymph nodes... No, no I don't feel any enlarged nodes. And let me, let me listen to your lungs... Take a, take a deep breath for me... Good, good, your lungs sound clear.
Patient: That's, that's good I guess.
Doctor: Your, your heart sounds are normal too, though it is, it is beating fast like we noted. Let me, let me press on your belly... Any, any pain?
Patient: No, no that doesn't hurt.
Doctor: Good, good. Your skin looks normal, no, no rashes. And neurologically you seem, you seem fine - you're alert and oriented.
Patient: So, so what do you think is wrong with me?
Doctor: Well, given your, your symptoms and what I'm seeing on exam, I think we should, we should run a few quick tests to, to rule out some things. I want to do a rapid strep test, a COVID test, and, and since your grandmother is worried about mono, we'll, we'll do a mono test too.
Patient: Okay, yeah, yeah that makes sense.
Doctor: The nurse will, will do those tests right here in the room. They're, they're all quick tests so we'll have results in, in a few minutes.
Patient: Good, good, I really want to know if I have mono like, like my brother did.
Doctor: Understandable, understandable. While we wait, let me ask - have you been around anyone else who's, who's been sick besides your brother?
Patient: Um, I mean, there's, there's always someone sick at school, but, but no one in particular that I can, that I can think of.
Doctor: And you said the, the Benadryl is for allergies - what are you, what are you usually allergic to?
Patient: Just like, like seasonal stuff, you know, pollen and, and things like that. But, but this feels different from my allergies.
Doctor: Right, right, with the body aches and sore throat, this does, this does seem more like an infection than, than allergies.
Doctor: Alright, let me, let me check those test results... Good news - all, all three tests came back negative. No strep, no COVID, and, and no mono.
Patient: Oh wow, so, so I don't have mono? My, my grandma will be so relieved!
Doctor: Yes, yes, the mono test is negative. What you have appears to be a, a viral upper respiratory infection - basically a, a common cold virus that's causing your symptoms.
Patient: So, so it's just a regular virus?
Doctor: Yes, yes exactly. These typically resolve on their own within, within a week or so with, with rest and supportive care.
Patient: That's, that's actually a relief. So, so what should I do for it?
Doctor: Well, first, I think we should, we should switch you from Benadryl to Zyrtec for your, your daily allergy management. Benadryl can make you drowsy, which isn't, which isn't helping when you're already tired from being sick.
Patient: Oh yeah, yeah I do feel pretty drowsy in the mornings after, after taking it.
Doctor: Right, right. So I'm going to prescribe Zyrtec-D, which combines an antihistamine with, with a decongestant. This should really, really help with your stuffy nose.
Patient: That sounds great - my, my nose is driving me crazy!
Doctor: I'll give you a, a 10-day supply to take twice daily. And then, and then regular Zyrtec for your ongoing allergy management after, after that.
Patient: Okay, so, so take the Zyrtec-D for now and then, and then switch to regular Zyrtec?
Doctor: Exactly, exactly. The decongestant in Zyrtec-D will help clear up your congestion while you're sick, but you don't, you don't need that long-term.
Patient: Makes sense, makes sense.
Doctor: For the, the body aches and sore throat, you can take ibuprofen - I see you have, you have some already. Take 400mg every, every 6 hours as needed.
Patient: Yeah, yeah I have some at home.
Doctor: Good, good. Also make sure you're drinking plenty of fluids and, and getting lots of rest. Your body needs that to, to fight off the virus.
Patient: I've been, I've been trying to rest but it's hard when I can't, can't breathe through my nose.
Doctor: The Zyrtec-D should really, really help with that. You might also try using a, a humidifier in your room if you have one.
Patient: Okay, okay, I think we have one somewhere.
Doctor: Now, about the, the weight gain your mom mentioned - 23 pounds in two months is, is significant. How have you been feeling otherwise, aside from, from being sick?
Patient: Um, I mean, I've been, I've been okay I guess. Just, just tired a lot, but I thought that was from, from school and everything.
Doctor: Have you noticed any, any other changes? Hair, skin, feeling cold or hot?
Patient: Not really, no, no.
Doctor: Okay, okay. Let's focus on getting you better from this virus first, but I'd like you to, to follow up with your regular doctor about the weight gain if, if it continues.
Patient: Alright, alright, that makes sense.
Doctor: For now though, the, the main thing is rest, fluids, and the medications we discussed. You should start feeling better in, in a few days.
Patient: When, when can I go back to school?
Doctor: Once you're feeling better and your symptoms have improved - probably in, in another day or two. Listen to your body.
Patient: Okay. And I don't, I don't need antibiotics or anything?
Doctor: No, no, antibiotics don't work on viruses. Your body will fight this off on its own with, with rest and supportive care.
Patient: Got it, got it. Is there anything I should, should watch out for?
Doctor: If you develop a high fever, severe headache, difficulty breathing, or if your symptoms get worse instead of better over the next few days, come back in or, or go to the emergency room.
Patient: Okay, okay, I will.
Doctor: Also, try to avoid close contact with others while you're sick to, to prevent spreading it. Wash your hands frequently.
Patient: Yeah, yeah, I don't want to get anyone else sick.
Doctor: Exactly, exactly. Do you have any, any other questions?
Patient: Um, I don't, I don't think so. So just to make sure - negative for everything, it's just a virus, take the Zyrtec-D and regular Zyrtec, rest and fluids?
Doctor: That's, that's exactly right. And, and ibuprofen as needed for the aches and sore throat.
Patient: Great, great. Thank you so much!
Doctor: You're, you're welcome. I hope you feel better soon. The nurse will get you those, those prescriptions and discharge paperwork.
Patient: Thanks. Oh wait - should I, should I keep taking all my other medications? Like my asthma inhaler and, and anxiety meds?
Doctor: Yes, yes absolutely keep taking all your regular medications. The Zyrtec-D won't, won't interfere with any of them.
Patient: Okay good, good, just wanted to make sure.
Doctor: That's a, that's a great question. Always good to check. And make sure you're using your asthma inhaler as prescribed - sometimes viral infections can, can trigger asthma symptoms.
Patient: Yeah, yeah, I'll make sure to keep it with me.
Doctor: Perfect, perfect. Anything else?
Patient: No, no I think that's everything. My grandma will be so, so happy to hear it's not mono!
Doctor: I'm sure, I'm sure she will be. Take care of yourself and, and feel better soon.
Patient: Thank you, doctor. I really, really appreciate it.
Doctor: You're, you're very welcome. Rest up and we'll have those, those prescriptions ready for you shortly.

Citations

[1] Sinsky, C., Colligan, L., Li, L., Prgomet, F., Reynolds, L., Goeders, L., Westbrook, J., Tutty, M., & Blike, G. (2016). Allocation of physician time in ambulatory practice: A time and motion study in four specialties. Annals of Internal Medicine, 165(11), 753–760. https://doi.org/10.7326/M16-0961

[2] American Medical Association. (2025, February 14). Physicians’ greatest use for AI? Cutting administrative burdens. AMA Digital Health. https://www.ama-assn.org/practice-management/digital-health/physicians-greatest-use-ai-cutting-administrative-burdens

[3] Vahidy, F. S., Rajkomar, A., & Sounderajah, V. (2025). Beyond human ears: Navigating the uncharted risks of AI scribes in clinical practice. npj Digital Medicine, 8(1), Article 95. https://doi.org/10.1038/s41746-025-01895-6

[4] Kanaparthy, A., Barot, A., Mehta, N., Bhandari, M., & De Simone, A. (2025). Real-world evidence synthesis of digital scribes using ambient listening and generative AI: A systematic review. npj Digital Medicine, 8(1), Article 124. https://doi.org/10.2196/76743

[5] Agarwal, S., Zhang, T., Lee, J., & Rumshisky, A. (2024). NoteChat: Synthesizing doctor–patient conversations from clinical notes for training and evaluating medical dialogue systems. In Findings of the Association for Computational Linguistics: ACL 2024 (pp. 14532–14548). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.findings-acl.901
