What Makes a Simulation Model Worth Using? Lessons from Two Surgical Education Conferences
Written by Donya Mand, MD
June 2026

Surgical education is evolving rapidly, driven not only by advances in simulation technology but also by new approaches to developing and accessing training models, including 3D-printed models, virtual and augmented reality platforms, and perfused human cadavers. These improvements have the potential to advance graduate medical training, promote patient safety, and improve clinical outcomes.
To keep current on the latest in technology and educational theory, I recently attended two major meetings: the American College of Surgeons (ACS) Surgical Simulation Summit in Chicago and the Association for Surgical Education (ASE) Annual Conference in Atlanta. Both conferences focus on medical education but emphasize different aspects of it. The ACS Summit featured the latest advances in medical simulation model design and implementation: the models themselves and how they’re being used. ASE, on the other hand, centers more on the educational process, including evaluating skills, trainee well-being, and teaching-the-trainer approaches to help educators become successful teachers, allies, and leaders. This reflection explores overlapping themes at both conferences and shares my takeaways on recent trends in simulation curriculum and the use of animals in medical training.
One trend stands out: the shift away from using buzzwords to affirm the applicability or relevance of a simulation model. Twenty to thirty years ago, publications describing medical training simulation models often included phrases such as “gold standard,” “face validity,” “high fidelity,” and “strong validity.” These terms were used to describe the quality of the simulation model and were often treated as endpoints. Now, discussions center on when it is appropriate to use these phrases, the value of the context in which they are applied, and the need for caution when presenting these terms as evidence that the skills trained on these models will transfer to clinical applications.
Gold standard
Depending on the surgical skill being taught, different models are referred to in academic and medical literature as being the “gold standard.” Human cadavers are the model most commonly called the gold standard in surgical training;1 however, for certain specialized skills, such as microsurgery, this designation is given to training performed on live animals, such as rats.2
During an ACS simulation workshop, participants examined the limitations of this terminology and questioned whether the title of “gold standard” has been applied to the simulation model that most closely replicates a procedure or to the one that most closely mimics human tissue. Discussants suggested that the gold standard should also account for cost, accessibility, scalability, and logistical feasibility. Additional questions remain for future work, including:
- Should ethical considerations factor into what we define as the gold standard?
- Who determines the gold standard?
- How should perceptions of the gold standard evolve as new technologies emerge?
- If one modality allows for more robust and objective skill assessment than another, even one that offers higher anatomical fidelity (discussed more below), should that factor into which model is considered the gold standard?
Ultimately, the true gold standard is operating on a live human patient, a threshold no simulation model can fully meet. From that perspective, all models inherently fall short. The notion of a “gold-standard” simulation model is used too broadly, and in practice, it is unlikely that a single model can adequately teach all skills required for any specialty. Rather, the most appropriate model will be procedure- or surgery-specific and will vary depending on the trainee’s level of experience. Even within this framework, educators may choose to design simulation labs that focus on discrete components of a larger procedure, prioritizing the development of specific skills rather than attempting to replicate an entire operation.
If medical educators continue to use the term “gold standard,” clearer criteria are needed to define it within the context of evolving technologies. This includes establishing how to evaluate and compare newer simulation models with those that have been widely relied upon, particularly given the ongoing challenge of determining the extent to which skills acquired in simulation settings translate to operating room performance and ultimately to patient outcomes.
Fidelity
Participants in the ACS simulation workshop also explored what it means for a model to be considered high‑fidelity, with fidelity loosely defined as how analogous a simulator is to reality. However, there is poor consensus as to what constitutes “high” or “low” fidelity, and the use of this term to make binary distinctions fails to account for the varying degrees of features across simulators. The particular feature(s) of a simulator that simulation researchers choose to emphasize will often determine whether it is considered high or low fidelity.3 For example, high fidelity might refer to whether the “tissue” in the simulator replicates the feel of human tissue (e.g., fresh frozen cadaver), or whether the simulator allows trainees to practice in an environment resembling an operating room (e.g., virtual reality simulator). High fidelity is sometimes conflated with high technology. This is particularly notable when contrasted with the use of human cadavers, which are often regarded as high‑fidelity despite being objectively low‑technology. Similarly, the use of live animals is often considered high fidelity, not because it replicates human anatomy, but because the animal bleeds and can decompensate, and because animal tissue feels similar to human tissue.
Some educators have argued for abandoning the term fidelity in favor of more specific descriptions, such as whether the simulator resembles the procedure, or more importantly, whether the simulation allows trainees to practice in a way that closely matches the specific skill or goal they are trying to learn.1
At the ACS conference, the emphasis placed on simulation models often depended on the features highlighted by their developers. Some models strongly focus on realism and physical resemblance, while others, particularly those developed with fewer high-cost materials, emphasize their ability to replicate the technical aspects of the skills. These differing approaches reflect the field’s lack of consensus on how to shift away from prioritizing visual and physical fidelity and toward models that better support skill acquisition. Overall, perspectives appear inconsistent, varying based on the intended purpose and design priorities of each simulation model.

Transferability
Throughout both conferences, there were discussions about the challenge of determining transferability, that is, whether skills acquired through simulation lead to measurable improvements in clinical performance. Operative performance in human patients is highly multifactorial, and while trainees may be given opportunities to perform procedures, attending surgeons remain ethically and professionally obligated to intervene when necessary. Trainees are never allowed to fail in ways that could compromise patient safety, and attending surgeons ultimately correct technique to ensure the highest-quality outcome before a patient leaves the operating room. While essential, this reality makes it exceedingly difficult to treat patient outcomes as a direct reflection of a trainee’s simulation training.
During the ASE meeting, participants discussed how assessing transferability is complicated by the burden placed on educators, who must simultaneously manage busy operating rooms, evaluate trainees across a sufficient number of procedures, and document their assessments in real time. Given the demands already placed on attending surgeons, projects are underway to automate the evaluation process to reduce the burden on the evaluating surgeon, obtain a more accurate picture of trainees’ performance, and reduce inter-rater variability among different attending surgeons.
At present, there is no reliable method to directly assess transferability from simulation models to operative performance in the operating room. This is why continued conversation about assessing transferability should be prioritized, particularly when justifying the use of certain training methods—such as the use of live animals—on the basis of presumed educational benefit. Without clear evidence that a model improves clinical performance, educators risk relying on approaches that appear effective, seem intuitively transferable, or have been historically used, rather than those that are shown to provide measurable benefit.

Sustainability
During the ACS conference, sustainability came up frequently in discussions of proposed simulation models, though usually in practical rather than environmental terms (emissions, recyclability, or biodegradability). Sustainability in this sense emphasized whether models were reusable, low-cost, portable, and feasible to produce without specialized tools.
For example, using live animals is neither logistically practical nor cost-effective, as it requires purchasing animals (who can cost several hundred to thousands of dollars depending on the species) and paying for transportation, specialized housing facilities, veterinary support during training, access to facilities that permit live animal use, and disposal. In contrast, an animal-free synthetic model may rely on a 3D printer and cast molds for creating gel-based organ models, which can be prepared on site, designed as modular components for easy transport and storage, and structured so that only select worn or used components need to be replaced without discarding the entire model. As with the other key terms described above, sustainability should not be viewed as a binary concept. Rather, models may exhibit varying degrees of feasibility and practicality that influence their adoption in simulation labs.

How should surgical simulation education literature be assessed?
The standards for surgical simulation model development and assessment have evolved to emphasize objective, measurable outcomes over subjective evaluations. However, there remains no clear framework for determining how well these models translate to real-world clinical performance. A systematic review evaluating the use of live animals in emergency trauma training found that many studies provided insufficient detail on training methodologies and lacked well-defined educational frameworks.4 Despite being essential for understanding the evidence base that influences whether educators continue to use animals in medical education, such evaluations remain limited.
Future work should prioritize specialty-specific assessments of the literature on live animal use to evaluate whether those studies are strong or rigorous, particularly given that much of the foundational literature may not reflect modern theories of simulation-based education. If ongoing evaluations show that this body of literature is heterogeneous, inconsistent, and methodologically weak, educators should be made aware of these limitations before using live animals in training. From an educational perspective, this raises a separate question from the ethical debate: whether live animals can still be justified as training tools when the evidence of educational benefit remains limited.
1. Hamstra SJ, Brydges R, Hatala R, Zendejas B, Cook DA. Reconsidering fidelity in simulation-based training. Acad Med. 2014;89(3):387-392. doi:10.1097/ACM.0000000000000130
2. Kazzazi D, Kazzazi F, Pafitanis G. Moving toward a near-total reduction, refinement, and replacement of live animal use in microvascular training. Cureus. 2026;18(2):e103850. doi:10.7759/cureus.103850
3. Swain CS. Validity and fidelity in the context of live animal training in surgical education. Glob Surg Educ – J Assoc Surg Educ. 2026;5(1):28. doi:10.1007/s44186-025-00439-6
4. Swain CS, Cohen HML, Helgesson G, Rickard RF, Karlgren K. A systematic review of live animal use as a simulation modality (“live tissue training”) in the emergency management of trauma. J Surg Educ. 2023;80(9):1320-1339. doi:10.1016/j.jsurg.2023.06.018