In total, 2201 unique studies were identified. After applying the inclusion and exclusion criteria to titles and abstracts, 186 full texts were screened, leading to 51 inclusions. Citation screening led to the inclusion of another 11 studies. Figure 2 shows the flowchart of the search and selection process. The quality assessment resulted in seven studies with a good score for training (score ≥ 9) and 23 with a good score for evaluation (score ≥ 9). Ten studies had a good quality score after combining the scores (score ≥ 17). All scores can be found in Additional file 4.
Five studies covered ETE in a cross-border setting: a border region (n = 2), a point of entry (n = 2), or a multi-country setting aimed at international cooperation (n = 1). All other ETEs were in a non-cross-border setting.
The target group of the ETE varied among studies but was often poorly described, with examples such as ‘public health leaders’ and ‘all staff of regional health departments’. Other studies specified a wide variety of professionals with different tasks in emergency preparedness, or mixed public health professionals with emergency responders, university staff, and civilians. Participants’ motivations to participate could rarely be derived.
Recruitment & Autonomy
The majority of studies did not report any recruitment technique or clarify participants’ motivation. Three studies reported mandatory participation, six highlighted participants’ free choice to take part, and two reported on freely available online courses. In Hoeppner et al., participants had to apply for participation, suggesting motivation. Fowkes et al. 2010 named their highly motivated participants as a limitation in interpreting the identified effectiveness of the ETE.
In total, eleven studies performed a training needs assessment among the target population before designing the ETE. Training needs were also obtained via literature studies, the ETE designers’ experience-based vision, or by examining disaster plans and local emergency management policies [55, 69]. Several studies specifically aimed to identify gaps and needs through the exercise itself [56, 68].
The studies discussed a wide variety of ETE topics. Twenty-three studies focused on preparedness and seventeen on response. The main topics were bioterrorism (n = 8), a pandemic (n = 8), or a specific disease outbreak (n = 9), of which five focused on influenza. Outliers included, among others, training on risk communication, leadership, and One Health. Five studies, all TOTs, incorporated didactics as a training topic.
A minority of studies indicated having competent, experienced trainers or facilitators (n = 18). A majority described the trainers without demonstrating their experience or competence, referring to them generically as “instructor” or “university staff”, or left trainers completely unreported (n = 30).
Development & quality of the material
The development of learning material was discussed in all but seventeen studies. Most theories were derived from constructivist learning principles, such as the Adult Learning Theory [37, 60] or problem-based learning. Other theories used included the Dreyfus model, Benner’s theory [49, 59], continuing education [28, 59], and blended learning. ETEs were also based on existing competencies [37, 44, 50, 71], previously existing materials, and developers’ experience from previously delivered training or exercises. The developers of the material were mostly public health professionals (n = 12), followed by people from universities or public health schools (n = 10). The help of higher-level bodies, such as ministries, the national center for disease control, or the WHO, was mentioned several times [32, 63, 74]. In two studies, graphic designers were involved in the development of realistic images or virtual environments [76, 82].
Eight studies described educational programs as part of university programs or courses: Yamada et al. described an interdisciplinary and problem-based methodology during education, and Orfaly & Biddinger et al. and Rega et al. integrated table-top exercises into university courses [61, 67]. In the other six studies, methods were weakly described, merely referring to university programs or courses.
Nineteen studies evaluated a training, of which several combined the training session with an exercise [25, 65] or a real-life project. Two studies left their training methodology unspecified [35, 42]. Of the remaining studies, all except one supported interactivity among learners or between learners and trainers, referring to interactive lectures or discussion. Detailed descriptions of training designs were lacking and were restricted to summarizing phrases such as “using participatory methods” or “an online lecture” [36, 63]. Studies delivering any detail on methodology referred to adult learning principles, active learning, interactivity, multi-disciplinarity, or participatory methods, and explicitly distanced themselves from passive methods.
Exercises were described in 24 studies, of which sixteen were table-top exercises and six were specifically simulation exercises. The most common elements of table-top exercises in these studies were a lecture beforehand; a presentation of the scenario; an initial individual response; and a pre-arranged, guided discussion in small, multi-disciplinary groups of local partners. Subsequently, a presentation in a larger group and a debriefing followed. Often, more than one scenario was included in the exercise. The most considerable differences between studies were the level of detail of the described methodology and whether individuals, small groups, or large groups had to respond. Again, we see more detailed descriptions in studies that refer to adult learning principles.
Innovative design – wide reach
Seven studies had a TOT design, of which three integrated the second wave of training. This second wave was delivered by the TOT participants [33, 54, 74], whereupon participants could immediately apply what they had learned. All TOTs used mixed methods: passive methods, such as lectures or presentations, were often combined with active methodologies, such as guided discussions, clinical training, or active presenting. For two TOTs, the ETE methodologies used were largely unknown.
Seven studies examined ETEs with online or novel methodologies, such as virtual reality training, an audience response system, the use of an intranet for training, e-modules [28, 31, 50], and combinations of e-learning and on-site learning. Online ETEs offered natural opportunities to spread the learning moments over a longer period, and participants were able to follow the ETE at their own pace. Some simulation exercises also used online methodologies in the form of blog websites where participants had to respond to signals from their office [21, 22, 52].
Innovative design – enhanced realism
Elements described as enhancing the feeling of reality included the use of real work locations, such as an airport; a computer simulation model generating feedback depending on participants’ decisions in a simulation exercise [26, 27]; interaction with scenario cards guiding each exercise to different possible outcomes; initial ambiguity in an exercise case and drop-out of participants during the exercise; moulaged or simulated patients; and external consultation of experts during the exercise. Rega & Fink 2013 reported on a semester-long simulation exercise to maintain a realistic time frame.
Duration, interval & goals
The duration of ETEs varied from a 30-minute training to years-long curricula. TOTs mostly lasted several days to weeks. Educational courses lasted between 14 h and two years, and training between 14 h and one year. Fifteen studies did not elaborate on the duration of the ETE. The interval and time between intervals were hardly described. The goals of the ETE were addressed in most studies (n = 47), although they were often stated at the organizational level or implicitly integrated into the text instead of being presented as trainable and measurable competencies. An overview of the outcomes on context, input, and process is shown in Table 1.
Evaluation & Outcome
System performance was evaluated by four studies that used participants’ evaluations of organizational achievements after the ETE [32, 38] or external evaluations [45, 48]. None of the studies assessed the system effects of ETEs in a cross-border setting. Becker et al. 2012 evaluated a postgraduate education curriculum after two years in a developing setting; this curriculum substantially strengthened the local public health system. The three other studies (n = 682; 1496; unknown) evaluated several table-top and simulation exercises. These exercises seem effective at the system level with regard to building a prepared workforce through emergency planning, relationships among colleagues, and communication systems. Potter et al. 2005 did not aim to evaluate system performance but had a coincidental finding at this level: right after the training period, a real infectious disease outbreak occurred. According to the professionals involved, the response was well managed because the members of the response team had become acquainted with each other during the training.
Nine studies, including two TOTs [60, 62], evaluated outcomes at the behavioral level. Evaluation of behavior was primarily timed directly after the ETE, while six studies performed an additional follow-up test. Behavioral change was mainly self-assessed by participants, leading to subjective measurements. In one study, local supervisors were appointed to assess trainees’ behavioral change; another used a report at ministry level alongside participants’ self-assessments. No control groups were used.
The educational curricula seem to change behavior, such as initiating the updating of plans, expanding professional networks, and improving collaboration (n > 244). According to ministries’ reports, table-tops led to increased development of further exercises and a more regular assessment of public health preparedness (n = unknown). Online modules had a low response rate (< 18%) but changed behavioral intentions among responding participants (n > 55) [28, 63]. According to local supervisors (n = 511), the combination of online learning and on-site training led to improved work performance. One study reported on behavioral change after table-tops in a multi-country setting but did not mention any result regarding interaction between countries. According to Orfaly et al. 2005 and Otto et al., TOTs seem moderately effective, since 20% and 44% of participants, respectively, conducted exercises after six months (n = 118; n = 168) [60, 62].
Learning – knowledge
Thirty-three studies used knowledge to evaluate the effect of an ETE, including four TOTs and four ETEs in a cross-border setting. Knowledge was mostly evaluated with pre- and post-knowledge tests (n = 20) rather than with self-assessments of knowledge using Likert scales. Compared to studies using knowledge tests, those using self-assessments reported more detail on how knowledge had improved. Knowledge particularly improved on organizational and functional content, such as understanding response protocols or describing functional roles or the chain of command within an organization. This is understandable, since self-assessments can explicitly ask about what they aim for, while knowledge tests can only provide a test score. No control groups were used; one study compared two groups that were exposed to two different methodologies.
Knowledge shows a clear increase directly after ETEs. The five studies that used knowledge tests, performed follow-up tests, and reported the results showed a statistically significant improvement in knowledge directly after the ETE and up to 12 months afterwards [42, 76, 78–80]. Response rates were unknown, and the duration of these ETE programs varied between fourteen hours and four weeks. Umble et al. showed an equal increase in knowledge between classical education and a broadcast. Regarding ETEs in a cross-border setting, all of which used mixed methods and clearly stated their goals, a knowledge increase was shown after table-tops and training. However, these studies used self-assessments or unknown scoring methodologies.
Learning – skills
Twenty-one studies evaluated an ETE on skills, including three TOTs but none in a cross-border setting. The practiced skills ranged from mostly organizational, communication, team, and leadership skills to a minority of more medical skills, such as surveillance or the use of personal protective equipment. Except for one study using skill demonstrations, most studies performed self-assessments of improvement comparing pre- and post-tests. Seven studies also performed a follow-up test.
According to participants’ self-assessments, all ETEs were effective skill-builders. A statistically significant increase in skills was shown for training, while this outcome remained non-significant for most table-top and simulation exercises. Follow-up evaluations indicated a further increase in skills in the period after the ETE, although these results were self-assessed and mainly not statistically significant. Two TOTs showed a significant increase in planning, implementation, and evaluation after a table-top exercise [33, 54]; follow-up results were unavailable here.
Learning – attitude
Fifteen studies reported on a change in attitude, including one TOT and one covering several table-tops in a cross-border setting. The evaluated attitudes comprised awareness of and motivation to develop future preparedness plans and programs, or an increase in confidence. Attitude was mainly evaluated in training and exercises, and was assessed by rating statements.
We saw a sustained change in attitude directly and 1–3 months after both online and face-to-face training. These training programs lasted between 1.5 and 14 h but had unclear methods. Table-top exercises varied in their capability to change attitude, since both significant change [72, 75] and near indifference [34, 71] were shown, indicating that more detailed evaluation is required. The table-tops in a cross-border setting seemed to enhance participants’ motivation to develop and exercise programs. Dickmann et al. 2016 reported a relation between knowledge and attitude: participants with higher knowledge also had congruent confidence levels to respond and advocate for change. Data regarding TOTs were insufficient to aggregate results.
Forty-five studies assessed ETEs at the reaction level, mostly by having participants rate statements on satisfaction and methodology using Likert scales directly after the ETE. The ETEs in cross-border settings show high satisfaction among participants regarding table-tops and simulation exercises. One TOT showed satisfied participants in the second wave of training. Below, we present the results for the different designs.
Training programs scored satisfactorily directly after the training, despite substantial differences in design: after a 30-min pandemic preparedness training, 98% of participants considered the program valuable, as did 95% after several face-to-face modules on emergency preparedness, and 92–96% after a preparedness training of 14 days [78, 80]. Remarkably, the one study performing a follow-up test identified the lowest satisfaction of all training programs, with a mean score of 4/5 after a 2-day Zika response training.
Only one study evaluated reaction after an exercise with a follow-up test; all others were restricted to post-tests. Table-top exercises overall scored high on satisfaction, mainly based on their potential to practice together (77% agreed), to build relationships (80–90% agreed), to improve emergency or contingency planning (73% agreed), and to identify gaps (89% and 77% agreed). Biddinger et al. identified higher satisfaction among regional exercise respondents than among single-institution respondents regarding their understanding of agencies’ roles and responsibilities (p < 0.001), engagement in the exercise (p = 0.006), and satisfaction with the combination of participants (p < 0.001). In several studies, the right combination of participants was scored as one of the most valuable aspects. A disadvantage of table-top exercises was their failure to identify key gaps in individuals’ performance. Further recommendations for exercises were: to clearly formulate specific objectives; to be as realistic as possible; to ground the practical response in theory; to design around issue areas rather than scenarios; to have forced, targeted, and time-delineated discussion and decision making; to limit the number of participants while including all key perspectives, especially leadership perspectives; to collaboratively design and execute with representatives from participating agencies, external developers, and facilitators; to offer networking possibilities; and to use trained evaluators.
Simulation exercises were less often assessed on reaction, and the outcomes show slightly lower satisfaction than for the table-top exercises. However, in three studies, “most participants” or over 80% of participants still agreed that the simulation had increased their readiness. The full-scale simulation at an airport stressed the need for specific goals, thereby preventing the public health response from being deprioritized by trying to test everything at the same time. It is also paramount to have clear roles and responsibilities for the various agencies involved, and to have all required capacity available. One study showed a positive relationship between the duration of a joint exercise and the contact and communication between health departments afterwards.
Ten studies reported reaction directly after innovative methodologies. Several studies added online blogs, pages, or systems to a simulation exercise, a lecture, or a combination of classical designs. Other studies evaluated pure technologies, such as an audience response system or a virtual reality environment. For innovative methods, satisfaction was generally high, although technical issues were often reported. For example, the e-modules in Baldwin et al. were launched via the intranet of a public health organization, thereby benefiting from high accessibility but facing extensive, unforeseen updates, rigidity to change, and delayed updates because ownership was not designated. The VR environment exercise met its objectives and was considered time well spent, but the participants and authors suggested further technological innovations before this method can be used at a large scale. An overview of all outcomes, including those not mentioned above [24, 39, 47, 51, 53, 57, 66, 73], is shown in Table 2.