Safer Surgery part 23 pps

• Performance Dimension Training (PDT): increases rating accuracy by facilitating dimension-relevant evaluations.
• Behavioural Observation Training (BOT): increases rating accuracy by focusing on the observation of behaviour.
• Frame of Reference training (FOR): increases rating accuracy by focusing on the different levels of performance (Salas et al. 2001).

Reportedly, it is relatively straightforward to train a single group of raters and achieve a high level of inter-rater agreement and accuracy when compared against a standard set rated by an expert (Salas et al. 2001). The potential sources of error mean that it is not as easy to translate this to many groups trained in separate centres or by different trainers; between-group rater agreement and accuracy are known to drop significantly unless such errors are addressed. Salas et al. (2001) outlined guidelines for training raters in the use of behavioural markers, developed to minimize the sources of error described above. There is limited recent evidence on effective rater training in the medical domain, so these guidelines were followed in developing our rater training for ANTS. They were previously followed by Flin and Glavin's research group (Fletcher et al. 2003) to evaluate inter-rater agreement for ANTS. Baker et al. (2001) employed an eight-hour rater training programme for a behavioural marker system and achieved an adequate level of inter-rater agreement and accuracy in that time frame. Workpackage Report 7 from the University of Aberdeen investigating ANTS reports a four-hour training programme (Fletcher et al. 2002). With this minimal amount of training, inter-rater agreement of rwg = 0.55–0.67 was achieved at the element level, and 0.56–0.65 at the category level. This was also without feedback and calibration.
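The rwg statistic quoted above compares the observed variance of a group's ratings with the variance expected if raters responded uniformly at random over the rating scale. The following is a minimal sketch, assuming a four-point scale as used in ANTS; the function name and example scores are illustrative, not taken from the study:

```python
def rwg(ratings, n_options=4):
    """Within-group agreement for a single rated item: one minus the
    ratio of the observed rating variance to the variance of a uniform
    random response on an A-point scale, sigma_EU^2 = (A**2 - 1) / 12."""
    xs = list(ratings)
    mean = sum(xs) / len(xs)
    s2 = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    sigma_eu2 = (n_options ** 2 - 1) / 12                  # uniform-null variance
    return 1 - s2 / sigma_eu2

# Four hypothetical raters scoring one element on a 1-4 scale:
print(rwg([3, 3, 4, 3]))  # 0.8: close agreement
```

Values near 1 indicate close agreement; wide disagreement drives the index towards (and below) zero.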
During the initial evaluation of ANTS by the Scottish team, feedback from experts and calibration were deliberately excluded from rater training to isolate their impact on inter-rater agreement. In this way, the training provided to their participants was intentionally limited in order to isolate the reliability of the tool itself. With our research we hoped to move this forward a phase by including feedback from experts to test calibration, and thus improve inter-rater agreement.

Rater Training Day

When planning the project we aimed to train 15 participants as raters. This was the number we could reasonably expect, since we were only able to recruit from our own department. We chose a single day of training as a feasible length: were the college to roll out ANTS widely as an assessment tool, a short course would be necessary if many assessors were to be trained. Surprisingly, 26 participants applied and attended the training, which was held at a venue with excellent audio-visual capacity, desk space and good catering for participants and trainers.

Pre-reading was sent out prior to the workshop, including the ANTS Handbook (<www.abdn.ac.uk/ANTS>) and 'Recommendations for the use of behavioural markers' (Klampfer et al. 2001). An outline of the rater training is as follows:

• welcome and introduction;
• ANTS background information;
• Behavioural Observation Training;
• Rater Error Training;
• Performance Dimension and Frame of Reference Training;
• assessment practice and calibration.

The attendees found the behavioural observation training to be an enjoyable experience. This session involved, amongst other activities, observing continuity errors in films. The performance dimension and frame of reference training used the excerpts that were selected from the videos. This was the first opportunity our raters had to practise their assessment skills.
Five videos were shown in the afternoon for the purpose of practice assessment and calibration. ANTS scores from these five videos were collected for statistical analysis. Score sheets were collected from each rater prior to any discussion about the video. Expert ratings and a discussion followed for the purpose of calibration. Many of the learning points that we gained from this project arose during these sessions and will be discussed later.

Results

Participants rated performance in the five test videos, scoring in each case for 15 identified skill elements. Following lengthy discussion with our statistician, the intraclass correlation (ICC) (Shrout and Fleiss 1979) was chosen to demonstrate agreement between the 26 raters. The ICC is essentially another way of calculating inter-rater agreement; by choosing it we were able to include multiple sources of variability, reflecting the many sources of variability that are introduced when investigating inter-rater agreement:

ICC = var(target) / (var(target) + var(judge) + var(residual))

Intraclass correlations were calculated for each element of ANTS (see Figure 12.1). A random effects linear model approach was used to estimate the variance components and, ultimately, the ICC. None even reached the minimal acceptable value of 0.7, let alone the higher value of 0.9 that would be considered necessary for a high-stakes assessment. We therefore showed a lack of reliability. As you will see later in this chapter, we learnt a number of lessons while trying to achieve inter-rater reliability with such a large group. Comparison of scores with those of the 'expert raters' also showed unsatisfactory results.

Qualitative Data

Pre-workshop Questionnaire

All participants were involved in the supervision of ANZCA trainees, with supervision experience ranging from 1 to 30 years.
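The ICC formula used in the Results above can be estimated from a two-way ANOVA decomposition of a targets-by-judges rating matrix. The following is a minimal sketch of an ICC(2,1)-style estimate in Shrout and Fleiss's (1979) terms, not the authors' full random-effects model; the function name and ratings matrix are hypothetical:

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): n targets each rated by the same k judges, both treated
    as random effects, so the variance components satisfy
    ICC = var(target) / (var(target) + var(judge) + var(residual))."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    msb = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # between targets
    msj = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # between judges
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))             # residual
    var_target = (msb - mse) / k
    var_judge = (msj - mse) / n
    return var_target / (var_target + var_judge + mse)

# Hypothetical data: five video 'targets' rated by four judges on a
# four-point scale (illustrative only, not the study's actual scores).
ratings = np.array([[3, 3, 2, 3],
                    [4, 4, 4, 3],
                    [2, 2, 3, 2],
                    [1, 2, 1, 1],
                    [3, 4, 3, 3]])
print(round(icc_2_1(ratings), 3))  # 0.776 for these hypothetical data
```

A larger judge or residual variance component shrinks the ICC towards zero, which is why many raters with differing frames of reference can defeat even strong between-target differences.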
Of the 26 participants, eight had one year or less of supervising experience. Only seven participants already had a system for assessment in the workplace. Of these, one was a regional education officer and the others were generally experienced (5–30 years as a consultant). New supervisors were less likely to have a method for assessment; however, there were still some experienced anaesthetists amongst those who felt they had no real system for assessment. The most common aims the participants identified for the workshop were to 'provide better or constructive feedback' and 'to have a systematic method for assessment' (22 out of 26). No one felt that the elements of the current in-training assessment process were useful for assessment.

Figure 12.1 Intraclass correlations calculated for each component of ANTS

Post-workshop Questionnaire

All participants thought ANTS was a useful system for structuring assessment, and the majority found the system easy to use. Four participants found it difficult to place appropriate rating scores next to the observed behaviour. Pre-reading was universally helpful, and one participant even requested extra pre-reading. The level of most of the training was scored as 'just right', and the amount of information in each section was 'just right' for the majority of participants. Specific comments about the training were few, but were noted in order to adapt future training sessions. Most participants felt they needed more practice before using ANTS for assessment; despite this, most felt they had received enough training to use the ANTS system. All participants felt that ANTS was useful for consultants giving training to junior anaesthetists. One comment was that it sets a 'gold standard' for behaviours in theatre. All felt it is important for trainees to have or develop these skills. Each participant felt that ANTS was useful as a formative assessment tool.
The comments associated with this question included the fact that it was useful for giving structured feedback. Most also felt that it would highlight the importance of non-technical skills. The question of whether ANTS was suitable as a summative assessment tool divided the participants: 13 raters thought it was suitable and 13 thought more work should be done. Opinion was also divided on the year of training in which summative assessment of non-technical skills should be performed. This could be determined by a future project, but would require large numbers of both trainees and raters. The comments associated with this question give an insight into what the average ANZCA fellow may think about using ANTS in its current form. Comments from ANZCA fellows regarding the use of ANTS as a summative assessment tool:

• Only a small number of expert raters should perform this assessment.
• Such an assessment may have severe implications for supervision of trainees.
• This should only be used for trainees who are already struggling.
• Anaesthesia trainees should be exposed to non-technical skills training early to encourage good behaviours.

Discussion

It seems unusual that the Scottish team investigating ANTS achieved an inter-rater reliability of r = 0.5–0.7 with minimal training yet, in comparison, our correlation is so poor. What was different? We were hoping to see the introduction of ANTS as a summative assessment tool and, unfortunately, this does not look to be a great start. We believe there are a number of reasons why we could not achieve sufficient agreement between raters in one day. Some of these were obvious from the discussion during calibration on the training day, and some have become clearer afterwards. Following the viewing of each video, the expert ratings were discussed to see if we could achieve further agreement amongst the group before the next video.
We observed a number of interesting opinions, discussions and thoughts about using ANTS from our raters. At the time we thought we were seeing potential problems with our rater training, but what we actually saw were some warnings for the use of workplace-based assessment in general.

Misclassification

When discussing the viewed scenario, it seemed that everybody was seeing the same behaviours, yet there was a definite lack of agreement. This was because the behaviours were being placed into different elements. Anybody who has used the ANTS system will know that one behaviour can potentially be placed beside more than one element. When behaviours are placed beside different sets of elements by different raters, the result is different ratings for each element, and ultimately a lack of agreement.

Disagreement on Safety Standards

Not everyone agrees on safety standards within anaesthesia. An example is 'test ventilation': some anaesthetists believe this is mandatory practice, while others think it is dangerous. This problem is directly relevant to the future of workplace assessment. Since ANTS is essentially based on patient safety, it is vital that assessors agree on what is 'safe' if they are to enforce this view on trainees.

We believe these two problems were mainly responsible for the lack of inter-rater agreement. This was obvious from some of the 'lively' discussions during calibration. However, this still does not account for the large difference between the Scottish raters and the Australian ones. Herein lies the next difference. When we designed our study, we kept in mind the fact that ANTS could, in the future, be used on a large scale and by a variety of anaesthetists. The participants in our rater training day were all specialist anaesthetists of varying experience, but without specific training in education or simulation. In fact, most of them play no formal role in education or training.
All the anaesthetists in the Scottish study had some involvement in education and training activities (Fletcher et al. 2002). The large difference in inter-rater agreement could be an effect of education. The same argument that holds for teaching anaesthesia trainees formally about non-technical skills, if we are to assess them, may apply to the consultants doing the assessing. The Scottish raters were also assessing acted scenarios, as opposed to the real cases that our raters observed. This may be yet another source of bias that was introduced. We learnt many lessons from our rater training day, and realized how many sources of bias can be introduced.

Lessons Learnt

1. ANTS as an appropriate instrument for workplace-based assessment for anaesthetists-in-training:

a. Validity. Content validity was established in previous work on ANTS, and our trainee raters agreed in the post-workshop questionnaire. Criterion validity is almost impossible to establish due to the lack of a gold standard. We compressed the data to improve our reliability; whilst this approach appears to have face validity, we have not formally assessed its validity.

b. Reliability. We failed to demonstrate acceptable inter-rater reliability with the level of training we offered. Acceptable reliability could be gained by compressing the data: if the data are compressed to an average score from each rater, then the inter-rater agreement increases substantially. A longer course may offer higher reliability but may not be feasible. Reliability is difficult to demonstrate because the variability is multifactorial; addressing it would require larger sample sizes or greater control over video content, and changing the latter would detract from the validity of the real-world experience.

c. Acceptability. There was high acceptability of this tool amongst both video subjects and workshop participants. However, the voluntary nature of the participation introduces bias.
Data from these motivated individuals may not translate to a wider population. Resistance to implementation could be predicted from many of the stakeholders: trainees would need to accept the importance of this dimension of their practice; qualified anaesthetists who work with trainees would have to incorporate the principles of ANTS explicitly into their practice; those involved with the central examination process would need to accept devolution of their power; and local centres would have to accept an increase in non-clinical workload. Trainees may also worry about the introduction of local bias into their assessment, a process which, until now, has been central and viewed as impartial.

d. Feasibility. ANTS does not appear to be a feasible tool to use for summative assessment in its current state. Significant training and practice are likely to be needed, leading to low interest among potential assessors, and widespread implementation would require large human and financial resources. Potentially, it could be used as a screening tool by assessors with limited training using a compressed scoring system; those rated as underperforming could then be assessed further by a small number of highly trained and committed assessors. The validity of this modification would also need to be examined.

2. General lessons learnt about implementation of a workplace-based assessment tool:

a. The use of video footage was invaluable for teaching potential assessors, though it still failed to show all the information the audience wanted; high audio-visual quality was critical. Video has been shown to have similar validity and reliability to real-time observation (Hays et al. 2002) but is unlikely to be useful in our setting for trainee assessment due to the increased cost.

b. The workshop demonstrated that techniques borrowed from other industries were appropriate in this setting.
The dynamics of the group impacted on the results more than we would have expected, with vigorous discussion becoming unhealthy at times. We appeared unable to dislodge preconceptions about some behaviours, despite others in the group making it clear that these beliefs were held by only a very small minority. Observations of those with dissenting opinions in the group tallied closely with their standing as outliers in the rating process. The workshop can therefore also be an opportunity to decide who is reliable enough to become an assessor.

Conclusion

The overriding strength of ANTS is its content validity, with effective coverage of the domains of non-technical practice in anaesthesia. This coverage is also its downfall, giving it a complexity that limits its feasibility as a summative assessment tool. The poor inter-rater reliability that we demonstrated is likely to be a feature of any workplace-based assessment tool for anaesthesia, as the subtleties of medical practice make it difficult to maintain high validity and reliability together. Simplifying ANTS can improve reliability, and may do so without impairing validity, although successfully applying the tool will remain complex. The well-mapped-out domains of anaesthesia practice should make this an ideal specialty in which to pioneer workplace-based assessment in medicine. Given our responsibility to government and patients to provide safe, appropriately trained anaesthetists, it seems unfeasible not to introduce an appropriate comprehensive assessment programme.

References

Baker, D., Mulqueen, C. and Dismukes, R. (2001) Training raters to assess resource management skills. In E. Salas, C. Bowers and E. Edens (eds), Improving Teamwork in Organizations (pp. 131–45). Mahwah, NJ: Lawrence Erlbaum Associates.

CanMEDS (2000) Extract from the CanMEDS 2000 Project Societal Needs Working Group Report. Medical Teacher 22(6), 549–54.

Downing, S.M.
(2004) Reliability: On the reproducibility of assessment data. Medical Education 38(9), 1006–12.

Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2002) WP7 Report: Evaluation of the Prototype Anaesthetist's Non-Technical Skills (ANTS) Behavioural Marker System: University of Aberdeen Workpackage Report for SCPMDE. Available from <http://www.abdn.ac.uk/iprc/ants> [last accessed March 2009].

Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2003) Anaesthetists' Non-Technical Skills (ANTS): Evaluation of a behavioural marker system. British Journal of Anaesthesia 90(5), 580–8.

Fletcher, G., Flin, R., McGeorge, P., Glavin, R., Maran, N. and Patey, R. (2004) Rating non-technical skills: Developing a behavioural marker system for use in anaesthesia. Cognition, Technology and Work 6, 165–71.

Gleason, A.J., Daly, J.O. and Blackham, R.E. (2007) Prevocational medical training and the Australian Curriculum Framework for Junior Doctors: A junior doctor perspective. Medical Journal of Australia 186(3), 114–16.

Hays, R.B., Davies, H.A., Beard, J.D., Caldon, L.J.M., Farmer, E.A., Finucane, P.M., McCrorie, P., Newble, D.I., Schuwirth, L.W. and Sibbald, G.R. (2002) Selecting performance assessment methods for experienced physicians. Medical Education 36(10), 910–17.

Klampfer, B., Flin, R., Helmreich, R.L., Hausler, R., Sexton, B., Fletcher, G., Field, P., Staender, S., Lauche, K., Dieckmann, P. and Amacher, A. (2001) Enhancing Performance in High Risk Environments: Recommendations for Using Behavioural Markers. Zurich: Group Interaction in High Risk Environments, Swissair Training Centre.

Salas, E., Bowers, C.A. and Edens, E. (eds) (2001) Improving Teamwork in Organizations: Applications of Resource Management Training. Mahwah, NJ: Lawrence Erlbaum Associates.

Shrout, P.E. and Fleiss, J.L. (1979) Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86(2), 420–8.
Spike, N., Alexander, H., Elliott, S., Hazlett, C., Kilminster, S., Prideaux, D. and Roberts, T. (2000) In-training assessment – its potential in enhancing clinical teaching. Medical Education 34(10), 858–61.

Woehr, D.J. and Huffcutt, A.I. (1994) Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology 67(3), 189–205.

Chapter 13

Measuring Coordination Behaviour in Anaesthesia Teams During Induction of General Anaesthetics

Michaela Kolbe, Barbara Künzle, Enikö Zala-Mezö, Johannes Wacker and Gudela Grote

Introduction

Working in groups is widespread in medicine, especially in the operating room. Anaesthesia is a classic small-group performance situation in which a variety of organizational, group-process and personality factors are crucial to outcomes such as patient safety. Human factors, such as breakdowns in the quality of teamwork, have been identified as a main source of failures in medical treatment (Arbous et al. 2001, Cooper et al. 2002, Gaba 2000, Helmreich and Davies 1996, Lingard et al. 2004, Reason 2005, Sexton et al. 2000). There is growing evidence that the ability of medical teams to deal with the required complex work processes strongly depends on adaptive team coordination (e.g., Manser et al. 2008, Risser et al. 1999, Rosen et al. 2008, Salas et al. 2007b, Schaafstal et al. 2001, Zala-Mezö et al. 2009). Coordination has been defined as the 'structured patterning of within-group activities by which groups strive to achieve their goal' (Arrow et al. 2000, p. 104). However, for anaesthesia teams, there is very little empirical evidence on which specific coordination behaviours can help teams maintain effective clinical performance – especially in transitions from routine situations to the management of non-routine events.
In our ongoing work, we attempt to fill this gap by analysing coordination behaviour and clinical performance in routine and non-routine events. In this chapter, we will analyse the relevance of adaptive coordination in anaesthetic work and present our approach to measuring team coordination behaviour in anaesthesia.

Teamwork in Anaesthesia

The induction of anaesthesia is particularly demanding compared to the other tasks involved in the anaesthetic process (see Phipps et al. 2008). Clinical team performance is influenced by a variety of factors such as team member experience.
