Attention, Working Memory and Listening in Simultaneous Interpretation
Published: Dec. 14, 2017
Latest article update: Sept. 17, 2022
Simultaneous interpreting (SI) is difficult to reduce to an efficient and ecologically valid experimental paradigm that allows the measurement of working memory (WM) and attention, which are key components of all cognitive models of SI. These models could be validated and further refined in light of electrophysiological evidence. Here we propose a novel method for estimating WM load and report the results of an electroencephalographic study that lend support to a prediction of the Efforts Model of SI (Gile, 1988): namely, that increased WM load impairs the processing of auditory stimuli. Consistent with the model, the P1 and N1 components elicited by task-irrelevant tone probes embedded in the source message, which were used as an index of attention, were modulated as a function of WM load (significantly so for the N1) but not as a function of the direction of interpretation. Negativity in the P1/N1 range decreased at higher values of WM load, suggesting shallower processing of the source message under high WM load regardless of direction.
Keywords: Working memory, attention, simultaneous interpretation, Efforts Model, ERP
Simultaneous interpreting (SI) is a highly complex task that involves perceiving a source message in one language and producing its translation in another language with a lag of only a few seconds. SI requires mastery of a set of special skills that take a long time to acquire. Yet even after years of practice, it may still be difficult to deliver translations of consistent quality because of several challenges inherent in SI. First, the speaker's and the interpreter's voices constantly overlap: the speaker starts the next utterance before the interpreter has finished translating the previous chunk (Chernov, 2004; Signorelli & Obler, 2013). Second, the interpreter's working memory (WM) is under a sustained and substantial load, because a certain amount of time is needed to react to source message stimuli (words and phrases). Even if the reaction were instantaneous, some delay would still be necessary, since meaningful translation is not merely a word substitution exercise: it takes a phrase (sometimes an entire proposition) to establish a context within which individual words begin to make sense. Moreover, it may not always be possible to maintain linearity in the target message (Gile, 1995): the syntactic constraints of the target language may make it impossible to begin translating a certain phrase immediately. Conversely, the syntax of the source language may be such that a critical word disambiguating an entire proposition comes at the end, forcing the interpreter to wait for that word while accumulating the preceding information in WM.
Holding information in WM requires attention. Since attention has a limited capacity, it must be borrowed from other attention-demanding cognitive tasks, including analysis of the source message, the search for translation equivalents, articulation, self-monitoring and self-correction. This is the central tenet of the Efforts Model (EM), which conceptualizes SI as a three-component mixture of Listening, Memory and Speaking (Gile, 1988, 1995, 1999). Each component (or effort) of SI critically depends on, and competes for, a limited supply of attention. For example, the more attention is engaged in maintaining a previously heard utterance in WM, the less attention is available for the Listening effort. Conversely, giving priority to the Listening effort may cause the memory trace of the previous chunk to decay before the interpreter is ready to translate it. The challenge of allocating enough attention to listening while ensuring satisfactory WM performance may even have a biological basis. There is evidence suggesting that attention and WM may share the same neural ‘wetware’ (Sabri et al., 2014), and some fMRI studies show a significant overlap between the brain areas involved in WM and in attention (Corbetta & Shulman, 2002; Cowan, 1998). While the prediction of the Efforts Model (that WM overload may impair the perception and processing of the speaker's message) raises no fundamental objections, neuroimaging evidence would help determine whether and how the EM could be improved.
A behavioral estimate of WM load can be obtained by counting the number of words in the source-target lag, but measuring attention during SI is problematic. However, numerous event-related potential (ERP) studies on non-interpreters have demonstrated that the amplitude of the early P1/N1 components evoked by task-irrelevant probes embedded in a speech stream depends on whether or not the listener is attending to the speech (Coch, Sanders, & Neville, 2005; Hillyard, Hink, Schwent, & Picton, 1973; Hink & Hillyard, 1976; Teder, Alho, Reinikainen, & Näätänen, 1993; Woods, Hillyard, & Hansen, 1984). These early ERP components therefore offer a way to monitor interpreters' attention to the source message.
Our study aimed to find evidence in support of the Efforts Model, which predicts, inter alia, that larger source-target lags (and, by assumption, higher levels of WM load) partially impair the interpreters' processing of auditory stimuli. We also asked whether the direction of interpretation influences the effect of WM load on auditory processing. The introspective observations of one of the authors, as well as informal testimonies by professional interpreters (all L1 Russian speakers), strongly suggested that, other conditions being equal, interpreting from English into Russian is more difficult than interpreting in the opposite direction.
To our knowledge, the Efforts Model has never before been tested using electroencephalography. Below we present and discuss the results of an ERP quasi-experiment designed to this end.
Nine professional interpreters (males aged 25-47, M = 36.9, SD = 6.757) participated in the study. All were L1 Russian speakers with an average of 10.65 years (SD = 6.675) of professional SI experience. One participant was excluded from the analysis because of a very high level of noise in the EEG.
The participants were instructed to interpret 8 speeches (4 in Russian and 4 in English) originally delivered at the 6,849th meeting of the United Nations Security Council. The interpreters were familiar with the subject matter discussed at the meeting (the rule of law) and with the relevant terminology, and did not require any preparation to deliver quality translations. The speeches had originally been delivered in languages other than Russian or English; the stimulus material was an audio recording of the translated versions of the meeting transcripts, read by a bilingual speaker highly proficient in both Russian and English. To reduce variability due to differences in the original speakers' voices (including rate, pitch, timbre, loudness, prosody and accent), the audio recording was digitally edited to ensure a constant delivery rate of 105 wpm. The total playback time was 53 minutes (excluding rest periods between the speeches). To control for order effects, the speeches were presented in a pseudo-random order according to a Latin square, so that the text order was different for every participant. The source audio was mixed with probe stimuli (440-Hz, 52-ms pure sine tones, including 4-ms rise and fall periods) delivered with a jittered inter-stimulus interval (ISI) of 400-600 ms (M = 500 ms). These parameters were a trade-off between maximizing the number of probes per unit time and the reduction in ERP amplitude caused by a relatively high presentation rate (Teder et al., 1993).
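The probe parameters above can be illustrated with a short Python sketch. It is not the authors' stimulus-generation code: the audio sampling rate (44.1 kHz), the linear shape of the 4-ms ramps, and the reading of the ISI as an onset-to-onset interval drawn uniformly from 400-600 ms are all assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE = 44_100  # assumed audio sampling rate in Hz (not reported in the study)


def make_probe(freq_hz=440.0, dur_ms=52.0, ramp_ms=4.0, sr=SAMPLE_RATE):
    """Return one pure-tone probe as a list of samples in [-1, 1].

    The 4-ms rise and fall are implemented as linear ramps, which is an
    assumption: the study reports only their duration.
    """
    n = round(sr * dur_ms / 1000)        # total samples in the tone
    n_ramp = round(sr * ramp_ms / 1000)  # samples in each ramp
    tone = []
    for i in range(n):
        if i < n_ramp:
            gain = i / n_ramp            # linear rise from 0
        elif i >= n - n_ramp:
            gain = (n - 1 - i) / n_ramp  # linear fall to 0
        else:
            gain = 1.0
        tone.append(gain * math.sin(2 * math.pi * freq_hz * i / sr))
    return tone


def probe_onsets(total_s, isi_min_ms=400.0, isi_max_ms=600.0, seed=None):
    """Return jittered probe onset times (s) covering total_s seconds.

    Assumes the ISI is measured onset to onset and drawn uniformly from
    [isi_min_ms, isi_max_ms], which gives a mean of 500 ms for the defaults.
    """
    rng = random.Random(seed)
    onsets = []
    t = rng.uniform(isi_min_ms, isi_max_ms) / 1000
    while t < total_s:
        onsets.append(t)
        t += rng.uniform(isi_min_ms, isi_max_ms) / 1000
    return onsets
```

With a mean interval of 0.5 s, `probe_onsets(53 * 60)` yields roughly 6,360 onsets for the full 53-minute playback. The uniform jitter keeps the mean at 500 ms while preventing probes from falling into a predictable rhythm.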
EEG was recorded from 32 electrodes at a sampling rate of 2000 Hz using the ActiCHamp recording system and Brain Vision PyCorder software (Brain Products GmbH). Electrode impedances were kept below 10 kΩ. After EEG data had been recorded from all participants, the source and target audio were manually transcribed. The transcripts were then time-stamped and reformatted using a custom VBA script, which made it possible to calculate how many content words a particular participant lagged behind the speaker at any given time. These lags were used as an estimate of WM load. The raw EEG datasets were resampled from 2000 to 250 Hz, re-referenced to linked mastoids (TP9 and TP10) and band-pass filtered from 0.25 to 30 Hz using a zero-phase FIR filter. Independent component analysis (ICA) was used to remove the two most dominant components associated with eye blinks. The EEG was further cleaned with the artifact subspace reconstruction (ASR) algorithm (Mullen et al., 2015), with the burst parameter set to 3.5 standard deviations. The continuous EEG was then segmented into epochs extending 500 ms after probe onset and including a 100-ms pre-stimulus baseline, yielding about 5,840 epochs per participant. The epochs were inspected for any residual artifacts that might have remained after cleaning with ICA and ASR. Fewer than 5% of the epochs were rejected.
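The lag computation was done with a custom VBA script whose exact logic is not described here. The following Python sketch shows one plausible way to obtain a content-word lag at each probe onset. It assumes a hypothetical alignment in which every target content word is tagged with its production time and with the index of the source content word it renders. Lag is then taken to be the number of source content words heard after the furthest source word the interpreter has reached, so that omitted words do not accumulate indefinitely.

```python
from bisect import bisect_right


def lag_at(t, source_onsets, renderings):
    """Content-word lag (WM load estimate) at time t, in seconds.

    source_onsets: sorted onset times (s) of the speaker's content words.
    renderings: (time_s, source_index) pairs, one per target content word,
        giving when the interpreter produced it and which 0-based source
        content word it renders. This alignment format is hypothetical.
    """
    heard = bisect_right(source_onsets, t)  # source content words with onset <= t
    # Furthest source position the interpreter has reached by time t.
    reached = max((idx for time_s, idx in renderings if time_s <= t), default=-1)
    # Words heard beyond that position are assumed to be held in WM;
    # clamp at 0 in case the interpreter anticipates ahead of the speaker.
    return max(heard - (reached + 1), 0)


def lags_at_probes(probe_times, source_onsets, renderings):
    """WM load estimate at each probe onset."""
    return [lag_at(t, source_onsets, renderings) for t in probe_times]
```

For example, with source words at 0.0, 0.5, 1.0, 1.5 and 2.0 s and renderings `[(1.2, 0), (1.8, 1), (2.6, 3)]`, a probe at 1.3 s finds three words heard and one rendered, giving a lag of 2.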
The epochs were divided into three groups corresponding to low, medium and high WM load estimated at probe onset, and averaged within these three groups and the two directions of interpretation (Eng-Rus and Rus-Eng). Because of the large between-subject variance in median WM load (Figure 1), the boundaries between the three levels of WM load were computed individually for each participant and direction of interpretation: the 10th percentile separated low from medium load, and the 90th percentile separated medium from high load. Mean P1 (10-100 ms post-stimulus) and N1 (100-200 ms post-stimulus) amplitudes were computed in time windows around the peaks observed in the grand-average ERP waveforms. The dependent variables in all statistical tests were P1 and N1 amplitudes averaged over nine electrodes (F3, Fz, F4, C3, Cz, C4, P3, Pz, P4; see Figure 2).
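The grouping and amplitude measures described above can be sketched in Python as follows. This is an illustration, not the authors' analysis code. The epoch layout (starting 100 ms before probe onset at 250 Hz), the inclusive window edges and the rule that lags equal to a cut point count as medium are assumptions. That last rule matters in practice because integer word counts produce many ties at the percentiles.

```python
import statistics

ROI = ("F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4")
SFREQ = 250            # Hz, after resampling
EPOCH_START_MS = -100  # assumed: epochs start 100 ms before probe onset
P1_WINDOW = (10, 100)  # ms post-stimulus
N1_WINDOW = (100, 200)


def load_bins(lags):
    """Label lags 'low'/'medium'/'high' at this participant's own 10th and
    90th percentiles. Call separately for each participant and direction.
    Lags equal to a cut point are assigned to 'medium' (an assumption)."""
    deciles = statistics.quantiles(lags, n=10)  # 9 cut points: 10th..90th percentile
    p10, p90 = deciles[0], deciles[-1]
    return ["low" if x < p10 else "high" if x > p90 else "medium" for x in lags]


def average_erp(epochs, labels, level):
    """Point-by-point mean of the single-channel epochs labelled `level`."""
    chosen = [ep for ep, lab in zip(epochs, labels) if lab == level]
    return [statistics.fmean(samples) for samples in zip(*chosen)]


def window_mean(waveform, start_ms, end_ms, sfreq=SFREQ, t0_ms=EPOCH_START_MS):
    """Mean voltage of a waveform between start_ms and end_ms (inclusive)."""
    step_ms = 1000 / sfreq
    return statistics.fmean(
        v for i, v in enumerate(waveform) if start_ms <= t0_ms + i * step_ms <= end_ms
    )


def roi_amplitude(erp_by_channel, window):
    """Average the window mean over the nine ROI electrodes."""
    return statistics.fmean(window_mean(erp_by_channel[ch], *window) for ch in ROI)
```

In this sketch, `load_bins` is applied to the lags at probe onset for one participant and one direction. `average_erp` then builds per-channel ERPs for each load level, and `roi_amplitude(erp, P1_WINDOW)` and `roi_amplitude(erp, N1_WINDOW)` give the two dependent variables.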
A repeated-measures ANOVA with the factors Direction (Rus-Eng, Eng-Rus) and WM Load (low, medium, high) showed a significant main effect of WM Load in the N1 range (F(2, 14) = 4.90, p = .024, ηp² = .41). There was no main effect of Direction (F(1, 7) = 0.96, p = .359, ηp² = .12) and no interaction (F(1.12, 7.86) = 0.03, p = .883, ηp² < .01). In the P1 range, however, the main effect of WM Load failed to reach significance (F(2, 14) = 3.55, p = .057, ηp² = .34), as did the main effect of Direction (F(1, 7) = 0.04, p = .856, ηp² < .05) and the interaction (F(1.03, 7.24) = 0.01, p = .937, ηp² < .01).
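The structure of these statistics can be made concrete with a Python sketch of a one-way repeated-measures ANOVA, together with the standard conversion from a reported F to partial eta squared, ηp² = F·df_effect / (F·df_effect + df_error). In a fully within-subject 2 × 3 design, the F for the WM Load main effect equals that of a one-way repeated-measures ANOVA run on each participant's load means averaged over Direction. The sketch applies no sphericity correction. The fractional degrees of freedom reported for the interactions suggest that one was used there, but that is not implemented here.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data[s][c] holds one value per subject s and condition c.
    Returns (F, df_effect, df_error, partial eta squared).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error, ss_cond / (ss_cond + ss_error)


def partial_eta_sq(f, df_effect, df_error):
    """Partial eta squared recovered from a reported F and its dfs."""
    return f * df_effect / (f * df_effect + df_error)
```

With 8 analyzed participants and 3 load levels, the error df is (3 - 1)(8 - 1) = 14, matching the reported F(2, 14). Applying `partial_eta_sq` to the reported values reproduces the reported effect sizes: for example, `partial_eta_sq(4.90, 2, 14)` rounds to 0.41.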
Our results are in agreement with the Efforts Model of SI (Gile, 1988, 1999), at least insofar as it predicts that WM and listening compete for a limited pool of attention and that maintaining information in WM is an active, attention-demanding process. Importantly, this suggests that the ability to quickly resolve the competing attentional demands of the different cognitive processes in SI may be a skill that can be purposefully trained not only through SI practice but also through specially designed, targeted exercises. It also highlights that mastering good WM management strategies may be another important ingredient in the ability to deliver complete and accurate translations. In fact, several studies (Bartlomiejczyk, 2006; Chernov, 1994, 2004; Ilykhin, 2001; Li, 2010) have shown that simultaneous interpreters do use a set of strategies to manage their processing load. For example, words and whole propositions that contribute little to the purpose of a message are the first to be omitted.
Based on our observations, we expected the subjectively greater difficulty of English-Russian interpretation to be related to heightened WM load and, under the Efforts Model, to diminished attention to the source. Contrary to our expectation, the absence of a main effect of interpretation direction and of its interaction with WM load suggests that this is not the case. Although the median WM load was consistently lower in the English-Russian direction, direction had no effect on the early ERP components (P1/N1) that we used as an index of attention to the source. However, the fact that the interpreters' WM load was consistently lower in the English-Russian direction than in the other direction can be seen as indirect support for the EM. It is possible that, because processing an L2 source message is more effortful than processing an L1 message, less processing capacity remains available for WM, which forces the interpreter to follow the speaker with a smaller lag.
In estimating WM load, we assumed that all content words tax working memory equally. Clearly, this is a simplification. The method could be improved in future research by weighting each content word according to its frequency, its length or some other measure of difficulty or complexity. The resulting estimate would then more accurately capture the effort associated with holding each specific word in memory.
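A minimal Python sketch of the proposed weighting is shown below. The sketch operationalizes lag as the content words heard after the furthest source word the interpreter has rendered. That definition, the alignment format and the default weight (word length in characters) are assumptions for illustration only. A frequency-based weight, for instance a negative log frequency taken from a corpus table, could be supplied instead.

```python
def weighted_lag_at(t, source_words, renderings, weight=len):
    """Weighted content-word lag at time t (s).

    source_words: (onset_s, word) pairs sorted by onset.
    renderings: (time_s, source_index) pairs, one per target content word,
        giving when it was produced and which 0-based source content word
        it renders (a hypothetical alignment format).
    weight: maps a word to its load weight; the default, word length in
        characters, is only a placeholder for a frequency- or
        complexity-based measure.
    """
    # Furthest source position the interpreter has reached by time t.
    reached = max((idx for time_s, idx in renderings if time_s <= t), default=-1)
    # Sum the weights of words already heard but not yet reached.
    return sum(
        weight(word)
        for i, (onset_s, word) in enumerate(source_words)
        if onset_s <= t and i > reached
    )
```

Passing `weight=lambda w: 1` reduces the estimate to the equal-weight word count assumed in this study, so the two measures can be compared directly on the same transcripts.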
Future neuroimaging studies of SI could enable researchers to validate current psycholinguistic models of SI. Most importantly, they may guide SI instructors and practitioners in designing more efficient training curricula, practices and industry standards. If the estimation of WM load during SI were automated (e.g., using speech-to-text and machine-learning algorithms), it could be leveraged in real-time performance-monitoring applications.
Bartlomiejczyk, M. (2006). Strategies of simultaneous interpreting and directionality. Interpreting, 8(2), 149-174. doi:10.1075/intp.8.2.03bar
Chernov, G.V. (1994). Message redundancy and message anticipation in simultaneous interpreting. In S. Lambert, & B. Moser-Mercer (Eds.), Bridging the Gap: Empirical research in simultaneous interpretation (pp. 139-153). Chapel Hill: John Benjamins Publishing Co. doi:10.1075/btl.3.13che
Chernov, G.V. (2004). Inference and anticipation in simultaneous interpreting (R. Setton & A. Hild, Eds.). Vol. 57. Amsterdam: John Benjamins Publishing Company. doi:10.1075/btl.57
Coch, D., Sanders, L.D., & Neville, H.J. (2005). An event-related potential study of selective auditory attention in children and adults. Journal of Cognitive Neuroscience, 17(4), 605-622. doi:10.1162/0898929053467631
Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 215-229. doi:10.1038/nrn755
Cowan, N. (1998). Visual and auditory working memory capacity. Trends in Cognitive Sciences, 2(3), 77. doi:10.1016/S1364-6613(98)01144-9
Gile, D. (1988). Le partage de l'attention et le “modèle d'effort” en interprétation simultanée. The Interpreters' Newsletter, 1988 (1), 4-22. Retrieved from http://hdl.handle.net/10077/2132
Gile, D. (1995). Basic concepts and models for interpreter and translator training (Vol. 8, 1st ed.). Amsterdam: John Benjamins Publishing Company. doi:10.1075/btl.8(1st)
Gile, D. (1999). Testing the Effort Models' tightrope hypothesis in simultaneous interpreting — A contribution. Hermes, 1999 (23), 153-172.
Hillyard, S. A., Hink, R. E., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182(4108), 177-180. doi:10.1126/science.182.4108.177
Hink, R. E., & Hillyard, S. A. (1976). Auditory evoked potentials during selective listening to dichotic speech messages. Perception & Psychophysics, 20(4), 236-242. doi:10.3758/BF03199449
Ilykhin, V.M. (2001). Strategii v sinkhronnom perevode (na materiale anglo-russkoi i russko-angliiskoi kombinacii perevoda) [Strategies in simultaneous interpreting (based on Russian-English and English-Russian materials)]. Unpublished doctoral dissertation, Moscow State Linguistics University. (In Russian).
Li, C. (2010). Coping strategies for fast delivery in simultaneous interpretation. The Journal of Specialised Translation, (13), 19-25.
Sabri, M., Humphries, C., Verber, M., Liebenthal, E., Binder, J. R., Mangalathu, J., & Desai, A. (2014). Neural effects of cognitive control load on auditory selective attention. Neuropsychologia, 61(1), 269-279. doi:10.1016/j.neuropsychologia.2014.06.009
Signorelli, T.M., & Obler, L. (2013). Working memory in simultaneous interpreters. In J. Altarriba, & L. Isurin (Eds.), Memory, language, and bilingualism: Theoretical and applied approaches (pp. 95-125). New York: Cambridge University Press.
Teder, W., Alho, K., Reinikainen, K., & Näätänen, R. (1993). Interstimulus interval and the selective-attention effect on auditory ERPs: “N1 enhancement” versus processing negativity. Psychophysiology, 30(1), 71-81. doi:10.1111/j.1469-8986.1993.tb03206.x
Woods, D.L., Hillyard, S.A., & Hansen, J.C. (1984). Event-related brain potentials reveal similar attentional mechanisms during selective listening and shadowing. Journal of Experimental Psychology: Human Perception and Performance, 10(6), 761-777. doi:10.1037/0096-1523.10.6.761