Cognitive Translation Studies · Multimodality

Multimodal Integration of Cognitive and Embodied Processes in Interpreting

Alfred Nobel University Journal of Philology 2026 · 1(31) · pp. 409–421 · UDC 81’25:81’33
12 interpreters studied · 2 expertise groups · 3 interpreting stages · 4 gesture types
Abstract

The body as a cognitive tool in real-time interpreting

This study reveals the multimodal integration of cognitive and embodied processes in interpreting by examining the interaction of verbal, visual, and kinetic modalities, focusing on gestures, gaze, and eye contact in meaning formation, managing cognitive load, and communication effectiveness in both simultaneous and consecutive interpreting. Using a mixed-methods design, the study analyzed conference speeches, simulated dialogues, and short texts, coding gaze and gestures to examine their coherence with speech and linking visual–kinetic behaviour to performance metrics across groups of experienced professional interpreters and graduate student interpreters with different experience levels.

The results reveal a dynamic interplay of bodily actions and cognitive processes. Gestures served as external memory aids: subtle pointing or rhythmic movements facilitated the structuring of information and the memorization of terminology, reducing cognitive load and freeing working memory for comprehension and retrieval. Gaze patterns reflected attentional strategies — experienced professional interpreters proactively anticipated key information, while graduate student interpreters demonstrated a reactive gaze, sometimes delaying paraphrasing.

The study presents a multimodal model of interpreting as an integrated system of cognitive and bodily processes, where gaze cues attention, gestures serve as memory aids and communicative tools, and verbal output with prosody interacts with these modalities to enhance clarity and coherence. Overall, the findings confirm that interpreting is an inherently multimodal process, and bodily behaviours serve as integrated cognitive strategies.

Keywords: multimodality · interpreting efficiency · gaze · gestures · eye contact · cognitive load · embodied cognition · cognitive processing · multimodal model of interpreting process
Introduction

Interpreting as a multimodal form of human communication

Traditional studies focused on linguistic precision, cognitive effort, and memory. Modern cognitive translation studies see interpreting as something the whole body performs.

In recent decades, interpreting has increasingly been viewed not merely as a linguistic activity but as a complex, multimodal form of human communication. Interpreting naturally incorporates multiple modes of expression — speech, intonation, gaze, gestures, and body movements — that operate simultaneously to create and convey meaning between the interpreter, speaker, and audience.

Integrating multimodal analysis allows scholars to explore how visual and kinetic modalities contribute to the comprehension, processing, and transmission of messages, offering a more holistic understanding of the interpreting process.

The research gap

Despite growing interest in multimodal communication, empirical research on interpreters’ eye movements and gestures remains limited. Gesture studies focus on public speaking and bilingual interaction; eye-tracking is widely used in reading and translation, yet rarely applied to the interpreting process.

Systematic research is therefore needed to explore how visual attention and gesture dynamics interact with cognitive processing during interpreting.

Theoretical Framework

From cognitive linguistics to embodied cognition

The study draws together two research traditions — cognitive-linguistic approaches emphasizing mental representations, and interactionist approaches highlighting real-time, culture-specific coordination.

Pinar · 2013

Integrates cognitive linguistics and multimodality, viewing meaning as co-constructed through multiple semiotic channels, with verbal, visual, and gestural cues interacting with perception, memory, and attention.

Jelec · 2020

Explores multimodal patterns in cognition and communication, providing empirical evidence that gestures, gaze, and posture systematically contribute to meaning-making.

Cohn & Schilperoord · 2024

Propose a cognitive framework in which multimodal language is an integrated system — gestures, gaze, and visual symbols act as linguistic elements organizing meaning and facilitating comprehension and production.

Makaruk · 2025

Examines multimodal syntactic constructions in digital English, showing they are systematically structured and classifiable by dominant semiotic components — revealing cognitive and pragmatic potential.

Özyürek · 2021

From a crosslinguistic perspective, shows that gestures and gaze are shaped by linguistic and cultural norms — multimodal strategies reflect both universal cognitive processes and language-specific regularities.

Feyaerts, Brône & Oben · 2017

Emphasize the social and pragmatic functions of gestures, gaze, and bodily cues — supporting turn-taking, emphasis, and coordination in a dynamic, interaction-oriented view of multimodality.

Managing cognitive effort

Research in interpreting often relies on Gile’s Effort Model, which describes the management of cognitive load during listening, production, and memorization. Developing this concept, Boiko (2025) examines these efforts in business interpreting, showing how high terminology density, rapid speech rate, and cultural differences increase cognitive stress, and identifies adaptive techniques such as anticipation, segmentation, and reformulation.

The interpreter as a bodily agent

Embodied cognition theory (Macrine & Fugate; Gallagher) posits that cognition is grounded in sensorimotor experience rather than purely abstract symbols. Milošević & Risku (2024) argue interpreters act as bodily agents whose gestures, posture, and gaze actively support attention allocation, memory retrieval, and real-time reformulation — the body becomes a cognitive tool for coping with high cognitive loads.

Aim & Tasks

Tracing where the mind meets the body

Aim of the study

To reveal the multimodal integration of cognitive and embodied processes in both simultaneous and consecutive interpreting — focusing on how mental operations of perception, comprehension, attention, working memory, and meaning reformulation are dynamically coordinated with bodily actions such as gaze, gestures, and posture during meaning construction, cognitive load management, and interpreter-mediated communication.

1. Analyze the interpreter’s efficiency through the integration of eye-tracking and gesture analysis methods.
2. Examine how interpreters coordinate visual attention and bodily movements during different interpreting stages.
3. Identify patterns of multimodal synchronization and their cognitive-pragmatic implications for interpreting efficiency.
4. Propose a multimodal model of the interpreting process that accounts for both cognitive processing and bodily interactional behaviour.

Methodology

A mixed experimental and observational design

Quantitative eye-tracking measures were integrated with qualitative gesture analysis, with all data streams time-aligned for cross-modal analysis.
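
As a rough illustration of what time alignment can involve, the sketch below attaches each gesture onset to an interpreting phase and to the fixation in progress at that moment. It assumes every stream carries timestamps in seconds on a shared clock; the record layout, the phase boundaries, and all function names are hypothetical and are not taken from the study.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Fixation:
    start: float     # seconds on the shared clock
    duration: float  # seconds
    target: str      # e.g. "speaker", "notes", "audience"


# Hypothetical phase boundaries for one trial: (phase start in seconds, name),
# sorted by start time. These values are assumptions for illustration.
PHASES = [(0.0, "listening"), (60.0, "reformulating"), (75.0, "presenting")]


def phase_at(t, phases=PHASES):
    """Return the name of the phase that contains time t."""
    starts = [start for start, _ in phases]
    index = bisect_right(starts, t) - 1
    return phases[max(index, 0)][1]


def fixation_at(t, fixations):
    """Return the fixation in progress at time t, or None (e.g. mid-saccade).

    `fixations` must be sorted by start time.
    """
    starts = [f.start for f in fixations]
    index = bisect_right(starts, t) - 1
    if index >= 0 and t <= fixations[index].start + fixations[index].duration:
        return fixations[index]
    return None


def align(gesture_onsets, fixations):
    """Pair each gesture onset with its phase and the concurrent gaze target."""
    rows = []
    for onset, gesture_type in gesture_onsets:
        fixation = fixation_at(onset, fixations)
        rows.append({
            "time": onset,
            "gesture": gesture_type,
            "phase": phase_at(onset),
            "gaze_target": fixation.target if fixation else None,
        })
    return rows


# Invented sample records, not study data.
fixations = [Fixation(0.2, 0.4, "speaker"), Fixation(61.0, 0.8, "notes")]
gestures = [(0.5, "rhythmic"), (61.3, "deictic"), (80.0, "iconic")]
aligned = align(gestures, fixations)
```

Each row of `aligned` then carries one gesture together with the phase and gaze target it co-occurred with, which is the kind of joined record a cross-modal analysis would start from.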

Corpus
Conference speeches (5–7 minutes each) with formal and informational content, simulated professional dialogues from business and academic contexts, and brief expository texts for consecutive interpreting tasks.
Participants
Twelve participants in two groups. Group A: 6 experienced professional interpreters; Group B: 6 graduate student interpreters. All were balanced and fluent in both source and target languages, so differences reflect strategy, not proficiency.
Gaze metrics
Fixation duration, saccade direction, and gaze shifts across the listening, reformulating, and presenting phases.
Gesture coding
Classified by type, then analyzed for frequency, function, and timing relative to speech.
iconic · deictic · rhythmic · metaphorical
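
The frequency and timing measures named under gesture coding could be computed along the following lines. This is a minimal sketch: the annotation tuples, the 30-second duration, and the function names are assumptions made for illustration, the values are invented, and the coding of gesture function is left out.

```python
from collections import Counter
from statistics import mean

GESTURE_TYPES = ("iconic", "deictic", "rhythmic", "metaphorical")

# Hypothetical annotations: (gesture type, gesture onset, onset of the
# co-expressive word), both times in seconds. Invented values, not study data.
annotations = [
    ("rhythmic", 3.10, 3.20),
    ("deictic", 7.45, 7.30),
    ("rhythmic", 9.00, 9.05),
    ("iconic", 12.60, 12.80),
]


def frequency_per_minute(annotations, duration_s):
    """Gestures per minute for each type, including types that never occur."""
    counts = Counter(gesture_type for gesture_type, _, _ in annotations)
    return {t: counts[t] * 60.0 / duration_s for t in GESTURE_TYPES}


def mean_lag(annotations):
    """Mean of (gesture onset - word onset); negative means gestures lead speech."""
    return mean(gesture - word for _, gesture, word in annotations)


rates = frequency_per_minute(annotations, duration_s=30.0)
lag = mean_lag(annotations)
```

A negative mean lag corresponds to gestures that anticipate speech, and a positive one to gestures that follow verbal output.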

Why combine the channels?

All data streams were time-aligned to allow cross-modal analyses. Correlational analyses then related gaze and gesture measures to interpreting performance, including accuracy, fluency, and cognitive load, giving a comprehensive view of how visual and kinetic modalities contribute to effective interpreting.
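
One simple way to quantify such a relationship is a Pearson correlation between a gaze measure and a performance score. The sketch below is only an illustration: the source does not state which statistic the study used, and the per-participant values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-participant values for illustration only (not the study's data).
mean_fixation_ms = [220, 260, 310, 350, 400, 430]
accuracy_score = [4.8, 4.5, 4.1, 3.9, 3.2, 3.0]

r = pearson_r(mean_fixation_ms, accuracy_score)
```

With groups as small as six participants, a rank-based coefficient such as Spearman's may be the more cautious choice; the same function applies once both sequences are replaced by their ranks.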

The interpreter acts as a mediator, dynamically coordinating verbal and non-verbal cues under time pressure to direct attention, ensure coherence, and enhance message clarity.

Key Concepts

The modalities at work

Meaning is co-constructed through several semiotic channels rather than language alone. Each modality carries a distinct cognitive function.

Multimodality

The integration of verbal, visual, gestural, and prosodic channels into communication. In interpreting, multimodal signals support comprehension, attention management, and message delivery.

Gaze

A primary indicator of attention, focus, and cognitive processing. It guides comprehension and interaction management — distributing visual attention between speaker, notes, and audience.
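
The distribution of visual attention between speaker, notes, and audience can be summarized as the share of fixation time per target, together with a count of gaze shifts. The following is a sketch under assumed inputs; the record format and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical fixation records: (target, duration in milliseconds).
fixations = [
    ("speaker", 400), ("notes", 250), ("speaker", 350),
    ("audience", 200), ("notes", 300), ("speaker", 500),
]


def dwell_share(fixations):
    """Share of total fixation time spent on each target (values sum to 1)."""
    totals = defaultdict(int)
    for target, duration in fixations:
        totals[target] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {target: total / grand_total for target, total in totals.items()}


def gaze_shifts(fixations):
    """Number of times consecutive fixations land on different targets."""
    targets = [target for target, _ in fixations]
    return sum(1 for a, b in zip(targets, targets[1:]) if a != b)


shares = dwell_share(fixations)
shifts = gaze_shifts(fixations)
```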

Gestures

Iconic, deictic, metaphorical, and rhythmic gestures facilitate verbal expression, memory retrieval, and emphasis — functioning as external cognitive tools and memory aids.

Prosody

Intonation, stress, and rhythm convey discourse structure, emotional tone, and pragmatic intent — interacting with gestures and gaze to enhance clarity and coherence.

Embodied cognition

Cognitive processes are grounded in sensorimotor experience. The body mediates and facilitates cognition, linking mental representations with external actions during meaning-making.

Temporal synchronization

The alignment of verbal, visual, and kinetic channels in time — the mechanism that optimizes the combination of cognitive and bodily resources during interpreting.
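
Temporal synchronization can be given a simple operational form, for example the fraction of rhythmic gesture strokes that fall close to a prosodic stress. In the sketch below the 0.2-second tolerance window, the function name, and the timestamps are all assumptions chosen for illustration, not parameters reported by the study.

```python
def synchrony_rate(gesture_times, stress_times, window=0.2):
    """Fraction of gestures whose stroke falls within `window` seconds of a stress."""
    if not gesture_times:
        return 0.0
    hits = sum(
        1 for g in gesture_times
        if any(abs(g - s) <= window for s in stress_times)
    )
    return hits / len(gesture_times)


# Invented timestamps in seconds, for illustration only.
rhythmic_strokes = [1.0, 2.5, 4.1, 6.0]
prosodic_stresses = [1.1, 2.4, 5.0, 6.5]

rate = synchrony_rate(rhythmic_strokes, prosodic_stresses)
```

Here two of the four strokes land within the window, so `rate` is 0.5; a higher rate would indicate tighter coupling between gesture and prosodic emphasis.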

Findings

What the eyes and hands revealed

Eye-tracking and gesture analysis exposed systematic differences between experienced professionals (Group A) and graduate students (Group B) across stages, gesture types, and coordination.

Table 1

Visual attention patterns across interpreting stages

| Stage | Pattern | Group A (professionals) | Group B (students) | Cognitive implication |
| --- | --- | --- | --- | --- |
| Listening | Fixations on speaker’s face and visual aids | More targeted fixations, moderate gaze shifts, some reliance on visual aids | Long fixations, frequent gaze shifts, reliance on visual cues | Group B exerts higher cognitive effort; Group A shows emerging efficiency |
| Reformulation | Fixations on critical information for retrieval | Intermediate fixation patterns, partially focused gaze | Diffuse, chaotic gaze patterns, high cognitive load | Group B struggles with attention allocation; Group A consolidates visual strategies |
| Output | Fixations toward audience or notes | Inconsistent transitions, frequent returns to speaker or aids | More stable transitions, occasional backtracking | Group A shows ongoing effort; Group B demonstrates emerging automaticity |

Table 2

Gesture types and functions

| Gesture | Function | Group A | Group B | Interpretation |
| --- | --- | --- | --- | --- |
| Iconic | Represent concrete concepts or actions | Used inconsistently, often to reinforce comprehension | Used strategically to emphasize key terms | Group B integrates gestures purposefully; Group A uses them compensatorily |
| Deictic | Direct attention or reference materials | Frequent; high reliance on visual cues | Less frequent; moderate use | Group A depends more on visual cues; Group B shows emerging independence |
| Metaphorical | Convey abstract ideas or relationships | Equally distributed | Equally distributed | Both groups use gestures for non-literal meaning similarly |
| Rhythmic | Mark speech rhythm, emphasize key points | Sporadically used | Consistently used; supports coherence and fluency | Group B shows more automatic integration; Group A is irregular |
| Overall | Memory support, emphasis, clarification | Less consistent; compensatory use | Greater consistency and integration with speech | Strategic vs. compensatory multimodal behavior |

Table 3

Multimodal coordination patterns

| Aspect | Group A | Group B |
| --- | --- | --- |
| Gaze–speech | Often delayed; fixations lag behind speech | Anticipates or complements speech; smoother transitions |
| Gesture timing | Gestures frequently follow verbal output | Gestures anticipate or align with speech |
| Prosody–gesture | Less consistent; rhythmic gestures not always timed | Rhythmic gestures consistently coincide with prosodic emphasis |
| Integration of cues | Slower coordination across channels | Efficient integration, supporting load management |
| Effectiveness | Lower coherence; cognitive effort visible | Higher coherence; more fluid and structured |

Visualized

Coordination profile by group. Qualitative comparison: ordinal encoding of the descriptive findings in Table 3.

Figure 1. Gesture frequency and cognitive load measures. Reproduction of the study’s statistical comparison (relative values on a 0–5 scale). Series: saccade count, fixation duration, gesture frequency.

Group A — strategic

A more strategic use of gestures, coordinating them with verbal output to support memory retrieval and reduce cognitive effort — reflected in shorter and more focused fixations.

Group B — compensatory

A higher but less systematic frequency of gestures, often used inconsistently as a compensatory mechanism under higher cognitive load — accompanied by longer fixations and more diffuse gaze.

Bodily action as a functionally integrated cognitive strategy

Gestures as memory aids

Subtle pointing or rhythmic hand movements helped interpreters structure information segments and recall terminological elements when paraphrasing — freeing working memory to focus on comprehension rather than retention.

Proactive vs. reactive gaze

More experienced participants showed proactive gaze, anticipating important information; less experienced interpreters showed reactive gaze, attending to notes or speaker only after key material had passed — sometimes delaying paraphrasing.

Audience monitoring

Redirecting gaze toward the audience signalled active monitoring of comprehension. Brief eye contact maintained alignment and pragmatic coherence, particularly during information-dense segments.

Coordination differs by mode

Simultaneous

Tight, minimal, strategic

  • High cognitive load, time constraints, and parallel processing demand stricter synchronization between modalities.
  • Gestures are minimalistic and rhythmic — short movements aligned with prosodic stress, so as not to disrupt speech production.
  • Gaze is limited to brief, strategic fixations on the speaker or reference materials, avoiding excessive audience engagement.
Consecutive

Flexible, expressive, orchestrated

  • Greater temporal flexibility allows more purposeful coordination of multimodal processes.
  • While taking notes and reformulating, interpreters show broader gaze movements between notes, speaker, and audience.
  • A wider range of iconic and deictic gestures supports discourse structuring, referent tracking, and listener engagement.
The Model

A multimodal model of the interpreting process

Interpreting conceptualized as an integrated system of cognitive and bodily processes, where temporal synchronization links comprehension and production.

Cognitive Regulation: attention · memory · load control
Multimodal Integration: synchronization of gaze, gesture & speech
Gaze: monitoring
Gestures: memory / load
Verbal Output: meaning
Interpreting Process: coherence · efficiency · temporal alignment

Gaze behavior reflects attentional allocation and cognitive monitoring, showing how interpreters distribute visual attention between speaker, notes, and audience.

Gestures act as external cognitive tools — functioning simultaneously as memory aids, communicative cues, and cognitive load regulators that facilitate information retention and discourse organization.

Verbal output and prosody form the primary channels of meaning transmission, where rhythm, intonation, and stress interact with gestures and gaze to enhance clarity and coherence.

The model emphasizes the temporal synchronization of these modalities and their dynamic, bidirectional contributions to both comprehension (processing the input) and production (formulating the output).

Pedagogy

Training multimodal coordination

Targeted strategies that progressively build automatic attentional control, cognitive management, and multimodal coordination.

Strategy 01

Gaze-focused training

Watch short clips and recognize key visual cues — gestures, slide highlights, facial expressions — while ignoring irrelevant movement, optimizing load management.

Strategy 02

Gesture awareness

Interpret short speeches while intentionally incorporating iconic, deictic, and rhythmic gestures, then review recordings for accuracy, rhythm, and coherence with speech.

Strategy 03

Multitasking training

Combine listening, note-taking, and verbal inference under gradually increasing difficulty and controlled distractions to build cognitive resilience.

Strategy 04

Feedback & reflection

Use eye-tracking and video to explore gaze patterns, fixation durations, and gesture synchronization, comparing against expert models for targeted adjustment.

Strategy 05

Gradual complexity

Move from short, clear speeches to terminologically rich lectures and interactive simulations with audience questions — developing adaptive, real-time integration.

Synthesis

A progressive multimodal approach

Combining gaze, gesture, and speech in increasingly complex tasks reduces cognitive load, improves interactional coherence, and prepares interpreters for high-pressure environments.

Conclusions

Interpreting is bodily, multimodally integrated cognition

Effective interpreting extends beyond linguistic competence to encompass integrated multimodal expertise. The study makes three contributions.

1. It reveals the dynamic interplay between bodily behaviour and cognitive processes during real-time interpreting, showing how gaze, gestures, and prosody are coordinated to manage attention, regulate cognitive load, and maintain accuracy under stress.
2. It integrates eye-tracking and gesture analysis within a single multimodal framework, offering a comprehensive, empirically grounded understanding of interpreter behaviour across perceptual, kinetic, and verbal domains.
3. It provides empirical support for cognitive-pragmatic and embodied cognitive models, demonstrating that temporal synchronization of gaze, gesture, and speech enhances efficiency and communicative accuracy in both simultaneous and consecutive interpreting.

Future research

Future work should examine how language-specific syntactic structures, information-packaging patterns, and cultural traditions of gesture and gaze influence multimodal strategies. Interpreters working with flexible-word-order or high-context languages such as Ukrainian or Japanese may exhibit different gaze distribution and gesture synchronization than those working with fixed-syntax languages such as English or German — pointing toward a cross-culturally sensitive model of multimodal interpreting effectiveness.

References

Works cited

Boiko, Ya. V. (2025). Interpreting in Business: Challenges and Solutions. Folium, 6, 32–38. DOI: 10.32782/folium/2025.6.4
Cohn, N., Schilperoord, J. (2024). A Multimodal Language Faculty: A Cognitive Framework for Communication. London: Bloomsbury. DOI: 10.5040/9781350404861
Feyaerts, K., Brône, G., Oben, B. (2017). Multimodality in interaction. In B. Dancygier (Ed.), The Cambridge Handbook of Cognitive Linguistics (pp. 135–156). Cambridge University Press. DOI: 10.1017/9781316339732.010
Gallagher, S. (2011). Interpretations of embodied cognition. In W. Tschacher & C. Bergomi (Eds.), The Implications of Embodiment: Cognition and Communication (pp. 59–74). Exeter: Imprint Academic.
Giberga, A., Ahufinger, N., Igualada, A., Aguilera, M., Guerra, E., Esteve-Gibert, N. (2024). Prosody and gesture in the comprehension of pragmatic meanings: The case of children with developmental language disorder. In Yi. Chen, A. Chen, A. Arvaniti (Eds.), Proceedings of the 12th International Conference on Speech Prosody (pp. 697–701). Leiden: Leiden University Publ. DOI: 10.21437/SpeechProsody.2024-141
Gile, D. (2021). The effort models of interpreting as a didactic construct. In R. Muñoz Martín, S. Sun, D. Li (Eds.), Advances in Cognitive Translation Studies (pp. 139–160). Singapore: Springer. DOI: 10.1007/978-981-16-2070-6_7
Hu, T., Wang, X., Xu, H. (2022). Eye-tracking in interpreting studies: A review of four decades of empirical studies. Frontiers in Psychology, 13: 872247. DOI: 10.3389/fpsyg.2022.872247
Jelec, A. (2020). Multimodal patterns in cognition and communication. Studia Anglica Posnaniensia, 55 (s1), 179–184. DOI: 10.2478/stap-2020-0007
Macrine, S., Fugate, J. (2020). Embodied Cognition. In K. Hytten (Ed.), Oxford Research Encyclopedia of Education. New York: Oxford Academic. DOI: 10.1093/acrefore/9780190264093.013.885
Makaruk, L. (2025). Multimodal Syntactic Constructions: A Striking Feature of Digital Communication in Modern English. Alfred Nobel University Journal of Philology, 1 (29), 265–283. DOI: 10.32342/3041-217X-2025-1-29-16
Milošević, J., Risku, H. (2024). Interpreting and embodied cognition. In C. D. Mellinger (Ed.), The Routledge Handbook of Interpreting and Cognition (pp. 324–340). London: Routledge. DOI: 10.4324/9780429297533-24
Oben, B., Brône, G. (2015). What you see is what you do: On the relationship between gaze and gesture in multimodal alignment. Language and Cognition, 7 (4), 546–562. DOI: 10.1017/langcog.2015.22
Özyürek, A. (2021). Considering the nature of multimodal language from a crosslinguistic perspective. Journal of Cognition, 4 (1): 42, 1–5. DOI: 10.5334/joc.165
Pinar, M. J. (2013). Multimodality and cognitive linguistics: Introduction to the special volume. Review of Cognitive Linguistics, 11 (2), 227–235. DOI: 10.1075/rcl.11.2.01pin
Singer, M. A., Radinsky, J., Goldman, S. R. (2008). The role of gesture in meaning construction. Discourse Processes, 45 (4), 365–386. DOI: 10.1080/01638530802145601
Tiselius, E., Sneed, K. (2020). Gaze and eye movement in dialogue interpreting: An eye-tracking study. Bilingualism: Language and Cognition, 23 (4), 780–787. DOI: 10.1017/S1366728920000309
Vranjes, J., Brône, G. (2020). Eye-tracking in interpreter-mediated talk: From research to practice. In H. Salaets & G. Brône (Eds.), Linking up With Video: Perspectives on Interpreting Practice and Research (pp. 203–233). London: Benjamins Translation Library. DOI: 10.1075/btl.149.09vra