From Methodology to Research Findings: Tomáš Gráf on Spoken English and Learner Corpus Research

Date: 2026-01-23
Department: College of Foreign Languages & Literature
【Article by Department of English】

On Tuesday, January 20, 2026, an academic lecture hosted by Professor Siaw-Fong Chung of the Department of English at National Chengchi University (NCCU) and supported by the National Science and Technology Council (NSTC) was held at NCCU’s Dah Hsian Seetoo Library. The speaker was Dr. Tomáš Gráf of the Faculty of Arts at Charles University, Prague, who is affiliated with the Department of Linguistics and the Department of English Language and ELT Methodology. His talk, entitled ‘Methodological Approaches and Questions in Learner Corpus Research’, addressed key issues in learner corpus research from a methodological perspective.

The event opened with remarks by Professor Chung, who introduced the speaker’s academic background. Drawing on her own long-standing work in corpus linguistics, she outlined the scholarly context and significance of the lecture, helping the audience appreciate the importance of the topic. The event drew enthusiastic participation from scholars and students in linguistics and language teaching, as well as faculty and students from the Colleges of Communication, Education, and Social Sciences. This breadth of participation reflected a vibrant atmosphere of cross-disciplinary academic exchange on campus.

Dr. Gráf has long been devoted to learner corpus research, with a particular focus on English learners’ fluency and accuracy, as well as differences between advanced learners and professional language users. In this lecture, he reviewed his fifteen years of research on the design and construction of learner corpora, from early general-purpose corpora to more recent specialized corpora developed for specific research aims. He also shared methodological reflections drawn from years of hands-on research practice.

In her opening remarks, Professor Chung further noted that, compared with general corpora, learner corpora pose greater challenges in construction. A key reason lies in the prevalence of interlanguage phenomena in learner language, which require corpus developers to make informed judgments and to annotate linguistic features accordingly. When additional variables such as register and language variety are incorporated into corpus design, linguistic annotation itself becomes a major challenge, and decisions about what to include or exclude become critical. These practical considerations correspond closely to the core methodological issues addressed by the speaker in learner corpus research.

During the lecture, Dr. Gráf used the internationally renowned spoken learner corpus LINDSEI (Louvain International Database of Spoken English Interlanguage) as a central case study to illustrate how corpus design can profoundly shape research outcomes. He emphasized that seemingly neutral elements, such as task design, instructions, and interviewer feedback, often function as ‘hidden variables’ that influence learner language production. For example, asking learners to avoid mistakes as far as possible, as opposed to encouraging them to express themselves naturally without worrying about errors, can lead to markedly different language performance and can therefore affect analyses of fluency and accuracy. Likewise, framing a task as a test may alter learners’ communicative strategies, with consequences for researchers’ observations of fluency, accuracy, and structural complexity.

Dr. Gráf also introduced several learner corpus projects that his research team has developed on the basis of LINDSEI, including the CzErasmus Corpus, which tracks changes in students’ language performance before and after study-abroad experiences, and the Expanded LINDSEI, which incorporates data from Taiwanese learners. Drawing on these projects, he highlighted the crucial role of data comparability, and of the design and collection of metadata, in comparisons across first languages and learning backgrounds. He stressed that only by systematically collecting detailed information on learners’ linguistic backgrounds, learning experience, and task conditions at the corpus design stage can researchers avoid conflating proficiency differences with task effects or first-language influence, thereby improving the interpretability and reliability of research findings.

At the level of methodological reflection, Dr. Gráf further cautioned that while large-scale corpora are effective in revealing overall trends and group-level patterns, they may also weaken or obscure differences among individual learners. When researchers examine measures of language performance, such as fluency, accuracy, or complexity, they should therefore interpret averages with care and remain attentive to individual-level variation and the diverse developmental strategies used by different learners, rather than equating statistical trends directly with actual learner behavior.

At the conclusion of his talk, Dr. Gráf returned to fundamental questions in learner corpus research and discussed the ‘flattening’ effect that large corpora may have on individual learners’ developmental paths, in both research design and analysis. He called on researchers to look beyond quantitative analysis and to keep attending to learners’ individual developmental paths, strategic choices, and the variation and turning points that appear across different stages of learning.

With the remark “The beauty of corpora is that they can be created with a purpose,” he encouraged early-career scholars to define their research questions clearly and to plan their methodological designs carefully before data collection, so that corpora truly serve research aims rather than merely growing in scale. Overall, the lecture offered rich and thought-provoking insights into learner corpus research and second language acquisition. It also demonstrated the concrete outcomes of NSTC-funded projects carried out by faculty at this university, as well as their continued efforts to promote linguistic research and international academic exchange.