publications | Furkan Akkurt

2024

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation

Gökçe Uludoğan, Zeynep Balal, Furkan Akkurt, and 3 more authors

In Findings of the Association for Computational Linguistics ACL 2024, Aug 2024

The recent advances in natural language processing have predominantly favored well-resourced English-centric models, resulting in a significant gap with low-resource languages. In this work, we introduce TURNA, a language model developed for the low-resource language Turkish that is capable of both natural language understanding and generation tasks. TURNA is pretrained with an encoder-decoder architecture based on the unified framework UL2 with a diverse corpus that we specifically curated for this purpose. We evaluated TURNA with three generation tasks and five understanding tasks for Turkish. The results show that TURNA outperforms several multilingual models in both understanding and generation tasks and competes with monolingual Turkish models in understanding tasks.
@inproceedings{uludogan-etal-2024-turna,
  title = {{TURNA}: A {T}urkish Encoder-Decoder Language Model for Enhanced Understanding and Generation},
  author = {Uludo{\u{g}}an, G{\"o}k{\c{c}}e and Balal, Zeynep and Akkurt, Furkan and Turker, Meliksah and Gungor, Onur and {\"U}sk{\"u}darl{\i}, Susan},
  editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
  booktitle = {Findings of the Association for Computational Linguistics ACL 2024},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand and virtual meeting},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.findings-acl.600},
  pages = {10103--10117},
}

Strategies for the Annotation of Pronominalised Locatives in Turkic Universal Dependency Treebanks

Jonathan Washington, Çağrı Çöltekin, Furkan Akkurt, and 7 more authors

In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, May 2024

As part of our efforts to develop unified Universal Dependencies (UD) guidelines for Turkic languages, we evaluate multiple approaches to a difficult morphosyntactic phenomenon, pronominal locative expressions formed by a suffix -ki. These forms result in multiple syntactic words, with potentially conflicting morphological features, and participating in different dependency relations. We describe multiple approaches to the problem in current (and upcoming) Turkic UD treebanks, and show that none of them offers a solution that satisfies a number of constraints we consider (including constraints imposed by UD guidelines). This calls for a compromise with the ‘least damage’ that should be adopted by most, if not all, Turkic treebanks. Our discussion of the phenomenon and various annotation approaches may also help treebanking efforts for other languages or language families with similar constructions.
@inproceedings{washington-etal-2024-strategies,
  title = {Strategies for the Annotation of Pronominalised Locatives in {T}urkic {U}niversal {D}ependency Treebanks},
  author = {Washington, Jonathan and {\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i} and Akkurt, Furkan and Chontaeva, Bermet and Eslami, Soudabeh and Jumalieva, Gulnura and Kasieva, Aida and Kuzgun, Asl{\i} and Mar{\c{s}}an, B{\"u}{\c{s}}ra and Taguchi, Chihiro},
  editor = {Bhatia, Archna and Bouma, Gosse and Dogruoz, A. Seza and Evang, Kilian and Garcia, Marcos and Giouli, Voula and Han, Lifeng and Nivre, Joakim and Rademaker, Alexandre},
  booktitle = {Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024},
  month = may,
  year = {2024},
  address = {Torino, Italia},
  publisher = {ELRA and ICCL},
  url = {https://aclanthology.org/2024.mwe-1.25},
  pages = {207--219},
}

Evaluating the Quality of a Corpus Annotation Scheme Using Pretrained Language Models

Furkan Akkurt, Onur Gungor, Büşra Marşan, and 4 more authors

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Pretrained language models and large language models are increasingly used to assist in a great variety of natural language tasks. In this work, we explore their use in evaluating the quality of alternative corpus annotation schemes. For this purpose, we analyze two alternative annotations of the Turkish BOUN treebank, versions 2.8 and 2.11, in the Universal Dependencies framework using large language models. Given a suitable prompt generated from treebank annotations, large language models are used to recover the surface forms of sentences. Based on the idea that large language models capture the characteristics of languages, we expect the better annotation scheme to yield the original sentences with a higher success rate. The experiments conducted on a subset of the treebank show that the new annotation scheme (2.11) yields a recovery success rate about 2 percentage points higher. All the code developed for this work is available at https://github.com/boun-tabi/eval-ud.
@inproceedings{akkurt-etal-2024-evaluating-quality,
  title = {Evaluating the Quality of a Corpus Annotation Scheme Using Pretrained Language Models},
  author = {Akkurt, Furkan and Gungor, Onur and Mar{\c{s}}an, B{\"u}{\c{s}}ra and Gungor, Tunga and Ozturk Basaran, Balkiz and {\"O}zg{\"u}r, Arzucan and Uskudarli, Susan},
  editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  month = may,
  year = {2024},
  address = {Torino, Italy},
  publisher = {ELRA and ICCL},
  url = {https://aclanthology.org/2024.lrec-main.577},
  pages = {6504--6514},
}

2023

TULAP - An Accessible and Sustainable Platform for Turkish Natural Language Processing Resources

Susan Uskudarli, Muhammet Şen, Furkan Akkurt, and 4 more authors

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, May 2023

Access to natural language processing resources is essential for their continuous improvement. This can be especially challenging in educational institutions where the software development effort required to package and release research outcomes may be overwhelming and under-recognized. Access to well-prepared and reliable research outcomes is important both for their developers and for the greater research community. This paper presents an approach to address this concern with two main goals: (1) to create an open-source, easily deployable platform where resources can be easily shared and explored, and (2) to use this platform to publish open-source Turkish NLP resources (datasets and tools) created by a research lab. The Turkish Natural Language Processing platform (TULAP) was designed and developed as an easy-to-use platform for sharing dataset and tool resources that supports interactive tool demos. Numerous open-access Turkish NLP resources have been shared on TULAP. All tools are containerized to support portability for custom use. This paper describes the design, implementation, and deployment of TULAP with use cases (available at https://tulap.cmpe.boun.edu.tr/). A short video demonstrating our system is available at https://figshare.com/articles/media/TULAP_Demo/22179047.
@inproceedings{uskudarli-etal-2023-tulap,
  title = {{TULAP} - An Accessible and Sustainable Platform for {T}urkish Natural Language Processing Resources},
  author = {Uskudarli, Susan and {\c{S}}en, Muhammet and Akkurt, Furkan and Gürbüz, Merve and Gungor, Onur and {\"O}zgür, Arzucan and Güng{\"o}r, Tunga},
  booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations},
  month = may,
  year = {2023},
  address = {Dubrovnik, Croatia},
  publisher = {Association for Computational Linguistics},
  pages = {219--227},
}

2022

BoAT v2 - A Web-Based Dependency Annotation Tool with Focus on Agglutinative Languages

Salih Furkan Akkurt, Büşra Marşan, and Susan Uskudarli

In Proceedings of The International Conference and Workshop on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP), Jun 2022

The value of quality treebanks is steadily increasing due to the crucial role they play in the development of natural language processing tools. The creation of such treebanks is enormously labor-intensive and time-consuming. Especially when the size of treebanks is considered, tools that support the annotation process are essential. Various annotation tools have been proposed; however, they are often not suitable for agglutinative languages such as Turkish. BOAT-v1 was developed for annotating dependency relations and was subsequently used to create the manually annotated BOUN Treebank (UD_Turkish-BOUN). In this work, we report on the design and implementation of a dependency annotation tool (BOAT-v2) based on the experiences gained from the use of BOAT-v1, which revealed several opportunities for improvement. BOAT-v2 is a multi-user, web-based dependency annotation tool that is designed with a focus on the annotator user experience to yield valid annotations. The main objectives of the tool are to: (1) support creating valid and consistent annotations with increased speed, (2) significantly improve the user experience of the annotator, (3) support collaboration among annotators, and (4) provide an open-source and easily deployable web-based annotation tool with a flexible application programming interface (API) to benefit the scientific community. This paper discusses the requirements elicitation, design, and implementation of BOAT-v2 along with examples.
@inproceedings{akkurt-etal-2022-boat,
  author = {Akkurt, Salih Furkan and Marşan, Büşra and Uskudarli, Susan},
  title = {{BoAT v2 - A Web-Based Dependency Annotation Tool with Focus on Agglutinative Languages}},
  booktitle = {Proceedings of The International Conference and Workshop on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP)},
  year = {2022},
  month = jun,
}

Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish

Büşra Marşan, Salih Furkan Akkurt, Muhammet Şen, and 7 more authors

In Proceedings of The International Conference and Workshop on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP), Jun 2022

In this study, we aim to offer linguistically motivated solutions to resolve the issues of the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework. In order to tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation. Representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser, and an updated version of the BoAT Tool is introduced.
@inproceedings{marsan-etal-2022-enhancements,
  author = {Marşan, Büşra and Akkurt, Salih Furkan and Şen, Muhammet and Gürbüz, Merve and Güngör, Onur and Özateş, Şaziye Betül and Üsküdarlı, Suzan and Özgür, Arzucan and Güngör, Tunga and Öztürk, Balkız},
  title = {{Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish}},
  booktitle = {Proceedings of The International Conference and Workshop on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP)},
  year = {2022},
  month = jun,
}