In India, English is the most important language and has the status of an associate official language. After Hindi, it is the most commonly spoken language in India and certainly the most read and written one. The number of second-language speakers of English has grown steadily, which has also contributed to its rich variation. English is blended with most Indian languages and is frequently used as a second or third language. Regional and educational differences shape how the language is used and give rise to stylistic variation in English. Spoken English varies greatly across the states of India, and it is relatively easy to identify a speaker's native language from their English accent. Identifying a user's native language from comments or posts written in English, however, remains a challenging task.
Native Language Identification (NLI) is a well-known shared task whose focus is identifying the native language of non-native speakers. The first NLI shared task, conducted in 2013, was based on essays, and the 2016 task was based on spoken responses; both covered native languages from around the world. The recently announced NLI shared task (co-located with EMNLP) is planned to use the essays and spoken responses from the two previous tasks. PAN, a well-known workshop, included "language variety identification in Twitter" in its 2017 Author Profiling task. Here, we propose a shared task to identify the native language of an Indian user based on their comments on social media.
Task: The task is to identify the native language of the writer from a given Text/XML file that contains a set of Facebook comments written in English. Six Indian languages are considered for this shared task: Tamil, Hindi, Kannada, Malayalam, Bengali and Telugu.
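As a rough illustration of the task's input and output, the sketch below maps an English comment to one of the six native-language labels by comparing character trigram profiles. It is not the organizers' baseline or any participant's method: the function names, the trigram features, the cosine-similarity rule and the placeholder strings are all assumptions made for this example, and reading the Text/XML files is left out because their layout is not described here.

```python
import math
from collections import Counter

# Label set taken from the task description.
LANGUAGES = ["Tamil", "Hindi", "Kannada", "Malayalam", "Bengali", "Telugu"]


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased, padded text."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_comments):
    """Aggregate one n-gram profile per native-language label."""
    profiles = {}
    for text, label in labelled_comments:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the label whose profile is most similar to the comment."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(profiles[label], grams))


if __name__ == "__main__":
    # Placeholder strings only; these are NOT real task data.
    toy_training = [
        ("placeholder comment one", "Tamil"),
        ("another sample remark", "Hindi"),
    ]
    toy_profiles = train(toy_training)
    print(predict(toy_profiles, "placeholder comment"))
```

A real system would train on the released comments for all six languages and would likely combine richer features with a stronger classifier; the sketch only fixes the shape of the problem, text in and one label out.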
Native Language Identification (NLI) can be important for a number of applications. In forensics, native language is often used as an important feature for authorship profiling and identification. With the heavy use of social media sites and online interactions today, receiving a violent threat has become a common problem for users. If a comment or post poses any type of threat, identifying the native language of its author can be a significant step toward finding the source.
Anand Kumar M, Assistant Professor, Dept of IT, NITK-Surathkal
Soman K P, CEN, Amrita Vishwa Vidyapeetham, Coimbatore, India
Mr. Barathi Ganesh HB, Research Scholar, CEN, Amrita Vishwa Vidyapeetham
Mr. VinayaKumar R, Research Scholar, CEN, Amrita Vishwa Vidyapeetham
Mr. Shivkaran Singh, Research Associate, CEN, Amrita Vishwa Vidyapeetham
Mr. Vivek Vinayan, Postgraduate Student, CEN, Amrita Vishwa Vidyapeetham
Shalini K., Postgraduate Student, CEN, Amrita Vishwa Vidyapeetham
Anand Kumar M, Barathi Ganesh HB, Shivkaran Singh, Soman KP and Paolo Rosso. Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification. In Proc. of Forum for Information Retrieval Evaluation 2017.
Shervin Malmasi. 2016. Native language identification: explorations and applications. Sydney, Australia: Macquarie University.
Sze-Meng Jojo Wong and Mark Dras. 2011. Exploiting parse structures for native language identification. Association for Computational Linguistics, 1600-1610.
Sze-Meng Jojo Wong and Mark Dras. 2009. Contrastive analysis and native language identification. In Proceedings of the Australasian Language Technology Association Workshop. 53-61.
Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and Yao Qian. 2017. A Report on the 2017 Native Language Identification Shared Task. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. 62-75.
Joel R Tetreault, Daniel Blanchard, and Aoife Cahill. 2013. A Report on the First Native Language Identification Shared Task. In BEA@NAACL-HLT. 48-57.