Shared task on Indian Native Language Identification (INLI)

Held in conjunction with FIRE 2018 @ DAIICT, Gandhinagar (7-9 Dec, 2018)

The top two performing teams will be awarded a cash prize and a shield.

Task description

In India, English is one of the most important languages and holds the status of an associate language. After Hindi, it is the most commonly spoken language in India, and it is certainly the most read and written. The number of second-language speakers of English has steadily increased, which has contributed to its rich variation. English is blended with most Indian languages and is frequently used as a second or third language. Regional and educational differences shape how the language is used and give rise to stylistic variation in English. Spoken English varies greatly across the states of India, and it is relatively easy to identify a speaker's native language from their English accent. Identifying the native language of a user from comments or posts written in English, however, remains a challenging task.

Native Language Identification (NLI) is a well-known shared task whose focus is to identify the native language of non-native speakers. The first NLI shared task, conducted in 2013, was based on essays, and a 2016 task used spoken responses to identify the native language. The recently announced NLI shared task (co-located with EMNLP) proposes to use the essays and spoken responses from the two previous tasks. The well-known PAN workshop included "language variety identification in Twitter" in its Author Profiling task in 2017. Here, we propose a shared task to identify the native language of an Indian user based on their comments on social media.

Task: The task is to identify the native language of the writer from a given Text/XML file that contains a set of Facebook comments written in English. Six Indian languages are considered in this shared task: Tamil, Hindi, Kannada, Malayalam, Bengali and Telugu.
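To make the classification setting concrete, here is a minimal sketch of one possible baseline. It assumes the comments have already been extracted from the Text/XML file into plain strings (the file schema is not described here, so parsing is omitted). The character-trigram profiles, the cosine-similarity decision rule, and the function names `char_ngrams`, `cosine`, `train` and `predict` are illustrative assumptions, not the organizers' method. The training strings are synthetic placeholders, not task data.

```python
from collections import Counter
import math

# The six native-language classes named in the task description.
LANGUAGES = ["Tamil", "Hindi", "Kannada", "Malayalam", "Bengali", "Telugu"]

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(labelled_docs):
    """Sum n-gram counts per label to build one profile per native language."""
    profiles = {}
    for text, label in labelled_docs:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles

def predict(profiles, text):
    """Return the label whose profile is most similar to the given text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(grams, profiles[label]))

# Synthetic placeholder strings (assumption: real input would be Facebook comments).
toy_training = [("aaa bbb aaa", "Tamil"), ("xyz xyz xyz", "Hindi")]
toy_profiles = train(toy_training)
print(predict(toy_profiles, "aaa bbb"))  # prints: Tamil
```

Character n-grams are used here only because they need no tokenizer or external library; any feature set and classifier could take their place.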

Native Language Identification (NLI) can be important for a number of applications. In forensics, native language is often used as an important feature for authorship profiling and identification. With the heavy use of social media sites and online interactions, receiving violent threats has become a common problem for users. If a comment or post poses any type of threat, identifying the native language of its author is one significant step toward finding the source.

Registration

Please register here

Dataset

  • Training set and Test set released

Important Dates

  • Training Data Released: May 15, 2018
  • Test Data Released: June 15, 2018
  • Run Submission Deadline: June 25, 2018
  • Results Declared: July 05, 2018 / July 12, 2018
  • Working Notes Due: August 15, 2018
  • Conference: December 07-09, 2018

Evaluation and results

download results here

download results for late submission here

Organizers
Anand Kumar M, Assistant Professor, Dept of IT, NITK-Surathkal
Soman K P, CEN, Amrita Vishwa Vidyapeetham, Coimbatore, India
Student Organizers

Mr. Barathi Ganesh HB, Research Scholar, CEN, Amrita Vishwa Vidyapeetham
Mr. VinayaKumar R, Research Scholar, CEN, Amrita Vishwa Vidyapeetham
Mr. Shivkaran Singh, Research Associate, CEN, Amrita Vishwa Vidyapeetham
Mr. Vivek Vinayan, Postgraduate Student, CEN, Amrita Vishwa Vidyapeetham
Shalini K., Postgraduate Student, CEN, Amrita Vishwa Vidyapeetham

References

Anand Kumar M, Barathi Ganesh HB, Shivkaran Singh, Soman KP and Paolo Rosso. Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification. In Proc. of Forum for Information Retrieval Evaluation 2017.

Shervin Malmasi. 2016. Native language identification: explorations and applications. Sydney, Australia: Macquarie University.

Sze-Meng Jojo Wong and Mark Dras. 2011. Exploiting parse structures for native language identification. Association for Computational Linguistics, 1600-1610.

Sze-Meng Jojo Wong and Mark Dras. 2009. Contrastive analysis and native language identification. In Proceedings of the Australasian Language Technology Association Workshop. 53-61.

Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and Yao Qian. 2017. A Report on the 2017 Native Language Identification Shared Task. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. 62-75.

Joel R Tetreault, Daniel Blanchard, and Aoife Cahill. 2013. A Report on the First Native Language Identification Shared Task. In BEA@NAACL-HLT. 48-57.

GET IN TOUCH

Have any queries? That's great! Give us a call or send us an email and we will get back to you as soon as possible.

WANT TO KNOW HOW WE CAN HELP YOU?

Phone: 0 422 2685000 (5594)

CEN,NLP

Email: inli_cen(at)cb.amrita.edu