Shared Task & Workshop on Machine Translation in Indian Languages (MTIL)
7th and 8th September, 2017 @ Amrita Vishwa Vidyapeetham (Amrita University), Coimbatore, India
All accepted Shared Task and Workshop proceedings will be submitted for online publication.

SHARED TASK DESCRIPTION
About MT:
Machine translation is the automatic conversion of text from one language to another. It has been explored extensively for English and other European languages, but it remains comparatively underexplored for Asian languages, especially Indian languages. The Machine Translation shared task focuses on four language pairs:
1. English - Tamil
2. English - Malayalam
3. English - Hindi
4. English - Punjabi
Context and Goals:
In recent years, multilingual content on the web has grown exponentially along with the development of the internet. Much of this content is inaccessible to users who know only one language. Automatic machine translation is the only scalable way to make it available to them. To work toward this large and long-term goal, this shared task is proposed with the following objectives:
1) To examine state-of-the-art machine translation approaches for translating from English to Indian languages.
2) To explore the challenges of translating between languages that diverge in syntactic structure and morphology.
3) To create high-quality open-source parallel corpora for machine translation of Indian languages.
4) To explore recent deep neural translation systems for Indian languages.
We believe that the open-source parallel corpora provided in the shared task will motivate both beginners and established research groups to participate.

Shared Task Descriptions:
*    Parallel corpora for two domains, General and Agriculture, will be provided for the English - Tamil and English - Hindi language pairs.
     For the remaining pairs, a parallel corpus is provided for the General domain only.
*    The training corpus may be split into training and development data.
*    Participants are requested to develop their MT systems under the constrained setting (using only the provided training data).
*    However, participants are allowed to use their own language model.
*    Participants using their own language model should report the statistics and source of the data used to build it.
*    Participants are allowed to use other open-source linguistic tools, such as POS taggers and morphological analyzers/generators, when developing their MT systems.
*    Participants using external linguistic tools must declare the tools used with their system.
*    The test data will be released later.
*    Translation quality will be measured by manual evaluation and an automatic evaluation metric.
*    Participants are requested to contribute to the manual evaluation.
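Splitting the training corpus into training and development data could be sketched as follows. This is only a minimal illustration, not a procedure prescribed by the organizers: the function name `split_parallel_corpus`, the default development size of 1000 pairs, and the random seed are all assumptions chosen for the example. The one property that matters is that each English sentence stays aligned with its translation after shuffling.

```python
import random


def split_parallel_corpus(src_lines, tgt_lines, dev_size=1000, seed=13):
    """Hold out dev_size aligned sentence pairs as development data.

    dev_size and seed are illustrative defaults (assumptions), not values
    specified by the shared task.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    if not 0 < dev_size < len(src_lines):
        raise ValueError("dev_size must be between 1 and corpus size - 1")

    # Pair each source sentence with its translation before shuffling,
    # so the alignment survives the random reordering.
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)

    dev = pairs[:dev_size]
    train = pairs[dev_size:]
    return train, dev


if __name__ == "__main__":
    # Toy stand-in for a real corpus; the sentences are placeholders.
    english = [f"english sentence {i}" for i in range(10)]
    tamil = [f"tamil sentence {i}" for i in range(10)]
    train, dev = split_parallel_corpus(english, tamil, dev_size=2)
    print(len(train), "training pairs,", len(dev), "development pairs")
```

Fixing the seed makes the split reproducible, which helps when comparing several systems trained on the same data.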
Training Dataset Size:
*    English - Tamil     : 138k pairs (46k from Amrita + 92k from TDIL)
*    English - Malayalam : 103k pairs (40k from Amrita + 63k from TDIL)
*    English - Hindi     : 162k pairs (60k from Amrita + 102k from TDIL)
*    English - Punjabi   : 130k pairs (53k from Amrita + 77k from TDIL)
ABOUT WORKSHOP
In addition to the MTIL shared task, the workshop will also accept scientific papers on topics related to Machine Translation in Indian Languages. Topics of interest include, but are not limited to:
*    Rule-based and Statistical Machine Translation.
*    Comparable corpora and Bilingual Embeddings for SMT.
*    Machine Translation for Morphologically Rich languages.
*    Post Editing in Machine Translation.
*    Automatic Machine Translation Evaluation.
*    Neural Machine Translation (NMT).
*    Incorporating linguistic information into SMT and NMT.
Important Dates (Workshop):
*    Paper submission deadline - 15th June, 2017
*    Paper notification - 1st August, 2017
*    Camera-ready version due - 8th August, 2017
ABOUT PUBLICATION
*    All accepted shared task working notes and workshop proceedings will be submitted for online publication.
*    Extended versions of the best working notes and workshop papers will be submitted to the following journals:
1. Journal of Intelligent Systems - Indexed by Scopus and E-SCI
2. Translation Review - Indexed by Scopus

Shared Task Registration closed

Workshop registration will commence very soon.




We acknowledge the following students and Ph.D. scholars, who collected and cleaned the parallel corpus in the first phase of the MTIL data creation activity.

S.No   Language    Domain        Name
----   ---------   -----------   ---------------------------------------------------------------
1      Hindi       General       Shivkaran Singh, M.Tech, 2nd year
2      Hindi       Agriculture   D. Jothiratnam, Ph.D. Scholar & M.P. Sushama, Amrita University
3      Punjabi     General       Shivkaran Singh, M.Tech, 2nd year
4      Malayalam   General       B. Premjith, Ph.D. Scholar & G. Athira, Ph.D. Scholar
5      Malayalam   Agriculture   B. Premjith, Ph.D. Scholar
6      Tamil       General       CEN
7      Tamil       Agriculture   S. Vaithehi, Project Staff

Important Dates (Shared Task)

Event                      Date
------------------------   ---------------------------
Training Data Release      1st May, 2017
Test Data Release          1st June, 2017
Run Submission Deadline    5th June, 2017
Results Declared           25th June, 2017
Working Notes Due          10th July, 2017
Ack. of Working Notes      10th August, 2017
Camera-Ready Deadline      15th August, 2017
Conference Date            7th & 8th September, 2017


Updates on Paper Submission
Workshop papers can be submitted at the following link

Cash prizes or travel grants will be awarded to the top-performing teams (for each language pair) in the shared task.