

This process gives target language based initialized weights to the LSTM model before training the network with the original parallel data. The research involves echoing monolingual dataset, such that in the pre-training phase, the input and output sequences are ditto, to learn the target language. This paper shares two things: an efficient experiment for “pseudo” transfer learning by using huge monolingual (mono-script) dataset in a sequence-to-sequence LSTM model and application of proposed methodology to improve Roman Urdu transliteration.

The analyses of results show that the proposed RUTUT approach translates accurately than Google online translator and ijunoon online transliteration. To evaluate the RUTUT, different translational tests are performed and compare those results with famous Google online translator and ijunoon online transliteration. This research is carried out using a python programming language in programming tool Anaconda in Jupiter notebook and user-friendly Graphical User Interface (GUI) created by using Tkinter library. This research analyzed Roman Urdu data and identified different rules-based character substitution techniques that transform the Roman Urdu into Urdu script at fundamental levels. The focus of this research is to analyze the issues related to the Roman Urdu script to Urdu script transliteration and develop a translator based on the concepts of transliteration. It can transliterate the messages or descriptions from the Roman Urdu script to Urdu script which may help the Urdu speaker to elaborate their message in efficient manners. In this research, Roman Urdu to Urdu Translator (RUTUT) is proposed that consists of preprocessing methods, rule-based character substitution and Unicode based character mapping techniques. Today most of the modern technologies like computers and mobile phones using English script, due to this local Urdu user has to use English letters to type Urdu script that is Roman Urdu. A survey acknowledges that 300 million people are speaking Urdu and about 11 million speakers in Pakistan from which maximum users prefer Roman Urdu for the textual communication. In pronunciation, both are the same but different in spelling and have different shapes of the alphabet. Urdu language written in English alphabets for communication is known as Roman Urdu.
