03/09/2018 7:05 AM IST | Updated 03/09/2018 7:12 AM IST

Will Google's Project Navlekha Give A Second Life To Small Indian-Language Publishers?

Project Navlekha can quickly digitise scanned pages of Hindi papers or magazines, and Google plans to add support for more languages.

Rajan Anandan, Google’s vice-president for South East Asia and India, on stage at Google For India in Delhi.

New Delhi — In India, content in regional languages dominates the offline market, but still makes for just a small part of the Internet. As Google pointed out at its annual Google for India conference last week in Delhi, "When you search in Indian languages, the content available is just 1% of what's available in English."

Even though Indian publishers have moved to digital printing platforms, most of these are proprietary, and it's not easy to bring this writing online. However, a new tool unveiled at Google for India could change this, while ensuring that the search giant is not left behind as the Indian language online market opens up.

Last week, Google unveiled Project Navlekha, a technology that allows publishers to quickly turn PDFs containing Hindi text into an editable format that can instantly be published on the Internet and distributed, at no charge. This was difficult to do earlier because of a lack of standards in Indic fonts: simply copy-pasting text from a scanned magazine page could result in gibberish, and the layout of the page would also be lost. Project Navlekha uses artificial intelligence to identify the letters in a Hindi text and converts them into a standardised font.
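
To see why copy-pasting can produce gibberish, here is a minimal Python sketch of the underlying problem. It assumes a hypothetical legacy font that draws Devanagari glyphs at ordinary ASCII code points, so the stored text is Latin letters that only look like Hindi when that font is applied. The mapping table and the function name `legacy_to_unicode` are illustrative assumptions, not any real font's table and not Navlekha's method, which Google described as using artificial intelligence to recognise the letters.

```python
# Hypothetical glyph table for an imaginary legacy font: each ASCII sequence
# stands for the Devanagari character the font draws in its place.
LEGACY_TO_UNICODE = {
    "Hk": "\u092d",  # bha
    "k": "\u093e",   # aa vowel sign
    "j": "\u0930",   # ra
    "r": "\u0924",   # ta
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-font text to Unicode by greedy longest-match lookup."""
    keys = sorted(table, key=len, reverse=True)  # try longer sequences first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No mapping for this character: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


copied = "Hkkjr"  # what copy-paste yields without the legacy font applied
converted = legacy_to_unicode(copied)  # the Hindi word for "India" in Unicode
print(copied, "->", converted)
```

A real converter would need one such table per font, plus reordering rules for vowel signs that legacy fonts store before the consonant they follow in Unicode. That per-font effort is part of why a lack of standards makes digitisation costly.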

In a live demo on the stage of Google For India, the company showed a PDF page being turned into editable text in less than 60 seconds. The presenter simply loaded the PDF, drew boxes around the Hindi text that needed to be captured, and it instantly showed up in the Navlekha window.

It was a pretty cool demo. At Google For India, the company said that it also plans to offer Navlekha as a free publishing platform, so that small Indian-language publishers who may not be able to afford a technology platform can bring their content online quickly and cheaply. Google added that this content could be monetised through AdSense, so that publishers can easily start earning from it. Hindi publishers can sign up now, and the company plans to add support for other Indian languages in future.

HuffPost India has reached out to Google to learn more about its plans for Navlekha and the terms of its arrangements with publishers, but the company has not responded yet.

A page created using Project Navlekha.


"There exist massive volumes of Indian language content offline, content that speakers consume for a wide range of purposes," said Arvind Pani, co-founder and CEO of Reverie Language Technologies, a Bengaluru-based company that builds Indic-language software and offers solutions for publishers looking to use Indic languages.

"However, online, it's a different story. Today, according to Google stats, only 1% of the Internet's content is in an Indian language. This is despite the fact that most of India's Internet users, over 350 million people, use the Internet In an Indian language," Pani said.

These users are starting to come online now, thanks to low-priced mobile devices that deliver a good experience (India's most popular brand today, for example, is Xiaomi, whose Redmi series of budget phones is its top seller) and to extremely cheap data, a result of the "Jio effect". To make it easier for these people to access the Internet, Google also announced a number of new initiatives, such as Hindi support on Google Home and multiple-language support on Google Assistant and on the Google Search feed, which now also reaches entry-level phones via Google Search Go.

But the amount of Indian language content has to increase significantly to serve these users. "There are three ways to do this," said Pani. "One, by encouraging and facilitating the creation of more Indian language content. Two, by converting digital content in English into Indian languages. Three, by converting the large volumes of offline Indian language content through digitisation."

"Rewind three years and it would have been difficult to convince most businesses that offerings in local languages have to be a key part of a company's strategy. In early 2017, Jio came and changed that perception in a matter of months – suddenly 'Bharat' came online, uncapped and lit up this opportunity," said Akshay Bhushan, Partner, Lightspeed India Partners, a venture capital firm. "Data prices have fallen from Rs.152 per GB in 2016 to less than Rs 10 per GB today making the Internet affordable and giving access to a segment of society which is experiencing the digital revolution for the first time."

With Navlekha, Google can now bring Hindi publishers online and ensure that there is content for users to find. It also helps make Indian-language content on the Internet both standardised and readily accessible.

"By increasing the availability and range of content, these users will become quicker to engage with platforms and progress to the next stages," said Pani.

This sounds like good news for both Indian-language publishers and Google, although companies that digitise Indian publications will find their businesses disrupted.

"It is definitely a great move from Google and would hugely benefit vernacular publishers in India who have a legacy of print publication," said Chirdeep Shetty, CEO of Quintype, a media technology company that works with Indian language publishers. Shetty did not want to comment on the specifics of Navlekha before it is formally rolled out, but a company executive, who requested anonymity, said, "This is great for the small publishers that couldn't afford to spend money on technology, but anyone who has made an investment in the last four-five years on fonts is going to be very unhappy right now. The good thing is that they [Google] have not locked this to their own platform, you can use the text in other places too, which is good."


Who are the new users coming on to the Internet, whom Navlekha aims to serve? Reverie recently carried out a survey of Internet users, and came away with some interesting findings. According to the company, Hindi is the most widely used Indian language online, but Marathi, Gujarati, and Bengali have the most active users.

And while there are a large number of urban Internet users who shop and browse in their native languages, almost 70% of Indian-language users still come from small towns and villages. What's more, although many of these users do have low-end smartphones, nearly one in four has a phone that costs more than Rs 10,000, which is to say, a phone that's comparable with high-end devices on most counts.

Reverie's study also looked at how these customers are using their smartphones, and the company found that seven of the top 25 apps were social and messaging apps. Video and music streaming, however, is growing.

Predictably, WhatsApp was almost everyone's top pick, but Facebook and ShareChat are also popular amongst users of Indian languages. So are YouTube, Gaana, PhonePe and Google Tez (now known as Google Pay). Interestingly, payment apps such as PhonePe and Tez were more popular than banking apps amongst Indian-language users, which reflects the fact that many Indian banking apps simply aren't available in Indian languages.

These things need to change and will continue to evolve going forward, Pani said.

For Google, the way to accomplish this is Navlekha.

"We will help you get your content online for free," the company says on its website. "We will not charge for our publication tools and the domain name for the first three years. All you need to get started is your content and a commitment to bring your publication online."