BENGALURU — India's biggest e-commerce company is not Flipkart or Amazon, but IRCTC, the Indian Railways' ticket booking website. Twenty million people travel by train every day on average, and IRCTC sees some 60 million visits every month. That's around 2 million site visitors every day, and IRCTC issues 700,000 tickets on average each day.
This adds up to a treasure trove of over 100 terabytes of passenger data each year (names, ages, phone numbers, gender, meal preferences, income brackets, and whether passengers have a physical disability or fall under the defence quota), much of which is very valuable in an economy premised on targeting and profiling potential customers with increasing accuracy.
Now, the government wants to sell this data to the highest bidder.
Last month, Rail Minister Piyush Goyal said the government was re-assessing the disinvestment of IRCTC, as it wants to tap the data that the platform has collected. "There is huge data with the company and that is not getting captured in the valuation. We are trying to see how we can utilise that," Goyal said in a press conference.
The minister's statement, which went largely unnoticed, is the first instance of the possible privatisation of citizen data, by a government department, to earn a profit. Data scientists are very concerned, particularly in the absence of a data privacy law. Earlier this week, the Telecom Regulatory Authority of India offered a set of recommendations on data use by private companies, but was silent on government departments which are sitting on far more granular citizen databases.
"Something about this just rings strange to me," said Vasant Dhar, a data scientist and professor at the Stern School of Business and the Center for Data Science at New York University, noting that when a passenger gives the railways her data, she doesn't expect the data to be sold further for profit.
Sharing IRCTC data as part of a disinvestment deal, he said, would mean that data given to the railways as a custodian would be passed on to unknown third parties. "Companies should monetise data," Dhar said. "You should expect something very different from the government."
Rail officials said the monetisation of railway data has been in the works for a while. Last year, then-Railways Minister Suresh Prabhu said his ministry wanted to monetise the data while somehow ensuring passenger privacy was not compromised.
"The Indian Railways is one of the largest data creators in the world. It has to handle a large volume of data which needs to be used wisely. Data analytics is a way forward," Prabhu had said. "Data itself is of no use unless it is tabulated into something."
In his speech presenting the 2016 Rail Budget, Prabhu said the railways was exploring the possibility of monetising user data. "Though IR [Indian Railways], as an organisation, collects over 100 Terabytes of data every year, yet it is hardly analysed to gain business insights," he said.
Rail officials, in the meantime, have spoken of ways to collaborate with private companies like Ola and Uber.
"Based on a passenger's booking history, she can get a message offering an Ola or Uber cab on reaching New Delhi railway station. We can also offer food options or a booking for National Museum or Rail Museum through the site," an officer in the ministry told Times of India.
Another railways official who spoke to HuffPost India also expressed a similar opinion, but flagged ethical concerns.
"A MakeMyTrip using customer data is different from the Railways using citizen data, so it becomes very important that we don't cross any lines in doing so," a senior railways officer told HuffPost India, admitting that the railways had no clear policy on how to utilise its data to improve its own services. "The problem is, there's no clear picture on how to do it."
In 2016, the IRCTC database leaked, and the information of around 1 crore (10 million) people was feared stolen. IRCTC officials feared that customers' personal details, including phone numbers and dates of birth, had been sold on a CD for Rs 15,000 to anyone who was interested.
Although this data does not appear to be easily available anymore, Soham Gupta, the founder of a stealth-stage analytics startup in Bengaluru, said that in 2016, it was possible to find the data through a single Google search.
"There were a couple of forum posts where people were offering the whole data dump for Rs. 10,000," Gupta said. "We didn't buy the data because we weren't sure whether we would use it, but we did see some samples of the data and it looked pretty genuine. Names, phone numbers, date of birth, those kinds of details were all listed."
IRCTC isn't the only company gathering such information. Travel sites such as MakeMyTrip, Ixigo, or even Paytm, which offer rail bookings, gather similar data on their customers. There are also services built around rail trips.
Railyatri, for instance, offers food delivery on trains, cab and hotel bookings once you reach your destination, real-time journey data, and estimates on the chances a wait-listed ticket will be confirmed.
"If you enter your PNR number, we look at what the wait-list is, and then look at historical data of the last 100 journeys of the train, trains on that day, other trains on the same route, and we can predict with a great degree of accuracy about whether your ticket will get confirmed," a spokesperson told HuffPost India.
Another Railyatri service tells you how much time you have to book your tickets. "If you go to the site and see, 300 tickets are available, you might think that 'I can book tomorrow', but we can look at the historical data and tell you, this train will probably sell out in six hours, that train will probably have tickets available after two days, and so on."
Railyatri is also able to predict when a train will arrive, and it does so with more accuracy than the competition. The company has around 13 million registered users who use the app to check their PNR, to track their train, and to order snacks from an upcoming station while a train journey is under way.
"The amount of logistics work that goes into making a food delivery within a one-minute window requires incredibly reliable data," the spokesperson said, explaining that the app used user location data to track train schedules.
"We only use the data while you're using the app so there's no privacy concern - once you're off the train we're not tracking our data," the spokesperson said. "But most people only use the Railways data of when a train reaches a station, and use that to estimate when it will reach the next station. Because we are using the location data, we're able to get a much more accurate picture."
Railyatri also publishes its insights in a report where it has traced delays to the train and station level, and measures changes to see which routes and trains are improving and which are getting worse.
"This kind of data can be used to improve the railways experience for customers," the spokesperson explained. It doesn't officially submit this information to the government but makes it publicly available, the company said.
HuffPost India reached out to the Railways PROs to find out more about third party data and what information is being tracked. We'll update the story once they respond.