01/03/2019 11:49 AM IST | Updated 01/03/2019 11:49 AM IST

India Has Big Data; But Poor Data Quality Hurts Real Development

Having a policy strategy specifically drawn out for big data would define the regulatory and contractual aspects of each type of data which would enlarge the scope of data use.

Google Maps, Alexa, Siri, personalised Netflix options, YouTube suggestions, Pinterest feeds, Uber rides, chatbots on websites and even targeted brand advertisements on social media platforms have all become a part of our daily lives.

In order to work, these platforms make use of user data. This data is a small part of the big data that companies use to innovate and to design more products and services for end users. As customers, we cannot function without these technologies and often do not want to, yet our primary fear is for our privacy.

With the launch of the Digital India campaign, India has been moving swiftly in the direction of smart cities, increased public and private technological investments, smarter policing technologies, increased medical breakthroughs, and greater operational efficiency based on big data.

The most recent use of real-time data and AI, for efficient crowd management, is at the ongoing Kumbh Mela in Prayagraj. The Government has also invested approximately $480 million in AI, machine learning and IoT. But without efficient leverage of data as a business asset, together with regulated risk assessment, a corporate data strategy and a policy structure, India risks being left behind in this technological race.

Recently, several companies including GlaxoSmithKline, General Motors and Johnson & Johnson took part in the 2019 Big Data and AI Executive Survey by NewVantage Partners. As per the survey, several companies have increased investments in big data and AI, but are not seeing commensurate results. One of the primary reasons for this is that while data is the fuel for AI, several bottlenecks prevent its effective use. Restrictions on the flow of big data, which is often defined by volume, velocity and variety, may result in an incomplete and biased data set. Therefore, while the quantity of data should not be regulated, what needs to be controlled is the quality of data and the method of analysis.

In keeping with the strategy of ‘cleaning’ big data, that is, maintaining its quality, it is essential for any company to plan and create a risk assessment and policy strategy for the use and deployment of any analysis.
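To make the idea of data quality concrete, here is a minimal sketch of the kind of check a company might run before any analysis. It measures only two things, how often required fields are missing and how many records are exact duplicates, and it is an illustration rather than a complete cleaning process. The function name `profile_quality`, the field names and the sample records are all hypothetical and chosen purely for this example.

```python
from collections import Counter


def profile_quality(records, required_fields):
    """Return simple quality metrics for a list of dict records.

    Hypothetical helper for illustration: it reports the share of
    records missing each required field and the share of exact duplicates.
    """
    total = len(records)
    if total == 0:
        return {"rows": 0, "missing_rate": {}, "duplicate_rate": 0.0}

    # Share of records where each required field is absent or empty.
    missing_rate = {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in required_fields
    }

    # Count records that repeat another record exactly.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in counts.values())

    return {
        "rows": total,
        "missing_rate": missing_rate,
        "duplicate_rate": duplicates / total,
    }


# Made-up sample records, not real data.
sample = [
    {"user_id": 1, "city": "Pune", "age": 34},
    {"user_id": 2, "city": "", "age": 29},
    {"user_id": 1, "city": "Pune", "age": 34},
    {"user_id": 3, "city": "Delhi", "age": None},
]
report = profile_quality(sample, ["user_id", "city", "age"])
print(report)
```

A report like this only flags gaps and repeats. Judging whether a data set is biased would need further checks, such as comparing the mix of records against the population it is meant to represent.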

This will have a threefold impact. First, the organisation will be able to assess the growth of the company based on absorption and analysis of various types of data. Second, it would be able to modify the security and regulatory practices for each type of data. Third, it would be able to permit cross-sectoral data use. Data silos are one of the biggest challenges for several organisations.

This would also allow the company to be clear about what data it is collecting and why, which would make it more trustworthy to consumers. Often, legal verbiage clouds users’ understanding of their rights, and this has created demand for a category of mobile applications that exist to block surveillance by data brokers.

This makes data mining look inherently harmful and hinders cross-sector data integration, limiting the spread of “open data”.

Big data, including personally identifiable data, consists of reams of data made available from your digital presence, which, when merged and analysed together, could have significant social impact. For instance, Alina Joseph and Ramamurthy B of Christ College, Bangalore, applied various data mining techniques to data gathered from suspect tweets in order to predict suicidal behaviour.

In another case, in 2017, the Income Tax department launched ‘Project Insight’, which uses data mining and analysis to curb the circulation of black money in India. In these kinds of cases, allowing reasonable access to your data is not a bad thing.

While spending on becoming AI-centric is on the rise, there is much less focus on the data management that fuels the growth of AI. The main aim for most data-intensive companies should be a data-driven culture that lays down a big data policy structure, assesses the risks attached to the various types of data collected, and understands the data challenges, whether organisational or technological in nature.

It is also important for organisations to give back to society, either through cross-sectoral data access or through data analysis for social good. Additionally, companies would need to ensure that the dataset collected is not biased and that efficient software tools for data analytics are maintained.

Establishing the big data architecture described above, including storage and access, would enable efficient use of big data without security loopholes. Policies on big data architecture would primarily set out guidelines for each stage of data processing: collecting the data, structuring it with analytical tools such as Hadoop, and presenting it in a form suitable for analysis that assists corporate growth.

Conflicts between corporate policies often arise from departures from, or varied interpretations of, the tenets of big data, primarily because domestic laws or principles are lacking. The way forward would be to align national and corporate policies on the various stages of data architecture, including data quality, usage guidelines and management of big data. The main aim of organisations should be to redefine the impact and value of data generated from IoT, turning it into knowledge for growth as well as social good.