Deciphering India’s Bid to Monetize Government Data

By: Dhruv Somayajula


The government has recently published the draft ‘India Data Accessibility and Use’ policy (IDAUP), which promises a framework for easier access to government data. This policy is the latest attempt by the government to derive economic value from its data, following the National Data Sharing and Accessibility Policy (NDSAP) notified in 2012. India is currently experiencing a digital boom, with widespread access to cheap mobile internet and deeper penetration of internet access. Simultaneously, the Indian government has embraced the use of information technology, as seen in the launch of numerous e-projects in the last two decades. These projects include both projects where government-owned non-personal data is created based on anonymising interactions with Indian citizens, and the purely non-personal data processing that does not involve any individuals at any stage. The latter may include weather, traffic, military, scientific, commercial and economic research data processed by the government in a digital medium. Data processed by the government thus presents enormous economic potential, and the IDAUP seeks to leverage just that.


The IDAUP provides for licensing high-value datasets (HVD) within government data that have undergone value addition through defined pricing guidelines. A similar approach to monetizing government data has been attempted in the past. The NDSAP enabled ministries to frame pricing policies for their respective data sets. Keeping this in mind, the introduction of a fresh data access and use policy, as opposed to tweaking the existing NDSAP, is puzzling. Neither the IDAUP nor its background note provides clarity on this replacement.


Creating value from government data


At the outset, the concept of generating monetary value from government data needs to be examined. Under Section 17(d) of the Copyright Act, 1957, the government is the first owner of the copyright over any literary work, such as a computer database, table, compilation or program, that is made or published under the direction or control of the government. The term of the copyright protection for government works under Section 28 of the Copyright Act, 1957 is sixty years after the year in which the work is first made or published. Therefore, the government does hold intellectual property rights over the databases that are prepared by it, and can choose to license its property out in a manner of its choice, as affirmed under Section 30 of the Copyright Act, 1957.


However, there is growing emphasis that ownership over government data, and its subsequent use, must be balanced and developed in public interest. Parts of government data may also be derived from public non-personal data, which brings in the community as an interested stakeholder in the use of this data. The Expert Committee Report on Non-Personal Data Governance Framework, 2020 chaired by Kris Gopalakrishnan (the Kris Gopalakrishnan Report) has clarified that for the purposes of public non-personal data, data custodians can include the government, which shall have a duty of care towards the community from which the data is collected, and a responsibility towards data stewardship.


To summarize, the government retains a legal right over government data and thus is entitled to generate value from such datasets through licensing. However, this must be balanced by the government’s role as a custodian towards the non-personal data generated from the public and its responsibility to the community at large.

Concerns with monetizing ‘public’ data


In this context, a practice of monetizing government data raises glaring concerns. Licensing HVDs to government departments or private entities may create a perverse incentive for greater data collection to ensure revenue maximisation. This particular concern of government data being licensed to private entities for revenue generation is not without precedent.


In 2019, the Ministry of Road Transport and Highways in India scrapped a policy allowing private entities to access the vehicle registration databases Vahan and Sarathi, but not before it earned over Rs 111 crores for permitting this access. Vahan’s database comprised 25 crore vehicle registrations and Sarathi’s database comprised 15 crore driving licenses, which were shared via a bulk data sharing policy with over 170 entities from 2014 to 2019.


Ultimately, this policy was discarded with the ministry flagging concerns of privacy and potential misuse of personal data. Viewing data collected from the public as an opportunity to raise revenue may raise similar ethical risks, unless stringent limitations are placed on what kind of data sets can be put up for licensing.


Protecting personal information disclosed under the IDAUP


In addition to incentivising sharing of public data for revenue maximisation, there continue to remain obvious privacy concerns with the current IDAUP framework. The government data to be shared with the public either through a licensing framework or as open data shall include various kinds of personally identifiable data fields. Clause 13 of the IDAUP provides for a future anonymisation framework to govern this policy. However, beyond this anonymisation, the policy does little to address the concerns of sharing personally identifiable data.


In contrast, the open license for data sets published in 2017 under the NDSAP allowed entities to use government data under an open access license. The open access license specifically excluded personal information, identity documents, and data that the data provider was not authorised to license or share forward. Further, the implementation guidelines under the NDSAP categorised personally identifiable data sets under the negative list, i.e., not to be shared. This protection for datasets containing personal information is warranted. In recent years, studies have proven the possibility of re-identifying anonymised data, made particularly convenient when further access to datasets of similar fields is shared. As a result, requiring anonymisation of personal data is a necessary first step, but cannot be seen as a panacea for all the potential issues of data aggregation and profiling. Additionally, the costs associated with anonymisation must be contemplated for adequate budgetary allocation and smooth rollout of anonymisation within the policy.


To effectively address privacy and data protection concerns from licensing or sharing data under the IDAUP, anonymisation must be one part of a multi-pronged approach. This can be complemented with preparing negative lists or carving out specific data fields for which open access or license is not permitted. Lastly, re-identification of anonymised data fields must be punishable under the IDAUP. Under the current iteration of the Personal Data Protection Bill, 2019 (PDP Bill) tabled before the Parliament, re-identification of anonymised data is a criminal offence. A regulatory framework such as the PDP Bill must therefore be put in place in parallel with the IDAUP in order to effectively mitigate privacy and data protection concerns emanating from this policy. Analogous EU directives on the re-use of public sector information expressly reference EU data protection regulations in addition to directing anonymisation of personal data.


Recommendations for societal welfare through IDAUP


Addressing the concerns of data protection and misuse for revenue maximisation can resolve the risks emanating from operationalising the IDAUP. However, framing the end objectives of the IDAUP towards community-based societal welfare may positively aid the framework developed for its implementation. As discussed above, checks and balances are needed to avoid creating a perverse mechanism that prioritises revenue maximisation over social welfare or data minimisation. These checks and balances can be framed through a list of objectives broadly based on public interest. Licensing HVDs to private entities under the IDAUP must incorporate partiality towards public interest or areas which directly benefit the public. This framework can interplay with data trusts and data stewards conceptualised under the Kris Gopalakrishnan Report.


Secondly, the government is the best agent of data collection and aggregation, and can collect data from citizens with little resistance. This sovereign mechanism that enables statistical offices and district representatives to collect vast amounts of data cannot be tainted with the idea of data being collected for profit motives. The implementation guidelines to be issued under the IDAUP by the India Data Office can specify that the license issued under the IDAUP would only permit re-use of government data. This can offset concerns of the government using its unique data collection powers to collect data on behalf of high-paying clients.


Thirdly, keeping in mind the government’s integral role in collecting the data, the IDAUP must ideally provide for means through which the government may retain control over the end-use of licensed government data. Through this control, the government may seek to encourage parity in the digital ecosystem by offering incentives to micro, small, and medium enterprises licensing government data.


Lastly, a proactive framework can unlock the developmental potential in HVDs by emphasising equity of access in the digital ecosystem. For example, government datasets on traffic data, road coverage, accidents, and public transport fares may benefit indigenous start-ups working on local ride-hailing alternatives. Charging a uniform rate for licensing these HVDs may allow established players with deep pockets to retain or gain a dominant position in the market and lock out the smaller players from growing in the market. As this policy is currently awaiting revisions based on public comments, it is hoped that the twin concerns of public good and privacy protections are emphasised in future iterations.




Dhruv is a Research Fellow with the Centre for Applied Law and Technology Research at Vidhi. He is interested in the interplay between law, technology and civil liberties.
