About the authors: The authors are fourth-year students of Gujarat National Law University (GNLU).
On 12 July 2020, an expert committee established by the Ministry of Electronics and Information Technology released its report on the “Non-Personal Data Governance Framework” for the country. The report recognises the potential of data as a commodity and aims to leverage non-personal data for domestic development.
Non-personal data and the need to regulate it
Non-personal data is data that has been anonymised or aggregated so that it excludes personally identifiable information. In other words, individuals cannot be identified from it. Identifiability depends on a multitude of factors, such as any characteristic, trait or feature of the identity of a natural person, or a combination of such features.
The idea behind a law for non-personal data (‘NPD’) stems from the recognition that data has economic value, and that this value should not be confined to the big companies that hold large amounts of data by virtue of their hegemony in the data market. NITI Aayog has also noted, in its discussion paper, that the concentration of data in the hands of a few dominant players creates high entry barriers for small businesses. In a conversation about the report, Kris Gopalakrishnan, the head of the Committee, stated that since larger businesses have the wherewithal to comply, the intent is to make it easy for smaller companies to become data businesses and gain access to data. Moreover, the Committee regards collective data as an invaluable resource that can be used for good governance.
Types of Data
The Report divides NPD into three categories: (1) Public NPD, which is data collected or generated by the government or any government agency; (2) Community NPD, which includes ‘raw/factual’ data belonging to any community (a group bound by common interests); and (3) Private NPD, which is derived from privately owned assets or processes. Unlike Community NPD, Private NPD includes inferred or derived data and insights produced by applying algorithms and proprietary knowledge, as well as data about non-Indians.
The concept of ‘sensitive data’, which, inter alia, covers data relating to national security or data bearing a risk of collective harm to a group, has been inherited from the Personal Data Protection Bill, 2019 (‘PDP’). Further, the consent clause under the PDP for the collection and processing of personal data is not wide enough to cover personal data that has been anonymised by removing all personal identifiers, thereby converting it into NPD. The Committee therefore suggests that the data principal be asked for her consent to the anonymisation of her data and its subsequent use. However, in a competitive and innovative market where the uses of data are multitudinous and ever-expanding, it is impractical to expect that a data principal will be informed of every potential use of her data. Moreover, the absence of periodic monitoring would incentivise companies to use data for purposes that have not been consented to. The only safeguard is that the data custodian, who undertakes all processing of such data, must do so in the best interests of the data principal and with a duty of care towards the community.
Ownership of data & Data businesses
The Report notes that data is a non-rivalrous yet excludable good. It can be consumed by everyone without fear of depletion, but others can be excluded from using it through legal instruments such as copyright, trade secret protection and patents. In view of this, and to maximise the social and economic benefits that data brings to society, the Committee examined the question of data ownership in order to establish legal rights over data. It recognised that many actors may hold simultaneous rights over the same data, and therefore adopted the notion of “beneficial ownership/interest” for community data. Hence, unlike an individual, who enjoys direct control over her own data, a community’s rights over community data would be exercised on its behalf by a ‘data trustee’. This trustee can work with a new regulator, the Non-Personal Data Authority (‘NPDA’), to seek and enforce data sharing. The Report indicates several times that the trustee may be the government or a government agency, although it stops short of making this the only option. Public NPD, on the other hand, would be a national resource, while non-government actors collecting or producing private NPD may have rights over their data.
Important community data, which is to be recognised across industries, may be requested directly from private custodians by data trustees or governments, and may be placed in ‘data trusts’ accessible to all citizens. A data trust is an “institutional structure, comprising specific rules and protocols for containing and sharing a given set of data”; it may contain data from multiple sources that is relevant to a particular sector and required for providing a set of digital or data services.
A new business category called ‘data businesses’ has also been proposed. These are entities that collect and process data in the course of their business, regardless of the sector they operate in. Once they cross a prescribed data collection or traffic threshold, they will have to register and submit the prescribed metadata, which the NPDA will then publish in the form of directories.
The Report requires every data sharing request to specify its purpose. Purposes range from sovereign purposes (national security, law enforcement or regulatory purposes), through ‘core public interest purposes’ such as research and innovation and policy development, to economic purposes. With respect to private data, only raw, factual data pertaining to the community needs to be shared, on well-defined grounds and without remuneration, owing to the community’s rights over its data. Where the value added to the original community data is considerable, data sharing may still be mandated, but in return for fair, reasonable and non-discriminatory remuneration payable to the data custodian. As the value added increases further, the data may simply be required to be brought to a well-regulated data market, where its price is determined by market forces.
A data request, specifying its purpose, may be made to the data custodian. If the request leads to a dispute, the NPDA would step in, evaluate the genuineness of the request, and decide on that basis whether to mandate data sharing. Notably, the role of the NPDA will differ from that of the Data Protection Authority (‘DPA’) set up under the PDP. While the DPA regulates personal data and thus plays a restricting role, the NPDA plays an enabling and enforcing role. It is meant to ensure that data is shared wherever possible, to maximise social and economic welfare and innovation, while also monitoring compliance with the rules and regulations proposed in the Report.
Concerns about the Draft
Although the intention of the Report is laudable, it suffers from a lack of evidence and grounds its arguments and suggestions in a rather simplistic understanding of data. A sensible first step would have been to hold a stakeholder dialogue to understand the concerns of those involved and the nature of the rights and processes at stake in data sharing.
By failing to define ‘community data’ clearly, the Report gives the trustee unrestricted power to demand such data from private companies. This could have devastating consequences for India’s intellectual property regime, because private organisations invest time and resources in collecting data and, for that reason, regard it as their own. Further, since ‘community’ has not been defined, there could be overlaps, with one person falling within multiple communities, as well as ambiguity about the basis on which such communities are formed. The Report also equates community interest with public interest, which is flawed: communities have internal dynamics and power plays that may leave their marginalised members out. Finally, there is evidence that community data can be used to profile communities, thereby facilitating state surveillance and violating group privacy.
The draft Report also assumes that raw data holds little value for a company, and therefore that sharing it with the government should raise no concerns. This assumption is flawed: raw data is valued for its potential, and companies calculate the value of their data based on various factors, such as their market power and market size.
Another cause for concern is the Committee’s reasoning for establishing rights over data. There is currently no policy framework on data ownership at either the national or the international level. Although the law on the subject is unclear and fragmented, de facto rights lie with the company. By giving the government unfettered power to collect and use the data of private players (for “sovereign purposes”), the Report essentially establishes a digital control raj under the garb of data sovereignty. Justice B.N. Srikrishna, the architect of the Personal Data Protection Bill, has also criticised the draft Report, commenting that by allowing the government to demand community data from private companies, the recommended framework gives the government a ‘blank cheque’.
Moreover, while the Report acknowledges that anonymised data bears a risk of re-identification, it does not specify what measures will be taken to address this risk. Several studies show that anonymised data is highly susceptible to re-identification. An experimental study conducted in the US found that data anonymised using k-anonymity (one of the techniques mentioned in the Report) could be re-identified with a success rate of over 80%. Researchers from Imperial College London have also demonstrated that 90% of shoppers could be re-identified as unique individuals using just four random pieces of information. A vague standard of anonymisation can therefore pose a serious threat to citizens’ privacy.
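To illustrate why k-anonymity offers limited protection, the following minimal Python sketch (not taken from the Report; the records, column names and choice of quasi-identifiers are all hypothetical) computes the k of a toy “anonymised” dataset, that is, the size of the smallest group of records sharing the same combination of quasi-identifiers. Any record that sits alone in its group (k = 1) can be singled out by anyone who knows those few attributes, which is the mechanism behind the shopper study described above.

```python
from collections import Counter

# Hypothetical "anonymised" records: names have been removed, but
# quasi-identifiers (PIN code, birth year, gender) are retained
# alongside a sensitive attribute (the purchase).
records = [
    {"pin": "380001", "birth_year": 1990, "gender": "F", "purchase": "medicine"},
    {"pin": "380001", "birth_year": 1990, "gender": "F", "purchase": "groceries"},
    {"pin": "380001", "birth_year": 1985, "gender": "M", "purchase": "books"},
    {"pin": "382007", "birth_year": 1972, "gender": "F", "purchase": "electronics"},
]

# Assumed set of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("pin", "birth_year", "gender")


def equivalence_classes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Return k: the size of the smallest equivalence class."""
    return min(equivalence_classes(rows, keys).values())


def unique_records(rows, keys):
    """Return rows that are alone in their class and so can be singled out."""
    classes = equivalence_classes(rows, keys)
    return [row for row in rows if classes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
    for row in unique_records(records, QUASI_IDENTIFIERS):
        print("singled out:", row)
```

Here k is 1, because two records are unique on PIN code, birth year and gender alone; linking those three attributes to an outside source would reveal whose purchase each row records. Generalising the values (for example, recording only a birth decade) raises k, but an adversary holding additional auxiliary data may still be able to narrow the groups down.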
Additionally, the Committee identifies various stakeholders in the data infrastructure, such as data custodians and data trustees, without providing a framework that explains the relationships between them. It assumes that the data trustee and custodian will always act in the “best interests” of the data principal, without specifying safeguards by which the authorities would ensure this, or ensure that the community has a say in decisions. The idea of data trusteeship is being deliberated across the globe, but the Report fails to set out the principles governing such a trust and the legal responsibilities that come with it. The government must recognise that data trusts will not, by themselves, solve the problems associated with data businesses unless effective standards and policies are in place to make them work.
The Competition Law Review Committee has made it clear that the existing provisions of the Competition Act, 2002 are broad enough to deal with data businesses. Without taking this into account, the draft Report argues that companies with access to big data enjoy leverage in the market, and that data must therefore be regulated. It does not explain why the present competition law regime is insufficient to address data-related concerns. Moreover, the Report seems to imply that dominance is in itself anti-competitive, whereas the Competition Act makes clear that it is the abuse of dominance, not dominance as such, that breaches competition law. With two regulators, the Competition Commission of India and the proposed Data Protection Authority, already in the picture, the proposed NPDA is likely to result in overregulation and parallel proceedings before different forums.
The Report is already facing opposition from US tech giants, who are preparing to push back against the proposals, calling them an “anathema” to promoting competition and arguing that they undermine investment. India is one of the most over-regulated countries in the world; voluntary data sharing should therefore be explored to minimise the regulatory burden on data businesses, with access to data made compulsory only under specific conditions. The sharing of data held by private companies is being debated all over the world. The European Commission is working on a Data Act, intended to “foster business-to-government data sharing for the public interest” and to support “business-to-business data sharing”. The Commission is navigating the proposition carefully, holding stakeholder dialogues, evaluating the current IPR framework and pulling back from mandatory data sharing. A regime for sharing proprietary data has the potential to propel India to the forefront of the data economy, but only if a workable mechanism is carefully designed to harmonise social welfare with economic rights.