AI Voice Replication: Bridging Familiarity and Ethics in a Digitally Enhanced World (2024)

By Ryoko Arakawa, Lauren Challis, Yasodara Cordova and Alexander Galanos - 27 June 2024

AI voice replication technology raises complex ethical and legal issues. It affects copyright law, privacy rules and other human rights, and even trust between people and society. In this blogpost, we examine some of the main concerns of AI voice replication technology. We also discuss how it is currently regulated and how its risks can be mitigated.

The voice of David Attenborough is part of his identity and an important aspect of his career. Recognised worldwide as the narrator of the BBC Natural History documentary series "Life", the British broadcaster also occasionally, and unwittingly, lends his voice to memes all over the internet.

Suppose a 15-year-old boy in Brazil uses HuBERT (a self-supervised speech representation learning model) and Bark (a text-prompted generative audio model), both available on Hugging Face (a platform for the machine learning community), to clone David’s voice and then narrate a famous football match, such as Arsenal vs. Hull City in 2014. Also, suppose that this match, now narrated by a deep fake of David’s voice, has become a source of income for the teenager through Google-served ads on his YouTube channel.
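To see how low the technical barrier is, consider a minimal sketch of generating synthetic speech with Bark through the Hugging Face transformers library. This example uses one of Bark's built-in voice presets rather than an actual cloned voice; replicating a specific person's voice would require additional data and steps not shown here, and the checkpoint, preset and output filename below are merely illustrative.

```python
# Minimal sketch: synthetic speech generation with Bark via Hugging Face transformers.
# Uses a built-in voice preset; it does NOT clone any specific person's voice.
import scipy.io.wavfile
from transformers import AutoProcessor, BarkModel

# Load a pre-trained Bark checkpoint and its processor from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# Turn a text prompt into model inputs, selecting one of Bark's shipped voice presets.
inputs = processor(
    "And here, in the heart of the stadium, the match begins.",
    voice_preset="v2/en_speaker_6",
)

# Generate the waveform and write it out as a WAV file at the model's sample rate.
audio = model.generate(**inputs).cpu().numpy().squeeze()
scipy.io.wavfile.write("narration.wav", rate=model.generation_config.sample_rate, data=audio)
```

Running this produces a short synthetic narration in a stock voice; the ethical and legal questions discussed in this post arise precisely because the same tooling can be pointed at a real person's voice.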

The above hypothetical situation is quite possible. Today, there are dozens of openly available machine learning models for speech synthesis and voice conversion that can accurately replicate voices - famous or not - for various purposes, producing what is known as a synthetic voice. If we consider that David’s voice is not only his professional signature, source of income, and media presence, but also a unique fingerprint of his identity, the questions emerging from this case extend far beyond copyright, encompassing privacy and other human rights, as well as various ethical considerations.

When Grief Meets AI

AI voice replication is also used for other purposes. Even before the widely publicised release of ChatGPT in November 2022, some companies were already working on the development of ‘grief tech’ apps. For instance, HereAfter AI allows users to interact with deceased loved ones through an AI-generated audio simulation of the deceased person, enabling real-time conversation.

Arguably this could affect the relationships we hold with people who have passed away. While ‘grief tech’ might aid during the period of grief, it could at the same time interfere with grieving in less desirable ways, such as delaying acceptance and moving forward, making it harder to cope with emotional trauma, or cultivating an unhealthy relationship with AI. It could also affect the self-determination of deceased persons if they did not consent to the use of their data beforehand. Finally, it could influence the way we perceive death more generally, since the AI system enables a situation where a deceased person is kept ‘alive’ synthetically. This in turn leads to questions of digital personhood: to what extent is a natural person linked to their digital persona? Do they hold rights over their digital persona? And, at the moment a natural person ceases to exist, should their digital persona cease to exist, too?

Kant’s deontological approach may shed light on how we ought to treat the dead. According to his theory, and the importance he attaches to human dignity, people should always act in such a way that humanity is (also) treated as an end and never merely as a means. Thus, a person should be treated as an identifiable entity, and not solely as an object. This respect for personhood does not necessarily cease with death, and could therefore be interpreted as extending to what represents a person’s memory or knowledge. In this case, does the digital persona become the ‘objectified’ version of the deceased, rendering the practice of ‘reviving’ the dead through a synthetic voice contrary to duty? We would bet that, if Kant were to hear about this innovation, he would roll over in his grave (pun intended).

The concept of objectification is further exemplified by the recent case involving OpenAI and Scarlett Johansson, which has opened discussions regarding the unauthorised use of one’s voice and its implications for individual rights. The case also underscores ongoing debates within gender studies research, particularly concerning the potential for AI to perpetuate prejudice, racism and other forms of discrimination in its outputs, and it highlights persistent challenges within socio-technical systems that remain unresolved.

Another ethical challenge of AI voice replication – relevant to both examples mentioned – is its potential misuse for the purpose of social, emotional or economic manipulation. Voice replication could, for instance, be employed to manipulate social relations, making it difficult to know what is objectively true. It could also be used to impersonate people, leading to fraudulent transactions and financial losses. Moreover, misinformation spread through synthetic voices is already causing economic and political disruption all over the world. In short, AI voice replication can undermine social trust and cohesion on a large scale, affecting not only individuals but society as a whole.

Legal challenges of AI voice replication

AI voice replication can offer several benefits, for instance when developing creative and artistic content on the internet, facilitating social interactions for people with speech disabilities, or even promoting freedom of speech, including parody and criticism. It can also assist with translation services, fraud detection and prevention (such as identity verification) and productivity gains (e.g. virtual assistants). However, given the ethical concerns mentioned above, regulators are increasingly concerned about the development and application of this technology and are contemplating the need for (new) legal action.

The legal landscape related to deep fake technology is complex, as many legal domains are affected. If people’s voices are used to train AI systems, the protection of their private lives and of their personal data can be at stake. AI systems need data to train and improve, which typically involves large amounts of sensitive data and – in the case of AI voice generation – this will often also entail biometric data. Moreover, voice replication systems can perpetuate a person’s presence on the Internet even after their death, going against the right to be forgotten (as protected by Article 17 of the General Data Protection Regulation, though the GDPR only applies to the personal data of living persons).

Another legal issue pertains to copyright. The trend toward voice replication of established artists and celebrities has been growing for some time. As the systems enabling these practices are trained on massive amounts of publicly available data, there is a risk that this data infringes the intellectual property rights of third parties (including copyrighted material). Copyright experts disagree about concepts of permitted use, output, and value, as well as about the extent to which exceptions to copyright apply. Claims are currently being litigated in a host of jurisdictions around first-order questions of what constitutes a derivative work under intellectual property law, what is fair use, and whether intellectual property protection applies without human authorship. As this is still a nascent area of jurisprudence, there are few definitive judicial rulings on such questions.

In addition to the above, it is crucial to continue guaranteeing the protection of human rights more generally when using AI voice replication systems. Alongside the right to privacy, AI developers must also consider the prohibition of discrimination as enshrined in Articles 2 and 7 of the Universal Declaration of Human Rights (UDHR), Articles 2 and 26 of the International Covenant on Civil and Political Rights (ICCPR), Article 2 of the International Covenant on Economic, Social and Cultural Rights (ICESCR), and Article 21 of the EU Charter of Fundamental Rights (CFR). The generation of hate speech could also result in a violation of Article 20(2) of the ICCPR, which prohibits such speech. Likewise, spreading disinformation with AI-created voices may violate the right to vote and be elected (Article 25 of the ICCPR), as disinformation has already been challenging democracy and elections all over the world. The right to health (Article 12 of the ICESCR) should also be considered, as voice replication technology could contribute to the spread of incorrect healthcare information, undermining the right to seek, receive and impart health-related information.

It should be noted that these treaties and the rights they protect have been implemented and interpreted in different ways in national jurisdictions. This leads to uncertainty and creates a fragmented landscape in which globally active voice technology companies need to comply with different and sometimes conflicting legal frameworks. Moreover, some frameworks (such as the UDHR) are not legally binding, which is why regional human rights treaties often play a more important role alongside domestic laws. Overall, this legal uncertainty presents challenges for users and developers of deep fake technology. Infringements of legal obligations - whether (in)direct or (un)intentional - could lead to costly claims and, at the same time, significant damage to users’ reputations.

Different governments, different approaches

In light of this fragmented legal landscape, how should governments best address the trade-offs that arise within the wide array of AI voice generation use cases? Are the existing laws, ethical standards, and norms sufficient to mitigate the mentioned risks and to promote the benefits of the technology? And if not, which policies should be adopted to tackle these risks without unduly restricting fundamental rights and freedoms such as free expression?

In the U.S., the White House orchestrated a number of voluntary commitments around safety, security and trust by leading AI companies (such as Amazon, Google, Meta, Microsoft, Anthropic, and OpenAI). These commitments include three specific areas that are relevant to the concerns mentioned above. Commitments 5, 6 and 7 respectively require providers of AI systems to report on ‘inappropriate use’, to deploy them in ways that ‘enable users to understand if audio content is AI-generated’, and to ‘prioritise research on societal risks posed by AI systems, including on avoiding harmful bias and discrimination, and protecting privacy’. The White House’s Executive Order on AI also recommends that agencies introduce watermarks or otherwise label output from generative AI.

In the EU, the Artificial Intelligence Act (AI Act) aims to protect fundamental rights while promoting trade and fostering a single market for AI through a risk-based approach. Whilst the AI Act does not restrict deep fake content or the technology itself, Article 50 requires providers and deployers to disclose that the content has been artificially generated or manipulated.

Moreover, Article 5 of the AI Act prohibits “the placing on the market, putting into service or use of an AI system that deploys subliminal techniques beyond a person’s consciousness in order to materially distort a person’s behaviour in a manner that causes or is likely to cause that person or another person physical or psychological harm”. We would argue that this prohibition may apply to AI voice replication technology causing financial, political or health harm as mentioned above.

While it is difficult to compare the US’ voluntary commitments and the EU’s AI Act, one relevant observation is that both focus on transparency and information as a way of mitigating risks stemming from the misuse of synthetic voice. The voluntary commitments encourage public reporting on inappropriate use and on the societal risks of model capabilities, call for enabling users to understand (as opposed to merely being informed about) audio content, including its provenance, and ask providers to prioritise research on societal risks, including privacy. Likewise, Article 53 of the AI Act obliges providers of General Purpose AI (GPAI) models to draw up, keep up to date and make available information and documentation to AI system providers, to the extent this does not breach intellectual property rights or any confidential information protected under the law. Providers of GPAI models with systemic risks have additional obligations, and codes of practice will be established to facilitate compliance.

Although both jurisdictions value transparency and information disclosure, addressing responsibilities along the value chain is complex and challenging as various actors participate in the creation of voice replication technology. We believe that, currently, both the US’ voluntary commitments and the AI Act inadequately allocate responsibilities, remain ambiguous, and insufficiently protect human rights.

Conclusion

AI voice replication is no longer science fiction but reality. While the technology itself has been advancing rapidly, current regulations remain uncertain, unspecific and complex. Although some governments emphasise the importance of labelling content as AI-generated, this poses certain technological challenges, in particular ensuring that systems effectively apply labels across many modalities, such as audio, text and images.

The current regulations and mitigation measures do not resolve the numerous challenges we discussed. Minimising the risks while preserving the potential of AI voice replication will require coordinated efforts that bring together diverse groups from various backgrounds, including researchers, industry professionals, policymakers, and civil society. Addressing AI voice generation systems demands a balanced, careful and pragmatic approach, and the acknowledgment that the solutions will be neither purely technical nor purely regulatory. In order to identify the best way forward, we argue that more research should be undertaken into the impact of voice replication technology.

Note on the authors

Ryoko ARAKAWA is a researcher at Keio Global Research Institute (LinkedIn/Twitter).

Lauren CHALLIS is a Responsible AI consultant at Considerati (LinkedIn).

Yasodara CORDOVA is the Head of Privacy & Identity Research at Unico IDtech and a member of the investments committee of the Co-develop Fund (LinkedIn/Twitter).

Alexander GALANOS is the Group General Counsel of Affinda Group (LinkedIn).

This article solely reflects the views of the authors, and does not represent the position of the Faculty or the University.
