AI Art, Copyright & Questions of Theft
AI art has surged into public view, bringing with it questions about whether it is truly "art" and how it fits within IP law. The next few years will produce crucial case rulings for its future.
With the recent and rapid rise of AI-art generation, there has been backlash over how these technologies are used, how they conflict with the interests of artists, and how they can be regulated given the ambiguity around copyright, ownership and usage. This newsletter explores some of the legal landscape around AI art, with reference to the actual design of the technology from the perspective of a data scientist, in order to contextualize claims of "art theft" and other common themes emerging in public discourse. With AI companies earning profits amid concerns over transparency and consent in data, it is also important to examine the role they play in this technology's adoption.
As it stands, a standalone piece of AI-generated art is not protected under copyright law. Copyright may apply in cases where there is proof of sufficient human contribution, and the degree of human contribution is a spectrum. For example, when using text-based AI, you can generate scripts, dialogue and code, which requires specific skills on the part of the person, often called prompting abilities, and may require several hours of carefully-considered work to arrive at the result. In such cases, documenting the process becomes important, but it also raises the question of whether this responsibility should fall on the AI-operator in the first place. There is also the risk that two individuals may use AI technology to generate images that are virtually identical. On this basis, it is perhaps in the best interests of AI-operators to keep an audit of their processes to avoid legal conflicts, until laws have developed sufficiently to address the important questions facing both AI-users and artists.
A major concern with AI art models is memorization. This occurs when a generative AI model creates a synthetic piece that is extremely close to an original from its training data in structure and content. This similarity can be ambiguous: below is an example where memorization appears, but with plenty of noticeable changes.
There are also clear-cut examples of memorization:
This generated image is not copyrighted and could be used and distributed by anyone freely, raising obvious questions around copyright infringement. It will, for now, be up to a court to decide a resolution. The court will look at the circumstances to determine what has been copied (source). In this case, perhaps the general outline and design of the character could be sufficient cause for copyright infringement. The court would also have to consider whether the piece was totally AI-generated or merely created with AI assistance. This again underlines the importance of AI-users keeping a log of their processes in order to demonstrate their contribution in legally grey areas, and points to the need for services and tools that help AI-users do so, something challenging within the current system many AI-models are provided in. It is clear, though, that AI-users who are motivated enough end up using the technology as a tool, carefully developing their ability with it and shaping their creations, so there should be no assumption of guilt on the part of the AI-user.
Van Lindberg, an adviser, has written on why AI-art generations should be copyrightable. He points out that a photographer's most basic selection process has been found sufficient to make an image copyrightable, and the Supreme Court has said that "only a modicum of creativity" is required for a work to be copyrightable. As a description of the creative process, he referenced the several dozen images for which the user had to choose specific prompts and modifications, going through many iterations just for a single frame within a broader collection.
According to the Copyright Act, a piece is covered as long as creative output is fixed into any tangible medium of expression, which is the case for AI-art creators. This could lead to legal turmoil between, say, an artist and an AI-art user who makes something resembling one of the artist's specific images. Given the low requirements for creative input on the part of an AI-art user, it does not seem likely that an artist would succeed in litigation or claims against an AI-art user, unless there is a clear example of memorization.
In any case, memorization is unlikely if the dataset used to train the model was large and diverse, which is usually the case for state-of-the-art models (such as Stable Diffusion and GPT). The model itself can also be tweaked to make memorization virtually impossible. Surrounding the controversy of AI-generated pieces that resemble a real person's art, there has been a stream of misinformation about the process of AI-art generation.
The model learns the ability to recreate something, but only an approximation of it. This ability, or knowledge gained, is changed and updated as the model trains on different examples. Each artist could have their art removed from the training data without affecting the model's generation abilities to any noticeable degree. The exceptions would be immortalized artists such as Van Gogh (whose work falls outside copyright restrictions anyway), or perhaps a tiny subset of artists who are over-represented in the model's training data.
In some sense, you could view the model as being "inspired" the same way a human might be, learning by trying to recreate certain examples. There are different ways in which a human may learn, which may explain a recent surge in questionable explanations of how AI-art generation works. One person may simply look at a few paintings and then use those styles in their piece, while another may consciously try to recreate the structure, even the content, of a piece they are inspired by. Both are valid forms of learning, so it is hard to see a material difference between AI and human art creation.
Some would argue that the way an AI generates art is a form of "collaging"; however, this is conceptually different. A collage is simply the rearrangement and selective omission of parts of an artwork, whereas an AI's generation process is a series of complex mathematical operations involving noising and denoising. While at least some memorization occurs in most recent models, it is clear that, moving forward, AI-art generation technologies such as Stable Diffusion will be ensuring that over-representation and memorization do not occur.
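To make the distinction from collaging concrete, here is a minimal sketch of the forward "noising" half of a DDPM-style diffusion process. The schedule values and step count are illustrative assumptions, not the settings of any particular production model: the point is that a training image is progressively dissolved into Gaussian noise, and generation means learning to reverse that dissolution, not rearranging stored fragments.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear noise schedule (common toy defaults, assumed here)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]

def signal_fraction(betas, t):
    """sqrt(alpha_bar_t): how much of the original image survives at step t."""
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    return math.sqrt(alpha_bar)

def noised_image(x0, betas, t, rng):
    """Closed-form forward process: x_t = sqrt(ab)*x0 + sqrt(1 - ab)*noise."""
    s = signal_fraction(betas, t)
    n = math.sqrt(1.0 - s * s)
    return [s * pixel + n * rng.gauss(0.0, 1.0) for pixel in x0]

betas = linear_beta_schedule()
rng = random.Random(0)
image = [0.2, 0.8, -0.5, 0.1]  # a toy 4-"pixel" image
early, late = signal_fraction(betas, 10), signal_fraction(betas, 999)
print(f"signal remaining at t=10: {early:.3f}, at t=999: {late:.5f}")
```

By the final step almost nothing of the original remains, which is why a trained denoiser normally reconstructs the learned distribution rather than any single training image; memorization is the edge case, not the mechanism.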
Many synthetic pieces bear an arbitrary resemblance to a piece that was used in the training data. This is the nature of art, which seldom exists in the form of independent items, but rather as items that are thematically or stylistically similar to combinations of pre-existing things. Subjectivity will play a role in examining a pair of images and determining whether one is a copy of the other.
Such a determination should be made as if both pieces were created by a person and not an AI. Another factor is the sheer number of users and generations occurring, and the social circles where users spread popular prompts around like memes. A user of an AI art model could use specific prompting and biased inputs to generate a picture that resembles a pre-existing one. However, an artist who does this to try to prove that "art theft" is possible is assuming that a real person would use an AI tool this way, which is not a rational assumption to make. Judge Learned Hand stated that a "defendant's work infringes on the plaintiff's if the ordinary observer, unless he set out to detect the disparities, would be disposed to overlook them, and regard their aesthetic appeal as the same." There are, however, plenty of ways in which actual memorization could harm not just artists but designers and organisations of all kinds.
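The "ordinary observer" test is subjective, but rough automated screens for near-duplicates do exist. Here is a hedged sketch of an average-hash comparison, a simple perceptual fingerprint offered purely as an illustration (it is not what any court or AI company actually uses): two images whose hashes differ in only a few bits are flagged as likely copies.

```python
def average_hash(pixels):
    """Fingerprint a grayscale image (list of 0-255 values) as a bit list:
    1 where the pixel is brighter than the image's mean, else 0."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    """Count differing bits; small distances suggest near-duplicates."""
    return sum(a != b for a, b in zip(h1, h2))

original  = [10, 200, 30, 220, 15, 210, 25, 205, 12]   # toy 3x3 image
tweaked   = [12, 198, 33, 219, 15, 207, 25, 200, 14]   # lightly edited copy
unrelated = [200, 10, 220, 30, 210, 15, 205, 25, 212]  # inverted pattern

d_copy = hamming_distance(average_hash(original), average_hash(tweaked))
d_diff = hamming_distance(average_hash(original), average_hash(unrelated))
print(d_copy, d_diff)  # the near-copy differs in far fewer bits
```

Such fingerprints capture coarse structure only; like the ordinary-observer standard, any serious infringement judgment still comes down to human assessment of the pair of images.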
There may be conflicts to be resolved in court if an AI-user can prove that their input into the model's generations was enough for the result to be considered "AI-assisted" rather than purely "AI-generated". These users may generate something they can prove is a "valid creative expression" that, due to a number of factors, resembles another piece that is also copyrighted. Part of the problem is that we don't yet have the language necessary to flesh out these concepts and distinctions, so a great deal of important case law will emerge over the next few years.
Another important question concerns the role of data usage and consent in many of these arguments. Some vocal artists are campaigning and lobbying against AI-art usage, under groups such as the Concept Art Association, and the broader artist community has various concerns about the usage and training of AI generators. Regardless of the clear utility of AI-art generation, it is important to look into the background of profitable AI companies such as Stability AI, Midjourney and OpenAI. Their use of people's data has involved creating complex business structures so they could make the legal claim that their usage of people's IP from the Internet was covered under Copyright for Non-Commercial Research and Private Study. The ambiguity stems from the incompleteness of "Fair Use". The U.S. Copyright Office writes:
“Courts look at how the party claiming fair use is using the copyrighted work, and are more likely to find that nonprofit educational and noncommercial uses are fair.”
It seems, then, that the likes of Stability AI are covered under Fair Use, but it is not clear whether courts would rule this way in many scenarios. There has been an increasing number of data and technology regulations since the General Data Protection Regulation (GDPR), so AI-image generation is likely to be regulated regardless of current case law. Users in general do have a right to see what data of theirs is being used and to have it deleted from a company's databases. However, the ambiguity of AI, in the form of machine learning models, means that many outcomes will depend on the interpretation of both the technology and the law.
Some groups are citing misinformation about AI-generation models in their public communication, taking aim not only at large-scale, for-profit companies that use artist data, such as Stability AI, but also at smaller, more independent projects. These same projects have often worked towards models that avoid using artists' data without explicit consent. It's worth noting that some leading critics of AI-art generation, including Karla Ortiz, have ties to the Disney corporation, which is launching its own generative AI tools, without any clarification of whether those tools are free of the same problems addressed by the Concept Art Association.
Furthermore, many groups with anti-AI-art views reference the work of artists who publish their work on websites such as Artstation and Deviantart, where the data is then scraped and used to train a generative model. It is not clear how this is "theft" in any meaningful sense. But it is clear that the ambiguity of current laws around IP usage has made many people concerned about the business practices of these profitable AI companies. It raises an important issue of data rights and protections, an ongoing conversation since the rise of Big Tech.
There is a sense that if one's data is used to generate revenue or achieve a particular outcome, the user should have some knowledge of, and say in, how it is used. From a company's perspective, they could and often do hire an artist or designer who browses these websites for inspiration, perhaps even using pieces as templates or deliberately studying their structure. The designer then uses this acquired knowledge or reference data to create images that later become products. With AI, however, the question is not one of quality but of scale. There is something unnerving about several large AI algorithms being routinely updated and trained on images that could include people's faces, their art, and pictures of their friends and family. While the contribution of each person is low, there is a persistent sense of a breach of data privacy, especially among those in the arts or graphics world. It is important to note that many of the leading AI image generation companies have addressed, to some degree, the issues around using artists' data without their consent.
With the new iteration of the Stable Diffusion generative AI tools, artists are given the ability to opt out, so that their art is omitted from the training data. This could be framed as a protection of their "data rights". Art, however, especially when listed on a public or open forum such as Artstation, cannot be considered personal data, so the question of the rights around it remains open. Even if art could be argued to deserve the same protections as personal data, AI, in the form of machine learning algorithms, is considered to have obtained aggregate information about the data it is trained on, and so the final generated output is not personal data, although the aforementioned problem of memorization may further muddle this distinction.
The next few years, as mentioned, will be an important time in legal history, as the courts create case law that will shape decades of AI-art usage and regulation.
One of the earliest examples of data protection can be traced back to the 1970s, when the U.S. government introduced the Privacy Act of 1974. This legislation established guidelines for the collection, use, and dissemination of personal information by federal agencies. In 1995, the European Union (EU) addressed data protection with the adoption of the Data Protection Directive, which established principles governing the processing of personal data within the EU, including the right to be informed about the collection and use of personal data, the right to access and correct personal data, and the right to object to its processing. The United Nations (UN) has recognized the right to privacy as a fundamental human right, and the International Covenant on Civil and Political Rights (ICCPR), adopted in 1966, includes privacy provisions that have since been interpreted to cover the protection of personal data. In the US, only a few states have significant data protection laws, such as the California Consumer Privacy Act (CCPA) of 2018. The GDPR, meanwhile, has provided individuals greater control over their personal data and imposes significant fines on non-compliant organizations, alongside various litigations and congressional appearances by Big Tech companies.
On a final note, veering away from the topics of theft and copyright, it's fun to consider that any new laws or regulations may one day infringe on the liberties of so-called "AI-citizens", a concept where AI agents are given some form of rights the same way a person would be. The awful adaptation of Asimov's "I, Robot" sci-fi book has the protagonist asking whether a robot could write a symphony, as an evaluation of its humanity. Since AI can write a symphony, perhaps there is also a case for giving it ownership and copyright protection, especially if the AI resembles an AGI and has the language abilities to explain, motivate and defend its claims. Judges in Australia and South Africa have ruled that an AI can be listed as an inventor on patent applications, but as time goes on, more salient rulings will truly explore the idea of an AI being worthy of rights in any meaningful sense. It may be that courtroom battles end up being fought not only between people, but between people and robots.