Recently, at the invitation of the renowned international legal publication China Law & Practice, Lifeng Law Firm (礼丰律师事务所) co-authored the article "Generative AI in China and the EU: What Businesses Need to Know" (《中欧生成式人工智能监管:企业须知的重点问题》) with the German law firm Redeker Sellner Dahs. Taking a comparative China-EU perspective, the article examines the legal and regulatory risks that generative AI may face in China and the European Union, including personal data protection, transparency requirements for generated content, the relevant registration and filing requirements, and intellectual property infringement risks arising from input and output content. It was written by compliance partner Yuwen Pei (宇文沛), compliance lawyer Hu Yunsi (胡运思), and Dr. Anja Geller of Redeker Sellner Dahs.
Generative AI in China and the EU: What Businesses Need to Know
Summary
• Personal data protection is a key legal issue arising from the use of generative AI tools, especially regarding the lawfulness of processing.
• Both China and the EU demand transparency concerning AI-generated content and require some form of notification, although the details diverge.
• In addition, there are many unresolved issues concerning potential copyright infringements.
The global race to develop better AI models and systems is picking up speed. While the US is still the frontrunner, China has started catching up: the large language model (“LLM”) DeepSeek has recently attracted particular attention due to its notable efficiency, which makes more widespread use of “generative AI” (GenAI) feasible. GenAI generates text, image, audio, or video content; LLMs are a sub-category specializing in text processing and generation.
Whether developing or deploying GenAI, companies must adhere to a growing set of regulatory obligations. These mainly concern personal data protection, transparency, notification and registration, and copyright.
1. Personal data protection
GenAI models are typically trained on vast amounts of data, a large proportion of which is personal data. It is therefore not surprising that GenAI tools such as ChatGPT and DeepSeek have already come under close scrutiny from European data protection authorities. For example, the Italian data protection authority temporarily banned ChatGPT in 2023 and DeepSeek in 2025 while investigating potential violations of the EU’s General Data Protection Regulation (“GDPR”).
• Lawfulness of data processing
In particular, the lawfulness of processing training data is questionable. A legal basis is required in both China and the EU, Article 13 of the Personal Information Protection Law (个人信息保护法) (the “PIPL”) and Article 6(1) GDPR. A common legal basis for data processing is consent. In Wu v. Company (2023) Jing 0491 Min Chu No. 3821, the Beijing Internet Court held that creating a video by using deep synthesis technology to “change the face” of the plaintiff without her consent violated her right to personal information. When it comes to large amounts of personal data, however, obtaining consent is inefficient and expensive, and companies are likely to seek an alternative. As an additional legal basis, Article 13 of the PIPL lists the necessity for the performance of a contract with the data subject, but this basis applies only in limited scenarios.
In the EU, the legal basis of legitimate interests in Article 6(1)(f) GDPR appears to be the most viable. “Legitimate interests” include economic interests in the use of GenAI models, and the processing is necessary for the purposes of these interests when no less intrusive alternative suffices. Under the required balancing test, socially beneficial applications and privacy-enhancing strategies generally weigh in favor of legality. In the case of GenAI models, however, their broad scope, diverse applications, and risks such as model inversion, disinformation, and fraud weigh heavily against it. When data subjects cannot reasonably expect such processing, their interests and fundamental rights can override the legitimate interests of the developer (Recital 47 GDPR), which is likely the case in most instances of GenAI model training. It is therefore questionable whether the legal basis of legitimate interests applies in this context. For special categories of personal data, Article 9(2) GDPR does not provide this legal basis at all. Instead, such data may be processed where it has been “manifestly made public” by the data subject or where the data subject has given explicit consent.
“Furthermore, GenAI models and systems violate the principle of data accuracy when they produce incorrect information”
• Purpose limitation, data minimization and data accuracy
In addition to a potential lack of lawfulness, the principles of purpose limitation and data minimization of the GDPR and PIPL are challenged by GenAI models, which can be repurposed for different tasks and are trained on large datasets. Furthermore, GenAI models and systems violate the principle of data accuracy when they produce incorrect information.
Here, enforcement will likely focus on significant inaccuracies, although even that narrower issue already poses a challenge for developers and deployers.
• Informing the data subjects
The GDPR additionally requires companies to provide certain information to data subjects. When data is scraped from the internet for training purposes rather than collected directly from the data subject, Article 14 of the GDPR applies. Given the scale of the data that GenAI processes, informing each individual requires extensive effort and may even be impossible. In that case, companies can make the information publicly available instead of providing it individually, Article 14(5)(b) GDPR.
“There is currently no common standard, and best practices seem to be a moving target”
• Risk mitigation measures
To address these manifold data protection concerns, GenAI developers should consider training models on smaller, curated datasets; at the very least, unrestricted web scraping should not be treated as a necessity. Strengthening privacy-preserving measures also remains important. While traditional methods such as pseudonymization and encryption are not sufficient on their own in the context of large datasets, they are still valuable. Newer techniques should also be explored, such as differential privacy, which limits what can be inferred about any individual record, and machine unlearning, which removes the influence of specific data points from a trained model. There is currently no common standard, and best practices seem to be a moving target.
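To illustrate the core idea behind differential privacy, the minimal Python sketch below implements the classic Laplace mechanism: calibrated random noise is added to a statistic before release, so that no single record can be confidently singled out. The function name and parameter values are illustrative only and are not drawn from any regulatory guidance.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy."""
    scale = sensitivity / epsilon  # smaller epsilon -> stronger privacy, more noise
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Counting queries have sensitivity 1: adding or removing one person's
# record changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=10_000, sensitivity=1.0, epsilon=0.5)
```

Applying such guarantees to full model training (for instance, via differentially private stochastic gradient descent) is considerably more involved, which is one reason best practices remain unsettled.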
2. Transparency of AI-generated content
In the area of AI-specific regulation, both China and the EU demand transparency concerning AI-generated content (“AIGC”). In China, AIGC must be specifically labelled to inform both users and other market players. Pursuant to Article 3 of the Measures for Labeling Synthetic Content Generated by Artificial Intelligence (人工智能生成合成内容标识办法), there are two types of labels: explicit labels, which are perceptible to users and displayed as text, sound, images, or other forms, and implicit labels, which are machine-readable labels embedded in the metadata of AIGC. The Cyberspace Administration of China (“CAC”) has indicated that it will strengthen its supervision in this area through special rectification campaigns in 2025.
When it comes to AI-specific obligations, the EU’s Artificial Intelligence Act (the “AI Act”) distinguishes between GenAI models and systems. In the language of the AI Act, large GenAI models are a typical example of “general-purpose AI models” (“GPAI”), becoming “AI systems” when further components, such as a user interface, are added, Article 3(63) and Recitals 97, 99, 105 AI Act.
Like most provisions of the AI Act, the transparency obligations concerning AIGC only apply to AI systems. Providers of GenAI systems must ensure that AIGC is marked in a machine-readable format and is detectable as artificially generated or manipulated, Article 50(2) AI Act. Recital 133 of the AI Act lists watermarking, metadata identification, cryptography, and logging as possible techniques that can already be implemented in the GenAI model to facilitate compliance by downstream GenAI systems. In cases of deep fakes, the deployers of the GenAI system must additionally disclose that the AIGC has been artificially generated or manipulated by labelling it accordingly, Article 50(4) AI Act.
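As a rough illustration of how an implicit, machine-readable label of the kind both regimes contemplate might be embedded in an image's metadata, the sketch below writes and reads a PNG text chunk using the Pillow library. The key and field names (aigc_label, producer_id, and so on) are hypothetical; neither the Chinese Measures nor the AI Act prescribes this particular schema.

```python
import json
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_as_aigc(src_path: str, dst_path: str, producer_id: str) -> None:
    """Embed a machine-readable AIGC marker in a PNG text chunk."""
    label = {"aigc": True, "producer_id": producer_id, "label_version": "demo-0.1"}
    img = Image.open(src_path)
    meta = PngInfo()
    meta.add_text("aigc_label", json.dumps(label))  # implicit label in metadata
    img.save(dst_path, pnginfo=meta)

def read_aigc_label(path: str) -> dict | None:
    """Detect the marker, e.g. as a downstream platform might."""
    raw = Image.open(path).text.get("aigc_label")  # Pillow exposes PNG text chunks via .text
    return json.loads(raw) if raw else None
```

Plain metadata of this kind is easily stripped when content is re-encoded or screenshotted, which is one reason Recital 133 AI Act also points to more robust techniques such as watermarking and cryptographic provenance.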
“GenAI application service providers must register the relevant algorithms with the CAC if their service has ‘public opinion attributes or social mobilization capabilities’”
3. Notification and registration
In China, to ensure responsible AI innovation and supervise the development of GenAI, providers are generally required to complete two administrative formalities before launching their GenAI application service online.
The first is the “algorithm registration” of internet information services (互联网信息服务算法备案): GenAI application service providers must register the relevant algorithms with the CAC if their service has “public opinion attributes or social mobilization capabilities”.
Second, when the GenAI application service is provided through a GenAI model that integrates a pre-existing, already-filed model, the service provider must register its service with the CAC prior to launch (大模型登记). If the GenAI model is newly developed, the developer must instead undergo a security assessment and file a security assessment report with the CAC (大模型备案). The security assessment focuses on aspects such as training data security, AI model security, and security measures, subject to the CAC’s specific instructions in each case.
In the EU, providers of high-risk GenAI systems must register such systems and conduct a conformity assessment, Article 16(f) and (i) AI Act. Providers of GPAI models with “systemic risks” must perform model evaluations, mitigate systemic risks, document and report serious incidents, and ensure adequate cybersecurity protection, Article 55(1) AI Act. Furthermore, providers must notify the Commission when their GPAI model meets the classification requirements, and the Commission ensures that a list of GPAI models with systemic risk is published and kept up to date, Article 52 AI Act.
“The outputs may be at risk of alleged copyright infringement if they are found to be substantially similar to copyrighted works”
4. Copyright
Using copyrighted content as training data for commercial GenAI requires authorization from the rights holders unless an exception or limitation applies. Obtaining authorization for all protected content is difficult and often impracticable, especially where training data is obtained through web scraping. Cases such as iQIYI v. MiniMax have been filed in China, alleging that using copyrighted works to train AI models constitutes copyright infringement. In deciding these cases, Chinese courts are expected to bring some clarity to AI-related copyright issues.
The outputs may be at risk of alleged copyright infringement if they are found to be substantially similar to copyrighted works. In a case decided by the Guangzhou Internet Court in 2024, an AI-generated Ultraman image was found to infringe copyright due to its substantial similarity to the copyrighted Ultraman works, and the GenAI service provider was accordingly held liable to the copyright owner.
In the EU, in addition to users, developers and providers of GenAI models may also be held liable for infringing output. The development of filtering tools can help GenAI providers avoid generating such outputs. However, at least in the AI Act, the regulatory focus remains on the training phase.
Article 53(1) of the AI Act contains copyright protection measures, which apply only to providers of GPAI models. They must make publicly available a “sufficiently detailed summary about the content used for training” of the GPAI model, in accordance with a template provided by the AI Office, Article 53(1)(d) AI Act. Since the training of GenAI models often involves large amounts of data scraped from the internet, listing every work individually may be difficult or even impossible. Recital 107 of the AI Act indicates that such an individual list is not necessary. Instead, the summary should be “generally comprehensive in its scope instead of technically detailed […], for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”.
This transparency requirement is necessary to track the use of copyrighted content. As a result, copyright holders can exercise their opt-out right from the text and data mining exception set forth in Article 4(3) of the Directive on Copyright in the Digital Single Market (the “DSM Directive”). “Text and data mining” encompasses automated techniques aimed at analyzing text and data to generate information, Article 2(2) DSM Directive. This broad definition covers many of the training activities related to GenAI models and systems.
Article 4 of the DSM Directive provides an exception for commercial text and data mining for “reproductions and extractions of lawfully accessible works”. “Lawful access” covers access to “content based on an open access policy or through contractual arrangements” and to “content that is freely available online”, Recital 14 DSM Directive. Such reproductions and extractions can be retained for “as long as is necessary for the purposes of text and data mining”, Article 4(2) DSM Directive. It is not clear whether such content must be deleted once the training phase is completed, or whether it may be retained longer.
“As soon as the EU AI Office publishes these codes, they will be an important guideline”
Copyright holders may expressly opt out of this exception “in an appropriate manner, such as machine-readable means in the case of content made publicly available online”, Article 4(3) DSM Directive. The format, scope, and timing of the opt-out are still unclear, and there is no standardized method. Article 53(1)(c) of the AI Act explicitly refers to this exception and requires GenAI model providers to put in place a policy to identify and comply with such opt-outs through state-of-the-art technologies. Here, GenAI model providers may rely on codes of practice as defined in Article 56 to demonstrate compliance, Article 53(4) AI Act. As soon as the EU AI Office publishes these codes, they will be an important guideline.
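Pending a standardized format, one machine-readable signal that training-data crawlers commonly honor today is robots.txt. The Python sketch below, using only the standard library, illustrates checking it before scraping; the user-agent string is hypothetical, and robots.txt is just one candidate convention among several (the TDM Reservation Protocol and ai.txt are others), none of which the DSM Directive itself mandates.

```python
from urllib import robotparser
from urllib.parse import urlparse

def may_crawl(url: str, user_agent: str = "ExampleTrainingBot") -> bool:
    """Check the site's robots.txt before fetching a URL for training data."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetches and parses the site's robots.txt
    return rp.can_fetch(user_agent, url)

# Only ingest pages whose publishers have not disallowed this crawler.
if may_crawl("https://example.com/articles/1"):
    pass  # fetch and add the page to the training corpus
```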
Conclusion
Globally, many questions and issues concerning GenAI remain unresolved. Both China and the EU have adopted newer regulations that specifically address GenAI and interact with existing rules. Companies seeking to expand in both China and the EU must continuously monitor legal developments in the two regions.
This article was first published by China Law & Practice (CLP).
(https://www.chinalawandpractice.com/)
For any questions regarding this article, please contact Lifeng Law Firm (礼丰律师事务所):
inquiry@lifenglaw.com