Media Watch | Sora Has Arrived: How Should We "Gatekeep"?

Editor’s note: The appearance of Sora, a "world simulator," marks another revolutionary reconstruction of the information environment. Built on the Transformer and diffusion models, Sora has, in a real sense, established broad interaction between humans and society on one side and intelligent generation models on the other. Text-to-video content generated by the whole population far exceeds deepfakes in both generation magnitude and detection difficulty, and, as an "iterated, universal version" of the deepfake, reveals its potential to damage real media. Associate Professor Zhang Menghan and doctoral student Chen Ze write in the fourth issue of Media Watch in 2024 that Sora, which takes "hundreds of millions of parameters" as its unit of data learning, has brought about an alienation of the gatekeeping relationship. Gatekeeping carried out by the media, platforms, and the public according to the existing gatekeeping relationship will not help the public reduce external uncertainty and keep information in order; instead, it will aggravate the disorder of the information system. The changed technological environment calls for the timely debugging and advancement of gatekeeping theory. In this context, gatekeeping should first be understood as the process of helping the public establish meaningful contact with real society through the truth of news. Whether gatekeeping theory can shed the metaphor of "action" and return to its ontological meaning as a social function bears directly on its theoretical and practical significance in the era of generative artificial intelligence.

 

Against the background of the emergence of artificial intelligence video, with Sora as its "starting point," which erodes our ability to judge the reliability of the outside world, how can gatekeeping theory be brought to match this peak of technological development? Perhaps we need to start from the technology itself and take "reverse engineering" as our point of entry, so as to carry out an anatomical, comprehensive observation. In Hettne’s view, the more complicated a problem is, the more we need to return to its underlying mechanism and to the space where "events" occur, so that we can effectively explain and respond to reality. For gatekeeping theory, this meant, in the newspaper era, observing the daily work of "Mr. Gates" in the newsroom, and, in the digital era, analyzing the consequences of the decentralization of gatekeeping power and dismantling the black box of the algorithm with theoretical and technical tools. The complex technical schema of artificial intelligence requires us to return to the "device" that most directly records the characteristics, modes, and operating mechanisms of the relevant AI technologies, that is, to take the concrete technical architecture as our perspective.

Because OpenAI’s code is not open source, this paper focuses on the technical architecture of Sora and takes the open-source generative AI code "DF Software For All," which is based on the same models (Transformer and diffusion), as an exploratory supplement. It mainly explores why Sora’s technical framework subverts the existing gatekeeping mechanism and what structural impact the change in gatekeeping relations has on the social system, and on this basis it tries to explore how gatekeeping theory might be advanced in the Sora era.

Beyond Deepfakes: The Risk of Gatekeeping Failure under the Reorganization of the Human/Machine-Model Relationship

A technical comparison between deepfakes and Sora lets us see more clearly and intuitively the overall impact of Sora on the existing gatekeeping system.

First, deepfake technology does not possess real "intelligence"; that is, it does not form an effective connection between the public and the machine model. In Burkell’s words, "deepfakes are a tool for experts in a few fields, but of no use to the public," and therefore "far from the original intention pursued by artificial intelligence." Behind this technical threshold, the number of deepfake videos in circulation has been relatively limited. According to statistics from ContentDetector.AI, between 2017 and 2023 roughly 500,000 deepfake videos circulated on the global internet. Of these, 96% were "pornographic forgeries," with the rest touching on political, economic, entertainment, and other topics; the share of deepfake videos capable of "causing social unrest" was only about 4%. Yet it is these twenty-odd thousand deepfake videos that pushed human society into the "age of false information" and opened the prelude to the "end of the gatekeeping role" of journalism.

Unlike deepfakes, OpenAI has, through a series of "bridging" technologies, made Sora the first social technology in the history of artificial intelligence that allows everyone to connect with a video generation model. In a real sense, the AI video generation model completes the transformation from "domain technology" to "public product": users do not need to know how the encoded information matrix output by the encoder is passed to the decoder, nor do they need any complicated CGI (computer-generated imagery) tools. Anyone can obtain a video convincing enough to pass for real simply by submitting a textual description. If deepfakes challenged a traditional mode of gatekeeping based on the naked eye, then the emergence of Sora leads us into an era in which, as Steven Levy put it, the "catastrophe of false information" is mass-produced. Imagine the following situation: when countless AI news videos such as "Biden dies, Harris becomes acting president" or "Motor vehicles restricted nationwide" appear within a single day, chaos in the information field will translate into disorder in the social system. Through the interaction between countless user nodes and Sora, the volume of text-to-video content generated in a single day may exceed the sum of all deepfake videos of the past five years. This is bound to have a subversive impact on the existing gatekeeping system, gatekeeping methods, and gatekeeping strategies, and to disorder the information environment.
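
To make concrete just how low this threshold has become, consider the following toy sketch in Python. It is purely illustrative: every function is a stand-in we have invented for exposition, not OpenAI’s code or API, and the "model" is a trivial numerical update rather than a trained network. What it shows is the shape of the pipeline that a text-to-video product hides behind a single text box: text conditioning, iterative denoising of a latent, and decoding into frames.

```python
# Hypothetical sketch (not OpenAI code): what a text-to-video pipeline hides from the user.
import numpy as np

def encode_text(prompt: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: deterministically maps a prompt to a conditioning vector."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(dim)

def denoise_step(latent: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for one diffusion denoising step over the spacetime latent."""
    return latent - 0.05 * (latent - cond.mean())  # toy update, not a real model

def decode_to_frames(latent: np.ndarray, n_frames: int = 8) -> list:
    """Stand-in decoder: turns the final latent into 'frames' (here, plain 8x8 arrays)."""
    return [latent.reshape(8, 8) for _ in range(n_frames)]

def generate_video(prompt: str, steps: int = 20) -> list:
    cond = encode_text(prompt)
    latent = np.random.default_rng(0).standard_normal(64)  # start from pure noise
    for t in reversed(range(steps)):
        latent = denoise_step(latent, cond, t)
    return decode_to_frames(latent)

# The only "gate" the user ever sees is a single line of text.
frames = generate_video("Breaking news: motor vehicles restricted nationwide")
print(len(frames), "frames generated from one sentence")
```

Everything above the final two lines is invisible to the user; the entire act of "production" collapses into typing one sentence, which is precisely what turns a domain technology into a public product.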

Second, owing to the technical characteristics of deepfakes, their generated videos contain many features that can be identified and detected: edge artifacts in image frames, fidelity changes during facial deformation, residuals across frame sequences, and so on. Sora, by contrast, follows a learning process of "accumulation, understanding, output" rather than the "A + B" synthesis of deepfakes, so there are no technical traces left by "image fusion." Although Sora still has problems such as an inability to accurately simulate the physics of interaction, existing tools such as Reality Defender and The Fact find it difficult to reliably identify Sora videos. In this context, the "human + AI" collaborative model proposed by Felix Simon becomes an idealized notion. When neither manual checking nor "machine checking" can identify a Sora video, the significance of verification itself dissipates.
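
The contrast can be illustrated with a deliberately crude detection heuristic. The sketch below is our own toy example in Python, not the method of Reality Defender, The Fact, or any real detector: it flags a clip when the blending-seam energy and the frame-to-frame flicker around the face both exceed fixed thresholds, the kind of statistical trace that face-swap "image fusion" tends to leave and that an end-to-end generated video does not.

```python
# Toy artifact heuristic for illustration only; not a real deepfake detector.
import numpy as np

def edge_energy(frame: np.ndarray) -> float:
    """Mean gradient magnitude: a crude proxy for the blending seam around a pasted face."""
    gy, gx = np.gradient(frame.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def temporal_flicker(frames: list) -> float:
    """Mean absolute difference between consecutive frames in the face region."""
    diffs = [np.mean(np.abs(frames[i + 1].astype(float) - frames[i].astype(float)))
             for i in range(len(frames) - 1)]
    return float(np.mean(diffs))

def looks_like_face_swap(face_frames: list,
                         edge_threshold: float = 12.0,
                         flicker_threshold: float = 6.0) -> bool:
    """Toy decision rule: flag clips whose seam energy and flicker both exceed thresholds."""
    return bool(np.mean([edge_energy(f) for f in face_frames]) > edge_threshold
                and temporal_flicker(face_frames) > flicker_threshold)

# Dummy 64x64 grayscale "face crops" standing in for frames extracted from a video.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (64, 64)) for _ in range(10)]
print("flagged as possible face swap:", looks_like_face_swap(frames))
```

The point of the sketch is its fragility: such statistics exist only because a face swap pastes one image region into another, so a model that generates every pixel from scratch gives heuristics of this kind nothing to hold on to.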

Although Sora is still a "beta version" available only to artists, film producers, and similar groups, the previous version in the history of AI-generated video, the deepfake, has already shown us its comprehensive impact on the order of social information exchange. In Meckel’s image, the deepfake carriage is carrying human beings away from the stable real world. Yet Sora, built on the Transformer and diffusion models, far surpasses deepfakes in generation magnitude, detection difficulty, and potential destructiveness. If deepfakes marked the beginning of humanity’s separation from the stable real world, then Sora’s subversion of the existing gatekeeping mechanism will lead us into a dystopia dominated by "gamism."

More Gatekeeping, More Chaos? Reconstructing the "Truth-Appearing Mechanism" under Alienated Gatekeeping Relations

The standard of action in the "field" of news gatekeeping consists of two basic logics. The first, from the perspective of cybernetics, is control over the "set of possibilities," which determines what information may be selected and flow out of the "pipeline." The second is the core requirement of journalistic professional ethics for news gatekeepers, namely authenticity. News gatekeeping thus involves "two gates": a content gate and a fact gate. News gatekeepers must select "publishable news" from a mass of "information material" under the joint influence of importance, publicity, profit objectives, and so on, thereby passing the content gate; they must also ensure the authenticity of the selected information through "fact checking," thereby passing the fact gate.

Analysis of the second part of the "DF Software For All" code shows that, in training the neural network, the number of images used for each face reaches 500 to 5,000. Such an intensive amount of training "is enough to make it impossible for human beings to judge authenticity with the naked eye." Compared with "DF Software For All," Sora represents a qualitative leap in both training volume and the size of the learning database. Sora already uses videos from the Shutterstock database and the public web as training material, and its training scale must be counted in units of "hundreds of millions of parameters," far exceeding that of any known model. This means Sora will have an even stronger ability to "pass the false off as the real." A flood of AI-generated videos can easily slip past the traditional gatekeeper’s "fact gate," which relies on naked-eye observation. The "manual" fact-checking mode of the past can no longer cope with such a complex technical situation.
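
To convey the order-of-magnitude gap, the following back-of-the-envelope comparison uses loudly hypothetical figures (the corpus size and clip length are our own assumptions; OpenAI has disclosed no such numbers), contrasting a single face-swap training set with a web-scale video corpus counted in raw frames.

```python
# Back-of-the-envelope arithmetic; only the 500-5,000 range comes from the text above.
faceswap_training_images = 5_000        # top of the per-face range used by "DF Software For All"
assumed_corpus_clips     = 10_000_000   # hypothetical Shutterstock-plus-web corpus size
assumed_frames_per_clip  = 30 * 10      # assume ten-second clips at 30 frames per second

corpus_frames = assumed_corpus_clips * assumed_frames_per_clip
print(f"assumed corpus frames: {corpus_frames:,}")                                    # 3,000,000,000
print(f"ratio to one face-swap set: {corpus_frames // faceswap_training_images:,}x")  # 600,000x
```

Even under conservative assumptions the gap spans several orders of magnitude, which is why a fact gate calibrated to the deepfake era cannot simply be scaled up by adding human reviewers.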

The failure of the "fact gate" in turn deals a heavy blow to the "content gate." The original operating mechanism of the content gate was this: on the basis of ensured news authenticity, the gatekeeper shakes a "news sieve" woven from strands such as the market, the political system, and the audience, so that information of sufficiently large granularity flows into the information system and "countless pieces of event information are converted into controllable media information." Generally speaking, the size of an information particle is determined by how well it fits the system of news-value judgments. In other words, the more important, novel, and interesting a piece of information is, the more likely it is to be processed into news by the gatekeeper and to spread rapidly through the information system.

For video news generated by Sora, however, because it can be produced imaginatively and detached from reality, anyone can adjust the inherent properties of a generated video at will simply by editing the text, creating large-grained "news" whose truth is difficult to determine. If you want the "news" to appear more important, set the scene at the White House or on the Russian-Ukrainian battlefield; if you want it to be more exciting, make conflict and violence the keynote of the video’s narration. The granularity of information has changed from an objective attribute of information into a control panel whose sliders can be dragged at will. "Content gatekeeping" by the media, the public, and the platform on the basis of information granularity will therefore not only accelerate the entropy increase of the information system but also alienate the existing gatekeeping relationship.

The platform and the public have respectively formed a profit-oriented algorithmic logic and a self-directed logic dominated by personal interest. Driven by the pressures of survival and competition for traffic, the media’s gatekeeping activity is increasingly governed by the gatekeeping logics of the public and the platform. It may be said that the appearance of the internet completely changed the traditional gatekeeping relationship. But when this internet-reconstructed gatekeeping relationship collides head-on with the text-to-video technology represented by Sora, a new crisis of the information system arises. Gatekeeping carried out according to the existing gatekeeping relationship cannot exert its "vital functional order," that is, helping the public reduce external uncertainty and keeping information in order; instead, it will aggravate the disorder of the information system.

As Masip and others note, the public always prefers "novel, exciting, and interesting" content to "events of daily life." Sora’s simulation and creativity give the content it generates almost unlimited room to satisfy the public’s appetite for excitement and interest. The sharing rate, dissemination rate, diffusion range, and interactive appeal of such information "usually far exceed those of an ordinary but important piece of news." The public’s personal-interest-driven gatekeeping is therefore likely to become a diffuser of AI-generated video: public praise, forwarding, and discussion will speed the spread of AI-generated videos through the information system and expand the reach of their potential negative impact.

At the same time, the platform’s gatekeeping mechanism is itself hijacked by the information characteristics of AI-generated videos and by the public’s information preferences.

On the whole, the platform’s gatekeeping mechanism consists of three parts: first, the platform’s own review after negative or false information has entered public view; second, a public voting mechanism; and third, an intelligent, algorithm-based distribution mechanism. Moreover, the gatekeeping power of the platform algorithm and of the public was in fact "taken" from traditional media. If the public and the platform hold the power to gatekeep but cannot perform the social function of gatekeeping, the pressure on the media increases accordingly.

For the news media, platforms and the public have become important channels of news sources. Although professional journalists and editorial departments can still take responsibility for first-hand reporting, the news media can hardly complete the technical verification needed to "tell true from false" in a short time when platform and public channels are flooded with AI-generated videos. At the same time, a news industry whose connection to the platform has become "normalized" must still obey the public’s demand for timeliness. Under the combined pressure of polluted sources, the difficult and time-consuming technical review of AI-generated videos, and competition among media over timeliness, journalism can easily become the endorser of the "authenticity" of AI-generated videos. Once the credibility of journalism as the "final benchmark of fact" collapses, the whole information system risks falling into disorder and we enter an era of pan-falsificationism. Pan-falsificationism means that when individuals can no longer distinguish the boundary between external reality and fiction, they turn to face the media environment with a game mentality they can control, treating all information with the attitude that "nothing is certainly true or certainly false." In this context, the social significance of news as a social instrument also dissipates, and news becomes a "public toy."

Matching the Peak of Technology: Possibilities for the Structural Reconstruction of Gatekeeping Theory in the Era of Generative AI

The "intelligent emergence" of generative AI, represented by Sora, is promoting the structural change of the gate-keeping theory. In this context, the possible direction of the gatekeeper theory in the era of generative AI includes the following aspects.

First, the focus of the theory’s connotation needs adjusting. The emergence of Sora overturns a premise that gatekeeping theory has treated as self-evident, namely "authenticity." In the pre-internet era, the "authenticity" at stake in news media gatekeeping was an inherent requirement of news and a basic premise requiring no elaboration. Hence Shoemaker and Vos’s classic formulation of gatekeeping theory: "Gatekeeping is the process of culling and crafting countless bits of information into the limited number of messages that reach people each day." Even after the arrival of the internet, the truth of news was not fundamentally challenged, and researchers mostly directed the theory’s advancement toward the new gatekeepers. Wallace, for example, defines gatekeeping as the process by which "the media, the platform, and the public intercept the information they are interested in from the huge information flow according to their different orientations."

Clearly, both Shoemaker and Wallace summarize the theory’s boundaries under a default assumption: the "countless bits of information" or the "huge information flow" are informational presentations of real-world activity, and the main tasks of gatekeeping are "interception" and "production." The "world simulator" represented by Sora, however, is destroying this premise. According to a forecast by the American data analytics firm Gartner, within the next three years information produced by generative AI will account for 30% of the total information space. In the foreseeable future, the "countless bits of information" will be mixed with a large volume of generated video whose authenticity is hard to determine. In this context, the once self-evident "authenticity" demands special theoretical attention: gatekeeping should first be understood as the process of identifying false information and helping the public establish meaningful contact with real society through the truth of news.

Second, the relationships among "people" need to be re-examined. Today the gatekeeping relationship as a whole appears as the two sides of a boundary: on one side, the real world where platforms, the public, and the media are located; on the other, iterating generative AI. Whether gatekeeping at this boundary is effective determines how the boundary drifts. The further the boundary pushes toward the platform, the media, and the public, the harder it becomes for us to perceive the real world through the media. Media, public, and platform alike face crisis when ontological security cannot be guaranteed. In this context, there is a possibility that the gatekeeping relationship, in which the media caters to public interest and to the platform’s traffic logic, will move toward a cooperative relationship.

Finally, there is a return to the ontological and social meaning of gatekeeping. The impact on the information environment of Sora’s whole-population generation and its simulation of reality requires us to rethink the role and function of gatekeeping in the social system and to focus on its ontological significance as a social function. From text (ChatGPT) to images (DALL·E) to video (Sora), OpenAI, as the bellwether of generative AI, has in just two years pieced together a media puzzle that spans the boundary between the virtual and the real. AI-generated content that simulates the physical world now mediates what was once a solid connection between us and the real world. Seen historically, we have never needed the social function of gatekeeping as much as we do now. In the era of generative AI, gatekeeping is no longer merely a metaphor about gatekeepers, receivers, and channels; it concerns the essential relationship between human beings and social truth, the information system, and the order of the social ecology. Facing the coming tide of AI information, the "success" or "failure" of gatekeeping bears directly on whether the public can still know and understand the real world through the media and act accordingly. How to reconstruct the "truth-appearing mechanism" in the era of generative AI will become an important theoretical and practical problem for future research.

(Media Watch, No. 4, 2024. The original article, of about 11,000 words, is entitled "How to Reconstruct the 'Truth-Appearing Mechanism' under the Impact of 'Whole-Population Generation': The Relational Practice and Structural Change of Gatekeeping Theory in the Era of Generative Artificial Intelligence." This is an excerpt, with notes omitted; please refer to the original text for academic citation. The full text of Media Watch is available at https://mp.weixin.qq.com/s/HR1nKt5K3dxMCLxCnGS2nA.)

[Authors] Zhang Menghan, Associate Professor, School of Communication, Soochow University

Chen Ze, Ph.D. candidate, School of Communication, Soochow University.