AI Self-Portrait

Professor Simsimon
2024
Simon Penny

A picnic in the uncanny valley

How I learned to stop worrying and love AI.
A report of the creation of a dynamic AI self-portrait.



Introduction

(Excerpt from Paper)
As the new AI tools made deepfakes and automatically generated texts increasingly easy, as the press filled with utopian and dystopian narratives, as venture capitalists threw billions at AI startups in the ‘AI goldrush’, and as vast amounts of trite ‘AI-art’ flooded the internet, full of garish and ghastly fantasy scenes strongly reminiscent of the entire history of adolescent fantasy imagery since the mid-C20th, I wondered what kind of critical art project might plumb the dimensions of the new AI and the rhetoric swirling around it.

I thought an appropriately double-edged project would be to try to make a simulation of the academic me, to see just how close I could get to making myself redundant. This is, in a sense, ‘doing the devil’s work’ – as one colleague observed, if successful, the university might be very interested, but I might make myself redundant. So I have a vested interest in this project failing. I gathered a small group of collaborators and we began work in May 2024.

"The problem of representation - truthiness."

(Excerpt from Paper)
The project AI Self-Portrait, more colloquially Professor Simsimon, is a specialised self-portrait of the ‘professorial’ me. We began with this goal: to create (video) output that looks like me, sounds like me, and says things I might say, but haven’t. The aim was, using LLMs, voice cloning and deepfake video, to build an AI replica of the ‘academic’ me, delivering papers I might deliver. The LLM (GPT-4) was trained on all my academic writing of the last 20 years, and prompted to write academic papers, and texts in the mode of spoken presentations of such papers. Voice cloning (ElevenLabs) was used to generate a facsimile of my speaking voice, and deepfake video (Hedra) was used to generate video imagery of me (a talking head) voicing the voice clone. The LLM was also used to generate slides for the talk, and the whole thing was composited in simple video post-production.
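For readers curious about the plumbing, what follows is a minimal sketch of the text-and-voice stages of that pipeline, in Python. It assumes an OpenAI-style chat API for generation and the ElevenLabs SDK for speech; the conditioning prompt, the voice ID ("simon-clone") and the omitted Hedra video step are placeholders, since the paper does not document the exact calls used.

    # A minimal sketch, assuming OPENAI_API_KEY and ELEVENLABS_API_KEY
    # are set in the environment. Names marked as placeholders are
    # hypothetical, not the project's actual identifiers.
    from openai import OpenAI
    from elevenlabs.client import ElevenLabs

    llm = OpenAI()
    tts = ElevenLabs()

    # 1. Generate a talk script in the author's register. In the project,
    #    GPT-4 was conditioned on 20 years of academic writing; here a
    #    system prompt stands in for that conditioning.
    script = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write in the voice of Simon Penny's academic papers."},
            {"role": "user",
             "content": "Draft a short spoken conference presentation "
                        "on embodiment and machine learning."},
        ],
    ).choices[0].message.content

    # 2. Voice the script with a cloned voice. "simon-clone" is a
    #    placeholder for a voice ID created via ElevenLabs voice cloning.
    audio = tts.text_to_speech.convert(
        voice_id="simon-clone",
        text=script,
        model_id="eleven_multilingual_v2",
    )
    with open("talk.mp3", "wb") as f:
        for chunk in audio:
            f.write(chunk)

    # 3. The deepfake video stage (Hedra) and the LLM-generated slides
    #    are omitted here; in the project they were driven by the same
    #    script and assembled in ordinary video post-production.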

"Technical challenges"

(Excerpt from Paper)
While the technology is complex, it is now relatively easy to produce a moving video image of my face speaking, with off-the-shelf tools. If you know me well, after a few seconds you think: ok, something’s off. Likewise the voice clone. But if you did not know me, you might interpret that oddness as media glitches, as we have so often learned to do with bad cell-phone connections and so on. But expecting the LLM to ‘have ideas like mine’, and express them in the way I would, turned out to be far more difficult, if not impossible. This is in part due to a major technical limitation of machine-learning LLMs. GPT can discuss concepts I might discuss, using language I might use, yet there is an emptiness of content. What does it mean to have purchase on an idea? What is incisive reasoning? Whatever it is, we know what it is, and we know that LLMs can’t do it.

Simon Penny © 2024 | contact: simonpenny@uci.edu