Psychic Telephone · 62
Proximities
Even in their earliest forms, photographs arrived with language attached: captions, annotations, titles, and inscriptions. Words have always hovered near images, quietly shaping how they’re understood. The idea that a photograph can stand entirely on its own, or speak for itself, has always been more fantasy than fact.
Conceptual artists have been making the dependency of photographs on language painfully obvious for decades. John Baldessari’s photo-text works immediately come to mind: a photograph and a declarative sentence, paired together, shape your understanding of what’s within the image. Sophie Calle, in another register, frames her photographic works through diary entries, lists, confessions, and surveillance-like narration. The pictures don’t stand alone; they’re embedded within the story.
As much as photographic images are tethered to language, so too are generative AI images. Not language as explanation or grounding, but language baked into the creation of the image itself.
Craig Owens describes something adjacent to this when he writes about allegory: the idea that images accrue meaning not through purity or self-containment, but by way of citation, framing, and the layering of codes. When you add text to a photograph, you don’t “clarify” it; you force it into relation with another discourse, which creates a surplus of meaning the image can’t contain on its own.
As I blend photographic and generative image elements into the work I make for Psychic Telephone, speaking to the subtle shifts in how language relates to each seems fitting. In this series, psychic experiences are shared and those accounts become transcribed interviews. As I receive those documents, I scan them for sentences or sentence fragments that I think would be good prompts for the text-to-image generator I use. I break narrative structure in favor of the shortest string of words that I believe will play well with the generator.

There are structural differences between, say, an image caption, which sits alongside a photograph, and the cherry-picking I do with the transcripts. When a person writes a caption, they’re making an interpretive gesture: the text and image exist in relation to each other but remain separate. You can look at the photograph without reading the caption, even if the caption changes how you see it.
But when I feed a short string of words from Marin’s transcripts into a text-to-image generator, something fundamentally different happens. The text doesn’t sit alongside the resulting image; it disappears into it. The model converts both words and images into the same mathematical form: vectors in what’s called latent space, an abstract environment where concepts cluster based on learned relationships. “Two dogs running” and an actual image of two dogs running occupy nearby coordinates in this space. But there are other proximities as well. For example: king is to queen as husband is to wife. Those word pairs share the same relationship, and in theory their vectors sit at analogous offsets in the space.
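That analogy geometry can be sketched with toy numbers. The three-dimensional coordinates below are invented for illustration (real embedding models learn hundreds or thousands of dimensions from data); the point is only that parallel relationships show up as parallel offsets between vectors:

```python
# Toy illustration of analogy structure in an embedding space.
# These coordinates are invented for the example; real models learn them.

def cosine(a, b):
    """Cosine similarity: 1.0 means two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Invented axes: roughly "royalty", "gender", "marriage".
vectors = {
    "king":    [0.9,  0.8, 0.1],
    "queen":   [0.9, -0.8, 0.1],
    "husband": [0.1,  0.8, 0.9],
    "wife":    [0.1, -0.8, 0.9],
}

# "King is to queen as husband is to wife": the king→queen offset
# should point the same direction as the husband→wife offset.
royal_offset = [a - b for a, b in zip(vectors["king"], vectors["queen"])]
married_offset = [a - b for a, b in zip(vectors["husband"], vectors["wife"])]

print(cosine(royal_offset, married_offset))  # 1.0: the offsets are parallel
```

In a trained model the similarity wouldn’t be exactly 1.0, but the offsets land close enough that the relationship is recoverable by arithmetic alone.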
When the AI “reads” my prompt, it’s not interpreting meaning the way a human would. It’s converting words into mathematical coordinates and matching them against visual patterns from its training data. There’s no consideration of narrative importance, emotional resonance, or what matters in context. And if the training data is thin on a particular image-text relationship, the model will fail to visualize that prompt accurately.
The sentences I choose from the transcripts were never meant to function as image prompts; they’re fragments of conversation. In most discussions of generative AI, prompts are treated as technical tools: carefully optimized strings of language designed to control output. But the prompts I use aren’t optimized for precision. They’re intentionally imperfect, specific to the person speaking, whether that be Marin herself or the person she is speaking with. Each person, bringing their own way of seeing and understanding the world, has their own way with words.
Even though my role in this collaboration has been primarily visual, it’s worth underlining how many different ways language and image relate here: how, in my use of AI, photography shapes the way the storytelling unfolds, and ultimately how entangled the psychic experiences, Marin’s writing, and my interpretation of those elements have become.


