Behind the design: Adobe Photoshop Reference Image
Reshaping a generative AI beta feature to match creative intent
With Generative Fill in Photoshop, users could already transform images simply by describing what they wanted to see. Reference Image took that capability one step further: Now, creators can upload an image, which Photoshop can use as inspiration to shape results that better match creative intent.
Staff Experience Designers Avalon Hu and Dana Jefferson, Staff Experience Researcher Roxanne Rashedi, PhD, Senior Experience Researchers TJ Jones, PhD, and Esme Xu, and Senior Experience Developer Evan Shimizu, PhD, trace the evolution of this feature from early beta release to its current form, sharing how user insights shaped every design decision along the way.
What was the primary goal when you set out to design Reference Image?
Dana Jefferson: Our goal was to enhance and accelerate creative workflows, deliver new levels of control, and help creatives achieve the outputs they imagined by allowing them to choose the images that supply generative inspiration.
In the early beta we aimed for a simple and intuitive upload interface, but user research revealed a fundamental misunderstanding about how the reference image influenced the output. Our redesign was driven by the need to close that intent gap and to refine the underlying technology so the feature better aligned with user expectations.
What user insights did you leverage to help inform the design solution?
Roxanne Rashedi: There were four rounds of research for Reference Image with both professional (C-Pro) and non-professional (Non-Pro) creatives. The first was led by Senior Experience Researcher TJ Jones, the second by Senior Experience Researcher Esme Xu, and I led the final two.
Round 1: Early lessons about understanding and usage
TJ Jones: In our first round of research, we set out to understand how users perceived and interacted with the feature. People were quick to recognize its potential, and many could clearly see how a reference image might help improve the final output. However, while most users understood the basic UI interactions, they wanted more control over the output and felt that clearer, more descriptive interface copy could guide them through the process.
One key area of confusion was the relationship between the reference image and the prompt. Some people used the prompt field to describe aspects of the reference image, hoping to influence how it shaped the output. That showed the connection between the image and the prompt field wasn’t intuitive. Those insights informed early explorations of interface copy and interaction patterns that would help people better understand the relationship.
Round 2: Better understanding the level of control people were looking for
Esme Xu: Building on what we learned in the first round, we ran a second study to dig deeper into expectations around influence and control, so we could shape the overall experience based on what we learned.
We found that users still needed more clarity on how the prompt and reference image worked together. Many people still weren’t sure what role the prompt was supposed to play or, given the presence of a reference image, how much it should affect the output. People also wanted the ability to control the image’s influence more precisely—to be able to dial it up or down depending on what they were trying to do—and for Photoshop to remove or ignore the background automatically. We began exploring clearer sliders and toggles to control influence.
Round 3: Drag-and-drop compositing
Roxanne: Next, we shifted focus to understanding how users think about combining and influencing visual elements. We tested a drag-and-drop workflow to help place objects within a scene. Still, neither group felt it delivered easy, cohesive results: C-Pros hoped for a one-step solution to harmonize compositions, and Non-Pros struggled with perspective and alignment.
In addition, the term we used for the feature, “generative compositing,” confused Non-Pros and felt overly complex to C-Pros, who viewed generative AI as a support tool rather than one for creating new content. Both groups seemed to prefer simpler, action-oriented terms like “remove background” or “combine images.”
Workflow complexity was another hurdle. Compositing requires users to select a layer, and introducing layers early in the workflow slowed progress, especially for Non-Pros. It was time to rethink both terminology and when to introduce layers.
Round 4: Generative Fill as the starting point
Roxanne: Finally, we focused on streamlining the experience. By using Generative Fill as a starting point, users could select a portion of an image and generate new content based on a prompt. A “preserve character” slider helped clarify that the process involved “stitching” an object into a new image rather than altering the original.
There was progress. The interaction felt more intuitive, but users continued to encounter challenges with precise selections, and the generated outputs often fell short of expectations. For example, when users carefully outlined a plant holder, the model produced a smaller or overly simplified version, leaving them frustrated by the gap between effort and outcome.
The results prompted collaboration with our research science partners to enhance the model’s sensitivity to user selections so the technology would better reflect user intent and creative vision.
What was the most unique aspect of the design process?
Evan Shimizu: The most unique aspect of the design process was our ability to run four different UI designs and interaction flows very quickly in a Photoshop-like prototype. We built an entirely new web app called “Protoshop” (Prototype + Photoshop) to test interactions, rather than trying to build them in Photoshop on the desktop.
Having a web app meant that we could quickly test and iterate on designs, without requiring users (and our team) to download and install a new version of Photoshop every time it was updated.
It might seem like building an entirely new web app that emulates Photoshop would be more work than implementing a feature in Photoshop desktop, but in the time it would take to implement a single version of the reference-image UI in Photoshop, we built the entire Protoshop web app. After that, we could make changes in days instead of weeks. Sometimes we’d make changes by the hour, as working in a web environment allowed Dana to request changes that I could implement easily. The iteration speed allowed us to experiment with the designs and identify usability issues before moving forward with user research.
What was the biggest design hurdle?
Avalon Hu: One of the biggest challenges of this project has been balancing our technology research team’s assumptions with genuine user needs. Often, when a new technology is developed, researchers have a specific vision for how users will interact with it. Additionally, because they’re so familiar with the system, they often know clever workarounds to carry out tasks. However, in real-world scenarios, we can't dictate how users will engage with our technology—especially when it's a feature within a creative tool.
We must remain open to the diverse and imaginative ways people use it. As a result, our approach has been to adapt the technology to user behavior rather than expecting users to conform to it.
How did the solution improve the in-product experience?
Dana: The original designs played a crucial role in shaping the final experience. For starters, research revealed several gaps where the feature didn’t meet user expectations. The biggest of those was that users typically expected an uploaded reference image to do more than guide the results (they expected the generated result to be an exact duplicate of the reference image). The original UI was adaptable enough that we were able to carry over many of its elements while also addressing gaps.
Avalon: Unlike earlier explorations, Reference Image not only enables users to fill a selection by referencing an image; it also allows them to insert an object or replace part of an image. Previously, users had to accept whatever result the system generated—often with little control. Users can now express their intent in advance and have greater influence over the outcome.
What did you learn from this design process?
Dana: High-quality results are crucial for user satisfaction. I was glad that our team didn’t rush to ship the feature; instead, they continued to iterate and provide feedback on the technology until the output and behavior met users’ expectations.
Avalon: Sometimes, less really is more. In our initial design phase, we included several additional controls to give users more influence over the outcome. However, adding extra steps and options can unintentionally make experiences feel more complex and intimidating. By streamlining the flow, we were able to leverage existing features more effectively and let the new functionality shine through simplicity.
What’s next for Reference Image?
Evan and Roxanne: Reference Image was introduced in public beta to guide image generation using a visual reference. The beta offered early insights into how users engage with reference-based workflows, and we evolved the experience based on what we learned. The current version delivers that flexibility within a familiar interaction model, supporting most of the workflows Reference Image enables.
Give Reference Image a try on desktop: Open the Creative Cloud app, select “Apps,” select “Beta apps,” then install the Adobe Photoshop beta.