Creating Nouna: A Consistent AI Character Case Study
In artificial intelligence (AI), the emergence of AI characters has intrigued researchers and developers. These AI characters serve various purposes, from enhancing user interactions to acting as brand ambassadors.
A significant problem with AI diffusion models (image generators) today is building consistency. The same prompt each time gives variations in output, making it difficult to control specific characteristics.
As part of Project Hyperrealism, we aim to learn the nuts and bolts of AI diffusion models to create any character with precise characteristics we define in a hyperrealistic form. So we decided to create our AI character named “Nouna” in memory of our cat, who passed away last year, and to advance our skills to create any AI character with precision.
We started by creating a detailed persona model covering various attributes of Nouna’s personality (see the table below). Using the persona, we perfected the visual characteristics and added a dose of imagination to finalize the character model.
We then built her in various scenarios using cutting-edge technologies: the Stable Diffusion 1.5, XL, and XL Turbo base models, integrated with IPAdapter, ControlNet, Roop, CLIP Interrogator, After Detailer, OpenPose, and more, all deployed on the ComfyUI platform.
Nouna stands out as a remarkable, meticulously crafted creation. This case study delves into the intricate process of creating Nouna, discussing the technical aspects, challenges faced, solutions to common issues with AI diffusion models, applications and use cases for AI characters, the significance of achieving consistency in AI character development, and future directions for Nouna.
Project Overview
Objective
To create “Nouna,” a realistic AI human character living in 2072, and in doing so build the capability to develop hyperrealistic characters with precision using AI diffusion models.
- Push visual fidelity using state-of-the-art AI diffusion models to generate detailed and lifelike hyperrealistic images/video of the character.
- Develop a clear personality and backstory. Form the character’s distinct interests, values, speaking styles, etc., that shape her content and interactions.
- Craft a precise character model and ensure consistency across mediums.
- Establish boundaries for the character’s areas of expertise/knowledge. She should convey credibility and depth in her topic areas.
- Enable complex emotion conveyance. High visual quality allows nuanced facial expressions, micro gestures, and eye contact.
- Develop a cohesive visual style. Define details like her overall look, fashion sense, makeup, hairstyles, etc. Ensure visuals remain aligned across platforms and scenarios.
- Allow for gradual evolution. The character should grow and develop in alignment with her established personality and backstory.
Tools and Technologies Used
- Stable Diffusion 1.5, XL, and XL Turbo
- ControlNet
- ComfyUI and AUTOMATIC1111
- Roop Unleashed
- IPAdapter
- ADetailer
- ReActor
- CLIP Interrogator
- OpenPose
Conceptualization
The story behind Nouna’s creation
The idea behind creating Nouna, the AI character, is to honor the memory of a cat in our family named Nouna, who passed away last year after twelve years of life. Nouna was a special cat cherished by its human parent and family, and its unique personality left a lasting impression.
In creating Nouna as an AI character, the aim was to capture the essence and spirit of the original cat while adding new attributes we envisioned.
But there is a twist: Nouna has been reincarnated in the form of a human, specifically as a 21-year-old girl living in the year 2072. At Future Disruptor, we research and experiment with futures. It was the right thing to do, but it simultaneously added a challenge to envision the happenings in 2072.
Despite now having a human form, Nouna retains certain traits reminiscent of her feline counterpart. She is playful and curious, with a mischievous streak. These traits serve as a nod to the original Nouna and help to keep her memory alive in this new form.
As Nouna navigates her new life as a human in the futuristic world of 2072, her unique perspective and blend of human and feline traits make her a one-of-a-kind character, cherished by those who knew and loved the original Nouna.
Crafting the persona
Attribute | Details |
---|---|
Name | Nouna |
Age | 21 years old |
Gender | Female |
Occupation | Mood Architect at Brandwick Emotion Studio. |
Education | Bachelor’s degree in Temporal Dynamics and Chronoengineering. |
Residence | Urban city in the futuristic metropolis of Noorlantis. |
Family | Lives with her parents and has no siblings. |
Personality | Innovative, curious, empathetic, tech-savvy, adventurous, destructive, spontaneous, tenacious, optimistic, charismatic, and creative. |
Physical Traits | Genetically modified peach pink hair, skinny, graceful gait, grey eyes that sparkle with curiosity, expressive arched eyebrows, and smooth olive-toned skin. |
Interests | Loves hairstyling, DIY biology, immersive storytelling, futuristic fashion, and sustainable living. |
Hobbies | Designing virtual reality environments, coding, exploring virtual worlds, and space tourism. |
Technology Usage | Highly proficient in using advanced AI assistants, augmented reality glasses, neural implants for enhanced cognition, and holographic displays. |
Virtual Identity | Known as “NeonNouna” in online metaverse communities, where she is respected for her disruptive ideas and creative contributions to virtual worlds. |
Music Choice | Music is important to her, and she has eclectic taste. Loves Shuba, Dharia, Elley Duhé, BANKS, M.I.A., Lynn Gunn (PVRIS), Grimes, Zara Larsson, Jaira Burns, Indila, Bishop Briggs, Lorde, and Kate Linn. |
Social Circle | An active member of immersive storytelling forums and maintains friendships in real life and virtual spaces. |
Health & Wellness | Practices mindfulness meditation to balance her digital lifestyle, participates in virtual fitness classes, and ensures regular check-ups for both physical and mental well-being. |
Fashion Style | Futuristic and avant-garde, with a mix of sleek cyberpunk elements, sustainable fashion choices, and biomorphic designs. |
Goals | Pioneer research into the mind-body connection and develop holistic approaches to mental wellness that combine ancestral healing practices with cutting-edge neuroscience and psychology techniques to promote emotional resilience and psychological well-being. |
Challenges | Balancing her passions with the need for real-world human connection, navigating the rapidly evolving digital landscape, and addressing ethical concerns surrounding AI and gene editing. |
Values | Creativity, innovation, empathy, environmental sustainability, equality, and inclusivity. |
Development Process
Perfecting the intricate features
Perfecting the visual characteristics was the toughest challenge. Each time we generated an image, even with the same prompt and sampling settings, it gave a distinct output.
Nouna has multiple unique facial features, which we had to get right each time. We utilized a lot of learnings from Healthline and Mayo Clinic articles to understand human anatomy and ensure that whatever we create is natural. Here are the features we crafted:
Eyes: We tried to make her eyes resemble a cat's yet remain human. After trying various colors, we zeroed in on grey. Getting the eyes right is a major challenge with AI diffusion models because, most of the time, they output dark black eyes with negligible detail. This can be solved with inpainting and lowering exposure. An interesting observation was that even the dark black eyes retain details that are not visible in the initial output but emerge with further editing.
Hair: Nouna lives in a time when gene editing is the norm. We created a unique hair color that shares peach, pink, and blonde hues. This became very difficult to produce through AI as it is not a usual color that AI diffusion models understand today. Another challenge we created for ourselves was Nouna’s passion for trying many different hairstyles, but it is a form of French braid in most cases. If you want to create a consistent hairstyle using prompts only, then go for common hairstyles and hair colors.
Nose: Unlike most AI characters, which are built toward beauty, we wanted to add mischievous characteristics, and we realized that an arched nose ala gave the strongest visual cue. We did not go overboard, just enough.
Eyebrows: Her eyebrows had to be an interesting and distinct feature of her face. We ended up creating a steep arch toward the last 35% of each eyebrow.
Face Contours: Nouna has three prominent skin contours on her face. These were created to form a natural, dewy skin texture that could adapt to various expressions while maintaining high visual fidelity.
Jawline: We initially wanted to create a diamond jawline, but it did not suit the persona. We eventually ended up crafting a mix of heart and oval jawline. If you want to create a highly visually appealing jawline, use a sharp, structured shape.
Freckles and Blemishes: Prompts lead to intense freckles and blemishes, even at low weights. Polyhyderon_AI perfected this with a LoRA model, but we did not want to chain any LoRA, so we created the mild freckles and blemishes in Adobe Photoshop and ran the result through Roop Unleashed and ReActor.
Body Type: Skinny with narrow shoulders, thin arms and legs, little muscle mass, and protruding collarbones and ribcage.
Ultimately, we perfected our ability to create consistent characteristics with a 94% accuracy across various settings, diffusion models, and scenarios.
Technical Development
We started the development on AUTOMATIC1111 but soon realized ComfyUI was a better fit for the precision we were aiming for.
As we progressed, our workflow kept growing and changing. Today, with all its nodes and dependencies, it spans more than two and a half screens on a 4K-resolution monitor.
It seems overwhelming, and for the most part, it is, but once the logic is in place, it all makes sense. Below, we have highlighted the various settings and dependencies we used. You can use these or modify them according to your needs to create consistent AI characters.
Results Gallery
Challenges and Solutions
Realism
AI diffusion models lack an inherent understanding of the world and abstract concepts. This results in generated images that often do not make logical sense or fail to depict abstractions accurately.
Additionally, they struggle with getting details like textures, lighting, and shadows to look authentic during close inspection. Other challenges include the amplification of biases in training data, visual artifacts from some generation techniques, and the inability to reason about physics and space constraints.
Solution:
1. Lower the CFG scale and reduce the sampling steps to reduce overprocessing, especially on the XL Turbo model.
2. Use ADetailer to create finer details in the latent noise sent to the KSampler.
3. Add negative prompts:
semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, worst quality, low quality, normal quality, low res, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art, no supermodel, perfected eyebrows, perfect teeth, perfect hair, beautiful, smooth skin, airbrushing, edited
Prompt Execution
Executing text prompts accurately is difficult for AI image generators because they lack the full language comprehension and common sense reasoning of humans. Their limited vocabulary and literal interpretation of prompts make it tough to handle nuanced, imaginative, or context-dependent concepts. Ambiguities and vagueness in prompts can lead to completely unintended images. Bias in training data can also result in problematic generations from specific prompts.
Additionally, the Stable Diffusion models have a token limit of 75 and focus on the first part of the prompt. This often tends not to cover all keywords in the prompt. This happens frequently with both positive and negative prompts.
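To see which keywords are at risk of falling outside the limit, a rough budget check can help. The sketch below is a minimal illustration, not part of our workflow: the helper names are hypothetical, and it approximates tokens by counting words and punctuation marks instead of running the real CLIP tokenizer, so treat its numbers as an optimistic estimate.

```python
import re

TOKEN_LIMIT = 75  # the limit quoted in the paragraph above


def estimate_tokens(prompt: str) -> int:
    """Rough estimate: every word and every punctuation mark counts as one token.

    Assumption: the real tokenizer can split rare words into several tokens,
    so the true count may be higher than this estimate.
    """
    return len(re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", prompt))


def keywords_past_limit(prompt: str, limit: int = TOKEN_LIMIT) -> list[str]:
    """Return comma-separated keywords that do not fully fit in the budget."""
    overflow = []
    used = 0
    for keyword in (part.strip() for part in prompt.split(",")):
        if not keyword:
            continue
        # +1 accounts for the comma that separates this keyword from the next.
        cost = estimate_tokens(keyword) + 1
        if used + cost > limit:
            overflow.append(keyword)
        used += cost
    return overflow


if __name__ == "__main__":
    prompt = ", ".join(["grey eyes"] * 40)
    print(len(keywords_past_limit(prompt)), "keywords fall past the estimated limit")
```

Keywords reported by `keywords_past_limit` are candidates for moving closer to the start of the prompt or for dropping.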
Solution:
Add weights to keywords in both positive and negative prompts. Use () to add weight and [] to reduce it, or give an explicit weight in parentheses. For example: (grey eyes:1.3) or (long hair:0.9).
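When prompts are assembled programmatically, the explicit `(keyword:weight)` form is easy to generate. The following is a small sketch under that assumption; `weighted` and `build_prompt` are hypothetical helper names, not functions from ComfyUI or AUTOMATIC1111.

```python
def weighted(keyword: str, weight: float = 1.0) -> str:
    """Wrap a keyword in the (keyword:weight) emphasis syntax.

    A weight of exactly 1.0 is the neutral default, so the keyword is left bare.
    """
    if weight == 1.0:
        return keyword
    return f"({keyword}:{weight:g})"


def build_prompt(keywords: dict[str, float]) -> str:
    """Join keyword/weight pairs into one comma-separated prompt string."""
    return ", ".join(weighted(keyword, weight) for keyword, weight in keywords.items())


if __name__ == "__main__":
    print(build_prompt({"portrait photo": 1.0, "grey eyes": 1.3, "long hair": 0.9}))
```

Keeping the weights in a dictionary makes it simple to nudge one attribute at a time while leaving the rest of the prompt untouched.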
Body, Hands, and Fingers Disfigurement
The human anatomy is highly complex and poses unique challenges. The intricate muscles, joints, and skeletal structures are difficult to simulate correctly. In particular, hands can contort into diverse poses and gestures, demanding flexible generations across perspectives. Additionally, fine details like skin textures, fingernails, and knuckles are challenging to render realistically. The lighting and shadows cast across the fingers also need careful handling as the hand moves.
Solution:
1. Add these negative prompts to prevent various forms of body horror and disfigurement:
(deformed iris, deformed pupils, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation, extra fingers, bad hands, mutant
2. Utilize inpainting techniques to regenerate the specific part of the image.
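For the inpainting step, it helps to regenerate a region slightly larger than the defect so the new pixels blend with their surroundings. The sketch below shows one way to compute such a region. It is an illustration only: the function name, the 32-pixel padding, and the input box are assumptions, and the snap to multiples of 8 reflects the 8x downscaling of the Stable Diffusion latent space.

```python
def inpaint_region(box, image_size, padding=32, multiple=8):
    """Expand a (left, top, right, bottom) box, clamp it, and snap it outward.

    box:        pixel bounds of the defective area (for example, a hand)
    image_size: (width, height) of the full image
    padding:    extra context in pixels on every side (assumed value)
    multiple:   grid size to align to (8 matches the latent downscale factor)
    """
    left, top, right, bottom = box
    width, height = image_size

    # Add context around the defect without leaving the image.
    left = max(0, left - padding)
    top = max(0, top - padding)
    right = min(width, right + padding)
    bottom = min(height, bottom + padding)

    # Snap outward to the grid: floor the near edges, ceil the far edges.
    left -= left % multiple
    top -= top % multiple
    right = min(width, -(-right // multiple) * multiple)
    bottom = min(height, -(-bottom // multiple) * multiple)
    return left, top, right, bottom


if __name__ == "__main__":
    print(inpaint_region((300, 410, 380, 500), (512, 512)))
```

The returned box can then be used to draw the inpainting mask in whichever tool performs the regeneration.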
Fidelity
High-fidelity image generation requires capturing the intricacies of lighting, geometry, physics, and surface materials at scales beyond current AI capabilities. It also demands incorporating real-world imperfections and asymmetries missing from synthetic data.
Solution:
1. Camera and lens specification prompts have a major impact on visual fidelity. For example: shot on iPhone 14, Fujifilm XT3, or Hasselblad helios 44-2 58mm f2 I79/150.
2. Additional prompts like these can also do the trick:
shallow depth of field, DSLR, soft lighting, high quality, film grain, 8K clarity, best quality
Human Behavior and Expressions
The intricacy, subjectivity, and ephemeral nature of human feelings make visual representation difficult. Human behaviors are also highly dependent on cultural, contextual, and individual factors that AI does not understand. This makes it problematic to realistically depict interactions between humans or asymmetric, idiosyncratic expressions. Additionally, movement coordination, conveying personality through posture and gesture, modeling differences across ages, and representing cultural norms are challenges.
Solution:
1. Utilize OpenPose for specific poses.
2. Add prompts that depict the culture, country, ethnicity, etc., and emphasize them with prompt weights.
Text and Writing
The multi-modal nature of real-world text, containing immense visual and linguistic diversity, makes photorealistic text generation a difficult problem. Advances in few-shot learning across fonts and languages, programmatic layout engines, and better text-image integration can help overcome these challenges.
Models are not ready to do precise text generation. At the time of writing, even one or two words can give unclear output.
Base Model Resolution
Restricted base model resolution creates limitations in image quality and realism. However, pushing resolutions too high leads to impractical data, computation, and model size requirements.
Stable Diffusion 1.5 works at 512 x 512, and XL and XL Turbo work at 1024 x 1024. Generating at a larger resolution produces problematic outputs: all models often tried to create multiple instances of the object/person in focus.
Advances in upscaling, model efficiency, and leveraging intermediate deep-layer activations can help increase resolution while maintaining feasibility and realism. However, computational barriers remain a central challenge in scaling the base image resolution for high-fidelity image synthesis.
Solution:
Generate the image in the base model’s recommended resolution variations and upscale using an upscaler like UltimateSDUpscale.
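The arithmetic behind this approach can be sketched as follows. This is a minimal illustration, not our actual node setup: the helper names are hypothetical, the base sizes are the ones quoted above, rounding to multiples of 64 is a common convention we assume here, and the tile count ignores any overlap a tiled upscaler may add.

```python
import math

# Base side lengths as stated in the text above.
BASE_SIDE = {"sd15": 512, "sdxl": 1024, "sdxl_turbo": 1024}


def base_resolution(model: str, aspect_ratio: float, multiple: int = 64) -> tuple[int, int]:
    """Pick a width/height near the model's base pixel count for an aspect ratio.

    aspect_ratio is width divided by height, e.g. 2 / 3 for a portrait.
    """
    area = BASE_SIDE[model] ** 2
    width = round(math.sqrt(area * aspect_ratio) / multiple) * multiple
    height = round(math.sqrt(area / aspect_ratio) / multiple) * multiple
    return width, height


def upscale_tiles(size: tuple[int, int], scale: float, tile: int = 512) -> tuple[int, int, int]:
    """Return the upscaled width, height, and the number of tiles to process."""
    width, height = int(size[0] * scale), int(size[1] * scale)
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return width, height, tiles


if __name__ == "__main__":
    portrait = base_resolution("sdxl", 2 / 3)
    print(portrait, upscale_tiles(portrait, 2.0))
```

Generating at the size returned by `base_resolution` keeps the model near the pixel count it handles well, and the upscaler then adds resolution tile by tile.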
Color Bleed
Color bleed stems from fundamental difficulties in separating and precisely rendering distinct objects and regions. Indistinct edges lead to fuzzy halos and diffusion across boundaries. Shared textures also create gradients that bleed across edges. Properly handling transparency, lighting interactions between materials, object overlaps, and color contrasts are also challenging. This results in colors unintentionally mixing between intended regions.
The core technical issues underlying color bleed relate to imprecise region segmentation, boundary rendering, texture modeling, transparency effects, lighting physics, and color harmony. Advances in matting, disentanglement, geometry-aware synthesis, and modeling of lighting and material interactions will help address these challenges.
Solution:
1. The “BREAK” prompt divides the two color objects. Example: Nouna with (peach pink hair:1.3) BREAK (silver biomorphic neckband).
2. Experiment with different CLIP skip values, which helps in several scenarios.
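When prompts are built from separate descriptions of each colored object, the BREAK keyword can be inserted automatically. The sketch below assumes the AUTOMATIC1111 behavior, where BREAK pads out the current chunk so the following text starts a new one; `join_with_break` is a hypothetical helper name.

```python
def join_with_break(*segments: str) -> str:
    """Join prompt segments with the BREAK keyword, skipping empty segments."""
    parts = [segment.strip() for segment in segments if segment.strip()]
    return " BREAK ".join(parts)


if __name__ == "__main__":
    print(join_with_break(
        "Nouna with (peach pink hair:1.3)",
        "(silver biomorphic neckband)",
    ))
```

Keeping each colored object in its own segment makes it easy to test whether a particular separation reduces the bleed.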
Applications and Use Cases
“Nouna” was created to develop our capabilities to build any character with any attributes at will and to overcome the most common challenges with AI diffusion models. With this process, we can now create any character we like and help others develop AI-based characters with high precision.
Here are some of the ways that such characters can help in your business.
Strengthening Brand Identity and Recognition
AI-based visual characters can serve as brand ambassadors, consistently embodying the brand’s ethos, personality, and visual identity across all customer touchpoints. This strengthens brand recognition and builds a deeper emotional connection with customers.
According to a report by Gartner, brands that present themselves consistently are 3.5 times more likely to enjoy excellent brand visibility than those that don’t.
Boosting Customer Engagement and Experience
With the ability to process and interpret human emotions through natural language understanding and emotional intelligence algorithms, AI characters can deliver highly engaging and empathetic customer interactions.
This personal touch can significantly enhance customer satisfaction; a study by PwC found that 59% of customers feel companies have lost touch with the human element of customer experience. AI characters can bridge this gap by providing a personalized, conversational experience.
Personalization at Scale
AI characters can analyze customer data in real-time to offer tailored recommendations, advice, and support, making each customer interaction unique. This level of personalization at scale can lead to increased conversion rates and customer loyalty.
Salesforce research indicates that 52% of consumers expect offers to be personalized, highlighting the importance of personalization in today’s market.
Enhancing Training and Development
In corporate training, AI-based characters can simulate real-world scenarios for more effective learning and skill acquisition. The immersive nature of these interactions can lead to better retention of information and skills.
A report by IBM suggests that learners retain 75% more material when engaged in interactive learning experiences than traditional lecture-based learning.
Promoting Accessibility and Global Reach
These AI entities can communicate in multiple languages and dialects, breaking down language barriers and making products or services accessible to a global audience. Furthermore, they can be designed to accommodate users with disabilities, ensuring inclusivity.
The World Bank highlights the importance of digital accessibility, noting that inclusive digital tools can help integrate the 1 billion people with disabilities into the economy.
Future Directions for Nouna
We do not have plans to launch Nouna as an AI influencer. Rather, we will focus on advancing our research and capabilities in the aspects below, which will be our core. Nouna will continue to be the brand representative of Future Disruptor.
Videos: Nouna currently exists in static form, but we will work to add animations and actions through video, especially using Deforum and OpenAI Sora.
Voice: Crafting an apt voice and tone.
Stories, knowledge, and world representation: We have already published our first story under the Future Stories series, where Nouna navigates her life in the year 2072:
Nouna’s Day in 2072: A Glimpse of the Future
We also have a plan to build our next character, “Sanxa,” with even more complexity, utilizing the capabilities of more advanced AI diffusion models in the future.
Conclusion
Even with the advancements in AI diffusion models, achieving consistency is still a challenge, but it can be mitigated through the solutions highlighted in this case study. As the models advance, we can expect these issues to be fixed natively, as shown in the Stable Diffusion 3 release.
Do not hesitate to contact us at [email protected] to discuss the character model you want to create. Feel free to share your suggestions or questions in the comments. We will be happy to share our experience.
Learner-in-Chief at Future Disruptor. A futurist, entrepreneur, and management consultant, who is passionate about learning, researching, experimenting, and building solutions through ideas and technologies that will shape our future.