Artwork

Konten disediakan oleh Rob. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh Rob atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.
Player FM - Aplikasi Podcast
Offline dengan aplikasi Player FM !

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

28:41
 
Bagikan
 

Seri yang sudah diarsipkan ("Feed tidak aktif" status)

When? This feed was archived on August 11, 2025 06:07 (4M ago). Last successful fetch was on November 01, 2024 13:33 (1y ago)

Why? Feed tidak aktif status. Server kami tidak mendapatkan feed podcast yang valid secara terus-menerus.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

Manage episode 442787166 series 2954468
Konten disediakan oleh Rob. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh Rob atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.
Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative. In this paper, we introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency, thus facilitating the creation of a story through a series of images. StoryMaker incorporates conditions based on face identities and cropped character images, which include clothing, hairstyles, and bodies. Specifically, we integrate the facial identity information with the cropped character images using the Positional-aware Perceiver Resampler (PPR) to obtain distinct character features. To prevent intermingling of multiple characters and the background, we separately constrain the cross-attention impact regions of different characters and the background using MSE loss with segmentation masks. Additionally, we train the generation network conditioned on poses to promote decoupling from poses. A LoRA is also employed to enhance fidelity and quality. Experiments underscore the effectiveness of our approach. StoryMaker supports numerous applications and is compatible with other societal plug-ins. Our source codes and model weights are available at https://github.com/RedAIGC/StoryMaker.
2024: Zhengguang Zhou, Jing Li, Huaxia Li, Nemo Chen, Xu Tang
https://arxiv.org/pdf/2409.12576
  continue reading

298 episode

Artwork
iconBagikan
 

Seri yang sudah diarsipkan ("Feed tidak aktif" status)

When? This feed was archived on August 11, 2025 06:07 (4M ago). Last successful fetch was on November 01, 2024 13:33 (1y ago)

Why? Feed tidak aktif status. Server kami tidak mendapatkan feed podcast yang valid secara terus-menerus.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

Manage episode 442787166 series 2954468
Konten disediakan oleh Rob. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh Rob atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.
Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative. In this paper, we introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency, thus facilitating the creation of a story through a series of images. StoryMaker incorporates conditions based on face identities and cropped character images, which include clothing, hairstyles, and bodies. Specifically, we integrate the facial identity information with the cropped character images using the Positional-aware Perceiver Resampler (PPR) to obtain distinct character features. To prevent intermingling of multiple characters and the background, we separately constrain the cross-attention impact regions of different characters and the background using MSE loss with segmentation masks. Additionally, we train the generation network conditioned on poses to promote decoupling from poses. A LoRA is also employed to enhance fidelity and quality. Experiments underscore the effectiveness of our approach. StoryMaker supports numerous applications and is compatible with other societal plug-ins. Our source codes and model weights are available at https://github.com/RedAIGC/StoryMaker.
2024: Zhengguang Zhou, Jing Li, Huaxia Li, Nemo Chen, Xu Tang
https://arxiv.org/pdf/2409.12576
  continue reading

298 episode

Semua episode

×
 
Loading …

Selamat datang di Player FM!

Player FM memindai web untuk mencari podcast berkualitas tinggi untuk Anda nikmati saat ini. Ini adalah aplikasi podcast terbaik dan bekerja untuk Android, iPhone, dan web. Daftar untuk menyinkronkan langganan di seluruh perangkat.

 

Panduan Referensi Cepat

Dengarkan acara ini sambil menjelajah
Putar