LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-5 of 5

  1. Book ; Online: FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

    Oh, Gyeongrok / Gu, Heon / Kim, Jinkyu / Kim, Sangpil

    2023  

    Abstract Interference between overlapping grid patterns creates moire patterns, degrading the visual quality of an image of a digital display screen captured by an ordinary digital camera. Removing such moire patterns is challenging due to their complex patterns of diverse sizes and color distortions. Existing approaches mainly focus on filtering in the spatial domain and fail to remove large-scale moire patterns. In this paper, we propose a novel model called FPANet that learns filters in both the frequency and spatial domains, improving restoration quality by removing moire patterns of various sizes. To further enhance quality, our model takes multiple consecutive frames, learning to extract frame-invariant content features and to output temporally consistent images of higher quality. We demonstrate the effectiveness of our proposed method on a publicly available large-scale dataset, observing that ours outperforms state-of-the-art approaches, including ESDNet, VDmoire, MBCNN, WDNet, UNet, and DMCNN, in terms of image and video quality metrics such as PSNR, SSIM, LPIPS, FVD, and FSIM.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2023-01-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
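
    The core idea, learning a filter in the frequency domain alongside an ordinary spatial convolution, can be sketched as follows. This is a minimal illustration under assumed shapes, not the authors' FPANet architecture; the class name FrequencyFilterBlock and all dimensions are made up for the example.

      # Minimal sketch of joint frequency/spatial filtering (PyTorch).
      # NOT the authors' FPANet; it only illustrates a learnable Fourier-
      # domain filter, whose global receptive field is what lets it
      # suppress large-scale moire patterns that local spatial filters miss.
      import torch
      import torch.nn as nn

      class FrequencyFilterBlock(nn.Module):  # hypothetical name
          def __init__(self, channels: int, height: int, width: int):
              super().__init__()
              # Learnable complex filter over the half-spectrum of rfft2
              # (the last dimension becomes width // 2 + 1).
              self.filter = nn.Parameter(
                  0.02 * torch.randn(channels, height, width // 2 + 1,
                                     dtype=torch.cfloat))
              self.spatial = nn.Conv2d(channels, channels, 3, padding=1)

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              # Frequency branch: pointwise multiplication in the Fourier
              # domain acts as a global filter over the whole frame.
              freq = torch.fft.rfft2(x, norm="ortho") * self.filter
              freq_out = torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
              # Spatial branch handles local detail; combine both.
              return freq_out + self.spatial(x)

      block = FrequencyFilterBlock(channels=3, height=64, width=64)
      frames = torch.randn(2, 3, 64, 64)   # dummy batch of frames
      print(block(frames).shape)           # torch.Size([2, 3, 64, 64])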

  2. Article ; Online: Robust sound-guided image manipulation.

    Lee, Seung Hyun / Chi, Hyung-Gun / Oh, Gyeongrok / Byeon, Wonmin / Yoon, Sang Ho / Park, Hyunje / Cho, Wonjun / Kim, Jinkyu / Kim, Sangpil

    Neural networks : the official journal of the International Neural Network Society

    2024  Volume 175, Page(s) 106271

    Abstract Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is turned into the same scene on a rainy day by the text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages a multi-modal (text and image) embedding space. However, we observe that such text inputs often fall short in providing and synthesizing rich semantic cues, e.g., differentiating heavy rain from rain with thunderstorms. To address this issue, we advocate leveraging an additional modality, sound, which has notable advantages for image manipulation, as it can convey more diverse semantic cues (vivid emotions or dynamic expressions of the natural world) than text. In this paper, we propose a novel approach that first extends the image-text joint embedding space with sound and then applies a direct latent optimization method to manipulate a given image based on audio input, e.g., the sound of rain. Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible results than state-of-the-art text- and sound-guided image manipulation methods, which is further confirmed by our human evaluations. Our downstream task evaluations also show that our learned image-text-sound joint embedding space effectively encodes sound inputs. Examples are provided on our project page: https://kuai-lab.github.io/robust-demo/.
    MeSH term(s) Humans ; Sound ; Semantics ; Cues ; Neural Networks, Computer
    Language English
    Publishing date 2024-03-27
    Publishing country United States
    Document type Journal Article
    ZDB-ID 740542-x
    ISSN (online) 1879-2782
    ISSN (print) 0893-6080
    DOI 10.1016/j.neunet.2024.106271
    Database MEDical Literature Analysis and Retrieval System OnLINE
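
    The "direct latent optimization" the abstract refers to can be sketched roughly as below: optimize a latent code so that the generated image's embedding moves toward the audio embedding in the joint space, while a regularizer keeps the latent near its start. The generator and encoders here are random stand-in stubs, not the paper's pre-trained StyleGAN/CLIP components, and every name is hypothetical.

      # Rough sketch of direct latent optimization against an audio
      # embedding; stubs stand in for a pre-trained generator and for
      # encoders into a shared (CLIP-extended) embedding space.
      import torch
      import torch.nn.functional as F

      torch.manual_seed(0)
      latent_dim, embed_dim = 512, 512
      G = torch.nn.Linear(latent_dim, 3 * 32 * 32)       # stand-in generator
      image_encoder = torch.nn.Linear(3 * 32 * 32, embed_dim)
      audio_encoder = torch.nn.Linear(128, embed_dim)

      audio_feat = torch.randn(1, 128)   # e.g., log-mel features of "rain"
      # Detach: the target is fixed; only the latent w is optimized.
      target = F.normalize(audio_encoder(audio_feat), dim=-1).detach()

      w = torch.randn(1, latent_dim, requires_grad=True)  # latent to optimize
      w_init = w.detach().clone()
      opt = torch.optim.Adam([w], lr=0.05)

      for step in range(100):
          emb = F.normalize(image_encoder(G(w)), dim=-1)
          # Cosine loss pulls the image embedding toward the sound; the
          # MSE term keeps the latent near its start, preserving content.
          loss = (1 - (emb * target).sum()) + 0.1 * F.mse_loss(w, w_init)
          opt.zero_grad()
          loss.backward()
          opt.step()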

  3. Book ; Online: Robust Sound-Guided Image Manipulation

    Lee, Seung Hyun / Oh, Gyeongrok / Byeon, Wonmin / Yoon, Sang Ho / Kim, Jinkyu / Kim, Sangpil

    2022  

    Abstract Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is turned into the same scene on a rainy day by the text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages a multi-modal (text and image) embedding space. However, we observe that such text inputs often fall short in providing and synthesizing rich semantic cues, e.g., differentiating heavy rain from rain with thunderstorms. To address this issue, we advocate leveraging an additional modality, sound, which has notable advantages for image manipulation, as it can convey more diverse semantic cues (vivid emotions or dynamic expressions of the natural world) than text. In this paper, we propose a novel approach that first extends the image-text joint embedding space with sound and then applies a direct latent optimization method to manipulate a given image based on audio input, e.g., the sound of rain. Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible results than state-of-the-art text- and sound-guided image manipulation methods, which is further confirmed by our human evaluations. Our downstream task evaluations also show that our learned image-text-sound joint embedding space effectively encodes sound inputs.

    Comment: arXiv admin note: text overlap with arXiv:2112.00007
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004 ; 410
    Publishing date 2022-08-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: MTVG: Multi-text Video Generation with Text-to-Video Models

    Oh, Gyeongrok / Jeong, Jaehwan / Kim, Sieun / Byeon, Wonmin / Kim, Jinkyu / Kim, Sungwoong / Kwon, Hyeokmin / Kim, Sangpil

    2023  

    Abstract Recently, video generation has attracted massive attention and yielded noticeable outcomes. Given the characteristics of video, multi-text conditioning that incorporates sequential events is necessary for next-step video generation. In this work, we propose a novel multi-text video generation method (MTVG) that directly utilizes a pre-trained diffusion-based text-to-video (T2V) generation model without additional fine-tuning. To generate consecutive video segments from distinct prompts, visual consistency must be preserved while allowing diverse variations, such as motion and content-related transitions. Our proposed MTVG includes Dynamic Noise and Last Frame Aware Inversion, which reinitialize the noise latent to preserve visual coherence between videos of different prompts and prevent repetitive motion or content. Furthermore, we present Structure Guiding Sampling to maintain the global appearance across the frames of a single video clip, where we leverage iterative latent updates based on the preceding frame. Additionally, our Prompt Generator allows for text conditions of arbitrary format consisting of diverse events. Our extensive experiments, covering diverse transitions of descriptions, demonstrate that our proposed method produces semantically coherent and temporally seamless videos. Video examples are available on our project page: https://kuai-lab.github.io/mtvg-page.
    Keywords Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2023-12-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
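
    The noise reinitialization idea, carrying the previous clip's last frame into the initial noise of the next clip so that prompt boundaries stay visually coherent, might look roughly like the sketch below. The blend factor and the inversion placeholder are assumptions; this is not the paper's exact Dynamic Noise / Last Frame Aware Inversion procedure.

      # Hedged sketch: reinitialize the next clip's noise latent from the
      # previous clip's last frame. `invert` stands in for a real
      # DDIM-style inversion back to the diffusion noise space.
      import torch

      def reinit_noise(last_frame_latent: torch.Tensor,
                       alpha: float = 0.7) -> torch.Tensor:
          # Higher alpha keeps more of the previous clip's appearance
          # (coherence); lower alpha injects more fresh noise, which
          # helps avoid repeating the same motion or content.
          fresh = torch.randn_like(last_frame_latent)
          # Variance-preserving blend, so the result keeps unit scale.
          return alpha * last_frame_latent + (1 - alpha ** 2) ** 0.5 * fresh

      prev_last = torch.randn(1, 4, 64, 64)   # dummy latent of last frame
      invert = lambda z: z                    # placeholder for DDIM inversion
      init_noise = reinit_noise(invert(prev_last))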

  5. Book ; Online: Sound-Guided Semantic Video Generation

    Lee, Seung Hyun / Oh, Gyeongrok / Byeon, Wonmin / Kim, Chanyoung / Ryoo, Won Jeong / Yoon, Sang Ho / Cho, Hyunjun / Bae, Jihyun / Kim, Jinkyu / Kim, Sangpil

    2022  

    Abstract The recent success of StyleGAN demonstrates that a pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful due to the difficulty of determining direction and magnitude in the StyleGAN latent space. In this paper, we propose a framework that generates realistic videos by leveraging a multimodal (sound-image-text) embedding space. As sound provides the temporal context of a scene, our framework learns to generate a video that is semantically consistent with sound. First, our sound inversion module maps the audio directly into the StyleGAN latent space. We then incorporate the CLIP-based multimodal embedding space to further provide audio-visual relationships. Finally, the proposed frame generator learns to find a trajectory in the latent space that is coherent with the corresponding sound, and generates a video in a hierarchical manner. We provide a new high-resolution landscape video dataset (audio-visual pairs) for the sound-guided video generation task. Experiments show that our model outperforms state-of-the-art methods in terms of video quality. We further show several applications, including image and video editing, to verify the effectiveness of our method.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence
    Subject code 004
    Publishing date 2022-04-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
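
    The sound inversion module the abstract describes, mapping audio into a StyleGAN-style latent space, can be sketched as a small learned mapper; generating a video then amounts to following a trajectory through that latent space. The MLP below and all dimensions are assumptions for illustration, not the paper's actual module.

      # Minimal sketch of a sound-inversion module: an MLP from audio
      # features to a StyleGAN-style w-latent. Hypothetical, for
      # illustration only.
      import torch
      import torch.nn as nn

      class SoundInverter(nn.Module):  # hypothetical name
          def __init__(self, audio_dim: int = 128, w_dim: int = 512):
              super().__init__()
              self.mlp = nn.Sequential(
                  nn.Linear(audio_dim, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, w_dim))

          def forward(self, audio_feat: torch.Tensor) -> torch.Tensor:
              return self.mlp(audio_feat)

      # A video is a trajectory in latent space: here, a simple linear
      # walk from the current latent toward the sound-conditioned latent.
      inverter = SoundInverter()
      w_start = torch.randn(1, 512)
      w_sound = inverter(torch.randn(1, 128))   # dummy audio features
      T = 16
      trajectory = [torch.lerp(w_start, w_sound, t / (T - 1))
                    for t in range(T)]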
