Thursday, September 28, 2023

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Abstract: We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, and IMU motion sensor data) and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of state-of-the-art LLMs, including LLaMA-2 (70B), and converts modality-specific signals into the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QA. We conduct a comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
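The aligner described in the abstract lends itself to a short sketch. What follows is a minimal, hypothetical illustration, not the paper's actual code: a small trainable projection maps pooled features from a frozen modality encoder into a short prefix of "soft tokens" in the LLM's embedding space, and that prefix is prepended to the embedded text prompt before the frozen LLM runs. All class names and dimensions below are assumptions for illustration.

import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    """Hypothetical sketch of a pre-trained aligner module: project
    pooled features from a frozen modality encoder into the LLM's
    token-embedding space as a fixed number of soft tokens.
    Names and sizes are illustrative, not AnyMAL's."""

    def __init__(self, encoder_dim=1024, llm_dim=4096, num_tokens=8):
        super().__init__()
        self.num_tokens = num_tokens
        # Lightweight trainable projection; the modality encoder and
        # the LLM themselves stay frozen while this is trained.
        self.proj = nn.Linear(encoder_dim, llm_dim * num_tokens)

    def forward(self, features):
        # features: (batch, encoder_dim) pooled output of a frozen
        # modality encoder (image, video, audio, IMU, ...)
        batch = features.shape[0]
        soft_tokens = self.proj(features)
        return soft_tokens.view(batch, self.num_tokens, -1)

# Usage: prepend the aligned soft tokens to the embedded text prompt
# so the (frozen) LLM can condition on the modality signal.
aligner = ModalityAligner()
image_features = torch.randn(2, 1024)        # from a frozen encoder
prefix = aligner(image_features)             # (2, 8, 4096)
text_embeddings = torch.randn(2, 16, 4096)   # embedded prompt tokens
llm_inputs = torch.cat([prefix, text_embeddings], dim=1)

The design choice this sketch mirrors is the one the abstract states: rather than retraining the LLM on raw signals, each modality is mapped into the joint textual space, so the text-based reasoning of the underlying model carries over unchanged.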

Submission history

From: Seungwhan Moon

[v1] Wed, 27 Sep 2023 22:50:51 UTC (31,162 KB)



from Hacker News https://ift.tt/BgfDqrh
