
Check out our projects at https://github.com/FaceOnLive


Trending AI Research with Source Code

The field of Artificial Intelligence (AI) is evolving rapidly, with new breakthroughs emerging all the time. This document highlights some trending research areas within AI and links to source code where enthusiasts and professionals alike can find resources and projects related to these cutting-edge topics.

20. Long-form factuality in large language models

SAFE (Search-Augmented Factuality Evaluator) uses an LLM to break a long-form response into individual facts and checks each one against Google Search results. Empirically, the authors demonstrate that these LLM agents can outperform crowdsourced human annotators: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time.
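
As a rough illustration of that pipeline, here is a minimal sketch of a SAFE-style rater. The three helpers are hypothetical stand-ins for the paper's components (an LLM fact splitter, a Google Search backend, and an LLM judge); naive placeholder implementations keep the sketch runnable.

```python
def split_into_facts(response: str) -> list[str]:
    # Placeholder: SAFE uses an LLM; here we naively split on sentences.
    return [s.strip() for s in response.split(".") if s.strip()]

def web_search(fact: str) -> str:
    # Placeholder for a search call returning evidence text.
    return ""

def llm_judge(fact: str, evidence: str) -> str:
    # Placeholder for an LLM verdict based on the retrieved evidence.
    return "supported" if evidence else "not_supported"

def rate_response(response: str) -> dict:
    facts = split_into_facts(response)
    verdicts = [llm_judge(f, web_search(f)) for f in facts]
    supported = sum(v == "supported" for v in verdicts)
    return {"facts": len(facts), "supported": supported,
            "precision": supported / max(len(facts), 1)}

print(rate_response("Paris is the capital of France. The Moon orbits the Earth."))
```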

19. Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

18. BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).

17. AIOS: LLM Agent Operating System

Inspired by these challenges, this paper presents AIOS, an LLM agent operating system that embeds large language models into the operating system (OS) as its brain, enabling an operating system "with soul" and marking an important step towards AGI.
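
To make the OS analogy concrete, here is a toy sketch of one AIOS-style responsibility: scheduling requests from multiple agents onto a single shared LLM. The FIFO policy and `fake_llm` are illustrative stand-ins for AIOS's actual scheduling and model-serving layers, which are considerably richer.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class AgentRequest:
    agent: str
    prompt: str

def fake_llm(prompt: str) -> str:
    # Placeholder for the shared LLM "kernel" that AIOS manages.
    return f"response to {prompt!r}"

def run_scheduler(requests: list[AgentRequest]) -> None:
    queue = deque(requests)      # FIFO queue of pending agent requests
    while queue:
        req = queue.popleft()    # pick the next request to serve
        print(req.agent, "->", fake_llm(req.prompt))

run_scheduler([AgentRequest("travel_agent", "book a flight"),
               AgentRequest("math_agent", "integrate x^2")])
```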

16. AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.

15. VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

We introduce VoiceCraft, a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts.

14. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets.
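
For a sense of scale, the datasets in question are tiny and fully enumerable. The sketch below builds one such dataset for modular addition; the modulus and the 50% train fraction are illustrative choices, as the paper varies both.

```python
import random

p = 97  # small prime modulus
examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

random.seed(0)
random.shuffle(examples)
split = len(examples) // 2        # the paper varies this train fraction
train, val = examples[:split], examples[split:]
print(f"{len(train)} train / {len(val)} val examples out of {p * p}")
```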

13. LLM4Decompile: Decompiling Binary Code with Large Language Models

Therefore, we release the first open-access decompilation LLMs ranging from 1B to 33B pre-trained on 4 billion tokens of C source code and the corresponding assembly code.
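
A hedged sketch of driving such a model with Hugging Face transformers follows; the checkpoint id and prompt template here are assumptions, so consult the LLM4Decompile repository for the released names and exact format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LLM4Binary/llm4decompile-1.3b"  # assumed checkpoint id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

asm = "# ... disassembled function goes here ...\n"
prompt = f"# This is the assembly code:\n{asm}# What is the source code?\n"

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```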

12. FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

In this paper, we introduce FRESCO, which combines intra-frame correspondence with inter-frame correspondence to establish a more robust spatial-temporal constraint.

11. Evolutionary Optimization of Model Merging Recipes

Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks.
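
The sketch below shows the bare idea on a toy scale: candidate merge recipes (here a single interpolation weight between two state dicts) are selected and mutated by a fitness score. The paper's actual recipes and optimizer (layer-wise merging, CMA-ES) are far richer, and `fitness` here is a stub objective.

```python
import random
import torch

def merge(sd_a, sd_b, w):
    # Linear interpolation of two state dicts with one scalar weight.
    return {k: w * sd_a[k] + (1 - w) * sd_b[k] for k in sd_a}

def fitness(sd) -> float:
    # Stub objective: in practice, score the merged model on a benchmark.
    return -abs(0.7 - float(sd["w"].mean()))

sd_a, sd_b = {"w": torch.zeros(4)}, {"w": torch.ones(4)}

random.seed(0)
population = [random.random() for _ in range(8)]
for _ in range(20):  # a few generations of select-and-mutate
    population.sort(key=lambda w: fitness(merge(sd_a, sd_b, w)), reverse=True)
    parents = population[:4]
    children = [min(max(p + random.gauss(0, 0.05), 0.0), 1.0) for p in parents]
    population = parents + children

best = max(population, key=lambda w: fitness(merge(sd_a, sd_b, w)))
print("best merge weight:", round(best, 3))
```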

10. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.
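
A core tool in such analyses is activation patching: cache a component's activation on a clean input, overwrite the same activation on a corrupted run, and measure how much of the behavior is restored. The sketch below shows the mechanics on a toy model standing in for GPT-2 small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
clean, corrupted = torch.randn(1, 4), torch.randn(1, 4)

# 1) Cache the clean activation at the component under study (layer 0).
cache = {}
handle = model[0].register_forward_hook(lambda m, i, o: cache.update(act=o))
model(clean)
handle.remove()

# 2) On the corrupted run, replace that output with the cached value
#    (returning a tensor from a forward hook overrides the output).
handle = model[0].register_forward_hook(lambda m, i, o: cache["act"])
patched = model(corrupted)
handle.remove()

print("corrupted:", model(corrupted))
print("patched:  ", patched)
```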

9. DeepSeek-VL: Towards Real-World Vision-Language Understanding

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.

8. Chronos: Learning the Language of Time Series

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.
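
A usage sketch follows, assuming the publicly released `chronos-forecasting` package (pip install chronos-forecasting) and the `amazon/chronos-t5-small` checkpoint; verify the API against the project README before relying on it.

```python
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")

context = torch.sin(torch.arange(100, dtype=torch.float32) / 5)  # toy series
forecast = pipeline.predict(context, prediction_length=12)
# forecast: [num_series, num_samples, prediction_length] sample paths
median = forecast[0].quantile(0.5, dim=0)
print(median)
```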

7. VideoMamba: State Space Model for Efficient Video Understanding

Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work adapts the Mamba architecture to the video domain.
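
The appeal of the state space model is a recurrence that is linear in sequence length. The sketch below shows that core scan on a toy token sequence, omitting Mamba's input-dependent (selective) parameters and VideoMamba's spatiotemporal scan orders.

```python
import torch

T, D, N = 16, 8, 4                # tokens, channels, state size
x = torch.randn(T, D)             # input token sequence
A = -torch.rand(D, N) * 0.5       # negative -> decaying state
B = torch.randn(D, N) * 0.1
C = torch.randn(D, N) * 0.1

h = torch.zeros(D, N)             # recurrent state
ys = []
for t in range(T):                # O(T) scan instead of O(T^2) attention
    h = torch.exp(A) * h + B * x[t].unsqueeze(-1)  # state update
    ys.append((C * h).sum(-1))                     # per-channel readout
y = torch.stack(ys)               # (T, D) output sequence
print(y.shape)
```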

6. V3D: Video Diffusion Models are Effective 3D Generators

To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator.

5. Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

Despite recent advances in image-to-video generation, better controllability and local animation are less explored.

4. StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.

3. GiT: Towards Generalist Vision Transformer through Universal Language Interface

This paper proposes a simple yet effective framework, called GiT, that is simultaneously applicable to various vision tasks through a universal language interface, using only a vanilla ViT.

2. Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

Despite the success in specific tasks and scenarios, existing foundation agents, empowered by large models (LMs) and advanced tools, still cannot generalize to different scenarios, mainly due to dramatic differences in the observations and actions across scenarios.

1. DragAnything: Motion Control for Anything using Entity Representation

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.



