Multimodal LLMs Without Compromise: Researchers from UCLA, UW–Madison, and Adobe Introduce X-Fusion to Add Vision to Frozen Language Models Without Losing Language Capabilities
Source: MarkTechPost LLMs have made significant strides in language-related tasks such as conversational AI, reasoning, and code generation....

NVIDIA Open-Sources Open Code Reasoning Models (32B, 14B, 7B)
Source: MarkTechPost NVIDIA continues to push the boundaries of open AI development by open-sourcing its Open Code Reasoning...
Hugging Face Releases nanoVLM: A Pure PyTorch Library to Train a Vision-Language Model from Scratch in 750 Lines of Code
Source: MarkTechPost In a notable step toward democratizing vision-language model development, Hugging Face has released nanoVLM, a compact...
Google Launches Gemini 2.5 Pro I/O: Outperforms GPT-4 in Coding, Supports Native Video Understanding and Leads WebDev Arena
Source: MarkTechPost Just ahead of its annual I/O developer conference, Google has released an early preview of Gemini...
Researchers from Fudan University Introduce Lorsa: A Sparse Attention Mechanism That Recovers Atomic Attention Units Hidden in Transformer Superposition
Source: MarkTechPost Large Language Models (LLMs) have gained significant attention in recent years, yet understanding their internal mechanisms...
This AI Paper Introduces WebThinker: A Deep Research Agent that Empowers Large Reasoning Models (LRMs) for Autonomous Search and Report Generation
Source: MarkTechPost Large reasoning models (LRMs) have shown impressive capabilities in mathematics, coding, and scientific reasoning. However, they...

Is Automated Hallucination Detection in LLMs Feasible? A Theoretical and Empirical Investigation
Source: MarkTechPost Recent advancements in LLMs have significantly improved natural language understanding, reasoning, and generation. These models now...
LLMs Can Now Talk in Real-Time with Minimal Latency: Chinese Researchers Release LLaMA-Omni2, a Scalable Modular Speech Language Model
Source: MarkTechPost Researchers at the Institute of Computing Technology, Chinese Academy of Sciences, have introduced LLaMA-Omni2, a family...
NVIDIA Open Sources Parakeet TDT 0.6B: Achieving a New Standard for Automatic Speech Recognition (ASR) and Transcribing an Hour of Audio in One Second
Source: MarkTechPost NVIDIA has unveiled Parakeet TDT 0.6B, a state-of-the-art automatic speech recognition (ASR) model that is now...