Recent advances in structured 3D Gaussians for view-adaptive rendering, particularly methods such as Scaffold-GS, have shown promising results in neural scene representation. However, existing approaches still struggle with perceptual consistency and precise view-dependent effects. We present PEP-GS, a novel framework that enhances structured 3D Gaussians through three key innovations: (1) a Local-Enhanced Multi-head Self-Attention (LEMSA) mechanism that replaces spherical harmonics for more accurate view-dependent color decoding; (2) Kolmogorov-Arnold Networks (KAN) that optimize Gaussian opacity and covariance functions for enhanced interpretability and splatting precision; and (3) a Neural Laplacian Pyramid Decomposition (NLPD) that improves perceptual similarity across views. Our evaluation across multiple datasets indicates that PEP-GS improves on current state-of-the-art methods, with gains most evident in challenging scenarios involving view-dependent effects, specular reflections, fine-scale details, and false geometry generation.