6 points by yokee 16 hours ago | 1 comment
Unlike LoRA and its variants, which inject trainable parameters directly into the Transformer's weights and therefore require tight coupling with the backbone, ShadowPEFT enhances the frozen base model by adding a lightweight, centralized, pretrainable, and detachable Shadow network.

This shadow network runs in parallel with the base model, delivering learned corrections to each decoder layer. Because the shadow module is architecturally decoupled from the backbone, it can be trained, stored, and deployed independently, which benefits edge computing and edge-cloud collaborative scenarios.
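A minimal sketch of the idea as described above: a frozen base stack plus a separate, detachable shadow module that emits an additive correction for each layer's output. All names here (`BaseLayer`, `ShadowModule`, etc.) are illustrative assumptions, not the paper's actual implementation, and real decoder layers would of course be full Transformer blocks rather than scalar scalings.

```python
class BaseLayer:
    """Stands in for a frozen decoder layer (weights never updated)."""
    def __init__(self, scale):
        self.scale = scale  # frozen "weight"

    def forward(self, x):
        return [v * self.scale for v in x]


class ShadowModule:
    """Lightweight, independently trainable module that produces one
    correction vector per base layer (hypothetical structure)."""
    def __init__(self, n_layers, dim):
        # trainable parameters, initialized to zero so the shadow starts as a no-op
        self.corrections = [[0.0] * dim for _ in range(n_layers)]

    def forward(self, layer_idx, x):
        c = self.corrections[layer_idx]
        return [v + cv for v, cv in zip(x, c)]


def run(base_layers, shadow, x):
    # Shadow runs alongside the base: each layer's output gets a learned
    # per-layer correction added. Passing shadow=None shows detachability:
    # the frozen base still works without it.
    for i, layer in enumerate(base_layers):
        x = layer.forward(x)
        if shadow is not None:
            x = shadow.forward(i, x)
    return x


base = [BaseLayer(2.0), BaseLayer(0.5)]
shadow = ShadowModule(n_layers=2, dim=2)
shadow.corrections[1] = [0.1, -0.1]  # pretend these were learned

print(run(base, None, [1.0, 2.0]))    # frozen base alone -> [1.0, 2.0]
print(run(base, shadow, [1.0, 2.0]))  # with shadow corrections -> [1.1, 1.9]
```

Because the shadow's parameters live entirely outside the base layers, you could train, version, or ship them separately, which is presumably the point for edge-cloud setups.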

heyjude87 15 hours ago
I came across this paper a few days ago, and the idea is actually pretty interesting.

Do you think it supports VLMs?