audio encoder...

#3
by Imran1 - opened

Can I add an audio encoder to this architecture?
Presumably the model would then be able to understand audio and also speak.

DeepSeek org

We think this is feasible. Janus has demonstrated that the biggest bottleneck in multitasking lies in the encoding phase, rather than conflicts arising from using the same transformer. We have validated the feasibility of separating the encoding process and then using the same transformer to handle text-only understanding, multimodal understanding, and visual generation. The same applies to audio.
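To make the idea concrete, here is a minimal toy sketch of decoupled encoding: each modality has its own encoder, and all of them emit embeddings of the same width so a single shared backbone can consume the concatenated sequence. This is not the Janus implementation. Every name here (`D_MODEL`, `encode_text`, `encode_image`, `encode_audio`, `ENCODERS`, `build_sequence`, `shared_transformer`) is hypothetical, and the encoders are trivial stand-ins for real learned modules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]
D_MODEL = 4  # hypothetical shared embedding width


def encode_text(token_ids: Sequence[int]) -> List[Embedding]:
    # Toy stand-in: one embedding per token id.
    return [[float(t)] * D_MODEL for t in token_ids]


def encode_image(patches: Sequence[Sequence[float]]) -> List[Embedding]:
    # Toy stand-in: one embedding per patch (mean pixel value).
    return [[sum(p) / len(p)] * D_MODEL for p in patches]


def encode_audio(samples: Sequence[float], frame: int = 2) -> List[Embedding]:
    # Toy stand-in: split the waveform into fixed-size frames and emit
    # one embedding per frame (mean absolute amplitude).
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    return [[sum(abs(s) for s in f) / len(f)] * D_MODEL for f in frames]


# Adding a modality means registering one more encoder; the backbone is untouched.
ENCODERS: Dict[str, Callable[..., List[Embedding]]] = {
    "text": encode_text,
    "image": encode_image,
    "audio": encode_audio,
}


def build_sequence(segments: List[Tuple[str, object]]) -> List[Embedding]:
    # Route each segment through its own encoder, then concatenate the
    # results into one sequence in the shared embedding space.
    sequence: List[Embedding] = []
    for modality, payload in segments:
        sequence.extend(ENCODERS[modality](payload))
    return sequence


def shared_transformer(sequence: List[Embedding]) -> List[Embedding]:
    # Placeholder for the shared backbone: one output vector per position.
    return [list(vec) for vec in sequence]


if __name__ == "__main__":
    seq = build_sequence([("text", [1, 2]), ("audio", [0.5, -0.5, 1.0, 1.0])])
    out = shared_transformer(seq)
    print(len(out), "positions of width", len(out[0]))
```

The design point is that the backbone only ever sees a sequence of fixed-width vectors, so an audio encoder can be added without changing it. Producing speech would additionally need an audio output head or decoder, which this sketch does not cover.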
