rather than the "latents", try this with the UNet encoder activations, or maybe just the bottleneck:
- use something like KLMC2 with a weak conditioning weight to draw samples in the neighborhood of a prompt
- collect the latents from this sampling process and learn a reduced-rank representation over them (e.g. PCA)
- the rank-reduced representation should provide a semantic basis whose directions are specifically relevant to the conditioning prompt and the space of feasible outputs under it
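a minimal sketch of the pipeline above, with heavy stand-ins: a toy underdamped (kinetic) Langevin sampler in the spirit of KLMC2, and an anisotropic Gaussian score playing the role of the prompt-conditioned diffusion score. the real version would collect UNet encoder/bottleneck activations during sampling instead; everything here (dimensions, step size, friction, the score itself) is an illustrative assumption, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the prompt-conditioned score: gradient of the log-density
# of an anisotropic Gaussian. In the real setting this would be the
# diffusion model's score with a weak conditioning weight.
dim = 16
scales = np.linspace(0.1, 2.0, dim)      # anisotropic "semantic" axes
cov_inv = np.diag(1.0 / scales**2)

def score(x):
    return -cov_inv @ x                  # grad log N(0, diag(scales^2))

def kinetic_langevin(n_steps=2000, h=0.05, gamma=1.0):
    """Underdamped Langevin sampling (KLMC2-flavored toy discretization):
    position/velocity updates with friction gamma and step size h.
    Returns the trajectory of 'latents' visited during sampling."""
    x = rng.standard_normal(dim)
    v = np.zeros(dim)
    latents = []
    for _ in range(n_steps):
        v = v + h * (score(x) - gamma * v) \
              + np.sqrt(2.0 * gamma * h) * rng.standard_normal(dim)
        x = x + h * v
        latents.append(x.copy())
    return np.stack(latents)

latents = kinetic_langevin()

# PCA over the collected latents: the top singular vectors span the
# directions the sampler actually explored, i.e. the candidate
# prompt-specific semantic basis.
centered = latents - latents.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
k = 4
basis = vt[:k]                           # rank-reduced semantic basis
explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
```

with a real model, `latents` would instead be activations captured with a forward hook at each sampler step; `basis` rows could then be used as edit directions in activation space.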