Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional perceptual objective while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations demonstrate improved performance over strong model-based baselines in both signal fidelity and spatial accuracy metrics. Subjective listening tests further confirm that Flow-HOA yields higher overall sound quality with reduced artifacts.
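
To make concrete what a deployable, time-invariant bank of FIR encoding filters amounts to at inference time, the following is a minimal sketch and not the authors' implementation. It assumes the bank is stored as an array of shape (HOA channels, microphones, taps) and that each HOA channel is the sum of the microphone signals convolved with their respective filters. The function name `encode_hoa`, the array layouts, and the example sizes (4 microphones, 9 HOA channels, 8 taps) are hypothetical choices made for illustration only.

```python
import numpy as np


def encode_hoa(mic_signals, fir_bank):
    """Apply a fixed FIR filter bank to microphone signals (illustrative sketch).

    mic_signals: array of shape (num_mics, num_samples)   -- assumed layout
    fir_bank:    array of shape (num_hoa, num_mics, num_taps) -- assumed layout
    Returns an array of shape (num_hoa, num_samples + num_taps - 1).
    """
    num_mics, num_samples = mic_signals.shape
    num_hoa, bank_mics, num_taps = fir_bank.shape
    if bank_mics != num_mics:
        raise ValueError("filter bank and signals disagree on microphone count")

    hoa = np.zeros((num_hoa, num_samples + num_taps - 1))
    for k in range(num_hoa):
        for m in range(num_mics):
            # Full linear convolution of microphone m with its filter for channel k,
            # accumulated over all microphones.
            hoa[k] += np.convolve(mic_signals[m], fir_bank[k, m])
    return hoa


# Hypothetical sizes: 4 microphones, 9 HOA channels, 8-tap filters.
rng = np.random.default_rng(0)
example_mics = rng.standard_normal((4, 16))
example_bank = rng.standard_normal((9, 4, 8))
example_hoa = encode_hoa(example_mics, example_bank)
```

Because the filters do not change over time, the same bank can be applied to any recording from the same array; in this sketch the generative model would only be needed to produce `fir_bank`, not to process audio.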