The advent of foundation models has introduced unprecedented opportunities across all areas of natural language processing. Automatic translation (whether of speech or text) is no exception: covering a wide variety of language directions, domains, and application scenarios is no longer a mere utopia. Although conditions today are more favorable than in the past, open challenges remain in fully exploiting the power of the available models, increasing their flexibility to integrate diverse input types, and constraining their output to meet specific application requirements. Open questions include: How can non-symbolic models be fed with symbolic information describing the context of a translation request? How can meta-information about target users be supplied? How can model capabilities be integrated with external information from structured knowledge bases? How can the output be conditioned to specific target applications? This PhD project aims to explore state-of-the-art solutions to these challenges, with a special focus on the integration of multimodal information (e.g., contextual information supplied as visual cues), user-specific constraints (e.g., for gender/formality control), and application-specific constraints (e.g., structural requirements, as in the case of video subtitling).