The rise of multimodal large language models (MLLMs) is transforming language, speech, and vision technologies, enabling unprecedented capabilities in translation, summarization, and content generation more broadly. This PhD project will focus on personalization, exploring how models can adapt to individual users’ preferences, context, and communication style. Research directions include adaptive speech-to-speech translation, context-aware description generation, text simplification, and user preference modeling. By integrating these approaches into MLLMs, the project aims to create more natural, context-sensitive, and user-centric multilingual experiences, pushing the boundaries of how AI can serve people at the individual level.