Data Scientist · NLP · Speech Models · Generative AI
Cambridge, UK · mariabalos16@gmail.com · +44 7899 866 210 · LinkedIn
Data Scientist with production experience in NLP and speech models, currently training and deploying multilingual TTS systems at Vocality.ai for enterprise clients. Background in UX research adds a user-centred perspective to building data products. Passionate about deep learning, computer vision, and solving hard problems, with a habit of continuous learning.
Improved Spanish TTS mispronunciation rate from ~11% to <0.5% through data quality improvements and fine-tuning at Vocality.ai
Datamecum Datathon — binary classification problem solved with a Random Forest & XGBoost ensemble (AUC 0.9851)
Automated previously manual interview transcription end-to-end using Whisper & Pyannote, cutting time spent by ~80% at Singer Instruments
LeetCode problems solved across algorithms and data structures
Jan. 2025 – Present | Remote
Vocality.ai
Training and deploying multilingual TTS models in production for enterprise clients including Mercedes-Benz, Heineken, and ONCE.
Aug. 2023 – Jan. 2024 | Cambridge, UK
Museum of Archaeology and Anthropology (MAA)
Sep. 2022 – Mar. 2023 | Remote, Biotech
Singer Instruments
See project →
Awarded Intern of the Year 2022. Internship extended by 3 months.
Oct. 2024 – Jul. 2025 | Master's Degree
Datamecum
Dissertation: RAG-Driven Educational Assistant · See project →
Oct. 2023 – May 2024 | Intensive Program
Datamecum
1st place Datathon · See project →
2021 – 2023 | Master's Degree
Open University of Catalonia
2018 – 2021 | Bachelor's Degree
Open University of Catalonia