Maria Magdalena Balos

Data Scientist · NLP · Speech Models · Generative AI

Cambridge, UK  ·  mariabalos16@gmail.com  ·  +44 7899 866 210  ·  LinkedIn

Data Scientist with production experience in NLP and speech models, currently training and deploying multilingual TTS systems for enterprise clients at Vocality.ai. A background in UX research adds a user-centred perspective to building data products. Passionate about deep learning, computer vision, and solving hard problems, with a commitment to continuous learning.

Key Achievements


<0.5%

Reduced the Spanish (es-ES) TTS mispronunciation rate from ~11% to <0.5% through data quality improvements and fine-tuning at Vocality.ai

1st place

Datamecum Datathon: binary classification problem solved with a Random Forest and XGBoost ensemble (AUC 0.9851)

~80%

Automated interview transcription end-to-end with Whisper and Pyannote at Singer Instruments, cutting transcription time by ~80%

300+

LeetCode problems solved across algorithms and data structures

Work Experience


  • DATA SCIENTIST

    Jan. 2025 – Present | Remote
    Vocality.ai

    Training and deploying multilingual TTS models in production for enterprise clients including Mercedes-Benz, Heineken, and ONCE.

    • Reduced critical mispronunciation rate in es-ES from ~11% to <0.5% through data quality improvements, fine-tuning, and architecture changes
    • Automated manual workflows, reducing operational overhead by ~50%
    • Built an audio post-processing pipeline that assembles multi-segment TTS outputs, removes silences, and formats audio for ads and e-learning
    • Built business analytics dashboards in Streamlit
    • Maintain and extend MLOps pipelines on GCP
    • Models used: WhisperX, BigVGAN, Matcha-TTS
    Python · TTS/ASR · MLOps · GCP · Streamlit · Flask · Docker · Bash
  • RESEARCH ASSISTANT

    Aug. 2023 – Jan. 2024 | Cambridge, UK
    Museum of Archaeology and Anthropology (MAA)

    • Conducted contextual observation studies of visitor movement, timing, and interaction patterns
    • Collaborated in the design of a 32-question structured survey
    Contextual observation · Survey design · User research
  • UX RESEARCHER

    Sep. 2022 – Mar. 2023 | Remote, Biotech
    Singer Instruments

    Awarded Intern of the Year 2022. Internship extended by 3 months.

    • Automated interview transcription using Whisper and Pyannote (speaker diarisation), reducing transcription time by ~80%
    • Led usability studies and user interviews in lab environments with scientists
    • Analysed qualitative and quantitative research data and presented insights to stakeholders
    Whisper · Pyannote · Python · NLP · User interviews · Usability studies

Education


  • DEEP LEARNING & GENERATIVE AI

    Oct. 2024 – Jul. 2025 | Master's Degree
    Datamecum

    Dissertation: RAG-Driven Educational Assistant

    PyTorch · LangChain · RAG · LLMs · NLP · Computer Vision · Generative AI
  • DATA SCIENCE INTENSIVE PROGRAM

    Oct. 2023 – May 2024 | Intensive Program
    Datamecum

    1st place Datathon

    Scikit-learn · XGBoost · Pandas · NumPy · EDA
  • INTERACTION DESIGN & UX

    2021 – 2023 | Master's Degree
    Open University of Catalonia

    UX Research · Usability studies · Interaction Design
  • GRAPHIC DESIGN & DIGITAL CREATION

    2018 – 2021 | Bachelor's Degree
    Open University of Catalonia

    Visual Design · Graphic Design