Maria Magdalena Balos

Data Scientist · NLP · Speech Models · Generative AI

Cambridge, UK · mariabalos16@gmail.com · +44 7899 866 210 · LinkedIn

Data Scientist with production experience in NLP and speech models, currently training and deploying multilingual TTS systems at Vocality.ai for enterprise clients. Background in UX research adds a user-centred perspective to building data products. Passionate about deep learning, computer vision, and solving hard problems; always learning.

Key Achievements

<0.5%

Improved Spanish TTS mispronunciation rate from ~11% to <0.5% through data quality improvements and fine-tuning at Vocality.ai

1st place

Datamecum Datathon: binary classification problem solved with a Random Forest & XGBoost ensemble (AUC 0.9851)

~80%

Automated interview transcription end-to-end using Whisper & Pyannote, cutting transcription time by ~80% at Singer Instruments

300+

LeetCode problems solved across algorithms and data structures

Work Experience

DATA SCIENTIST

Jan. 2025 – Present | Remote
Vocality.ai

Training and deploying multilingual TTS models in production for enterprise clients including Mercedes-Benz, Heineken, and ONCE.
- Reduced critical mispronunciation rate in es-ES from ~11% to <0.5% through data quality improvements, fine-tuning, and architecture changes
- Automated manual workflows, reducing operational overhead by ~50%
- Built audio post-processing pipeline for assembling multi-segment TTS outputs, removing silences, and formatting for ads and e-learning
- Built business analytics dashboards in Streamlit
- Maintain and extend MLOps pipelines on GCP
- Models used: WhisperX, BigVGAN, Matcha-TTS
Python · TTS/ASR · MLOps · GCP · Streamlit · Flask · Docker · Bash
RESEARCH ASSISTANT

Aug. 2023 – Jan. 2024 | Cambridge, UK
Museum of Archaeology and Anthropology (MAA)
- Conducted contextual observation studies of visitor movement, timing, and interaction patterns
- Collaborated on the design of a 32-question structured survey
Contextual observation · Survey design · User research
UX RESEARCHER

Sep. 2022 – Mar. 2023 | Remote, Biotech
Singer Instruments

Awarded Intern of the Year 2022. Internship extended by 3 months.
- Automated interview transcription using Whisper and Pyannote (speaker diarisation), reducing transcription time by ~80%
- Led usability studies and user interviews in lab environments with scientists
- Analysed qualitative and quantitative research data and presented insights to stakeholders
Whisper · Pyannote · Python · NLP · User interviews · Usability studies

Education

DEEP LEARNING & GENERATIVE AI

Oct. 2024 – Jul. 2025 | Master's Degree
Datamecum

Dissertation: RAG-Driven Educational Assistant

PyTorch · LangChain · RAG · LLMs · NLP · Computer Vision · Generative AI
DATA SCIENCE INTENSIVE PROGRAM

Oct. 2023 – May 2024 | Intensive Program
Datamecum

1st place Datathon

Scikit-learn · XGBoost · Pandas · NumPy · EDA
INTERACTION DESIGN & UX

2021 – 2023 | Master's Degree
Open University of Catalonia

UX Research · Usability studies · Interaction Design
GRAPHIC DESIGN & DIGITAL CREATION

2018 – 2021 | Bachelor's Degree
Open University of Catalonia

Visual Design · Graphic Design
