Hi everyone!
I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.
**What it does**: Scores English pronunciation at 4 levels of granularity — phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.
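To make the four granularity levels concrete, here is a hypothetical result for the single word "think" — the field names and structure are illustrative only, not the engine's actual schema:

```python
# Illustrative only: field names here are my own, not the real API schema.
result = {
    "overall": 82,            # 0-100, whole-utterance score
    "sentence": 84,           # sentence-level score
    "words": [
        {
            "word": "think",
            "score": 78,
            "phonemes": [
                {"arpabet": "TH", "ipa": "θ", "score": 61},
                {"arpabet": "IH", "ipa": "ɪ", "score": 85},
                {"arpabet": "NG", "ipa": "ŋ", "score": 88},
                {"arpabet": "K",  "ipa": "k", "score": 79},
            ],
        }
    ],
}

# Each level aggregates the one below it, so a weak spot (here the
# "TH" / θ sound) can be traced from the overall number down to the
# exact phoneme that needs work.
weakest = min(result["words"][0]["phonemes"], key=lambda p: p["score"])
print(weakest["ipa"], weakest["score"])  # θ 61
```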
**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and sentence level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)
**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency — the entire pipeline runs in 17MB.
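For anyone unfamiliar with GOP: the classic formulation compares, frame by frame over the segment the forced alignment assigned to a phone, the log-posterior of the canonical phone against the best competing phone. A minimal NumPy sketch of that idea (my own illustration, not the Space's code):

```python
import numpy as np

def gop_score(log_posteriors: np.ndarray, canonical_phone: int) -> float:
    """Classic GOP: mean (log P(canonical) - log P(best phone)) over the
    frames the forced alignment assigned to this phone.

    log_posteriors: (T, P) array of per-frame log phone posteriors.
    canonical_phone: index of the phone the speaker should have produced.
    Returns 0.0 when the canonical phone wins every frame; increasingly
    negative as competing phones dominate (i.e. worse pronunciation).
    """
    target = log_posteriors[:, canonical_phone]
    best = log_posteriors.max(axis=1)
    return float((target - best).mean())

# Toy example: 3 frames, 4 phones; the canonical phone (index 1)
# wins two frames and narrowly loses the third.
logp = np.log(np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.15, 0.60, 0.15, 0.10],
    [0.40, 0.35, 0.15, 0.10],
]))
print(gop_score(logp, 1))  # slightly negative
```

In a full pipeline these raw GOP values would then be fed (with other features) into the regression heads — here the MLP/XGBoost ensemble — to map them onto calibrated 0-100 scores.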
**Try it**: fabiosuizu/pronunciation-assessment
The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.
**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.
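As a sketch of what an integration might look like — the endpoint URL and payload field names below are placeholders I made up; check the Space description for the actual REST contract:

```python
import base64
import json
import urllib.request

# Placeholder endpoint and field names -- NOT the real API contract.
API_URL = "https://example.invalid/api/assess"

def build_request(audio_bytes: bytes, expected_text: str) -> urllib.request.Request:
    """Package a WAV file and the expected text as a JSON POST request."""
    payload = json.dumps({
        "audio": base64.b64encode(audio_bytes).decode("ascii"),  # assumed encoding
        "expected_text": expected_text,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request would just be urllib.request.urlopen(req);
# here we only build it, since the endpoint above is a placeholder.
req = build_request(b"\x00\x01", "the quick brown fox")
print(req.get_method(), req.get_header("Content-type"))
```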
Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level
Thanks!