Wow this is super cool! Thanks for sharing!
Tyler Williams PRO
unmodeled-tyler
AI & ML interests
Founder of Quanta Intellect/VANTA Research - Looking to get in touch? Head to my website!
Recent Activity
replied to fabiosuizu's post about 16 hours ago
Hi everyone!
I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.
**What it does**: Scores English pronunciation at 4 levels of granularity — phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.
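One way to picture the four-level output is a small nested structure. This is a minimal sketch that assumes, hypothetically, that word and sentence scores are plain means of the level below. The post does not say how the engine aggregates, and the overall score is not modeled here. `PhonemeScore`, `WordScore`, `sentence_score` and the partial `ARPABET_TO_IPA` table are illustrative names, not the engine's API.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical, partial ARPAbet -> IPA table; stress digits are stripped before lookup.
ARPABET_TO_IPA = {
    "TH": "θ", "DH": "ð", "IH": "ɪ", "AE": "æ",
    "NG": "ŋ", "K": "k", "S": "s", "T": "t",
}

@dataclass
class PhonemeScore:
    arpabet: str   # e.g. "IH1" (vowel with primary stress)
    score: float   # 0-100

    @property
    def ipa(self) -> str:
        base = self.arpabet.rstrip("012")  # drop the lexical stress marker
        return ARPABET_TO_IPA.get(base, "?")

@dataclass
class WordScore:
    word: str
    phonemes: list[PhonemeScore]

    @property
    def score(self) -> float:
        # Assumption: a word's score is the mean of its phoneme scores.
        return mean(p.score for p in self.phonemes)

def sentence_score(words: list[WordScore]) -> float:
    # Assumption: a sentence's score is the mean of its word scores.
    return mean(w.score for w in words)

# "thinks" = TH IH1 NG K S, with a weak TH
thinks = WordScore("thinks", [
    PhonemeScore("TH", 40.0), PhonemeScore("IH1", 90.0),
    PhonemeScore("NG", 80.0), PhonemeScore("K", 95.0), PhonemeScore("S", 95.0),
])
print([p.ipa for p in thinks.phonemes])  # ['θ', 'ɪ', 'ŋ', 'k', 's']
print(thinks.score)                      # 80.0
```

A plain mean lets one badly pronounced phoneme pull the word down only moderately; a real system might weight phonemes differently, which is exactly the kind of calibration question the post asks about.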
**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at phone-level (+4.5%) and sentence-level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)
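The post does not name the agreement metric. Assuming it is the Pearson correlation coefficient (PCC), the metric commonly reported on speechocean762, the comparison against human agreement could be computed as below. The score lists are toy values for illustration only, not benchmark data, and the post does not say whether "+4.5%" is an absolute or relative difference.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy phone-level scores (0-2 scale) for illustration only.
annotator_a = [2.0, 1.0, 2.0, 0.0, 2.0]
annotator_b = [2.0, 1.0, 1.0, 0.0, 2.0]
model       = [1.9, 1.1, 1.8, 0.2, 2.0]

human_agreement = pearson(annotator_a, annotator_b)  # inter-annotator PCC
model_agreement = pearson(model, annotator_a)        # model vs. reference PCC
print(model_agreement > human_agreement)
```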
**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency — the entire pipeline runs in 17MB.
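The first stages of that pipeline (CTC forced alignment via Viterbi, then GOP) can be sketched in plain Python. This is an illustrative sketch, not the engine's code. It assumes `log_probs` holds per-frame CTC log posteriors from an acoustic model such as Citrinet, that index 0 is the CTC blank, and it uses one common posterior-based GOP variant: the mean, over a phone's aligned frames, of the expected phone's log posterior minus the best non-blank log posterior. The MLP/XGBoost heads that would map such features to 0-100 scores are not modeled.

```python
import math

NEG_INF = -math.inf

def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi (max-product) CTC forced alignment.

    log_probs: T x V nested list of per-frame log posteriors.
    targets:   expected token (phone) ids, none equal to `blank`.
    Returns one entry per frame: the index into `targets` the frame is
    aligned to, or None if the frame is aligned to blank.
    """
    T = len(log_probs)
    if T == 0:
        raise ValueError("empty log_probs")
    ext = [blank]                       # blank-interleaved label sequence
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][blank]   # a path may start on blank...
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]  # ...or on the first label
    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance one state, or skip a blank
            # that sits between two different labels.
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last label or on the trailing blank.
    s = max(range(max(S - 2, 0), S), key=lambda k: score[T - 1][k])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align the targets")
    frames = [None] * T
    for t in range(T - 1, -1, -1):      # backtrack the best path
        frames[t] = (s - 1) // 2 if s % 2 == 1 else None
        s = back[t][s]
    return frames

def gop_scores(log_probs, targets, blank=0):
    """Posterior-based GOP per target phone (0 is best, more negative is worse)."""
    frames = ctc_forced_align(log_probs, targets, blank)
    gops = []
    for i, tok in enumerate(targets):
        seg = [t for t, j in enumerate(frames) if j == i]  # non-empty on a valid path
        diffs = []
        for t in seg:
            # Best non-blank log posterior on this frame (may be the expected phone).
            best_phone = max(lp for v, lp in enumerate(log_probs[t]) if v != blank)
            diffs.append(log_probs[t][tok] - best_phone)
        gops.append(sum(diffs) / len(diffs))
    return gops

# Toy example: vocabulary {0: blank, 1: phone A, 2: phone B}, 4 frames.
probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
log_probs = [[math.log(p) for p in row] for row in probs]
print(ctc_forced_align(log_probs, [1, 2]))  # [0, None, 1, None]
print(gop_scores(log_probs, [1, 2]))        # A is 0.0, B is about -0.223 (log 0.4 - log 0.5)
```

A GOP of 0 means the expected phone was the model's top non-blank choice on every aligned frame; phone B scores lower because phone A was more likely on its frame. Leaving blank frames out of each phone's segment is a simplification, since CTC outputs tend to be peaky.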
**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment
The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.
**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.
Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level
Thanks!
reacted to fabiosuizu's post with 🔥 about 16 hours ago
reacted to MonsterMMORPG's post with 👀 about 16 hours ago
SECourses Ultimate Video and Image Upscaler Pro is now at V2.1, and massive improvements have arrived
Check all the screenshots below to see all the amazing features
20 February 2026 Update V2.1
This is a pretty big update
We have completely moved the FlashVSR+ backend to a new repo, and I have significantly upgraded that repo
The new FlashVSR+ works amazingly, and I think it is better than SeedVR2 for upscaling high-res videos, such as upscaling 720p to a higher resolution
Top menu navigation bar updated to a better version with an improved look
FlashVSR+ tab remade, and all its features now work
For lower VRAM, a button has been added that you can use if you get OOM (out-of-memory) errors
Read the updated UI to understand how to use it
FlashVSR+ can now upscale images very well, too
Image Based GAN upscalers tab also improved and some bugs fixed
In the Output & Comparison tab, Video Output was not working properly; this issue is now fixed
In the Output & Comparison tab, new multi-video and multi-image comparison sliders have been added, which are super useful for quickly comparing multiple videos and images
Lots of other bug fixes
The app is getting close to perfect, so please test it heavily and let me know about any errors and which features you'd like
This update was mostly about improving FlashVSR+, since it is a very fast and amazing video upscaler model
The Image Based GAN upscalers can now upscale videos perfectly fine, and Batch Size (Frames per Iteration) now works to speed up video upscaling
To update, get the latest zip file, extract it, overwrite all files, and run the Windows_Run_SECourses_Upscaler_Pro.bat file