Open to Collab

Takuya Umeki

consome2

otoearth

https://www.full-duplex.ai/

AI & ML interests

Full-duplex

Recent Activity

liked a dataset 3 days ago

sarulab-speech/J-CHAT

reacted to their post with ❤️ 29 days ago

We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.

Dataset 1: Processed / curated subset
https://huggingface.co/datasets/otoearth/otoSpeech-full-duplex-processed-141h
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models

Dataset 2: Larger raw(er) release
https://huggingface.co/datasets/otoearth/otoSpeech-full-duplex-280h
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments

We intentionally split the release to support different research workflows: clean and ready-to-use vs. more exploratory and research-oriented use.

The datasets are currently private, but we’re happy to approve access requests, so feel free to request access if you’re interested.

If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas. Feedback and ideas are very welcome!

replied to their post about 1 month ago

liked a dataset 3 days ago

sarulab-speech/J-CHAT

Viewer • Updated Feb 9, 2025 • 2.02k • 44 • 33

liked 2 datasets about 1 month ago

otoearth/otoSpeech-full-duplex-processed-141h

Preview • Updated 19 days ago • 103 • 19

otoearth/otoSpeech-full-duplex-280h

Preview • Updated 19 days ago • 572 • 8

liked 8 models 9 months ago

pyannote/speaker-diarization-3.1

Automatic Speech Recognition • Updated May 10, 2024 • 13.1M • 1.58k

pyannote/voice-activity-detection

Automatic Speech Recognition • Updated May 10, 2024 • 687k • 225

Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • Updated Jan 12, 2025 • 489k • 519

fixie-ai/ultravox-v0_5-llama-3_2-1b

Audio-Text-to-Text • 0.7B • Updated 1 day ago • 484k • 70

SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 644k • 1.15k

hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 9.21M • 5.76k

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 7.45M • 3.41k

nari-labs/Dia-1.6B

Text-to-Speech • Updated Jun 1, 2025 • 76.3k • 2.83k