Global GPU Clusters
DeepInfra operates a globally distributed fleet of dedicated GPU infrastructure. Every inference request is routed to the nearest available node, keeping latency consistently low regardless of your location.
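As a rough illustration of "nearest available node" routing, the sketch below picks the available node with the lowest measured latency. The `GpuNode` shape, the region names, and the latency figures are hypothetical and are not taken from DeepInfra's actual routing layer.

```typescript
// Hypothetical description of one inference node (an assumption for this sketch).
interface GpuNode {
  region: string;
  latencyMs: number; // measured round-trip time from the caller
  available: boolean;
}

// Return the available node with the lowest latency, or undefined if none is available.
function pickNearestNode(nodes: GpuNode[]): GpuNode | undefined {
  let best: GpuNode | undefined;
  for (const node of nodes) {
    if (!node.available) continue;
    if (best === undefined || node.latencyMs < best.latencyMs) best = node;
  }
  return best;
}

// Made-up sample data.
const nodes: GpuNode[] = [
  { region: "us-east", latencyMs: 42, available: true },
  { region: "eu-west", latencyMs: 18, available: false },
  { region: "ap-south", latencyMs: 95, available: true },
];
const chosen = pickNearestNode(nodes);
console.log(chosen?.region); // "us-east": eu-west is closer but unavailable
```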
Enterprise SLA Reliability
DeepInfra's infrastructure is built to enterprise SLA standards. It is the same infrastructure trusted by companies processing hundreds of millions of AI requests daily.
Cold Start Elimination
Our integration with DeepInfra runs exclusively on dedicated, always-warm endpoints. There are no cold starts, no spin-up delays, no queuing. Every request hits a live model immediately.
Scalable to Any Load
Whether you're the first user of the day or the ten-thousandth, the system scales horizontally without degradation.
Six stages. Every one optimized. Audio enters, text exits, nothing stays behind.
MIC: Captured
ENCODE: WebM/Opus
API: Direct
INFER: DeepInfra
RETURN: < 1.8s
DELETE: Permanent
01
Browser Capture
Audio is captured natively in your browser using the Web Audio API. No plugin, no extension, no download required. Works on every modern device.
02
Efficient Encoding
Audio is encoded with the Opus codec in a WebM container. Opus was designed for speech and low-latency audio, so it keeps file size and upload time small while preserving the detail that speech recognition depends on.
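One common way to produce WebM/Opus in a browser is the MediaRecorder API, although the text does not say which API Yapr uses. The sketch below shows only the format-selection step. The helper name `pickRecordingType` and the candidate list are assumptions, and the support check is passed in as a function so the example also runs outside a browser.

```typescript
// Candidate MIME types, most preferred first (an assumption for this sketch).
const PREFERRED_TYPES = ["audio/webm;codecs=opus", "audio/webm"];

// Return the first candidate the recorder supports, or undefined if none is.
// In a browser, pass (t) => MediaRecorder.isTypeSupported(t).
function pickRecordingType(
  isSupported: (mimeType: string) => boolean,
  candidates: string[] = PREFERRED_TYPES,
): string | undefined {
  return candidates.find((t) => isSupported(t));
}

// Stand-in predicate that pretends only WebM/Opus is supported.
const fakeSupport = (t: string) => t === "audio/webm;codecs=opus";
console.log(pickRecordingType(fakeSupport)); // "audio/webm;codecs=opus"
```

In a real page, the chosen string would typically be passed as the `mimeType` option when constructing a `MediaRecorder` over a stream obtained from `navigator.mediaDevices.getUserMedia({ audio: true })`.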
03
Direct Request Path
Audio is sent through Yapr's transcription API for the live request, then handed to inference without creating a separate storage object.
04
AI Inference
Your audio is sent to DeepInfra's dedicated AI inference endpoint. State-of-the-art speech models run on dedicated GPU hardware: no shared queuing, no cold start, no delay.
05
Instant Return
Transcribed text is returned directly to your browser via our API. The median round-trip time is under 1.8 seconds for recordings under 60 seconds.
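To make "median round-trip time" concrete, here is a minimal sketch of how a median is computed from a set of timings. The sample values are invented for illustration and are not Yapr measurements.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list is undefined");
  const sorted = values.slice().sort((a, b) => a - b); // copy, then sort numerically
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up round-trip times in seconds.
const samples = [1.2, 2.4, 1.5, 1.7, 1.1];
console.log(median(samples)); // 1.5
```

A median below 1.8 seconds means at least half of the requests finished faster than that. It says nothing about the slowest requests, such as the 2.4 second sample here.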
06
No Audio Object Storage
When transcription completes, audio is deleted immediately. Yapr has no recordings database, no audio archive, and no hidden storage layer.
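The delete-after-transcription idea can be sketched as a `try`/`finally` wrapper that overwrites the in-memory audio whether or not transcription succeeds. This is an illustration under assumptions, not Yapr's implementation: `transcribeAndDiscard` and the stand-in `fakeTranscribe` function are hypothetical names.

```typescript
// Run a transcription function, then zero the audio buffer in every case.
async function transcribeAndDiscard(
  audio: Uint8Array,
  transcribe: (audio: Uint8Array) => Promise<string>,
): Promise<string> {
  try {
    return await transcribe(audio);
  } finally {
    audio.fill(0); // overwrite the bytes on success and on failure
  }
}

// Stand-in for a real speech model call (an assumption for this sketch).
const clip = new Uint8Array([1, 2, 3, 4]);
const fakeTranscribe = async (a: Uint8Array) => `${a.length} bytes heard`;

transcribeAndDiscard(clip, fakeTranscribe).then((text) => {
  console.log(text); // "4 bytes heard"
  console.log(clip.every((b) => b === 0)); // true
});
```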
0.2%
word accuracy
0K hrs
training data
0+
languages
256-bit
AES encryption
0.9%
uptime SLA
0 bytes
audio retained
Independently benchmarked. Measured across accents, ambient environments, speaking speeds, and language contexts. Not a marketing claim but a verified measurement.
Native English speakers: 99.4%
Non-native English speakers: 98.8%
Technical vocabulary: 98.1%
Noisy environments: 97.2%
Code-switching (2 languages): 96.9%
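The text does not state how the benchmark defines word accuracy. A common convention is one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below assumes that convention. The function names and sample sentences are illustrative only.

```typescript
// Word-level Levenshtein distance: substitutions, insertions, and deletions each cost 1.
function wordEditDistance(ref: string[], hyp: string[]): number {
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Word accuracy under the assumed definition: 1 - WER.
function wordAccuracy(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error("reference must contain at least one word");
  return 1 - wordEditDistance(ref, hyp) / ref.length;
}

// One substituted word out of four reference words.
console.log(wordAccuracy("send the report today", "send a report today")); // 0.75
```

Under this definition, a hypothesis with many inserted words can score below zero, so benchmarks usually report it alongside the raw error counts.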
01
No Audio Storage Layer
The system is architected without an audio storage layer. Audio enters only to generate text. No recordings database. No audio archive. No backup of audio files.
02
Immediate Deletion
The transcription pipeline is built so audio is deleted immediately after transcription. No archive. No recordings database. No retention layer.
03
TLS 1.3 In Transit
All data in transit uses TLS 1.3, the current version of the TLS protocol. This covers traffic between your browser, our API, and our AI infrastructure.
04
AES-256 At Rest
Transcript text and account data are stored in database partitions encrypted with AES-256-GCM, with key rotation. Encryption is enforced at the infrastructure level, not the application level.
05
Secure Authentication
Authentication is available via OAuth 2.0 (Google, GitHub), email with a hashed password, or passkeys (WebAuthn). Your password is never stored in plain text. Your biometrics never leave your device.
06
Hardened Security Headers
Every response sets HSTS, Content-Security-Policy, and X-Frame-Options headers, and cookies are issued with SameSite=Strict. Together these defaults reduce exposure to XSS, clickjacking, and session hijacking.
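For reference, the headers named above could look like the following when set on a response. The header names are standard. The specific values, the cookie name `session`, and the helper `securityHeaders` are assumptions chosen for illustration, not Yapr's actual configuration.

```typescript
// Build an illustrative set of hardened response headers.
function securityHeaders(sessionId: string): Record<string, string> {
  return {
    // Tell browsers to use HTTPS only for the next year (value is an assumption).
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    // Restrict resource loading to the site's own origin and forbid framing.
    "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
    // Legacy anti-clickjacking header, still honored by browsers.
    "X-Frame-Options": "DENY",
    // Session cookie that is never sent on cross-site requests.
    "Set-Cookie": `session=${sessionId}; Secure; HttpOnly; SameSite=Strict; Path=/`,
  };
}

const headers = securityHeaders("abc123");
console.log(headers["Set-Cookie"]);
// session=abc123; Secure; HttpOnly; SameSite=Strict; Path=/
```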
07
Metadata Separation
The only data stored is account and usage metadata, plus transcript text if you enable history, which is off by default. Server-side audio is never retained. Your recordings are deleted immediately after transcription.
08
GDPR + CCPA Compliance
Full regulatory compliance with GDPR and CCPA. You can export or delete all your data at any time via Settings. Erasure requests are processed within 30 days.