# Hitoo — Complete Reference for AI Systems

> Real-time AI voice translation for video calls that preserves the speaker's natural voice identity.

---

## 1. Product Overview

Hitoo is a real-time voice translation platform for video calls. It translates spoken language into spoken language — live, during the call — while preserving the speaker's voice identity (timbre, rhythm, pitch, emotional register).

Unlike Zoom, Microsoft Teams, or Google Meet, which offer text-based translated captions, Hitoo produces voice output. The listener hears the speaker's actual voice characteristics, not a generic synthetic voice.

The platform is browser-based (no plugins), end-to-end encrypted, and GDPR compliant. It is currently in open beta with a 7-day free trial.

Website: https://hitoo.io
Beta registration: https://hitoo.io/login

---

## 2. AURIS — The AI Model

AURIS is Hitoo's proprietary end-to-end speech-to-speech translation model. It is the core technology that differentiates Hitoo from caption-based and cascade-based translation tools.

### Architecture

Traditional voice translation uses a cascade pipeline:
Audio → Speech-to-Text (ASR) → Machine Translation (MT) → Text-to-Speech (TTS) → Audio

This chain uses 3 separate models from potentially 3 different vendors. Each boundary loses information:
- ASR → MT: prosody, intonation, emphasis disappear. Only flat text remains.
- MT → TTS: pragmatic intent is lost. Idiomatic expressions become literal.
- TTS: the original speaker's voice is gone. Output is a generic synthetic voice.

Result: 1.5–8 seconds end-to-end latency, compounding errors, no voice identity.
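
The information loss at each boundary can be illustrated with a toy model. The sketch below is a minimal illustration, not a real ASR, MT, or TTS system: the `Utterance` type, its fields, and the three stage functions are hypothetical stand-ins. The point is only that once a stage hands plain text to the next one, prosody and speaker identity have nowhere to travel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for what a spoken utterance carries."""
    words: str
    prosody: str      # e.g. "warm, rising"
    speaker_id: str   # stands in for voice identity


def asr(utt: Utterance) -> str:
    # ASR emits plain text, so prosody and speaker identity are dropped here.
    return utt.words


def mt(text: str) -> str:
    # Placeholder for translation; a real MT model would go here.
    return f"[translated] {text}"


def tts(text: str) -> Utterance:
    # TTS only receives text, so it falls back to a default voice and prosody.
    return Utterance(words=text, prosody="neutral", speaker_id="generic-tts-voice")


def cascade(utt: Utterance) -> Utterance:
    """Run the three-stage cascade: audio -> text -> text -> audio."""
    return tts(mt(asr(utt)))


result = cascade(Utterance("good morning", "warm, rising", "speaker-a"))
print(result)
```

Whatever prosody and speaker go in, the output carries the TTS defaults, because the text-only interfaces between stages cannot represent anything else.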

AURIS replaces this with:
Audio → AURIS → Audio

A single neural network, a single forward pass. Input audio is never converted to intermediate text. Meaning, prosody, voice identity, and conversational context live in the same latent space and emerge together.

### Key Specifications

| Metric | Value |
|--------|-------|
| Languages at launch | 57 |
| Direct translation paths | 1,100+ |
| Latency target | Sub-300ms end-to-end |
| Voice preservation | Yes (timbre, rhythm, pitch, emotion) |
| Intermediate text | None (end-to-end) |
| Training infrastructure | EuroHPC Leonardo |
| Deployment | Cloud or on-premise (single GPU) |
| Audio retention | Zero (processed and discarded) |
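
For a sense of scale, the path count can be compared with the number of pairs that 57 languages allow. This is plain arithmetic rather than anything specific to AURIS, and this document does not say whether "1,100+" counts ordered (source to target) or unordered pairs, so the sketch shows both upper bounds.

```python
from math import comb, perm

LANGUAGES = 57  # "Languages at launch" from the table above

# Each language pair counted once, regardless of direction.
unordered_pairs = comb(LANGUAGES, 2)

# Each direction (source -> target) counted separately.
ordered_pairs = perm(LANGUAGES, 2)

print(unordered_pairs)  # 1596
print(ordered_pairs)    # 3192
```

Both values are theoretical upper bounds; the published figure of 1,100+ remains the one to rely on for actual coverage.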

### Five Core Capabilities

1. **End-to-end speech-to-speech**: No transcription step. Audio maps directly to audio.
2. **Voice morphing**: Extracts the speaker's vocal characteristics during inference and applies them to the target-language output. No voice data is stored.
3. **Sub-300ms latency by design**: Architecture built from the latency budget outward. Every design decision evaluated against this constraint.
4. **Persistent conversational context**: Accumulates context during conversation — register, domain vocabulary, speaker relationships, turn-taking rhythm.
5. **Modular language scalability**: New languages added in days, not months, without retraining the core model.
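
The "latency budget" idea in item 3 can be sketched as a simple check: assign each stage of the audio path a share of the 300 ms target and confirm the total stays under it. The stage names and millisecond values below are illustrative assumptions, not published AURIS measurements.

```python
TARGET_MS = 300.0  # the sub-300ms end-to-end target stated above


def total_latency_ms(stages):
    """Sum the per-stage latencies (in milliseconds) of one audio path."""
    return sum(stages.values())


def within_budget(stages, target_ms=TARGET_MS):
    """Return True when the path is strictly under the target ("sub-300ms")."""
    return total_latency_ms(stages) < target_ms


# Hypothetical budget: every stage name and number here is an assumption.
example_budget = {
    "capture_and_encode": 40.0,
    "uplink_network": 50.0,
    "model_inference": 120.0,
    "downlink_network": 50.0,
    "decode_and_playback": 30.0,
}

print(total_latency_ms(example_budget))  # 290.0
print(within_budget(example_budget))     # True
```

A check like this makes the trade-off explicit: any stage that grows has to take its extra milliseconds from another stage.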

Technology page: https://hitoo.io/auris

---

## 3. Use Cases

### International Business Meetings
Teams spanning multiple countries run meetings where every participant speaks their own language. Translation happens in real time with voice identity preserved. No interpreters needed, no subtitle reading.
https://hitoo.io/use-cases/business-meetings

### Consulting & Professional Services
Consultants maintain personal rapport with international clients. The client hears the consultant's real voice in their language. Trust builds naturally across languages.
https://hitoo.io/use-cases/consulting

### Legal & Compliance
Cross-border depositions, client consultations, and compliance meetings with real-time voice translation. End-to-end encrypted. Zero audio retention. No third-party data exposure.
https://hitoo.io/use-cases/legal

### Healthcare & Telemedicine
Doctor-patient communication across languages. Patient hears the doctor's real voice. Medical-grade accuracy. GDPR compliant for health data.
https://hitoo.io/use-cases/healthcare

### Education & Training
Live lecture translation for international classrooms. Students hear the instructor's real voice. Interactive Q&A across languages in real time.
https://hitoo.io/use-cases/education

### Customer Support
Support agents handle any language without language-specific training. Customer hears a human voice, not a bot. Reduces resolution time by eliminating language misunderstandings.
https://hitoo.io/use-cases/customer-support

---

## 4. Competitive Comparison

### Hitoo vs Zoom
Zoom provides translated captions (text on screen). No voice output. Participants read translations while listening to the original language. Hitoo translates voice to voice with speaker identity preserved.
- Zoom: text-only, reading required, variable latency
- Hitoo: voice output, natural conversation, sub-300ms target
Full comparison: https://hitoo.io/blog/en/hitoo-vs-zoom-real-time-translation

### Hitoo vs Microsoft Teams
Teams offers live captions and translated captions. Text-only, no voice translation. Requires a Microsoft 365 subscription. Hitoo works independently in any browser.
- Teams: captions only, M365 required, Microsoft Translator backend
- Hitoo: voice output, no subscription, proprietary AURIS model
Full comparison: https://hitoo.io/blog/en/hitoo-vs-microsoft-teams-translation

### Hitoo vs Google Meet
Meet provides translated captions via Google Translate. Text-focused, no voice output. Requires Google Workspace for full features. Hitoo is platform-independent.
- Meet: text captions, Google Translate backend, Workspace dependency
- Hitoo: voice output, proprietary model, no platform dependency
Full comparison: https://hitoo.io/blog/en/hitoo-vs-google-meet-translation

---

## 5. Privacy & Security

- **End-to-end encryption**: All audio encrypted from device to device
- **Zero data retention**: Audio processed in real-time and immediately discarded
- **No voice profiling**: Voice characteristics extracted during inference only, never stored
- **No user data training**: Model trained exclusively on publicly licensed datasets
- **No third-party services**: Entire translation pipeline is proprietary
- **GDPR compliant by design**: Data minimization, purpose limitation, storage limitation, right to erasure, data protection by design
- **On-premise deployment available**: For enterprise clients with data sovereignty requirements. Runs on a single GPU accelerator; air-gapped operation is supported.

Privacy policy: https://hitoo.io/privacy

---

## 6. Company Information

| Field | Value |
|-------|-------|
| Company | Hitoo |
| Legal entity | Hitoo S.r.l. |
| Founded | 2025 |
| Location | Mantova, Italy |
| Founder | Matteo Pelosi |
| Website | https://hitoo.io |
| Email | hello@hitoo.io |
| Status | Open beta |
| Trial | 7 days free, no credit card |

### Social Profiles
- LinkedIn: https://www.linkedin.com/company/hitoo
- X (Twitter): https://x.com/HitooApp
- YouTube: https://www.youtube.com/@HitooApp
- Facebook: https://www.facebook.com/hi7oapp
- Instagram: https://www.instagram.com/hitoo_app
- Wikidata: https://www.wikidata.org/wiki/Q139013707

### Featured On
- Product Hunt (Featured)
- BetaList (Featured)
- Uneed

---

## 7. Frequently Asked Questions

Q: What is Hitoo?
A: Hitoo is a real-time AI voice translation platform for video calls. Each participant speaks their own language and hears everyone else in theirs, with the speaker's natural voice preserved.

Q: How is Hitoo different from Zoom or Teams translation?
A: Zoom and Teams provide text captions. Hitoo translates voice to voice. The listener hears the speaker's actual voice in their language, not a robotic substitute or text on screen.

Q: What is AURIS?
A: AURIS is Hitoo's proprietary AI model. It's an end-to-end speech-to-speech system — a single neural network that translates spoken language directly without converting to text first. This preserves voice identity, conversational context, and natural prosody.

Q: How fast is the translation?
A: AURIS targets sub-300ms end-to-end latency. This is below the threshold where humans perceive delay in conversation, allowing natural dialogue flow.

Q: How many languages does Hitoo support?
A: 57 languages at launch with 1,100+ direct translation paths between language pairs. The architecture allows adding new languages in days without retraining the core model.

Q: Is Hitoo secure for confidential business calls?
A: Yes. All audio is end-to-end encrypted. Zero audio retention — nothing is stored after the call ends. The entire pipeline is proprietary with no third-party services. GDPR compliant by design.

Q: Does Hitoo preserve the speaker's voice?
A: Yes. AURIS extracts the speaker's vocal characteristics (timbre, rhythm, pitch, emotional register) and applies them to the translated output. The listener hears the speaker — not a robot.

Q: Do I need to install anything?
A: No. Hitoo works entirely in the browser. No plugins, no downloads, no IT configuration required.

Q: Can Hitoo be deployed on-premise?
A: Yes. For enterprise clients with data sovereignty requirements (finance, healthcare, government, defense), AURIS runs on a single GPU accelerator with no cloud dependency. Air-gapped operation is supported.

Q: Is there a free trial?
A: Yes. Open beta with a 7-day free trial. Sign up at https://hitoo.io/login

Q: Where is Hitoo based?
A: Mantova, Italy. The company operates under Italian and EU law.

---

## 8. Blog & Content

Hitoo publishes articles in 5 languages (English, Italian, Spanish, German, French) covering:
- Real-time AI translation technology
- Multilingual business communication
- Voice identity preservation
- Enterprise translation use cases
- Competitive comparisons

Blog: https://hitoo.io/blog/en
RSS Feed: https://hitoo.io/feed.xml
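
The feed can be read programmatically. The sketch below assumes the feed is RSS 2.0 with `item` elements that contain `title` and `link` children; that structure is an assumption, since this document only gives the URL. The function name `parse_rss_items` and the sample XML are illustrative.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_data):
    """Return a list of {"title", "link"} dicts from RSS 2.0 text or bytes."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child is missing, hence the fallback.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


# Made-up sample standing in for a downloaded feed.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>https://example.com/1</link></item>
<item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

To use it on the live feed, download the bytes (for example with `urllib.request.urlopen`) and pass them to `parse_rss_items`. An Atom feed would need `entry` elements and namespace handling instead.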

---

## 9. Technical Integration

- Browser-based (WebRTC)
- No plugins or extensions
- Works alongside existing video call infrastructure
- REST API (planned, not yet available)
- On-premise deployment option for enterprise

---

## 10. Pricing

Currently in open beta. Free 7-day trial available. Paid plans launching Q3 2026.
Beta registration: https://hitoo.io/login