Building voice-based systems that understand and speak Hebrew is technically challenging due to limited Hebrew voice models, accent diversity, and the need for proper IVR design adapted to Israeli business culture.
Author: @skills-il
Build Hebrew voice bots and IVR systems with speech-to-text, text-to-speech, and telephony integration for Israeli businesses. Covers OpenAI Whisper Hebrew, Google Cloud STT/TTS, Azure Speech, Amazon Polly, IVR menu design for Sunday-Thursday hours, voicemail transcription, accent handling, and +972 phone integration. Do NOT use for text-based chatbots (use hebrew-chatbot-builder) or Hebrew NLP without voice (use hebrew-nlp-toolkit).
npx skills-il add skills-il/developer-tools --skill hebrew-voice-bot-builder
Build production-ready Hebrew voice bots and IVR systems for Israeli businesses. This skill covers the full voice pipeline: speech-to-text (STT), text-to-speech (TTS), IVR flow design, telephony integration, and Hebrew-specific challenges like accent handling and mixed Hebrew-English speech.
Before building, decide on the voice bot architecture based on the use case:
| Architecture | Best For | Components |
|---|---|---|
| IVR (keypad) | Simple menu navigation, payment lines, appointment scheduling | TTS + DTMF + telephony |
| Voice bot (conversational) | Customer service, order status, FAQ handling | STT + LLM + TTS + telephony |
| Voicemail transcription | Missed call handling, message routing | STT + notification pipeline |
| Hybrid | Complex flows with both speech and keypad input | STT + TTS + DTMF + telephony |
Key decisions:
Whisper provides the best Hebrew transcription accuracy, especially for mixed Hebrew-English speech common in Israeli tech environments.
import openai

client = openai.OpenAI()


def transcribe_hebrew(audio_file_path: str) -> str:
    """Transcribe Hebrew audio using OpenAI Whisper."""
    with open(audio_file_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="he",  # Force Hebrew instead of relying on auto-detection
            response_format="text",
        )
    return transcript


def transcribe_hebrew_with_timestamps(audio_file_path: str) -> dict:
    """Transcribe with word-level timestamps for subtitle generation."""
    with open(audio_file_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="he",
            response_format="verbose_json",
            timestamp_granularities=["word"],
        )
    return transcript

Whisper Hebrew tips:
Set language="he" explicitly to avoid misdetecting Hebrew as Arabic.
Google Cloud Speech-to-Text has lower latency than Whisper, which makes it suitable for real-time voice bots.
from google.cloud import speech_v1


def transcribe_hebrew_google(audio_content: bytes) -> str:
    """Transcribe Hebrew audio using Google Cloud STT."""
    client = speech_v1.SpeechClient()
    audio = speech_v1.RecognitionAudio(content=audio_content)
    config = speech_v1.RecognitionConfig(
        encoding=speech_v1.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="he-IL",
        # Enable automatic punctuation for Hebrew
        enable_automatic_punctuation=True,
        # Model optimized for phone calls
        model="phone_call",
        # Enable word-level confidence scores
        enable_word_confidence=True,
    )
    response = client.recognize(config=config, audio=audio)
    results = []
    for result in response.results:
        results.append(result.alternatives[0].transcript)
    return " ".join(results)


def stream_transcribe_hebrew(audio_generator):
    """Real-time streaming transcription for live phone calls."""
    client = speech_v1.SpeechClient()
    config = speech_v1.StreamingRecognitionConfig(
        config=speech_v1.RecognitionConfig(
            encoding=speech_v1.RecognitionConfig.AudioEncoding.MULAW,
            sample_rate_hertz=8000,  # Standard phone audio
            language_code="he-IL",
            model="phone_call",
            enable_automatic_punctuation=True,
        ),
        interim_results=True,  # Get partial results for faster response
    )
    streaming_config = speech_v1.StreamingRecognizeRequest(
        streaming_config=config
    )

    def request_generator():
        # The first request carries the config; the rest carry audio chunks
        yield streaming_config
        for chunk in audio_generator:
            yield speech_v1.StreamingRecognizeRequest(audio_content=chunk)

    responses = client.streaming_recognize(requests=request_generator())
    for response in responses:
        for result in response.results:
            if result.is_final:
                yield result.alternatives[0].transcript

Azure Speech Services is enterprise-grade, with custom model training for domain-specific Hebrew vocabulary.
import azure.cognitiveservices.speech as speechsdk


def transcribe_hebrew_azure(audio_file_path: str) -> str:
    """Transcribe Hebrew audio using Azure Speech Services."""
    speech_config = speechsdk.SpeechConfig(
        subscription="YOUR_AZURE_KEY",
        region="westeurope",  # Closest region to Israel
    )
    speech_config.speech_recognition_language = "he-IL"
    audio_config = speechsdk.AudioConfig(filename=audio_file_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config,
    )
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    elif result.reason == speechsdk.ResultReason.NoMatch:
        return ""
    else:
        raise RuntimeError(f"Speech recognition failed: {result.reason}")

Consult references/hebrew-stt-models.md for a detailed comparison of STT providers with accuracy benchmarks.
from google.cloud import texttospeech


def synthesize_hebrew(text: str, output_path: str, voice_gender: str = "female") -> None:
    """Convert Hebrew text to speech using Google Cloud TTS."""
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    # Available Hebrew voices
    voice_name_map = {
        "female": "he-IL-Wavenet-A",  # Female, high quality
        "male": "he-IL-Wavenet-B",  # Male, high quality
        "female_standard": "he-IL-Standard-A",  # Female, lower cost
        "male_standard": "he-IL-Standard-B",  # Male, lower cost
    }
    voice = texttospeech.VoiceSelectionParams(
        language_code="he-IL",
        name=voice_name_map.get(voice_gender, "he-IL-Wavenet-A"),
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=1.0,  # 0.5 to 2.0, adjust for clarity
        pitch=0.0,  # -20.0 to 20.0 semitones
    )
    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )
    with open(output_path, "wb") as out:
        out.write(response.audio_content)

Amazon Polly is cost-effective for high-volume TTS needs.
import boto3


def synthesize_hebrew_polly(text: str, output_path: str) -> None:
    """Convert Hebrew text to speech using Amazon Polly."""
    polly = boto3.client("polly", region_name="eu-west-1")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Abigail",  # Hebrew female neural voice
        Engine="neural",  # Neural engine for higher quality
        LanguageCode="he-IL",
    )
    with open(output_path, "wb") as out:
        out.write(response["AudioStream"].read())

Azure Neural TTS offers the highest-quality Hebrew voices, with SSML support for fine-grained control.
import azure.cognitiveservices.speech as speechsdk


def synthesize_hebrew_azure(text: str, output_path: str) -> None:
    """Convert Hebrew text to speech using Azure Neural TTS."""
    speech_config = speechsdk.SpeechConfig(
        subscription="YOUR_AZURE_KEY",
        region="westeurope",
    )
    # Hebrew neural voices
    speech_config.speech_synthesis_voice_name = "he-IL-HilaNeural"  # Female
    # Alternative: "he-IL-AvriNeural" for male voice
    audio_config = speechsdk.AudioConfig(filename=output_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        audio_config=audio_config,
    )
    result = synthesizer.speak_text(text)
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Speech synthesis failed: {result.reason}")


def synthesize_hebrew_ssml(ssml: str, output_path: str) -> None:
    """
    Synthesize Hebrew speech with SSML for fine control.

    Example SSML for IVR prompt:
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="he-IL">
        <voice name="he-IL-HilaNeural">
            <prosody rate="0.9">
                ברוכים הבאים לשירות הלקוחות.
            </prosody>
            <break time="500ms"/>
            לתמיכה טכנית, הקישו 1.
            <break time="300ms"/>
            למכירות, הקישו 2.
        </voice>
    </speak>
    """
    speech_config = speechsdk.SpeechConfig(
        subscription="YOUR_AZURE_KEY",
        region="westeurope",
    )
    audio_config = speechsdk.AudioConfig(filename=output_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        audio_config=audio_config,
    )
    result = synthesizer.speak_ssml(ssml)
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"SSML synthesis failed: {result.reason}")

Israeli IVR systems have specific conventions that differ from US/European patterns.
Israeli business week is Sunday through Thursday. IVR systems must account for this:
from datetime import datetime

import pytz

ISRAEL_TZ = pytz.timezone("Asia/Jerusalem")


def get_business_status() -> dict:
    """Determine current business status for IVR routing."""
    now = datetime.now(ISRAEL_TZ)
    day = now.weekday()  # 0=Monday, 6=Sunday
    hour = now.hour
    # Israeli business days: Sunday (6) through Thursday (3)
    # Friday (4): half day until ~13:00
    # Saturday (5): closed (Shabbat)
    if day == 5:  # Saturday (Shabbat)
        return {
            "status": "closed",
            "reason": "shabbat",
            "message_he": "שלום, אנחנו סגורים בשבת. נחזור אליכם ביום ראשון.",
            "next_open": "Sunday 9:00",
        }
    elif day == 4:  # Friday
        if hour < 9:
            return {"status": "before_hours", "message_he": "שעות הפעילות ביום שישי: 9:00 עד 13:00."}
        elif hour < 13:
            return {"status": "open", "message_he": "שלום, איך אפשר לעזור?"}
        else:
            return {
                "status": "closed",
                "reason": "friday_afternoon",
                "message_he": "סגורים בשישי אחה\"צ. נחזור ביום ראשון.",
                "next_open": "Sunday 9:00",
            }
    elif day == 6 or day <= 3:  # Sunday through Thursday
        if 9 <= hour < 17:
            return {"status": "open", "message_he": "שלום, איך אפשר לעזור?"}
        else:
            return {
                "status": "after_hours",
                "message_he": "שעות הפעילות שלנו: א'-ה' 9:00-17:00, ו' 9:00-13:00.",
            }
    else:  # Should not happen but handle gracefully
        return {"status": "closed", "message_he": "כרגע אנחנו סגורים."}

IVR_MENU = {
    "welcome": {
        "prompt_he": "שלום, הגעתם ל{company_name}.",
        "prompt_en": "Hello, you've reached {company_name}. For English, press 9.",
    },
    "main_menu": {
        "prompt_he": (
            "לשירות לקוחות, הקישו 1. "
            "למכירות, הקישו 2. "
            "לתמיכה טכנית, הקישו 3. "
            "למצב הזמנה, הקישו 4. "
            "לשמוע שוב, הקישו כוכבית."
        ),
        "options": {
            "1": "customer_service",
            "2": "sales",
            "3": "tech_support",
            "4": "order_status",
            "9": "english_menu",
            "*": "main_menu",  # Repeat
        },
        "timeout_seconds": 8,
        "no_input_prompt_he": "לא קיבלנו בחירה. בבקשה הקישו מספר מ-1 עד 4.",
        "invalid_prompt_he": "בחירה לא תקינה. נסו שוב.",
        "max_retries": 3,
    },
    "customer_service": {
        "prompt_he": (
            "לבירור חשבון, הקישו 1. "
            "לתלונה, הקישו 2. "
            "לנציג, הקישו 0. "
            "לחזרה לתפריט הראשי, הקישו כוכבית."
        ),
        "options": {
            "1": "account_inquiry",
            "2": "complaint",
            "0": "agent_queue",
            "*": "main_menu",
        },
    },
    "agent_queue": {
        "prompt_he": "ממתינים לנציג הפנוי הבא. זמן המתנה משוער: {wait_time} דקות.",
        "hold_music": "hold_music_hebrew.mp3",
        "periodic_message_he": "תודה שאתם ממתינים. שיחתכם חשובה לנו.",
        "periodic_interval_seconds": 60,
    },
}

| Rule | Example | Why |
|---|---|---|
| Use formal register (second person plural) | "הקישו 1" not "תקיש 1" | Professional tone, avoids gender |
| Keep prompts under 15 seconds | 3-4 options max per menu level | Callers lose patience quickly |
| Announce hours before after-hours message | "שעות הפעילות: א'-ה' 9-17" | Reduces callback attempts |
| Offer English option | "For English, press 9" | Up to 20% of Israeli callers may prefer English |
| Use "כוכבית" for star key | "לחזרה, הקישו כוכבית" | Standard Hebrew term for * |
| Use "סולמית" for hash/pound key | "לאישור, הקישו סולמית" | Standard Hebrew term for # |
| Repeat the menu on timeout | After 8 seconds of no input | Callers may need time to listen |
| Provide voicemail option after hours | "להשאיר הודעה, הקישו 1" | Captures leads outside business hours |
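The prompt rules above can be applied mechanically when a menu level is rendered to SSML. The following is a minimal sketch, assuming the Azure-style SSML shown earlier; `build_menu_ssml`, `KEY_NAMES`, and the four-option cap are illustrative names and choices, not part of any SDK.

```python
from xml.sax.saxutils import escape

# Hebrew names for the non-digit keys, as listed in the rules table.
KEY_NAMES = {"*": "כוכבית", "#": "סולמית"}


def build_menu_ssml(greeting, options, voice="he-IL-HilaNeural",
                    rate="0.9", pause_ms=300):
    """Render a greeting plus (key, phrase) options as one SSML document.

    Each phrase is expected to already be in the formal plural register,
    e.g. ("1", "לתמיכה טכנית").
    """
    if len(options) > 4:
        raise ValueError("keep each menu level to 4 options or fewer")
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="he-IL">',
        f'<voice name="{voice}">',
        f'<prosody rate="{rate}">{escape(greeting)}</prosody>',
    ]
    for key, phrase in options:
        spoken_key = KEY_NAMES.get(key, key)
        # A short pause before every option keeps the choices distinct.
        parts.append(f'<break time="{pause_ms}ms"/>')
        parts.append(escape(f"{phrase}, הקישו {spoken_key}."))
    parts += ["</voice>", "</speak>"]
    return "\n".join(parts)
```

The resulting string can be passed to a function such as `synthesize_hebrew_ssml`. Escaping the text guards against characters like `&` in company names breaking the XML.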
import os
import json
from datetime import datetime

# extract_voicemail_entities, get_audio_duration, and route_voicemail are
# application-specific helpers that are not shown here.


def process_voicemail(audio_path: str, caller_number: str) -> dict:
    """
    Process a voicemail recording: transcribe, classify, and route.

    Args:
        audio_path: Path to the voicemail audio file
        caller_number: Caller's phone number (+972...)

    Returns:
        Processed voicemail with transcript and routing info
    """
    # Step 1: Transcribe using Whisper (best Hebrew accuracy)
    transcript = transcribe_hebrew(audio_path)
    # Step 2: Detect language (Hebrew, English, or mixed)
    language = detect_voicemail_language(transcript)
    # Step 3: Classify intent
    intent = classify_voicemail_intent(transcript)
    # Step 4: Extract key entities
    entities = extract_voicemail_entities(transcript)
    result = {
        "caller": caller_number,
        "timestamp": datetime.now().isoformat(),
        "transcript": transcript,
        "language": language,
        "intent": intent,
        "entities": entities,
        "audio_path": audio_path,
        "duration_seconds": get_audio_duration(audio_path),
    }
    # Step 5: Route based on intent
    result["routing"] = route_voicemail(intent, entities)
    return result


def detect_voicemail_language(text: str) -> str:
    """Detect whether voicemail is Hebrew, English, or mixed."""
    hebrew_chars = sum(1 for c in text if "\u0590" <= c <= "\u05FF")
    latin_chars = sum(1 for c in text if c.isascii() and c.isalpha())
    total = hebrew_chars + latin_chars
    if total == 0:
        return "unknown"
    hebrew_ratio = hebrew_chars / total
    if hebrew_ratio > 0.7:
        return "hebrew"
    elif hebrew_ratio < 0.3:
        return "english"
    else:
        return "mixed"


VOICEMAIL_INTENTS = {
    "callback_request": ["תתקשרו", "תחזרו", "חזרו אליי", "תתקשר"],
    "order_inquiry": ["הזמנה", "משלוח", "חבילה", "מעקב"],
    "complaint": ["תלונה", "בעיה", "לא מרוצה", "לא עובד"],
    "appointment": ["תור", "פגישה", "לקבוע", "לתאם"],
    "general": [],
}


def classify_voicemail_intent(transcript: str) -> str:
    """Classify voicemail intent based on Hebrew keywords."""
    # The first intent with a matching keyword wins (dict order)
    for intent, keywords in VOICEMAIL_INTENTS.items():
        if any(keyword in transcript for keyword in keywords):
            return intent
    return "general"

Israeli tech professionals frequently switch between Hebrew and English mid-sentence (code-switching). Voice bots must handle this gracefully.
import openai


def handle_mixed_speech(audio_path: str) -> dict:
    """
    Handle mixed Hebrew-English speech common in Israeli tech.

    Strategy: Use Whisper without language hint for auto-detection,
    then post-process to normalize mixed output.
    """
    client = openai.OpenAI()
    with open(audio_path, "rb") as f:
        # Omit language parameter to let Whisper handle code-switching
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",
        )
    segments = []
    for segment in transcript.segments:
        # Recent openai SDK versions return segment objects, so use
        # attribute access rather than dict-style indexing
        text = segment.text
        # Reuse the character-ratio detector defined for voicemails
        lang = detect_voicemail_language(text)
        segments.append({
            "text": text,
            "language": lang,
            "start": segment.start,
            "end": segment.end,
        })
    return {
        "full_transcript": transcript.text,
        "segments": segments,
        "detected_languages": list(set(s["language"] for s in segments)),
    }


# Common Hebrew-English tech phrases that Whisper may mishandle
HEBREW_ENGLISH_CORRECTIONS = {
    "דיפלוי": "deploy",  # Hebrew-accented English
    "פוש": "push",  # No vowel points: STT output is unpointed
    "קומיט": "commit",
    "סרבר": "server",
    "באג": "bug",
    "פיצ'ר": "feature",
    "אפליקציה": "application",
    "דאטהבייס": "database",
}

from twilio.rest import Client
from twilio.twiml.voice_response import VoiceResponse, Gather

TWILIO_ACCOUNT_SID = "YOUR_SID"
TWILIO_AUTH_TOKEN = "YOUR_TOKEN"
client = Client(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN)


def purchase_israeli_number():
    """Purchase an Israeli phone number from Twilio."""
    numbers = client.available_phone_numbers("IL").local.list(limit=5)
    if numbers:
        purchased = client.incoming_phone_numbers.create(
            phone_number=numbers[0].phone_number,
            voice_url="https://your-server.com/voice/incoming",
            voice_method="POST",
        )
        return purchased.phone_number
    return None


# Flask webhook handler for incoming calls
from flask import Flask, request

app = Flask(__name__)


@app.route("/voice/incoming", methods=["POST"])
def handle_incoming_call():
    """Handle incoming call with Hebrew IVR menu."""
    response = VoiceResponse()
    # Welcome message in Hebrew
    response.say(
        "שלום, הגעתם לשירות הלקוחות.",
        language="he-IL",
        voice="Google.he-IL-Wavenet-A",
    )
    # Gather DTMF input with Hebrew prompt
    gather = Gather(
        num_digits=1,
        action="/voice/menu-selection",
        timeout=8,
        language="he-IL",
    )
    gather.say(
        "לשירות לקוחות, הקישו 1. למכירות, הקישו 2. לתמיכה טכנית, הקישו 3.",
        language="he-IL",
        voice="Google.he-IL-Wavenet-A",
    )
    response.append(gather)
    # If no input, repeat
    response.redirect("/voice/incoming")
    return str(response)


@app.route("/voice/menu-selection", methods=["POST"])
def handle_menu_selection():
    """Route based on DTMF selection."""
    digit = request.form.get("Digits", "")
    response = VoiceResponse()
    routes = {
        "1": "/voice/customer-service",
        "2": "/voice/sales",
        "3": "/voice/tech-support",
    }
    if digit in routes:
        response.redirect(routes[digit])
    else:
        response.say(
            "בחירה לא תקינה. בבקשה נסו שוב.",
            language="he-IL",
            voice="Google.he-IL-Wavenet-A",
        )
        response.redirect("/voice/incoming")
    return str(response)


@app.route("/voice/voicemail", methods=["POST"])
def handle_voicemail():
    """Record a voicemail with Hebrew instructions."""
    response = VoiceResponse()
    response.say(
        "אנחנו כרגע לא זמינים. בבקשה השאירו הודעה אחרי הצפצוף ונחזור אליכם בהקדם.",
        language="he-IL",
        voice="Google.he-IL-Wavenet-A",
    )
    response.record(
        max_length=120,  # 2 minutes max
        action="/voice/voicemail-complete",
        transcribe=False,  # We handle transcription ourselves for better Hebrew
        play_beep=True,
    )
    return str(response)

Hebrew speakers in Israel have diverse accent backgrounds that affect speech recognition accuracy.
| Accent Type | Characteristics | STT Impact |
|---|---|---|
| Standard Israeli | Modern Israeli pronunciation, merged alef/ayin, no distinction between chet/chaf | Baseline accuracy, all models handle well |
| Russian-accented | Hard "r" (guttural to alveolar), softer sibilants, vowel shifts | May reduce accuracy by 5-10%. Add Russian as alternate language hint |
| Arabic-accented | Preserved pharyngeal sounds (ayin, chet), emphatic consonants | Generally handled well by models trained on Israeli data |
| Ethiopian-accented | Distinct vowel patterns, different stress patterns | May need custom model training for high accuracy |
| English-accented | American/British vowel sounds applied to Hebrew, different "r" | Mixed results. Whisper handles best due to multilingual training |
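Because some accents lower recognition accuracy, a voice bot can ask the caller to repeat rather than act on a shaky transcript. The sketch below assumes per-word confidence scores between 0 and 1, such as those an STT provider returns when word confidence is enabled. `needs_reprompt` and both thresholds are hypothetical and should be tuned on real call recordings.

```python
def needs_reprompt(word_confidences, min_mean=0.75, min_word=0.4):
    """Return True when a transcript looks too uncertain to act on.

    word_confidences: per-word confidence scores in [0, 1].
    min_mean: assumed floor for the average confidence.
    min_word: assumed floor for any single word.
    """
    if not word_confidences:
        # Nothing was recognized, so ask the caller to repeat.
        return True
    mean = sum(word_confidences) / len(word_confidences)
    return mean < min_mean or min(word_confidences) < min_word
```

Checking the weakest word as well as the mean catches transcripts where one key term, such as a name or an order number, was likely misheard.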
Improving accuracy for non-standard accents:
Run the demo script to test Hebrew STT with sample audio:
python scripts/hebrew-stt-demo.py --help
User says: "I need an IVR system for a restaurant in Tel Aviv. Callers should be able to make reservations, check hours, and hear the menu."
Actions:
Result: Complete IVR system with Hebrew prompts, business-hours-aware routing, and voicemail transcription.
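Business-hours-aware routing is easier to keep correct when the next opening time is computed rather than hardcoded. This is a minimal sketch using the standard-library `zoneinfo` module and the Sunday-Thursday 9:00-17:00, Friday 9:00-13:00 schedule described earlier. `OPENING_HOURS` and `next_open` are illustrative names, and public holidays are not handled.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

ISRAEL_TZ = ZoneInfo("Asia/Jerusalem")

# Keys follow datetime.weekday(): 0=Monday ... 6=Sunday.
# Saturday (5) is absent because the line is closed on Shabbat.
OPENING_HOURS = {
    6: (time(9), time(17)),  # Sunday
    0: (time(9), time(17)),
    1: (time(9), time(17)),
    2: (time(9), time(17)),
    3: (time(9), time(17)),  # Thursday
    4: (time(9), time(13)),  # Friday, half day
}


def next_open(now: datetime) -> datetime:
    """Return `now` if the line is open, otherwise the next opening time."""
    now = now.astimezone(ISRAEL_TZ)
    for offset in range(8):
        day = (now + timedelta(days=offset)).date()
        hours = OPENING_HOURS.get(day.weekday())
        if hours is None:
            continue
        opens = datetime.combine(day, hours[0], tzinfo=ISRAEL_TZ)
        closes = datetime.combine(day, hours[1], tzinfo=ISRAEL_TZ)
        if offset == 0 and opens <= now < closes:
            return now
        if now < opens:
            return opens
    raise RuntimeError("no opening hours configured")
```

The returned datetime can be formatted into the after-hours prompt, so the spoken message stays accurate if the schedule changes.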
User says: "Build a conversational voice bot for our e-commerce site. It should handle order status, returns, and escalate to a human agent."
Actions:
Result: Conversational voice bot that understands Hebrew speech, provides order information, and seamlessly escalates to human agents.
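Escalation to a human agent needs an explicit trigger. One possible rule, sketched below, hands off when the caller asks for an agent or when the bot fails to understand several turns in a row. `AGENT_KEYWORDS`, `CallState`, `should_escalate`, and the two-turn default are assumptions for illustration only.

```python
from dataclasses import dataclass

# Phrases treated as an explicit request for a human (illustrative list).
AGENT_KEYWORDS = ("נציג", "בן אדם", "agent", "representative")


@dataclass
class CallState:
    """Per-call counter of consecutive turns the bot did not understand."""
    failed_turns: int = 0


def should_escalate(transcript: str, understood: bool, state: CallState,
                    max_failed_turns: int = 2) -> bool:
    """Decide whether to transfer the caller to the agent queue."""
    text = transcript.lower()
    if any(keyword in text for keyword in AGENT_KEYWORDS):
        return True
    if understood:
        # A successful turn resets the failure streak.
        state.failed_turns = 0
        return False
    state.failed_turns += 1
    return state.failed_turns >= max_failed_turns
```

Here `understood` would come from the intent step of the pipeline, for example whether the LLM or classifier returned a known intent.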
User says: "I want to transcribe voicemails left on our business line and send them as text messages to the relevant department."
Actions:
Result: Automated voicemail-to-text pipeline that transcribes Hebrew voicemails and routes them by intent.
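The routing step in this pipeline can start as a plain lookup from intent to department. The sketch below shows one hypothetical shape for the `route_voicemail` helper that `process_voicemail` calls. The department names, the priority rule, and `format_sms` are assumptions, not part of the original skill.

```python
# Intent labels match VOICEMAIL_INTENTS; department names are illustrative.
DEPARTMENT_BY_INTENT = {
    "callback_request": "customer_service",
    "order_inquiry": "orders",
    "complaint": "customer_service",
    "appointment": "scheduling",
    "general": "front_desk",
}
URGENT_INTENTS = {"complaint"}


def route_voicemail(intent: str, entities: dict) -> dict:
    """Pick a department and priority for a classified voicemail."""
    department = DEPARTMENT_BY_INTENT.get(intent, DEPARTMENT_BY_INTENT["general"])
    return {
        "department": department,
        "priority": "high" if intent in URGENT_INTENTS else "normal",
        "order_id": entities.get("order_id"),
    }


def format_sms(caller: str, transcript: str, routing: dict,
               max_chars: int = 160) -> str:
    """Build a short text notification, truncating long transcripts."""
    message = f"[{routing['department']}] {caller}: {transcript}"
    if len(message) > max_chars:
        message = message[: max_chars - 1] + "…"
    return message
```

Unknown intents fall back to the general department, so a new intent label never leaves a voicemail unrouted.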
User says: "Our callers frequently mix Hebrew and English, especially tech terms. How do I handle this?"
Actions:
Result: Voice bot that correctly transcribes mixed Hebrew-English speech common in Israeli tech environments.
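A corrections table like `HEBREW_ENGLISH_CORRECTIONS` only helps once it is applied to the transcript. The following sketch replaces whole-word matches with a single regular expression. `normalize_tech_terms` is an illustrative name, and the small `CORRECTIONS` dict stands in for the full table so the example is self-contained.

```python
import re

# Subset of the corrections table, repeated here for a runnable example.
CORRECTIONS = {
    "דיפלוי": "deploy",
    "קומיט": "commit",
    "סרבר": "server",
}


def normalize_tech_terms(transcript: str, corrections=CORRECTIONS) -> str:
    """Replace transliterated tech terms with their English spelling.

    Only whole words are replaced. Words carrying an attached Hebrew
    prefix (for example "הסרבר") are deliberately left untouched.
    """
    # Longest keys first so a longer term wins over a shorter one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: corrections[m.group(0)], transcript)
```

Matching whole words avoids rewriting the middle of an unrelated Hebrew word. Handling attached prefixes such as ה or ל would need extra rules.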
scripts/hebrew-stt-demo.py -- Demo script for Hebrew speech-to-text using OpenAI Whisper. Generates a sample Hebrew audio file using TTS and transcribes it back to text. Tests basic Hebrew STT accuracy. Run: python scripts/hebrew-stt-demo.py --help
references/hebrew-stt-models.md -- Comparison table of Hebrew speech-to-text models (Whisper, Google Cloud STT, Azure Speech) with accuracy benchmarks, latency, pricing, and recommendations by use case. Consult when choosing an STT provider.
references/ivr-design-patterns.md -- Common IVR flow patterns for Israeli businesses including restaurant, clinic, customer service, and government office templates. Consult when designing IVR menu structures.
Cause: The STT model misidentifies Hebrew as Arabic because the two languages have similar phonemes.
Solution: Explicitly set the language to "he-IL" (Google/Azure) or language="he" (Whisper). For Whisper, adding a Hebrew prompt hint also helps: prompt="שלום, ברוכים הבאים".
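For Whisper, both settings can be bundled into one set of request arguments. The sketch below only builds the keyword arguments and makes no API call. `whisper_request_kwargs` and its `vocabulary` parameter are illustrative, while `model`, `language`, `prompt`, and `response_format` are parameters of the transcription endpoint used earlier.

```python
def whisper_request_kwargs(language="he", vocabulary=(),
                           base_prompt="שלום, ברוכים הבאים"):
    """Build keyword arguments for client.audio.transcriptions.create().

    language: ISO 639-1 code; "he" pins the transcription to Hebrew.
    vocabulary: optional domain terms appended to the Hebrew prompt hint.
    """
    prompt = base_prompt
    if vocabulary:
        prompt += " " + ", ".join(vocabulary)
    return {
        "model": "whisper-1",
        "language": language,
        "prompt": prompt,
        "response_format": "text",
    }
```

A call would then look like `client.audio.transcriptions.create(file=audio_file, **whisper_request_kwargs())`.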
Cause: Using Standard-tier voices instead of Neural/Wavenet voices.
Solution: Switch to neural voices: Google Wavenet (he-IL-Wavenet-A/B), Azure Neural (he-IL-HilaNeural), or Amazon Polly Neural (Abigail). Neural voices are more expensive but significantly more natural.
Cause: Timeout too short, especially for elderly callers or long Hebrew prompts.
Solution: Increase the gather timeout to 8-10 seconds. Add a "repeat" option ("לשמוע שוב, הקישו כוכבית"). Hebrew prompts can run longer than their English equivalents when read aloud, so time each prompt with the actual TTS voice.
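Timeouts and invalid keys are easier to reason about when retry handling is a pure function over the menu definition. The sketch below assumes a menu shaped like `IVR_MENU`, with `options` and `max_retries` per level. `handle_menu_input` and the fallback to `agent_queue` once retries run out are illustrative choices.

```python
def handle_menu_input(menu, state, digit, retries=0, fallback="agent_queue"):
    """Return (next_state, prompt_key, retries) for one caller input.

    digit: the pressed key, or "" / None when the gather timed out.
    prompt_key: which error prompt to play before repeating the menu,
    or None when the input was valid.
    """
    node = menu[state]
    options = node.get("options", {})
    if digit in options:
        return options[digit], None, 0
    retries += 1
    if retries >= node.get("max_retries", 3):
        # Assumed policy: hand the caller to a person instead of looping.
        return fallback, None, 0
    prompt_key = "invalid_prompt_he" if digit else "no_input_prompt_he"
    return state, prompt_key, retries
```

The webhook handler can carry `retries` between requests, for example as a query parameter on the redirect URL, so the function itself stays stateless.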
Cause: Israeli number availability varies. Twilio has limited +972 inventory compared to US numbers.
Solution: Search for both local and toll-free numbers. Consider Vonage as an alternative with better Israeli number availability. For high-volume needs, contact Twilio sales for dedicated number blocks. You can also port existing Israeli numbers to Twilio.
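Whichever provider supplies the number, caller IDs and callback numbers are easier to store and compare in E.164 form (+972...). The sketch below normalizes common Israeli notations. `to_e164_il` is an illustrative helper, and the 8 to 9 digit length check is a simplification that excludes short codes and star numbers.

```python
import re


def to_e164_il(raw: str) -> str:
    """Normalize an Israeli phone number to E.164, e.g. +972541234567."""
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("+972"):
        national = digits[4:]
    elif digits.startswith("00972"):
        national = digits[5:]
    elif digits.startswith("972"):
        national = digits[3:]
    elif digits.startswith("0"):
        national = digits[1:]
    else:
        raise ValueError(f"not an Israeli number: {raw!r}")
    # Drop a trunk zero written after the country code, as in +972 (0)54...
    national = national.lstrip("0")
    if not re.fullmatch(r"\d{8,9}", national):
        raise ValueError(f"unexpected length: {raw!r}")
    return "+972" + national
```

For example, `to_e164_il("054-123-4567")` and `to_e164_il("+972 54 123 4567")` both yield `+972541234567`.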
Set up a Hebrew speech-to-text (STT) pipeline. Compare available models (Whisper, Google Cloud Speech, Azure), optimize for Israeli terminology and phone numbers, and add handling for diverse accents.
Design a Hebrew IVR menu for a [business type] with [number] departments. Include a greeting, main menu, and sub-menus. Ensure natural-sounding Hebrew (not robotic), professional business language, and an option to return to the main menu.
Build an automated Hebrew voicemail transcription pipeline. Include speech recognition, punctuation, name and phone number detection, and automatic message summarization. Handle mixed Hebrew-English messages.
Set up Twilio integration with Hebrew speech recognition and text-to-speech. Include purchasing an Israeli phone number, configuring a webhook for incoming calls, connecting to a Hebrew STT model, and setting up TTS with a natural Hebrew voice.