What is the difference between ivrit.ai and Dicta?

ivrit.ai focuses on Hebrew speech: large audio corpora, Whisper-based ASR models, and diarization. Dicta focuses on Hebrew NLP for text: LLMs (the DictaLM 3.0 family), BERT models (the DictaBERT family), and benchmarks. Both are leading Israeli organizations, and their work is complementary rather than competing.

Can I use ivrit.ai data for a commercial product?

Yes. ivrit.ai explicitly licenses its resources to allow commercial use, in line with its stated mission of enabling commercial support for Hebrew AI. Always confirm the terms on the specific dataset card and plan for attribution.

Why is HeQ on HuggingFace under pig4431, not NNLP-IL?

pig4431/HeQ_v1 is a community-maintained HuggingFace mirror. The canonical source is NNLP-IL/Hebrew-Question-Answering-Dataset on GitHub. Use the HuggingFace ID for loading, but cite Cohen et al. (EMNLP 2023) in publications.
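A minimal sketch of loading HeQ through the mirror while keeping the provenance next to the code. It assumes the Hugging Face `datasets` library is installed and that network access is available when `load_heq` is called. The helper names `load_heq` and `describe_heq` are hypothetical, and no split names are assumed, so inspect the returned object before indexing into it.

```python
# Identifiers taken from the answer above.
HEQ_HF_ID = "pig4431/HeQ_v1"                                      # community mirror
HEQ_CANONICAL_REPO = "NNLP-IL/Hebrew-Question-Answering-Dataset"  # on GitHub
HEQ_CITATION = "Cohen et al., EMNLP 2023"


def load_heq():
    """Load HeQ from the mirror (needs network and `pip install datasets`)."""
    # Imported lazily so the constants above stay usable without the library.
    from datasets import load_dataset

    return load_dataset(HEQ_HF_ID)


def describe_heq() -> str:
    """Return a one-line provenance note to store alongside experiment configs."""
    return (
        f"HeQ loaded from {HEQ_HF_ID}; "
        f"canonical source {HEQ_CANONICAL_REPO}; cite {HEQ_CITATION}."
    )
```

Calling `print(load_heq())` lists whatever splits the mirror exposes, which is a safer starting point than hard-coding a split name.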

Do DictaLM 24B and DictaLM 12B share the same license?

No. DictaLM-3.0-24B derives from Mistral-Small-3.1 (Mistral license), while DictaLM-3.0-Nemotron-12B derives from NVIDIA Nemotron Nano V2 (NVIDIA license). In addition, Dicta applies its own license to the derived work. For either model, read both the upstream license and Dicta's license before commercial use.
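The two-layer licensing described above can be written down as a small lookup, so a review checklist always includes both the upstream license and Dicta's own. This is an illustrative sketch: the license strings are labels copied from the answer, not legal names, and `licenses_to_review` is a hypothetical helper.

```python
# Dicta's own license applies on top of whichever upstream license a model inherits.
DICTA_LICENSE = "Dicta license on the derived work"

# Upstream lineage as stated above: model -> (base model, upstream license label).
UPSTREAM = {
    "DictaLM-3.0-24B": ("Mistral-Small-3.1", "Mistral license"),
    "DictaLM-3.0-Nemotron-12B": ("NVIDIA Nemotron Nano V2", "NVIDIA license"),
}


def licenses_to_review(model: str) -> list[str]:
    """Return every license to read before commercial use of `model`."""
    base, upstream_license = UPSTREAM[model]
    return [f"{upstream_license} (via {base})", DICTA_LICENSE]


for name in UPSTREAM:
    print(name, "->", licenses_to_review(name))
```

An unknown model name raises `KeyError`, which is deliberate here: a model with unrecorded lineage should not pass a license check silently.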

Can I train a Hebrew model on Yiddish data?

Not without explicit cross-lingual transfer planning. Yiddish and Hebrew share an alphabet but are different languages with different vocabulary, grammar, and morphology. ivrit.ai maintains separate yi-whisper models for exactly this reason.
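One practical consequence of the shared alphabet is that a script-level filter cannot tell the two languages apart. The sketch below is illustrative only: it checks whether text falls in the Unicode Hebrew block (U+0590 to U+05FF) and shows that a Hebrew word and a Yiddish word both pass. The function name is hypothetical, and the check ignores Hebrew presentation-form characters outside that block.

```python
def in_hebrew_block(text: str) -> bool:
    """True if every non-space character is in the Unicode Hebrew block."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all("\u0590" <= c <= "\u05ff" for c in chars)


# Written with escapes so the code is unaffected by right-to-left rendering.
hebrew_word = "\u05e2\u05d1\u05e8\u05d9\u05ea"         # "Ivrit", the Hebrew word for Hebrew
yiddish_word = "\u05d9\u05d9\u05b4\u05d3\u05d9\u05e9"  # "Yidish", the Yiddish word for Yiddish

# Both words pass the same script check, so script alone cannot separate them.
print(in_hebrew_block(hebrew_word), in_hebrew_block(yiddish_word))
```

Separating Yiddish from Hebrew in a mixed corpus therefore needs language-level labels or a language identifier, not a character-range test.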

Hebrew ML Datasets Navigator

Verified · 94/100

Navigate the fragmented landscape of Hebrew and Yiddish ML datasets and models. Covers ivrit.ai (20K+ hours of Hebrew audio, whisper-large-v3 ASR variants, Yiddish models), Dicta (DictaLM 3.0 LLM family, DictaBERT variants, HeQ reading comprehension), the Israeli National NLP Program / NNLP-IL (HebrewSentiment, HebNLI), AlephBERT, and Knesset Plenums. Helps researchers and ML engineers pick the right dataset for a task by use case, license (commercial vs research), Hebrew register coverage, and model-dataset pairing. Use when choosing training data for a Hebrew NLP or ASR project, verifying license compatibility for a commercial product, finding a baseline model for a Hebrew downstream task, or exploring Yiddish ML resources. Do NOT use for Arabic NLP, general HuggingFace dataset discovery, or Hebrew OCR dataset selection (use hebrew-ocr-forms).

The Problem

The Israeli ML community punches above its weight, but its datasets and models are scattered. ivrit.ai publishes world-class Hebrew speech corpora on one HuggingFace org, Dicta publishes Hebrew LLMs and BERT variants on another, and the Israeli National NLP Program maintains benchmarks under HebArabNlpProject. Licenses range from fully commercial-friendly to research-only. A researcher picking the right combination for, say, fine-tuning a Hebrew sentiment classifier on customer support chat for a commercial product has to hunt across five orgs and read every dataset card.
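To make the scattering concrete, here is a minimal sketch of the kind of index a researcher ends up building by hand. It records only what the paragraph above states (who publishes what, and in which modality). The `Publisher` type and `publishers_for` helper are hypothetical, and licenses are deliberately left out because they must be read from each dataset card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publisher:
    name: str
    modality: str               # "speech" or "text"
    offerings: tuple[str, ...]  # coarse descriptions, not dataset IDs


# Entries mirror the paragraph above; nothing here encodes license terms.
CATALOG = (
    Publisher("ivrit.ai", "speech", ("Hebrew speech corpora",)),
    Publisher("Dicta", "text", ("Hebrew LLMs", "BERT variants")),
    Publisher("Israeli National NLP Program", "text", ("benchmarks",)),
)


def publishers_for(modality: str) -> list[str]:
    """Return publisher names whose resources match the given modality."""
    return [p.name for p in CATALOG if p.modality == modality]


print(publishers_for("text"))
print(publishers_for("speech"))
```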

skills-il | Developer Tools | 28 installs | 1,572 views

Version 1.0.5 | MIT | GitHub

Updated: July 12, 2026 | Tags: datasets, ml, hebrew, yiddish, huggingface, ivrit-ai, dicta, nnlp-il, licensing, israel

How to use this skill

1. Click "Download ZIP" to download the skill files.
2. Open Claude Desktop and go to Customize > Skills.
3. Click "+" and select "Upload a skill", then upload the ZIP file.
4. Start a new conversation. The skill will activate automatically when relevant.

Developers? Install via command line (CLI)

npx skills-il add skills-il/developer-tools@v1.0.5-hebrew-ml-datasets-navigator --skill hebrew-ml-datasets-navigator -a claude-code

When to Apply

When choosing training data for a Hebrew NLP or ASR project
When verifying license compatibility for commercial use of a dataset
When looking for a baseline model for a specific Hebrew task
When building a Hebrew transcription stack and need to know what ivrit.ai offers
When researching or building something in Yiddish and need to find resources

Try These Prompts

Commercial sentiment

I want to train a sentiment classifier on Hebrew customer support chat for a commercial SaaS product. Which dataset should I use, which starting model, and what does the license say about attribution?

Hebrew podcast transcription

I am building a Hebrew podcast transcription product. What does ivrit.ai offer, which ASR model should I use in production with low latency, and how do I handle multiple speakers?

Small Hebrew LLM

I need a Hebrew LLM that runs on consumer hardware (16GB VRAM max) for a Hebrew product. What does Dicta offer, what are the size differences, and what are the upstream licenses?

Yiddish ML

I am researching Yiddish and looking for datasets and models for speech recognition and text processing. What is available in 2026 and what are the licenses?
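For the small-LLM prompt above, a rough weights-only estimate helps frame the 16GB VRAM question before asking. This sketch takes the parameter counts from the model names (24B and 12B). The bit widths are assumptions about common precision and quantization choices, and the estimate ignores activations, KV cache, and runtime overhead, so a "fits" result is necessary but not sufficient.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Weights-only footprint in GB (10^9 bytes) for a model of the given size."""
    # params_billions * 1e9 parameters * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billions * bits_per_param / 8


def weights_fit(params_billions: float, bits_per_param: int, vram_gb: float = 16.0) -> bool:
    """True if the weights alone fit in `vram_gb`; real usage needs extra headroom."""
    return weight_memory_gb(params_billions, bits_per_param) <= vram_gb


for size in (24, 12):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(size, bits)
        print(f"{size}B at {bits}-bit: {gb:.0f} GB weights, fits in 16 GB: {weights_fit(size, bits)}")
```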

Changelog

v1.0.5

Update: OSCAR-2301 access is temporarily suspended (users routed to CulturaX/FineWeb-2), and the ivrit.ai corpus size was corrected from 22K to 20K hours per the source dataset page.

Jul 6, 2026

v1.0.4

Added Jamba 1.6 and Jamba-Reasoning-3B to the model catalog, acknowledged the DictaLM 3.0 benchmark suite (Translation, Summarization, Winograd, Israeli Trivia, Diacritization), and redirected HebrewSentiment license claims to the live dataset card.

May 19, 2026

v1.0.3

Added HEBREW-MMLU, CulturaX, FineWeb-2, ParaShoot, HeSum, and academic resources. Stripped 27 em dashes.

Apr 25, 2026
