Commit 0f6ffbf

feat(locales): add comprehensive ISO 639-2/3 language code validation (#1614)
* feat(locales): add comprehensive ISO 639-2/3 language code validation

Replaces hardcoded ISO 639-1 (2-letter) language code list with the comprehensive iso-639-3 package, which validates against all ISO 639 standards including:
- ISO 639-1 codes (2-letter, ~184 languages)
- ISO 639-2 codes (3-letter bibliographic and terminologic)
- ISO 639-3 codes (3-letter, ~8,000 languages)

This fixes validation issues with 3-letter language codes that don't have 2-letter equivalents, such as:
- fil (Filipino)
- bar (Bavarian)
- nap (Neapolitan)
- zgh (Standard Moroccan Tamazight)

Changes:
- Add iso-639-3 package dependency to @lingo.dev/_locales
- Update validation.ts to use iso-639-3 for comprehensive validation
- Remove hardcoded list of 184 language codes (now 8,000+ supported)
- Add 5 new tests covering 3-letter language code validation
- All 88 tests passing

Fixes validation error: "Unsupported locale: fil"

* fix(locales): bundle iso-639-3 to resolve ESM/CJS compatibility

The iso-639-3 package is ESM-only, which caused build failures when the spec package tried to import the locales CJS bundle.

Solution: Configure tsup to bundle iso-639-3 into the output using noExternal option. This allows the CJS build to work properly.

Fixes CI build error: ERR_REQUIRE_ESM

* fix(compiler): remove duplicate providers from test expectations

The test had duplicate entries for 'openai' and 'anthropic' which caused the test to fail. Removed the duplicates to match the actual provider list. This was an existing bug unrelated to the locale validation changes.

* Update packages/locales/src/validation.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 01c6455 commit 0f6ffbf

7 files changed

Lines changed: 88 additions & 189 deletions

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+"@lingo.dev/_locales": minor
+---
+
+feat: use ISO 639-3 package for comprehensive language code validation
+
+Replaces hardcoded list of ISO 639-1 (2-letter) language codes with the comprehensive iso-639-3 package, which includes:
+- All ISO 639-1 codes (2-letter, ~184 languages)
+- All ISO 639-2 codes (3-letter bibliographic and terminologic)
+- All ISO 639-3 codes (3-letter, ~8,000 languages)
+
+This fixes validation issues with 3-letter language codes like:
+- `fil` (Filipino)
+- `bar` (Bavarian)
+- `nap` (Neapolitan)
+- `zgh` (Standard Moroccan Tamazight)
+
+And many other languages that don't have 2-letter ISO 639-1 codes.

packages/compiler/src/lib/lcp/api/provider-details.spec.ts

Lines changed: 0 additions & 2 deletions
@@ -11,8 +11,6 @@ describe("provider-details", () => {
       "openrouter",
       "ollama",
       "mistral",
-      "openai",
-      "anthropic",
       "lingo.dev",
     ]);
   });
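
The duplicate entries removed in the hunk above are the kind of mistake a small guard can surface before an assertion fails. The following is a minimal sketch, not part of the commit: `findDuplicates` is a hypothetical helper, and the sample array is illustrative only (the real expectation list is only partly visible in this diff).

```typescript
// Hypothetical helper (not in the repository): returns each value that
// occurs more than once, in the order its first repeat is encountered.
function findDuplicates(items: readonly string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const item of items) {
    if (seen.has(item)) {
      duplicates.add(item);
    } else {
      seen.add(item);
    }
  }
  return Array.from(duplicates);
}

// Illustrative list only; it is NOT the actual array from provider-details.spec.ts.
const illustrativeProviders = [
  "openai",
  "anthropic",
  "openrouter",
  "ollama",
  "mistral",
  "openai",
  "anthropic",
  "lingo.dev",
];

console.log(findDuplicates(illustrativeProviders)); // → ["openai", "anthropic"]
```

A check like this could sit next to the expectation so a repeated provider name is reported by name rather than as an array mismatch.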

packages/locales/package.json

Lines changed: 3 additions & 0 deletions
@@ -30,5 +30,8 @@
     "tsup": "^8.3.5",
     "typescript": "^5.8.3",
     "vitest": "^3.2.4"
+  },
+  "dependencies": {
+    "iso-639-3": "^3.0.1"
   }
 }

packages/locales/src/validation.spec.ts

Lines changed: 42 additions & 0 deletions
@@ -31,6 +31,23 @@ describe("isValidLocale", () => {
     expect(isValidLocale("zh")).toBe(true);
   });
 
+  it("should validate 3-letter language codes in locales", () => {
+    // Test ISO 639-2/3 codes that don't have 2-letter equivalents
+    expect(isValidLocale("fil")).toBe(true); // Filipino
+    expect(isValidLocale("fil-PH")).toBe(true); // Filipino (Philippines)
+    expect(isValidLocale("bar")).toBe(true); // Bavarian
+    expect(isValidLocale("bar-DE")).toBe(true); // Bavarian (Germany)
+    expect(isValidLocale("nap")).toBe(true); // Neapolitan
+    expect(isValidLocale("nap-IT")).toBe(true); // Neapolitan (Italy)
+    expect(isValidLocale("zgh")).toBe(true); // Standard Moroccan Tamazight
+    expect(isValidLocale("zgh-MA")).toBe(true); // Tamazight (Morocco)
+  });
+
+  it("should validate 3-letter language codes with script and region", () => {
+    // Test complex locales with 3-letter language codes
+    expect(isValidLocale("fil-Latn-PH")).toBe(true); // Filipino (Latin, Philippines)
+  });
+
   it("should validate locales with numeric region codes", () => {
     expect(isValidLocale("es-419")).toBe(true); // Latin America
     expect(isValidLocale("en-001")).toBe(true); // World
@@ -83,6 +100,22 @@ describe("isValidLanguageCode", () => {
     expect(isValidLanguageCode("yi")).toBe(true); // Yiddish
   });
 
+  it("should validate 3-letter ISO 639-2/3 language codes", () => {
+    // Test the specific codes that were reported as failing
+    expect(isValidLanguageCode("fil")).toBe(true); // Filipino
+    expect(isValidLanguageCode("bar")).toBe(true); // Bavarian
+    expect(isValidLanguageCode("nap")).toBe(true); // Neapolitan
+    expect(isValidLanguageCode("zgh")).toBe(true); // Standard Moroccan Tamazight
+  });
+
+  it("should validate other common 3-letter language codes", () => {
+    expect(isValidLanguageCode("eng")).toBe(true); // English (ISO 639-2)
+    expect(isValidLanguageCode("spa")).toBe(true); // Spanish (ISO 639-2)
+    expect(isValidLanguageCode("fra")).toBe(true); // French (ISO 639-2)
+    expect(isValidLanguageCode("deu")).toBe(true); // German (ISO 639-2)
+    expect(isValidLanguageCode("jpn")).toBe(true); // Japanese (ISO 639-2)
+  });
+
   it("should handle case insensitive validation", () => {
     expect(isValidLanguageCode("EN")).toBe(true);
     expect(isValidLanguageCode("Es")).toBe(true);
@@ -95,6 +128,15 @@ describe("isValidLanguageCode", () => {
     expect(isValidLanguageCode("invalid")).toBe(false);
   });
 
+  it("should reject invalid 3-letter language codes", () => {
+    // Ensure validation is not just accepting any 3-letter code
+    // Note: "aaa" is valid (Ghotuo language), so using truly invalid codes
+    expect(isValidLanguageCode("zzz")).toBe(false);
+    expect(isValidLanguageCode("xxx")).toBe(false);
+    expect(isValidLanguageCode("fake")).toBe(false);
+    expect(isValidLanguageCode("test")).toBe(false);
+  });
+
   it("should handle edge cases", () => {
     expect(isValidLanguageCode("")).toBe(false);
     expect(isValidLanguageCode(" ")).toBe(false);
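
The locale tests above exercise language-only, language-region, and language-script-region forms. As a rough illustration of how such a string might be split into parts before each part is validated, here is a sketch under stated assumptions: `SKETCH_LOCALE_PATTERN` and `parseLocaleSketch` are hypothetical names, and the pattern is a guess for illustration. The package's actual `LOCALE_REGEX` lives in `./constants` and is not shown in this commit.

```typescript
interface ParsedLocale {
  language: string;
  script: string | undefined;
  region: string | undefined;
}

// Assumed pattern, for illustration only: a 2-3 letter language, an optional
// 4-letter script, and an optional 2-letter or 3-digit region, hyphen-separated.
const SKETCH_LOCALE_PATTERN =
  /^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$/;

// Hypothetical parser: returns null when the string does not fit the pattern.
function parseLocaleSketch(locale: string): ParsedLocale | null {
  const match = SKETCH_LOCALE_PATTERN.exec(locale);
  if (match === null) return null;
  const language = match[1];
  if (language === undefined) return null;
  const region = match[3];
  return {
    language: language.toLowerCase(),
    script: match[2],
    region: region === undefined ? undefined : region.toUpperCase(),
  };
}

console.log(parseLocaleSketch("fil-Latn-PH"));
console.log(parseLocaleSketch("es-419"));
```

With the parts separated, the language component can be checked against the language code set independently of script and region, which is why widening that set is enough to make `fil-PH` and `fil-Latn-PH` pass.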

packages/locales/src/validation.ts

Lines changed: 14 additions & 187 deletions
@@ -1,196 +1,23 @@
 import { LOCALE_REGEX } from "./constants";
+import { iso6393, type Language } from "iso-639-3";
 
 /**
  * Validation functions for locale codes and components
  */
 
-// ISO 639-1 language codes (most common)
-const VALID_LANGUAGE_CODES = new Set([
-  "aa",
-  "ab",
-  "ae",
-  "af",
-  "ak",
-  "am",
-  "an",
-  "ar",
-  "as",
-  "av",
-  "ay",
-  "az",
-  "ba",
-  "be",
-  "bg",
-  "bh",
-  "bi",
-  "bm",
-  "bn",
-  "bo",
-  "br",
-  "bs",
-  "ca",
-  "ce",
-  "ch",
-  "co",
-  "cr",
-  "cs",
-  "cu",
-  "cv",
-  "cy",
-  "da",
-  "de",
-  "dv",
-  "dz",
-  "ee",
-  "el",
-  "en",
-  "eo",
-  "es",
-  "et",
-  "eu",
-  "fa",
-  "ff",
-  "fi",
-  "fj",
-  "fo",
-  "fr",
-  "fy",
-  "ga",
-  "gd",
-  "gl",
-  "gn",
-  "gu",
-  "gv",
-  "ha",
-  "he",
-  "hi",
-  "ho",
-  "hr",
-  "ht",
-  "hu",
-  "hy",
-  "hz",
-  "ia",
-  "id",
-  "ie",
-  "ig",
-  "ii",
-  "ik",
-  "io",
-  "is",
-  "it",
-  "iu",
-  "ja",
-  "jv",
-  "ka",
-  "kg",
-  "ki",
-  "kj",
-  "kk",
-  "kl",
-  "km",
-  "kn",
-  "ko",
-  "kr",
-  "ks",
-  "ku",
-  "kv",
-  "kw",
-  "ky",
-  "la",
-  "lb",
-  "lg",
-  "li",
-  "ln",
-  "lo",
-  "lt",
-  "lu",
-  "lv",
-  "mg",
-  "mh",
-  "mi",
-  "mk",
-  "ml",
-  "mn",
-  "mr",
-  "ms",
-  "mt",
-  "my",
-  "na",
-  "nb",
-  "nd",
-  "ne",
-  "ng",
-  "nl",
-  "nn",
-  "no",
-  "nr",
-  "nv",
-  "ny",
-  "oc",
-  "oj",
-  "om",
-  "or",
-  "os",
-  "pa",
-  "pi",
-  "pl",
-  "ps",
-  "pt",
-  "qu",
-  "rm",
-  "rn",
-  "ro",
-  "ru",
-  "rw",
-  "sa",
-  "sc",
-  "sd",
-  "se",
-  "sg",
-  "si",
-  "sk",
-  "sl",
-  "sm",
-  "sn",
-  "so",
-  "sq",
-  "sr",
-  "ss",
-  "st",
-  "su",
-  "sv",
-  "sw",
-  "ta",
-  "te",
-  "tg",
-  "th",
-  "ti",
-  "tk",
-  "tl",
-  "tn",
-  "to",
-  "tr",
-  "ts",
-  "tt",
-  "tw",
-  "ty",
-  "ug",
-  "uk",
-  "ur",
-  "uz",
-  "ve",
-  "vi",
-  "vo",
-  "wa",
-  "wo",
-  "xh",
-  "yi",
-  "yo",
-  "za",
-  "zh",
-  "zu",
-]);
+// Create a set of all valid ISO 639-1, 639-2, and 639-3 language codes
+// This includes 2-letter codes (ISO 639-1) and 3-letter codes (ISO 639-2/3)
+const VALID_LANGUAGE_CODES = new Set(
+  iso6393.flatMap((lang: Language) =>
+    [
+      lang.iso6391, // 2-letter code (ISO 639-1)
+      lang.iso6392B, // 3-letter bibliographic code (ISO 639-2)
+      lang.iso6392T, // 3-letter terminologic code (ISO 639-2)
+      lang.iso6393, // 3-letter code (ISO 639-3)
+    ].filter(Boolean).map((code) => code.toLowerCase()),
+  ),
+);
+
 
 // ISO 15924 script codes (most common)
 const VALID_SCRIPT_CODES = new Set([
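
To make the set construction in the hunk above easier to follow without installing the package, here is a self-contained sketch. It assumes a record shape modelled on the four fields the diff reads (`iso6391`, `iso6392B`, `iso6392T`, `iso6393`) and uses a tiny hand-written sample in place of the full iso-639-3 table. `LanguageRecord`, `SAMPLE_LANGUAGES`, `buildLanguageCodeSet`, and `isValidLanguageCodeSketch` are hypothetical names, not the package's or the repository's API.

```typescript
// Assumed record shape; only iso6393 is treated as always present.
interface LanguageRecord {
  name: string;
  iso6393: string;
  iso6392B?: string;
  iso6392T?: string;
  iso6391?: string;
}

// Hand-written sample standing in for the full table (illustration only).
const SAMPLE_LANGUAGES: LanguageRecord[] = [
  { name: "English", iso6393: "eng", iso6392B: "eng", iso6392T: "eng", iso6391: "en" },
  { name: "German", iso6393: "deu", iso6392B: "ger", iso6392T: "deu", iso6391: "de" },
  { name: "Filipino", iso6393: "fil", iso6392B: "fil", iso6392T: "fil" },
  { name: "Bavarian", iso6393: "bar" },
];

// Collect every code a record carries, skipping missing ones, lowercased.
function buildLanguageCodeSet(languages: readonly LanguageRecord[]): Set<string> {
  const codes = new Set<string>();
  for (const lang of languages) {
    const candidates = [lang.iso6391, lang.iso6392B, lang.iso6392T, lang.iso6393];
    for (const code of candidates) {
      if (code) codes.add(code.toLowerCase());
    }
  }
  return codes;
}

const SKETCH_CODES = buildLanguageCodeSet(SAMPLE_LANGUAGES);

// Case-insensitive membership test, mirroring the behaviour the spec file expects.
function isValidLanguageCodeSketch(code: string): boolean {
  return SKETCH_CODES.has(code.toLowerCase());
}

console.log(isValidLanguageCodeSketch("fil"), isValidLanguageCodeSketch("zzz"));
```

Because each record contributes up to four codes, a language such as German is reachable as `de`, `ger`, or `deu`, while Filipino and Bavarian are reachable only through their 3-letter codes. The old 2-letter list could not cover those two at all.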

packages/locales/tsup.config.ts

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ export default defineConfig({
   dts: true,
   cjsInterop: true,
   splitting: false,
+  // Bundle iso-639-3 since it's ESM-only and can't be required in CJS
+  noExternal: ["iso-639-3"],
   outExtension: (ctx) => ({
     js: ctx.format === "cjs" ? ".cjs" : ".mjs",
   }),

pnpm-lock.yaml

Lines changed: 9 additions & 0 deletions
Generated file; diff not rendered.
