Bug: .m files are ignored by the indexer (Unreachable code in discover.c)
Description
When running index_repository on a directory containing .m files (MATLAB, Objective-C, or Magma), the crawler reports files=0 and skips them entirely. This happens because the .m extension is missing from the EXT_TABLE, which causes the initial language check to fail before the content-based .m disambiguation logic can ever run.
Steps to Reproduce
- Create a directory containing a single valid .m file (e.g., test.m).
- Run the indexer via the CLI:
  ```
  codebase-memory-mcp cli index_repository "{\"repo_path\": \"/path/to/dir\"}"
  ```
- Observe the output log:
  ```
  level=info msg=pipeline.discover files=0
  ```
Root Cause Analysis
In src/discover/discover.c, the detect_file_language function is designed to handle .m files with special content-based disambiguation (cbm_disambiguate_m).
However, before reaching that logic, it performs a strict extension lookup:
```c
// src/discover/discover.c
CBMLanguage lang = cbm_language_for_filename(entry_name);
if (lang == CBM_LANG_COUNT) {
    return CBM_LANG_COUNT; // <-- Aborts here
}

/* Special: .m files need content-based disambiguation */
const char *dot = strrchr(entry_name, '.');
if (dot && strcmp(dot, ".m") == 0) {
    lang = cbm_disambiguate_m(abs_path);
}
```
The function cbm_language_for_filename relies on the EXT_TABLE defined in src/discover/language.c. Currently, EXT_TABLE includes .matlab and .mlx, but omits .m.
Because .m is missing, the lookup returns CBM_LANG_COUNT (unknown), so detect_file_language returns immediately and the cbm_disambiguate_m branch is unreachable for every .m file.
Proposed Solution
Add the .m extension to EXT_TABLE in src/discover/language.c (around line 176). The entry can be mapped to CBM_LANG_MATLAB (or Objective-C): its only job is to get past the CBM_LANG_COUNT check, because detect_file_language then overwrites lang with the result of cbm_disambiguate_m, which reads the file's contents.
```diff
--- src/discover/language.c
+++ src/discover/language.c
     /* Markdown */
     {".md", CBM_LANG_MARKDOWN},
     {".mdx", CBM_LANG_MARKDOWN},
     /* MATLAB */
+    {".m", CBM_LANG_MATLAB},
     {".matlab", CBM_LANG_MATLAB},
     {".mlx", CBM_LANG_MATLAB},
```
Proof of Fix / Expected Behavior
After patching language.c and recompiling, the indexer successfully discovers and parses .m files.
Terminal output after the patch:
```
> codebase-memory-mcp cli index_repository "{\"repo_path\": \"/path/to/dir\"}"
level=info msg=mem.init budget_mb=7803 total_ram_mb=15606
level=info msg=pipeline.discover files=1 elapsed_ms=64
```
Environment
- OS: Windows 11 (MSYS2/UCRT64 build environment)
- Version/Commit: main (latest)
- Disclosure: Gemini helped find and fix this bug.