# Bug: .m files are ignored by the indexer (Unreachable code in discover.c) #296

@mmck27

Description

When index_repository runs on a directory containing .m files (MATLAB, Objective-C, or Magma), the crawler reports files=0 and skips them entirely. The .m extension is missing from EXT_TABLE, so the initial language lookup fails before the content-based .m disambiguation logic can run.

Steps to Reproduce

  1. Create a directory containing a single valid .m file (e.g., test.m).
  2. Run the indexer via CLI:
    codebase-memory-mcp cli index_repository "{\"repo_path\": \"/path/to/dir\"}"
  3. Observe the output log: level=info msg=pipeline.discover files=0

Root Cause Analysis

In src/discover/discover.c, the detect_file_language function is designed to handle .m files with special content-based disambiguation (cbm_disambiguate_m).

However, before reaching that logic, it performs a strict extension lookup:

// src/discover/discover.c
CBMLanguage lang = cbm_language_for_filename(entry_name);
if (lang == CBM_LANG_COUNT) {
    return CBM_LANG_COUNT; // <-- Aborts here
}
/* Special: .m files need content-based disambiguation */
const char *dot = strrchr(entry_name, '.');
if (dot && strcmp(dot, ".m") == 0) {
    lang = cbm_disambiguate_m(abs_path);
}

The function cbm_language_for_filename relies on the EXT_TABLE defined in src/discover/language.c. Currently, EXT_TABLE includes .matlab and .mlx, but omits .m.

Because .m is missing, the lookup returns CBM_LANG_COUNT (the "unknown" sentinel), so detect_file_language returns early. The cbm_disambiguate_m branch is therefore never reached for .m files.
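The control flow described above can be reproduced in isolation. The sketch below is a simplified model, not the project's code: Lang, ExtEntry, lookup_ext, detect, disambiguate_m_stub, and TABLE_WITHOUT_M are hypothetical stand-ins, and the stub does not read any file. It only assumes the ordering shown in the snippet (extension lookup first, .m check second).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the project's types; only the control flow matters. */
typedef enum { LANG_MATLAB, LANG_OBJC, LANG_COUNT /* "unknown" sentinel */ } Lang;
typedef struct { const char *ext; Lang lang; } ExtEntry;

/* Mirrors the reported state of EXT_TABLE: .matlab and .mlx, but no .m. */
static const ExtEntry TABLE_WITHOUT_M[] = {
    {".matlab", LANG_MATLAB},
    {".mlx", LANG_MATLAB},
};
#define TABLE_LEN (sizeof TABLE_WITHOUT_M / sizeof TABLE_WITHOUT_M[0])

static int disambiguate_calls = 0; /* counts how often the .m branch runs */

/* Stand-in for cbm_disambiguate_m (assumption: the real one inspects the file). */
static Lang disambiguate_m_stub(void) {
    disambiguate_calls++;
    return LANG_OBJC;
}

/* Stand-in for cbm_language_for_filename: a pure extension lookup. */
static Lang lookup_ext(const ExtEntry *table, size_t n, const char *name) {
    const char *dot = strrchr(name, '.');
    if (!dot) return LANG_COUNT;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dot, table[i].ext) == 0) return table[i].lang;
    }
    return LANG_COUNT;
}

/* Same ordering as the snippet from discover.c: lookup first, .m check second. */
static Lang detect(const ExtEntry *table, size_t n, const char *name) {
    Lang lang = lookup_ext(table, n, name);
    if (lang == LANG_COUNT) {
        return LANG_COUNT; /* test.m exits here */
    }
    const char *dot = strrchr(name, '.');
    if (dot && strcmp(dot, ".m") == 0) {
        lang = disambiguate_m_stub();
    }
    return lang;
}

/* Self-check: call from main() to confirm the .m branch is never entered. */
static void check_m_is_skipped(void) {
    assert(detect(TABLE_WITHOUT_M, TABLE_LEN, "test.m") == LANG_COUNT);
    assert(disambiguate_calls == 0);
}
```

Calling check_m_is_skipped() from a main function passes silently, which matches the files=0 symptom: the file is rejected by the lookup and the disambiguation counter stays at zero.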

Proposed Solution

Add the .m extension to the EXT_TABLE in src/discover/language.c (around line 176). It can be safely mapped to CBM_LANG_MATLAB (or Objective-C), as the subsequent cbm_disambiguate_m call will correctly overwrite it by reading the file's contents anyway.

--- src/discover/language.c
+++ src/discover/language.c
    
     /* Markdown */
     {".md", CBM_LANG_MARKDOWN},
     {".mdx", CBM_LANG_MARKDOWN},
 
     /* MATLAB */
+    {".m", CBM_LANG_MATLAB},
     {".matlab", CBM_LANG_MATLAB},
     {".mlx", CBM_LANG_MATLAB},
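To illustrate why the table default matters little once the entry exists, here is a small self-contained model of the patched behavior. Everything in it is an assumption made for illustration: the names (Lang, ExtEntry, TABLE_WITH_M, lookup_ext, detect, disambiguate_m_stub) are hypothetical, the stub takes the file content as a string instead of a path, and its "@interface" check is a toy rule, not the heuristic that cbm_disambiguate_m actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the project's types. */
typedef enum { LANG_MATLAB, LANG_OBJC, LANG_COUNT /* "unknown" sentinel */ } Lang;
typedef struct { const char *ext; Lang lang; } ExtEntry;

/* Table after the proposed patch: .m is present with a MATLAB default. */
static const ExtEntry TABLE_WITH_M[] = {
    {".m", LANG_MATLAB},
    {".matlab", LANG_MATLAB},
    {".mlx", LANG_MATLAB},
};
#define TABLE_LEN (sizeof TABLE_WITH_M / sizeof TABLE_WITH_M[0])

/* Toy content check, NOT the project's heuristic. */
static Lang disambiguate_m_stub(const char *content) {
    return strstr(content, "@interface") ? LANG_OBJC : LANG_MATLAB;
}

/* Pure extension lookup against the patched table. */
static Lang lookup_ext(const char *name) {
    const char *dot = strrchr(name, '.');
    if (!dot) return LANG_COUNT;
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (strcmp(dot, TABLE_WITH_M[i].ext) == 0) return TABLE_WITH_M[i].lang;
    }
    return LANG_COUNT;
}

/* Lookup first, then let the content decide for .m files. */
static Lang detect(const char *name, const char *content) {
    Lang lang = lookup_ext(name);
    if (lang == LANG_COUNT) {
        return LANG_COUNT; /* still rejects genuinely unknown extensions */
    }
    const char *dot = strrchr(name, '.');
    if (dot && strcmp(dot, ".m") == 0) {
        lang = disambiguate_m_stub(content); /* overrides the table default */
    }
    return lang;
}

/* Self-check: call from main() to confirm the override behavior. */
static void check_m_is_detected(void) {
    assert(detect("Foo.m", "@interface Foo : NSObject\n@end\n") == LANG_OBJC);
    assert(detect("plot.m", "function y = f(x)\n  y = x;\nend\n") == LANG_MATLAB);
    assert(detect("notes.txt", "") == LANG_COUNT);
}
```

In this model the table entry only acts as a gate: once .m passes the lookup, the content-based result replaces the MATLAB default, so an Objective-C file is still classified as Objective-C.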

Proof of Fix / Expected Behavior

After patching language.c and recompiling, the indexer successfully discovers and parses .m files.

Terminal output after the patch:

>codebase-memory-mcp cli index_repository "{\"repo_path\": \"/path/to/dir\"}"
level=info msg=mem.init budget_mb=7803 total_ram_mb=15606
level=info msg=pipeline.discover files=1 elapsed_ms=64

Environment

  • OS: Windows 11 (MSYS2/UCRT64 build environment)
  • Version/Commit: main (Latest)
  • Disclosure: Gemini helped find and fix this bug.

Metadata

Labels: bug (Something isn't working), parsing/quality (Graph extraction bugs, false positives, missing edges)
