This repository contains the code for ASPNet: Affective Semantic Prompting Network with MLLMs for Incomplete Multimodal Learning. It includes affective semantic prompt generation, prompt feature extraction, and two-stage model training.
Supported datasets:
- CMU-MOSI
- CMU-MOSEI
- IEMOCAP
The default dataset layout is:

    dataset/
        CMU-MOSI/
        CMU-MOSEI/
        IEMOCAP/

Dataset paths can be adjusted in config.py.
Main files and directories:
- train.py: train and evaluate the model.
- prompts/generate_prompts.py: generate semantic prompts for modality conditions.
- prompts/extract_features.py: encode generated prompts into .npy feature files.
- scripts/: example shell scripts for common training runs.
Example training run on CMU-MOSI:

    python -u train.py \
      --dataset CMU-MOSI \
      --audio-feature wav2vec-large-c-UTT \
      --text-feature deberta-large-4-UTT \
      --video-feature manet_UTT \
      --prompt-feature auto \
      --test_condition atv \
      --batch-size 32 \
      --epochs 300 \
      --stage_epoch 150 \
      --gpu 0

The training pipeline uses:
- First stage: train audio, text, and visual experts on complete modalities.
- Expert selection: select the best first-stage expert states on the validation split.
- Second stage: train the fused predictor under the requested modality condition.
- Model selection: select the best epoch using the validation split.
- Final report: evaluate the selected model on the test split.
Prompt features are aligned with multimodal representations only in the second stage.
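
The selection logic in the steps above can be sketched as follows. This is a minimal illustration rather than the code in train.py: it assumes that `--stage_epoch` is the number of first-stage epochs out of the `--epochs` total, that a higher validation score is better, and it tracks a single best state per stage (the repository may select each expert separately). The names `BestTracker`, `run_two_stage`, `train_step`, and `validate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class BestTracker:
    """Keeps the state with the highest validation score seen so far."""
    best_score: float = float("-inf")
    best_epoch: int = -1
    best_state: Optional[Any] = None

    def update(self, epoch: int, score: float, state: Any) -> bool:
        # Strict ">" keeps the earliest epoch when scores tie.
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.best_state = state
            return True
        return False


def run_two_stage(
    epochs: int,
    stage_epoch: int,
    train_step: Callable[[int, int], Any],
    validate: Callable[[int, Any], float],
) -> Tuple[BestTracker, BestTracker]:
    """Run stage 1 for `stage_epoch` epochs, then stage 2 for the rest.

    train_step(stage, epoch) returns the model state after that epoch;
    validate(stage, state) returns a validation score (higher is better).
    Both callables are placeholders for the real training code.
    """
    experts = BestTracker()  # best first-stage (expert) state
    fused = BestTracker()    # best second-stage (fused predictor) state
    for epoch in range(epochs):
        stage = 1 if epoch < stage_epoch else 2
        state = train_step(stage, epoch)
        tracker = experts if stage == 1 else fused
        tracker.update(epoch, validate(stage, state), state)
    return experts, fused
```

Under these assumptions, the state in `fused.best_state` is the one that would be evaluated once on the test split for the final report.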
Prompt generation uses four MSA-specific stages:
STATE -> EVIDENCE -> INFERENCE -> PROMPT
STATE records the modality availability, EVIDENCE extracts affective cues from available modalities, INFERENCE estimates sentiment tendency and cross-modal relation, and PROMPT produces the final text used for prompt feature extraction. Multiple candidates can be generated for a selected stage and judged automatically.
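
As a rough sketch of this flow, the four stages can be run in order, with several candidates produced at one selected stage and the highest-scoring candidate kept. The `generate` and `judge` callables below are hypothetical stand-ins for the MLLM call and the automatic judge; their signatures are assumptions and are not taken from prompts/generate_prompts.py.

```python
from typing import Any, Callable, Dict

STAGES = ["STATE", "EVIDENCE", "INFERENCE", "PROMPT"]


def run_stages(
    sample: Dict[str, Any],
    generate: Callable[[str, Dict[str, Any], Dict[str, str], int], str],
    judge: Callable[[str, str], float],
    candidate_stage: str = "PROMPT",
    num_candidates: int = 3,
) -> Dict[str, str]:
    """Run STATE -> EVIDENCE -> INFERENCE -> PROMPT for one sample.

    generate(stage, sample, context, index) returns one candidate text,
    where `context` holds the outputs of the earlier stages.
    judge(stage, text) returns a score; the highest-scoring candidate wins.
    """
    if candidate_stage not in STAGES:
        raise ValueError(f"unknown stage: {candidate_stage!r}")
    context: Dict[str, str] = {}
    for stage in STAGES:
        # Only the selected stage produces multiple candidates.
        n = num_candidates if stage == candidate_stage else 1
        candidates = [generate(stage, sample, context, i) for i in range(n)]
        context[stage] = max(candidates, key=lambda text: judge(stage, text))
    return context
```

The returned dictionary keeps every intermediate output, so the text under the `PROMPT` key is what would be passed on to prompt feature extraction.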

    python -u prompts/generate_prompts.py \
      --dataset CMU-MOSI \
      --split train \
      --condition atv \
      --output-path prompt_outputs/train_audio_text_visual_fixed.jsonl \
      --raw-data-dir /path/to/raw/videos \
      --model-path /path/to/omni-model \
      --num-candidates 3 \
      --candidate-stage PROMPT

For text-only prompt generation, --raw-data-dir is not required. For audio or visual conditions, provide the dataset raw video directory.
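
The rule above can be expressed as a small check on the condition string. This sketch assumes that the letters `a`, `t`, and `v` stand for audio, text, and visual, which is inferred from the example output file name train_audio_text_visual_fixed.jsonl used with `--condition atv`. The helper names `parse_condition` and `needs_raw_data` are hypothetical and do not come from the repository.

```python
from typing import List

# Assumed letter-to-modality mapping (inferred, not confirmed by the repository).
MODALITY_NAMES = {"a": "audio", "t": "text", "v": "visual"}


def parse_condition(condition: str) -> List[str]:
    """Turn a condition string such as 'atv' into a list of modality names."""
    unknown = set(condition) - set(MODALITY_NAMES)
    if not condition or unknown:
        raise ValueError(f"invalid condition: {condition!r}")
    # Emit modalities in a fixed a, t, v order regardless of input order.
    return [MODALITY_NAMES[c] for c in "atv" if c in condition]


def needs_raw_data(condition: str) -> bool:
    """Return True when the condition includes audio or visual input."""
    return any(m in ("audio", "visual") for m in parse_condition(condition))
```

For example, `needs_raw_data("t")` is false, so a text-only run can omit `--raw-data-dir`, while `needs_raw_data("tv")` is true.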

    python -u prompts/generate_prompts.py \
      --dataset IEMOCAP \
      --iemocap-classes 4 \
      --split train \
      --condition t \
      --output-path prompt_outputs/iemocap4_train_text_fixed.jsonl \
      --model-path /path/to/omni-model

To encode the generated prompts into .npy feature files:

    python -u prompts/extract_features.py \
      --dataset CMU-MOSI \
      --prompt-dir prompt_outputs \
      --conditions a t v at av tv atv \
      --model-path /path/to/deberta-large \
      --batch-size 32 \
      --max-length 48 \
      --pooling mean

Repository layout:

    aspnet/
        datasets.py
        losses.py
        model.py
        modalities.py
        utils.py
        modules/
            attention.py
    config.py
    train.py
    prompts/
        generate_prompts.py
        extract_features.py
    scripts/
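
The `--pooling mean` and `--max-length 48` options passed to prompts/extract_features.py suggest that each prompt is truncated or padded to a fixed number of tokens and that the token embeddings are averaged into one vector. The sketch below shows one common way to do this, a mean over real tokens only, using NumPy. It is an assumption about what mean pooling refers to here, not the repository's implementation, and `masked_mean_pool` is a hypothetical name.

```python
import numpy as np


def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token embeddings while ignoring padding positions.

    hidden: array of shape (batch, seq_len, dim) with token embeddings.
    mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding.
    Returns an array of shape (batch, dim).
    """
    weights = mask.astype(hidden.dtype)[..., None]   # (batch, seq_len, 1)
    summed = (hidden * weights).sum(axis=1)          # (batch, dim)
    # Clip the token count to at least 1 so an all-padding row gives zeros
    # instead of a division by zero.
    counts = np.clip(weights.sum(axis=1), 1.0, None)  # (batch, 1)
    return summed / counts
```

A pooled array of this kind can be written to a .npy file with `np.save` and read back with `np.load`.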