ASPNet

This repository contains the code for ASPNet: Affective Semantic Prompting Network with MLLMs for Incomplete Multimodal Learning. It includes affective semantic prompt generation, prompt feature extraction, and two-stage model training.

Supported Datasets

CMU-MOSI
CMU-MOSEI
IEMOCAP

The default dataset layout is:

dataset/
  CMU-MOSI/
  CMU-MOSEI/
  IEMOCAP/

Dataset paths can be adjusted in config.py.
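
As a quick sanity check before training, the default layout above can be verified with a few lines of Python. This is a minimal sketch: the helper names `dataset_dir` and `missing_datasets` are hypothetical and are not part of config.py, and the `dataset` root is only the default shown above.

```python
from pathlib import Path

# Dataset folder names as listed under "Supported Datasets".
SUPPORTED_DATASETS = ("CMU-MOSI", "CMU-MOSEI", "IEMOCAP")


def dataset_dir(name, root="dataset"):
    """Return the expected folder for a dataset under the given root."""
    if name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {name!r}")
    return Path(root) / name


def missing_datasets(root="dataset"):
    """List the supported datasets whose folder does not exist under root."""
    return [n for n in SUPPORTED_DATASETS if not dataset_dir(n, root).is_dir()]


if __name__ == "__main__":
    for name in missing_datasets():
        print(f"missing: {dataset_dir(name)}")
```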

Main Entry Points

train.py: train and evaluate the model.
prompts/generate_prompts.py: generate semantic prompts for modality conditions.
prompts/extract_features.py: encode generated prompts into .npy feature files.
scripts/: example shell scripts for common training runs.

Training

python -u train.py \
  --dataset CMU-MOSI \
  --audio-feature wav2vec-large-c-UTT \
  --text-feature deberta-large-4-UTT \
  --video-feature manet_UTT \
  --prompt-feature auto \
  --test_condition atv \
  --batch-size 32 \
  --epochs 300 \
  --stage_epoch 150 \
  --gpu 0

The training pipeline runs the following steps in order:

First stage: train audio, text, and visual experts on complete modalities.
Expert selection: select the best first-stage expert states on the validation split.
Second stage: train the fused predictor under the requested modality condition.
Model selection: select the best epoch using the validation split.
Final report: evaluate the selected model on the test split.

Prompt features are aligned with multimodal representations only in the second stage.
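
The schedule and selection rule described above can be sketched in plain Python. This is an illustration and not the code in train.py: it assumes that `--stage_epoch` marks the epoch at which the second stage begins and that epochs are counted from 0, and the function names and the `history` record format are hypothetical.

```python
def stage_for_epoch(epoch, stage_epoch):
    """Return 1 for first-stage epochs and 2 afterwards (0-based epoch index)."""
    return 1 if epoch < stage_epoch else 2


def uses_prompt_alignment(epoch, stage_epoch):
    """Prompt features are aligned only in the second stage."""
    return stage_for_epoch(epoch, stage_epoch) == 2


def select_best(history, higher_is_better=True):
    """Pick the record with the best validation score.

    history is a list of dicts with keys 'epoch', 'val', and 'test'.
    The test score is only read from the selected record; it never
    influences which epoch is chosen.
    """
    pick = max if higher_is_better else min
    return pick(history, key=lambda record: record["val"])


if __name__ == "__main__":
    # Made-up scores, used only to show the selection rule.
    history = [
        {"epoch": 150, "val": 0.70, "test": 0.90},
        {"epoch": 151, "val": 0.80, "test": 0.75},
    ]
    best = select_best(history)
    print(best["epoch"], best["test"])
```

Selecting on the validation split and reading the test score only from the chosen epoch keeps the final report free of test-set selection bias.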

Prompt Generation

Prompt generation uses four MSA-specific stages:

STATE -> EVIDENCE -> INFERENCE -> PROMPT

STATE records which modalities are available, EVIDENCE extracts affective cues from the available modalities, INFERENCE estimates the sentiment tendency and the cross-modal relation, and PROMPT produces the final text used for prompt feature extraction. With --num-candidates and --candidate-stage, multiple candidates are generated for the selected stage and judged automatically.
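
The control flow of the four stages can be sketched as below. This is a hedged illustration, not the logic of prompts/generate_prompts.py: `run_pipeline`, `generate`, and `judge` are hypothetical names, the model call is replaced by a stub, and keeping the highest-scoring candidate is an assumption about how the automatic judging is used.

```python
STAGES = ("STATE", "EVIDENCE", "INFERENCE", "PROMPT")


def run_pipeline(sample, generate, judge, candidate_stage="PROMPT", num_candidates=3):
    """Run the stages in order, sampling several candidates at one stage.

    generate(stage, sample, context, index) returns a candidate string.
    judge(stage, candidate, context) returns a score; higher is better.
    context holds the accepted output of every earlier stage.
    """
    context = {}
    for stage in STAGES:
        n = num_candidates if stage == candidate_stage else 1
        candidates = [generate(stage, sample, context, i) for i in range(n)]
        # With a single candidate there is nothing to judge.
        if n == 1:
            context[stage] = candidates[0]
        else:
            context[stage] = max(candidates, key=lambda c: judge(stage, c, context))
    return context


if __name__ == "__main__":
    # Stub generator and judge standing in for the multimodal model.
    def generate(stage, sample, context, index):
        return f"{stage.lower()}-{index}"

    def judge(stage, candidate, context):
        return int(candidate.rsplit("-", 1)[1])

    result = run_pipeline({"id": "demo"}, generate, judge)
    print(result["PROMPT"])
```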

python -u prompts/generate_prompts.py \
  --dataset CMU-MOSI \
  --split train \
  --condition atv \
  --output-path prompt_outputs/train_audio_text_visual_fixed.jsonl \
  --raw-data-dir /path/to/raw/videos \
  --model-path /path/to/omni-model \
  --num-candidates 3 \
  --candidate-stage PROMPT

For the text-only condition (t), --raw-data-dir is not required. For any condition that includes audio or visual input, pass the dataset's raw video directory via --raw-data-dir.
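
The rule above can be expressed as a small check over condition strings. This sketch assumes that a condition is a string over the letters a (audio), t (text), and v (visual), as in the example commands; the helper names are hypothetical and not part of the repository.

```python
from itertools import combinations

MODALITIES = "atv"  # audio, text, visual


def all_conditions():
    """Enumerate every non-empty modality combination, smallest first."""
    return [
        "".join(combo)
        for size in range(1, len(MODALITIES) + 1)
        for combo in combinations(MODALITIES, size)
    ]


def needs_raw_data(condition):
    """Return True if the condition includes audio or visual input."""
    if not condition or set(condition) - set(MODALITIES):
        raise ValueError(f"invalid condition: {condition!r}")
    return bool(set(condition) & {"a", "v"})


if __name__ == "__main__":
    for condition in all_conditions():
        print(condition, needs_raw_data(condition))
```

Only the t condition can run without the raw videos; the other six conditions need them.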

python -u prompts/generate_prompts.py \
  --dataset IEMOCAP \
  --iemocap-classes 4 \
  --split train \
  --condition t \
  --output-path prompt_outputs/iemocap4_train_text_fixed.jsonl \
  --model-path /path/to/omni-model

Prompt Feature Extraction

python -u prompts/extract_features.py \
  --dataset CMU-MOSI \
  --prompt-dir prompt_outputs \
  --conditions a t v at av tv atv \
  --model-path /path/to/deberta-large \
  --batch-size 32 \
  --max-length 48 \
  --pooling mean
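
The --pooling mean option presumably averages token embeddings into one vector per prompt. The sketch below shows masked mean pooling in plain Python as an illustration of that idea. Whether prompts/extract_features.py excludes padding tokens in this way is an assumption, and `masked_mean_pool` is a hypothetical name; real code would typically operate on arrays or tensors.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1.

    token_vectors: list of equal-length lists of floats, one per token.
    attention_mask: list of 0/1 flags, one per token (0 marks padding).
    Returns a zero vector if every token is masked out.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    if count == 0:
        return total
    return [value / count for value in total]


if __name__ == "__main__":
    # Two real tokens followed by one padding token.
    tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(masked_mean_pool(tokens, mask))
```

Ignoring padded positions matters when prompts are padded or truncated to a fixed --max-length, since averaging over padding would shift the pooled vector for shorter prompts.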

Repository Structure

aspnet/
  datasets.py
  losses.py
  model.py
  modalities.py
  utils.py
  modules/
    attention.py
config.py
train.py
prompts/
  generate_prompts.py
  extract_features.py
scripts/
