[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along three axes: scene, action, and rationale.
Targeted semantic multimodal input ablation. Official implementation of the ablation method introduced in the paper "What Vision-Language Models 'See' when they See Scenes".
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"
Open-source tool to generate explanations of pgmpy Bayesian networks.