... It appears from both our initial data analysis and our qualitative examination of the data that the pairs make tradeoffs between relying on the linguistic context and the visual ... behavior in the presence of visual information could enable agents to emulate many elements of more natural and realistic human conversational behavior. A computational model may also make ... of the visual entities has an associated number of domain-dependent features. For example, they may have appearance features that contribute to overall salience, become activated multiple...