Why is learning language from video harder than learning from live interaction until age 2?

By Aparna Nadig

In my last post I discussed evidence that children under age 2 tend to learn new words better when an adult labels a new object during live interaction than when they watch the adult label the object on video. This phenomenon has been called the video deficit (Anderson & Pempek, 2005).

Why would this be the case? One straightforward explanation is familiarity: infants and toddlers interact with adults and learn from them day in and day out, whereas they have traditionally had much less exposure to, and interaction with, screen representations in the first years of life. They may need to acquire experience with screens before they can learn from them as easily as they learn from a live person.

This simple difference in familiarity seems to be linked with at least two deeper kinds of differences that set live interaction apart from information presented on video: how socially relevant or meaningful the content is, and the ability to understand two-dimensional representation of the real world (see Barr, 2010 and Troseth, 2010 for reviews).

Social meaningfulness

Infants and toddlers are better able to learn from video when content is meaningful and relevant to their ongoing experiences. For instance, they learn labels presented via video better when their own mother does the labelling than when a stranger does it (Krcmar, 2010). They also learn non-language tasks more easily from familiar characters (e.g., Elmo) on video than from unfamiliar characters (a Taiwanese children’s character named DoDo) demonstrating the same actions (Lauricella, Gola, & Calvert, 2011).

Another factor that is key to live interaction and missing from video-presented material is social contingency: children learn best from responses that follow their speech and actions in timing and content. Contingency may be another way to make content socially meaningful; it is also consistent with children’s experiences and expectations of how people convey information. Recent research shows that 24- to 30-month-old children can learn new verbs via socially contingent Skype interactions; in fact, the toddlers learned new verbs as well over Skype as they did in live interaction (Roseberry, Hirsh-Pasek, & Golinkoff, 2014). In contrast, no learning was observed in a non-contingent video condition, which presented the same content but was obtained through interaction with a different child.

Transfer of knowledge between 2D representations and the 3D world

Another difference between live interaction and video concerns the nature of 2D representations (a potential problem common to video, TV, tablets, and books). First off, the task of perceiving is different in these two situations: 2D images are perceptually “impoverished” relative to 3D objects and are processed differently in the brain in infancy (Barr, 2010). A second hurdle is a conceptual one: infants do not yet understand the abstract and symbolic nature of screens. They initially see a TV, for example, as a concrete object (DeLoache, 1987; Troseth & DeLoache, 1998). I remember watching the World Cup on a big-screen TV that sat on a low table. My daughter, who was around 12 months old at the time and had never seen such a large TV up close, tried to touch and climb into the screen!

Another explanation for the video deficit, the representational flexibility account, is based on how memory processes work (Hayne, 2004). To retrieve information from memory we try to match it up with cues that were present when we first encountered the information. This matching task gets harder the more the cues differ between presentation and retrieval. Therefore, when a word is learned on a 2D screen and needs to be applied to the 3D world, it takes work to transfer it, given the mismatch in cues across these two situations.

Importantly, the video deficit is not about screens being “bad” in and of themselves, an idea perpetuated by the American Academy of Pediatrics recommendation that children under 2 have no screen time. Research findings in line with the representational flexibility account illustrate this very nicely. Infants aged 15 to 16 months observed a new action on an object either through live interaction (3D) or on a digital tablet (2D). When they were asked to imitate the action in the same dimension they had learned in (i.e., on a 3D object or a 2D touchscreen), they performed very well irrespective of which dimension that was: no video deficit was observed. However, when they had to transfer learning across dimensions (learn on a 3D object or 2D touchscreen and act out on the other), they performed poorly. Notably, transfer was just as difficult whether they learned with a real object or with the touchscreen (see chart at Zack, Barr, Gerhardstein, Dickerson, & Meltzoff, 2009).

So it seems there are several explanations for why learning from live interaction is easier in the first two years of life than learning from 2D representations (e.g., TV, tablets, phones, books). The first type of explanation is linked to what makes live interaction so effective: it is socially meaningful, relevant to the child’s ongoing experiences, and contingent on the child’s own behavior. The second type of explanation focuses on why learning from 2D screens can be cognitively taxing for infants living in a 3D world: they don’t yet understand the symbolic nature of screens and need to transfer information between the screen and the real world. Importantly, adding elements found in live interaction to video representations, as in Skype video chat, can help toddlers to overcome the video deficit. In my next blog post I’ll extend this discussion to media content targeted at babies and young children, and discuss characteristics of media content that have been shown to facilitate learning language from the screen.

