Reliability and validation of a short scale to measure situational emotions in science education
Published: Oct. 10, 2011
Research has shown that emotions play a significant role in the learning process and academic achievement. However, measuring emotions during or after instruction usually requires written responses on lengthy research instruments, and this has been given as a reason why researchers have tended to avoid research on this topic in classrooms. Consequently, we developed a short Likert-scale instrument that uses only three items for each of three factors (interest, well-being and boredom) to measure adolescent emotions during instruction in science education. We present four different studies in four populations to assess the validity of the scale. In order to determine the reliability and validity of the instrument, it was administered to pupils across a range of grades (grades 6-12) after they had been taught standardised lessons by 14 teachers in south-western Germany. The data generated were analysed statistically in terms of their reliability and validity. As the three independent factors (interest, well-being and boredom) had been derived from theoretical constructs, confirmatory factor analysis was applied. In a second study based on pupils from different age groups, grades and school subjects, we found different scores according to grade and subject, suggesting that the scale is sensitive to these parameters. A third study used two standardised educational programs in zoology and botany for 5th and 6th graders to assess the scale's sensitivity to changes in emotions. Pupils rated the zoological topic as more interesting and less boring than the botanical topic, and felt better during it. External validity was determined in a fourth study by correlating the data generated with our scale in a sample of university students with the data generated by an established measure of motivation (the shortened German version of the KIM, "Kurzskala zur Intrinsischen Motivation").
The data generated suggest that the three factors cluster satisfactorily and that the instrument, which can be administered with minimum disruption of classroom time, is both reliable and valid.
Well-being, emotions, Science Education, interest, boredom, short assessment instrument
According to Pintrich, Marx and Boyle (1993), learning processes involve more than "cold cognition" and include additional factors, such as affective and social variables, which need to be taken into account when investigating learning processes. More recent psychological studies emphasise the significance of emotions in learning and achievement (Mayring & Rhöneck, 2003; Pekrun, 2000). Students experience learning and achievement situations differently depending on their previous experiences, their social context, their personal goals, their individual interests and a number of other personality factors (Götz, Zirngibl, Pekrun, & Hall, 2003). For example, Allen (2010) found that pupils who experienced more intense emotions during an educational intervention showed the greatest gains after it.
However, until recently there has been a general paucity of data on the role of emotions in classroom instruction (Gläser-Zikuda, Fuß, Laukenmann, Metz & Randler, 2005; Pekrun, Götz, Titz, & Perry, 2002), and where research has been done it has tended to focus on the cognitive-emotional construct "interest". One reason why emotions have not been studied in instructional situations is that they were seen as interfering with learning and achievement (Gläser-Zikuda et al., 2005). At best, they have been mentioned as a motivational aptitude or as an affective learning outcome (Fraser, Walberg, Welch, & Hattie, 1987). However, Värlander (2008) argues that emotions should not be considered as hindering learning, but rather as a natural part of it with a focal role in the process. Moreover, the positive impact of interest on learning processes has been confirmed with regard to both knowledge domains and subject areas (Hidi, Berndorff, & Ainley, 2002). Teachers' didactic competencies, students' academic achievement and interest, and social interactions have been found to correlate with successful learning processes at school (Hascher, 2003), and boredom has been shown to be related to attention problems and lower intrinsic motivation (Pekrun, Götz, Daniels, Stupnisky, & Perry, 2010).
Much of the previous work on emotions (and especially on interest) is based on trait-survey studies (assuming that a given trait is a personality variable that is resistant to short and sudden changes) using questionnaires in cross-sectional designs that assess, e.g., interest across grades and school topics, often in retrospect. In our study, we used a concept that distinguishes between current situational emotions and biographically developed, enduring "trait" emotions (Ulich & Mayring, 1992), as interest may fluctuate during a lesson (Palmer, 2009). The idea behind this distinction can be illustrated by the following example: pupils may experience a particular lesson, e.g. a hands-on lesson or a specific topic, as interesting even though they have no general interest in the subject itself (see Laukenmann et al., 2003). Here, we define "situational emotions" as emotions that are sensitive to changes and that are not developed as a stable trait factor (as is, e.g., general interest in a specific topic). As with many psychological variables, state and trait components of emotions exist simultaneously (Spielberger, Gorsuch, & Lushene, 1970).
A central aspect of implementing educational interventions and treatments in learning studies is the need to assess situational emotions as a moderating variable, because such emotions are related to learning success. However, very few approaches have measured situational emotions (see, e.g., Gläser-Zikuda, 2010; Gläser-Zikuda et al., 2005). The main focus of our study was therefore to further modify a short scale for measuring situational emotions during learning processes and to investigate its reliability and validity. The scale is intended for different educational settings, in school as well as out of school and in formal as well as informal learning environments, from 5th grade up to university level. Its benefit is that it is less time-consuming than longer instruments and can therefore be applied many times during an educational unit, at the end of individual lessons.
For the purposes of the study we differentiated between a more cognitive-evaluative (satisfaction) and a more affective (joy) dimension of the concept 'well-being' (cf. Strack, Argyle, & Schwarz, 1990; Mayring, 2009). Interest is defined as a specific subject-topic relationship that explicitly includes importance and utility (cf. Hidi, Renninger & Krapp, 1992), and boredom is defined by the components lack of action, lack of interest and subject-related boredom (cf. Bellebaum, 1990; Csikszentmihalyi & LeFevre, 1987). In short, well-being relates to a subjective positive feeling during the lessons, interest has a more cognitive orientation, and boredom relates to a lack of action and interest.
Previous work assessed situational emotions immediately after school lessons by using different versions of a situational emotion scale (Gläser-Zikuda et al., 2005; Gläser-Zikuda & Fuß, 2008; Laukenmann et al., 2003; Randler, 2004, 2009). In this study we investigate the reliability and validity of a much shorter scale based on just three items for each of the dimensions interest, well-being and boredom. The resulting nine-item scale is easy to apply in most school situations, for example at the end of a lesson or a field trip. We present four different studies based on four different pupil/student populations. The first study was done to provide evidence for the factor structure of the three dimensions (interest, well-being, boredom) of the scale; the second study sought evidence for sensitivity across different subjects and grades (we assumed that pupils would assess different lessons and subjects differently, as is known from trait surveys). The third study was done to assess sensitivity at the individual level by comparing individual reactions to two different biology lessons (zoology versus botany). Finally, we used another scale based on a different motivational theory to seek external validation of the shortened scale by correlating its scores with ours in a sample of university students.
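To make the structure of the nine-item scale concrete, the following Python sketch scores one pupil's responses by averaging the three items of each dimension. Scoring by item means is an assumption for illustration (the section does not state how subscale scores were formed), and the item labels in the hypothetical `ITEM_KEY` are placeholders, not the wording or order of the actual questionnaire.

```python
from statistics import mean

# Hypothetical item-to-factor key: the source states only that each of the
# three factors has three items; these labels and their order are assumptions.
ITEM_KEY = {
    "interest": ["int1", "int2", "int3"],
    "well_being": ["wb1", "wb2", "wb3"],
    "boredom": ["bor1", "bor2", "bor3"],
}


def score_subscales(responses, key=ITEM_KEY):
    """Return the mean Likert rating per factor for one pupil's nine responses."""
    missing = [item for items in key.values() for item in items if item not in responses]
    if missing:
        raise ValueError(f"missing items: {missing}")
    # Boredom is kept as its own factor, so no item is reverse-coded here.
    return {factor: mean(responses[item] for item in items)
            for factor, items in key.items()}


if __name__ == "__main__":
    # Toy responses for one pupil, not study data.
    pupil = {"int1": 4, "int2": 5, "int3": 3,
             "wb1": 4, "wb2": 4, "wb3": 5,
             "bor1": 1, "bor2": 2, "bor3": 1}
    print(score_subscales(pupil))
```

Averaging rather than summing keeps each subscale on the original response scale, so the three dimensions can be compared directly.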
As noted above, we based our validation on four studies with four different samples (three at school level, one at university level). In all studies, participants completed the nine-item questionnaire at the end of the lesson (usually within the last five minutes).
The first study was based on 393 pupils (188 boys, 205 girls) from south-western Germany. The pupils (all 5th and 6th graders) participated in an educational unit about bird flight (Hummel & Randler, 2010). All teaching followed the same kind of instruction: the teaching materials were standardised by means of a booklet that presented a hands-on educational unit. Fourteen different teachers participated in the study. There may still have been differences between the individual teachers, but we aimed for at least some standardisation by providing standardised teaching materials and a booklet guiding pupils through their learning process. At the end of the lesson, pupils filled in the questionnaire, and we used these data for the confirmatory factor analysis.
The second study was based on 141 pupils (74 girls, 67 boys) aged 12-19 years, comprising 7th (n = 12), 8th (n = 24), 9th (n = 50) and 10th (n = 55) graders. Different subjects, such as biology, chemistry, German language and social sciences, were taught by different teachers in order to assess differences between subjects and grades. At the end of a lesson, the pupils filled in the questionnaire, and we compared their situational emotions. We expected situational emotions to differ between subjects and grades.
The third study used a 90-minute lesson about the water lily from a standardised educational program developed by Randler & Bogner (2009) and a 90-minute lesson about snail ecology designed by Hummel & Randler (2010). Ninety-seven pupils (48 boys, 49 girls; 53 fifth and 44 sixth graders) participated in this part of the study. We focused on assessing intra-individual differences by applying two similarly structured lessons (hands-on experiments) on different topics, one dealing with a zoological topic (snail) and the other with a botanical topic (water lily). Previous research on interest in biology (Randler & Bogner, 2007) has shown that pupils generally rate zoological topics as more interesting than botanical ones. We therefore expected pupils to give higher interest and well-being scores and lower boredom scores after the zoological topic than after the botanical topic. However, these previous ratings had only been measured using trait scales (as opposed to situational scales).
The fourth study sought additional external validity and was conducted with students at the University of Education Heidelberg. We compared measures from our situational emotions scale with another short scale provided by Wilde, Bätz, Kovaleva, & Urhahne (2009; labelled KIM, Kurzskala zur Intrinsischen Motivation, short scale for intrinsic motivation). The KIM measures motivation on the basis of the self-determination theory of Deci & Ryan (1985, 2003) and covers four domains: interest/enjoyment, competence, pressure/tension (negative) and perceived choice. At the end of each course day, students filled in both questionnaires, the KIM and our scale for situational emotions. We calculated overall means across the 11 course days (both scales were administered on each day) and used these individual data to calculate correlation coefficients as an indicator of external validity.
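The aggregation step in study IV (one mean per student across the course days, which are then correlated across students) might look like the following sketch. The record layout `(student_id, day, scale, score)` and the function name are hypothetical, and averaging over the days actually recorded is an assumption about how missed days were treated; the section does not say how absences were handled.

```python
from collections import defaultdict
from statistics import mean


def per_student_means(records):
    """Average each student's daily scores into one value per scale.

    records: iterable of (student_id, day, scale, score) tuples (hypothetical layout).
    Assumption: a day a student missed simply has no record, so each mean is
    taken over the days actually recorded for that student and scale.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for student, _day, scale, score in records:
        grouped[student][scale].append(score)
    return {
        student: {scale: mean(scores) for scale, scores in scales.items()}
        for student, scales in grouped.items()
    }


if __name__ == "__main__":
    # Toy records, not study data: two course days for s1, one for s2.
    records = [
        ("s1", 1, "interest", 4), ("s1", 2, "interest", 2),
        ("s1", 1, "kim_interest", 5), ("s2", 1, "interest", 3),
    ]
    print(per_student_means(records))
```

The resulting per-student means, one value per scale, are the inputs for the correlation coefficients.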
Confirmatory factor analysis (study I): As the three independent factors have been derived from theoretical constructs (Gläser-Zikuda et al., 2005), we assessed the model with a confirmatory factor analysis (Brown, 2006) using LISREL 8.80 (Jöreskog & Sörbom, 1993). In study II, we used t-tests and one-way ANOVA to compare the answers according to grade, subject and gender. To assess several variables simultaneously, we used a general linear model (GLM). In a first step, all independent variables entered the model simultaneously; in a second step, all non-significant variables were removed to produce the final model. In study III, we compared the pupils' answers for the zoological and botanical topics by using paired t-tests. Finally, for study IV, the validation with the KIM, we used Pearson's correlations to assess the relationship between the three emotional constructs of our scale and the four constructs of the KIM.
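For the one-way ANOVA used in study II, a minimal sketch of the F statistic is shown below, for example with one list of interest scores per grade. It is illustrative only: it covers a single factor, not the multi-factor GLM with a covariate and removal of non-significant terms described above, and it returns F with its degrees of freedom rather than a p-value. A statistics package such as SciPy (`scipy.stats.f_oneway`) reports both F and p.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of score lists, one per level of the factor (e.g. one per grade).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: deviations of scores from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Toy scores for two grades, not study data.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```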
In study II, reliability, calculated as Cronbach's α, was .79 for well-being, .73 for interest and .72 for boredom. As this study was based on pupils of different ages and grades and on different types of lessons and subjects, we expected differences according to these variables (e.g., gender, grade, age group). In a first step, we used a general linear model (GLM) with age as covariate and gender, subject and grade as fixed factors. This model revealed no significant main effects of gender or age (covariate), so these variables were removed from the model. The final model (Table 1) showed a significant influence of both subject and grade: ratings of interest, well-being and boredom differed between grades and between subjects. This provides evidence that pupils assess different instructions differently, which in turn supports the situational character of the scale. In addition, different instructions, teachers and lessons may have evoked different situational emotions.
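Cronbach's α for a k-item subscale is α = k/(k - 1) · (1 - Σ s²ᵢ / s²ₜ), where s²ᵢ is the variance of item i and s²ₜ the variance of the summed subscale score. The sketch below applies this standard formula to one subscale (e.g. the three well-being items); the data layout, one list of respondent scores per item, is an assumption for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: one list per item, each holding every respondent's score for
    that item (equal lengths, respondents in the same order in every list).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's summed subscale score.
    totals = [sum(scores) for scores in zip(*item_scores)]
    # Sample variances throughout; the n versus n - 1 choice cancels in the ratio.
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


if __name__ == "__main__":
    # Toy data: two items answered by three respondents, not study data.
    print(round(cronbach_alpha([[1, 2, 3], [1, 3, 2]]), 3))  # prints 0.667
```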
Study III - Intra-individual Differences
We obtained reliabilities of .82 (snail) and .83 (water lily) for well-being, .73 (snail) and .74 (water lily) for interest, and .80 (snail) and .82 (water lily) for boredom. In this study, the same pupils received two different lessons that were identical in structure: both were biological hands-on lessons (see methods), but they differed in subject matter (zoological versus botanical topic). We found significant differences between the two lessons (snail ecology versus water lily). Pupils rated the zoological topic (snail) as more interesting and less boring than the botanical topic (water lily), and felt better (Table 2). This indicates that the scale is sensitive to changes at the individual level, because the same individuals rated the different lessons differently (a typical trait variable should be constant over time and insensitive to changes).
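The paired t-test used for the snail versus water-lily comparison works on each pupil's difference score: t = d̄ / (s_d / √n) with n - 1 degrees of freedom. A minimal sketch, assuming two equal-length lists with pupils in the same order; the function name is hypothetical, the example numbers are toy values rather than study data, and the p-value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two ratings by the same pupils.

    first, second: equal-length lists; position i holds pupil i's score after
    each lesson (e.g. interest after the snail and after the water-lily lesson).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two pupils")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Toy ratings for four pupils, not study data.
    snail = [4, 5, 3, 4]
    water_lily = [3, 3, 2, 4]
    print(paired_t(snail, water_lily))
```

Because each pupil serves as their own control, stable individual differences drop out of the difference scores, which is why this design isolates the effect of the topic.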
The fourth study sought additional external validity with a well-known instrument derived from Deci & Ryan (2003), in its shortened German version (labelled KIM: Kurzskala zur Intrinsischen Motivation, short scale for assessing intrinsic motivation). As noted earlier, this scale can be used at the end of a lesson or of an educational unit and assesses four dimensions of motivation. There were significant correlations between the dimensions of the two instruments. For example, KIM interest was positively related to the interest and well-being scales of our instrument and negatively related to its boredom scale. Similarly, KIM competence was positively related to our interest and well-being scales and negatively to boredom. KIM pressure/tension was negatively related to well-being and positively to boredom, suggesting that students who felt under pressure experienced higher boredom and lower well-being. KIM perceived choice showed the lowest correlations because it measures a somewhat different construct, but it was positively related to well-being.
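The correlations between the two instruments can be reproduced, in outline, with a plain Pearson product-moment coefficient computed for every pairing of our three scales with the four KIM scales. This sketch assumes per-student scores stored in dicts keyed by hypothetical scale names, with students in the same order in every list; significance testing is not shown.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlations(ours, kim):
    """Correlate every situational-emotion scale with every KIM scale.

    ours, kim: dicts mapping a (hypothetical) scale name to per-student scores,
    with students in the same order in every list.
    """
    return {(a, b): pearson_r(xs, ys)
            for a, xs in ours.items() for b, ys in kim.items()}


if __name__ == "__main__":
    # Toy scores for three students, not study data.
    ours = {"interest": [1, 2, 3], "boredom": [3, 2, 1]}
    kim = {"kim_interest": [2, 4, 7]}
    print(cross_correlations(ours, kim))
```

A positive coefficient for interest and a negative one for boredom against KIM interest would correspond to the pattern of correlations reported above.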
Our study provides evidence that a shortened scale is sufficient to measure situational emotions in short-term learning and is therefore a suitable means for educational research of this nature. The brevity of the scale allows it to be applied more than once during an educational unit (as noted earlier, longer scales may stir up aversion or reactance). Overall, the emotional variables were rated differently with regard to grade and subject matter, which suggests that our scale is suitable for measuring differences between lessons and within pupil cohorts. In study II, no differences according to age and gender were revealed, which suggests that the scale can be applied effectively from age 10 onwards up to and including university students. We do, however, recognize that such differences may emerge with a larger sample size, which would be an interesting issue for further investigation.
The within-subject differences from study III point to a genuinely situational character of our scale, because the same pupils rated the different lessons differently (as expected) and the teacher was identical, thus removing any teacher effects from the results (Randler & Bogner, 2009). This suggests that the scale is sensitive to change. Furthermore, study III adds external validity, because trait survey studies on interest in biology have revealed that pupils find zoological topics more interesting than botanical ones (see, e.g., Löwe, 1992; Hong, Shim, & Chang, 1998; Lindemann-Matthies, 2005). This difference between zoological and botanical topics found in trait interest studies is reflected in our measurements of situational interest.
In study IV, we sought external validity by correlating our situational emotion scale with an established measure of motivation, the KIM, which is based on the motivational theory of Deci & Ryan (1985) and was provided by Wilde et al. (2009). The two interest scales correlated positively with each other, and boredom was negatively related to the intrinsic motivation dimensions of the KIM (interest and competence; see also Pekrun et al., 2010), suggesting that the scale, although shortened, remains valid. Moreover, the Cronbach's α values obtained attest to the internal reliability of the instrument.