With quizzes being presented online through GauchoSpace, instructors have access to a wealth of new statistics on quiz and question performance. You can use these statistics to:
- find questions that have the wrong answer selected as correct
- find questions that most students guessed on
- find and replace questions that did not do a good job of discriminating between knowledge levels
- ensure any random questions you used were equally difficult
- determine which questions best sorted students by overall quiz performance
For example, a very low Facility Index means that few students answered the question correctly and many may have guessed, so you should either review that topic with the class or reword the question.
Similarly, if your overall quiz has a high Error Ratio, most of the variation between students was likely due to chance, so you should not make that quiz a large portion of the overall class grade.
The table includes the "Quiz name," "Course name," "Number of complete graded first attempts," and "Total number of complete graded attempts" at the top.
Following this information, there are a series of averages ("Average grade of first attempts," "Average grade of all attempts," "Average grade of last attempts," and "Average grade of highest graded attempts") as well as the Median and Standard Deviation for the quiz attempt you selected earlier (in this example the statistics were calculated for the highest graded attempt for each student).
At the bottom of the table are five more involved statistics that can be useful when comparing multiple quizzes:
Score distribution skewness (Target Values: -1 to 1): Skewness is a measure of how symmetrical the score distribution is. Zero means the distribution is symmetrical, negative values mean it is skewed toward the lower values (there is more variation among students who scored lower than average than among students who scored higher than average), and positive values mean it is skewed toward the higher values. A value significantly different from 0 can signify that your quiz isn't doing a good job of distinguishing among either high performing students (for negative skew) or low performing students (for positive skew).
Score distribution kurtosis (Target Values: 0 to 1): Kurtosis provides a statistic to determine how different the tails of the grade distribution are from a normal distribution (basically how flat the distribution is). As this statistic is only really meaningful for very large class sizes, it will not be useful to the majority of classes. However, as a rough guide, this number should be between 0 and 1. If the value is outside of this range, it might mean that the quiz is not doing a good job of discriminating between very high/very low knowledge levels and average knowledge levels.
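If you export the grades and want to check these two values yourself, the following Python sketch may help. It assumes the simple moment-based definitions of skewness and excess kurtosis; GauchoSpace may apply small-sample corrections, so the values it reports can differ slightly. The function names and sample grades are illustrative only.

```python
from statistics import fmean


def _central_moment(scores, order):
    """Average of (score - mean) ** order over all scores."""
    mean = fmean(scores)
    return sum((x - mean) ** order for x in scores) / len(scores)


def skewness(scores):
    """Moment-based skewness: 0 for a symmetrical distribution."""
    m2 = _central_moment(scores, 2)
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return _central_moment(scores, 3) / m2 ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    m2 = _central_moment(scores, 2)
    if m2 == 0:
        raise ValueError("all scores are identical; kurtosis is undefined")
    return _central_moment(scores, 4) / m2 ** 2 - 3


# Hypothetical quiz grades (percent) with a long tail of low scores.
grades = [40, 70, 80, 85, 90]
print(skewness(grades))         # negative: skewed toward the lower values
print(excess_kurtosis(grades))
```

With only a handful of students, as in this sample, both values swing widely from one quiz to the next, so treat them as rough indicators.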
Coefficient of internal consistency (Target Values: >75%): This is a measure of how consistently students perform over the entire quiz. A high value means that students who performed well did so throughout the quiz and students who performed poorly did so on the entire quiz, whereas a low value means that the quiz did not do a good job of discriminating between students of different knowledge levels or that some of the questions were substantially more difficult than others.
Higher is better, but if you are testing on multiple subjects of varying difficulty within one quiz, don't be surprised if you get a low value. In general, any value below 64% signifies that the quiz was unsatisfactory and should be changed. An important caveat is that if you are intentionally creating very easy quizzes, this statistic will be low without signifying that you should change the quiz.
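One common way to compute an internal-consistency coefficient is Cronbach's alpha. The sketch below assumes that form and is not guaranteed to match GauchoSpace's calculation in every detail. It takes a table of per-question scores, one row per student; the function name and sample data are hypothetical.

```python
from statistics import pvariance


def internal_consistency(score_rows):
    """Cronbach's alpha as a percentage.

    score_rows: one list per student, holding that student's score
    on each question (same question order in every row).
    """
    n_questions = len(score_rows[0])
    if n_questions < 2:
        raise ValueError("need at least two questions")
    total_variance = pvariance([sum(row) for row in score_rows])
    if total_variance == 0:
        raise ValueError("all students have the same total score")
    question_variances = [
        pvariance([row[q] for row in score_rows]) for q in range(n_questions)
    ]
    alpha = (n_questions / (n_questions - 1)) * (
        1 - sum(question_variances) / total_variance
    )
    return 100 * alpha


# Hypothetical quiz: 4 students, 3 questions scored 0 or 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(internal_consistency(consistent))  # close to 100: fully consistent
```

When every student scores the same on all questions, as in this sample, the coefficient reaches its maximum; mixed results across questions pull it down.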
Error ratio: The error ratio is a measure of how much of the variation between students was due to chance rather than knowledge level. Lower is better here; a high value tells you that some questions may have been too hard, so that the majority of students guessed on them. You can look at individual question statistics to try to pinpoint the problem questions (more details on this process can be found in the next section).
The error ratio should in general be below 50%.
Standard error: Standard error is a measure of how much an individual student's score would change due to random chance if they took the quiz again. Similar to the error ratio, a smaller value is better here. It is often suggested that standard error would ideally be below 8%, but for quizzes with very few students (such as the example in this article) note that the value will likely be higher than that.
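To see how these two statistics relate to the others, here is a short Python sketch. It assumes the error ratio is derived from the coefficient of internal consistency as the square root of one minus that coefficient, and that the standard error is the error ratio applied to the standard deviation. Both relationships are assumptions about how the numbers are calculated, and the function names are illustrative.

```python
import math


def error_ratio(consistency_percent):
    """Error ratio (percent) from the coefficient of internal consistency."""
    if consistency_percent > 100:
        raise ValueError("consistency cannot exceed 100%")
    return 100 * math.sqrt(1 - consistency_percent / 100)


def standard_error(consistency_percent, standard_deviation_percent):
    """Standard error (percent) from consistency and standard deviation."""
    return error_ratio(consistency_percent) / 100 * standard_deviation_percent


print(error_ratio(75))          # 50.0
print(standard_error(75, 16))   # 8.0
```

Under these assumptions, a coefficient of internal consistency of 75% corresponds to an error ratio of 50%, which lines up with the two targets given above.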
Overall, you should aim for skewness to be between -1 and 1, kurtosis to be between 0 and 1, the Coefficient of internal consistency to be high, and the error ratio and standard error to be low.
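If you track these numbers for several quizzes, a small script can apply the targets for you. The sketch below uses only the thresholds stated in this article; the function name and the sample values are hypothetical.

```python
def check_quiz_statistics(skewness, kurtosis, consistency, error_ratio, standard_error):
    """Return warnings for values outside the targets in this article.

    consistency, error_ratio and standard_error are percentages.
    """
    warnings = []
    if not -1 <= skewness <= 1:
        warnings.append("skewness outside -1 to 1")
    if not 0 <= kurtosis <= 1:
        warnings.append("kurtosis outside 0 to 1")
    if consistency < 64:
        warnings.append("internal consistency below 64%")
    elif consistency <= 75:
        warnings.append("internal consistency not above the 75% target")
    if error_ratio >= 50:
        warnings.append("error ratio not below 50%")
    if standard_error >= 8:
        warnings.append("standard error not below 8%")
    return warnings


# Hypothetical values for two quizzes.
print(check_quiz_statistics(-0.4, 0.6, 81, 43.6, 6.2))   # []
print(check_quiz_statistics(1.8, 0.2, 58, 64.8, 9.5))
```

Remember the caveats above: small classes and intentionally easy quizzes can trigger warnings that do not call for any change.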
Statistics for the questions in the quiz are found under the "Quiz Structure Analysis" section.
The first five columns on the left include general information about the question.
1. Q# is the position of the question in the quiz
2. The question type
3. The preview question and edit question icons
4. The question name
5. The number of attempts for the question
The remaining columns show the statistics that are calculated for the questions.
6. The Facility Index is the percentage of the students who got the question correct. The table below provides a quick guide to interpreting Facility Index values.
7. The Standard Deviation measures how much variation there was in the scores for the question. Very easy or very hard questions will have a lower Standard Deviation. A Standard Deviation above 33% is recommended, but note that a high Standard Deviation does not automatically mean the question did a good job of discriminating knowledge levels.
8. Random Guess Score is the score students would be expected to get if they guessed at random on the question.
9. Intended weight is the percentage of the total quiz points that this question is worth
10. Effective weight is a measure of how much of the variation in total scores was due to this question. Ideally, it should be as close to the intended weight as possible.
11/12. Discrimination index and Discriminative efficiency are measures of how consistent performance on this question was with performance on the entire quiz. High values mean that students who scored high on the quiz overall were also likely to get this question right, while students who performed poorly were likely to get it wrong. In other words, they are measures of how diagnostic the specific question was for overall performance. High scores for both of these values are better.
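Two of the statistics in the list above can be approximated from exported scores with the Python sketch below. It assumes the Facility Index is the average score on the question as a percentage of the maximum, and that the Discrimination index is the correlation between each student's score on the question and their score on the rest of the quiz. GauchoSpace's exact calculations may differ, and all names and sample data are hypothetical.

```python
import math
from statistics import fmean


def facility_index(question_scores, max_score=1.0):
    """Average score on the question as a percentage of the maximum."""
    return 100 * fmean(question_scores) / max_score


def discrimination_index(question_scores, total_scores):
    """Correlation (percent) between question score and rest-of-quiz score.

    Returns None when either set of scores has no variation.
    """
    rest = [total - q for q, total in zip(question_scores, total_scores)]
    mean_q, mean_r = fmean(question_scores), fmean(rest)
    covariance = sum((q - mean_q) * (r - mean_r) for q, r in zip(question_scores, rest))
    spread_q = sum((q - mean_q) ** 2 for q in question_scores)
    spread_r = sum((r - mean_r) ** 2 for r in rest)
    if spread_q == 0 or spread_r == 0:
        return None
    return 100 * covariance / math.sqrt(spread_q * spread_r)


# Hypothetical data: four students, question worth 1 point, quiz out of 10.
question = [1, 1, 0, 0]
totals = [9, 8, 4, 3]
print(facility_index(question))                 # 50.0
print(discrimination_index(question, totals))   # high: strong students got it right
```

A negative result would mean that students who did well on the rest of the quiz tended to get this question wrong, which is often a sign of a wrongly keyed answer.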
When looking at the discrimination index, it can be helpful to use the following chart to determine how well the question is performing.
Additionally, at the bottom of the page, there is a bar chart displaying the facility index and the discriminative efficiency for each question. This chart provides a quick visual way to look for potentially "broken" or not-useful questions. Look for questions that had either a very high or very low Facility index (meaning almost everybody or almost nobody answered it correctly) and questions with low Discriminative efficiency (meaning the question did not do a good job of sorting between high and low performing students).
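The same scan can be done on exported numbers instead of the chart. In the sketch below, the cut-off values are arguments you choose yourself; this article does not prescribe specific thresholds, and the values and question names in the example are purely illustrative.

```python
def flag_questions(questions, min_facility, max_facility, min_efficiency):
    """Return {question name: [reasons]} for questions worth a closer look.

    questions maps a question name to a pair
    (facility index, discriminative efficiency), both in percent.
    """
    flagged = {}
    for name, (facility, efficiency) in questions.items():
        reasons = []
        if facility < min_facility:
            reasons.append("very low facility index")
        elif facility > max_facility:
            reasons.append("very high facility index")
        if efficiency < min_efficiency:
            reasons.append("low discriminative efficiency")
        if reasons:
            flagged[name] = reasons
    return flagged


# Hypothetical question statistics and illustrative cut-offs.
stats = {"Q1": (97.0, 12.0), "Q2": (62.0, 55.0), "Q3": (8.0, 40.0)}
print(flag_questions(stats, min_facility=10, max_facility=95, min_efficiency=20))
```

Here Q1 is flagged for both reasons, Q3 for its very low facility index, and Q2 passes.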
For more details on the quiz and question statistics, see this article.
This will bring you to a page where you can see the stats for all the random variants. Note that the quiz will lump all random questions that are in a row and from the same category into one group for this analysis. The two most useful things to look at are the Facility index (the percentage of students that got the question correct) and the Discrimination Index (how well the question discriminated between high and low performing students).
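To check whether the random variants were equally difficult, you can compare their Facility index values side by side. The Python sketch below assumes you have the scores for each variant and that the Facility index is the average score as a percentage of the maximum; the variant names and scores are hypothetical.

```python
from statistics import fmean


def variant_facility(variant_scores, max_score=1.0):
    """Facility index (percent) for each random variant."""
    return {
        name: 100 * fmean(scores) / max_score
        for name, scores in variant_scores.items()
    }


def facility_spread(facilities):
    """Gap in percentage points between the easiest and hardest variant."""
    return max(facilities.values()) - min(facilities.values())


# Hypothetical scores (0 or 1) for two variants drawn from the same category.
scores = {"Variant A": [1, 1, 1, 0], "Variant B": [1, 0, 0, 0]}
facilities = variant_facility(scores)
print(facilities)                   # {'Variant A': 75.0, 'Variant B': 25.0}
print(facility_spread(facilities))  # 50.0
```

A large gap suggests that some students received a noticeably harder question than others, though with few attempts per variant the gap can also be due to chance.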