Question-writing is an integral part of the instructional design process. Writing effective questions is both an art and a science. You can develop the art over time through practice; however, a lack of understanding of the science can render your most sincere “artwork” ineffective. As instructional designers, we are familiar with question-design guidelines such as “test what you teach”, “parallel options” and “give-aways”. But how many of us understand the science behind these guidelines? Those who do will be familiar with the following terms: Reliability, Validity, Difficulty Index and Discrimination Index. Those who don’t would do well to acquaint themselves with these terms.
What is Reliability?
Reliability describes how consistently a test measures whatever it measures. The keyword here is “consistent”: all other factors remaining the same, a candidate who took the test again should get a score similar to the one he/she got in the first attempt. The purpose of the reliability measure is to ensure that the test produces consistent information. Reliability is represented as “r”, which, in the test-retest method, is the correlation between the scores the same candidates obtain at Time 1 and Time 2. The reliability coefficient is reported as a number between 0 and 1.00, where:
r = 0 indicates no reliability
r = 1.00 indicates perfect reliability
The closer the coefficient is to 1.00, the more reliable the test. However, no test achieves perfect reliability (1.00).
What is Validity?
Validity is the extent to which the scores obtained on a test represent what the test claims to measure. For example, a test full of recall-level questions aimed at measuring how well a candidate “understands Instructional Design principles” is NOT a valid test. Similarly, if the objective of a learning program is to train people to interpret and extrapolate Bloom’s Taxonomy/Gagne’s Events, then a test that asks them to recall the levels in Bloom’s Taxonomy and the events in Gagne’s Events is NOT a valid test. In short, a valid test is one that is relevant to, and directly measures, what it claims to measure (the objective).
Reliability versus Validity
Validity tells you how effective a test is for a stated purpose; reliability tells you how trustworthy a score on that test is. A test can be reliable without being valid: it may produce consistent scores while measuring the wrong thing.
What is the Difficulty Index?
The Difficulty Index (often written as p) describes how easy or hard a test item is for the candidates who take it. You arrive at this index by dividing the number of candidates who choose the correct answer for an item by the total number of candidates. Because it is the proportion of correct answers, a higher value means an easier item: an item is considered easy if it has a value of 0.75 or more, and difficult if it has a value of 0.25 or less. A good test is made up mostly of items of moderate difficulty, with values around 0.5.
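The calculation is a single division. The Python sketch below computes it for one multiple-choice item and applies the 0.75/0.25 thresholds given in the text; the function names `difficulty_index` and `classify` and the response data are hypothetical.

```python
def difficulty_index(responses: list[str], key: str) -> float:
    """Proportion of candidates who chose the correct answer (the key)."""
    if not responses:
        raise ValueError("no responses recorded for this item")
    correct = sum(1 for answer in responses if answer == key)
    return correct / len(responses)


def classify(p: float) -> str:
    # Thresholds from the text: 0.75 or more is easy, 0.25 or less is difficult.
    if p >= 0.75:
        return "easy"
    if p <= 0.25:
        return "difficult"
    return "moderate"


# Hypothetical answers from 10 candidates to one item whose correct option is "B".
answers = ["B", "A", "B", "C", "B", "B", "D", "B", "A", "B"]
p = difficulty_index(answers, key="B")
print(p, classify(p))  # prints "0.6 moderate"
```

Six of the ten candidates chose "B", so p = 0.6, which falls in the moderate band.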
What is the Discrimination Index?
The Discrimination Index describes how well a test item differentiates between high and low scorers. It is calculated by subtracting the number of candidates with low overall scores who got the item correct from the number of candidates with high overall scores who got the item correct:
Number of high scorers who got the item correct minus Number of low scorers who got the item correct
This difference is often divided by the number of candidates in each group, so that the index ranges from -1.00 to +1.00.
A high Discrimination Index is what a good test item must strive to achieve. This means that high-performing candidates must select the correct answer more often than low-performing candidates. Items with a negative Discrimination Index, where low scorers answer correctly more often than high scorers, must be analysed thoroughly, and corrected or replaced.
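The following Python sketch computes the Discrimination Index for one item. The text does not say how the high- and low-scoring groups are formed, so the sketch assumes a common convention: rank candidates by total score and compare the top and bottom 27%. The `Candidate` type, the `discrimination_index` function and the data are all hypothetical. The function returns both the raw difference described in the text and that difference divided by group size.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    total_score: float   # overall score on the whole test
    item_correct: bool   # did this candidate answer the item being analysed correctly?


def discrimination_index(candidates: list[Candidate], fraction: float = 0.27) -> tuple[int, float]:
    """Return (high correct minus low correct, that difference divided by group size).

    `fraction` is an assumed group size: the top and bottom 27% by total score.
    """
    ranked = sorted(candidates, key=lambda c: c.total_score, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    if 2 * group_size > len(ranked):
        raise ValueError("not enough candidates for non-overlapping groups")
    high_group = ranked[:group_size]
    low_group = ranked[-group_size:]
    high_correct = sum(c.item_correct for c in high_group)  # True counts as 1
    low_correct = sum(c.item_correct for c in low_group)
    raw = high_correct - low_correct
    return raw, raw / group_size


# Hypothetical results for 10 candidates on one item.
results = [
    Candidate(95, True), Candidate(88, True), Candidate(82, True),
    Candidate(76, False), Candidate(70, True), Candidate(64, False),
    Candidate(58, True), Candidate(50, False), Candidate(45, False),
    Candidate(40, True),
]
raw, d = discrimination_index(results)
print(f"raw = {raw}, D = {d:.2f}")  # prints "raw = 2, D = 0.67"
```

With 10 candidates, each group holds 3; all of the top group but only one of the bottom group answered correctly, giving a positive index. Ties at a group boundary are broken by input order here, because Python's sort is stable; a real item-analysis tool may handle ties differently.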