What makes effective test questions and answers for assessments?

What instructors and administrators need to know

There are many forms of test questions, each with its own strengths when it comes to supporting learning objectives. Some question types are efficient and measure the breadth of student knowledge, whereas others offer more opportunities to gain insight into higher-order thinking. Let's take a deeper look at what makes test questions effective.

Christine Lee

Content Manager

Thoughtful test questions and answers can help create an effective assessment, one that accurately measures student knowledge. When test questions are crafted with learning objectives in mind, they help foster study habits, influence knowledge retention, and prepare students for eventual summative assessments. Furthermore, when students feel an assessment is fair and relevant, they are less likely to engage in academic misconduct.

Assessment is the intersection at which instructors can provide feedback to guide students but also where instructors gain insights into student learning. In many cases, this feedback exchange can solidify student-teacher relationships and influence learning outcomes. With effective assessments, students can feel seen and supported. And instructors have the information they need to further learning. Thoughtful decisions about test questions and formats can make a difference in this data exchange.

The most common question types, each with its own role in assessment, are:

Multiple-choice
True/False
Extended matching sets
Fill-in-the-blank
Short answer
Long answer / essay

To that end, this blog post will cover the above question types and then dive into methodology to bolster exam design.

Multiple-choice test questions and answers

What sets this question type apart?

Multiple-choice questions (MCQs) can test a wide swath of knowledge in a short amount of time. This characteristic, together with faster grading and objective scoring, makes them a very popular standardized exam format.

That said, there are many critics of MCQs, some going so far as to say “multiple-choice tests are not catalysts for learning” and that “they incite the bad habit of teaching to tests” (Ramirez, 2013). Multiple research articles, too, indicate multiple-choice questions may result in surface-level study habits. However, they can still be leveraged for effective assessment when utilized appropriately. Multiple-choice questions can be paired with other question types to provide a complementary assessment or they can themselves be designed to test deeper conceptual understanding.

There are examples of how this question type can be useful in testing reading comprehension and practical knowledge of learned principles. In response to criticism surrounding the inclusion of multiple-choice questions on the Uniform Bar Exam (UBE), The Jacob D. Fuchsberg Law Center at Touro College cites the “case file” format of a 1983 performance test in California, a multiple-choice exam paired with documents typical of a legal case file. Successful completion of this exam did not rely on rote memorization of rules. Rather, this exam used a series of multiple-choice questions to assess the application of relevant theories and practices to true-to-life scenarios presented in the mock case file.

Those considering the value of multiple-choice questions should also keep in mind any summative assessments that lie ahead for students, beyond the scope of a single course. In a recent webinar on the subject of multiple response type questions in nursing programs, Assistant Professor Cheryl Frutchey noted that many of her students at Oklahoma City University’s School of Nursing have been reporting that 70-75% of NCLEX questions are now the “select all that apply” format. In weighing the benefits of a particular question type in determining student success, field-related insights like these may help tip the scale.

True/false test questions and answers

A true/false question asks the exam-taker to judge a statement’s validity. Rather than calling upon powers of memorization, the exam-taker ideally demonstrates their command of verbal knowledge and a working knowledge of a given subject by converting abstract principles to a specific application.

That said, the nature of true/false questions makes it so that even when guessing, the test-taker has a fifty-percent chance of getting the correct answer.
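To put the guessing problem in numbers, here is a minimal sketch. It assumes a hypothetical 10-item quiz with a pass mark of 7 and a test-taker who guesses every item independently; the function name and the quiz parameters are illustrative and do not come from any study cited here.

```python
from math import comb


def prob_at_least(n_items: int, k_correct: int, p_guess: float) -> float:
    """Probability of answering at least k_correct of n_items by pure guessing.

    Assumes every item is guessed independently with the same success
    probability p_guess (a binomial model).
    """
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )


# Hypothetical quiz: 10 items, 7 correct answers needed to pass.
true_false = prob_at_least(10, 7, 0.5)    # two options per item
four_option = prob_at_least(10, 7, 0.25)  # four options per item

print(f"true/false: {true_false:.3f}")           # true/false: 0.172
print(f"four-option MCQ: {four_option:.4f}")     # four-option MCQ: 0.0035
```

Under these assumptions, roughly one in six pure guessers would pass the true/false quiz, compared with fewer than one in two hundred on a four-option multiple-choice quiz of the same length.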

The multiple-true-false question is an adaptation of the true-false question that incorporates (and improves upon) elements of the multiple-choice question type, requiring the test-taker to consider all answer options in relation to a given question stem. This hybrid question type differs from “select all that apply” in asking the test-taker to identify both correct and incorrect statements rather than just the “true” ones, shedding light on incorrect or incomplete understandings.

For both true/false and multiple-choice question types, opportunity for feedback is severely limited.

Extended matching sets

Particularly helpful for the usual format of clinical assessments in nursing exams, this item type provides a series of individual questions and a longer list of possible answers for the test-taker to choose from. By design, extended matching set questions prioritize an understanding of the question stems before a correct selection can be made, making it difficult to quickly eliminate incorrect answers from the list.

With an extended list of answers to accompany perhaps only a handful of question stems, this question type encourages the test-taker to process information within each question before parsing relevant answers from the provided list, emphasizing a deeper subject mastery than simple memorization can provide.

Fill-in-the-blank test questions and answers

A known benefit of free response question types like fill-in-the-blank is the decreased possibility of guessing the correct answer. Since the exam-taker must provide an answer that fits contextually within the provided question stem, fill-in-the-blank questions are more likely to exercise language skills.

In a recent study of 134 final-year undergraduate dental students at the University of Peradeniya, 90% found fill-in-the-blank questions more challenging than the same question in multiple-choice format, and only 19% reported encountering fill-in-the-blank questions during their time in the program. By withholding answer choices that lead to quick answer recall, fill-in-the-blank questions can effectively gauge an exam-taker’s understanding. As the study suggests, however, the prevalence and feasibility of this item type may vary from program to program. And again, feedback is minimal with this type of question.

Short answer

Short-answer questions are valuable for measuring a test-taker’s understanding of a subject beyond simple recall. Preparing for an assessment with this question type promotes study habits that reinforce comprehension over memorization, thus increasing the likelihood that the test-taker will retain this knowledge.

For example: After using ExamSoft to convert their assessment format from multiple-choice to short-answer questions, the Donald & Barbara Zucker School of Medicine at Hofstra/Northwell conducted a survey to measure student attitudes about the switch. Sixty-four percent of the 274 students surveyed thought that short-answer questions better equipped them for a clinical setting. By exercising abilities in critical thinking, reasoning, and communication, the free-response format of this question type allows the cultivation of skills necessary for the workplace.

Long answer or essay test questions

Long answer or essay questions allow individual students to formulate their unique ideas and responses to demonstrate their understanding of a concept. This question type can most easily measure higher-order thinking and depth of knowledge, though it may not cover a wide range of that knowledge.

Marking essay questions can be a time burden on instructors; additionally, long answers involve some measure of subjective scoring. They may also measure writing skills as well as subject-specific knowledge.

Drafting test questions and answers with ExamSoft or Gradescope

Beyond building assessments using all of these common question types, ExamSoft users can:

Supplement individual questions with audio, video, or image attachments
Create “hotspot” questions for exam-takers to select an area of an image as an answer
Tag questions with categories, including learning objectives and accreditation criteria. Additionally, ExamSoft offers robust item analysis.
Explore various question types offered by ExamSoft, such as bowtie, matrix, and drag-and-drop.

With Gradescope, instructors can:

Accommodate a variety of question types with audio, video, or image attachments
Utilize item analysis to measure exam design effectiveness, particularly for multiple-choice questions
Grade question by question with answer groups and AI-assisted grading instead of student-by-student to promote more objective scoring
Use Dynamic Rubrics to ensure students receive detailed insight into how points were awarded or deducted. Dynamic Rubrics also allow for flexibility to adjust grading criteria midstream to account for later accommodations for all students.

Examplify, ExamSoft’s test-taking application, offers several built-in exam tools for test-takers to use, including:

Highlighter and notepad
Programmable spreadsheet
Scientific and graphing calculators

Gradescope accommodates a variety of assignment types and enables:

Grading of paper-based exams, bubble sheets, and homework
Programming assignments (graded automatically or manually)
Creation of online assignments that students answer right on Gradescope

Assessment is a crucial part of education, no matter the subject or level. Assessments are tools to measure how much a student has learned, though with the right post-exam data, they can be so much more: an assessment can itself be a learning opportunity. But not all assessments are created equal; a poorly written exam or exam item may skew results, giving instructors a false sense of student learning.

Effective exam items provide an accurate demonstration of what students know, and they also support fair and equitable testing. To get the most out of your assessments, it’s important to write well-constructed exam items with every student in mind and then test item efficacy.

What item types fit your objectives?

There are two general categories of exam items: objective items and subjective items. Objective test items have a clear correct answer; item types can include multiple choice, true/false, short answer, and fill-in-the-blank items. Subjective items, on the other hand, may have a range of correct answers. Answers to subjective questions often involve persuasive/defensible arguments or present various options for in-depth discernment. Test items like these usually come in the form of long answers, essays, or performance-based evaluations.

According to the Eberly Center for Teaching Excellence and Educational Innovation at Carnegie Mellon University, “There is no single best type of exam question: the important thing is that the questions reflect your learning objectives.” It is the educator’s place to determine whether a subjective or objective test item will better align with their learning objectives.

If you want students to explain the symbolism in a literary text, subjective questions like short answers and essays are usually best. Objective test items are great if you want to make sure your students can recall facts or choose the best argument to support a thesis. If you want your students to match medical terms to their definitions, a matching task, which is an objective item, may be your best bet. No matter the subject, it is imperative to ensure the question types serve the intended learning objectives.

Create assessment items with Bloom’s Taxonomy in mind

As you consider exam items, and whether you’re going to use objective or subjective items, it’s important to keep cognitive complexity in mind. Bloom’s Taxonomy can help with planning not only curriculum but also assessment. Bloom’s Taxonomy consists of six levels of cognitive understanding. From the lowest to highest order, these are:

Remember
Understand
Apply
Analyze
Evaluate
Create

As you move up the ladder from recall to creation, there is a gradual shift from objective to subjective exam items. If students are new to the concepts you’re teaching, it’s often best to focus on the initial three levels with objective items and set an appropriate knowledge foundation. As students progress through a course or program, you can start to assess the top three levels of cognition with subjective exam items to determine higher-order thinking or capability. While some courses may span the full range, from testing factual recall to asking students to synthesize and create their own ideas, many introductory classes may only pertain to parts of Bloom’s Taxonomy. More advanced courses, like graduate seminars, may target the higher-order categories: analyze, evaluate, and create.

You might assess students’ grasp of the “remember” level with a multiple-choice question about the date of a significant period in history. Testing students’ skills at the “evaluate” level, by contrast, may look like a persuasive essay that prompts students to argue and support their stance on a topic with no single correct position, such as the interpretation of metaphors in written works.

Consider word choice and cultural bias

As exam creators, we may sometimes write an item that is difficult for students to understand. After writing an item, ask yourself if the question or statement could be written more clearly. Are there double negatives? Have you used passive voice construction? Are you attempting to teach the concept in the question stem itself? Often, the more concise the item is, the better. If possible, do not use absolutes such as “never” and “always.” We’re writing questions, not riddles; it is best practice to test the students’ knowledge, not how well they read. The point is to focus on student knowledge acquisition and effectively convey the point of the question.

Avoid idioms and colloquialisms that may not be clear to international students. Questions containing regional references demonstrate bias. Also consider references that may exclude historically marginalized groups. For instance, an item that refers to a regional sport may not be as clear to these groups as a sport with international reach. Another example is the infamous critique of the SAT question referring to “regattas.” This term might be familiar to one socioeconomic group and completely unfamiliar to others, and knowing it is not a measure of aptitude.

Make sure your exam items reliably assess concept mastery

Using psychometrics, which are specific and widely accepted statistical measures of exam data, you can test the reliability of your exam and items. One starting point is the item Difficulty Index, or p-value (not to be confused with the p-value of a significance test). Simply put, what proportion of exam-takers answered a specific question correctly?

If the p-value is low, the item may be too difficult. If the p-value is high, the item may be too easy. However, this data point alone is not a strong measure of reliability and should be used in context with other psychometric measures. If your difficult question has high Discrimination Index and Point Biserial values, you can more confidently say that mostly the higher performers answered correctly, while the lower performers did not. A high corresponding Point Biserial value also tells you that, generally, students who performed well on this item, difficult as it was, also performed well on the overall exam. When psychometrics are used together, you gain a solid, holistic picture of item performance and whether your question was well written.
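As a rough illustration of how these three measures relate, the sketch below computes them for a small, made-up response matrix. The data, the function names, and the choice of the top and bottom 27% of scorers for the Discrimination Index are assumptions made for this example; they are not a description of how ExamSoft or Gradescope calculates its reports.

```python
from statistics import mean, pstdev

# Hypothetical scored responses: one row per student, one column per item
# (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]  # each student's total exam score


def item_column(item):
    """Return every student's 0/1 score on one item."""
    return [row[item] for row in responses]


def difficulty(item_scores):
    """Difficulty Index (p-value): proportion of students answering correctly."""
    return mean(item_scores)


def discrimination_index(item_scores, totals, fraction=0.27):
    """Difficulty in the top-scoring group minus difficulty in the bottom group.

    The 27% group size is an assumed, commonly used convention.
    """
    n_group = max(1, round(len(totals) * fraction))
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    upper = [item_scores[i] for i in ranked[:n_group]]
    lower = [item_scores[i] for i in ranked[-n_group:]]
    return mean(upper) - mean(lower)


def point_biserial(item_scores, totals):
    """Correlation between getting the item right and the total exam score.

    The total used here still includes the item itself, which slightly
    inflates the value on short exams.
    """
    p = mean(item_scores)
    spread = pstdev(totals)
    if p in (0, 1) or spread == 0:
        return 0.0  # undefined when everyone scores the same; report 0
    right = mean([t for s, t in zip(item_scores, totals) if s == 1])
    wrong = mean([t for s, t in zip(item_scores, totals) if s == 0])
    return (right - wrong) / spread * (p * (1 - p)) ** 0.5


for item in range(len(responses[0])):
    scores = item_column(item)
    print(
        f"item {item}: p={difficulty(scores):.2f} "
        f"D={discrimination_index(scores, totals):.2f} "
        f"r_pb={point_biserial(scores, totals):.2f}"
    )
```

In this made-up data, the third item (index 2) is the hardest, with only a quarter of students answering correctly, yet it separates high and low scorers well. That is the pattern described above for a difficult but well-functioning question.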

Psychometric analysis measures include:

Difficulty (p-value)
Discrimination Index
Upper and Lower Difficulty Indexes
Point Biserial Correlation Coefficient
Kuder-Richardson Formula 20
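The last measure in the list, Kuder-Richardson Formula 20 (KR-20), summarizes the internal consistency of a whole exam made up of right/wrong items. It is computed as k / (k - 1) × (1 - Σ p·q / variance of total scores), where k is the number of items, p is each item’s difficulty, and q = 1 - p. The sketch below is a minimal illustration on made-up data; it uses the population variance of total scores, which is one of two common conventions, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: one row per student, one column per item.
    Uses the population variance of total scores (an assumed convention).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if n_items < 2 or total_variance == 0:
        raise ValueError("KR-20 needs at least two items and varying total scores")

    # Sum of p * q over items, where p is the proportion answering correctly.
    item_variance_sum = 0.0
    for item in range(n_items):
        p = mean([row[item] for row in responses])
        item_variance_sum += p * (1 - p)

    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scored responses (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"KR-20 = {kr20(responses):.2f}")  # KR-20 = 0.56
```

Values closer to 1 indicate that the items tend to rank students consistently. A four-item example like this one will usually produce a modest value simply because the exam is so short.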

Create effective test questions and answers with digital assessment

The above strategies for writing and optimizing exam items are by no means exhaustive, but considering them as you create your exams will improve your questions immensely. By delivering assessments with a data-driven digital exam platform, instructors, exam creators, and programs can use the results of carefully created exams to improve learning outcomes, teaching strategies, retention rates, and more.
