The More the Merrier?

How many raters should you include in talent management calibration sessions?

The use of group calibration sessions is becoming increasingly popular across organizations (Hastings, 2012). This method of employee assessment involves groups of organizational stakeholders coming together to collectively discuss and evaluate employee contributions. Unlike individual manager assessments where personal bias and limited perspective can affect ratings, having multiple raters can increase the odds that an employee’s contributions will be fully and fairly considered, and that individual managers’ biases will not overly influence the assessment results.

Why More Can Be Merrier

Quite a bit of research evidence supports the move toward greater use of group-based calibration assessments. In general, groups tend to outperform individuals on decision-making tasks, particularly when those tasks involve problem solving, brainstorming, or complex decisions (Hill, 1982). Groups also possess a significant advantage in terms of memory, as multiple people bring multiple memories to draw on and compare. For example, one study found that after a significant time delay, rater groups had a more accurate memory of an employee’s performance (i.e., previous behaviors) than did individual raters (Martell and Borg, 1993). Groups, therefore, may be especially useful for reducing memory-related decision errors (e.g., when answering “How often did this employee exhibit behavior X?”).

When More Is Too Much

Group-based decision-making has many benefits over individual-based decisions. But there’s a catch: Include too many group members and groups begin to experience serious coordination problems (Branson et al., 2010), increased conflict (Steiner, 1972), and decreased individual participation (Hare, 1952; Chidambaram & Tung, 2005), and they come to less effective decisions overall (Blenko et al., 2010). In fact, some research has shown that once more than seven people are included in a group, each additional member reduces decision effectiveness by 10 percent (Blenko et al., 2010).
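To make the 10 percent figure concrete, here is a minimal Python sketch of the arithmetic. It rests on assumptions the research summary above does not spell out: that the penalty is linear (each rater beyond seven subtracts 10 percentage points from a baseline of 1.0) and that effectiveness cannot fall below zero. The function name and its parameters are hypothetical and serve only as illustration.

```python
def relative_effectiveness(group_size: int,
                           threshold: int = 7,
                           penalty: float = 0.10) -> float:
    """Estimate decision effectiveness relative to a group at the threshold size.

    Assumption: a linear reading of the rule, in which every member beyond
    `threshold` subtracts `penalty` from a baseline of 1.0. The cited
    research may intend a different (for example, compounding) reading.
    """
    extra_members = max(0, group_size - threshold)
    # Floor at zero so very large groups do not produce negative values.
    return max(0.0, 1.0 - penalty * extra_members)


if __name__ == "__main__":
    # Compare a few candidate calibration group sizes.
    for size in (5, 7, 10, 12):
        print(size, round(relative_effectiveness(size), 2))
```

Under this linear reading, a 10-person session would retain roughly 70 percent of the decision effectiveness of a seven-person session. A compounding reading would give a slightly different number, so treat the output as a rough planning aid rather than a measurement.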

What’s the Right Amount?

So what’s the magic number? Unfortunately, no research has examined the effect of group size on calibration session effectiveness specifically. But evidence from other group decision-making experiments suggests keeping calibration groups between five (Hackman & Vidmar, 1970; Piezon & Donaldson, 2005) and seven (Blenko et al., 2010) raters is likely to lead to the best decisions in the least amount of time. And it is probably safe to assume that calibration sessions with more than 10 raters are likely to underperform when it comes to making accurate and efficient employee assessments.

What’s the Right Mix?

The advantage of calibration groups is that they enable numerous raters, each of whom holds a different position, perspective, and set of opinions, to be included in the conversation. But having too many raters can decrease the value of calibration sessions. To create the optimal calibration group, ask yourself these three questions when determining which raters to include:

  1. Would this person enhance the group’s diversity? Research suggests that diverse groups have more productive discussions and produce better-quality ideas than non-diverse groups (Ruhe, 1978; Hoffman & Maier, 1961). Our personal biases also tend to favor individuals who are similar to us (e.g., individuals of our same race, gender, or age). For example, research has shown that both black and white raters will give significantly higher performance ratings to members of their own race (Kraiger & Ford, 1985; Landy & Farr, 1980). Similar biases also have been shown to extend to employees who share the same organizational role or tenure (Milliken & Martins, 1996). Having diversity among the evaluators in a calibration group, therefore, can be a critical component of achieving fair and unbiased evaluations.
  2. Has this person ever rated the included employee(s) before? Just as we favor people who are “like us,” we also show bias toward individuals we have made a decision about in the past. For example, research has shown that when a rater has evaluated an employee in the past, they are more likely to discount new information about the employee if that information does not match their original evaluation (Bazerman et al., 1982). Raters also tend to evaluate employees more favorably if they were responsible for the previous hire or promotion of that employee (Bazerman et al., 1982). This tendency is referred to as the sunk cost effect: Managers are more likely to make a decision if it justifies a previous commitment. Because groups increase both the severity and frequency of sunk cost effects (Whyte, 1993), it may be best to avoid including managers who were responsible for the recent hire or promotion of to-be-evaluated employees, or at least to ensure that managers who have not made a previous decision about the employee are included in the session as well.
  3. Would this person play the role of “group facilitator”? Having a neutral party present to facilitate, intervene, and help the group identify and solve problems can be critical to the success of calibration sessions and can improve the functioning and performance of the calibration group (Schwarz, 1994). If group size becomes unmanageable, keep in mind that the group facilitator should never be a candidate for removal. Instead, use criteria #1 and #2 to narrow down your list of included leaders.

Using these three questions to guide your selection of calibration group members will help ensure greater accuracy in the decisions made during the calibration session. However, another question also can influence the impact of calibration sessions: “Will this person’s participation in the session significantly influence the organization’s acceptance of the calibration results?” Sometimes you need to invite senior leaders or other influential company members to a calibration session just because they need to be there. But if they don’t need to be there and they won’t add significant value, then keep them out and strive to keep the calibration group size in the five-to-seven-person range.

Lauren Pytel is an HCM Researcher at SAP SuccessFactors. In her role, Pytel is focused on understanding group decision-making as it pertains to talent management, particularly calibration sessions and talent reviews. Pytel earned her Master’s degree in experimental psychology from DePaul University and is currently a doctoral candidate for the university’s Experimental Psychology Ph.D. program.

Training magazine is the industry standard for professional development and news for training, human resources and business management professionals in all industries.