
Coffee Cupping Competitions – Real or Random Chance?

There are coffee cupping competitions all over the world ranging from the rigorous to the ridiculous.  Placing well in the competitions has a major effect on coffee farm profitability. However, there are serious questions which should be addressed about all of these competitions. How accurate are the scores?  What criteria are being used? How effective is a judge’s palate when she has to taste more than three or four coffees? Can competitions correct for judge bias?

There are two well-written articles in the August 2010 issue of Roast Magazine on the 100-point coffee scoring system, one by Shawn Steiman and the other by Ken David.

Shawn Steiman’s article, “For Coffee Evaluation, Scores Don’t Add Up,” states that the 100-point coffee scoring system is flawed because it is slanted towards high-acid, high-body coffees and because the scores are subjective, based on judge preference. In conversation, Shawn says that “people are lousy instruments” for accurate measurement and that the scoring systems permit judge bias to influence the final result. He says, “In summary, you have bad instruments using bad measuring sticks.”

In Ken David’s article, “The 100-Point Rating Paradox,” he states, “There is no such thing as an objective sensory reading… What the more scientific among us are looking for are reliable, repeatable sensory readings, readings that are necessarily the same time after time given the same set of sensory stimuli. But the associative structure that generates these readings can never be considered objective.”

There are no objective studies of judge effectiveness or bias in cupping coffee. However, there are studies of judge effectiveness and bias in wine tasting competitions. Considering that coffee is more complex and has more identifiable flavors than wine, these studies are also valid for coffee competitions. The most prominent is by Robert Hodgson, a retired professor and a vintner. He conducted a controlled study of wine tastings by the California State Fair wine judges over a four-year period. The 70 judges were given the same wine three different times from the same bottle. On average, the judges’ ratings varied by ±4 points on a rating scale of 80 to 100. In other words, a wine rated 90 in the first round could, on average, rate anywhere from 86 to 94 in the second round. Some of the judges’ variations were much worse. Only one in ten judges rated the same wine within ±2 points. (Robert Hodgson, “An Examination of Judge Reliability at a Major U.S. Wine Competition,” Journal of Wine Economics, Vol. 3, page 105, Fall 2008.)

Hodgson also conducted a second study in which he looked at the winners in various wine competitions. He took wines that had entered a number of competitions and had won a first, second or third medal in at least one of them, and compared their performance across all of the competitions. He concluded that the distribution of medals in competitions was identical to the distribution of flipping a coin. In other words, the distribution “mirrors what might be expected should a gold medal be awarded by chance alone.” (Robert Hodgson, “An Analysis of the Concordance Among 13 U.S. Wine Competitions,” Journal of Wine Economics, Spring 2009, Vol. 1.)

Joshua Green, the editor of Wine and Spirits magazine, says, “It is absurd for people to expect consistency in a taster’s ratings. We’re not robots.” Robert Parker, the founder of the 100-point rating system for wine, says that “I generally stay within a three point deviation” between ratings of the same wine. (Leonard Mlodinow, “A Hint of Hype, A Taste of Illusion,” Wall Street Journal, 11/20/2009.)

In Hawaii, we have two major cupping competitions, the Kona Coffee Cultural Festival Competition and the Hawaii Coffee Association Competition.

In the Kona Coffee Cultural Festival Competition, judges are required to cup 60 different coffees on the first day of competition and then cup the top 20 coffees again on the second day. For years there have been serious questions about the validity of the KCCF judging. One of the most critical observations is the lack of transparency. The judges are not allowed to reveal their individual scores. Since the judges are cupping 20 of the same coffees twice, there is a rare opportunity to determine how consistent each judge is from day one to day two. If anyone is making this comparison, it is not public information.
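The day-one versus day-two comparison described above is simple to run once individual score sheets are available. The following is only a sketch: the judge names, coffee labels and scores are invented for illustration, and the function name `judge_consistency` is hypothetical. It does not reflect any actual KCCF data or procedure.

```python
from statistics import mean


def judge_consistency(day_one, day_two):
    """Mean absolute score change per judge for coffees cupped on both days.

    Both arguments map judge name -> {coffee label -> score}.
    """
    result = {}
    for judge, first_scores in day_one.items():
        second_scores = day_two.get(judge, {})
        # Only coffees the judge scored on both days can be compared.
        shared = first_scores.keys() & second_scores.keys()
        if shared:
            result[judge] = mean(
                abs(first_scores[c] - second_scores[c]) for c in shared
            )
    return result


# Invented example scores (assumption, not real competition data).
day_one = {
    "Judge A": {"coffee_1": 86, "coffee_2": 90},
    "Judge B": {"coffee_1": 82, "coffee_2": 91},
}
day_two = {
    "Judge A": {"coffee_1": 87, "coffee_2": 89},
    "Judge B": {"coffee_1": 88, "coffee_2": 85},
}

consistency = judge_consistency(day_one, day_two)
for judge, change in sorted(consistency.items()):
    print(f"{judge}: average change of {change} points between days")
```

In this made-up example, Judge A moves by one point on average while Judge B moves by six. That is exactly the kind of difference that stays invisible when individual scores are not released.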

In addition, the only information given to a farmer is a numerical score which is an average of the attribute scores. A farmer could  have a great score for acidity and a poor score for body and only know the overall average score. If one purpose of the Festival is to improve Kona coffee quality then a farmer needs to know how well or how badly her  coffee scored on each attribute.  Without attribute scores, farmers have no guidance on how to improve the quality of their coffee.

The Hawaii Coffee Association Competition is more transparent. However, the HCA competition has the same problem as the KCCF competition: only a final numerical score is published. No one is told the attribute scores. Farmers and consumers are not able to see how the coffee scored on each criterion, i.e., acidity, body, balance, etc. Seeing a score of 88.34 on a score tally doesn’t tell the farmer anything about the positive and negative attributes of her coffee. Farmers need to know if there is a particular characteristic that is holding back the quality of their coffee.

In addition, farmers are not told each judge’s individual score. This lack of transparency is serious. There is all the difference in the world between a three-judge average score of 85 calculated from an 80, an 85 and a 90 and an average score of 85 calculated from an 84, an 85 and an 86.
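To make the point concrete, here is a small sketch using the two sets of scores from the paragraph above. The function name `summarize` is hypothetical, and reporting the range and standard deviation is just one reasonable way to show how far apart the judges were. It is not something either competition is known to publish.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return the mean, range and population standard deviation of a panel's scores."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "stdev": pstdev(scores),
    }


split_panel = summarize([80, 85, 90])  # judges disagree widely
tight_panel = summarize([84, 85, 86])  # judges nearly agree

for name, stats in (("80/85/90", split_panel), ("84/85/86", tight_panel)):
    print(
        f"{name}: mean {stats['mean']}, "
        f"range {stats['range']}, stdev {stats['stdev']:.2f}"
    )
```

Both panels average 85, but the first spans 10 points and the second only 2. Publishing even the range alongside the average would tell a farmer whether the judges actually agreed.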

Additionally, the HCA contest promises that farmers will be given private written comments about their coffee. However, most comments are cryptic and virtually useless. We saw one private score sheet with the one-word comment “Wheaty.” Is “Wheaty” a positive or a negative attribute? If it is a defect, is it a bean defect or a processing defect? (Actually, it is probably a roast defect, which is the judge’s responsibility.)

In summary, in the HCA competition farmers are expected to produce an award-winning coffee without being able to see any of the judges’ individual scores, without seeing how their coffee scored on each criterion and without effective feedback on the positive and negative aspects of their coffee. The HCA should be congratulated for its efforts to be transparent, but it can do more.

Conclusion

Clearly, there are faults in using a 100-point system for rating coffee and faults in the process for producing that rating. However, there is no better alternative. As stated by Ken David, “…ratings and the blind tastings that generate them are a means through which quality and distinction can be recognized on the basis of merit rather than on the strength of tradition, public relations firms, or the sheer luck of attracting the attention of journalists more interested in the drama of a story than in the distinction of the beverage.” (Ken David, “The 100-Point Rating Paradox,” ibid.)

So, until a machine comes along that can duplicate a human palate, we are stuck with the fallibility of human judges. However, we can improve the process by making it more transparent and by acknowledging that judges cannot replicate their own ratings within ±4 points. We can also encourage quality improvement by publishing each judge’s ratings on each criterion. As stated by Ken David, “…cupping forms and the ratings they generate should be seen as a way of testing and exploring the consensus of the expert community in a disciplined, ongoing process of communal definition and discovery rather than as a detached application of an eternally fixed set of objective judgments… also, and perhaps more importantly, provide an orderly and deliberative content for ongoing criticism and refinement of that system.” (Ken David, “The 100-Point Rating Paradox,” ibid.)

Karen Jue Paterson is the owner of Hula Daddy Kona Coffee, a 33-acre coffee farm in Kona, Hawaii. She is a member of the Hawaii Coffee Association, the Kona Coffee Council, the Kona Coffee Farmers Association, the Holualoa Village Association and the Specialty Coffee Association of America. She is also the author of a number of articles on Kona Coffee, including: Kona Coffee Farmers at a Crossroad http://www.huladaddy.com/?p=696, How Typica is Your Kona Coffee? http://www.huladaddy.com/?p=710, Crimes Against Kona Coffee http://www.huladaddy.com/?p=1271 and Are Roasters Eroding the Kona Coffee Brand? http://www.huladaddy.com/?p=952

 

 

 

 


Comments

  1. Jim Boulter says:

    Aloha Karen,

    I always enjoy reading your commentary!!

    Coffee tastings and judging are real events, but the scores and scoring systems would (as you elegantly point out) benefit from standardization and transparency. In short, taste is subjective and assigning a numerical value to a particular flavor profile is fraught with methodological and criterion-based confounds.

    Nevertheless, from a consumer’s point of view, the rating numbers can be *very* persuasive as they help us to sort through the dozens of new offerings every month and choose coffees that give us great flavor and, hopefully, good value.

    For me, HulaDaddy has consistently been a source of excellent coffee. In addition, your blog has been informative and provocative.

    Mahalo nui loa for your coffee and advice!

    Jim Boulter
    Hollywood, California
