Ludek's Blog About People Analytics: GPT-4's performance in the knowledge test of evidence-based HRM practices

Luděk Stehlík

Some time ago, I “replicated” Rynes, Colbert, and Brown’s 2002 study on HR practitioners’ beliefs about effective HR practices on a convenience sample of more than 140 LinkedIn users. The results of this “replication” closely resembled the results of the original study. On average, respondents correctly answered 19.4 out of 35 items, achieving a 55% success rate, which was very close to the 57% average success rate in the original study (and also quite close to the 50% success rate that corresponds to random choice, given the TRUE/FALSE response format).

I was curious to see how GPT-4 would perform on this test, given that it has already been evaluated on various standardized tests such as the SAT, GRE, Bar Exam, and AP exams. Each prompt had the following form:

“Read the following statement and indicate whether it is true or false. Keep in mind that the statement refers to general trends and patterns that apply on average but not necessarily to all cases. When evaluating the statement, ensure that you correctly interpret the words used in the statement and take into account existing scientific evidence. Give me the answer either true or false, without intermediate values, in a boolean way. Finally, briefly explain your reasoning behind your answer. The statement is as follows:…”
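As a minimal sketch, the prompt for each item could be assembled in Python along these lines. `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, not part of the original setup; only the template wording comes from the text above.

```python
# Template wording taken from the prompt described above; the {statement}
# placeholder stands for whichever test item is being asked.
PROMPT_TEMPLATE = (
    "Read the following statement and indicate whether it is true or false. "
    "Keep in mind that the statement refers to general trends and patterns "
    "that apply on average but not necessarily to all cases. "
    "When evaluating the statement, ensure that you correctly interpret the "
    "words used in the statement and take into account existing scientific "
    "evidence. "
    "Give me the answer either true or false, without intermediate values, "
    "in a boolean way. "
    "Finally, briefly explain your reasoning behind your answer. "
    "The statement is as follows: {statement}"
)


def build_prompt(statement: str) -> str:
    """Fill one test item into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(statement=statement.strip())


if __name__ == "__main__":
    item1 = ("Leadership training is ineffective because good leaders "
             "are born, not made.")
    print(build_prompt(item1))
```

Keeping the template in one constant means every item is asked with exactly the same wording, which matters when comparing answers across items.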

So, what were the results? GPT-4 answered 29 out of 35 items correctly, i.e., it achieved an 83% success rate, which corresponds to the 99th and 97th percentiles in the original and “replicated” studies, respectively. GPT-4’s results were thus superior to those of the vast majority of people who took the test.
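The arithmetic behind these figures can be checked with a short Python sketch. The binomial calculation is an addition for illustration, not part of the original analysis: it assumes that each of the 35 TRUE/FALSE items is guessed independently with a 50% chance of being right. The function names are hypothetical.

```python
from math import comb

N_ITEMS = 35  # number of TRUE/FALSE items in the test


def success_rate(n_correct: float, n_items: int = N_ITEMS) -> float:
    """Share of items answered correctly."""
    return n_correct / n_items


def prob_at_least(k: int, n: int = N_ITEMS, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), i.e. k or more hits by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


gpt4_rate = success_rate(29)     # GPT-4: 29 of 35 items correct
human_rate = success_rate(19.4)  # average score in the LinkedIn sample
p_guess = prob_at_least(29)      # chance of 29+ correct by coin flipping

if __name__ == "__main__":
    print(f"GPT-4: {gpt4_rate:.0%}, LinkedIn sample average: {human_rate:.0%}")
    print(f"P(at least 29 correct by guessing): {p_guess:.2e}")
```

Under these assumptions, a score of 29 or more would be very unlikely to arise from guessing alone, whereas the human average sits close to the 50% chance level.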

However, even when it gave a correct answer, it did not always rely on correct facts and/or valid reasoning, which could be a problem if management decided to act on the answers provided. See the table below to check the details of its responses.


Item ID	Item	GPT-4 response	Correct answer	GPT-4 reasoning	Research evidence	Possible contingencies


item1	Leadership training is ineffective because good leaders are born, not made.	false	false	The statement is false because there is substantial evidence that leadership skills can be developed and improved through training and experience. While some individuals may have innate qualities that make them more predisposed to becoming effective leaders, leadership is a complex skillset that can be cultivated and enhanced over time. Research has shown that leadership training can indeed have a positive impact on individuals' abilities to lead others, communicate effectively, and make better decisions.	Field study evidence that leadership behaviors and effectiveness increase following training (Barling et al., 1996). Evidence that leadership behaviors are only weakly predicted by dispositional characteristics (Judge & Bono, 2000) that are heritable (Loehlin et al., 1998; Reimann et al., 1997)
item2	The most important requirement for an effective leader is to have an outgoing, enthusiastic personality.	false	false	The statement is false because, while an outgoing and enthusiastic personality can be beneficial in some leadership situations, it is not the most important requirement for an effective leader. Effective leadership encompasses a range of skills and attributes, such as empathy, strategic thinking, adaptability, and effective communication. Different leadership styles can be effective in various contexts, and introverted leaders may excel in certain situations where they can listen attentively, motivate others, and make thoughtful decisions.	This kind of personality is, on average, an asset for leadership. A recent meta-analysis estimates a corrected validity coefficient of .31 between extraversion and leader effectiveness (Judge et al., in press). However, intelligence has an even higher correlation (.52; Lord et al., 1986). Also, some highly effective leaders are distinctly introverted (Bennis & Nanus, 1997; Collins, 2001)	Although it is unlikely that any future variables will produce higher validities than intelligence, work on leader behaviors (such as vision setting and communication skills) is just beginning, so little is known about their average effect sizes. We do know (e.g., Barling et al., 1996; Baum et al., 1998) that leadership success can be affected by leaders’ behaviors and not just their traits, so future research would be useful.
item3	Once employees have mastered a task, they perform better when they are told to “do their best” than when they are given specific, difficult performance goals.	false	false	Research suggests that specific, difficult performance goals generally lead to higher performance compared to vague or "do your best" goals. When employees have clear, challenging objectives, they are more likely to be motivated to achieve them, resulting in better performance. However, it is important to ensure that the goals are realistic and attainable to prevent frustration or disengagement.	Employees reach higher levels of performance when they are given difficult-yet-attainable goals rather than told to do their best. This is one of the most robust findings in all of industrial/organizational psychology (Locke & Latham, 1990)

Only the first 3 of the 35 items are shown here.


The possible takeaway from this finding? Although GPT-4 can be a handy tool for exploring possible solutions to specific HR-related problems, on its own and in its current form it cannot replace the good old systematic search for and retrieval of evidence, critical evaluation of its reliability and relevance, and its weighing and synthesis as conducted and/or supervised by human experts.

P.S. I didn’t test the reliability of GPT-4’s responses, nor did I set its temperature to 0, so it’s possible that you might obtain somewhat different results if you decide to replicate the test. In addition, please keep in mind that the comparison presented here is not entirely an apples-to-apples comparison, mainly due to the fact that new evidence may have emerged that does not match the correct answers in the original study conducted more than 20 years ago.
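For anyone who wants to check reliability in a replication, one rough approach is to ask each item several times and see how often the answers agree. The sketch below assumes a caller-supplied function `ask` that sends one statement to the model and returns its TRUE/FALSE answer as a boolean; `ask`, `response_consistency`, and `fake_ask` are hypothetical names, and no real API call is shown.

```python
from collections import Counter
from typing import Callable, Dict, List


def response_consistency(
    ask: Callable[[str], bool],
    statements: List[str],
    n_runs: int = 5,
) -> Dict[str, float]:
    """For each statement, return the share of runs that match the majority answer.

    A value of 1.0 means the model gave the same answer in every run.
    """
    consistency: Dict[str, float] = {}
    for statement in statements:
        answers = [ask(statement) for _ in range(n_runs)]
        majority_count = Counter(answers).most_common(1)[0][1]
        consistency[statement] = majority_count / n_runs
    return consistency


if __name__ == "__main__":
    # Stand-in for a real model call (assumption): always answers False.
    def fake_ask(statement: str) -> bool:
        return False

    items = ["Leadership training is ineffective because good leaders "
             "are born, not made."]
    print(response_consistency(fake_ask, items, n_runs=3))
```

Items whose consistency falls well below 1.0 are the ones where a single run, such as the one reported here, could easily have gone the other way.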
