Project
Which people are suitable for work in the military context?
This is a four-year PhD project in which I aim to conduct at least four empirical studies. It is an IOPS project because of its broadly psychometric orientation. For instance, I am going to investigate how different kinds of assessments relate to predicting job performance (i.e., comparing methods). I am also going to develop new psychological tests to improve selection in the military context and, if possible, psychological tests to improve placement strategies (i.e., proposing new psychological tests). For an overview of the whole project, see the four studies below.
Study 1
The objective of study 1 is to conduct a systematic review according to the PRISMA guidelines (Page et al., 2021) and, if possible, a meta-analysis. Specifically, I will review which psychological constructs (e.g., personality) relate most strongly to military performance, and which type of psychological test (i.e., typical or maximum performance tests) is best suited to assess these constructs (i.e., in terms of reliability and predictive validity). Results from this study will serve as input for the following studies. The question is: which psychological constructs are most related to military job performance, which type of psychological test aims to assess these constructs, and what is their predictive value?
Study 2
The objective of study 2 is to investigate the validity of unstructured interview outcomes and psychological tests in predicting military performance. Because of safety risks, the Ministry of Defence wants to reject candidates who pose a danger to themselves and the Ministry, so-called "unsuitable candidates". Unsuitable candidates can be characterized as those who succumb to pressure or are uncooperative, and thereby lower unit performance. Because of these characteristics, the Ministry of Defence interviews its candidates intensively through a life course interview. On the basis of this interview, and with the aid of a psychological test battery, candidates are assessed.
Research shows that selection interviews alone are an adequate predictor of job performance (e.g., Schmidt & Hunter, 1998; Huffcutt et al., 2014; Sackett et al., 2022). For instance, Sackett et al. (2022) estimated the predictive validity of structured interviews at .42. However, when taking the degree of interview structure into account, Huffcutt et al. (2014) estimated the following predictive validities: .20 (level 1), .46 (level 2), .71 (level 3), and .70 (level 4), where level 1 reflects an unstructured interview (i.e., no standardization) and level 4 a structured interview (i.e., completely standardized). In addition, older research by Schmidt & Hunter (1998), which also took general mental ability scores into account, shows that the incremental validity of structured interview outcomes is 24%, whereas for unstructured interview outcomes it is only 8%. Last, the estimates of Schmidt & Hunter (1998) were derived from research settings (i.e., a low-stakes context) rather than applicant settings (i.e., a high-stakes context), which probably overestimates the predictive validity due to unfaithful or socially desirable answering.
To determine whether psychological tests or an unstructured life course interview predict job performance in the military context, I aim to answer the question: how valid are psychological tests and an unstructured interview in predicting job performance in the military context? And how do these instruments relate to each other?
The design of this study is as follows. Every candidate will be assessed via a subset of the currently applied psychological test battery and the life course interview. The life course interview is the selection variable. Subsequently, only the candidates who are accepted may follow the "Initiële Militaire Opleiding" (IMO), which lasts 10 weeks. Because the Ministry of Defence does not yet have performance criteria, I will treat completing the IMO as job performance and dropping out as no job performance. Because the Ministry of Defence rejects candidates it finds unsuitable, range restriction arises, which leads to underestimating the true predictive value of the predictor scores for job performance (e.g., Carretta & Ree, 2022). To correct for this, I am going to apply the Case II method (Pearson, 1903; Thorndike, 1949), because direct range restriction occurs on the interview outcomes. And because the interview outcomes are unrelated to the psychological test scores, I do not have to correct the latter variable as well.
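The Case II correction can be illustrated with a short numerical sketch. The observed validity of .20 and the standard-deviation ratio of 2 below are purely hypothetical values for illustration, not estimates from this project.

```python
import math

def case_ii_correction(r_restricted: float, u: float) -> float:
    """Thorndike's Case II correction for direct range restriction.

    r_restricted : predictor-criterion correlation observed in the
                   restricted (selected) sample
    u            : ratio of the unrestricted to the restricted standard
                   deviation of the selection variable (SD_pop / SD_sel)
    """
    numerator = r_restricted * u
    denominator = math.sqrt(1 + r_restricted**2 * (u**2 - 1))
    return numerator / denominator

# Hypothetical example: validity .20 in the selected group, with the
# applicant-pool SD twice the selected-group SD (u = 2).
corrected = case_ii_correction(0.20, 2.0)  # ≈ .38
```

With no restriction (u = 1) the formula leaves the correlation unchanged; the stronger the restriction, the larger the upward correction.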
Study 3
The objective of study 3 is to develop job performance criteria in the military context based on job analysis (Sackett & Laczo, 2003). Two sorts of job performance criteria can be developed: (1) Hands-On Performance Tests (HOPTs) and (2) supervisory ratings. The first is objective and has a high criterion-related operational validity (Cucina et al., 2024), but is expensive; the second is cheaper (Wise, 1992), more subjective (e.g., Murphy & Cleveland, 1995), and has a lower validity (i.e., r = .27 to .31; Hayes et al., 2002; Sackett et al., 2022). These performance criteria should measure, in order of importance according to the Ministry of Defence, (1) Counterproductive Work Behaviors (CWB; e.g., causing harm to colleagues), (2) task performance (e.g., shooting accurately with a rifle), and (3) Organizational Citizenship Behaviors (OCB; e.g., experienced soldiers providing guidance to less experienced soldiers without formal instructions to do so). The question is: what are the relevant job performance criteria in the Dutch military context?
I then aim to develop and test supervisory ratings. When time allows and the Ministry of Defence gives me the opportunity to continue developing job performance criteria, I will subsequently proceed with developing and testing HOPTs. An additional research question will be: how valid are these HOPTs for measuring job performance in the Dutch military context? And what is their concurrent validity with supervisory ratings?
If more is possible, I aim to develop and test force-specific HOPTs to aid staff selection when incumbents need to be assigned to a different but best possible job position, for instance, by indicating the probability that an incumbent will perform sufficiently within the same force but in a different unit (e.g., switching from being a medic to a sniper). This is important because after three years, and in some circumstances longer, military personnel serving the Dutch Ministry of Defence need to take on a new position. The following research question will be added: to what extent do force-specific HOPTs discriminate between one another?
Study 4
The objective of study 4 is to create self-report typical performance tests that can be used for selection at home, have a low probability of faking, and are still valid predictors of job performance. Research suggests that when participants fill in self-report typical performance tests (e.g., measuring personality) while applying for a job (i.e., a high-stakes context), they tend to present a more prosocial image of themselves than in, for example, a research setting (e.g., Becker & Colquitt, 1992; McFarland & Ryan, 2000; Stark et al., 2001; Harold et al., 2006; Paunonen & LeBel, 2012). This is problematic because faking reduces criterion-related validity (Loy et al., 2025), especially when measuring socially desirable traits (Speer et al., 2025). I therefore aim to develop biodata questionnaires guided by the Model of Empirically Scored Biodata Inventories (MESBI; Speer et al., 2019), with input from the Knowledge, Skills, Abilities, and Other characteristics (KSAOs) that are relevant in the military context. These questionnaires will largely build on the results of studies 2 and 3. What characterizes biodata questionnaires is that they are historical and specific in nature (Mael, 1991). One can, for instance, ask respondents what they typically do when working with others at work: such an item is historical because it deals with events that already took place, and specific because it is tailored to the work setting.
Biodata questionnaires have been shown to be an adequate instrument for predicting job performance. For instance, the meta-analysis of Speer et al. (2022) shows that the multiple correlation of biodata and general mental ability (GMA) scores with job performance is .63. In addition, when using composite biodata scores alone, the predictive validity ranges between .24 (K = 22; N = 16,279) and .44 (K = 49; N = 20,564), depending on the scoring method (rational versus empirical, respectively). The rational method is a theory-driven approach that identifies items that might reflect the latent construct in question and weights them when this makes sense (e.g., Mumford & Stokes, 1992). The empirical method is a data-driven approach that identifies statistical relationships between biodata items and a criterion (e.g., Mumford & Owens, 1987; Hogan, 1994).
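As a minimal illustration of the empirical method, the sketch below derives item weights as item-criterion correlations in a development sample. This correlational keying is only one of several empirical keying variants and is not the MESBI procedure itself; the toy data are invented for illustration. In practice such keys must be cross-validated in a holdout sample, because purely data-driven weights capitalize on chance.

```python
import numpy as np

def empirical_key(items: np.ndarray, criterion: np.ndarray) -> np.ndarray:
    """Derive empirical item weights as item-criterion correlations.

    items     : (n_respondents, n_items) matrix of biodata item responses
    criterion : (n_respondents,) vector of job performance scores
    """
    # Standardize items and criterion, then take the mean cross-product,
    # which equals the Pearson correlation of each item with the criterion.
    z_items = (items - items.mean(axis=0)) / items.std(axis=0)
    z_crit = (criterion - criterion.mean()) / criterion.std()
    return z_items.T @ z_crit / len(criterion)

def biodata_score(items: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Composite biodata score: correlation-weighted sum of item responses."""
    return items @ weights

# Toy development sample: 4 respondents, 2 items.
dev_items = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
dev_crit = np.array([1.0, 2.0, 3.0, 4.0])
weights = empirical_key(dev_items, dev_crit)
applicant_scores = biodata_score(dev_items, weights)
```

Items that correlate strongly with the criterion in the development sample receive large weights; items with near-zero correlations contribute little to the composite score.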
Another advantage is that biodata questionnaires are less sensitive to faking than traditional self-report questionnaires, mainly for two reasons. First, biodata questionnaires use a forced-choice format in which respondents have to indicate how they typically act or think (i.e., ipsative), rather than rating the extent to which they think or behave in a certain way (i.e., normative). In other words, responses to biodata questionnaires are less likely to be confounded by social desirability. Second, biodata questionnaires can contain verifiable items in addition to unverifiable items. A verifiable biodata item could be: "During your previous work, how often were you late?". If the true answer to this question can be verified, the item is considered verifiable. Harold et al. (2006) show that letting applicants answer non-verifiable biodata items leads to a lower predictive validity for job performance than verifiable biodata items.
Taken together, these instruments are expected to function better than the currently applied self-report typical performance tests (i.e., because of their strong unique contribution, next to GMA scores, to predicting job performance). In addition, they are expected to speed up the application process by letting applicants complete biodata questionnaires beforehand at home, where there is little or no surveillance (i.e., because they are less susceptible to socially desirable responses, which can subsequently be verified). The question is: how do self-report typical performance tests that have been transformed into biodata questionnaires, or newly created biodata questionnaires, perform when assessing applicants' suitability to work for the Dutch Ministry of Defence?
Supervisors
Prof. Dr. Ruud den Hartigh
Dr. Susan Niessen
Financed by
The Dutch Ministry of Defence
Period
1 October 2025 – 1 October 2029

