Prognostic models combine several prognostic factors to provide an estimate of the likelihood (or risk) of future events in individual patients, conditional on their prognostic factor values. A fundamental part of evaluating prognostic models is undertaking studies to determine whether their predictive performance, such as calibration and discrimination, is reproduced across settings. Systematic reviews and meta-analyses of studies evaluating prognostic models' performance are a necessary step for selection of models for clinical practice and for testing the underlying assumption that their use will improve outcomes, including patient's reassurance and optimal future planning. In this paper, we highlight key concepts in evaluating the certainty of evidence regarding the calibration of prognostic models. Four concepts are key to evaluating the certainty of evidence on prognostic models' performance regarding calibration. The first concept is that the inference regarding calibration may take one of two forms: deciding whether one is rating certainty that a model's performance is satisfactory or, instead, unsatisfactory, in either case defining the threshold for satisfactory (or unsatisfactory) model performance. Second, inconsistency is the critical GRADE domain to deciding whether we are rating certainty in the model performance being satisfactory or unsatisfactory. Third, depending on whether one is rating certainty in satisfactory or unsatisfactory performance, different patterns of inconsistency of results across studies will inform ratings of certainty of evidence. Fourth, exploring the distribution of point estimates of observed to expected ratio across individual studies, and its determinants, will bear on the need for and direction of future research.