Task II: Judging Internal Validity


Competencies

  1. Was the statistical design of the trial appropriate?
  2. Was there any intervention assignment bias?
  3. Were the intervention groups comparable?
  4. Was there any intervention-related bias?
  5. Were there co-interventions that may have confounded the results?
  6. Are the outcome variables meaningful?
  7. Was there any outcome assessment or measurement bias?
  8. Was there any follow-up bias?
  9. Were the results analyzed appropriately?
  10. What biases might the trial personnel have introduced?
  11. Is the trial internally valid?

Competency Decomposition

Competency A: Was the statistical design of the trial appropriate?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. What were the study hypotheses?

The primary hypothesis is the question the trial was principally designed to answer

1.a. primary hypothesis

Secondary hypotheses are questions for which data were collected, but not necessarily enough to support a definitive answer

1.b. secondary hypotheses

Findings for post-hoc hypotheses are less persuasive than for a priori hypotheses

1.c. post-hoc hypotheses

2. Were the analysis groups and subgroups appropriate?

Specification of a priori subgroups should relate to the study hypotheses

2.a. a priori subgroups

Findings for post-hoc subgroup analyses are less persuasive than for a priori ones

2.b. post-hoc subgroups

3. Was the trial designed with reasonable power to answer the primary hypothesis?

The outcome on which the power calculations are performed should be related to the primary hypothesis

3.a. powered outcome

The difference in effect size that the trial is designed to detect should be clinically significant

3.b. hypothesized difference in effect size 

The higher the alpha (typically 0.05), the greater the chance of a false-positive result in one or both directions

3.c.i. alpha level, ii. one or two-tailed

The higher the power (typically 0.80, i.e. beta = 0.20), the lower the chance of a false-negative result

3.d. power

The method for calculating a target sample size depends on the type of variable being analyzed (a worked sketch follows this subcompetency)

3.e. i. target sample size, and ii. method of calculation

Target enrollment is the recruitment goal

3.f. target enrollment

Target enrollment may have to be inflated above the target sample size to allow for enrollment refusals, dropouts, etc. These data also help in planning related trials.

3.g. explanation of any difference between target sample size and target enrollment

The actual power achieved by the trial depends on the sample size actually attained; a shortfall may give a negative result less predictive value than anticipated

3.h. actual sample size
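
Where a report gives the quantities in 3.a-3.h, the design can be checked directly. Below is a minimal sketch of the standard normal-approximation sample-size formula for comparing two proportions, plus the power actually achieved for a given attained sample size; the event rates, alpha, power, and attrition allowance are illustrative assumptions, not values from any particular trial.

```python
from scipy.stats import norm

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate n per group for comparing two proportions
    (normal approximation, equal allocation, two-tailed test)."""
    z_a = norm.ppf(1 - alpha / 2)        # critical value, two-tailed
    z_b = norm.ppf(power)                # z corresponding to desired power
    p_bar = (p_control + p_treatment) / 2
    top = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p_control * (1 - p_control)
                    + p_treatment * (1 - p_treatment)) ** 0.5) ** 2
    return top / (p_control - p_treatment) ** 2

def achieved_power(n, p_control, p_treatment, alpha=0.05):
    """Approximate power actually achieved with n subjects per group."""
    z_a = norm.ppf(1 - alpha / 2)
    p_bar = (p_control + p_treatment) / 2
    se0 = (2 * p_bar * (1 - p_bar) / n) ** 0.5
    se1 = (p_control * (1 - p_control) / n
           + p_treatment * (1 - p_treatment) / n) ** 0.5
    return norm.cdf((abs(p_control - p_treatment) - z_a * se0) / se1)

n = n_per_group(0.30, 0.20)          # detect 30% vs 20% event rates
print(round(n))                      # ~293 per group (3.e)
print(round(n / 0.85))               # target enrollment with ~15% attrition (3.f-g)
print(round(achieved_power(220, 0.30, 0.20), 2))  # power if only 220/group attained (3.h)
```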

4. Was the trial monitored appropriately?

Information on the monitoring committee is needed to see who performed the interim analyses and who had the authority to stop the trial

4.a.i. name, and ii. makeup of monitoring committee(s)

Details of the interim analysis plans are needed to assess whether bias may have been introduced into the subsequent conduct of the trial

4.b. interim analysis plans 

Knowing how interim findings affected the execution of the trial helps in determining the presence of any bias

4.c. procedure for reporting to investigators findings of monitoring

If no committee members were trained in statistics, they may miss errors

4.d. statisticians on monitoring committee?

Area of specialization of committee members may bias oversight

4.e. areas of specialization of monitoring committee members

A data monitoring committee member who was also an author may not be independent.

4.f. monitoring committee members authors?

5. Was the trial stopped prematurely?

Require details of stopping rule used

5.a. description of stopping rule

If the stopping rule was not defined a priori, bias may enter the decision of when to stop the trial

5.b. when stopping rule defined

How often were the data examined, and when? What adjustments were made for the multiple looks? Each unadjusted interim look inflates the chance of a false-positive finding (see the simulation sketch below)

5.c. i. monitoring schedule, ii. adjustment for multiple looks

How premature was the stoppage? Premature termination of a trial may exaggerate the observed effect, and may leave secondary hypotheses unanswered

5.d. when trial stopped relative to planned
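
The concern behind 5.c can be made concrete by simulation. The sketch below uses invented parameters (five looks, 500 subjects per arm, no true effect) to show that testing at a nominal 0.05 level at every interim look roughly triples the overall false-positive rate, and that even a crude Bonferroni split of alpha across the looks restores control; real trials typically use group-sequential boundaries such as Pocock or O'Brien-Fleming instead.

```python
# Simulation of type I error inflation from unadjusted interim looks.
# All parameters are illustrative; no true effect exists (H0 is true).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_trials, n_looks, n_per_arm = 20_000, 5, 500
looks = np.linspace(n_per_arm // n_looks, n_per_arm, n_looks).astype(int)

def false_positive_rate(z_threshold):
    hits = 0
    for _ in range(n_trials):
        a = rng.normal(size=n_per_arm)   # both arms drawn from the same
        b = rng.normal(size=n_per_arm)   # distribution: any "signal" is noise
        for n in looks:
            z = (a[:n].mean() - b[:n].mean()) / np.sqrt(2 / n)
            if abs(z) > z_threshold:     # "stop the trial" at this look
                hits += 1
                break
    return hits / n_trials

print(false_positive_rate(norm.ppf(0.975)))      # ~0.14, not the nominal 0.05
print(false_positive_rate(norm.ppf(1 - 0.05 / (2 * n_looks))))  # Bonferroni: < 0.05
```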

6. Were there important differences between the trial's design and its execution?

Need to know the stage of the trial to know what to critique

6.a. current stage of trial

If the protocol changed from design to execution, the trial may no longer be a valid test of the trial hypotheses

6.b.i. changes between intended and executed protocols, ii. reasons for the changes

Knowing when the protocol changed indicates how many subjects were affected by the change

6.c. date of protocol changes


Competency B: Was there any intervention assignment bias?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. What was the unit of randomization?

The unit of randomization must be defined to judge the appropriateness of the statistical analysis

1.a. unit of randomization

2. Was the randomization schedule truly random?

Randomized allocation minimizes selection bias by equally distributing unknown confounders between the intervention groups

2.a. random sequence generation method

If a fixed randomization scheme was used: was one group oversampled? Stratified variables are not randomly distributed between the intervention groups, and small block sizes make upcoming allocations predictable (see the sketch below)

2.b.i. allocation ratio, ii. stratification variables, iii. blocking scheme

If an adaptive randomization scheme was used: describe the method (number-, baseline-, or outcome-adaptive?)

2.c. adaptive randomization method
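
For readers unfamiliar with the fixed schemes in 2.b, here is a minimal sketch of stratified permuted-block randomization; the 1:1 allocation ratio, block size of 4, and site names are illustrative assumptions.

```python
# Stratified permuted-block randomization (one common fixed scheme).
import random

def blocked_sequence(n_blocks, block_size=4, arms=("A", "B"), seed=0):
    """Allocation list of permuted blocks for one stratum."""
    assert block_size % len(arms) == 0   # keeps the allocation ratio exact
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)               # permute within each block
        sequence.extend(block)
    return sequence

# One independent schedule per stratum keeps the stratification variable
# balanced across arms within each site. Small blocks cut both ways: once
# the first assignments in a block are known, the rest of the block is
# predictable, which is why block sizes are usually concealed or varied.
schedules = {site: blocked_sequence(25, seed=i)
             for i, site in enumerate(("site_1", "site_2"))}
print(schedules["site_1"][:8])
```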

3. Was intervention allocated randomly?

Subjects have to be allocated to an intervention based on some application of the randomization schedule

3.a. method of intervention allocation

Unconcealed allocation is associated with exaggerated estimates of effect

3.b. method of allocation concealment

4. How effective was allocation concealment?

Whether the person in charge of allocating interventions could guess which intervention upcoming subjects were to receive indicates whether the allocation could be second-guessed

4.a. allocator's guess of intervention allocation


Competency C: Were the intervention groups comparable?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. How effective was the randomization?

If baseline characteristics are statistically equally distributed between the randomized groups, unknown characteristics are also likely to be equally distributed. (A sketch of such a comparison follows below.)

1.a. i. baseline characteristics, ii. statistical test for difference, iii. statistical result
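
As an illustration of 1.a, the sketch below applies a chi-square test to one invented binary baseline characteristic. (Whether significance testing of baseline differences in a properly randomized trial is informative is itself debated; the sketch only shows the mechanics reports commonly present.)

```python
# Baseline-comparability check for one binary characteristic; counts invented.
from scipy.stats import chi2_contingency

#        with characteristic, without
table = [[45, 155],   # intervention group (n=200)
         [52, 148]]   # control group (n=200)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # a large p is consistent with balance
```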

2. Were groups comparable after randomization?

Subject characteristics could have changed between eligibility determination and randomization, such that intervention groups become less comparable than at enrollment

2.a. time interval between enrollment and randomization

Subject characteristics could also have changed between randomization and the start of the intervention, such that intervention groups become less comparable than at randomization

2.b. time interval between randomization and intervention


Competency D: Was there any intervention-related bias?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. What was the experimental intervention?

The intended intervention is what the trial was designed to test. Particular details depend on the type of intervention (drug, procedure, behavioral, environmental).

1.a. description of intervention i. type, and ii. type-specific details

Intended intervention may include modifications for specific subject circumstances

1.b. subject-specific adjustments allowed

Intervention effect can only be ascertained if it was clear who got what intervention 

1.c. which intervention groups assigned to intervention

Performance bias may exist if intervention received differed substantially from what was intended

1.d. differences between planned and actual intervention

2. What was the control intervention?

Since the intervention effect is specified as a comparison to the control, we must know what the control intervention was

2.a. description of control i. type, and ii. type-specific details

Rationale for a placebo control should be explicitly discussed

2.b. justification for type of control

An explicit description of the similarity of the control and experimental interventions indicates how likely the masking was to succeed

2.c. similarity of control and experimental intervention

Intervention effect can only be ascertained if it was clear who got what intervention

2.d. which intervention groups assigned to control

3. Was there differential compliance across the intervention and control groups?

Exclusion bias can result if certain types of subjects are more likely not to complete their assigned intervention.

3.a. what proportion of each intervention group completed their assigned intervention

Subjects who complete their assigned intervention but do so with less than 100% compliance dilute the intervention effect

3.b. compliance in each intervention group

Systematically different reasons for discontinuing the assigned intervention between groups introduce a hidden bias

3.c. i. reasons for not completing assigned intervention, ii. number of subjects for each reason in each intervention group

Subjects who cross over dilute the intervention effect

3.d. number who crossed over to other intervention 

4. Was intervention masking achieved?

Unblinding of subjects may lead to performance bias

4.a.i. method, and ii. efficacy of blinding of subjects to intervention

Unblinding of care providers may lead to performance bias

4.b. i. method, and ii. efficacy of blinding of provider(s) to intervention

Unblinding of study nurses may lead to performance bias

4.c. i. method, and ii. efficacy of blinding of study nurse(s) to intervention

Unblinding of investigators may lead to performance bias

4.d. i. method, and ii. efficacy of blinding of investigator(s) to intervention

5. Were trial participants blinded to interim trial results?

Unblinding of subjects to results may lead to performance bias

5.a. i. method, and ii. efficacy of blinding of subjects to results

Unblinding of care providers to results may lead to performance bias

5.b. i. method, and ii. efficacy of blinding of provider(s) to results

Unblinding of study nurses to results may lead to performance bias

5.c. i. method, and ii. efficacy of blinding of study nurse(s) to results

Unblinding of investigators to results may lead to performance bias

5.d. i. method, and ii. efficacy of blinding of investigator(s) to results


Competency E: Were there co-interventions that may have confounded the results?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. Could pre-enrollment interventions have confounded the results?

If a washout period was used, how long was it? A prior intervention may still be a confounder if its effects outlast the washout period

1.a. duration of washout period

2. Were there co-interventions that may have confounded the results?

Knowing which co-interventions were allowed helps in judging generalizability

2.a. description of allowed co-interventions i. type, and ii. type-specific details

Effects that are in fact due to co-interventions may be falsely attributed to the intervention

2.b. i. type, and ii. type-specific details of actual co-interventions, iii. by which intervention groups

If co-interventions were disproportionately taken by one group, then the observed effect cannot so easily be ascribed only to the tested intervention

2.c. proportion of each intervention group taking each co-intervention

3. Could follow-up activities have confounded the results?

Frequent clinic visits during trial follow-up may lead to improved outcomes that are not generalizable to the non-experimental setting

3.a. schedule of follow-up visits

Actions at each follow-up could constitute additional therapy, or may lead to case-finding bias

3.b. actions during follow-up

Follow-up personnel could have contributed an intervention effect, e.g. friendly nurses

3.c. personnel that carried out the follow-up activities

Performance bias may exist if one intervention group received more follow-up activities than the other

3.d. proportion receiving follow-up activities per intervention group


Competency F: Are the outcome variables meaningful?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. What were the outcome variables?

Well-defined outcomes (e.g. death) are less subject to error in measurement than poorly defined ones

1.a. outcome definitions

Timing of outcome assessment should make sense pathophysiologically or clinically; if the outcome was not assessed in all subjects, it should be assessed in the relevant subgroups

1.b. i. when outcome assessed, ii. on which intervention groups

Primary outcome is the one used in the a priori power calculation for the trial

1.c. designation of i. primary and ii. secondary outcomes

2. Are the outcomes intermediate or final?

Intermediate outcomes may give only weak support to the study's hypothesis

2.a. outcome definitions

Require the study hypotheses to determine if the outcomes are intermediate or not

2.b. i. primary and ii. secondary hypotheses

Require the objective of the study to determine if the outcomes are intermediate or not

2.c. study objective

3. What side effects, if any, were monitored?

Side effects are important for establishing the clinical context of the intervention effect

3.a. side effect definitions

Timing of side effect assessment should make sense pathophysiologically or clinically; if side effects were not assessed in all subjects, they should be assessed in the relevant subgroups

3.b. i. when side effects assessed, ii. on which intervention groups

4. Were there any changes in the outcome definitions between design and execution?

The trial may be less valid if it actually measured something other than what was originally intended

4.a. i. outcomes changed, ii. why, iii. to what


Competency G: Was there any outcome assessment or measurement bias?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. How was each outcome assessed?

Full description of assessment method is needed to assess presence or absence of detection bias

1.a. description of assessment method

Untrained or improperly trained assessors can introduce detection bias

1.b. description of assessors

2. How accurate was the assessment method?

Unreliable or poorly validated measurement may cause detection bias (an agreement sketch follows below)

2.a. i. validity and ii. reproducibility of assessment method
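
One common way to quantify the reproducibility in 2.a.ii is chance-corrected agreement between two independent assessors rating the same subjects. A minimal from-scratch sketch of Cohen's kappa, with invented ratings:

```python
# Cohen's kappa: chance-corrected inter-rater agreement on the same subjects.
def cohens_kappa(rater1, rater2):
    assert len(rater1) == len(rater2)
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    p_chance = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                   for c in categories)       # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

r1 = ["improved", "improved", "same", "worse", "same", "improved"]
r2 = ["improved", "same",     "same", "worse", "same", "improved"]
print(f"{cohens_kappa(r1, r2):.2f}")  # ~0.74 here; 1 = perfect, 0 = chance
```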

3. Did the outcome assessors have any knowledge that may have led to biased assessment?

Lack of assessor blinding can lead to detection bias

3.a. i. method, and ii. efficacy of blinding of assessor(s) to intervention received

Lack of assessor blinding can lead to detection bias

3.b. i. method, and ii. efficacy of blinding of assessor(s) to interim results


Competency H: Was there any follow-up bias?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. Was there differential follow-up between the intervention and control groups?

Lower follow-up rates reduce the precision of the observed results, and magnify potential exclusion bias

1.a. proportion of subjects followed up, in each intervention group

Exclusion bias can result if certain subjects are systematically more likely to be lost to follow-up

1.b. clinical characteristics of i. followed and ii. not followed, in each intervention group

Reasons for loss to follow-up may provide information on nature and extent of exclusion bias

1.c. i. reasons for lack of follow-up, and ii. how many for each reason, in each intervention group

2. Were there differential rates of outcome assessment between the intervention and control groups?

Missing data from incomplete measurement can lead to exclusion bias

2.a. % of subjects yielding usable data at each timepoint, in each intervention group

Exclusion bias can result if certain subjects are systematically more likely to be missing outcome assessments

2.b. clinical characteristics of i. assessed and ii. not assessed, for each outcome in each intervention group

Reasons for lack of outcome assessment may provide information on nature and extent of exclusion bias

2.c.i. reasons outcome not assessed, and ii. how many for each reason, for each outcome in each intervention group

Duration of follow-up gives information on attrition of subjects over time (a person-years sketch follows below)

2.d. i. mean follow-up, ii. person-years of follow-up for each outcome, in each intervention group
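
The person-years in 2.d.ii are just summed follow-up time; dividing each group's events by its person-years gives an incidence rate that accounts for differential attrition. A minimal sketch with invented follow-up times:

```python
# Person-years and incidence rate per group; times (in years) are invented.
follow_up = {"intervention": [2.0, 1.5, 3.0, 0.5, 3.0],
             "control":      [3.0, 3.0, 2.5, 1.0, 3.0]}
events    = {"intervention": 2, "control": 4}

for group, times in follow_up.items():
    person_years = sum(times)                 # total observed time at risk
    rate = events[group] / person_years       # events per person-year
    print(f"{group}: {person_years:.1f} person-years, "
          f"{rate:.3f} events/person-year")
```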


Competency I: Were the results analyzed appropriately?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. What were the raw results of the study?

Raw results must be clear, e.g. must have a denominator

1.a.i. numerator and ii. denominator of all raw results

Both the estimate of the effect and its precision (e.g., standard deviation) are needed

1.b. summary descriptors, with precision

Parameterized or transformed summary descriptors can be misleading if applied inappropriately

1.c. justification for parameterization, or transformation

Need to know when each datum was assessed

1.d. follow-up time per datapoint

2. What perspective(s) was used?

Intention-to-treat analysis is less biased than efficacy analysis, but efficacy analysis provides more information on the efficacy of the intervention itself (both are sketched below)

2.a. intention to treat and/or efficacy analysis?

Many different definitions of intention-to-treat and of efficacy analysis are used

2.b. i. definition of intention-to-treat analysis, ii. definition of efficacy analysis
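
The contrast in 2.a can be made concrete with toy data. The sketch below assumes one common pair of definitions (intention-to-treat: analyze everyone as randomized; efficacy/per-protocol: analyze only compliant subjects as actually treated); published definitions vary, which is exactly why 2.b asks for them, so treat these as illustrative.

```python
# Intention-to-treat vs one per-protocol (efficacy) definition; records invented.
subjects = [
    # (assigned, actually_received, compliant, event)
    ("A", "A", True,  False),
    ("A", "A", True,  True),
    ("A", "B", False, True),   # crossed over
    ("B", "B", True,  True),
    ("B", "B", False, True),   # non-compliant
    ("B", "B", True,  False),
]

def event_rate(rows):
    return sum(e for *_, e in rows) / len(rows) if rows else float("nan")

for arm in ("A", "B"):
    itt = [s for s in subjects if s[0] == arm]           # as randomized
    pp  = [s for s in subjects if s[1] == arm and s[2]]  # as treated, compliant only
    print(arm, f"ITT {event_rate(itt):.2f}", f"PP {event_rate(pp):.2f}")
```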

3. Were appropriate statistical analyses performed?

Need to know which statistical method was used for each test, to be able to duplicate it. Software errors may invalidate results

3.a. for each test, i. name of statistical method(s), ii. software used

Inappropriate methods can yield misleading results

3.b. justification for use of these statistical methods

The actual value of the test statistic is more useful than a bare declaration of significance (an estimate-with-confidence-interval sketch follows below)

3.c. actual result of test statistic, i. estimate, ii. upper 95% and iii. lower 95% confidence interval

Statistical methods make strong assumptions about the nature of the data that may not hold (e.g. normality)

3.d. evidence that assumptions were fulfilled or reasonable
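
As an illustration of 3.c, the sketch below reports an effect estimate, here a risk difference, together with its Wald 95% confidence interval rather than a bare significance claim; the counts are invented.

```python
# Risk difference with a Wald 95% confidence interval; counts are invented.
from math import sqrt

events_t, n_t = 30, 200      # intervention group
events_c, n_c = 48, 200      # control group

p_t, p_c = events_t / n_t, events_c / n_c
diff = p_t - p_c                              # the effect estimate
se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # normal-approximation bounds
print(f"risk difference {diff:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```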

4. Were losses to follow-up handled appropriately?

Inappropriate handling of losses to follow-up can lead to misleading results (a censoring sketch follows below)

4.a. censoring method
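
One standard censoring method is Kaplan-Meier estimation, in which subjects lost to follow-up leave the risk set at their last known time rather than being dropped or counted as events. A minimal from-scratch sketch with invented data (assuming distinct event times):

```python
# Kaplan-Meier survival steps; events=1 for the outcome, 0 for censored.
def kaplan_meier(times, events):
    at_risk, survival, steps = len(times), 1.0, []
    for t, e in sorted(zip(times, events)):
        if e:                            # an outcome event: survival drops
            survival *= (at_risk - 1) / at_risk
            steps.append((t, survival))
        at_risk -= 1                     # censored subjects just leave the risk set
    return steps

times  = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5]
events = [1,   0,   1,   1,   0,   1]    # 0 = lost to follow-up (censored)
print(kaplan_meier(times, events))
```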

5. Are the results robust to alternative analyses and inferential statistics?

Subject-level data needed for reanalysis by other investigators using other methods

5.a. raw results, follow-up time, and completeness, as II.H.2.d, II.I.1.a and d


Competency J: What biases might the trial personnel have introduced?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. Could the source of funding have introduced bias?

Commercial or other interests may influence a study's outcome

1.a. funding source: i. who, ii. what type

Reporting may be biased if the sponsor had the right to modify or withdraw the manuscript

1.b. funder's role in preparation of manuscript

2. How likely is it that the investigators introduced bias?

Particular investigators may have known biases

2.a. investigators

Area of specialization may bias design and/or results

2.b. area of specialization of each investigator

If investigators have financial interest in outcome of study, they could introduce bias

2.c. i. amount of money involved, ii. nature of financial conflict

Open access to investigators for questions and clarifications provides accountability for integrity of results

2.d. i. name and ii. contact information for contact person

3. What assurances are there that the trial was conducted with integrity?

Any retractions or corrections, due to intentional fraud or unintentional error, may limit internal or external validity 

3.a. description of any i. fraud, ii. retraction, iii. correction

Previous history of fraud by an investigator would increase our prior suspicion of fraud in the study

3.b. integrity record of investigators and funders


Competency K: Is the trial internally valid?

For each subcompetency below, each justification is listed first, followed by its data requirement.

1. Were the trial's conclusions supported by the data?

Requires the authors' interpretation of the trial

1.a. authors' conclusion of the trial

Conclusions should be supported by the results

1.b. all the data requirements for II.A.1.a-b, II.H.2.d, II.I.1.a and d

2. What study limitations were acknowledged?

The authors' identification and discussion of study limitations helps in judging the proper strength of the conclusions

2.a. authors' statement of study limitations

3. What recommendations for clinical action were supported by the trial results?

Requires the authors' recommendation for clinical action, if any

3.a. authors' statement of clinical application


