Flawed Sleep-Training Study Makes Invalid Claims In the News
A flawed Australian study recently made national news. The researchers made extreme and exaggerated claims about infants. They claimed that two interventions, tested over three months on a final total of only 7 babies in each group, appeared effective and had no detrimental effects on the infants or their families. These claims rested on poor data, a flawed design, and misleading information. Disregarding basic research standards, the researchers used inconsistent, invalid, and subjective data.
The sad result is misinformation and confusion for pediatricians and for the parents who look to authorities for information and advice. We are all misled by these easily published, substandard, and inaccurate studies.
Child psychologists and other professionals are alarmed by these unsupported and inflated claims. Below is a response to the multiple errors in the study “Behavioral Interventions for Infant Sleep Problems: A Randomized Controlled Trial,” published in Pediatrics, the journal of the American Academy of Pediatrics, volume 137, online May 24, 2016.
This study has multiple flaws, yet it still generalizes to all babies from two experimental groups with only 7 babies each at the end of a three-month study. The flaws listed below would render the paper unpublishable in other venues.
Data flaws include:
• 50% dropout rate: unacceptably high. Dropouts were not compared with retained participants, a comparison that top psychology journals would require. Are the retained participants demographically representative of the original group? Did child age predict who stayed in or dropped out of the study? What methods were used to handle missing data in the analysis, and what missingness mechanisms were assumed? If the missing-data methods were not appropriate, the results would be biased and misleading.
• Small sample size at the beginning and even smaller at the end: because of dropouts, the graduated extinction (“cry it out”) intervention had a sample size of n=7 at one month and n=7 at three months, and the fading intervention had n=10 at one month and n=7 at three months.
• Failure to collect real-time stress measures, i.e., cortisol data during nocturnal treatment. Researchers did not measure acute stress during interventions.
• Treatment success was judged from parental self-reports and sleep diaries, both possible sources of bias. Parental responses can be influenced by educational level, psychopathology, parenting style, and family dynamics, as well as by the desire to succeed in the study.
• No diagnostic criteria or definition of “sleep problems.” Parents simply answered Yes/No to the question: “Does your infant have a sleep problem?”
• No intervention fidelity check. Did the parents do what they were told to do? Researchers failed to verify that the interventions were delivered as designed. Parents underwent treatment sessions and collected cortisol samples in the mornings and afternoons. They also received a booklet describing the interventions and cell phone support; however, no data were recorded for the calls.
• Inflated Type I error (chance findings that are not actually “true” and are unlikely to replicate) from running too many statistical tests.
• The data analysis models (linear mixed model regressions) are described too vaguely to judge the validity of the analysis. Without more information about the models, it is difficult to evaluate whether they were appropriate. For example, do the linear mixed models assume linear trends over time? If so, the trends shown in Figure 2 do not look linear, so the models may be misspecified for these data.
• Critical details about the results are missing. For example, the authors report “significant interactions for sleep latency” without saying which interactions they mean.
• Problems with internal validity. Researchers failed to address confounding variables. Infants varied in age, with an average of 10.8 months, and the study lasted three months. Infant nighttime awakenings typically diminish over the first year of life, and cortisol responses also diminish from four months of age, so improvements over the study period could reflect normal development rather than the interventions.
• No durability statement (maintenance of intervention effects). How long were the changes maintained?
• Inconsistent and limited follow-up. The follow-up was at twelve months. These data were reported using all original participants (n=14 and n=15) rather than the numbers actually remaining in each intervention after dropouts by three months (n=7 and n=7). What happened to the dropouts? No information is given.
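The Type I error and dropout points above can be made concrete with simple arithmetic. If each test is run at a significance level of α = 0.05 and the tests are treated as independent (a simplifying assumption), the chance of at least one false positive across m tests is 1 − (1 − α)^m. The sketch below computes this for a few hypothetical values of m; the paper's actual number of tests is not counted here. It also computes dropout rates from the enrolled and retained counts reported above (n=14 and n=15 enrolled, n=7 each at three months). The function names are illustrative only.

```python
# Illustrative sketch only: the test counts (m) are hypothetical,
# not the number of tests actually run in the study.

ALPHA = 0.05  # conventional per-test significance level (assumed)


def familywise_error_rate(m: int, alpha: float = ALPHA) -> float:
    """Probability of at least one false positive among m independent
    tests, each run at level alpha, when no true effects exist."""
    return 1 - (1 - alpha) ** m


def dropout_rate(enrolled: int, retained: int) -> float:
    """Fraction of enrolled participants lost by the final assessment."""
    return (enrolled - retained) / enrolled


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(f"{m:>2} tests: P(at least one false positive) = "
              f"{familywise_error_rate(m):.2f}")

    # Enrolled counts as reported (n=14 and n=15), 7 retained in each group
    for enrolled in (14, 15):
        print(f"{enrolled} enrolled, 7 retained: dropout = "
              f"{dropout_rate(enrolled, 7):.0%}")
```

Under these assumptions, 20 independent tests carry roughly a 64% chance of at least one spurious “significant” result, and the reported counts correspond to dropout rates of 50% and about 53%.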
Presentation flaws include:
• The methods should report the final sample size, not just the recruited size (which was already too small to draw conclusions).
• The statistics are presented for different numbers of participants throughout the paper, and the number of participants tested is misleading or unclear in the tables.
• Measures should be described, including their reliability.
Interpretation flaws include:
• No distinctions were made about effects by child age. At the outset, children in the education group were significantly younger on average, which could explain the higher stress scores for their mothers at the end of the study. How old were the participants for whom all data were collected?
• Most critically, the summary of the study (the abstract), which is the only part most people will read, presents no cautions in its conclusions about any of the flaws we have mentioned.
Instead, the paper concludes that “Both graduated extinction and bedtime fading provide significant sleep benefits above control, yet convey no adverse stress responses or long-term effects on parent-child attachment or child emotions and behavior.”
This will mislead doctors into believing that sleep training is fine for all babies.
In summary, possible bias due to subjective data, lack of diagnostic criteria, lack of a fidelity check, threats to internal validity, no durability statement, inconsistent follow-up data, and limited generalizability due to the small sample size compromise the integrity and validity of this research. Together, these flaws prevent drawing any reliable conclusion about adverse effects on infants.
Until researchers adequately define “sleep problems” in infants using diagnostic criteria and clear operational definitions, carefully measure cortisol levels during and after interventions, and use the gold standard for research (empirical studies), claims about what is not harmful to infants are speculative. Conclusions claiming no harm to children should be supported by large, random, well-controlled, replicated, and longitudinal studies that measure all relevant variables related to wellbeing. All such research conclusions must be interpreted with caution.
Darcia additionally comments: Though the authors provide a rationale for sleep training by citing the adverse consequences of sleep deprivation for parents, they do not offer the alternative perspective of how babies’ systems are regulated by the caregiver’s physical presence throughout the day and night. Nor do they bring in the broader baseline of human evolution over millions of years. According to our evolutionary baseline, leaving babies alone to sleep is abnormal, unhealthy, and dangerous. Such an understanding would lead away from advocating sleep training on the basis of poorly wrought studies like this one, and toward changes in policies and institutions (e.g., extensive paid parental leave) that make it possible for parents to stay with their children as they learn to sleep on their own without coercion.