By Court Wing
Taking a deep look at the trial’s Supplementary Appendix, the response from the psychedelic science community, and the choice to measure the results using the QIDS depression rating scale.
On April 15, 2021, the New England Journal of Medicine published a study comparing the efficacy of psilocybin-assisted therapy to a popular SSRI antidepressant, escitalopram (sold under the brand names Lexapro, Cipralex, and others), titled Trial of Psilocybin versus Escitalopram for Depression. The landmark paper, written by the team at Imperial College London’s Centre for Psychedelic Research, concluded that the “trial did not show a significant difference in antidepressant effects between psilocybin and escitalopram in a selected group of patients,” which caused a bit of an uproar in the psychedelic science community.
Reactions and questions came quickly on social media: Was the paper edited too heavily by the New England Journal of Medicine? Were appropriate rating scales used to judge the effectiveness of psilocybin? Are the “real” results hidden in the study’s appendix? As a participant in NYU’s study on psilocybin-assisted therapy for major depressive disorder in 2020 who received incredible benefits (my depression of five years went completely into remission and has remained there), I felt it was necessary to try and explain the latest results in more depth.
The study in question, under lead authors Robin Carhart-Harris, Ph.D., David Nutt, MD, Rosalind Watts, D.Clin.Psy., and others, was a six-week, double-blind randomized trial with 59 participants comparing the efficacy of psilocybin versus a leading antidepressant in treating depression. For each participant, the trial started with a psilocybin dose day; one group received a high dose of 25 mg, the other a negligible dose of 1 mg. Then, the high-dose group proceeded to receive a daily placebo while the low-dose group received 10 mg of escitalopram each day for the first three weeks. At three weeks, the psilocybin group received a second 25 mg dose of the magic mushroom compound and continued with the daily placebo. The SSRI group received a second placebo-level 1 mg dose of psilocybin and also had their daily dose of escitalopram increased to 20 mg. Both groups received an equal amount of extensive psychotherapeutic support and counseling, totaling around 35 to 40 hours during the six-week trial, using Watts’s ACE therapeutic model: Accept, Connect, Embody.
Prior to the start of the trial, both groups received multiple and extensive depression assessments, using four different depression rating scales: QIDS-SR-16, HAM-D-17, BDI-1A, and MADRS. Of the four depression inventories, QIDS-SR-16 is the newest, designed for convenience of use so patients can “self-rate” (that’s what the SR stands for), and crucially for this trial, it was the primary scale used to compare psilocybin and escitalopram’s efficacy in fighting depression. However, lead author Robin Carhart-Harris has now stated that this choice should have been better considered because QIDS-SR-16 is the least established of the four scales used. There are several reasons why it was not the best rating scale to use and why its results should be viewed as less accurate, and we will explain those below, but first let’s review the trial results as published.
In the abstract, the NEJM concluded:
“On the basis of the change in depression scores on the QIDS-SR-16 at week 6, [the mean (±SE) changes in the scores from baseline to week 6 were −8.0±1.0 points in the psilocybin group and −6.0±1.0 in the escitalopram group, for a between-group difference of 2.0 points] this trial did not show a significant difference in antidepressant effects between psilocybin and escitalopram in a selected group of patients.”
This is an extremely conservative and staid summary for all the rating scales and secondary outcomes. Even so, in my opinion, this alone is phenomenal because they are stating that psilocybin, a psychedelic compound, is at least as effective as a leading SSRI for treating patients with major depressive disorder. But the real results are in the data contained within the appendices and tables, many published in the Supplementary Appendix rather than in the abstract or main study itself, so let’s examine them.
Analyzing the Supplementary Appendix
In clinical research, the two main items to track in depression scores are the “response” rates and the “remission (remitter)” rates. A response is typically defined as a reduction of at least 50% in a patient’s depression score from baseline, and the response rate is the percentage of patients who achieve it. A remission means that a patient no longer has enough symptoms to qualify for a medical diagnosis of depression; for all intents and purposes, it’s effectively gone. The remission rate is the percentage of patients who reach that point. So even when we look solely at QIDS scores for those two rates, the difference is striking:
“A QIDS-SR-16 response occurred in 70% of the patients in the psilocybin group and in 48% of those in the escitalopram group… QIDS-SR-16 remission occurred in 57% [psilocybin] and 28% [escitalopram]… Other secondary outcomes generally favored psilocybin over escitalopram, but the analyses were not corrected for multiple comparisons. The incidence of adverse events was similar in the trial groups.”
In both ratings for the QIDS scale we see psilocybin outperform escitalopram, with roughly double the remission rate, after only two doses as opposed to six weeks of daily doses. But also notice the statement at the end about secondary outcomes favoring psilocybin and that adverse events were similar.
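For readers who want to see how response and remission rates are tallied, here is a minimal sketch in Python. It assumes a response means a drop of at least 50% from the baseline score and that remission means a week-6 score at or below a cutoff of 5; both thresholds are assumptions for illustration, and the patient scores are invented, not taken from the trial.

```python
# Hypothetical (baseline, week-6) scores for illustration only; NOT trial data.
hypothetical_scores = [(16, 4), (14, 8), (18, 7), (12, 10)]

# Assumed remission cutoff on the rating scale (an illustrative assumption).
REMISSION_CUTOFF = 5


def is_response(baseline, week6):
    """True if the score fell by at least 50% from a nonzero baseline."""
    return (baseline - week6) / baseline >= 0.5


def is_remission(week6, cutoff=REMISSION_CUTOFF):
    """True if the final score is at or below the assumed cutoff."""
    return week6 <= cutoff


def rates(scores):
    """Return (response_rate, remission_rate) as fractions of all patients."""
    n = len(scores)
    response_rate = sum(is_response(b, w) for b, w in scores) / n
    remission_rate = sum(is_remission(w) for _, w in scores) / n
    return response_rate, remission_rate


response_rate, remission_rate = rates(hypothetical_scores)
print(f"Response rate: {response_rate:.0%}, remission rate: {remission_rate:.0%}")
```

Note that a patient can respond without remitting: a drop from 18 to 7 is more than a 50% improvement, yet the final score is still above the assumed cutoff.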
Honestly, these are significant understatements when you look at the secondary outcomes directly in the appendices and tables. Certainly, as a leading scientific journal it’s a far better position to conservatively report the outcome rather than promote the results, but consider the following: In the three other well-established depression inventories, HAM-D, BDI, and MADRS, the response rate for psilocybin at the 6-week mark was between 67.9 and 76.7% while for the SSRI it was only 20.7 to 41.4%. Even more striking are the remission rates, lying between 28.6 and 56.7% for psilocybin while the SSRI produced remission at 6 weeks in 6.9 to 20.7% of participants. (Check out the Supplementary Appendix, pg. 13 to see for yourself.)
As this is a two-dose study, there was a similar outperformance after the first psilocybin dose; in two scales (QIDS and BDI) 33.3 to 51.7% of participants no longer qualified as being depressed by the end of the first week. In my opinion, it can’t be overstated how miraculous these remission rates are; these are patients that have often been non-responsive to other treatments for depression, and have likely been through a gamut of approaches, including psychotherapy, exercise, other antidepressants, alternative therapies, and had yet to find relief, let alone remission after a single week.
When we look at secondary outcomes, there are even more revelations. In a score known as “wellbeing”, participants in the psilocybin group increased 15.8 points after six weeks while those in the SSRI group only improved 6.8 points. This not only shows a reduction in depression symptoms, but a marked improvement in patients’ happiness with their sense of self. This is similarly reflected in the “Flourishing Scale” which found the psilocybin group to improve 14.4 points while the SSRI group only improved by 8.9 points after six weeks.
Other similar secondary outcomes also demonstrated remarkable efficacy for psilocybin, including reductions in suicidal ideation, trait anxiety, experiential avoidance, anhedonia (which has implications for chronic pain), emotional breakthrough inventory, psychotropic-related sexual dysfunction, and others. A key line to take from the caption for Supplementary Table S1, which compares depression inventory rates across all six weeks, is: “All contrasts favored psilocybin. None favored escitalopram.” These are well-established depression inventories that are used as the standard of comparison in nearly every modern study testing efficacy against nearly any method or medication for relieving depression, but because they were not chosen as the primary scales, they were classified as secondary outcomes. But if all these scores had been corrected against each other, including the QIDS, psilocybin would have been shown to be clearly superior.
So why was QIDS chosen as the primary evaluation instead of the much more frequently employed MADRS inventory? As someone who had to take the MADRS inventory repeatedly in order to qualify for NYU’s investigational study of psilocybin for major depressive disorder, I will tell you it is surprisingly precise and accurate, making it nearly impossible to hide the depths of your disease from yourself. As much as we may mask the symptoms of our disorder to others in order to function in our day to day lives, we may in fact find we mask the severity of our symptoms to an even greater degree to ourselves. According to Carhart-Harris, the choice to use QIDS was almost arbitrary and now considered ill-advised in hindsight. And other professionals on Twitter and elsewhere online are largely in agreement, arguing that QIDS was a scale not designed to measure depression so much as one designed for patient convenience and to measure response to classic SSRIs. For example, QIDS has no measure for wellbeing, emotional breakthrough, experiential avoidance or, dare we say, mystical experiences.
SSRIs modulate and downregulate distressing feelings, but do not generally resolve them, much like a daily salve that keeps negative emotions just under conscious awareness. Psilocybin not only goes to the heart of engaging the origin of troubling feelings, but due to its ability to induce neuroplasticity, it’s theorized that the psychedelic compound directly aids in a cortical reorganization of prior maladaptive circuits and strongly held associations that create the framework of a patient’s life experience and the events in it.
Evaluating the Choice to Use the QIDS Scale
Worth noting about the QIDS scale relative to the other inventories in the study is a concept in statistics known as a confidence interval, or CI. When a study is performed, it’s obviously not done on the entire population but on a sample of the population. A confidence interval is a range of values, calculated from the sample, that is likely to contain the true average result for the wider population. It also tells us about repeatability: if scientists were to repeat the study many times and calculate a 95% confidence interval each time, about 95% of those intervals would contain the true value.
In a study like this one where two medications are being compared against each other for efficacy, their confidence intervals can be laid out on a table or graph known as a forest plot. When the CIs are displayed on a forest plot, they are shown as a range of most likely values for the difference between the groups (e.g. -2 to -15). This is key because it allows researchers to state, with 95% confidence, that the true difference lies somewhere within that range. 95% is the conventional threshold for statistical significance in patient response to medication for this type of study. However, if on a forest plot your CI crosses zero (which is the midline between the two groups), the data are compatible with there being no difference in effect between the groups, and the result is not considered statistically significant.
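To make the idea of an interval crossing zero concrete, here is a rough sketch in Python using the figures quoted from the abstract (mean changes of −8.0 and −6.0, each with a standard error of 1.0). It assumes a simple normal approximation with two independent groups, which is almost certainly not the exact model the trial’s statisticians used, so the interval it prints should not be expected to match the published one exactly.

```python
import math


def diff_ci(diff, se_a, se_b, z=1.96):
    """Approximate 95% CI for a difference between two independent means.

    Assumption: normal approximation; the standard error of the difference
    is the square root of the sum of the squared group standard errors.
    """
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return diff - z * se_diff, diff + z * se_diff


def crosses_zero(lower, upper):
    """True if zero (no difference between the groups) lies inside the interval."""
    return lower <= 0 <= upper


# Mean changes quoted from the abstract: psilocybin -8.0, escitalopram -6.0.
difference = -8.0 - (-6.0)  # -2.0 points in favor of psilocybin

lower, upper = diff_ci(difference, se_a=1.0, se_b=1.0)
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
print("Crosses zero:", crosses_zero(lower, upper))
```

Under these assumptions the interval runs from a little below −4.7 to a little above 0.7, so it includes zero. That is what “not statistically significant” means here: a true difference of zero cannot be ruled out from this scale alone.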
So recall now that Carhart-Harris said that the choice of QIDS as the main depression scale for the study was arbitrary and that their team of researchers predicted no difference in effect size between psilocybin and escitalopram when they submitted the pre-req application to run the study. For more than a week before the study was released, Carhart-Harris did a daily thread on Twitter describing effect size, explaining how different measurements may in fact be measuring the same issue and could be condensed, and noting that the NEJM analysis of the results is extremely conservative. Most of all, he “implored” readers to view the supplementary tables and appendices, and to particularly look at the confidence intervals for the main inventory and then the confidence intervals for the secondary outcomes.
Carhart-Harris made a very careful note that confidence intervals that do not cross zero are considered statistically significant and those that do cross zero are considered insignificant. He directed us to look at Figure S1 and Table S4 where you will see at the top that the only inventory that crosses zero is the QIDS scale. This strongly implies its result is a false negative in showing no difference in outcome between the SSRI and psilocybin, and we can be confident of that because of the redundancy of the other evaluations they also used. Every other inventory and measure shows psilocybin far outpacing escitalopram by nearly a two-to-one margin. You can take a look yourself by accessing the study’s Supplementary Appendix, and turning to Section S6. Supplemental Figure S4: Mean change for primary and secondary outcomes with confidence intervals (pg. 16).
Conclusion
Between the extraordinary results in the secondary outcomes, the fact that the QIDS scale was the only inventory to cross zero in the forest plot, and the strong likelihood that modern depression scales aren’t designed to capture the full range of positive personality changes that underpin psilocybin’s cortical mechanisms, it’s hard to see how this is not an overwhelming win for psilocybin.
It would certainly be remiss of me not to state once again that I was a participant in a very similar study myself, that I experienced full remission, and that I know others who experienced the same. I would be equally remiss not to mention that for many who took the two doses, their depression returned after a few months, though not for all of them. However, this is already the case with standard daily antidepressants. And with psilocybin, there are no sexual side effects, you can actually feel a full range of emotions, and the frequency of dosing is far less. But for people who have either found themselves unresponsive to standard SSRIs, or experience untenable daily side effects from antidepressant medication, psilocybin appears to offer an equal, if not superior, opportunity to recover their happiness and effectiveness in their daily lives.
About the Author
Court Wing has been a professional in the performance and rehab space for the last 30 years. Coming from a performing and martial arts background, Court served as a live-in apprentice to the US Chief Instructor for Ki-Aikido for five years, going on to win the gold medal for the International Competitors Division in Japan in 2000 and achieving the rank of 3rd degree black belt. In 2004, Court became the co-founder of New York’s largest and oldest CrossFit gym, and has been featured in the New York Times, Sunday Routine, Men’s Fitness, and USA Today. He is also a certified Z-Health Master Trainer, using the latest interventions in applied neurophysiology for remarkable improvements in pain, performance, and rehabilitation. You can find out more on his website: https://courtwing.com