
Pseudoscience

Who owns BPSM?

That's an excellent question.

Diane Jacobs, talking about dermoneuromodulation (DNM)--a practice that she has developed, and that we'll talk more about here later--answered that intellectual property question first, and better than I could have off the top of my head.

When asked:

What's a good name for working top down and bottom up?

 

she answered:

Dermoneuromodulation. :)

It covers the manual territory from skin cell to self of self and leaves out the mesoderm entirely. It is not a copyright term.

Anyone can use it, to describe what they do, manually, if they want. This made-up word is not copyright. I give it away. Please take it. Use it to get away from words like "fascia" and "muscles" and "joints" and "bones" and "ligaments" and "tendons".

 

In the same spirit that Diane has shown with DNM, I claim no copyright or ownership over the term "biopsychosocial massage (BPSM)", and I place no restrictions on anyone's use of it.

I give it away to the community to use freely, in the same spirit of open access and Creative Commons licensing that POEM is founded on.

There is only one condition of usage--you cannot apply the term to something it is not, any more than someone can make a dog into a cat, just by calling it one.

Source: Left, http://upload.wikimedia.org/wikipedia/commons/8/8c/Poligraf_Poligrafovich.JPG; Right, http://upload.wikimedia.org/wikipedia/commons/9/97/Feral_cat_Virginia_crop.jpg accessed 18 November 2012

 

In a similar way, you can't make non-BPSM practices into BPSM simply by slapping that label on them.

Diane explains that, although she gives the term away freely,

It should contain only nervous system considerations though, because really, when push comes to shove, only the nervous system can respond (short term, OR, and ESPECIALLY, long term) to what we "do" to another person, manually. Of that I'm convinced.

 

Similarly, if you're not practicing biopsychosocial massage, the term does not apply to what you actually are doing.

You have every right under principles of freedom of conscience to reject classical Newtonian physics, for example, and to say that it does not apply to the work that you are doing. But that claim is inconsistent with the principles of BPSM, and so that inconsistency means, beyond the shadow of a doubt, that your practice is not a biopsychosocial massage practice. Which is fine in itself; you are entitled to practice any way you want to, subject to professional ethics and to regulations in your jurisdiction. All it really means is that you don't get to label it something that it is not--no more, no less.

There is a Cambodian saying that men are like diamonds and women are like silk--if you drop them in the mud, you can wash the diamond and it's as clean as it ever was, but the silk is stained forever.

«A man is pure gold: even if he falls into the mud and is lifted out again, he is still the same pure gold. But a woman is like a piece of white cloth: once it falls deep into the mud, no matter how hard you beat, rinse, and scrub it, it will never be as it was.» (courtesy of Frank Smith)

 

Source: Left, http://upload.wikimedia.org/wikipedia/commons/8/8f/Apollo_synthetic_diamond.jpg; Right, "Weathered Memories/2008" by Joan H. Calloway ("wishes, true and kind") http://3.bp.blogspot.com/_Q8uC-dZACLA/TJ7nFt-t2cI/AAAAAAAACaY/eDRBb_GeD38/s400/DSCN0956.JPG accessed 18 November 2012

 

Let's put aside for the moment the blatant sexism in that proverb ("dropping them in the mud" is a metaphor for their being sexually active, and this is the classic embodiment of the double standard against women in so many traditional societies), and see if there is any useful imagery there for us to communicate a distinction in a totally different domain, without being insulting to more than half of the population.

The term "biopsychosocial massage" refers to massage practiced in an evidence-based, science-based, client-centered way, that understands health, wellness, and disease in terms of natural (not supernatural) processes in the material physical universe among biological, psychological, and sociocultural aspects of life, as well as their interactions and the emergent effects that arise from them.

Anyone who practices massage in this way is practicing BPSM.

If that term is consistently applied to only those practices, then it is a clean and brilliant diamond that clients and other massage stakeholders can use as a baseline to understand exactly what BPSM has to offer.

If the term is (figuratively) dropped in the mud by applying it to anything and everything, no matter whether or not it is consistent with the principles of BPSM, then--like the silk--it is stained forever, and it becomes useless for clients and other massage stakeholders to use as a guide to understand what BPSM has to offer.

So I give the terms "biopsychosocial massage" and "BPSM" to the community to use freely, on the one condition that they not be diluted by applying them as mere buzzwords to massage or other practices that are not massage practiced in an evidence-based, science-based, client-centered way, that understands health, wellness, and disease in terms of natural (not supernatural) processes in the material physical universe among biological, psychological, and sociocultural aspects of life, as well as their interactions and the emergent effects that arise from them.

(Not yet clear on what that means in actual practice? That's ok; there's a great deal of rich material there to explore in depth. We're going to spend some quality time connecting the dots, and translating them into what they mean for actual practice. I just want to get that general principle out there; now that it is, we can do some real work on establishing what it means in practice.)

So the answer to the question in the post title, "Who owns BPSM?" is: It is entrusted to the responsible and sustainable stewardship of the massage community.

 

cheers, to Diane Jacobs!

 


UPDATE, 18 November 2012, 10:57 AM PT:

Gayla Coughlin points out that some of my statements above, as written, are unclear in what they mean for actual practice, and might result in outcomes that I don't want.

I thank her for giving me the opportunity to correct my inaccuracies, and to get closer to my intended outcome.

I am therefore placing biopsychosocial massage (BPSM) under a Creative Commons license, and here are the conditions attached to that license.

The particular form of the Creative Commons license that most suits my intent for this work is Attribution-ShareAlike CC BY-SA.

Their blurb explains:

This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This license is often compared to “copyleft” free and open source software licenses. All new works based on yours will carry the same license, so any derivatives will also allow commercial use. This is the license used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.

--"About the Licenses", http://creativecommons.org/licenses/ accessed 18 November 2012

 

What this means is that you can build on, develop, and grow biopsychosocial massage, but only on the condition that you share your work with the community in the same way ("license their new creations under the identical terms")--you cannot take the work that I and others have done on biopsychosocial massage, and trademark or copyright it for yourself. This license thus protects biopsychosocial massage for use by the entire community, rather than having someone seize it away from us in a proprietary way.

The Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) page explains it in this way:

You are free:

  • to Remix — to adapt the work
  • to make commercial use of the work

This means it is approved for Free Cultural Works

Under the following conditions:

  • Attribution You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

  • Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

With the understanding that:

  • Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
  • Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
  • Other Rights — In no way are any of the following rights affected by the license:
    • Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
    • The author's moral rights;
    • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
  • Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

--Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) page accessed 18 November 2012

 

If my statements above sounded like I objected to commercial use on anyone's part, then that was due to my inaccuracy--I have no objection to anyone earning a living by teaching classes, writing books, or anything like that, as long as you honor the moral rights that attach to my Creative Commons licensing of biopsychosocial massage. And by "mere buzzwords", I was not objecting to using the term to market your works based on biopsychosocial massage. I specifically meant slapping the label on practices where it does not apply, in order to market something that is incompatible at its core with biopsychosocial massage.

By "moral rights", I specifically mean that I do not want anyone to use the label "biopsychosocial massage" to endorse practices that are anti-scientific or pseudoscientific, or that are not client-centered. Those violate the spirit of biopsychosocial massage, and are an infringement of my moral right to delineate a set of massage practices and theory that are consistent and compatible with modern science and with evidence in the material physical world.

If you respect that moral right, then you are free to build on and develop biopsychosocial massage for non-commercial or commercial uses, but you cannot take it away from the community by trademarking or copyrighting it for yourself.

So I believe that the conditions of this license protect my intent to release it to the responsible and sustainable stewardship of the community, at the same time that they protect the content from being distorted by misuse of the label to apply to something that contradicts the heart of biopsychosocial massage.

 

cheers, to Gayla Coughlin!

 

Creative Commons License
Biopsychosocial massage (BPSM) by Ravensara S. Travillian is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at http://poem-massage.org/content/biopsychosocial-massage-bpsm-new-lineage.

Biopsychosocial massage (BPSM): A new lineage

There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

--Charles Darwin, Origin of Species, close of first edition, 1859

 

Source: Left: http://upload.wikimedia.org/wikipedia/commons/1/18/Charles_Darwin_by_G._Richmond.png; Right: http://upload.wikimedia.org/wikipedia/commons/a/a0/George_Richmond_-_Emma_Darwin_-_1840.jpg accessed 17 November 2012

 

Charles Darwin, whose biological observations led to the development of evolutionary theory, and his wife Emma loved each other very much.

Their many letters to each other over the years (preserved online in the Darwin Correspondence Project) stand as a testament to how much they thought, cared, and worried about each other.

In one letter, written around February 1839, Emma expresses her hope as a faithful believer, but at the same time admits her doubts about that hope:

The state of mind that I wish to preserve with respect to you, is to feel that while you are acting conscientiously & sincerely wishing, & trying to learn the truth, you cannot be wrong; but there are some reasons that force themselves upon me & prevent my being always able to give myself this comfort.

--Darwin Correspondence Project, Darwin, Emma to Darwin, Charles [c. Feb 1839] accessed 17 November 2012

 

She wants to feel secure that, if she (or he, or anyone) is really trying diligently and sincerely to learn what is true, that effort guarantees that they cannot possibly be mistaken about what they are learning. She was so concerned about this because she was devoutly religious, and she knew that Charles had doubts about religion.

The idea that sheer effort and sincerity can make it impossible to be wrong is a lovely wish, and yet, in the same sentence, she admits to her beloved husband that even she herself cannot always keep up that belief.

She was right to be concerned about that issue--the history of science at that time in England contains many examples of geologists, paleontologists, biologists, and other scientists who set out on a journey to find evidence in the material physical natural world that proved the stories in the Bible to be literally true.

For example, if the story of Noah's Ark and the Flood were literally true, you would find evidence of it in the layers of rock in that part of the world. The scientists who set out to find it discovered that that evidence is not there, but other evidence, showing that other things happened, is indeed there.

The scientists who set out to demonstrate that the earth is literally only a bit more than 6000 years old found instead that they would have to reject multiple independent sources of repeatable, verifiable evidence showing the earth to be much older than that.

Darwin himself demonstrated that, contrary to the Genesis account of species being created once in their present, unchanging form, species actually change over time, becoming better adapted to the environments they find themselves in.

When the evidence these scientists found contradicted what they wanted it to say about the literal truth of the Bible, they faced a test of their own moral character in deciding what to do next about that fact:

  1. They could ignore the evidence, pretend the discovery never happened, and never face the meanings of the contradictions between the evidence and what they believed, or

    Source: http://thinkingmomsrevolution.com/wp-content/uploads/2012/06/fingers-in-ears.jpg accessed 17 November 2012
     
  2. They could double-down on their belief, holding on even tighter to it while rejecting the reality of the material physical evidence, or

    Source: http://www.examiner.com/images/blog/wysiwyg/image/bad_poker.jpg accessed 17 November 2012
     
  3. They could accept the reality of the material physical evidence, revising their beliefs as needed to resolve the contradictions between the beliefs and the evidence.

    Source: http://2.bp.blogspot.com/-LSEZYAmp3P0/UEKl9Td19sI/AAAAAAAACKE/m0nhbygv1nU/s1600/alone.jpg accessed 17 November 2012

 

Some of the most solid scientific knowledge that we rely on every day came from people who had the courage to face the implications for their beliefs that the evidence presented them, and the integrity to not turn away from or deny the contradictions, but rather to engage with them.

To take a more contemporary example of that same spirit, this quotation from Julie Onofrio is, for me, the essence of the courageous engagement that we so urgently need to participate in if we really want to become a profession:

Having an open forum and getting some help in analyzing research is really needed in our profession. Yes, I have to say it disturbs me when the researchers say things like traditional modalities don't work--it's like a slap in the face to all who are doing energy work, or reiki, or Rolfing, and having results and success. It's very hard not to take it personally, but also to set emotions aside and remain in communication. But that is why I support it. I want to learn more and to support the profession in understanding research.

 

This willingness to remain engaged, even when it's difficult because it contradicts what we've been taught, is nothing short of admirable. Julie is showing the courage to face the difficult dilemmas that evidence presents about how massage actually works; by actively engaging with that process, she is going the extra mile.

Like Emma and Charles Darwin, most MTs are good, decent, caring, and loving people, who want to understand the truth.

If just wanting it sincerely, and working hard at it, were enough by themselves to avoid error, most of us would be there already.

Sadly, in this material physical universe, those good intentions are not sufficient to help us to be correct.

 

 


The National Certification Board for Therapeutic Massage and Bodywork (NCBTMB) is an independent non-profit organization that offers national certification in massage and bodywork.

This national certification functions as a path to initial licensure (sometimes the only path) for MTs in some states.

The Board has undertaken a major revamp of policies and procedures, one which is causing a great deal of disruption among nationally certified MTs and continuing education providers.

Its CEO, Mike Williams, states that the purpose and effects of this change are

streamlined online processes, enhanced communications, and improved programs that elevate the profession and better serve the public.

--NCBTMB front page accessed 17 November 2012

 

Some of those changes may well have that effect--I am not personally nationally certified, and I have not yet examined the changes in depth as other MTs and bloggers such as Laura Allen have.

But in the FAQ about the new procedures for approving continuing education providers, there is--for me--an absolute deal-killer.

 

Q: Will NCBTMB continue to accept alternative courses like energy work, aromatherapy, animal massage, etc?

A: Yes. Massage therapy is part of the holistic profession as are several other modalities and techniques. NCB will continue to accept modalities and techniques that can be legally practiced by a massage therapist without another healthcare provider, (i.e., DC, MD, PT) present. As long as the technique or modality can be shown to be embedded in the lineage of massage, it will be accepted. This means that if the core information of the technique or modality can be referenced as a derivative of another technique or modality that is within the massage therapy scope of practice it will be accepted.

--NCBTMB Approved Providers FAQ accessed 17 November 2012

 

 

The argument over the relationship between massage and "energy work" is nothing new.

In the early 1990s, when I was in massage school, the NCBTMB was developing the first national certification exam--the National Certification Examination for Therapeutic Massage and Bodywork (NCETMB). Eventually, as a result of consumer pressure, they were forced to offer an energy-free alternative, the National Certification Examination for Therapeutic Massage (NCETM), for those MTs who did not want to be coerced into an anti-evidential belief system as the price of their professional training and licensure.

Although the argument is nothing new, there was a fresh opportunity to do something innovative here among the other disruptive changes--but NCBTMB did not take that opportunity.

Instead, they opted to permit teaching any information (which includes misinformation and malinformation) as approved continuing education, as long as it can be shown to be "embedded in the lineage of massage". Considering the long history of "massage myths", documented by Laura Allen (here and here), Lee Kalpin, Paul Ingraham, and many others, it is clear that just because an idea has been embedded in massage, even for a very long time, does not mean the idea is correct.

NCBTMB had an opportunity to stand up for the principle that, in the therapeutic encounter, a professional should provide only validated, warranted (justified or justifiable), high-quality information to the client.

They did not take the opportunity to stand up for that principle, and as a result of that decision, I cannot participate in their new process. I will not go on to apply for national certification as a practitioner, nor will I become an approved continuing education provider under those standards.

I regret those facts, as I consider them massive missed opportunities. But I cannot do it, because our first principles on these matters are so far apart as to be irreconcilable.

Don't misunderstand me here--I am positive that the NCBTMB members are well-intentioned, and that they wanted to do the right thing. I genuinely believe that they were attempting to have the best of both worlds for the benefit of all massage stakeholders, and to not hurt anyone's feelings.

I respect them as the kind, caring, motivated, passionate people that they clearly are.

If that, by itself, were enough to be right, as Emma Darwin wished, we would not have to have this very serious and difficult discussion.

But evidence doesn't work like that--you can't pick and choose which evidence you accept, and which you reject. Either you accept all the evidence, and you go courageously wherever those implications take you, or you just don't accept the evidence.

If they are going to accept massage's traditional explanation of "energy work"--no matter how many times that explanation has been shown by the evidence to be mythical--as validated, approved continuing education with their official imprimatur, then they are not preparing the MTs who are taught that explanation to engage with modern translational science. Holding on to old ideas even after they have been disproven is an active obstacle to understanding new developments in that science.

The environment of massage is exhibiting selection pressures toward a type of massage that is integrated with validated high-quality information, and that prepares MTs for understanding advances in neuroscience, cognitive science, endocrinology, and pain science, and translating that understanding into clinical practices that are client-centered and effective.

As a direct response to those pressures, biopsychosocial massage is breaking off from the main lineage of massage to provide a new massage lineage that is fully consistent with those principles.

Source: Darwin's first documented sketch of an evolutionary tree, around 1837, from his notebooks http://www.sciencebuzz.org/sites/default/files/images/myers_darwin_tree.png accessed 17 November 2012

 

 


You can consider this the official birth announcement of a new lineage of massage.

Biopsychosocial massage (BPSM) is massage understood and practiced in a biopsychosocial model. It understands massage, health, wellness, and illness, and the knowledge bases underpinning those concepts, in an evidence-based, natural (meaning, not supernatural), organic way that draws on what we know about biology and other natural sciences, psychology, sociocultural aspects of being human, and the emergent effects that arise from interactions among these various factors.

Psychosocial and cognitive approaches don't require that you become a clinical psychologist but that you have a broad concept of the influence of those factors and that you account for them in your encounters with your patients. Know the literature and be able to give management advice based on evidence. When people come to see you they want a plan. Have a plan that is defensible and that works toward their goals. Address concerns, fear avoidance, other stress, and unhelpful beliefs with compassion, understanding, empathy, and informed knowledge.

Understanding why people hurt is part of our professional responsibility and should change most everything we do on a daily basis away from traditional methods and towards methods defensible with modern science.

--Jason Silvernail, accessed 5 August 2011

 

An example of a biological factor in health could be increased cortisol in the bloodstream in response to chronic stress. The interaction of that biological factor with the increased daily stress in modern society would be an example of interactions among biological factors and sociocultural factors.

An example of a psychological factor in health could be a man who is less likely to seek professional treatment for pain than a woman is, because of his perception that stoically enduring pain is what men do in the society he grew up and lives in. The increased structural damage that can occur as a result of ignoring symptoms and delaying treatment is an example of the interactions among psychological factors and biological factors.

An example of a social factor in health could be the relative stigmatization of mental or behavioral illness, as compared to how more clearly structural conditions are regarded. This stigmatization can drive psychological conditions underground--say, for example, if someone did not get needed psychological treatment because they didn't want it to show up in their medical record. That would be an example of interactions among sociocultural factors and psychological factors.

Biopsychosocial massage is client-centered. That means that the psychological and social factors in the client's unique experience, as well as the universal biological factors we are all subject to, are the center of where we focus our attention and caring. It doesn't mean that we accept that everything in someone else's experience is literally true. It does mean that we recognize that, for them, it feels true, and for that reason alone, it shapes where we meet the client in the therapeutic encounter.

Biopsychosocial massage welcomes self-expression and the art of massage. It is clear, however, that sometimes our need for self-expression can come into conflict with clients' immediate healthcare needs, and--when that happens--we recognize that, in order to act as healthcare professionals, our ethical fiduciary duty is to put the clients' needs first, ahead of ours if necessary.

Biopsychosocial massage is holistic, integrative, and evidence-based. That means that it does not draw upon supernatural explanations of mechanisms, and it builds upon foundational knowledge in the sciences to evaluate and validate the evidence for or against particular claims of effectiveness or mechanisms.

That means that we understand and practice it in a holistic, complementary, and integrative way, integrated with other domains of human knowledge and with the natural universe we find ourselves in, rather than siloed off in an alternative universe that denies material physical reality and isolates us from members of the client-centered biomedical healthcare team.

If a proposed explanation for an effect requires us, for example, to reject physics, as the explanation of "energy work" embedded in massage tradition does, then we face that contradiction head on, and we work to resolve it. If that means updating old beliefs in the light of new evidence, then that is the consequence of practicing biopsychosocial massage.

Michael Hamm is another contemporary example of courageous engagement, facing the evidence head-on and seeking to better understand. I'm paraphrasing his quote here, and I trust that he'll correct me if I've gotten it wrong. If I can find the original quote, I'll replace the paraphrase, but it was something to this effect:

I understand and accept that the traditional anatomical explanation behind craniosacral therapy doesn't hold up in light of the evidence. At the same time, I can't deny that I feel something when I am doing that work, something that I can't explain. I want to better understand what is going on when I do that work.

 

In the absence of clear evidence of exactly what is going on, this suspension of a previous belief that has been disproven (and not yet replaced) is totally in line with the principles of BPSM. We don't have to always know everything; we just have to know what we do know, what we don't know, and how strong the evidence is behind our knowledge.

Since our encounters with clients will always run ahead of the available high-quality evidence, we don't limit ourselves only to what has been rigorously validated by studies and nothing else. We take our professional experience into account, and we actively seek to understand and incorporate the clients' preferences, whenever possible, in treatment. But in all these cases, in developing our approach to caring for the client, we remain clear on what is evidence, what is speculation, what is science, what is art, what is literal, and what is metaphor.

Understanding the material physical universe around us, and the centuries of cumulative human knowledge about that universe, gives us powerful tools to draw upon. That understanding, combined with the caring that characterizes so many people who choose to go into massage as a career, is the heart of biopsychosocial massage.

Neil deGrasse Tyson sums it up almost perfectly:

I am driven by two main philosophies, know more today about the world than I knew yesterday. And lessen the suffering of others. You'd be surprised how far that gets you.

--Neil deGrasse Tyson

 

That quotation demonstrates the core of massage in a biopsychosocial model.


Source: http://healthskills.files.wordpress.com/2008/10/biopsychosocial.jpg accessed 7 August 2012

 

Over time, here at POEM, we will be following that evidence where it leads, and courageously engaging with the meanings that it shows for the practice of massage therapy. I expect intense, passionate, and fruitful discussions here over the next few years.

 


UPDATE, 18 November 2012, 11:01 AM PT:

Creative Commons License
Biopsychosocial massage (BPSM) by Ravensara S. Travillian is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at http://poem-massage.org/content/biopsychosocial-massage-bpsm-new-lineage.

Looking into the abyss (#26/31)

It's not easy to face the realization of having been misled.

And the misleading does not have to be intentional; it could have been done with the best intentions in the world.

But those good intentions don't change the facts that, as a result, the student is launched into real-life practice operating with poor information, is bringing misinformation into the relationship with the client, and is being publicly evaluated on the basis of that misinformation by other potential partners in a unified healthcare team.

It could have happened to any of us--the field of massage is notorious for promoting teachers out of the ranks of students who have simply passed the class they're now expected to teach. Biomedical physicians have nothing on massage when it comes to "See one, do one, teach one".

No blame, no shame: no one set out to create that situation; it just evolved that way, undirected. And there was an unspoken social contract that allowed it to continue, because the need for teachers was so high.

But the social contract has changed out from under us, and the current situation is no longer sustainable in light of the responsibilities expected of healthcare professionals.

Ralph Stephens names the problem as the very first one in his list of the educational "seven deadly sins":

Standardizing the number of hours or the curriculum content (ELAP) will not improve educational outcomes as long as our massage educational institutions are allowed to:

  1. Employ unqualified instructors.

...

Two things are needed to "heal" the problem, money and moral conviction. ABMP, AMTA and FSMTB must be persuaded to give substantial and ongoing financial support to COMTA and AFMTE to assist them with their respective missions. COMTA because we need a strong accrediting agency dedicated to the field of massage therapy. That is the natural place for educational standards to live. AFMTE because their Teacher Education Standards Project (TESP) is the trail that the entire education sector must follow if we are to truly "elevate" the profession from the sad state in which it currently exists.

These organizations also need to take a public stand - an unequivocal position - that the operational practices listed in the "Seven Deadly Sins" are no longer acceptable in the massage therapy field; that we expect better from our schools and programs. They may not have the force of law, but such moral courage on the part of community leaders, consistently stated, can and will instigate a change in institutional behavior.

 

Stephens is right about going forward--but what about all the students, practitioners, and teachers who are coming to grips with the fact that much of what they were taught is exaggerated, counterfactual, or simply wrong?

It takes a great deal of courage on their part to stare unflinching into that abyss, and to engage with what's needed to collect, assimilate, organize, and share good information.

The upheaval and disruption of this process are causing a great deal of moral distress and pain in people who are re-evaluating where they are, and how far they are from where they need to be.

One thing that they do not have to worry about here is being blamed for having been taught wrong.

The policy here is, "no blame, no shame": it is not someone's fault that they did not get the education they deserve, and if they are trying to fix that situation, they deserve--and will have--our support in that journey.

The Buddhist concept of samma-vaca--"right speech"--is a useful guide to discourse here at POEM.

It's often summarized as: "Is it true? Is it kind? Is it necessary?"

We'll examine those questions in a slightly different order than they're usually posed.

"Is it true?": The standard at POEM is that we will not pass along misinformation here.

Massage stakeholders can depend on POEM for accurate information about massage.

If someone is making a factual error, it's OK to correct that error civilly and professionally. That means focusing on the facts, not on the person: no personal attacks, just connecting the dots on what the facts are.

Not everything is a matter of fact, of course--there is no scientific answer to the question "Is chocolate or vanilla better?"--and interpretations, creativity, and imagination are welcome topics for discussion, as long as active misinformation doesn't ride along.

"Is it necessary?": There is a wide consensus that something is rotten in the state of massage education. So yes: a portal to the shared body of biomedical knowledge, which all members of a unified, client-centered healthcare team draw on to varying degrees, is absolutely necessary. That niche is as yet unmet, and POEM is being developed to fill it.

"Is it kind?": Absolutely: everyone who participates here can expect to be treated kindly. Kindness does not mean letting misinformation go uncorrected; it means that misinformation will be corrected in a civil, professional, and kind manner, without attacking the person.

When someone does not have access to good and high-quality information, because of gaps in their education, the kind and considerate thing to do is to offer them a bridge to obtain that information.

Giving them an opportunity to correct themselves is far kinder than leaving them--and their clients--to the consequences of misinformation.

We're in really deep waters here, as a result of a number of historical, social, political, and cultural factors all coming together and synergizing.

But if POEM has any say at all in the matter, then we will get through these difficulties, because we'll support each other in learning and growing along the way.

 

Source: http://www.education.noaa.gov/images/article_ocean_floor_2.jpg accessed 26 August 2012

Silence is not always consent (#25/31)

Many times, on the Internet, people assume that if someone states something, and no one contradicts that statement, then everyone agrees with what is said.

Sometimes that's true--and sometimes, the lack of contradiction results from a realistic assessment that there is no point in discussing the matter further.

Honest discussion only works when all parties approach the discussion in good faith, and are willing to honestly re-assess their positions to see if there is somewhere that they could be mistaken. If such a mistake is found, people need to be willing to correct that mistake.

If someone is not willing to engage in honest discussion, there is no shame in deciding that it's a waste of your valuable time to engage in less-than-honest discussion, and to simply walk away. After all, that time you'd burn up on "Is so!" "Is not!" "Is so!" "Is not!" is time you could spend:

  • Working with a client on resolving pain, anxiety, or other symptoms;
  • Enjoying time with your loved ones that will later be the stuff of which fond memories are made;
  • Reading a fun or awesome or life-changing book;
  • Watching a movie you've always wanted to get around to;
  • Making music that has never existed before and never will again, but is absolutely transformative in the moment, or
  • Any number of wonderful other activities--or restful non-activity--just waiting for you.

 

How do you know whether someone's interested in engaging in honest discussion?

You don't, always, but there are some red flags to warn you that they aren't.

Someone who wants to engage in honest discussion will connect the dots in their position for the people they're speaking to.

When you ask an honest question and then someone won't take the time and effort to connect the dots in their argument for you--when they say they "don't have the time to debate the research", or they point you to books by their favorite gurus and say "it's all there, just read it for yourself"--that's a big neon sign that their mind is already made up, and no amount of evidence will influence what they've decided to believe.

Not always, of course--some people eventually give up their adamant resistance, and actually examine the evidence for themselves.

You can't always tell who's going to do that, and who's not.

And sometimes, there is value in speaking out, even if there is no hope of honest discussion.

You may just want to go on record as someone who doesn't believe that statement--nothing more, nothing less.

You may recognize that there are many others reading without commenting, and you may want to point to the evidence for their benefit, rather than for the person who refuses to discuss it. You never know, and can never know, the effects of the seeds you're sowing--but you are having an effect, whether you see it or not.

You're the best judge of your situation, and you're the one to decide whether any given situation makes sense for you to engage in it or not.

But there is no shame in looking at the situation, deciding that it's hopeless, and resolving that the absolute best use of your time is to walk away from it, and spend your time and energy elsewhere. There are many other places on the Internet where learning and honest discussion are truly valued; there are lots of people there who want to hear what you think, based on the evidence, and to discuss with you what it all means.

Refusing to waste your time engaging in bad-faith arguments does not mean you agree to incorrect claims someone else is making--silence does not mean consent.

Finding your space: Anatomical reasoning and our relationship to realism

There are at least three ways, maybe many more, to approach the practice of massage: as a healthcare profession, as self-expression, and as a business.

Of course, no one approaches it exclusively one way or another--even healthcare professionals, mystics, and artists have to make a living, professional ethics in business are a thriving area of exploration, and the feeling of self-actualization can be the key to a long and fulfilling career no matter what other aspects of massage you pursue.

These aren't self-contained monocultural boxes you find yourself in, so much as they are tendencies, one way or another. The interactions among those tendencies, and the choices you prioritize, will influence where you find yourself in the space of massage practice.

In this illustration, practitioners A, B, and C all find themselves in different areas of massage practice space, because of the different blends of healthcare professional, self-expression, and business orientations they bring to their practice.

 

Meaning, too, has multiple aspects, including:

  • the ideas we have about the universe around us, and the feelings and reactions those ideas draw out of us;
  • the words, or terms, that we use to talk about those ideas; and
  • the material physical things in the universe that those ideas and words refer to.

 

Since all of these aspects interact with and influence each other, we can model them as a triangle, with the three connected corners representing concepts/ideas, words/terms, and material physical referents.

 

Looking at the relationships among components of the Semantic Triangle, it is easy to see how referents can influence concepts: for example, Wilma--a sun bear at Woodland Park Zoo in Seattle, who no one suspected was positively riddled with tumors, but who held on just long enough to wean her twin cubs onto solid food before suddenly dying from the cancer--is a real-life referent whose fortitude while suffering reinforces the concept of "bear as good mother".

Sometimes the referent’s behavior, in addition to influencing concepts associated with a term, can actually influence the chosen or constructed term itself: the Russian for bear, медведь (pronounced "myed-vyed"), comes from the linguistic roots for "honey-eater" (our word "mead", for honey wine, comes from the same root as "мед", honey).

And, like in the English term "bruin" ("the brown one"), it's also an example of intentional misdirection, and an indication of the beliefs behind it--bears can be scary, especially way back in history at the time when we were first deciding on words to describe the world around us.

To the people who came up with these terms, it may well have seemed safer to use taboo avoidance, just to be sure. Taboo avoidance means, in this case, a kind of magical thinking where it seems more prudent to refer to bears by euphemistic terms like "honey-eater (Russian)", "honey-paw (mesikämmen: Finnish)", or "the brown one (English)", rather than to get this scary animal's attention by outright saying "bear" in one of those languages, and running the risk of summoning angry supernatural bears down upon the speaker.

It’s not immediately obvious how influence flows the other way (that is, how concepts and terms can affect real-world referents), but a little thought provides some examples. If someone thinks of bears as dangerous predators, they may lobby for laws allowing bear hunts, with real consequences for the referent bears themselves. Conversely, assigning the term "endangered species" puts bears under particular legal protections, which could prevent their being hunted, saving the lives of actual bears.

So words, concepts, and real material physical referents all influence each other in the meaning we make of this universe around us.

And that meaning that we make, and decisions based on that meaning, influence where in massage practice space we find ourselves.

 

 


Although we often think of anatomy as strictly scientific, that's not always how people use anatomical terms and concepts. Gil Hedley writes:

The superficial fascia is an organ: it is an organ of metabolism, an endocrine organ producing some 30 hormones and counting; a great lymphoid organ; a sensory organ; a sensual organ; an electrical insulator; a thermal insulator; a movement sleeve; and a great antennae... what else? Tell me!!

 

And people did tell him. Responses included:

"All is fascia."

Microtubulars of liquid light .....-:-

Information super-highway......pure communication.

It's a Gigantic "Soft Drive", information collection unit...completely unique to each host...only to exist for One Lifetime.

Non specific immune function, and groovy to to work:-)

And a information webcam

The "copper wire - like" conduction system for sub atomic vibration of photons and electrons in cell communication.

 

Hedley continues to engage in the comments, but he does not correct any of the factual errors that either he or his commenters make.

What, exactly, is this process? It's not anatomical science--most of the discussion is, at best, highly metaphorical and allegorical, and at worst, factually wrong.

Clearly, it's meeting a huge need among his commenters, though:

Yes!

Your fascia discoveries are inspiring :)

Thanks for continuing to inspire the bodywork field. Blessings!

We are amazing!

 

If it is not science that Hedley is carrying out, then what is he doing?

Given the unmotivated function of self-expression evident in the original post, and the motivated functions (validation, among others) reflected in the commenters' responses, I think it would be fair to say that Hedley is carrying out performance art, religious expression, or both, using terms and concepts from anatomy for those purposes.

I don't think he would object to this taxonomic classification, based on what he's said about his philosophy:

Science to me is another religion among many, whose dogmas I am attempting to shed.

 

He isn't particularly concerned about doing science for the sake of knowledge.

That's perfectly fine, as long as we're all clear about what the process is about. If it's validation, or self-expression, or performance art that you're looking for, that's exactly what you're getting, and there's nothing wrong with that. Consenting adults, caveat emptor, and all that.

If it's anatomical science you want, on the other hand, not only is this not what you're looking for, but taking it at face value will get in the way of your actual understanding of the structure and function of the body.

This is where he makes a genuinely misleading statement:

I can do a much better job ripping into my own stuff than that particular critic [Paul Ingraham], and recently did so in front of 600 colleagues at the fascia congress in Vancouver, and will gladly do so again to move the knowledge base forward!

 

Propagating ideas such as that the superficial fascia is an endocrine organ, or that cells communicate with each other by means of photons and electrons, without correcting those factual errors, doesn't move the knowledge base forward at all. Instead, it sends a loud message to potential colleagues in healthcare professions that we aren't interested in, or are even actively hostile to, knowledge and reality.

This matters very much on an individual level, and on a professional level as well. One of the biggest obstacles to MTs becoming part of an integrated healthcare team is our inability to distinguish pseudoscience from science, and metaphor from literal truth.

If we remedy those problems, we can share in the common knowledge base of healthcare professions, and we can participate in sending a unified message to the client/patient.

If we don't, then we can't.

It's a decision we all need to make at the individual level, and those individual decisions will determine the fate of MT as an integrative healthcare profession, a siloed alternative medicine industry, or something else altogether.

 

 


What would an examination of these questions look like from the viewpoint of anatomical science?

The first, and most important, distinction between science and other human activities is that, rather than just operating in the realms of words and concepts, science has to do the work of connecting claims back to actual referents in the material physical universe.

So, for our claims, we will do that work as we go along.

A commenter on a different forum asked:

Why is Gil's comment so far fetched?

 

She's quite correct--I have made the claim that Hedley's work is performance art or some other form of self-expression, rather than anatomical science, and now it's my job to connect the dots and show why my claim is correct.

For the sake of time, let's just examine one part of the statement; it's representative of the same problems in the rest of it.

"The superficial fascia is an organ: it is an organ of metabolism, an endocrine organ producing some 30 hormones and counting"

 

What is he referring to? He clarifies that later on in the comments to his post:

"Adipocytes are generally classed as connective tissue cells with endocrinal function."

 

Fair enough--he gets the details right the second time. But he doesn't go back and correct his first statement to make it right.

By saying "superficial fascia...is an endocrine organ" there, he is confusing:

  1. structure (connective tissue versus glandular epithelial tissue) with function (protein secretion), as well as structure (an endocrine organ, i.e., a gland) with function (secretion), and
  2. identity with parts/wholes: equating all of the superficial fascia (adipose tissue + loose areolar connective tissue) with only that part of it that actually secretes proteins (adipose tissue).

 

So he's at the wrong level of abstraction when he says superficial fascia--he means adipose tissue. Sounds like a picky little detail, doesn't it? And yet, it's a symptom of a lack of true understanding about anatomy.

This lack of true understanding about anatomy is a mistake that propagates among the MT community like wildfire--the very first thing you learn on the very first day of the very first anatomy class is the four kinds of tissue, right?

Epithelial, connective, nervous, and muscle tissue, right?

And yet, all over the web, you see people selling the concept that "body tissue can carry emotional memory", and MTs buying it, as though they had never heard of the distinction among epithelial, muscle, and nervous tissue. Those MTs can recite the names of those tissues to pass a multiple-choice test, yet they can't put the very first thing taught in anatomy class into practice when it comes to evaluating anatomical knowledge claims.

Being careless about the distinctions between different kinds of connective tissue, and what they are structurally, versus how they function, is exactly the same kind of error.

It prevents scientific understanding and real anatomical reasoning. As I mentioned previously, Hedley has been widely quoted as saying in his video that science is just another belief system, whose dogmas he's trying to shed.

The way he talks about anatomy, it is clear that he is not approaching it as science, nor bothering to get the scientific details correct. As a direct result, it comes across more as art or another form of self-expression, which is fine, as long as people know that that's what it is, and not anatomical science.

But I don't get the sense that people actually realize it; I think they think that's anatomical science they're doing, and it's a long way from it.

The "tells" are comments like this one:

The "copper wire - like" conduction system for sub atomic vibration of photons and electrons in cell communication.

 

The words come from science, but the way they are strung together makes no sense. This is neither a scientist nor a scientifically trained layperson talking, yet salting the sentence with sciency words is, for some reason, important to the writer.

That indicates that the writer thinks they're making scientific sense, and really has no idea what science is and isn't.

Another, shorter way to look at it is this: confusing connective tissue with superficial fascia with adipose tissue, and saying that connective tissue is an endocrine organ, is the same kind of error as saying that mammals fly.

It's true that one kind of mammal, the bat, does fly. But despite that one corner case, if you say that mammals fly, you'll be wrong most of the time.

If he is saying that adipose tissue is an endocrine organ, then he's using the term wrong, because adipose tissue is not an organ.

Superficial fascia, on the other hand, is an organ, but only one of its components has an endocrine function, so again, he's using terminology wrong: it's not an endocrine organ, although one of its components has an endocrine function.

That is the logical error he is committing there: the part/whole confusion of "bats fly, therefore mammals fly".
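The part/whole error can be made concrete with a small toy sketch in Python. The entries below are illustrative assumptions (a handful of examples, not a complete classification); the point is only the difference between an existential claim ("at least one part has the property") and a universal claim ("the whole group has the property").

```python
# A toy sketch of the part/whole error. The entries are illustrative
# assumptions, not a complete or authoritative classification.

# Does each kind of mammal fly?
mammals = {"bat": True, "dog": False, "bear": False, "human": False}

# Does each component of superficial fascia have an endocrine function?
# (Components as described above: adipose tissue + loose areolar connective tissue.)
superficial_fascia = {
    "adipose tissue": True,
    "loose areolar connective tissue": False,
}


def some_have(parts: dict) -> bool:
    """Existential claim: at least one part has the property."""
    return any(parts.values())


def all_have(parts: dict) -> bool:
    """Universal claim: every part has the property."""
    return all(parts.values())


for claim, group in [
    ("mammals fly", mammals),
    ("superficial fascia is endocrine", superficial_fascia),
]:
    # "Bats fly" makes the existential claim true, but not the universal
    # one; treating the first as if it were the second is the error.
    print(f"{claim}: some={some_have(group)}, all={all_have(group)}")
```

Both claims come out the same way: true in the "some" sense, false in the "all" sense. That is why the careful wording is "one of its components has an endocrine function", not "it is an endocrine organ".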

The questioner continues:

I did miss Gil's larger comment section and I am very glad you spent the time to explain the error and confusion of superficial fascia vs endocrine function of adipose tissue, which Gil is confusing with superficial fascia. Thank you.

But I must ask.... The primary function of the heart is circulatory yet it does have an endocrine function. I understand from Anatomy Trains, Fascia is highly innervated. Could it be possible fascia has more of a role than just stabilization? Especially when it is dysfunctional?

I think the role of fascia has not been studied well enough. Just a few years ago, science told us once a brain cell dies, it is gone forever. Now we understand neurogenesis better.

 

The questioner raises excellent questions, and I am glad they did so, as it gives us an opportunity to explore these issues in more depth.

It is true that sometimes scientific knowledge changes--so what does that mean for us here and now?

We'll examine these questions one by one, to try to figure out what is going on here.

 

 


Wikipedia: Neurogenesis, occurence in adults accessed 26 July 2012

 

"Considered": meaning they had the concept of the nervous system as fixed and incapable of regeneration, and they spoke of it in those terms.

 

The first evidence of adult mammalian neurogenesis in the cerebral cortex was presented by Joseph Altman in 1962, Wikipedia: Neurogenesis, occurence in adults accessed 26 July 2012

 

Joseph Altman questioned these concepts and ways of speaking about the nervous system, and as evidence for his claims, he introduced a material physical referent: the tangible new neural cells in the actual cerebral cortex.

 

followed by a demonstration of adult neurogenesis in the dentate gyrus of the hippocampus in 1963. Wikipedia: Neurogenesis, occurence in adults accessed 26 July 2012

 

Another material physical referent presented as evidence to counter the previous concepts and words: tangible new neurons in the dentate gyrus of the hippocampus.

In 1969, Joseph Altman discovered and named the rostral migratory stream as the source of adult generated granule cell neurons in the olfactory bulb. Wikipedia: Neurogenesis, occurence in adults accessed 26 July 2012

 

Yet another material physical referent: tangible adult generated granule cell neurons in the olfactory bulb.

Up until the 1980s, the scientific community ignored these findings despite use of the most direct method of demonstrating cell proliferation in the early studies, i. e. 3H-thymidine autoradiography. Wikipedia: Neurogenesis, occurence in adults accessed 26 July 2012

 

However, Altman and others' actual evidence with its connection to a referent was ignored in favor of the prevailing concepts and words.

The neuroscience community screwed up; that's not how science is supposed to work. Eventually, it did self-correct to represent reality more closely, but it took too long to do so.

But the correction didn't totally overturn their theories: if you're quadriplegic, for example, we still don't know how to make those nerves regenerate. And there are parts of the brain where neurogenesis has been observed, and others where it hasn't.

So they were partly right, and partly wrong, and they held onto their theories for too long--but like the connective tissue example, and like the flying mammals example, you need to be very clear about the details of what exactly you are talking about--exactly what kind of connective tissue, exactly what part of the superficial fascia, exactly which nerve cells, in exactly what part of the brain.

Otherwise, you fall into unsound--false--conclusions like the "mammals fly" one.

That's the error that Hedley falls into--he gives names to things, and makes up explanations, without making any attempt to validate the connection of those names and explanations to material physical referents in reality.

It's perfectly acceptable in art or other forms of self-expression to not be constrained by any connection to a material physical referent. But science requires that connection, and since Hedley doesn't supply it, it's not science that he's practicing. Nothing more and nothing less than that.

 

 


The primary function of the heart is circulatory yet it does have an endocrine function.

 

That's correct. Does that make it an endocrine organ?

To answer that, we would need to clarify what an endocrine organ (a gland) is.

An endocrine organ is composed of glandular epithelium. Are the cardiac myocytes that produce the hormones atrial natriuretic peptide (ANP) and brain natriuretic peptide (BNP) made up of glandular epithelium?

What does the answer to those questions tell us about whether the heart is an endocrine organ?

When you answer that question, then you have stepped through the process of anatomical reasoning.

And you have generated a piece of new knowledge as well--the answer to the question "are glands (endocrine organs) the only anatomical structures that produce hormones?".

You were able to do that because you maintained the difference between structure and function, and between part and whole that is absolutely necessary if you are going to figure out correct answers about new anatomical questions that you do not already know the answer to beforehand.
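The same reasoning can be sketched in Python by recording structure and function as separate fields, so that one can never silently stand in for the other. This is a deliberately simplified model under stated assumptions: each structure gets a single "primary tissue" label, which glosses over real anatomical detail, and the three entries are illustrative, not a complete list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    name: str
    primary_tissue: str      # structure: what it is made of (simplified to one label)
    secretes_hormones: bool  # function: what it does


def is_endocrine_gland(s: Structure) -> bool:
    # Structural test only, following the definition above:
    # an endocrine organ is composed of glandular epithelium.
    return s.primary_tissue == "glandular epithelium"


# Illustrative entries (a simplification, not a full classification).
examples = [
    Structure("thyroid gland", "glandular epithelium", True),
    Structure("heart", "cardiac muscle", True),              # myocytes secrete ANP and BNP
    Structure("adipose tissue", "connective tissue", True),  # adipocytes: endocrine function
]

# Hormone producers that do not pass the structural test for "gland".
non_gland_producers = [
    s.name for s in examples if s.secretes_hormones and not is_endocrine_gland(s)
]
print(non_gland_producers)  # ['heart', 'adipose tissue']
```

Because the function field never feeds into the gland test, the answer to the question above falls out directly: glands are not the only anatomical structures that produce hormones.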

Hedley's descriptions don't support anatomical reasoning to correct answers, because of the way he substitutes parts for wholes, and structure for function. He can make any statement he wants, but you cannot put those statements together and use them to reason with, in the way you did here.

And sound and complete logical reasoning is absolutely necessary in anatomy, because there are so many facts that you cannot memorize all of them by rote. You have to learn enough anatomy to form a basis, and then use that basis for drawing correct conclusions as you need them.

So how do I know my definitions are 100% right, and his aren't?

First, I don't ever know anything 100%. But the way I am using anatomy, I am not only drawing on centuries of actual anatomical history, and distinctions that we can empirically detect with microscopes and other instruments (referents), but I am drawing on an integrated whole with other sciences and logic as well.

The fact that it works so well in generating new knowledge through reasoning is an indicator that this way of dividing up hormone producers between endocrine glands and things that are not endocrine glands is more likely to be 99% right (and thus not to change abruptly out from under us) than it is to be 60% right, with massive adjustments needed someday.

 

 


I understand from Anatomy Trains, Fascia is highly innervated. Could it be possible fascia has more of a role than just stabilization? Especially when it is dysfunctional?

 

Yes, it could be possible. You could form a hypothesis with that question, and you could test it, and you could accumulate evidence that backed up that hypothesis. And you would be carrying out science when you did that.

And when you have done that, and you have shown that your hypothesis is backed up by the evidence, then we can consider that it's part of our knowledge. How certain we are about it will depend on the evidence, but at least we trust it to some degree.

But that only counts after it's been done. Before it's actually been done, and repeated, and other explanations for what we see have been ruled out, then it's really just marketing hype.

That can change, if the work is done to back it up. But fascia research is very preliminary right now.

Have you ever driven really fast at night, so fast that your stopping distance got ahead of where your headlights could see? That's called "overdriving your headlights".

Metaphorically, to speak of things with certainty before the work has been done to back those things up with evidence is like overdriving your headlights. It's great for ginning up enthusiasm, but you can't really use it to base anatomical reasoning on.

But it's a good question, and maybe the evidence will back it up someday. We just can't act as though we're there already, because we have only just started learning about it.

 

 

 


To finish my thought I must say, Instant Ice and Kinesio Tape boggle my mind neither works directly to effect the muscle, yet tissue responds positively to them. Why do these techniques work?

 

You're right that something happens that creates a response of some type, yet the muscles are too deep to be directly affected.

What kinds of anatomical structure communicate both with superficial layers of skin, and with muscles as well?

Fascia is one kind of structure; can you think of any others?

That would be a very good candidate to begin looking at for answers.

 

'"What kinds of anatomical structure communicate both with superficial layers of skin, and with muscles as well?"

Sensory / Motor nerves come to mind, capillaries... As well as fascia.'

 

Good answers.

We know, from centuries of anatomy, that sensory nerves can carry pain signals, and that nerves can be blocked in various ways from carrying them, while capillaries don't carry pain signals.

Hedley says that superficial fascia is "a sensory organ", but he doesn't offer any explanation of why he says that.

It is a poetic metaphor, but it is not a fact that anatomical reasoning can be based on.

It is a similar error to the part/whole confusion of adipose tissue with superficial fascia: as you observed, fascia is highly innervated.

Why would fascia duplicate that function itself, when it already contains tons of nerves doing that same job?

"Okay, that can explain the instant ice, but kinesio tape? Primarily effects fascia, or others thoughts...?"

 

I would say that, since:

  1. those modalities are directly contacting nerves, while the epidermis stands completely between them and the superficial fascia, blocking it from them,
  2. we know that nerves have that functionality, while there is absolutely no evidence that connective tissue does, and
  3. fascia already contains lots and lots of nerves, and there is no anatomical need for fascia to duplicate that function,

that the evidence up till now, plus our anatomical reasoning about the anatomy we know, indicates that it is much more likely that the mechanism involves the nervous system to a much greater degree than it does the fascia.

Now that we have an idea that is consistent with the anatomical evidence, we could do a literature search to see if others have investigated this question, or we could design a study of our own to test it.

That doesn't mean that nobody will ever show any interesting properties of fascia. But from what we know now, to a very strong degree of certainty, it doesn't make sense to speculate about new properties that fascia might have, until and unless the research actually shows that that is true.

 

 

 

 


It's up to you where you locate yourself in massage practice space. If you find self-expression or business to be your more natural fits, there is absolutely nothing wrong with that fact.

If you find healthcare professional to be your more natural fit, then--for the sake ultimately of your clients--you have a higher obligation than others do to get the knowledge and the facts as close to correct as you possibly can.

Anatomical science is crucial to the core of massage as a healthcare profession. If you are seeking anatomical science, then make sure that that is what you are actually getting.

There is nothing wrong with seeking other things instead of anatomical science--you just want to make sure that you are very clear on what the difference is, and that you know yourself and what you are looking for, and know for sure what you are getting.

 

Metaphysical boundary collapse

One of massage's biggest culture wars at present arises out of the dispute between monistic and dualistic philosophies. It has implications for how we practice with clients, and how we teach our students in our schools.

Although we're experiencing this culture war every day in our own field, this argument is centuries-old and is not limited to massage. Throughout human history, great minds have tried--and failed--to resolve it. I don't expect us to resolve it anytime soon, but we do need to resolve whether those of us on opposite sides of the philosophical divide can work together, or whether it divides us irreconcilably.

The argument goes back much further in history, but in the early 1800s, advances in the relatively new science of chemistry caused a seismic shift in the evolving field of medicine. As Siddhartha Mukherjee describes the experiment that shattered previous thought on dualism in health and medicine:

Early interactions between synthetic chemistry and medicine had largely been disappointing. Gideon Harvey, a seventeenth-century physician, had once called chemists the "most impudent, ignorant, flatulent, fleshy, and vainly boasting sort of mankind." The mutual scorn and animosity between the two disciplines had persisted. In 1849, August Hofmann, William Perkin's teacher at the Royal College, gloomily acknowledged the chasm between medicine and chemistry: "None of these compounds have, as yet, found their way into any of the appliances of life. We have not been able to use them...for curing disease."

But even Hofmann knew that the boundary between the synthetic world and the natural world was inevitably collapsing. In 1828, a Berlin scientist named Friedrich Wöhler had sparked a metaphysical storm in science by boiling ammonium cyanate, a plain, inorganic salt, and creating urea, a chemical typically produced by the kidneys.

--Siddhartha Mukherjee, The Emperor of All Maladies: A Biography of Cancer, Scribner 2010, p. 83.

 

This drawing shows a molecule of ammonium cyanate, a compound that doesn't come from living things. It's made up of:

  • 2 nitrogen atoms, shown in blue;
  • 4 hydrogen atoms, shown in gray (since this is a 2-D drawing of a 3-D molecule, one of the hydrogens is hidden behind a nitrogen, but it really is there, even though we can't see it in this arrangement);
  • 1 carbon atom, shown in black; and
  • 1 oxygen atom, shown in red.

Source: modified from http://upload.wikimedia.org/wikipedia/commons/8/8c/Wohler_synthesis.gif accessed 27 June 2012

 

Urea, a waste product that is made in the liver and excreted by the kidneys in many different species of animals, forms molecules that are made up of:

  • 2 nitrogen atoms, shown in blue;
  • 4 hydrogen atoms, shown in gray;
  • 1 carbon atom, shown in black; and
  • 1 oxygen atom, shown in red.

Source: modified from http://upload.wikimedia.org/wikipedia/commons/8/8c/Wohler_synthesis.gif accessed 27 June 2012

 

These two very different substances, one found in living organisms and one not found in them at all, have exactly the same atoms in exactly the same amounts. The only difference is the arrangement of those atoms in 3D space.
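The claim that both substances contain the same atoms in the same amounts can be checked by simple counting. The sketch below is a minimal illustration, not a chemistry tool: it writes ammonium cyanate as its two ions (NH4+ and OCN-) and urea as one carbonyl group (CO) carrying two amino groups (NH2), then tallies the elements. The helper name `count_atoms` is hypothetical.

```python
from collections import Counter


def count_atoms(groups):
    """Tally elements from (group_counts, number_of_copies) pairs."""
    total = Counter()
    for group_counts, copies in groups:
        for element, n in group_counts.items():
            total[element] += n * copies
    return total


# Ammonium cyanate: one ammonium ion (NH4+) plus one cyanate ion (OCN-)
ammonium_cyanate = count_atoms([
    ({"N": 1, "H": 4}, 1),
    ({"O": 1, "C": 1, "N": 1}, 1),
])

# Urea: one carbonyl group (C=O) bonded to two amino groups (NH2)
urea = count_atoms([
    ({"C": 1, "O": 1}, 1),
    ({"N": 1, "H": 2}, 2),
])

print(dict(sorted(ammonium_cyanate.items())))  # {'C': 1, 'H': 4, 'N': 2, 'O': 1}
print(ammonium_cyanate == urea)                # True
```

Both tallies come out to the same formula, CH4N2O. What a count like this cannot show is how the atoms are bonded to each other, which is exactly where the two substances differ.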

 

 

Source: http://upload.wikimedia.org/wikipedia/commons/8/8c/Wohler_synthesis.gif accessed 27 June 2012

 

 

The Wöhler experiment--seemingly trivial--had enormous implications. Urea was a "natural" chemical, while its precursor was an inorganic salt. That a chemical produced by natural organisms could be derived so easily in a flask threatened to overturn the entire conception of living organisms: for centuries, the chemistry of living organisms was thought to be imbued with some mystical property, a vital essence that could not be duplicated in a laboratory--a theory called vitalism. Wöhler's experiment demolished vitalism. Organic and inorganic chemicals, he proved, were interchangeable. Biology was chemistry: perhaps even a human body was no different from a bag of busily reacting chemicals--a beaker with arms, legs, eyes, brain, and soul.

With vitalism dead, the extension of this logic to medicine was inevitable. If the chemicals of life could be synthesized in a laboratory, could they work on living systems? If biology and chemistry were so interchangeable, could a molecule concocted in a flask affect the inner workings of a biological organism?

--Siddhartha Mukherjee, The Emperor of All Maladies: A Biography of Cancer, Scribner 2010, p. 83.

 

In one way, Mukherjee is right: this experiment showed that the vitalistic claim, that a vital essence distinguishes living organisms from non-living things, had no basis in material physical reality. By "dead", he means that the foundation of vitalism was shown to be false, so there was no longer any reason to use it as a basis for explanations in medicine or science. His usage refers to the "referent" part of the Semantic Triangle: no referent means no vitalism.

But in another sense, he's pronouncing it dead prematurely. Many people still believe in vitalism and dualism, not only in their own personal belief systems but also in practice, bringing dualistic concepts such as "spirit" and "energy healing" into the therapeutic encounter. The fact that there is no material physical referent to support the idea does not prevent them from operating in the "concept" and "terms" parts of the Semantic Triangle.

Whoever wrote the Wikipedia article on vitalism correctly observed that vitalism didn't disappear just because of that one experiment:

The concept of vitalism in chemistry can be traced back to Jöns Jakob Berzelius who suggested that in the division of organic and inorganic that a mysterious vital force exists in organic compounds.

Vitalism played a pivotal role in the history of chemistry since it gave rise to the basic distinction between organic and inorganic substances, following Aristotle's distinction between the mineral kingdom and the animal and vegetative kingdoms. The basic premise was that organic materials differed from inorganic materials fundamentally; accordingly, vitalist chemists predicted that organic materials could not be synthesized from inorganic components. However, as chemical techniques advanced, Friedrich Wöhler synthesised urea from inorganic components in 1828.

Further discoveries continued to marginalise need for a "vital force" explanation as more and more life processes came to be described in chemical or physical terms. However, contemporary accounts do not support the common belief that vitalism died when Wöhler made urea. This Wöhler Myth, as historian of science Peter J. Ramberg called it, originated from a popular history of chemistry published in 1931, which, "ignoring all pretense of historical accuracy, turned Wöhler into a crusader who made attempt after attempt to synthesize a natural product that would refute vitalism and lift the veil of ignorance, until 'one afternoon the miracle happened'". However, in 1845, Adolph Kolbe succeeded in making acetic acid from inorganic compounds, and in the 1850s, Marcellin Berthelot repeated this feat for numerous organic compounds. In retrospect, Wöhler's work was the beginning of the end of Berzelius's vitalist hypothesis, but only in retrospect, as Ramberg had shown.

In fact, some of the greatest scientific minds of the time continued to investigate the possibility of vital properties. Louis Pasteur, shortly after his famous rebuttal of spontaneous generation, performed several experiments that he felt supported the vital concepts of life. According to Bechtel, Pasteur "fitted fermentation into a more general programme describing special reactions that only occur in living organisms. These are irreducibly vital phenomena." In 1858, Pasteur showed that fermentation only occurs when living cells are present and, that fermentation only occurs in the absence of oxygen; he was thus led to describe fermentation as 'life without air'. Rejecting the claims of Berzelius, Liebig, Traube and others that fermentation resulted from chemical agents or catalysts within cells, he concluded that fermentation was a "vital action".

 

but he/she ends the chemistry section rather abruptly with Pasteur, instead of following the story through to the present. This, too, is premature: vitalistic thought persists to this day. Developments in chemistry and other sciences have convinced people who are familiar with the subject that vitalism is no longer a compelling alternative explanation.

I think this overlooks a great number of people who aspire to be healthcare professionals, but who have not had access to an in-depth scientific and biomedical ethics education.

Vitalism/dualism is a huge issue for us in MT. Continuing to insist on vitalistic mechanisms as explanations is an obstacle to integration with other members of the healthcare team, whose fields long ago accepted the scientific consensus that, as a source of explanation in the lab and in the clinic, vitalism is dead.

And it directly contradicts the established consensus on what belongs in an MT body of knowledge. As we've seen, vitalism contradicts chemistry and pharmacology.

Yet MTs are expected to know basic principles of pharmacology in order to practice.

The Massage Therapy Body of Knowledge (MTBoK) calls for the following required knowledge:

Pharmacology

  • General classification and types of drugs, herbs, supplements, their effects and their side effects.
  • Massage therapy considerations and potential responses to general classes of drugs, herbs and supplements.
  • Use of authoritative, medically accepted drug reference to look up drugs, their effects and their side effects.

--MTBoK, p. 18

 

while the Massage and Bodywork Licensing Examination (MBLEx) states the following expectations:

PATHOLOGY, CONTRAINDICATIONS, AREAS OF CAUTION, SPECIAL POPULATIONS (13%)
...

E. Classes of medications

--Massage and Bodywork Licensing Examination Candidate Handbook, Content Outline, p. 15

 

and the National Certification Exam in Therapeutic Massage and Bodywork/National Certification Exam in Therapeutic Massage lists the following topics:

III. Pathology (13%)

...

L. Drug interactions with massage/bodywork
1. medications (e.g., prescription; over-the-counter)
2. recreational drugs (e.g., tobacco; alcohol)
3. herbs
4. natural supplements

--NCETMB/NCETM Candidate Handbook, pp. 21, 23

 

and yet, at the same time, these same exams also test vitalistic concepts, which directly contradict the science on which these learning expectations are based.

This puts our students in an impossible position for learning, since one set of expectations directly contradicts another. It also puts teachers and schools in the position of being required to teach mutually contradictory information, and to assess students on how well they perform the impossible task of integrating it.

We need to figure out what this means to us as a community and as a developing profession. As Mukherjee observes, the metaphysical boundary collapsed a century and a half ago, but not all of us have quite gotten word of the collapse yet.

We need to address, at the very least (there may be even more issues that I have overlooked here):

  • how do we balance ethical standards and best practices in the client's interests in the therapeutic encounter with the practitioner's freedom of conscience?
  • how do we--schools, teachers, mentors--provide an education to our MT students that prepares them to build bridges to integration with other members of the biomedical healthcare team?
  • what do we do about the sunk costs in the previous unsustainable path, and the tremendous investment that it will require for us to practice as an integrated healthcare profession?

 

Source: The 7 November 1940 collapse of the Tacoma Narrows suspension bridge, http://upload.wikimedia.org/wikipedia/en/5/5c/TacomaNarrowsBridgeCollapse_in_color.jpg accessed 27 June 2012

The challenge of reconciling our mental models with the material physical universe: Top-down and bottom-up approaches

A recurring theme that you'll find at POEM is how the practice of science is defined, in large measure, by its central value of seeking to avoid bias and by a collection of methods designed to assist scientists in avoiding bias when interpreting research results.

Statistical measures are among the most rigorous of the methods scientists use to avoid cognitive and logical traps: they provide clear frameworks for interpreting what the data from empirical observations and experiments actually mean.

To lay the foundation for discussing statistics in evaluating massage research, let's first talk about different approaches to the challenge of reconciling our mental models with the material physical world.

Data, information, facts, and truth

Data is a collection of factual information used as the basis for reasoning, discussion, or calculation. When a scientist talks about a fact that is rooted in research, they are referring to a piece of information that is being presented as objective reality.

Because that information is a fact, a scientist will often say "It is true that..." and then go on to state whatever that particular fact means.

It is easy for a casual listener to believe the scientist must be referring to absolute “Truth”, because of the way these words are commonly used in everyday conversation.

For example, the media may cover scientific topics in a way that implies that science points directly to “Truth” in the same way the term is used in philosophy or meaning-making and self-expression.

But this is not a faithful representation, because science—which deals only with aspects of the natural material physical universe—takes for granted that the measurement of things observed in the natural world contains a certain amount of error. By "error", we mean the Merriam-Webster dictionary meaning of "a variation in measurement, calculation, or observation due to mistakes or uncontrollable factors".

As we will discuss in Chapter 4 of the research literacy e-Book, it is impossible to observe or measure reality from a completely neutral position, and there are no perfect measurement tools.

For this reason, scientists emphasize working in a way to obtain the best results possible, knowing that no observations of reality can be completely error-free.

Absolute truth can never be achieved; if the process is carried out with integrity, we can only get closer and closer to what the facts are.

In order to work toward this goal, scientists have developed methods for managing observational errors, because those errors can be understood and controlled by making skillful choices about experimental design and statistical techniques.
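As a minimal sketch of what managing measurement error can look like, the Python below summarizes five repeated measurements of the same quantity. The numbers are hypothetical (imagine one client's shoulder flexion, in degrees, measured five times with a goniometer), and the function name `summarize_measurements` is illustrative. The standard error of the mean, the sample standard deviation divided by the square root of the number of measurements, is one common way of expressing how uncertain the average itself is.

```python
import math
import statistics


def summarize_measurements(values):
    """Summarize repeated measurements of one quantity."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)       # sample standard deviation
    sem = spread / math.sqrt(len(values))   # standard error of the mean
    return {"mean": mean, "stdev": spread, "sem": sem}


# Hypothetical repeated readings of the same shoulder, in degrees
readings = [92, 95, 90, 94, 89]
summary = summarize_measurements(readings)

print(f"mean = {summary['mean']:.1f} degrees")
print(f"standard deviation = {summary['stdev']:.2f} degrees")
print(f"standard error of the mean = {summary['sem']:.2f} degrees")
```

No single reading is treated as "the truth"; the summary reports a best estimate together with how much the readings vary. Note that this only addresses random error: if the goniometer itself were miscalibrated, every reading would be off in the same direction, and averaging would not reveal it.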

The Semantic Triangle, introduced in Chapter 2 of the research literacy e-Book here at POEM and available later this month, shows how the elements of meaning can be divided among concepts (the meanings people attach to ideas), terms (the language used to describe ideas), and referents (the things in the natural world to which terms and concepts refer).

Source: http://sig.biostr.washington.edu/~raven/semantic-triangle.jpg accessed 2 May 2012

 

The big question is how to know—given that perceptions and experience vary so much from one person to another—that those concepts and terms in our minds really connect to the referents they claim to represent.

Sorting out how best to connect those internal aspects of meaning to the external physical world is an ongoing problem that challenges all of us.

Top-down vs. bottom-up approaches to data

One approach that has been taken throughout history is to decide in advance what the “truth” is, and then to look for empirically observed facts that will reinforce that “truth.”

This is known as the top-down approach, in which a researcher starts with a desired answer in mind and then fits the questions and the data into that answer.

Obviously, this approach implies a great deal of bias from the start.

Ptolemy, a Greek astronomer who lived in Egypt during the first and second centuries CE/AD, developed a model showing the sun and the planets moving in circular orbits around the Earth. This model, which placed the Earth at the center of everything, is known as geocentrism: a view that seemed at first to fit with what people observed when they looked up at the sky.

Source: http://upload.wikimedia.org/wikipedia/commons/7/7b/Bartolomeu_Velho_1568.jpg accessed 1 May 2012

 

But some careful observers noted that a planet such as Mars would sometimes be seen moving in its normal direction, but then it would come to a stop and begin to move in the opposite direction—backward across the sky—before returning to its expected path. It seemed to move in a retrograde way.

Source: http://upload.wikimedia.org/wikipedia/commons/6/6a/Retrograde_Motion.bjb.svg accessed 1 May 2012.

 

The left side of the drawing shows the Earth's actual motion around the sun as the blue points 1-5, and Mars' actual motion around the sun as the red points. The right side of the diagram shows what Mars' motion looks like to an observer on the Earth. So Mars (or Mercury, for that matter) never actually reverses direction; the apparent retrograde motion is an illusion produced by Earth's motion relative to the other planet as both orbit the sun.
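The geometry in the drawing can be checked numerically. The sketch below is a simplified model, assuming circular, coplanar orbits with round-number values (Earth at 1 AU with a 365.25-day year; Mars at about 1.524 AU with a period of about 687 days); real orbits are ellipses. It places both planets on their orbits around the sun, computes the direction from Earth to Mars, and reports how that direction changes from one day to the next. A negative daily change means Mars appears to drift backward against the stars. All function names are hypothetical.

```python
import math

# Simplified, assumed values: circular, coplanar orbits
EARTH_ORBIT_AU, EARTH_PERIOD_DAYS = 1.0, 365.25
MARS_ORBIT_AU, MARS_PERIOD_DAYS = 1.524, 687.0


def heliocentric_position(radius_au, period_days, day):
    """(x, y) of a planet on a circular orbit; both planets start at angle 0 on day 0."""
    angle = 2 * math.pi * day / period_days
    return radius_au * math.cos(angle), radius_au * math.sin(angle)


def apparent_longitude_deg(day):
    """Direction of Mars as seen from Earth, in degrees in [0, 360)."""
    ex, ey = heliocentric_position(EARTH_ORBIT_AU, EARTH_PERIOD_DAYS, day)
    mx, my = heliocentric_position(MARS_ORBIT_AU, MARS_PERIOD_DAYS, day)
    return math.degrees(math.atan2(my - ey, mx - ex)) % 360


def daily_apparent_motion_deg(day):
    """Change in apparent direction over one day, wrapped into [-180, 180)."""
    change = apparent_longitude_deg(day + 1) - apparent_longitude_deg(day)
    return (change + 180) % 360 - 180


# Day 0: Earth passes between the sun and Mars.
# Day 390: roughly half a synodic period later, Mars is on the far side of the sun.
for day in (0, 390):
    motion = daily_apparent_motion_deg(day)
    direction = "retrograde" if motion < 0 else "prograde"
    print(f"day {day}: {motion:+.2f} degrees/day ({direction})")
```

Neither planet ever reverses in this model; the backward drift near day 0 comes entirely from Earth, on its faster inner orbit, overtaking Mars.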

To reconcile this observation with the idea of the planets and sun making simple circles around the Earth, advocates for Ptolemaic astronomy used the concept of epicycles, or loops, that represented the additional movements of the planets. Epicycles were explained as looping paths that averaged out to simple circles. In the expanded Ptolemaic system, the planets and sun were continually looping around given points, which were themselves moving in simple perfect circles around the earth.

Source: http://upload.wikimedia.org/wikipedia/commons/2/29/Ptolemaic_elements.svg accessed 1 May 2012.

 

As in the previous image, Mars is shown in red, and Earth in blue. This is the model of epicycles introduced to account for what looked to observers on Earth to be retrograde motion.

Because of the observed referent (occasional apparent reversals in the movement of the planets), it was necessary to add this new term and concept (epicycles) in order to hold onto and protect the Ptolemaic idea that something was moving in perfect circles around the Earth. The advocates of Ptolemaic astronomy kept adding epicycles as necessary to force the model to fit the observations.
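Part of why the epicycle model could survive so long is that, for positions in the sky, a single well-chosen epicycle can mimic a sun-centered orbit exactly. The sketch below assumes circular, coplanar orbits with round-number values (Earth: 1 AU, 365.25 days; Mars: about 1.524 AU, about 687 days). It compares an Earth-centered model, in which Mars rides a 1 AU epicycle around a point that circles Earth on a 1.524 AU deferent, with the sun-centered position of Mars relative to Earth. The function names are hypothetical, and this is an idealized illustration, not a reconstruction of Ptolemy's actual parameters.

```python
import math


def circle_point(radius, period_days, day, phase=0.0):
    """Point on a circle of the given radius, turning once per period_days."""
    angle = 2 * math.pi * day / period_days + phase
    return radius * math.cos(angle), radius * math.sin(angle)


def ptolemaic_mars(day):
    """Earth-centered model: deferent plus one epicycle."""
    # Deferent: a point circling Earth once per Martian period
    cx, cy = circle_point(1.524, 687.0, day)
    # Epicycle: Mars loops around that point once per year, half a turn out of phase
    ex, ey = circle_point(1.0, 365.25, day, phase=math.pi)
    return cx + ex, cy + ey


def copernican_mars_from_earth(day):
    """Sun-centered model: Mars's position minus Earth's position."""
    mx, my = circle_point(1.524, 687.0, day)
    ex, ey = circle_point(1.0, 365.25, day)
    return mx - ex, my - ey


for day in range(0, 800, 100):
    geo = ptolemaic_mars(day)
    helio = copernican_mars_from_earth(day)
    print(day, all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(geo, helio)))
```

Both functions return the same Earth-to-Mars vector on every day checked, so in this idealized case observations alone could not tell them apart. For an outer planet like Mars, the epicycle here is simply a copy of Earth's own orbit.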

And for a very long time, despite the hacks and cobbled-together epicycle justifications, the Ptolemaic model continued to have a great influence on astronomy’s view of the Earth’s place in the universe, because there was not much change in the data available to observers.

But over time, new observational instruments such as telescopes were invented, and these made it possible to add new information to the accumulated body of knowledge about the sky.

Eventually, a tipping point was reached, and the weight of evidence made it clear that Ptolemy’s model of the universe no longer matched the observed facts.

The explanation that prevailed was the heliocentric model, in which all the planets, including Earth, orbit the sun. It had been developed earlier by Polish astronomer Nicolaus Copernicus (1473-1543).

Source: http://upload.wikimedia.org/wikipedia/commons/5/57/Heliocentric.jpg accessed 1 May 2012

 

Source: http://upload.wikimedia.org/wikipedia/commons/3/33/Geoz_wb_en.svg accessed 1 May 2012

 

In the early 1600s, Johannes Kepler introduced his laws of planetary motion, which showed that the planets actually move in elliptical paths around the sun, not in perfect circles: a model that was an even better fit to the empirical data.

Those who insisted on retaining Ptolemy’s view of the universe, despite the growing evidence against it, were holding on to the top-down approach to data. They practiced apologetics, and used cherry-picking, special pleading, and other fallacious techniques, to protect their model from the challenges posed by the material physical world.

In contrast, the bottom-up approach of Copernicus and Kepler, who worked from the data to develop their conclusions, won out.

These new thinkers prevailed over the Ptolemaists because they were willing to let go of their previous beliefs and to let the data itself tell the story. (Kepler, in particular, was disappointed that the planets moved in ellipses rather than in the perfect circles he found so beautiful, but he followed his conscience and went where the process led.)

[Of course, by "prevailed", we never mean "100% accepted": there are, after all, modern-day adherents of the Flat Earth model, such as the Flat Earth Society, to name just one example (Motto: "Replace the science religion...with SANITY.").

What we mean is that the majority of professionals who have actually done the work to understand the domain vouch for the work as having been carried out with integrity, and as having been validated as showing the results it claims to demonstrate.]

Statistics is one methodology that we apply in a bottom-up approach to understand the meaning of the story that the data is telling us.

Exercise

Can you think of some real-life examples of where people try, or have tried, to protect an old model that has been discredited, despite the mounting evidence against it?

Areas where you might find examples nowadays include healthcare and politics, among others.

How far are some people prepared to go to protect old models?

What techniques do they use to do so?

What are the stakes--politically, psychologically, economically, and in other domains?

 

 

The germ theory is too Western

Laura Allen embodies the very ideas of transparency and accountability when she says that anyone is free to quote anything she says anytime and anywhere, and I believe I'll take her up on that.

Over on her Facebook account, which you may or may not be able to see unless you're already friends with her, she writes:

It's a concern to me that three times in the past couple of days, I have seen stories on here about employers who don't want the massage therapists to change the sheets for every client. That is so unethical, not to mention a health hazard. If you are working in such a place I suggest getting out immediately and reporting the owners to the massage board AND the health board. As one person said to the owner who was mad about her changing the sheets, would you want to check into a hotel and sleep on the sheets the last person used? I don't think so. And if the guilty owner happens to be reading this, do us all a favor and get the hell out of this business.

 

Clear, concise, and correct. And if the guilty owner was reading the post, they didn't choose that hill to (metaphorically) die on; Laura's commenters were 100% supportive of the bright shining biomedical and ethical line in the sand that she drew.

It occurred to me that there could be a correlation between the type of massage practiced (and its underlying conceptual model) and the degree to which practitioners adhere to sanitation and hygiene practices.

For example, if you truly believe that disease is caused by a bad wind entering the body, or by negative thinking, or by karma, then that's not really much of a motivation for paying attention to getting rid of germs on surfaces.

And an interesting follow-up question is: if you do believe in one of those conceptual models, and you are scrupulously diligent about observing good hygiene, then why do you go to that trouble?

I mentioned that that would be a fascinating study that I would probably never get around to carrying out, but if someone else did, I would love to read about it.

Well, ask and you shall receive, I guess.

One of Laura's commenters told a story from her own experience that is a perfect case study of the correlation I was thinking about:

I had an MT friend who worked in a chiro's office and he reused disposable acupuncture needles. He was quite careless with them and they'd often fall on the carpet where you wouldn't notice them until you got off the table, barefoot, and get one in your foot. When the MTs in his office complained, he waved them off for being too "Western." In China, they reuse needles from person to person. At least, he bragged, he only reused them on the same person. Eventually he agreed not to do acupuncture in the massage rooms so massage clients didn't get stuck by stray needles. Sheesh.

 

/facepalm

There are so many issues here that it's difficult to know where to start.

Disease transmission by infected reused needles, or Hygiene 101, is only the first one.

To get back from needles to our topic, I'm sure the POEM commenters can name several conditions that can be passed from one person to another by dirty bed linen.

Sources: Left: http://www.stanford.edu/class/humbio103/ParaSites2004/Scabies/scabies.jpg accessed 29 April 2012, Right: http://www.stanford.edu/class/humbio103/ParaSites2004/Scabies/scabies1.jpg accessed 29 April 2012

 

And although this may come as news to the chiropractor in the story, in resource-poor areas of the world, they don't share needles because they *want* to; they do it because they have no other options.

Every time something like that reinforces the perception of MTs as elitist, classist, ethnocentric, and generally oblivious, it just makes more work for the rest of us to dismantle that perception.

So here we go, gradually chipping away at it:

First of all, the session is about what the client wants and needs, not about forcing the client, with or without full disclosure and informed consent, to settle for what people in resource-poor environments are compelled to make do with. The chiropractor in the story is not practicing in a client-centered way; his practice is centered on something else, where infection control is not a priority.

Second, in chiding others for being "too 'Western'", he probably sees himself as all diversity-oriented, and transcending elitism and ethnocentrism.

Nothing could be further from the truth.

He is claiming, in effect, that Chinese people don't value their own lives and bodily integrity enough to care about basic biomedical best practices. Where he got the idea that he gets to speak for them is unclear, but his claim positively advocates poorer medical care based on nationality and ethnicity.

This violates Ethics 101 in a big way.

If Chinese people do reuse needles, what could be the explanation?

The chiropractor in the story above implies that they reuse needles by choice, when they have better options. I think that looking at the availability of resources is a more useful source of possible explanations.

According to the Wikipedia article "List of countries by GDP (nominal) per capita", nominal GDP per capita in the US (a rough proxy for average annual income) ranges, depending on the reporting source, from $47,153 to $48,387.

The corresponding figure for China ranges, depending on the reporting source, from $4,428 to $5,414.

The corresponding figure for Ethiopia ranges, depending on the reporting source, from $300 to $360.

I'll leave it as an exercise for readers to evaluate whether Chinese people and Ethiopian people reuse acupuncture and injection needles because:

  • they don't care about their own lives and health, or about each other, and consider infection control "too 'Western'", or whether
  • unused needles are much harder to come by in environments where the average person earns roughly 9-11% (China) or less than 1% (Ethiopia) of what the average American earns.
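The comparison in the second bullet can be checked by dividing each country's figure by the US figure. Because the text gives only ranges, and does not say which reporting sources pair with which, the sketch below takes the widest possible bounds: the lowest country figure over the highest US figure, and the highest over the lowest. The dollar figures are the ones quoted above; the function name `percent_of_us` is hypothetical.

```python
# Per capita figures (US dollars) as quoted in the text: (low, high)
US = (47_153, 48_387)
CHINA = (4_428, 5_414)
ETHIOPIA = (300, 360)


def percent_of_us(country, us=US):
    """Widest possible range of country/US ratios, as percentages."""
    lowest = country[0] / us[1] * 100   # smallest country figure vs largest US figure
    highest = country[1] / us[0] * 100  # largest country figure vs smallest US figure
    return lowest, highest


for name, figures in (("China", CHINA), ("Ethiopia", ETHIOPIA)):
    low, high = percent_of_us(figures)
    print(f"{name}: {low:.2f}% to {high:.2f}% of the US figure")
```

China comes out at roughly 9-11% of the US figure, and Ethiopia at well under 1%.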

 

 

 

And if you consider it a do-or-die cost issue, because your US business is so precarious that you need to run it with Chinese or Ethiopian standards of practice and margins on clean linens, unused acupuncture needles, or any other compromise on infection-control best practice, then your business is not dying.

It is already dead, and you just haven't acknowledged the fact. If you cannot afford to practice infection control, it's over. Deader than the parrot in the Monty Python sketch.

 

I'll heartily second Laura's recommendation:

And if the guilty owner happens to be reading this, do us all a favor and get the hell out of this business.

 

and I'll add some of my own.

Recommendations for educators:

  • The history of massage is an important thing for students to know about, but infection-control trumps it every time.
  • If the curriculum doesn't have time to teach both how people used to believe humors or bad winds caused disease AND what we now know about preventing infection in a massage therapy practice (so that students not only deliver the correct answer on tests by rote, but show that they understand it and can apply it in context), then the curriculum resources have to be devoted to infection control at the expense of pre-modern concepts of illness and disease.

 

Recommendations for students and practicing MTs:

  • Check to see if your school is teaching (or did teach, if you've graduated) proper infection-control practices.
  • If not, make sure that you get all that information somewhere else, and use it in your practice--it's just that important.
  • Make sure that you know how to protect clients by reporting unethical and unsafe practices to the correct regulatory authorities in your area.

 

Recommendations for clients:

  • The time in a session is time that you have paid for, and you should not feel hesitant to ask questions about the care or service you are receiving.
  • A client-centered healthcare professional will be happy to answer any questions you may have. Hospitals in the US, UK, and elsewhere are now actively promoting campaigns (as shown in the buttons below) to ask your provider whether they've washed their hands before examining you. MTs who want to be part of an integrated healthcare team will not balk at following the same infection-control best practices as other members of that healthcare team.
  • Don't hesitate to ask what infection-control procedures your MT uses.
  • When you are getting on the massage table, take a moment to look at the linens you will be lying on--do they look clean and unused, or do they appear to be re-used?
  • How many layers of linens are on the table? If it's more than one, the establishment may be cutting corners by stacking sheets to save time between clients. The problem with stacking sheets is that mere layering will not prevent transmissible conditions from crossing those layers. Don't accept sheet-stacking from your MT; insist on a single layer of clean and unused linens every single time. This is your time and your care; it is reasonable that you expect it to be conducted in a way that looks out for your best interests.

 

Sources: Left: http://www.jcrinc.com/Common/Images/custom/products/HHB-05.jpg accessed 29 April 2012; Center and Right: http://www.healthcareinspirations.com/hci_fe03_single_quantity.html?&prodid=513 accessed 29 April 2012

 

These are steps we can take, and encourage our clients to take, to show that we are serious about developing into a healthcare profession that will accept the responsibility of self-regulation and client protection that comes along with that status.

Chapter 5: Just enough statistics

 


01. Why you might want to know this

This chapter provides a high-level overview of the basic concepts and vocabulary associated with statistics, the branch of mathematics that deals with the collection, analysis, interpretation, and presentation of numerical data. We will also cover some of the most important statistical measures encountered in massage research literature.

While statistics may initially seem somewhat intimidating, a few simple and useful concepts will go far in helping you to develop massage research literacy.


02. Industry-level massage educational and performance objectives addressed by this chapter


03. Learning objectives for this chapter

03a. Upon successful completion of this chapter, you will be able to do, know and understand, and value the following:

03b. Do

  • Name and explain the most common and most important statistical measures used in articles on massage research.
  • Using a sample massage research journal article, recognize and point out the mean in the statistical results.
  • Using the lines on a boxplot, name the descriptive statistical values they represent.
  • Recognize and point out a usage of standard deviation in a sample massage research article and explain what it means in context.

03c. Know and understand

  • Name and describe the two primary ways in which statistical measures are used.
  • Name and explain the most common and most important statistical measures used in articles on massage research.
  • Explain the differences between a top-down approach to exploring knowledge and a bottom-up approach.
  • Define the concept of a normal distribution, represented by a bell curve.
  • Define the terms data range, average, and percentile.
  • Define the terms variance and standard deviation.
  • Name and explain three ways of measuring the average of a group of data values.
  • Explain how a boxplot is used to represent significant values of data.

03d. Appreciate

  • Discuss how scientists attempt to manage error and uncertainty through the use of specialized tools and techniques.
  • Discuss how making good choices about experimental research design and appropriate statistical measures can compensate for the effects of experimental error.

04. Big ideas in this chapter

  • Statistical measures provide frameworks and guidelines for researchers in their quest to minimize the effects of error in interpreting research results.
  • Observation and measurement in the natural world inevitably contain a certain amount of error because no person can be absolutely neutral, and no measuring tools can measure perfectly.
  • A top-down approach to exploring knowledge attempts to fit new information into previous beliefs or hypotheses.
  • A bottom-up approach lets the explanation emerge from the observed data itself.
  • To understand the meaning of the story being told by the data, researchers need ways to distinguish real patterns and trends from things that only seem to be patterns or trends.
  • The concept of a normal distribution, represented by a bell curve, is one of the most useful and powerful statistical concepts and lies at the foundation of a great number of statistical techniques and measures.
  • The concept of "normal" is closely tied to the concept of "average". "Average" refers to three statistical measures that describe data: the mean, the median, and the mode.
  • The arithmetic mean, often referred to simply as mean, is calculated by adding all the values of the data together and dividing by the number of members in the group. If all the data values are fairly similar, the mean can be a good representation of the group, but if there are extremely high or low values in the data, the mean is not very representative of the group.
  • The median represents the exact middle of a group of data; 50% of the data values fall above the median, and 50% of the data values fall below.
  • The mode is the value that occurs most often in the data. Not all data sets have a mode, and some data sets have two or more modes.
  • A percentile value is a threshold that indicates what percentage of the values fall below it. For example, 20% of values fall below the 20th percentile and 60% of values fall below the 60th percentile.
  • A boxplot is a graphic device used to illustrate significant values of the data at a glance. The lines on and around the box mark key descriptive values, such as the median, the 25th and 75th percentiles, and the spread of the remaining data.
  • The range of data includes all the values, from the lowest to the highest.
  • Variance is a measure that describes how spread out, or dispersed, the data values are from one another.
  • The most common representation of data dispersion is the standard deviation (SD). The SD indicates how representative the mean is of a set of data values.

 


Key terms in this chapter

  • arithmetic mean
  • average
  • bell curve
  • bottom-up approach
  • box and whiskers diagram
  • boxplot
  • data
  • descriptive statistics
  • error
  • error bars
  • facts
  • inferential statistics
  • mean
  • median
  • mode
  • normal distribution
  • normal values
  • percentile
  • population
  • range
  • range of normal values
  • standard deviation
  • Statistics: mean, median, mode, standard deviation, power
  • statistics
  • top-down approach
  • variance
  • weight of evidence
  • False positive error (type I error)
  • False negative error (type II error)
  • α (alpha)
  • β (beta)
  • p
  • Confidence interval
  • Sampling
  • Power and sample size
  • κ (kappa)


05. Key terms in this chapter


06. Claims made in this chapter


07. Entities and relationships for Ontology of Meaning in Massage in this chapter


08. Exercises


09. References cited in this chapter

  1. Barlow A, Clarke R, Johnson N, Seabourne B, Thomas D, Gal J. Effect of massage of the hamstring muscle group on performance of the sit and reach test. Br J Sports Med. 2004 Jun;38(3):349-51.
  2. Sakurai M, Suleman MI, Morioka N, Akça O, Sessler DI. Minute sphere acupressure does not reduce postoperative pain or morphine consumption. Anesth Analg. 2003 Feb;96(2):493-7.
  3. Kshettry VR, Carole LF, Henly SJ, Sendelbach S, Kummer B. Complementary alternative medical therapies for heart surgery patients: feasibility, safety, and impact. Ann Thorac Surg. 2006 Jan;81(1):201-5.
  4. Manikandan N. Effect of facial neuromuscular re-education on facial symmetry in patients with Bell's palsy: a randomized controlled trial. Clin Rehabil. 2007 Apr;21(4):338-43.
  5. Sankaranarayanan K, Mondkar JA, Chauhan MM, Mascarenhas BM, Mainkar AR, Salvi RY. Oil massage in neonates: an open randomized controlled study of coconut versus mineral oil. Indian Pediatr. 2005 Sep;42(9):877-84.

10. Other learning resources


11. Introduction

The previous chapter, and this one, are probably the hardest chapters in the book. Once you get past these, it's relatively smooth sailing from here on out.

This one, especially, is rather math-y. But remember, there is nothing here that you can’t understand if it is properly explained and we do the work of integrating that knowledge. I suspect that if we go through this together, you will find that a lot of things that look rather hard at first glance will begin to fall into place as we examine them. And I promise that if you “bear” with me, there’s something good waiting for you at the end :).

For our purposes, we are not going to go into a lot of rigorous detail about statistics. We will, however, review some of the most common terms, so that you understand them when we encounter them. For the purpose of reading massage research articles, recognizing a few important ones, and understanding what they mean and why they are used, will take you very far.

The statistical understanding needed for reading research should not be confused with what you need to carry out research, which I do wish to encourage. If you get to the point where you actually carry out studies, you will need to know much more than I am presenting here, and you would even need to consult a specialist beforehand to plan which statistics are appropriate for your study. So the take-home message is this: for reading purposes, know the terms and concepts discussed here, but if you want to take it further, be aware that you will also have to develop your understanding of statistics further.

The obvious first question is: Why do we need to learn statistics at all?

Part of the answer is that we use statistics to judge how likely it is that our results are just due to chance, rather than the result of what we hypothesized caused them.

Additionally, we use statistics in research to avoid some of the cognitive errors that we talked about in Chapter 4. Without statistical analysis, research results can often mislead us.

Statistics is also one way of describing, presenting, publishing, and sharing the research issues in the study. And finally, we use statistics to better understand how to apply the research results in our practice of massage.

So those are some of the reasons why statistics are important. But remember: don’t get bogged down in too many details—even the specialists argue over them.

Our strategy is to get familiar with the most common and most important stats, and to just skim the others. If you know in general what the most common statistical measures are, and what they mean, you will be in a good position to understand the articles that use them. Some other important ones will be worth learning about if you continue studying how to carry out research; we won’t deal with them here, but we'll point them out if we come across them.

 

The Role of Statistics

The bottom-up approach of following the data wherever it may lead lies at the heart of why the scientific method has been so successful.

However, this strength comes with an associated challenge: how to distinguish real patterns in the data (that lead to answers) from false patterns (that indicate some kind of error).

This is a difficult task, and statistics is a key tool that has been developed and refined to provide validated guidelines for making judgments about accumulated knowledge.

Statistics aids the interpretation of meaning in terms of how things can vary from one another, how to lower errors in observation, how to know when two things are associated or in a cause-and-effect relationship, and how to classify things in meaningful ways.

In these ways, statistics lowers—though never totally eliminates—the risk of making errors in interpreting data.

There are two primary ways in which statistics are used, and different statistical measures are used for each of those purposes.

Descriptive statistics, as their name implies, are used to describe characteristics of data about a group. The data can be anything measured systematically (e.g., characteristics of mothers receiving a pregnancy massage treatment or heart-rate measurements taken over a period of several days among athletes receiving sports massage).

Inferential statistics are used to infer, or make predictions about, trends or patterns that are contained in the data and are used to distinguish real patterns from things that only seem to be patterns.

Inferential statistics are mostly beyond the scope of this chapter; we will touch on only a few of their core ideas, since this chapter is intended only to cover the most important concepts needed to begin thinking in statistical terms.

Descriptive statistics: describing the population and its data

Average: Mean, median, and mode

An average is an attempt to describe qualities of a group by combining qualities of individual members of the group.

When we use it in conversation, the term "average" can be somewhat imprecise. In statistics, however, we need the meaning of average to be more precise, and we accomplish this goal by using several statistical measures that describe data about a group, or population. These measures include the mean, the median, and the mode, which tell us some different things about the distribution of those individual qualities or values.

These measures represent three different approaches to averaging, each of which is useful in different situations, and each has its own strengths and weaknesses.

Mean

The arithmetic mean is the most common type of mean you will encounter when reading the massage research literature. You're very likely to be familiar with it already, since it's commonly used to calculate grades. The arithmetic mean is calculated by adding the values of all the data points together and dividing that total by the number of data points.
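To make the calculation concrete, here is a minimal Python sketch that computes an arithmetic mean two ways: by hand (sum divided by count) and with the standard library's `statistics.mean`. The five grades are hypothetical values chosen for illustration; they are not the grades shown in Figure 9.1.

```python
from statistics import mean

# Hypothetical final exam grades for five students (illustrative only)
grades = [78, 85, 92, 88, 72]

# Arithmetic mean by hand: add all the values, then divide by how many there are
manual_mean = sum(grades) / len(grades)
print(manual_mean)   # 83.0

# The standard library's statistics.mean gives the same answer
print(mean(grades))  # 83
```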

 

 

 


Figure 9.1: Mean (average) value of 5 final exam grades.

 

The term "mean" by itself is frequently used in the literature as shorthand for arithmetic mean. If you ever do encounter another, less common measurement of mean used in a study, the specific type, such as geometric mean or harmonic mean, will be explicitly noted (and if you do, I'd love to hear about it, since I've never seen them used in massage research).

 

The mean is often represented in the literature by the letters m̄, m, M, x̄, x, or X. For example, the phrase in parentheses in this statement from a massage research article:

Forty-eight children (M age = 4.8 years) infected with HIV/AIDS and living in the Dominican Republic were randomly assigned to a massage therapy or a play session control group.

Source: Hernandez-Reif M, Shor-Posner G, Baez J, Soto S, Mendoza R, Castillo R, Quintero N, Perez E, Zhang G. Dominican Children with HIV not Receiving Antiretrovirals: Massage Therapy Influences their Behavior and Development. Evid Based Complement Alternat Med. 2008 Sep;5(3):345-54. doi: 10.1093/ecam/nem032. PMID: 18830444 PMCID: PMC2529379 Free fulltext PDF

 

indicates that the 48 children in the study were, on average (mean), 4.8 years old.

Figure 4-5 shows data excerpted from a study (Barlow 2004) that investigated whether a single massage treatment would alter the flexibility of the hamstring muscles in physically active young men, as measured by the sit-and-reach test.1

 

Barlow A, Clarke R, Johnson N, Seabourne B, Thomas D, Gal J. Effect of massage of the hamstring muscle group on performance of the sit and reach test. Br J Sports Med. 2004 Jun;38(3):349-51. PMID: 15155444 PMCID: PMC1724798 Free fulltext PDF

Source: School of Applied Sciences, University of Glamorgan, Pontypridd, Wales, UK.

Abstract

OBJECTIVE: To investigate if a single massage of the hamstring muscle group would alter the performance of the sit and reach test.

METHODS: Before treatment, each of 11 male subjects performed the sit and reach test. The treatment consisted of either massage of the hamstring muscle group (both legs, total time about 15 minutes) or supine rest with no massage. Performance of the sit and reach test was repeated after treatment. Each subject returned the subsequent week to perform the tests again, receiving the alternative treatment relative to their initial visit. Mean percentage changes in sit and reach scores after treatment were calculated for the massage and no massage treatments, and analysed using Student's t tests.

RESULTS: Mean (SD) percentage changes in sit and reach scores after massage and no massage were small (6.0 (4.3)% and 4.6 (4.8)% respectively) and not significantly different for subjects with relatively high (15 cm and above) values before treatment. Mean percentage changes in sit and reach scores for subjects with relatively low values before treatment (below 15 cm) were large (18.2 (8.2)% and 15.5 (16.2)% respectively), but no significant differences were found between the massage and no massage groups.

CONCLUSIONS: A single massage of the hamstring muscle group was not associated with any significant increase in sit and reach performance immediately after treatment in physically active young men.

 

The authors included their data in Table 1, so we can calculate the mean of all the sit-and-reach scores for the subjects (1) before and (2) after the massage by adding all the values in the appropriate column and then dividing by 11 (the number of subjects in the study):

Barlow's Table 1

 

Averages for Barlow's data

 

The data show that the mean (average) sit-and-reach score for the subjects before massage treatment was 16.64 cm, and after one massage treatment, the mean score increased to 18.55 cm.

The disadvantage of the mean is that if any of the data being averaged are extremely high or extremely low, the mean can be so different from those data that it does not give an accurate description, especially when there are only a few data points. An easy example to illustrate this problem is to imagine a millionaire and a homeless person as the two people standing in line at the post office on a given day. The millionaire’s net worth ($1,000,000) added to the net worth of the homeless person ($0) = $1,000,000. That total divided by 2 (two people) = $500,000. So it could be said that the mean net worth of everyone in line at the post office at that time was $500,000. While this is a true statement in mathematical terms, it does not accurately describe the financial situation of either person in line that day, dramatically demonstrating how the mean fails when the values are not distributed in a normal (bell-curve) manner.
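The post-office arithmetic can be checked with a few lines of Python. This sketch reuses the two net-worth figures from the example and then measures how far each person is from the mean; both turn out to be $500,000 away, which is exactly the point of the example.

```python
from statistics import mean

# The two net worths from the post-office example, in dollars
net_worth = {"millionaire": 1_000_000, "homeless person": 0}

average = mean(list(net_worth.values()))
print(f"Mean net worth: ${average:,}")  # Mean net worth: $500,000

# How far is each person from that "average" value?
for person, worth in net_worth.items():
    print(f"{person}: ${abs(worth - average):,} away from the mean")
```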

Another limitation of the mean is that it can’t tell you about extreme values in the data, or how any individual compares to the group, except in the most crudely approximate way. To examine this limitation further, let’s set up our own table including the mean score (shown in the callouts in Table 1 above). In the last column, observe how far each subject's score is from the mean score.
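Since Barlow's Table 1 is not reproduced here, the following Python sketch uses five hypothetical sit-and-reach scores (labeled Subject A to E; these are not real study data) to show what a "difference from the mean" column looks like. One thing worth noticing: the differences always add up to zero, so simply averaging them cannot tell you how spread out the scores are. That limitation is part of what the standard deviation, discussed later in this chapter, is designed to address.

```python
from statistics import mean

# Hypothetical sit-and-reach scores in cm (illustrative only; NOT Barlow's data)
scores = {"Subject A": 12.0, "Subject B": 15.0, "Subject C": 17.0,
          "Subject D": 18.0, "Subject E": 23.0}

group_mean = mean(list(scores.values()))
print(f"Mean score: {group_mean:.1f} cm")  # Mean score: 17.0 cm

deviations = {}
for subject, score in scores.items():
    # Positive = above the mean, negative = below the mean
    deviations[subject] = score - group_mean
    print(f"{subject}: {score:.1f} cm, difference from mean {deviations[subject]:+.1f} cm")

# Mathematically, the differences from the mean always cancel out
print(sum(deviations.values()))  # 0.0
```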

 

Median

The median is representative of the data in a different way from the mean—it is the value above which half the data falls, and below which the other half of the data falls.
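Here is a minimal Python sketch of the median, using the standard library's `statistics.median` on two small hypothetical data sets. With an odd number of values, the median is the single middle value once the data are sorted; with an even number, the usual convention (and the one `statistics.median` follows) is to take the halfway point between the two middle values.

```python
from statistics import median

# Hypothetical data sets (illustrative only)
odd_count = [19, 12, 17, 25, 14]        # 5 values
even_count = [19, 12, 17, 25, 14, 21]   # 6 values

# Sorting makes the "middle" easy to see
print(sorted(odd_count))    # [12, 14, 17, 19, 25]
print(median(odd_count))    # 17: two values above it, two below

print(sorted(even_count))   # [12, 14, 17, 19, 21, 25]
print(median(even_count))   # 18.0: halfway between the two middle values, 17 and 19
```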

Exercise:

Let’s find the median score for the before massage group in Barlow’s sit-and-reach study. First, we’ll rearrange our table in descending order, so that the sit-and-reach scores go from highest to lowest.

You'll notice that in the real world, things are not always cut-and-dried: with this data, there are 4 scores above the median score and 5 scores below it. Subjects 10 and 6 share the score that is as close to the middle as we could get, so the median value is only roughly in the middle.

 

Figure 9.2: Sit-and-reach scores in centimeters (Barlow 2004).

 

Note that here there are exactly 4 scores above and 4 scores below the score of 17, which is the score for subjects 7, 6, and 2. So in this case the median value is exactly in the middle.

 

Here is another example of a researcher reporting median values. Note: for now, just focus on the medians; we’ll get to percentiles in a little bit.

 

* Data are reported as median (25th percentile, 75th percentile). Fifty-three patients (30 controls and 23 minute spheres) completed the study.

 

** Morphine requirements (47 mg [27, 58] vs ***41 mg [25, 69]) and pain scores (29.5 mm [16, 59] vs 40 mm [22, 58]) were similar in the control and acupressure groups. (Sakurai 2003)

 

9.3 Percentile

 

If a value is in the 99th percentile, that means that 99% of the values are lower. For a value in the 60th percentile, 60% of the values are lower; for a value in the 30th percentile, 30% of the values are lower, and so forth.
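The definition above can be turned into a tiny Python helper. `percentile_rank` is a hypothetical function written for this example, not a library call, and it follows the text's wording by counting only the values that are strictly lower. Statistics packages use several slightly different percentile conventions, so their results can differ a little from this sketch, especially for small data sets.

```python
def percentile_rank(value, data):
    """Percentage of values in `data` that are strictly lower than `value`.

    Hypothetical helper following the definition in the text; other
    conventions (e.g. counting ties as half) give slightly different answers.
    """
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical scores for ten people (illustrative only)
scores = [3, 5, 7, 8, 9, 11, 12, 14, 15, 20]

print(percentile_rank(12, scores))  # 60.0: six of the ten scores are lower
print(percentile_rank(3, scores))   # 0.0: nothing is lower than the minimum
```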

 

Now let’s revisit the excerpt from Sakurai 2003 above. This time, we will look at the percentile values in brackets as well as the medians.

 

Figure 9.3: text.

Figure 9.4: text.

 

*Data are reported as median (25th percentile, 75th percentile). Fifty-three patients (30 controls and 23 minute spheres) completed the study. **Morphine requirements (47 mg [27, 58] vs ***41 mg [25, 69]) and pain scores (29.5 mm [16, 59] vs 40 mm [22, 58]) were similar in the control and acupressure groups. (Sakurai 2003)

 

Interpreting what this says is not as difficult as it first looks. The breakdown below shows how this shorthand translates.

 

* “Data are reported as median (25th percentile, 75th percentile).”


Figure 9.5: text.

Figure 9.6: text.

**“Morphine requirements (47 mg [27, 58] vs *** 41 mg [25, 69])”
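To see what the bracketed numbers tell us, here is a short Python sketch that takes the values reported in the Sakurai 2003 excerpt and computes the distance between the 25th and 75th percentiles (the interquartile range), which is where the middle half of the patients fall. It assumes, as the wording of the excerpt suggests, that each pair lists the control group first and the acupressure group second.

```python
# Values reported in the Sakurai 2003 excerpt: median [25th percentile, 75th percentile].
# Assumption: in each pair, the control group is listed first.
reported = {
    "Morphine, control (mg)": (47, 27, 58),
    "Morphine, acupressure (mg)": (41, 25, 69),
    "Pain score, control (mm)": (29.5, 16, 59),
    "Pain score, acupressure (mm)": (40, 22, 58),
}

iqrs = {}
for label, (med, p25, p75) in reported.items():
    # Interquartile range: the width of the band holding the middle 50% of patients
    iqrs[label] = p75 - p25
    print(f"{label}: median {med}; 25% of patients below {p25}, "
          f"75% below {p75}; interquartile range {iqrs[label]}")
```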

 

 

9.4 Mode

 

The mode represents the data in a different way than either the mean or the median. The mode is the value that occurs most frequently in the data. In other words, it is the value “most typical” of the population.

 

A data set can have no mode (if all the values are unique, although some authors consider this to mean that every value is a mode), one mode (if one particular value is the most frequently occurring value), or more than one mode (if multiple values are tied for the most frequently occurring value).

 

Figure 9.7: text.

 

This data set has two modes, 14 and 13.

 

This data set has one mode, 17.
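Python's standard library (version 3.8 and later) includes `statistics.multimode`, which returns every value tied for the highest count, so it handles all three cases described above. For data where every value is unique it returns all of them, matching the convention that every value is a mode. The data sets in this sketch are hypothetical, chosen to resemble the two-mode and one-mode examples; they are not the data in Figure 9.7.

```python
from statistics import multimode

# Hypothetical data sets (illustrative only)
two_modes = [12, 13, 13, 14, 14, 15]   # 13 and 14 each appear twice
one_mode = [15, 17, 17, 17, 18, 20]    # 17 appears three times
all_unique = [11, 12, 13, 14]          # every value appears once

print(multimode(two_modes))   # [13, 14]
print(multimode(one_mode))    # [17]
print(multimode(all_unique))  # [11, 12, 13, 14]: every value ties for "most frequent"
```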

 

You won’t come across the mode very often, because it is meaningful in a narrower range of research situations than the mean and percentiles.

 

The statistics we have covered up until now are useful, but to get a clearer picture of what all the data look like, there are more refined tools we can use to understand the relationships among the values. Standard deviation (SD) is one of those tools.

 

9.5 Preparing to discuss standard deviation

 

This (standard deviation) is probably the hardest concept we are going to cover. But it is worth it, because of the value of the concept and its applicability to so many different situations. So let’s break it up into small pieces to tackle one piece at a time, and see how we can use it, not only in reading massage research, but in many other situations as well.

 

I remember when a sad event in my childhood brought home to me the concept of a population, although I certainly didn’t think about it that way at the time.

 

Figure 9.8: text.

 

When I was in fifth grade, a child at my school died. Although I didn’t know the child personally, I was sad to hear the news, as was everyone else there. Then I started putting it together with what had happened the year before, when another child had died. I figured out that there must be some kind of rule that every year one child dies at our school, and that next year it could be me. That particular thought was scary enough to keep me awake for a couple of nights.

 

Although I was kind of on the right track in certain ways, there were some flaws in my analysis; however, as I was 10 years old at the time, I think I can be forgiven for a certain lack of mathematical rigor. The observation that there was a pattern (the death of one child per year) was a reasonable observation for that very short time span, although if I had been paying attention longer, it is possible that there would have been many other years when no child at that school died.

 

But from that observation of a pattern, I went a little too far in imagining a “rule” that one child died every year; it would be better to think of it as a description of what did happen, rather than as a prescription for what must happen. If you think of it in that way, you can see one function that statistics serves: descriptive statistics summarizes the data about a population or a study, and describes in what ways the members are similar (central tendency) or different (variability). It takes a very diverse group and tries to convey concisely and efficiently to the audience what the important measures of that group are. The statistical measures we have gone over up until now (mean, median, mode, and percentile) are descriptive statistics.

 

Figure 9.9: text.

 

 

Inferential statistics takes things a step farther: it lets us use reasoning to infer, or make predictions, about the group, based on what we already know. It’s what I was dimly sensing when I realized that another child could die at my school the next year1, and so came up with my “rule”. The statistics we are going to talk about now are inferential statistics, and understanding the concepts of normal distribution, standard deviation, types of error, sample size and power, and inter-observer agreement will make a great deal (even most) of the massage research literature accessible to you.

 

1I was, and still am, quite happy to have been proved wrong on that prediction.

 

 

Finally, one more thing about my example, and then we’ll let it go—remember in Chapter 3 when we talked about how science is about what’s common to every one, while spirituality can be about what is unique and special? I’ve gotten the sense from some of my students, and have felt it myself, that there is something vaguely disturbing about talking about such sad events as a child’s death in terms of a population event, and I suspect that some of the aversion I’ve heard people express to science has something to do with the sense that science somehow sucks out what is special about being human. I would respectfully suggest that the two are not mutually exclusive—it is possible to operate in the two different modes at different times, as appropriate, and in that way to get the best of both—the rigor AND the compassion, as we talked about earlier. AND THAT ....

 

 

9.6 Standard deviation

 

Standard deviation has a lot in common with the averages we discussed earlier, and we will talk about how we can use it as a kind of descriptive statistic. To understand standard deviation, however, we first have to all be on the same page about what normal distribution means, so we’re going to talk about that first, and then come back to standard deviation.

 

9.6.1 Normal distribution

 

We talked earlier in Chapter 2 about how “normal” is one of those words that has a specific, neutral meaning in science, yet has very strong connotations in everyday language. It’s unfortunate that this word is so heavily loaded, as it is one of the most useful and powerful statistical concepts there is, and serves as a gateway to the world of inferential statistics. The word has been used as a weapon to enforce social and medical agendas. After I have taught a session on massage research and fibromyalgia, people have come up to me and told me how painful it is to be told they are not “normal”, where “normal” is a prescriptive word for how they should be. Let’s be very clear that this is not how we’re using the word. Our specific statistical use of the word is defined below.

 

First of all, think about a situation you’ve been in with a lot of other people: a lot of the time, a few people are extreme in some value one way or the other, but most people are pretty close to average. We’ve all been born, so let’s consider the weight at birth of all healthy babies born in the US as our example situation:

  • A few very big babies: 8½ to 9 pounds
  • A few very small babies: 6 to 6½ pounds
  • Most babies: somewhere around 7 or 8 pounds, more or less

Taken together, these birthweights form a normal distribution, which is why the typical values in the middle can be called normal in the statistical sense.

 

This is what that normal distribution looks like. The curved line is called a bell curve, a pretty descriptive name, because it is indeed shaped like a bell.

 

Figure 9.10: text.

 

Figure 9-2: Bell curve showing normal distribution of birthweights

 

While all bell curves have the same basic features of a small “tail” at either end (representing a few extreme values) and a large “bump” in the middle (representing a lot of typical values), there can still be some dramatic differences in how spread out the data they represent are. The following are both bell curves, but look how different they are from each other:

 

Figure 9-3: Two different bell curves

 

Figure 9.11: text.

 

• The graph on the left is tall and narrow and drops off sharply.

 

• The graph on the right is shorter and drops off much more gently.
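The difference between a tall, narrow bell curve and a short, wide one comes down to the standard deviation. This Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8 and later) with two hypothetical birthweight distributions, in pounds, that share the same mean of 7.5 but have different SDs (0.5 and 1.5); these numbers are illustrative assumptions, not real birthweight data.

```python
from statistics import NormalDist

# Two hypothetical birthweight distributions (pounds): same mean, different SD
narrow = NormalDist(mu=7.5, sigma=0.5)  # tall, narrow curve (small SD)
wide = NormalDist(mu=7.5, sigma=1.5)    # short, wide curve (large SD)

# Height of each curve at its peak (the mean): the small-SD curve is taller
print(round(narrow.pdf(7.5), 3))  # 0.798
print(round(wide.pdf(7.5), 3))    # 0.266

# Share of babies within 1 pound of the mean in each population
shares = {}
for name, dist in [("narrow", narrow), ("wide", wide)]:
    shares[name] = dist.cdf(8.5) - dist.cdf(6.5)
    print(f"{name}: {shares[name]:.1%} between 6.5 and 8.5 pounds")
```

The smaller SD packs far more of the population close to the mean, which is what makes the curve on the left tall and steep.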

 

These differences are useful, because they tell us something about the data being studied: namely, how different the extreme values are from the more typical values for that population. The standard deviation, which is coming up, will explore that distinction in more detail. So now that we are familiar with normal distributions and bell curves, let’s return to standard deviation, and see how it helps us with reading the massage research literature.

 

9.7 Back to standard deviation

 

We discussed earlier that the mean can sometimes be a useful way to summarize and describe the data. But when some of the values under study are extremely high or extremely low, the mean can be so different from them that it does not give an accurate description of the data. To put it another way, according to the “Bill Gates Net Worth” web page2, at the moment I checked it, Bill Gates’ net worth was $27,600,000,000 (give or take). So, if I told you that on average, Bill Gates and I each have a net worth of $13,800,000,000, did you really learn anything relevant and useful about me3? Or did you just get a graphic demonstration of how badly the mean fails when it has to deal with extreme values?

 

2Yes, there really are some people with that much spare time on their hands. You can find it at http://bgnw.marcus5.net/bgnw.html if you like.

 

3If only! :)

 

 

Clearly, we need a better tool for describing populations that, like our big, small, and average-sized babies, exhibit a great deal of variation, and the standard deviation (SD) is one of those tools. We won’t bother with the mathematics behind the SD here, because for our purposes, I just want you to be able to recognize it when you come across it in the literature, and to understand what it means.

 

Sometimes you’ll see the SD loosely described as the average distance of the values from the mean [ref]. That description reflects the way it is computed mathematically (roughly, it is the square root of the average squared distance from the mean), and also the way it describes data more accurately than the mean alone does. Assuming a normal distribution of data (our bell curve), the standard deviation describes how widely the data are spread across the bell curve, and therefore how far from the center a given value lies. In this way, the normal distribution and standard deviation can deal with extreme data as well as more representative data. Further, an unexpectedly large standard deviation can signal to the reader that something may be wrong with the data, with the model, or with both.
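The chapter doesn't require the mathematics, but for readers who are curious, here is a minimal Python sketch of how an SD is computed step by step on a small hypothetical data set. It checks the hand calculation against the standard library's `statistics.pstdev` (population SD) and also shows `statistics.stdev` (sample SD, which divides by n - 1 instead of n); studies that describe their subjects usually report the sample version.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical data set (illustrative only)
values = [2, 4, 4, 4, 5, 5, 7, 9]

m = sum(values) / len(values)  # the mean: 40 / 8 = 5.0

# Step 1: measure each value's distance from the mean, squared so that
# distances below the mean don't cancel out distances above it
squared_distances = [(v - m) ** 2 for v in values]

# Step 2: average the squared distances; this is the (population) variance
variance = sum(squared_distances) / len(values)  # 32 / 8 = 4.0

# Step 3: take the square root to get back to the original units
sd = sqrt(variance)
print(sd)              # 2.0
print(pstdev(values))  # 2.0 (population SD from the standard library)

# The sample SD divides by n - 1 instead of n, giving a slightly larger value
print(round(stdev(values), 3))  # 2.138
```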

 

First of all, let’s put some further meaning on our bell curve. Below we have a bell curve where the different sections are shaded.

 

Figure 9.12: text.

 

Figure 9-4: A bell curve with standard deviations

 

• The solid gray area, which lies within 1 standard deviation of the mean, represents the largest number of data values. Values that fall in this area of the graph are considered the most normal. We expect about 68% of our values to lie somewhere within this range.

 

• The striped area extends to 2 standard deviations from the mean. Together with the gray area, it covers a larger share of the values: we expect about 95% of our values to lie within this range (notice that to get from one striped range to the other, we have to go through the gray ranges, so we include that previous 68% in our estimate of 95%).

• The data in the black area, which extends to 3 standard deviations from the mean, represent a small percentage of the values. We expect about 99.7% of our values to lie within this range (notice that to get from one black range to the other, we have to go through the striped ranges and the gray ranges, so we include that 95% in our estimate of about 99.7%).
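The percentages in these bullets come from the shape of the normal curve itself. As a quick check, this Python sketch uses the standard library's `statistics.NormalDist` to compute the share of a normal distribution lying within 1, 2, and 3 SDs of the mean.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, SD 1. Any normal curve can be rescaled to this one.
standard = NormalDist(mu=0, sigma=1)

coverage = {}
for k in (1, 2, 3):
    # Area under the curve between k SDs below and k SDs above the mean
    coverage[k] = standard.cdf(k) - standard.cdf(-k)
    print(f"Within {k} SD of the mean: {coverage[k]:.1%}")
# Within 1 SD of the mean: 68.3%
# Within 2 SD of the mean: 95.4%
# Within 3 SD of the mean: 99.7%
```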

 

So now you can begin to see how this addresses the problem with the mean and the extreme values that we’ve encountered: if we know the mean, and we know how far away (how spread out) from the mean a particular data value is, then we have a much more powerful tool for accurately and clearly representing the data than the mean alone can provide4.

 

This is useful because it tells us how “spread out” the population is. The larger the SD, the more the individual values differ from one another, and the more cautiously you should read any conclusion that rests on the mean alone. Remember our previous two bell curves?

 

Figure 9-5: Two different bell curves with varying standard deviations

 

A false positive error (also called a type I error), for our purposes, exists when it looks like the treatment, such as massage, caused an effect when it really didn’t. In other words, its positive result was false. Here is a hypothetical research experiment on the effect of massage on blood pressure that illustrates how this can happen.

 

4 So you get a much more accurate description of where I really stand if I tell you that, on average, Bill Gates and I each have a net worth of $13,800,000,000.00, but that the standard deviation of our net worths is in the billions of dollars, too.

Let’s just leave it at that for now. :)

 

Figure 9.13: text.

 

Figure 9.14: text.

 

Figure 9-6

 

In this experiment, the researchers concluded that massage does indeed lower blood pressure. But suppose the researchers changed the experimental design: instead of having the control subjects sit in a chair for one hour, they had them lie on the massage table for an hour, and this produced a different result for the control group.

 

Figure 9.15: text.

 

Figure 9-7

 

With this design, lying on the table for one hour without being massaged also lowered blood pressure. The conclusion from this experimental design would be that lying on the table, and not the massage itself, lowers blood pressure.

 

Note that this was not a real experiment, and its conclusion may or may not be true. Also note that this is an example of an experiment that any massage therapist can carry out.

 

A false negative error (also called a type II error) exists when it looks like the treatment, such as massage, had no effect, but it really did. Here is another hypothetical experiment that demonstrates a false negative error.

 

Suppose that a 1-hour massage and simply lying on the table for an hour both resulted in no change in blood pressure.

 

Figure 9.16: text.

 

Figure 9-8

 

However, suppose that in this hypothetical experiment the researcher did not pay attention to a couple of important factors. One was that the massage therapist performing the massages was available only on Monday, and on Monday there were workmen using jackhammers just outside the window. Suppose, too, that it is summer and the windows are open. By the time the subjects in the control group came in to participate by just lying on the table without getting a massage, it was later in the week and the workmen were gone.

 

Figure 9-9

 

If the researcher is unaware of the jackhammer annoyance factor, he will conclude that massage does not lower blood pressure. However, it is possible that the jackhammer noise was raising blood pressure, masking the blood-pressure-lowering effect of the massage.

 

Here is a diagram that illustrates this.

 

Figure 9.17: text.

 

Figure 9.18: text.

 

Figure 9-9
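To make the jackhammer story concrete, here is a small Python simulation. Every number in it is invented: we simply assume that massage truly lowers systolic blood pressure by 8 mmHg, that the jackhammer noise raises it by 8 mmHg, and that people vary around those values. The flawed design (massage group on Monday with the noise, control group later in the week) shows a difference near zero, a false negative, while testing both groups under the same quiet conditions recovers the assumed effect.

```python
import random
import statistics

# Every number here is invented for illustration; none comes from a real study.
MASSAGE_EFFECT = -8.0     # hypothetical true drop in systolic BP from massage (mmHg)
JACKHAMMER_EFFECT = 8.0   # hypothetical rise caused by the noise outside (mmHg)
PERSON_SD = 5.0           # hypothetical person-to-person variation (mmHg)
N = 200                   # hypothetical subjects per group

rng = random.Random(7)


def bp_change(massaged, jackhammers):
    """Simulated change in one subject's blood pressure after the session."""
    change = rng.gauss(0.0, PERSON_SD)
    if massaged:
        change += MASSAGE_EFFECT
    if jackhammers:
        change += JACKHAMMER_EFFECT
    return change


# Flawed design: massage group on Monday (jackhammers), controls later (quiet).
massage_monday = [bp_change(massaged=True, jackhammers=True) for _ in range(N)]
control_later = [bp_change(massaged=False, jackhammers=False) for _ in range(N)]
observed = statistics.mean(massage_monday) - statistics.mean(control_later)

# Fairer design: both groups tested under the same (quiet) conditions.
massage_quiet = [bp_change(massaged=True, jackhammers=False) for _ in range(N)]
control_quiet = [bp_change(massaged=False, jackhammers=False) for _ in range(N)]
fair = statistics.mean(massage_quiet) - statistics.mean(control_quiet)

print(f"flawed design difference: {observed:+.1f} mmHg")
print(f"fair design difference:   {fair:+.1f} mmHg")
```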

 

 

Alpha (α)

The statistical measure alpha (α) is the probability of making a false positive error: of concluding that the treatment had an effect when, in reality, it had none. In most of the research literature you will see, researchers set α at 0.05, which means they accept a 5% risk of making a false positive error. In the example below, that is where Hopper sets his α, and he concludes, with no more than a 5% risk of seeing an effect that is not really present, that his intervention (dynamic soft-tissue mobilization) significantly increased hamstring flexibility in the healthy male subjects he studied.

Example:

OBJECTIVES: The purpose of this study was to investigate the effect of dynamic soft tissue mobilisation (STM) on hamstring flexibility in healthy male subjects...The alpha level was set at 0.05. RESULTS: Increase in hamstring flexibility was significantly greater in the dynamic STM group than either the control or classic STM groups with mean (standard deviation) increase in degrees in the HFA measures of 4.7 (4.8), -0.04 (4.8), and 1.3 (3.8), respectively. CONCLUSIONS: Dynamic soft tissue mobilisation (STM) significantly increased hamstring flexibility in healthy male subjects. (Hopper 2005) (Compare this with our other hamstring study, too.)

Figure 9.19: text.
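One way to convince yourself that α really is a false positive rate is to simulate many experiments in which the treatment does nothing at all, and count how often a test at α = 0.05 still calls the result significant. The Python sketch below does this with a simple two-sample z-test that assumes the population SD is known; that is a simplification chosen for brevity, not the analysis Hopper used. The group size of 30 is also hypothetical. Over many runs, roughly 5% of these do-nothing experiments come out “significant”.

```python
import math
import random


def two_sided_p_value(group_a, group_b, sd=1.0):
    """Two-sample z-test p-value, assuming the population SD is known (a simplification)."""
    diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    z = diff / (sd * math.sqrt(1 / len(group_a) + 1 / len(group_b)))
    # Probability of a |z| at least this large if there is truly no effect.
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05
rng = random.Random(2005)
trials = 10_000
false_positives = 0
for _ in range(trials):
    # Both groups come from the SAME distribution: the "treatment" does nothing.
    treated = [rng.gauss(0.0, 1.0) for _ in range(30)]
    control = [rng.gauss(0.0, 1.0) for _ in range(30)]
    if two_sided_p_value(treated, control) < ALPHA:
        false_positives += 1  # looked significant, but the effect isn't real

false_positive_rate = false_positives / trials
print(f"false positive rate: {false_positive_rate:.3f} (alpha = {ALPHA})")
```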

 


Beta (β)

Don’t worry too much about β: most massage research studies don’t address it explicitly in their published reports. But you may come across it, and since it is a kind of “mirror image” of α, I’ll include it here for your reference more than anything. Just as α is the probability of making a false positive error, β is the probability of making a false negative error. For example, if α is the risk you run of seeing a bear that isn’t there, β is the risk you run of missing a bear that really is there.
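Unlike α, which researchers choose in advance, β depends on how large the real effect is and how many subjects are studied. The Python sketch below assumes, hypothetically, a real effect of half a standard deviation and 30 subjects per group, uses the same simplified z-test (population SD assumed known), and estimates β both by simulation and with the closed-form normal calculation. Under these assumptions β comes out at roughly one half: a study this size would miss a real effect of this size about half the time. The quantity 1 − β is called the study’s power.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-sided test
TRUE_EFFECT = 0.5  # hypothetical: the treatment really shifts the mean by 0.5 SD
N = 30             # hypothetical number of subjects per group

rng = random.Random(1994)
trials = 10_000
misses = 0
for _ in range(trials):
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N)]
    # z-test for a difference in means, population SD assumed known (= 1).
    z = (sum(treated) / N - sum(control) / N) / math.sqrt(2 / N)
    if abs(z) < Z_CRIT:  # same as p >= alpha: "no significant effect"
        misses += 1      # the effect was real, so this is a false negative
beta_simulated = misses / trials

# Closed-form beta for the same test: chance that z lands inside +/- Z_CRIT.
shift = TRUE_EFFECT / math.sqrt(2 / N)
std_normal = NormalDist()
beta_exact = std_normal.cdf(Z_CRIT - shift) - std_normal.cdf(-Z_CRIT - shift)

print(f"beta: simulated {beta_simulated:.3f}, closed form {beta_exact:.3f}")
print(f"power (1 - beta): {1 - beta_exact:.3f}")
```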

 

 


 

9.8 p-value

 

My biostatistics professor would have an aneurysm (sorry, Dr. L.!) if he saw how we are going to treat the concept of p-value. And in the bigger picture, he would be right—it is a misunderstood and misused statistical measure, and deserves a fuller and richer treatment by experimenters and statisticians.

 

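On the other hand, the purpose of this book is to give you enough information to read massage research, not to turn you into a specialist in any given area of experimental design. So our strategy will be to understand the p-value well enough to use it the way most clinicians do when reading research articles, while noting that this, in itself, does not do full justice to the concept. In that everyday use, the p-value is the probability of getting results at least as extreme as the ones observed if the treatment really had no effect. When p is smaller than the α the researchers chose in advance (usually 0.05), they call the result statistically significant.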
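Here is one concrete way to compute a p-value that matches that everyday definition, sketched in Python: an exact permutation test. The idea is that if massage had no effect, the “massage” and “control” labels are arbitrary, so we try every possible way of relabeling the subjects and ask what share of those relabelings produce a difference in means at least as large as the one we actually saw. The blood pressure changes are invented, and this is just one of many ways to get a p-value, not necessarily what any study in this chapter used.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Try every way of choosing which pooled subjects get the "group A" label.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding doesn't drop exact ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Hypothetical changes in systolic blood pressure (mmHg), invented for illustration.
massage = [-8, -10, -7, -9, -11]
control = [-1, 0, -2, 1, -3]
p = permutation_p_value(massage, control)
print(f"p = {p:.4f}")
```

With every massage value below every control value, only the actual split and its mirror image are this extreme, so p = 2/252, or about 0.008, well under α = 0.05.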

 

9.9 Sampling

 

9.9.1 Power and sample size

 

Confidence interval and confidence level

 

In the political season (and at other times, too), we often see poll results reported as a set of percentages, plus or minus a particular margin of error.

 

Although it’s clear from the context that it means a little uncertainty in the exact results, now that we have discussed the normal distribution, we can understand it in a little more depth.

 

If the poll accurately reflects the population at large, and if we repeated the poll multiple times, we would expect the results to be about the same, with only a little bit of variation. The amount that the results can vary (positive or negative, since they can vary either way) is the margin of error. So if Candidate A is preferred by 68% of the population and Candidate B by 32%, with a margin of error of plus or minus 5 percentage points, either candidate’s number could be as much as 5 points too high or too low in this poll. In reality, then, Candidate A may have anywhere from 63% to 73%, and Candidate B anywhere from 27% to 37%. That band of variation around the reported percentage leads us into the concept of a confidence interval. (When two candidates’ ranges overlap, the race is called a “statistical dead heat”. And no margin of error can rescue a poorly run poll: garbage in, garbage out.)

 

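You can think of the confidence interval as a band or a range around the reported value, within which the “true” number is likely to lie. The confidence level, by contrast, reports how confident we are that the true result lies within that band. For a 95% confidence level, that means that if we repeated the poll many times and built an interval the same way each time, about 95% of those intervals would contain the true value.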
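For a simple poll, the margin of error can be sketched with the common normal-approximation formula, z × √(p(1 − p)/n), where z = 1.96 corresponds to a 95% confidence level. The text doesn’t say how many people were polled, so the sample size of 335 below is an assumption, picked because it gives roughly the ±5 points in our example. Real pollsters often use more elaborate methods; this is only an illustration of the idea.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a poll proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return margin, (p_hat - margin, p_hat + margin)


# Hypothetical poll: 335 respondents (an assumed size), 68% prefer Candidate A.
margin, (low, high) = proportion_interval(0.68, 335)
print(f"68% +/- {margin:.1%}  ->  {low:.0%} to {high:.0%}")

# Quadrupling the sample size halves the margin of error.
bigger_margin, _ = proportion_interval(0.68, 4 * 335)
print(f"with {4 * 335} respondents: +/- {bigger_margin:.1%}")
```

The second line shows why bigger polls report tighter margins: because n sits under a square root, you need four times the respondents to cut the margin in half.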

 

examples

 

 

κ (kappa)

Remember how back in Chapter 2, we discussed how, in order to talk scientifically to each other about bears, we first had to both agree that there really is a real-world referent bear? Kappa (the Greek letter κ) is the measurement we use to see how much different observers agree on the “bear”: the subject or entity under study.

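You can read kappa roughly like a percentage written in decimal form, with one important difference: kappa measures agreement beyond what chance alone would produce, so it is not the same as the raw percentage of cases on which observers agree. The highest (best) value kappa can have is perfect agreement, written as 1.00, and a kappa of 0 means the observers agree no more often than chance would predict. Examples of less-than-perfect agreement (κ less than 1.00) are 0.3, 0.5, and 0.7.

Assigning value judgments (“good”, “moderate”, “poor”) to those kappa values is a matter of interpretation, and different experts differ on what the numbers mean.

Let’s look at the next three studies on agreement in reflexology, first individually and then as a set, because they open a bigger question: how do we compare values across studies?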

The aim of this study was to investigate whether [reflexology] can be used as a valid method of diagnosis...Inter-rater reliability (kappa) scores were very low, providing no evidence of agreement between the examiners. CONCLUSION: Despite certain limitations to the data provided by this study, the results do not suggest that reflexology techniques are a valid method of diagnosis. (White 2000)

 

White is saying that the people who were using reflexology for diagnosis had very low agreement; that is, their diagnoses were very inconsistent from one person to the next. Presumably, White would agree with criticisms of reflexology theory on the basis that it is internally inconsistent, since it can lead to so much inter-observer variation. In other words, we can’t agree that there is a bear there.

 

We wanted to test the specific theory behind foot reflexology. Three reflexotherapists examined 76 patients of whom they had no previous knowledge...Interrater agreement, measured by weighted Kappa, ranged from 0.04 to 0.22, and was significantly better than chance (p < 0.05) for six parts of the body. The overall Kappa was 0.11 (95% CI: 0.08-0.14)...The statistical agreement may be better than pure chance, but is too low to be of any clinical significance. (Baerheim 1998)

 

(See? There’s our p and our confidence interval, as well!)

Baerheim does not dispute that the agreement among the three reflexotherapists is better than pure chance would lead us to expect, but is not convinced that this finding translates into anything that would be useful in real-world practice (clinical significance). So in other words, something seems to be there, but we can’t agree on whether or not it is a bear and what it means.

AIM: The purpose of this study was to test the reliability and validity of the reflexological diagnosis method. METHODS: Eighty patients from various clinics and departments in the Hillel Yaffe Medical Center, Hadera, were examined twice by two different reflexologists. The diagnostics that resulted from these examinations were compared with the conventional medical diagnostics of the same patients. In addition, the level of correlation between the two reflexological examinations was tested. RESULTS: Out of 18 body systems in 6 a statistically significant correlation was found between the conventional medical diagnosis and the two reflexological examinations. In 4 body systems, there was a statistically significant correlation between the conventional medical diagnosis and one out of the two reflexological examinations. The systems in which correlation was found are characterized by having a defined anatomic region. The examination of the significance of the diagnoses regarding the components of the body systems resulted in statistical significance in only 4 out of the 32 components. Between the two reflexological examinations, a statistically significant correlation was found in 14 out of the 18 body systems, and in only 15 out of the 32 system components. CONCLUSION: The reflexology method has the ability to diagnose (reliable and valid) at a systematic level only, and this is applicable only to those body systems that represent organs and regions with an exact anatomic location. (Raz 2003)

 

Raz found more agreement than either Baerheim or White, and drew the conclusion that in certain limited situations (systematic level, and only those systems with exact anatomical locations), reflexology diagnoses are reliably and validly consistent, not only with other reflexology diagnoses, but also with diagnoses from conventional medicine.

So can we put all these studies together into a meta-analysis? No: they are studying issues that are subtly different, and that difference is enough to make combining them a case of comparing apples and oranges. What we can do is think about them at a very abstract level, as “all of these studies address issues in inter-observer agreement among reflexologists, and they find varying degrees of consistency, from ‘very low’ to ‘reliable and valid at a systematic level only’”, but we can’t combine them into a meta-analysis on that basis.

So what do you think explains the different results? How would you design a study to get at and resolve the underlying issues? (I don’t have an answer for you waiting in the “Answers to Exercises” section; that’s a genuinely open-ended question for you to consider and come to your own conclusions about.)

Although the reflexology studies already represent a nice range of interpretations for you to look at, there are many more examples in the literature. Because the Cyriax method is another modality that is understudied in comparison with Swedish massage, let’s take a quick look at kappa in studies on Cyriax next. (Don’t worry about the statistic called “rho” mentioned in one study; we are not including it in “Just Enough Statistics”.)

See if you get the following interpretations out of the text below, and how you do so:

• Pellechia finds the Cyriax model highly reliable in evaluating shoulder lesions;

• Chesworth finds Cyriax’s “end-feel” technique highly reliable; in a different study, Hayes disagrees, finding it “questionable”.

James Cyriax’s approach to diagnosis and treatment of soft tissue disorders is frequently used by orthopaedic and sport physical therapists. The reliability of using Cyriax’s system to determine diagnostic categories, however, has not been established. The purpose of this study was to examine the intertherapist reliability of assessments made using Cyriax’s shoulder evaluation. Twenty-one cases of painful shoulder were evaluated independently by two experienced physical therapists. Therapists used a checklist to indicate their assessment of each case by selecting a specific shoulder lesion or by indicating that the case did not fit the Cyriax model. Cohen’s kappa statistic was used to measure intertherapist agreement. Therapists classified 19 of the 21 cases into the same diagnostic category for a percent agreement of 90.5%. The kappa value was .875, indicating “almost perfect” agreement. Both therapists classified the same four cases of painful shoulder as not fitting the Cyriax model of soft tissue examination. The results of this study show that the Cyriax evaluation can be a highly reliable schema for assessing patients with shoulder pain. (Pellechia 1996)

 

 

BACKGROUND AND PURPOSE: Findings related to joint function can be recorded with movement diagrams or by characterizing the “end-feel” according to the procedure described by Cyriax. Because both methods are used to classify pain and resistance in relation to joint range of motion (ROM), the purpose of this study was to simultaneously evaluate the reliability of these categorizations in a patient sample. SUBJECTS: Two physical therapists performed 2 assessments of passive lateral rotation of the shoulder in 34 patients. METHODS: Pain and resistance findings were recorded using movement diagrams and end-feel categories. Intraclass correlation coefficients (ICC[2,1]) were used to analyze the ratio (movement diagram) data, and kappa statistics (kappa) were used to analyze the categorical (end-feel) data. RESULTS: Intrarater ICCs varied from .58 to .89. Interrater ICCs for locating maximum pain and resistance in joint ROM varied from .85 to .91. Other interrater ICCs were lower (ICC = .34-.88). Intrarater kappa values for end-feel were moderate (kappa = .48-.59), and interrater kappa values were substantial (kappa = .62-.76). CONCLUSION AND DISCUSSION: Movement diagram measures conceptually related to the end of joint ROM and end-feel were highly reliable. This finding and the fact that additional end-feel categories were introduced in the study may partially explain the end-feel reliability findings. Consideration of their use in future studies may help to determine their clinical utility. (Chesworth 1998)

 

BACKGROUND AND PURPOSE. We explored the construct validity and test-retest reliability of the passive motion component of the Cyriax soft tissue diagnosis system. We compared the hypothesized and actual patterns of restriction, end-feel, and pain/resistance sequence (P/RS) of 79 subjects with osteoarthritis (OA) of the knee and examined associations among these indicators of dysfunction and related constructs of joint motion, pain intensity, and chronicity. SUBJECTS. Subjects had a mean age of 68.5 years (SD = 13.3, range = 28-95), knee stiffness for an average of 83.6 months (SD = 122.4, range = 1-612), knee pain averaging 5.6 cm (SD = 3.1, range = 0-10) on a 10-cm visual analogue scale, and at least a 10-degree limitation in passive range of motion (ROM) of the knee. METHODS. Passive ROM (goniometry, n = 79), end-feel (n = 79), and P/RS during end-feel testing (n = 62) were assessed for extension and flexion on three occasions by one of four experienced physical therapists. Test-retest reliability was estimated for the 2-month period between the last two occasions. RESULTS. Consistent with hypotheses based on Cyriax’s assertions about patients with OA, most subjects had capsular end-feels for extension; subjects with tissue approximation end-feels for flexion had more flexion ROM than did subjects with capsular end-feels, and the P/RS was significantly correlated with pain intensity (rho = .35, extension; rho = .30, flexion). Contrary to hypotheses based on Cyriax’s assertions, most subjects had noncapsular patterns, tissue approximation end-feels for flexion, and what Cyriax called pain synchronous with resistance for both motions. Pain intensity did not differ depending on end-feel. The P/RS was not correlated with chronicity (rho = .03, extension; rho = .01, flexion). Reliability, as analyzed by intraclass correlation coefficients (ICC[3,1]) and Cohen’s kappa coefficients, was acceptable (≥ .80) or nearly acceptable for ROM (ICC = .71-.86, extension; ICC = .95-.99, flexion) but not for end-feel (kappa = .17, extension; kappa = .48, flexion) and P/RS (kappa = .36, extension; kappa = .34, flexion). CONCLUSION AND DISCUSSION. The use of a quantitative definition of the capsular pattern, end-feels, and P/RS as indicators of knee OA should be reexamined. The validity of the P/RS as representing chronicity and the reliability of end-feel and the P/RS are questionable. More study of the soft tissue diagnosis system is indicated. (Hayes 1994)

 


9.10 Break

 

This chapter and the previous one were really dense in terms of the material we covered. But the hardest part of learning about reading research is now over, and if you have stuck with it this far, I promise you that you will find the rest of the book to be smooth sailing in comparison, building readily on what you have already learned.

 

Since you’ve worked so hard on the methods and statistics parts, here’s another nice bear picture for you to look at while we take a well-earned break.

 

Figure 9.20: text.

 

9.11 Exercise 1:

 

9.12 Exercise 2:

 

9.13 Next steps

 

Now that we know what the methods and the most important (for our purposes) statistics are, let’s move on to look at how study data is reported in the “Results” section.

=====================================================================================================

 

 

 

 

 

 

 

Trish Greenhalgh, in her excellent book How to Read a Paper (2001), puts her own twist on the meaning of evidence-based medicine, defining it as: “…the enhancement of a clinician’s traditional skills in diagnosis, treatment, prevention, and related areas through the systematic framing of relevant and answerable questions and the use of mathematical estimates of probability and risk” (p.1). This adds two new concepts to our definition of evidence-based practice. First, practitioners need to learn how to ask clinical questions that are answerable, something that is not as straightforward as it initially sounds. Secondly, we need to learn to understand the components of research that scare many of us the most – the statistics. Statistics are part of the researcher’s effort to demonstrate the extent to which the data being presented is valid. There are many simple ways of evaluating statistics without becoming a statistician. (Dryden & Achilles)

 

  • Statistics: mean, median, mode, standard deviation, power
  • Average

  • Mean

  • Median

  • Mode

  • Percentile

  • Standard deviation

  • False positive error (type I error)

  • False negative error (type II error)

  • α (alpha)

  • β (beta)

  • p

  • Confidence interval

  • Sampling

  • Power and sample size

  • κ (kappa)

 

Objectives for this chapter:

Do

  • name and explain the most common and most important statistical measures used in articles on massage research

 

Know

  • Statistics: mean, median, mode, standard deviation, power
  • Average
  • Mean
  • Median
  • Mode
  • Percentile
  • Standard deviation
  • False positive error (type I error)
  • False negative error (type II error)
  • α (alpha)
  • β (beta)
  • p
  • Confidence interval
  • Sampling
  • Power and sample size
  • κ (kappa)

 

Appreciate

 

Descriptive statistics

Challenge

Our data in studies typically comes from measurements on individuals.

How can we use this individual data to make meaningful statements about populations, so that we can generalize the knowledge we gain from research studies?

 

As discussed in Chapter 2, "normal" is a word that has a specific, neutral meaning in science, yet can have strong connotations in everyday language.

In scientific use, "normal" means "typical, usual, or according to the rule or standard".

Generally, very few people in the total population are extreme in some physical measurement; for most measurable physical qualities, most people are fairly close to a typical value.

For example, consider the birth weight of all babies born in developed countries. In this group, there will be a few big babies, weighing 8½ to 9 pounds or more. There will also be a few small babies, weighing 6 to 6½ pounds or less.

Unless some sort of problem occurs, such as gestational diabetes or premature birth, most babies weigh about 6½ to 8 pounds at birth.

This weight is called a "normal" birth weight and takes its name from where it is found on the graph of a normal distribution in a defined population or group.

In the graph for this example, however, the population being considered is more narrowly defined: girl babies born in Europe.

Source: http://basicmathsuccess.files.wordpress.com/2012/02/birth-weights-bell-curve-1.jpg?w=640&h=474 accessed 1 May 2012

 

This image shows a graph representing birth weight as a normal distribution, also called a "bell curve".

In this graph, the vertical axis describes the number of babies, and the horizontal axis describes the birth weight.

The relatively few very small and very large babies are the small quantities shown at the extreme left and right sides of the graph (forming the small “tail” at either end).

The larger number of 6½- to 8-pound babies makes up the big “bump” or curve at the center of the graph.

The group making up the largest part of the distribution represents the normal values. In this sense, normal means “most commonly found.”

Data values for many natural phenomena tend to form this normal distribution, with most of the numbers in the middle and a few extreme values at either end, when they are not subjected to some purposeful manipulation (such as a massage treatment). This pattern can therefore serve as a baseline: researchers can compare the distribution of data after such a treatment with the distribution before it, to see whether the two differ significantly.

Recognition of this possibility lies at the heart of some of the most useful and powerful concepts in mathematics and science.

While all bell curves have the same basic features, they can differ in how the data they represent is spread out (dispersed). Figure 4-4 shows two bell curves that illustrate different normal distributions. In a steep curve, the data is clustered closely together around the middle; in a gently sloped curve, the data is spread more widely.

Median

Since extreme values can cause the mean to give a misleading picture of the data it is intended to describe, it is often useful to apply another approach to the concept of averaging. The median can provide insight into the distribution of the data that the mean often cannot.

Median literally means “in the middle.” The strip that runs directly down the middle of a highway, like in this image from Dublin, Ireland, is called a median.

 

This image from Gray's Anatomy shows the median antibrachial cutaneous nerve running down the middle of the upper arm in cutaway.

 

 

 

In statistics, the median represents an average of the data that has been calculated in a different way from the mean.

Imagine first sorting data values into a list that runs from high to low.

Then imagine drawing a line at the exact midpoint of the list. That line represents the median: the value above which half the data falls and below which the other half of the data falls.

The median for a set of test scores is shown in Table 4-1. In this example, the median score is 65 (Susan’s score). Four students have scores higher than 65 and four others have scores lower than 65, so Susan’s score represents the median.

In Figure 4-3, the median is shown by the line that cuts the bell curve exactly in half, showing that half the babies had birth weights to the left of the center line (lower than the median) and half to the right (higher than the median).

Let’s find the median score for the before-massage group in Barlow’s sit-and-reach study. First, we’ll rearrange our table in descending order, so that the sit-and-reach scores go from highest to lowest. Note that sometimes in the real world things are not cut-and-dried: with this data there are 4 scores above the median score and 5 scores below it. The scores for subject 10 and for subject 6 were as close to a middle score as we could get. Therefore the median value is only roughly in the middle.

 

Figure 9.2: Sit-and-reach scores in centimeters (Barlow 2004).



Note that here there are exactly 4 scores above and 4 scores below the score of 17, which is the score for subjects 7, 6, and 2. So in this case the median value is exactly in the middle.
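The sorting-and-counting procedure for finding a median can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores are hypothetical (they are not Barlow’s data, which appear only in Figure 9.2), `median_by_hand` is an illustrative helper name, and the even-count rule (average the two middle values) is the usual convention, also used by Python’s standard `statistics.median`.

```python
import statistics


def median_by_hand(values):
    """Sort the values from highest to lowest and take the middle one.

    With an odd number of values there is one middle value. With an even number
    there are two, and the usual convention is to report their average.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical sit-and-reach scores in cm (NOT Barlow's data).
odd_scores = [17, 22, 10, 25, 17, 14, 19, 12, 21, 17, 15]  # 11 scores, three tied at 17
even_scores = [22, 19, 17, 15, 25, 12, 21, 14]             # 8 scores

print(median_by_hand(odd_scores), statistics.median(odd_scores))    # 17 17
print(median_by_hand(even_scores), statistics.median(even_scores))  # 18.0 18.0

# One extreme value pulls the mean upward much more than the median.
with_outlier = even_scores + [80]
print(statistics.fmean(with_outlier), statistics.median(with_outlier))  # 25.0 19
```

The last line illustrates why the median is useful when extreme values are present: adding a single score of 80 raises the mean to 25 cm, while the median only moves from 18 to 19 cm.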

Here is another example of a researcher reporting median values. Note: just focus on the black text; we’ll get to percentiles in a little bit.

* Data are reported as median (25th percentile, 75th percentile). Fifty-three patients (30 controls and 23 minute spheres) completed the study. ** Morphine requirements (47 mg [27, 58] vs ***41 mg [25, 69]) and pain scores (29.5 mm [16, 59] vs 40 mm [22, 58]) were similar in the control and acupressure groups. (Sakurai 2003)


 

 

 

 

Mode

In a set of data, the mode is the most frequently occurring value. This is a less commonly used way of representing the average of a group of data values. For example, if a group of scores representing reduction in pain levels among eight people who received massage are 1, 2, 4, 5, 5, 5, 6, and 7, the mode is 5, because it occurs more frequently than any of the other scores.

Not all sets of data have a mode; if the pain scores in that group were 1, 2, 3, 4, 5, 6, 7, and 8, all the values occur exactly once. Since no value occurs more often than the other values, there is no mode in that data set. A data set can also have more than one mode. If the scores had been 1, 2, 2, 2, 4, 7, 7, and 7, the data would have had two modes, 2 and 7, because each occurs three times, more often than any other value.


The mode represents the average of data in a different way than either the mean or the median does. The mode is the value which occurs most frequently in the data. In other words, it is the value “most typical” of the population.

A data set can have no mode (if all the values are unique, although some authors consider this to mean that every value is a mode), one mode (if one particular value is the most frequently occurring value), or more than one mode (if multiple values are tied for the most frequently occurring value).
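The three pain-score examples in this section can be checked with a short Python sketch. `modes` is an illustrative helper (not a standard function) that follows this chapter’s convention of reporting no mode when every value occurs only once; the standard library’s `statistics.multimode` follows the other convention and returns every value in that case.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every most-frequent value, sorted; [] if no value repeats.

    Returning [] for an all-unique data set follows the "no mode" convention
    used in this chapter.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 4, 5, 5, 5, 6, 7]))  # [5]
print(modes([1, 2, 3, 4, 5, 6, 7, 8]))  # []  (no mode)
print(modes([1, 2, 2, 2, 4, 7, 7, 7]))  # [2, 7]  (two modes)

# Python's standard library takes the other convention: every value is a mode.
print(multimode([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```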

Figure 9.7: text.

In Figure 9.7, one data set has two modes, 14 and 13, and the other has one mode, 17. You won’t come across the mode very often, because it is meaningful in a narrower range of research situations than the mean or percentiles.

The statistics we have covered so far are useful, but to get a clearer picture of what all the data looks like, there are more refined tools we can use to understand the relationships among the values. Standard deviation (SD) is one of those tools.




 

Treatment Approach

The mode responses for the best approach to treatment for all conditions were 3 and 4, indicating an “Equal Mix” or “Mostly CAM” (Figure 4).

 

Footracer KG, Monaghan M, Wisniewski NP, Mandel E. Attitudes and practices of massage therapists as related to conventional medicine. Int J Ther Massage Bodywork. 2012;5(1):18-24. PMID: 22553480

 

 

 

Percentiles

A researcher may report data by citing the percentile of a particular value. The percentile represents a value on a scale of 100 that indicates the percent of a distribution that is equal to or below that value. For example, if the birth weight of a baby is reported to be in the 99th percentile, it means that 99% of the other birth weight values are lower than that value.

An example of how median and percentiles of data are reported is shown in Table 4-2. This table summarizes the results of a study (Sakurai et al.) in which researchers were looking for an alternative to morphine for postoperative pain relief in patients who had abdominal surgery. They were evaluating whether the use of minute sphere acupressure had any effect on postoperative pain levels and morphine requirements.

As shown in this table, 53 patients completed the study (30 in the control group and 23 in the treatment group). Data shown in columns for both groups reflect the median, 25th percentile, and 75th percentile values. (Remember that the median represents the value for which half the values are higher and half are lower, so another way of referring to the median is as the 50th percentile.) As you can see, the postoperative morphine requirements and pain scores were similar in both the control and treatment groups, so the study concluded that the acupressure treatment had no significant effect on either measurement. These findings were stated in the research article as:

Morphine requirements (47 mg [27, 58] vs 41 mg [25, 69]) and pain scores (29.5 mm [16, 59] vs 40 mm [22, 58]) were similar in the control and acupressure groups.2

 

As shown in this excerpt, it is common for researchers to report key measurements in the shorthand format of listing the median value, followed by the 25th percentile and 75th percentile values in brackets, for the control group and any treatment groups.
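The median-and-percentiles shorthand can be reproduced in a few lines of Python. This is a minimal sketch with hypothetical dose values (not Sakurai’s data); `median_iqr` and `report` are illustrative helper names. It relies on the standard library’s `statistics.quantiles`, whose default “exclusive” interpolation is only one of several conventions, so other software may report slightly different 25th and 75th percentiles for the same numbers.

```python
import statistics


def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile).

    statistics.quantiles(..., n=4) returns the three cut points that split the
    data into four equal-sized groups: the 25th, 50th and 75th percentiles.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2, q1, q3


def report(label, values, unit):
    """Format results as: median [25th percentile, 75th percentile]."""
    med, q1, q3 = median_iqr(values)
    return f"{label}: {med:g} {unit} [{q1:g}, {q3:g}]"


# Hypothetical morphine doses (mg) for nine patients; NOT Sakurai's data.
doses = [20, 27, 31, 40, 47, 50, 55, 58, 70]
print(report("Morphine requirement", doses, "mg"))
# Morphine requirement: 47 mg [29, 56.5]
```

Read the output the same way as the excerpt: half of these hypothetical patients needed less than 47 mg, a quarter needed less than 29 mg, and three quarters needed less than 56.5 mg.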

If a value is in the 99th percentile, that means that 99% of the values are lower.

For a value in the 60th percentile, 60% of the values are lower; for a value in the 30th percentile, 30% of the values are lower, and so forth.

Now let’s revisit the excerpt from Sakurai 2003 on the previous page. Note that the print that was previously white is now bold black, because we are going to discuss it now.

Figure 9.3: text.

Figure 9.4: text.

*Data are reported as median (25th percentile, 75th percentile). Fifty-three patients (30 controls and 23 minute spheres) completed the study. **Morphine requirements (47 mg [27, 58] vs ***41 mg [25, 69]) and pain scores (29.5 mm [16, 59] vs 40 mm [22, 58]) were similar in the control and acupressure groups. (Sakurai 2003)

Interpreting this is not as difficult as it first looks. The breakdown below shows how the shorthand translates.

* “Data are reported as median (25th percentile, 75th percentile).”

Figure 9.5: text.

Figure 9.6: text.

**“Morphine requirements (47 mg [27, 58] vs *** 41 mg [25, 69])”


All of this information can be put into a chart to make the results clearer. In each column, the value shown is the one below which 25%, about 50% (the median), or 75% of that group’s values fall.

                                           Morphine requirement       Pain score
Group                           Patients   25%     Median   75%       25%     Median    75%
Control group                   30         27 mg   47 mg    58 mg     16 mm   29.5 mm   59 mm
Treatment group (minute         23         25 mg   41 mg    69 mg     22 mm   40 mm     58 mm
sphere acupressure)
Total                           53

Table 9-7



Boxplots

A boxplot (also known as a box-and-whiskers diagram) is a method of graphically summarizing groups of numerical data. The five statistical measures it depicts are the median, the upper and lower quartiles, and the minimum and maximum data values. The boxplot can also show any outliers (values that lie well outside the main data grouping).

Figure 4-6 is a boxplot that represents the results of Sakurai’s study in regard to patients’ morphine consumption. Each box represents the 25th percentile through the 75th percentile values, and the “whiskers,” or extended lines on either end of the box, represent the highest and lowest observed values. The line through the middle of the box represents the median (50th percentile). Note that the overall positioning of the treatment group box on the graph is similar to that of the control group box; the treatment group median value is lower, but not low enough for the difference in morphine consumption between the groups to have been statistically significant.

Figure 4-7 shows how a boxplot, if tipped onto its side, represents much the same distribution of data as that shown in a bell curve. The values inside the box are roughly equivalent to those contained in the “normal” section of a bell curve. The values represented by the “whiskers” extending from the end of each box (the highest and lowest observed values of the total data range) are roughly equivalent to the tails shown in the bell curve.
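The five numbers a boxplot draws can be computed directly. This is a minimal sketch with hypothetical dose values; `five_number_summary` and `tukey_outliers` are illustrative names. The outlier test uses Tukey’s 1.5 × IQR rule, one common convention and not necessarily the one used in Figure 4-6. In boxplots drawn with that rule, the whiskers stop at the most extreme values that are not outliers, rather than at the absolute minimum and maximum.

```python
import statistics


def five_number_summary(values):
    """(minimum, 25th percentile, median, 75th percentile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def tukey_outliers(values, k=1.5):
    """Values more than k * IQR below the box or above it (Tukey's common rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical morphine doses in mg, with one unusually high value.
doses = [20, 27, 31, 40, 47, 50, 55, 58, 140]
print(five_number_summary(doses))  # (20, 29.0, 47.0, 56.5, 140)
print(tukey_outliers(doses))       # [140]
```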

Data Range

The range of data about a population includes all the values, from the lowest to the highest. The range of normal values refers to the values from lowest to highest that are considered to be normal. In the following example, researchers used a range of normal values for heart rate and systolic blood pressure to evaluate complementary pain therapies for safety in a population of heart surgery patients (Kshettry et al.).

Complementary therapies (touch, music) are used as successful adjuncts in treatment of pain in chronic conditions. Little is known about their effectiveness in care of heart surgery patients. Our objective is to evaluate feasibility, safety, and impact of a complementary alternative medical therapies package for heart surgery patients…. Decreases in heart rate and systolic blood pressure in the complementary therapies group were judged within the range of normal values…. Complementary medical therapy was not associated with safety concerns and appeared to reduce pain and tension during early recovery from open heart surgery.3

 

While massage is known to lower heart rate and blood pressure in promoting relaxation, the question Kshettry’s team was interested in was whether such reductions were safe for this group of patients, or whether they would cause those values to drop too low. Even though the therapies studied caused the heart rate and the systolic blood pressure to decrease, those measures never fell below the lowest normal value; that is, they remained within a normal range. For that reason, the team concluded that there were no safety concerns regarding the use of complementary therapies among this population of heart surgery patients.
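The safety reasoning above amounts to a range check: find the lowest and highest observed values and compare them with the bounds of the normal range. In this minimal sketch the heart rates are hypothetical (not Kshettry’s data), the helper names are illustrative, and the bounds of 60 to 100 beats per minute are an assumed example (a commonly quoted resting range for adults), not necessarily the criterion Kshettry’s team applied.

```python
def data_range(values):
    """Lowest value, highest value, and the range (highest minus lowest)."""
    low, high = min(values), max(values)
    return low, high, high - low


def within_normal_range(values, normal_low, normal_high):
    """True if every observed value lies inside [normal_low, normal_high]."""
    low, high, _ = data_range(values)
    return normal_low <= low and high <= normal_high


# Hypothetical post-therapy heart rates (beats per minute); NOT Kshettry's data.
heart_rates = [88, 82, 79, 91, 75, 84]
# ASSUMED normal bounds, for illustration only.
print(data_range(heart_rates))                    # (75, 91, 16)
print(within_normal_range(heart_rates, 60, 100))  # True
```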

Range of means

Notice that two concepts learned separately (mean and range) can be combined to form a more complex measure, the range of means. For any one set of data values, there is only one arithmetic mean. When a range of means is shown, there must be multiple sets of values corresponding to multiple aspects being tracked or measured. In the research literature, the upper and lower bounds of the range are often included in parentheses next to the mean, as shown in the following example:

Facial Grading Scale change scores showed that experimental group (27.5 (20-43.77)) improved significantly more than the control group (16.5 (12.2-24.7)).4

 

In this study Manikandan examined the effects of a treatment called facial neuromuscular re-education, comparing it to conventional therapeutic measures to determine whether one or the other was more effective in treating Bell’s palsy, a type of paralysis that affects one side of the face, involving the facial nerve on that side. In this case, effectiveness referred to improvements in facial symmetry by relieving the paralysis on one side of the face. The research team used Facial Grading Scores to measure patients’ improvement and found that the experimental group experienced a mean improvement of 27.5 across all items on the scale, with that mean representing a data set whose values range from 20 at the lowest to 43.77 at the highest. The control group experienced a mean improvement of 16.5, with that mean representing a data set whose values range from 12.2 at the lowest to 24.7 at the highest. Therefore, the study concluded that individualized facial neuromuscular re-education is more effective in improving facial symmetry in patients with Bell’s palsy than conventional therapeutic measures.

Variance and Standard Deviation

Variance is a statistical measure that describes how spread out, or dispersed, the data values are from the mean. Since the mean is a type of average of all the data as a whole, the closer the individual values are to the mean, the more representative the mean is of the entire data set.
 

Example 1 – The mean of the values 2, 3, 3, 3, and 4 is 3 (the total of all the numbers is 15, and 15 divided by 5 = 3).

Example 2 – The mean of the values 1, 2, 3, 4, and 5 is 3 (the total of all the numbers is 15, and 15 divided by 5 = 3).

 

The mean in Example 1 (3) is more representative of the overall data than is the mean in Example 2 (also 3). This is because the values in the second data set are farther from the mean than are those in the first set, which has no ones or fives.

The variance measure is used less frequently than the standard deviation (SD), which is represented by the mathematical symbol σ (pronounced SIG-ma). The SD is calculated using a fairly complex formula, and as a reader of research literature, you don’t need to know how to do the calculation. It is important to understand that the principle behind the SD is exactly the same as that behind variance (in fact, the SD is the square root of the variance): it describes how far away from the mean the actual data for a population is distributed. The smaller the standard deviation, the closer the data points tend to be to the mean; such data would be described as having a minimal amount of dispersion. Conversely, a larger SD means that more data points are farther away from the mean (more widely dispersed), and because of those extreme values, the mean is not as representative of that set of data.
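Although you don’t need to calculate the SD yourself, a short sketch shows how variance and SD separate Example 1 from Example 2, even though both have a mean of 3. It uses Python’s standard `statistics` module; `spread` is an illustrative helper name. `pvariance` and `pstdev` use the population formulas (dividing by n), while research papers usually report the sample SD, which divides by n − 1 (`statistics.stdev`) and is therefore slightly larger.

```python
import statistics


def spread(data):
    """Mean, population variance, population SD, and sample SD of a list of numbers."""
    mean = statistics.fmean(data)
    variance = statistics.pvariance(data)  # average squared distance from the mean
    sd = statistics.pstdev(data)           # square root of the variance
    sample_sd = statistics.stdev(data)     # divides by n - 1, as studies usually report
    return mean, variance, sd, sample_sd


example_1 = [2, 3, 3, 3, 4]
example_2 = [1, 2, 3, 4, 5]

for name, data in (("Example 1", example_1), ("Example 2", example_2)):
    mean, variance, sd, sample_sd = spread(data)
    print(f"{name}: mean={mean:.1f} variance={variance:.2f} SD={sd:.2f} sample SD={sample_sd:.2f}")
# Example 1: mean=3.0 variance=0.40 SD=0.63 sample SD=0.71
# Example 2: mean=3.0 variance=2.00 SD=1.41 sample SD=1.58
```

The identical means hide the difference; the SDs (0.63 versus 1.41) show that the values in Example 2 sit more than twice as far from the mean, on average.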

The SD can be “plus or minus,” meaning that the data can be dispersed on either side of the mean, higher or lower. In a population with a normal distribution (bell curve), the data is dispersed symmetrically around the mean, as shown in Figure 4-8. In a normal distribution, about 68% of the values for that population lie within 1 SD above or below the mean. Similarly, about 95% of the population is within 2 SD in either direction of the mean, and about 99.7% is within 3 SD in either direction.

Data is often reported in the format mean ± SD (read “plus or minus”). For example, a measurement shown in inches as 4 ± 1.5 indicates a mean of 4 inches, with a standard deviation of 1.5 inches.
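The 68%, 95%, and 99.7% figures come from the area under the normal curve, and they can be checked with Python’s standard `statistics.NormalDist`. In this sketch, `share_within` is an illustrative helper name; the standard normal (mean 0, SD 1) is used by default because the percentages are the same for any normal distribution once distances are measured in SDs.

```python
from statistics import NormalDist


def share_within(k, dist=NormalDist(mu=0, sigma=1)):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
# within 1 SD of the mean: 68.3%
# within 2 SD of the mean: 95.4%
# within 3 SD of the mean: 99.7%
```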

How the mean and the SD are used in the reporting of research results can be seen in the following example from a project that studied the effects of coconut and mineral oil in infant massage (Sankaranarayanan).5

The infants massaged with coconut oil showed an average weight gain in grams at 31 days of 2396.77 ± 208.94.

 

This tells the reader:

The average weight of the infants in that group is 2396.77 grams.

Assuming the weights are roughly normally distributed, about 68% of those infants’ weights fall in a range from 2187.83 grams (2396.77 grams – 208.94 grams, or 1 SD less than the mean) to 2605.71 grams (2396.77 grams + 208.94 grams, or 1 SD more than the mean).
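The 1 SD range is simple arithmetic on the reported mean and SD. The sketch below (with `sd_band` as an illustrative name) reproduces it and extends it to 2 SD. It assumes the weights are roughly normally distributed, since the 68% and 95% figures only hold for a bell-shaped distribution; the excerpt itself does not state this.

```python
def sd_band(mean, sd, k=1):
    """Lower and upper limits of the band mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


mean_g, sd_g = 2396.77, 208.94  # coconut-oil group, as reported in the excerpt
for k, share in ((1, "about 68%"), (2, "about 95%")):
    low, high = sd_band(mean_g, sd_g, k)
    print(f"{share} of weights (if roughly normal): {low:.2f} g to {high:.2f} g")
# about 68% of weights (if roughly normal): 2187.83 g to 2605.71 g
# about 95% of weights (if roughly normal): 1978.89 g to 2814.65 g
```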

Power and sample size

One limitation often found in massage research methods relates to study size—you’ll find statements in the literature like, “Most studies contain methodological limitations including … few subjects …”,1

 

or

“These conclusions are limited by the small sample size of the included [research studies].”2

 

Clearly, when it comes to results, something methodologically important is going on with small studies. Additionally, you may have heard people say a massage research study needs about 35 or 40 people, more or less, to have a large enough sample size—what’s up with that? What’s so special about that number?

Like the indicator of statistical significance, p, discussed earlier, the power of a test is a probability. In this case, it is the probability that the test will not make a Type II error (false negative) by missing a treatment effect that is really there. Statistical significance deals with the opposite risk. When researchers use a significance threshold of 0.05 (p < 0.05), they are accepting that, if the treatment really had no effect, about 1 time in every 20 that the study were rerun it would still produce a result that looks significant: a Type I error (false positive), in which you think you are observing a real effect when it was really due to chance.

While there is no universal standard for power, you’ll often see 0.80 as a target that researchers aim for. It means that they expect that, if there is a treatment effect of the size they are looking for, they will detect it 80% of the time, or 4 times out of 5. (Remember, for both p and power, when the value is represented as a decimal number, multiply it by 100 to get the percentage it represents.) The risks of false negative and false positive errors can never be totally eliminated, but judicious use of statistical significance and of power allows both of those risks to be managed, resulting in a certain degree of confidence in the validity of the study results.
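To see what a false positive rate and power mean in practice, the sketch below simulates many hypothetical two-group studies and counts how often each one reaches p < 0.05. Assumptions, all for illustration: outcomes are normally distributed with a known SD of 1, so a simple z-test stands in for the t-test most studies would use; the effect size is measured in SD units; and 63 participants per group is an arbitrary hypothetical study size. `z_test_p_value` and `rejection_rate` are illustrative helper names.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided p-value for a difference in means, ASSUMING the population SD is known."""
    se = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_effect, n_per_group, alpha=0.05, n_studies=2000, seed=1):
    """Fraction of simulated studies that reach p < alpha.

    true_effect = 0 estimates the Type I (false positive) error rate;
    true_effect > 0 estimates power (1 minus the Type II error rate).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        if z_test_p_value(treated, control) < alpha:
            hits += 1
    return hits / n_studies


fp_rate = rejection_rate(0.0, 63)  # no real effect
power = rejection_rate(0.5, 63)    # real effect of 0.5 SD
print("false positive rate:", fp_rate)  # roughly 0.05
print("power:", power)                  # roughly 0.80
```

With no real effect, about 1 simulated study in 20 still looks “significant”; with a real effect of half an SD and 63 people per group, roughly 4 studies in 5 detect it.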

The ideas of statistical power, sample size, and the null hypothesis are tightly linked to each other, and to considerations presented in the Methods section. For reasons we’ll get deeper into in a later discussion, researchers look at the evidence to see whether it calls for rejecting the null hypothesis and supporting their own hypothesis. For example, if a researcher hypothesizes, like Jönhagen’s (“Sports Massage After Eccentric Exercise”) team did, that “Sports massage can improve the recovery after eccentric exercise,”3 then the null hypothesis would be something like “Sports massage has no effect on recovery after eccentric exercise.” All of these concepts come back, ultimately, to whether to accept or reject the null hypothesis.

As it happens, Jönhagen’s team did end up accepting the null hypothesis and rejecting their research hypothesis, because they found that the massage had no effect on their measurements of quadriceps pain, strength, or function after the exercise. (Strictly speaking, statisticians say that a study “fails to reject” the null hypothesis rather than “accepts” it, because failing to find an effect does not prove that no effect exists.) We’ll get back to the larger implications of those findings toward the end of this chapter, but here, we’ll just talk about the null hypothesis. A goal of a research study is to determine correctly whether to accept or reject the null hypothesis: neither to accept it mistakenly (false negative) nor to reject it mistakenly (false positive).

To see how that works in practice, we’ll switch from sports massage to cardiac surgery for a moment, since a particular research article demonstrates clearly how the researchers calculated a power analysis for their study.

Hattan’s (“The Impact of Foot Massage and Guided Relaxation Following Cardiac Surgery: A Randomized Controlled Trial”) research team investigated whether foot massage and guided relaxation promoted calmness (among other measures) in cardiac surgery patients. Their description of how they determined the ideal sample size for their study points to the multiple factors involved: “A post hoc [carried out after the study] power analysis test suggested that a sample size of 45 would be required to detect a difference of the size observed with an acceptable level of Type II error [false negative] (power = 0.8).”4 From this statement, we can see that statistical power has to do with detecting an effect, with the size of a sample, and with how much risk of error we’re willing to tolerate. In the literature, you’ll often see it written in a much shorter way, but Hattan’s description shows the details of what is involved in a power analysis: sample size, effect size, and acceptable tolerance of error.

One way to think of it is: how large a study population do you need to make sure you see an effect that is really there, so that you don’t make a false negative error by missing it? If the effect is large, you probably don’t need as many people to see it as you would for a small effect. A small effect is easy to miss, so you improve your chances of seeing it by looking for it in more people; a large effect will probably show up more dramatically, and you can see it in fewer people. For that reason, increasing sample size is a very common way of increasing the power of a test.
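To make the link between effect size and sample size concrete, here is a minimal Python sketch using the common normal-approximation formula for comparing the means of two equal-sized groups. It is an illustration under stated assumptions, not the method Hattan’s team or any other study used: the function name `n_per_group`, the two-sided significance level of 0.05, and the example effect sizes (Cohen’s conventional 0.8, 0.5, and 0.2 for large, medium, and small) are all choices made for this example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants needed in EACH of two groups.

    Normal-approximation formula for comparing two means:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    where d is the standardized effect size (difference in means / SD).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided false positive threshold
    z_power = z.inv_cdf(power)          # guards against false negatives
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)                 # round up: you can't recruit part of a person


for d in (0.8, 0.5, 0.2):  # hypothetical "large", "medium", "small" effects
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

With power = 0.80, the sketch gives roughly 25, 63, and 393 participants per group: shrinking the effect you are looking for sharply increases the number of people needed to see it. Dedicated power-analysis software based on the t distribution usually reports slightly larger numbers (for example, about 64 rather than 63 per group for an effect size of 0.5).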

So where did that often-mentioned number of 35–40 for massage studies, referred to earlier, come from? It’s an estimate that probably came out of one particular study as having sufficient power in that context, and was then accidentally generalized into a more universal number that is sometimes quoted as applying to many massage research studies. But since a sufficiently large sample size depends on the size of the effect being looked for, and on how much risk of false negative error the researchers are willing to accept, it really depends on the question being researched. When researchers design a study, they put a lot of time and effort into the question of how many participants to include, and they consult statisticians to determine that number, because they know that funding agencies and peer reviewers will (or, at least, should) examine it carefully to determine whether they’ve gotten it right for their purposes.

There’s no “one size fits all” number that massage research studies should have to ensure sufficient power. Instead of trying to come up with such a number for all studies, a better strategy is to follow the researcher’s logic, as explained in the article, for why that particular number was right—ensured sufficient power—for that study on its own terms. If the researchers’ explanation of how the sample size was chosen makes sense, it’s probably worth trusting for purposes of evaluating that article. If it doesn’t make sense, or if it is not explained at all, it may indicate a problem for interpreting the study’s results.


Average

The average is an attempt to describe qualities of a group by combining qualities of individual members of the group. The mean, median, and mode describe different ways of averaging, which tell something about the distribution of those individual qualities or values.

Mean

Although you may not have heard it referred to by that name, you’re already familiar with the concept of mean: it is the kind of average commonly seen in school grading. To get the mean, you add all the results together, and then divide by the number of results.
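As a quick check of the arithmetic, this sketch computes the mean of the five final exam grades from Figure 9-1 (85, 90, 75, 65, and 70). Only the numbers come from the text; the Python is just an illustration of "add them up, then divide by how many there are."

```python
# Final exam grades from Figure 9-1 (John, Janet, Carmen, Michael, Sally)
grades = [85, 90, 75, 65, 70]

total = sum(grades)          # add all the results together
mean = total / len(grades)   # divide by the number of results

print(total, mean)  # 385 77.0
```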

 

Example from the literature:

Barlow 2004 investigated whether a single massage would alter the flexibility of the hamstring in physically active young men, as measured by the sit-and-reach test. He included his data in Table 1, so we can calculate the mean of all the sit-and-reach scores for the subjects (1) before and (2) after the massage by adding all the values in the appropriate column, and then dividing by 11 (the number of subjects in the study):

 

 

 

The disadvantage of the mean is that it can’t tell you about extreme values in the data, or how any individual compares to the group, except in the most crudely approximate way. In order to examine this limitation further, let’s set up our own table including the mean score (shown in the callouts in table 1 on the previous page). In the last column, observe the difference in score for each subject from the mean score.
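Barlow’s individual scores aren’t reproduced here, so this sketch uses the five exam grades from Figure 9-1 as a stand-in to build the "difference from the mean" column. Notice that the differences add up to zero. That is why simply averaging them can’t tell us how spread out the data are.

```python
grades = {"John": 85, "Janet": 90, "Carmen": 75, "Michael": 65, "Sally": 70}
mean = sum(grades.values()) / len(grades)  # 77.0

# Difference of each score from the mean (positive = above average)
differences = {name: score - mean for name, score in grades.items()}
for name, diff in differences.items():
    print(f"{name:8} {grades[name]:3}  {diff:+.1f}")

# Above-average and below-average differences always cancel out exactly,
# so their plain average is zero and says nothing about spread.
print(sum(differences.values()))  # 0.0
```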

Mode

 

 

 

Figure 9.7: text.

The statistics we have covered up until now are useful, but in order to get a clearer picture of what all the data looks like, there are more refined tools we can use to understand the relationships among the values. Standard deviation (SD) is one of those tools.

Preparing to discuss standard deviation

Standard deviation is probably the hardest concept we are going to cover. But it is worth the effort, because the concept is so valuable and applies to so many different situations. So let’s break it up into small pieces, tackle them one at a time, and see how we can use it, not only in reading massage research, but in many other situations as well.

I remember when a sad event in my childhood brought home to me the concept of a population, although I certainly didn’t think about it that way at the time.

Figure 9.8: text.

When I was in fifth grade, a child at my school died. Although I didn’t know the child personally, I was sad to hear the news, as was everyone else there. Then I started putting it together with what had happened the year before, when another child had died. I figured out that there must be some kind of rule that every year one child dies at our school, and that next year it could be me. That particular thought was scary enough to keep me awake for a couple of nights.

Although I was kind of on the right track in certain ways, there were some flaws in my analysis; however, as I was 10 years old at the time, I think I can be forgiven for a certain lack of mathematical rigor. The observation that there was a pattern—the death of one child per year—was a reasonable observation for that very short time span, although if I had been paying attention longer, it is possible that there would have been many other years where no child at that school died.

But from that observation of a pattern, I went a little too far in imagining a “rule” that one child died every year—it would be better to think of it as a description of what did happen, rather than as a prescription for what must happen. If you think of it in that way, you can see one function that statistics serves—descriptive statistics summarizes the data about a population or a study, and describes in what way they are similar (central tendency) or different (variability). It takes a very diverse group, and tries to convey concisely and efficiently to the audience what the important measures of that group are. The statistical measures we have gone over up until now—mean, median, mode, and percentile—are descriptive statistics.


Figure 9.9: text.

1 I was, and still am, quite happy to have been proved wrong on that prediction.

Inferential statistics takes things a step further—it lets us use reasoning to infer, or make predictions, about the group, based on what we already know. It’s what I was dimly sensing when I realized that another child could die at my school the next year1, and so came up with my “rule”. The statistics we are going to talk about now are inferential statistics, and understanding the concepts of normal distribution, standard deviation, types of error, sample size and power, and inter-observer agreement will make a great deal—even most—of the massage research literature accessible to you.

Finally, one more thing about my example, and then we’ll let it go—remember in Chapter 3 when we talked about how science is about what’s common to everyone, while spirituality can be about what is unique and special? I’ve gotten the sense from some of my students, and have felt it myself, that there is something vaguely disturbing about talking about such sad events as a child’s death in terms of a population event, and I suspect that some of the aversion I’ve heard people express to science has something to do with the sense that science somehow sucks out what is special about being human. I would respectfully suggest that the two are not mutually exclusive—it is possible to operate in the two different modes at different times, as appropriate, and in that way to get the best of both—the rigor AND the compassion, as we talked about earlier. AND THAT ....

9.6 Standard deviation

Standard deviation has a lot in common with the averages we discussed earlier, and we will talk about how we can use it as a kind of descriptive statistic. To understand standard deviation, however, we first have to all be on the same page about what normal distribution means, so we’re going to talk about that first, and then come back to standard deviation.

9.6.1 Normal distribution

We talked earlier in Chapter 2 about how “normal” is one of those words that has a specific, neutral meaning in science, yet has very strong connotations in everyday language. It’s unfortunate that this word is so heavily loaded, as it is one of the most useful and powerful statistical concepts there is, and serves as a gateway to the world of inferential statistics. The word has been used as a weapon to enforce social and medical agendas—after I have taught a session on massage research and fibromyalgia, I’ve had people come up to me afterwards and tell me how painful it is to be told they are not “normal”, where “normal” is a prescriptive word for how they should be. Let’s be very clear that this is not how we’re using the word. Our specific statistical use of the word is defined below.

First of all, think about a situation you’ve been in with a lot of other people—a lot of the time, a few people are extreme in some value one way or the other, but most people are pretty close to average. We’ve all been born, so let’s consider the weight at birth in all healthy babies born in the US as our example situation.

• A few very big babies: 8½ to 9 pounds

• A few very small babies: 6 to 6½ pounds

• Most babies: somewhere around 7 or 8 pounds or so, more or less. This is called normal birthweight because it forms a normal distribution.

This is what that normal distribution looks like. The curved line is called a bell curve—a pretty descriptive name, because it is indeed shaped like a bell [bell character].

Figure 9.10: text.

 

Figure 9-2: Bell curve showing normal distribution of birthweights

While all bell curves have the same basic features of a small “tail” at either end (representing a few extreme values) and a large “bump” in the middle (representing a lot of typical values), there can still be some dramatic differences in how the data the bell curve represents is arranged. The following are both bell curves, but look how different they are from each other:

 

Figure 9-3: Two different bell curves

Figure 9.11: text.

The graph on the left is tall and narrow and drops off sharply.

The graph on the right is shorter and drops off much more gently.

These differences are useful, because they tell something about the data being studied—namely, about how different the extreme values are from the more typical values for that population. The standard deviation, which is coming up, will explore that distinction in more detail. So now that we are familiar with normal distributions and bell curves, let’s return to standard deviation, and see how that helps us with reading the massage research literature.

Back to standard deviation

We discussed earlier that the mean can sometimes be a useful way to summarize and describe the data. But when some of the data are extremely high or extremely low, the mean can be so far from most of the values that it no longer describes them accurately. For example, according to the “Bill Gates Net Worth” web page2, at the moment I am writing this, Bill Gates’ net worth is $27,600,000,000 (give or take). So, if I told you that on average, Bill Gates and I each have a net worth of $13,800,000,000—did you really learn anything relevant and useful about me3? Or did you just get a graphic demonstration of how badly the mean fails when it has to deal with extreme values? Clearly, we need a better tool for describing populations that—like our big, small, and average-sized babies—exhibit a great deal of variation, and the standard deviation (SD) is one of those tools we can use. We won’t bother with the mathematics behind the SD here, because for our purposes, I just want you to be able to recognize it when you come across it in the literature, and to understand what it means.
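The Bill Gates example can be made a little more concrete with a short sketch. The ten "ordinary" net worths below are made up for illustration; only the $27,600,000,000 figure comes from the text. The sketch compares the mean with the median, one of the descriptive statistics covered earlier.

```python
from statistics import mean, median

# Hypothetical net worths (US dollars) for ten ordinary people
ordinary = [50_000, 60_000, 70_000, 80_000, 90_000,
            100_000, 110_000, 120_000, 130_000, 140_000]
group = ordinary + [27_600_000_000]  # add one extreme value: Bill Gates

print(f"mean without him:   {mean(ordinary):,.0f}")
print(f"mean with him:      {mean(group):,.0f}")
print(f"median without him: {median(ordinary):,.0f}")
print(f"median with him:    {median(group):,.0f}")
```

Adding one extreme value drags the mean from $95,000 up to about $2.5 billion, a figure that describes nobody in the group, while the median barely moves (from $95,000 to $100,000).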

2 Yes, there really are some people with that much spare time on their hands. You can find it at: http://bgnw.marcus5.net/bgnw.html if you like.

3 If only! :)

Sometimes you’ll see the SD described as a kind of average distance of the values from the mean [ref]. That refers to the way it is computed mathematically (roughly, the square root of the average of the squared distances from the mean), and also to the way it describes data more accurately than the mean alone does. Assuming a normal distribution of data (our bell curve), the standard deviation tells us how widely the data are spread around the mean, and so where in the bell curve a particular value lies. In this way, the normal distribution and standard deviation can deal with extreme data as well as more representative data. Further, an unexpectedly large standard deviation may indicate to the reader that there is something wrong with the data, or with the model, or with both.
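For readers who are curious about the arithmetic the text skips, here is a minimal sketch of how an SD is calculated, again using the five exam grades from Figure 9-1. It computes the population SD (dividing by the number of values). Research articles more often report the sample SD, which divides by one less than the number of values and is available as `statistics.stdev`.

```python
import math
import statistics

grades = [85, 90, 75, 65, 70]      # the five exam grades from Figure 9-1
mean = sum(grades) / len(grades)   # 77.0

# 1. How far is each value from the mean?
deviations = [g - mean for g in grades]   # 8, 13, -2, -12, -7
# 2. Square them so negative and positive distances don't cancel out
squared = [d ** 2 for d in deviations]    # 64, 169, 4, 144, 49
# 3. Average the squares, then take the square root to get back to grade points
sd = math.sqrt(sum(squared) / len(grades))

print(round(sd, 2))                                  # 9.27
print(math.isclose(sd, statistics.pstdev(grades)))   # True
```

Using `statistics.stdev(grades)` instead gives the sample SD, about 10.37 for these grades.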

First of all, let’s put some further meaning on our bell curve. Below we have a bell curve where the different sections are shaded.

Figure 9.12: text.

Figure 9-4: A bell curve with standard deviations

• The solid gray area, referred to as 1 standard deviation from the mean, represents the largest number of data values. Values that fall in this area of the graph are considered the most normal. We expect 68% of our values to lie somewhere within this range.

• The striped area, referred to as 2 standard deviations from the mean, represents a larger number of the values. We expect about 95% of our values to lie within this range (notice that to get from one striped range to the other, we have to go through the gray ranges, so we include that previous 68% in our estimate of about 95%).

• The data in the black area, referred to as 3 standard deviations from the mean, represents a small percentage of the values. We expect about 99.7% of our values to lie within this range (notice that to get from one black range to the other, we have to go through the striped ranges and the gray ranges, so we include that 95% in our estimate of about 99.7%). So now you can begin to see how this addresses the problem with the mean and the extreme values that we’ve encountered—if we know the mean, and we know how far away (how spread-out) from the mean a particular value of data is, then we have a much more powerful tool for accurately and clearly representing the data than the mean alone is able to provide4.
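The percentages in the list above come from the shape of the normal curve, and Python’s standard library can reproduce them. This sketch asks what fraction of a standard normal distribution lies within 1, 2, and 3 SDs of the mean; the helper name `share_within` is just a label chosen for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

It prints about 68.3%, 95.4%, and 99.7%, which is where the commonly quoted 68%, 95%, and 99.7% figures come from.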

This is useful because it tells us how “spread-out” the population is. The larger the SD, the more reason you have to be somewhat skeptical of the study. Remember our previous two bell curves?

Figure 9-5: Two different bell curves with varying standard deviations
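The difference between a tall, narrow bell curve and a short, wide one shows up directly in the SD. The two small sets of birthweights below are hypothetical and were chosen to have the same mean, so that only their spread differs.

```python
import statistics

# Two hypothetical sets of birthweights (pounds) with the same mean
narrow = [7.0, 7.5, 7.5, 8.0]   # values bunched close together
wide = [6.0, 7.0, 8.0, 9.0]     # values spread farther apart

for name, data in (("narrow", narrow), ("wide", wide)):
    print(name, statistics.mean(data), round(statistics.pstdev(data), 2))
```

Both sets have a mean of 7.5 pounds, but the SD of the spread-out set (about 1.12) is roughly three times that of the bunched-up set (about 0.35).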

A false positive error (also called a type I error), for our purposes, exists when it looks like the treatment, such as massage, caused an effect when it really didn’t. In other words, its positive result was false. Here is a hypothetical experiment on the effect of massage on blood pressure that illustrates how this can happen.

4 So you get a much more accurate description of where I really am if I tell you that on average, Bill Gates and I each have a net worth of $13,800,000,000.00, and that I am more than 3 SD away from that mean. Let’s just leave it at that for now. :)

Figure 9.13: text.

Figure 9.14: text.

Figure 9-6

In this experiment, the researchers concluded that massage does indeed lower blood pressure. But suppose the researchers changed the experimental design: instead of having the control subjects sit in a chair for one hour, they had them lie on the massage table for an hour, and this produced a different result for the control group.

Figure 9.15: text.

Figure 9-7

With this experimental design, lying on the table for one hour without being massaged also lowered blood pressure. The conclusion from this experimental design would be that lying on the table, and not the massage itself, lowers blood pressure.

Note that this was not a real experiment and that this may or may not be true. Also note that this is an example of an experiment that any massage therapist can carry out.

A false negative error (also called a type II error) exists when it looks like the treatment, such as massage, had no effect, but it really did.

Here is another hypothetical experiment that demonstrates a false negative error. Suppose the 1-hour massage and just lying on the table for an hour both resulted in no change in blood pressure.

Figure 9.16: text.

Figure 9-8

However, in this hypothetical experiment, suppose the researcher did not pay attention to a couple of important factors. One was that the massage therapist performing the massages was only available on Monday, and on Monday, there were workmen using jackhammers just outside the window. Also, suppose it is summer and the windows are open. When the subjects in the control group came to participate in the study by just lying on the table without getting a massage, however, it was later in the week and the workmen were gone.

Figure 9-9

If the researcher is unaware of the jackhammer annoyance factor, he will conclude that massage does not lower blood pressure. However, it is possible that the jackhammer noise was raising blood pressure, and that this masked the blood-pressure-lowering effect of the massage.
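The jackhammer scenario can be sketched as a small simulation. Everything in it is hypothetical: the size of the massage effect, the size of the jackhammer effect, the amount of person-to-person variation, and the number of subjects are invented to show how one effect can hide another, not taken from any real study.

```python
import random
import statistics

random.seed(1)             # fixed seed so the sketch is repeatable
N = 200                    # hypothetical number of subjects per group
MASSAGE_EFFECT = -6.0      # hypothetical true change in blood pressure (mmHg)
JACKHAMMER_EFFECT = +6.0   # hypothetical rise caused by the noise (mmHg)


def change(effect: float) -> float:
    """One subject's blood pressure change: a fixed effect plus random variation."""
    return effect + random.gauss(0, 4)


# Monday: the massage group hears jackhammers; the control group, later in the week, does not
massage_monday = [change(MASSAGE_EFFECT + JACKHAMMER_EFFECT) for _ in range(N)]
control = [change(0.0) for _ in range(N)]
masked_diff = statistics.mean(massage_monday) - statistics.mean(control)

# The same massage on a quiet day, with no jackhammers
massage_quiet = [change(MASSAGE_EFFECT) for _ in range(N)]
unmasked_diff = statistics.mean(massage_quiet) - statistics.mean(control)

print(f"with jackhammers:    {masked_diff:+.1f} mmHg")
print(f"without jackhammers: {unmasked_diff:+.1f} mmHg")
```

With the jackhammers, the difference between groups comes out close to zero, so the massage looks useless; without them, the same massage shows a clear drop of about 6 mmHg.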

Here is a diagram that illustrates this.

Figure 9.17: text.

Figure 9.18: text.

Figure 9-9

p-value

My biostatistics professor would have an aneurysm (sorry, Dr. L.!) if he saw how we are going to treat the concept of p-value. And in the bigger picture, he would be right—it is a misunderstood and misused statistical measure, and deserves a fuller and richer treatment by experimenters and statisticians.

On the other hand, the purpose of this book is to give you enough information to read massage research, not to turn you into a specialist in any given area of experimental design. So our strategy will be to understand p-value well enough to use it the way most clinicians do when reading research articles, while noting that this, in itself, does not fully do justice to the concept.
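One thing worth knowing about p-values, even at this level, is how they relate to false positives. This sketch simulates many hypothetical blood pressure studies in which massage truly has no effect, and counts how often a test at the 0.05 level still declares a "significant" difference. For simplicity it uses a z-test with a known, made-up SD; real studies would more typically use a t-test, and all the numbers here are assumptions.

```python
import random
from statistics import NormalDist

random.seed(42)   # fixed seed so the sketch is repeatable
ALPHA = 0.05      # conventional significance threshold
N = 20            # hypothetical number of subjects per group
MEAN_BP = 120.0   # hypothetical average systolic blood pressure (mmHg)
SIGMA = 10.0      # hypothetical spread of blood pressure, treated as known
TRIALS = 2000     # number of simulated studies

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
standard_error = SIGMA * (2 / N) ** 0.5       # SE of a difference between two means

false_positives = 0
for _ in range(TRIALS):
    # Both groups come from the SAME distribution: massage truly does nothing.
    massage = [random.gauss(MEAN_BP, SIGMA) for _ in range(N)]
    control = [random.gauss(MEAN_BP, SIGMA) for _ in range(N)]
    diff = sum(massage) / N - sum(control) / N
    if abs(diff / standard_error) > z_crit:   # same as p < ALPHA for this test
        false_positives += 1

print(f"false positive rate: {false_positives / TRIALS:.3f}")
```

The false positive rate comes out close to 0.05: when the null hypothesis is true, a 0.05 threshold lets about 1 study in 20 produce a "significant" result by chance alone.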

Sampling

Power and sample size

Confidence interval and confidence level

In the political season (and at other times, too), we often see poll results reported as a certain set of results, plus or minus a particular margin of error.

Although it’s clear from the context that it means a little uncertainty in the exact results, now that we have discussed the normal distribution, we can understand it in a little more depth.

If the poll accurately reflects the population at large, and if we repeated the poll multiple times, we would expect the results to be about the same, with only a little bit of variation. The amount that the results can vary, in either the positive or the negative direction, is the margin of error. So if Candidate A is preferred by 68% of the population, and Candidate B by 32%, with a margin of error of +/-5 percentage points, that means that either candidate’s number could be as much as 5 points too high or too low in this poll. So in reality, Candidate A may have anywhere from 63% to 73%, and Candidate B may have anywhere from 27% to 37%. That positive and negative variation around the reported percentage is the margin of error, which leads us into the concept of a confidence interval. GIGO. Statistical dead heat.

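You can think of the confidence interval as a band or a range around the reported value; the “true” number is likely to lie somewhere within that band. The confidence level, by contrast, reports how confident we are that the true result lies within that band: a 95% confidence level means that if the poll were repeated many times, about 95% of the bands calculated this way would contain the true value.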
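The poll example can be connected to the normal distribution with a short sketch. It uses the standard normal-approximation formula for the margin of error of a proportion; the poll size of 335 people is hypothetical, chosen because it yields a margin of error close to the 5 points in the example, and the helper name `margin_of_error` is just a label for this illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(p: float, n: int, level: float = 0.95) -> float:
    """Approximate margin of error for a poll proportion (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95% confidence
    return z * math.sqrt(p * (1 - p) / n)


n = 335       # hypothetical number of people polled
p_a = 0.68    # Candidate A's share in the poll
moe = margin_of_error(p_a, n)
print(f"Candidate A: {p_a:.0%} +/- {moe:.1%}")
print(f"confidence interval: {p_a - moe:.0%} to {p_a + moe:.0%}")
```

Polling more people shrinks the margin of error, and asking for a higher confidence level (for example, `level=0.99`) widens the band.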

examples

 

Break

This chapter and the previous one were really dense in terms of the material we covered. But the hardest part of learning about reading research is now over, and if you have stuck with it this far, I promise you that you will find the rest of the book to be smooth sailing in comparison, building readily on what you have already learned.

Since you’ve worked so hard on the methods and statistics parts, here’s another nice bear picture for you to look at while we take a well-earned break.

\subsection{Average}

 

The average is an attempt to describe qualities of a group by combining qualities of individual members of the group. The mean, median, and mode describe different ways of averaging, which tell something about the distribution of those individual qualities or values.

 

\subsubsection{Mean}

 

Although you may not have heard it referred to by that name, you're already familiar with the concept of mean: it is the kind of average commonly seen in school grading. To get the mean, you add all the results together, and then divide by the number of results.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-1.eps}

\end{center}

\caption{\label{9-1}Mean (average) value of 5 final exam grades.}

\end{figure}

 

% Student Name    Final Exam Grade

%

% 1. John    85

%

% 2. Janet    90

%

% 3. Carmen    75

%

% 4. Michael    65

%

% 5. Sally    70

%

% Total    385

%

% Mean    385÷5 = 77

%

% Figure 9-1: Mean (average) value of 5 final exam grades

 

\subsubsection{Example from the literature:}

 

\index{Author---Barlow}

 

Barlow 2004 investigated whether a single massage would alter the flexibility of the hamstring in physically active young men, as measured by the sit-and-reach test. He included his data in Table 1, so we can calculate the mean of all the sit-and-reach scores for the subjects (1) before and (2) after the massage by adding all the values in the appropriate column, and then dividing by 11 (the number of subjects in the study):

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-2.eps}

\end{center}

\caption{\label{9-2}Sit-and-reach scores in centimeters (Barlow 2004).}

\end{figure}

 

The disadvantage of the mean is that it can't tell you about extreme values in the data, or how any individual compares to the group, except in the most crudely approximate way. In order to examine this limitation further, let's set up our own table including the mean score (shown in the callouts in table 1 on the previous page). In the last column, observe the difference in score for each subject from the mean score.

The statistics we have covered up until now are useful, but in order to get a clearer picture of what all the data looks like, there are more refined tools we can use to understand the relationships among the values. Standard deviation (SD) is one of those tools.

 

\section{Preparing to discuss standard deviation}

 

Standard deviation is probably the hardest concept we are going to cover. But it is worth the effort, because the concept is so valuable and applies to so many different situations. So let's break it up into small pieces, tackle them one at a time, and see how we can use it, not only in reading massage research, but in many other situations as well.

 

I remember when a sad event in my childhood brought home to me the concept of a population, although I certainly didn't think about it that way at the time. When I was in fifth grade, a child at my school died. Although I didn't know the child personally, I was sad to hear the news, as was everyone else there. Then I started putting it together with what had happened the year before, when another child had died. I figured out that there must be some kind of rule that every year one child dies at our school, and that next year it could be me. That particular thought was scary enough to keep me awake for a couple of nights.

 

Although I was kind of on the right track in certain ways, there were some flaws in my analysis; however, as I was 10 years old at the time, I think I can be forgiven for a certain lack of mathematical rigor. The observation that there was a pattern---the death of one child per year---was a reasonable observation for that very short time span, although if I had been paying attention longer, it is possible that there would have been many other years where no child at that school died.

 

But from that observation of a pattern, I went a little too far in imagining a ``rule'' that one child died every year---it would be better to think of it as a \textbf{description} of what \emph{did} happen, rather than as a \textbf{prescription} for what \emph{must} happen. If you think of it in that way, you can see one function that statistics serves---\emph{descriptive statistics} summarizes the data about a population or a study, and \emph{describes} in what way they are similar (central tendency) or different (variability). It takes a very diverse group, and tries to convey concisely and efficiently to the audience what the important measures of that group are. The statistical measures we have gone over up until now---mean, median, mode, and percentile---are descriptive statistics.

 

\emph{Inferential statistics} takes things a step further---it lets us use reasoning to \emph{infer}, or make predictions, about the group, based on what we already know. It's what I was dimly sensing when I realized that another child could die at my school the next year\footnote{I was, and still am, quite happy to have been proved wrong on that prediction.}, and so came up with my ``rule''. The statistics we are going to talk about now are inferential statistics, and understanding the concepts of normal distribution, standard deviation, types of error, sample size and power, and inter-observer agreement will make a great deal---even most---of the massage research literature accessible to you.

 

Finally, one more thing about my example, and then we'll let it go---remember in Chapter 3 when we talked about how science is about what's common to everyone, while spirituality can be about what is unique and special? I've gotten the sense from some of my students, and have felt it myself, that there is something vaguely disturbing about talking about such sad events as a child's death in terms of a population event, and I suspect that some of the aversion I've heard people express to science has something to do with the sense that science somehow sucks out what is special about being human. I would respectfully suggest that the two are not mutually exclusive---it is possible to operate in the two different modes at different times, as appropriate, and in that way to get the best of both---the rigor AND the compassion, as we talked about earlier. AND THAT ....

 

\section{Standard deviation}

 

Standard deviation has a lot in common with the averages we discussed earlier, and we will talk about how we can use it as a kind of descriptive statistic. To understand standard deviation, however, we first have to all be on the same page about what \emph{normal distribution} means, so we're going to talk about that first, and then come back to standard deviation.

 

\subsection{Normal distribution}

 

We talked earlier in Chapter 2 about how ``normal'' is one of those words that has a specific, neutral meaning in science, yet has very strong connotations in everyday language. It's unfortunate that this word is so heavily loaded, as it is one of the most useful and powerful statistical concepts there is, and serves as a gateway to the world of inferential statistics. The word has been used as a weapon to enforce social and medical agendas---after I have taught a session on massage research and fibromyalgia, I've had people come up to me afterwards and tell me how painful it is to be told they are not ``normal'', where ``normal'' is a \emph{prescriptive} word for how they \emph{should} be. Let's be very clear that this is \textbf{not} how we're using the word. Our specific statistical use of the word is defined below.

 

First of all, think about a situation you've been in with a lot of other people---a lot of the time, a few people are extreme in some value one way or the other, but most people are pretty close to average. We've all been born, so let's consider the weight at birth in all healthy babies born in the US as our example situation.

\begin{itemize}

\item A few very big babies: $8\frac{1}{2}$ to 9 pounds.

\item A few very small babies: 6 to $6\frac{1}{2}$ pounds.

\item Most babies: somewhere around 7 or 8 pounds or so, more or less. This is called \emph{normal} birthweight because it forms a \emph{normal distribution}.

\end{itemize}

 

This is what that normal distribution looks like. The curved line is called a bell curve---a pretty descriptive name, because it is indeed shaped like a bell [bell character].

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-10.eps}

\end{center}

\caption{\label{9-10}text.}

\end{figure}

 

Figure 9-2: Bell curve showing normal distribution of birthweights

 

While all bell curves have the same basic features of a small ``tail'' at either end (representing a few extreme values) and a large ``bump'' in the middle (representing a lot of typical values), there can still be some dramatic differences in how the data the bell curve represents is arranged. The following are both \emph{\textbf{bell curves}}, but look how different they are from each other:

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-11.eps}

\end{center}

\caption{\label{9-11}text.}

\end{figure}

 

Figure 9-3: Two different bell curves

 

\begin{itemize}

 

\item The graph on the left is tall and narrow and drops off sharply.

 

\item The graph on the right is shorter and drops off much more gently.

 

\end{itemize}

 

These differences are useful, because they tell something about the data being studied---namely, about how different the extreme values are from the more typical values for that population. The standard deviation, which is coming up, will explore that distinction in more detail. So now that we are familiar with \emph{normal distributions} and \emph{bell curves}, let's return to \emph{standard deviation}, and see how that helps us with reading the massage research literature.

 

\section{Back to standard deviation}

 

We discussed earlier that the mean can sometimes be a useful way to summarize and describe the data. But when some of the data are extremely high or extremely low, the mean can be so far from most of the values that it no longer describes them accurately. For example, according to the ``Bill Gates Net Worth'' web page\footnote{Yes, there really are some people with that much spare time on their hands. You can find it at: http://bgnw.marcus5.net/bgnw.html if you like.}, at the moment I am writing this, Bill Gates' net worth is \$27,600,000,000 (give or take). So, if I told you that \emph{on average}, Bill Gates and I each have a net worth of \$13,800,000,000---did you \emph{really} learn anything relevant and useful about me\footnote{If only! :)}? Or did you just get a graphic demonstration of how badly the mean fails when it has to deal with extreme values?

 

Clearly, we need a better tool for describing populations that---like our big, small, and average-sized babies---exhibit a great deal of variation, and the \emph{standard deviation (SD)} is one of those tools we can use. We won't bother with the mathematics behind the SD here, because for our purposes, I just want you to be able to recognize it when you come across it in the literature, and to understand what it means.

 

Sometimes you'll see the SD described as a kind of average distance of the data values from the mean [ref] (strictly, the square root of the average \emph{squared} distance). That description refers to the way it is computed mathematically, and it is also why the SD, reported alongside the mean, describes data more accurately than the mean alone does. Assuming a \emph{normal distribution} of data (our \emph{bell curve}), the standard deviation tells us how widely the data spread out around the mean, and so how far from the center of the bell curve a given value lies. That is how the normal distribution and the standard deviation together can deal with extreme data as well as more representative data. Further, an unexpectedly large standard deviation can signal to the reader that something may be wrong with the data, or with the model, or with both.
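We won't need the mathematics to read the literature, but for the curious, here is a minimal Python sketch of how an SD is computed, using made-up scores. It shows the two common versions: dividing by $n$ (the \emph{population} SD) and dividing by $n-1$ (the \emph{sample} SD, usually the one studies report). The hand-computed results are checked against \texttt{pstdev} and \texttt{stdev} from Python's standard \texttt{statistics} module.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Made-up scores for a small hypothetical group.
scores = [20, 22, 25, 27, 31]

m = mean(scores)
# How far each value is from the mean, squared so that +/- don't cancel out.
squared_devs = [(x - m) ** 2 for x in scores]

# Population SD: square root of the average squared distance (divide by n).
sd_population = sqrt(sum(squared_devs) / len(scores))
# Sample SD: divide by n - 1 instead; usually the version studies report.
sd_sample = sqrt(sum(squared_devs) / (len(scores) - 1))

print(f"mean = {m}, population SD = {sd_population:.2f}, sample SD = {sd_sample:.2f}")
```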

 

Next, let's add some more meaning to our bell curve. Below is a bell curve with its different sections shaded.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-12.eps}

\end{center}

\caption{\label{9-12}A bell curve with standard deviations.}

\end{figure}

 

\begin{itemize}

 

\item The solid gray area, referred to as \emph{\textbf{1 standard deviation from the mean}}, covers everything within 1 SD above or below the mean. It holds the largest share of the data values, and values that fall here are the most typical. We expect about 68\% of our values to lie within this range.

 

\item The striped area, referred to as \emph{\textbf{2 standard deviations from the mean}}, extends the range out to 2 SD above and below the mean. We expect about 95\% of our values to lie within this range (notice that to get from one striped band to the other, we have to go through the gray band, so the 95\% includes the previous 68\%).

 

\item The black area, referred to as \emph{\textbf{3 standard deviations from the mean}}, adds only a small percentage of the values. We expect about 99.7\% of our values to lie within 3 SD of the mean (to get from one black band to the other, we have to go through the striped and gray bands, so the 99.7\% includes the previous 95\%).

 

\end{itemize}
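The percentages for 1, 2, and 3 SD are properties of the normal distribution itself, so they can be checked directly. A minimal sketch using \texttt{NormalDist} from Python's standard \texttt{statistics} module (available in Python 3.8 and later):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k SD of the mean:
# the area under the curve between -k and +k.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

This prints about 68.3\%, 95.4\%, and 99.7\%.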

 

So now you can begin to see how this addresses the problem with the mean and the extreme values that we've encountered---if we know the mean, \emph{and we know how far away (how spread-out) from the mean a particular value of data is}, then we have a much more powerful tool for accurately and clearly representing the data than the mean alone is able to provide\footnote{So you get a much more accurate description of where I really am if I tell you that \emph{on average}, Bill Gates and I each have a net worth of \$13,800,000,000.00, \emph{and} that I am more than 3 SD away from that mean. Let's just leave it at that for now. :)}.
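Describing how far a value lies from the mean \emph{in units of SD} is often called a \emph{z-score}. A minimal Python sketch, with made-up scores and a hypothetical helper named \texttt{z\_score}:

```python
from statistics import mean, stdev

# Made-up scores for a small hypothetical group.
scores = [20, 22, 25, 27, 31]

def z_score(value, data):
    """Number of SDs that `value` lies above (+) or below (-) the mean of `data`."""
    return (value - mean(data)) / stdev(data)

for s in scores:
    print(f"score {s}: {z_score(s, scores):+.2f} SD from the mean")
```

A value with a z-score beyond about $\pm 2$ falls outside the range that holds roughly 95\% of a normal distribution.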

 

This is useful because it tells us how ``spread-out'' the population is. The larger the SD, the more widely the individual values are scattered around the mean, and the more cautiously you should interpret the mean a study reports. Remember our previous two bell curves?

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-13.eps}

\end{center}

\caption{\label{9-13}Two different bell curves with varying standard deviations.}

\end{figure}
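Here is a minimal Python sketch of the same idea with two made-up datasets that share a mean but differ in spread, loosely mirroring the narrow and wide bell curves in the figure:

```python
from statistics import mean, stdev

# Two made-up datasets with the same mean but very different spread.
narrow = [48, 49, 50, 50, 51, 52]  # values bunched close to the mean
wide = [35, 42, 50, 50, 58, 65]    # values scattered far from the mean

for name, data in (("narrow", narrow), ("wide", wide)):
    print(f"{name}: mean = {mean(data)}, SD = {stdev(data):.2f}")
```

Both means are 50, so the mean alone cannot tell the two groups apart; the SD can.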

 

A \emph{false positive error} (also called a \emph{type I error}), for our purposes, occurs when it looks like the treatment, such as massage, caused an effect when it really didn't. In other words, its positive result was false. Here is a hypothetical research experiment on the effect of massage on blood pressure that illustrates how this can happen.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-14.eps}

\end{center}

\caption{\label{9-14}text.}

\end{figure}

 

Figure 9-6

 

In this experiment, the researchers concluded that massage does indeed lower blood pressure. But suppose the researchers changed the experimental design: instead of having the control subjects sit in a chair for one hour, they had them lie on the massage table for an hour, and the control group's result changed.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-15.eps}

\end{center}

\caption{\label{9-15}text.}

\end{figure}

 

Figure 9-7

 

With this experimental design, lying on the table for one hour without being massaged also lowered blood pressure. The conclusion from this design would be that lying on the table, and not the massage itself, lowers blood pressure.

 

\textbf{Note that this was not a real experiment and that this may or may not be true}. Also note that this is an example of an experiment that any massage therapist can carry out.
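The logic of the two designs can be sketched in Python. The effect attributed to massage is the \emph{difference} between the massage group's change and the control group's change. All numbers below are made up, and \texttt{estimated\_effect} is a hypothetical helper, not a formula taken from any particular study.

```python
from statistics import mean

# Made-up changes in systolic blood pressure (mmHg) after one hour;
# negative numbers mean blood pressure went down.
massage = [-8, -6, -10, -7, -9]
control_chair = [0, 1, -1, 0, 0]      # design 1: controls sat in a chair
control_table = [-7, -8, -6, -9, -8]  # design 2: controls lay on the table

def estimated_effect(treatment, control):
    """Mean change in the treatment group minus mean change in the control group."""
    return mean(treatment) - mean(control)

print("design 1:", estimated_effect(massage, control_chair))
print("design 2:", round(estimated_effect(massage, control_table), 1))
```

Against the chair controls, massage appears to lower blood pressure by 8 mmHg; against the table controls, almost all of that apparent effect disappears.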

 

A \emph{false negative error} (also called a \emph{type II error}) occurs when it looks like the treatment, such as massage, had no effect, but it really did.

 

Here is another hypothetical experiment that demonstrates a false negative error. Suppose that both the 1-hour massage and simply lying on the table for an hour resulted in no change in blood pressure.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-16.eps}

\end{center}

\caption{\label{9-16}text.}

\end{figure}

 

Figure 9-8

 

However, suppose that in this hypothetical experiment the researcher did not pay attention to a couple of important factors. One was that the massage therapist performing the massages was available only on Monday, and on Monday there were workmen using jackhammers just outside the window. Suppose, too, that it is summer and the windows are open. By the time the subjects in the control group came in to participate in the study by just lying on the table without getting a massage, it was later in the week and the workmen were gone.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-17.eps}

\end{center}

\caption{\label{9-17}text.}

\end{figure}

 

Figure 9-9

 

If the researcher is unaware of the jackhammer annoyance factor, he will conclude that massage does not lower blood pressure. However, it is possible that the jackhammer noise was raising blood pressure, and that this masked the blood-pressure-lowering effect of the massage.
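The masking in this scenario can be sketched with simple arithmetic, assuming (purely for illustration) that the massage effect and the noise effect add together. The effect sizes are made up.

```python
# Made-up effect sizes in mmHg, assumed (for illustration) to simply add up.
massage_effect = -8     # hypothetical true drop caused by massage
jackhammer_effect = 8   # hypothetical rise caused by the Monday noise

# Massage group: treated on Monday, so it gets both effects.
massage_group_change = massage_effect + jackhammer_effect
# Control group: lay on the table later in the week, when it was quiet.
control_group_change = 0

apparent_effect = massage_group_change - control_group_change
print("apparent effect of massage:", apparent_effect, "mmHg")
```

A real effect of massage is present in this made-up scenario, yet the comparison shows none: a false negative.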

 

Here is a diagram that illustrates this.

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-18.eps}

\end{center}

\caption{\label{9-18}text.}

\end{figure}

 

Figure 9-10

 

Alpha ($\alpha$)

 

The statistical measure $\alpha$ is the probability of making a false positive error when the treatment really has no effect; it is the level of risk the researchers decide, before analyzing the data, that they are willing to accept. In most of the research literature where you see $\alpha$ reported, the researcher sets it at 0.05, which means accepting a 5\% risk of making a false positive error. In the example below, that is where Hopper sets his $\alpha$, and, having accepted that 5\% risk of a false positive, he concludes that his intervention (dynamic soft-tissue mobilization) significantly increased hamstring flexibility in the healthy male subjects he studied.
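One way to see what $\alpha = 0.05$ means is to simulate many studies in which the treatment truly does nothing, and count how often a significance test still declares an effect. This is a minimal Python sketch under stated assumptions: both groups are drawn from the same normal population (the blood pressure values and SD are made up), and the test is a simple two-sided z-test that treats the population SD as known; \texttt{false\_positive\_rate} is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05
# Two-sided cutoff for the chosen alpha (about 1.96 when alpha = 0.05).
CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA / 2)

def false_positive_rate(trials=2000, n=20, sd=10.0, seed=1):
    """Share of simulated no-effect studies that still come out 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the SAME population: the treatment does nothing.
        treated = [rng.gauss(120, sd) for _ in range(n)]
        control = [rng.gauss(120, sd) for _ in range(n)]
        # z-test for a difference in means, with the population SD assumed known.
        z = (mean(treated) - mean(control)) / (sd * sqrt(2 / n))
        if abs(z) > CRITICAL_Z:
            false_positives += 1
    return false_positives / trials

print(f"false positive rate: {false_positive_rate():.3f}")
```

The rate comes out close to 0.05: roughly 1 in 20 no-effect studies looks like a success purely by chance.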

 

Example:

 

\begin{figure} [ht]

\begin{center}

\epsfxsize 3 in

\epsfbox{9-19.eps}

\end{center}

\caption{\label{9-19}text.}

\end{figure}

 

\begin{quotation}

 

OBJECTIVES: The purpose of this study was to investigate the effect of dynamic soft tissue mobilisation (STM) on hamstring flexibility in healthy male subjects...\textbf{The alpha level was set at 0.05.} RESULTS: Increase in hamstring flexibility was significantly greater in the dynamic STM group than either the control or classic STM groups with mean (standard deviation) increase in degrees in the HFA measures of 4.7 (4.8), -0.04 (4.8), and 1.3 (3.8), respectively. CONCLUSIONS: Dynamic soft tissue mobilisation (STM) significantly increased hamstring flexibility in healthy male subjects. (Hopper 2005) \index{Author---Hopper}

 

\end{quotation}

 

% Objectives    Investigate the effect of dynamic soft tissue mobilisation (STM) on hamstring flexibility in healthy male subjects

%

% Alpha level    Set to 0.05 (5%)

%

% Results    Increase in hamstring flexibility in the dynamic STM group was significantly greater than in the control group (no STM) and the classic STM group.

%

% Conclusion    Dynamic soft tissue mobilisation (STM) significantly increased hamstring flexibility in healthy male subjects.

 

(Compare this with the Barlow (2004) hamstring study we looked at when discussing the mean.)
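Reading the Hopper results through the lens of SD: if (and this is an assumption, since the abstract does not tell us) each group's changes were roughly normally distributed, about 68\% of subjects would fall within 1 SD of their group's mean. A minimal Python sketch using the mean (SD) values reported in the abstract; \texttt{one\_sd\_range} is a hypothetical helper.

```python
# Mean (SD) increase in hamstring flexibility, in degrees, as reported in
# the Hopper (2005) abstract quoted above.
reported = {
    "dynamic STM": (4.7, 4.8),
    "control": (-0.04, 4.8),
    "classic STM": (1.3, 3.8),
}

def one_sd_range(group_mean, sd):
    """Mean +/- 1 SD: holds roughly 68% of subjects IF the data are normal."""
    return group_mean - sd, group_mean + sd

for group, (group_mean, sd) in reported.items():
    low, high = one_sd_range(group_mean, sd)
    print(f"{group:12s} {low:6.2f} to {high:6.2f} degrees")
```

In every group the SD is larger than the mean change itself, and the ranges overlap a great deal: individual subjects varied widely, even though the dynamic STM group's average gain was the largest.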

 

\section{\emph{p}-value}

 

My biostatistics professor would have an aneurysm (sorry, Dr. L.!) if he saw how we are going to treat the concept of \emph{p}-value. And in the bigger picture, he would be right---it is a misunderstood and misused statistical measure, and deserves a fuller and richer treatment by experimenters and statisticians.

 

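On the other hand, the purpose of this book is to give you enough information to read massage research, not to turn you into a specialist in experimental design. So our strategy will be to understand the \emph{p}-value well enough to use it the way most clinicians do when reading research articles: as the probability of getting results at least as extreme as the ones observed if the treatment really had no effect, which researchers compare against the $\alpha$ they chose in advance, calling the result statistically significant when \emph{p} falls below $\alpha$. We will note, though, that this working definition does not fully do justice to the concept.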
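For readers who want to see where a \emph{p}-value comes from in the simplest case, here is a minimal Python sketch assuming a two-sided z-test (one of many possible tests); \texttt{two\_sided\_p} is a hypothetical helper, and the z values are arbitrary examples. It compares each \emph{p}-value with $\alpha = 0.05$, the way most clinicians read a results section.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Chance of a result at least this extreme in either direction,
    if there were really no effect (standard normal model)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.0, 1.96, 2.5):
    p = two_sided_p(z)
    verdict = "significant" if p < alpha else "not significant"
    print(f"z = {z:4.2f}  p = {p:.3f}  ({verdict} at alpha = {alpha})")
```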

 

\section{Sampling}

 

\subsection{Power and sample size}

 

Confidence interval and confidence level

 

In the political season (and at other times, too), we often see poll results reported as a certain set of results, \textbf{plus or minus a particular margin of error}. Although it's clear from the context that it means a little uncertainty in the exact results, now that we have discussed the normal distribution, we can understand it in a little more depth.

 

If the poll accurately reflects the population at large, and if we repeated the poll multiple times, we would expect the results to be about the same, with only a little variation. The amount the results can vary, in either the positive or the negative direction, is the margin of error. So if Candidate A is preferred by 68\% of the population and Candidate B by 32\%, with a margin of error of +/- 5 percentage points, either candidate's number could be as much as 5 points too high or too low in this poll. So in reality, Candidate A may have anywhere from 63\% to 73\%, and Candidate B anywhere from 27\% to 37\%. That positive and negative variation around the reported percentage is the margin of error, which leads us into the concept of a \emph{confidence interval}. Keep two cautions in mind. First, the margin of error covers only the random variation that comes from polling a sample rather than everyone; if the sample itself is biased, no margin of error can correct for that (garbage in, garbage out, or GIGO). Second, when the two candidates' ranges overlap, the poll cannot reliably tell us who is ahead, and the race is often reported as a \emph{statistical dead heat}.
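The arithmetic in the poll example can be written out as a minimal Python sketch; \texttt{interval} and \texttt{overlaps} are hypothetical helpers, and the 52\% versus 48\% race is a made-up comparison case.

```python
def interval(percent, margin):
    """Range implied by a reported percentage plus or minus the margin of error."""
    return (percent - margin, percent + margin)

def overlaps(a, b):
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

candidate_a = interval(68, 5)  # (63, 73)
candidate_b = interval(32, 5)  # (27, 37)
print(candidate_a, candidate_b, "overlap:", overlaps(candidate_a, candidate_b))

# A made-up close race: 52% vs 48%, same margin of error.
print("close race overlap:", overlaps(interval(52, 5), interval(48, 5)))
```

The ranges in the original example do not overlap, so Candidate A's lead holds up; in the close race they do (47 to 57 versus 43 to 53), so the poll alone cannot say who is ahead.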

 

You can think of the confidence interval as a band or a range around the reported value, within which we expect the ``true'' number to lie. The confidence level, by contrast, reports how confident we can be in that band: a 95\% confidence level means that if we repeated the poll many times, about 95\% of the intervals built this way would contain the true value.
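The meaning of a confidence level can be checked by simulation. This minimal Python sketch assumes Candidate A's true support is 68\%, polls 500 made-up respondents at a time, and builds each 95\% interval with the common approximate formula $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the share observed in that poll and $n$ is the number of respondents; \texttt{coverage} is a hypothetical helper.

```python
import random
from math import sqrt

def coverage(true_share=0.68, n=500, polls=1000, z=1.96, seed=7):
    """Share of simulated polls whose approximate 95% interval contains true_share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each made-up respondent prefers Candidate A with probability true_share.
        in_favour = sum(rng.random() < true_share for _ in range(n))
        p_hat = in_favour / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_share <= p_hat + margin:
            hits += 1
    return hits / polls

print(f"intervals containing the true value: {coverage():.1%}")
```

The result lands close to 95\%: individual polls can miss, but the method captures the true value about 95 times in 100.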

 

examples

 

 

 

This chapter and the previous one were really dense in terms of the material we covered. But the hardest part of learning about reading research is now over, and if you have stuck with it this far, I promise you that you will find the rest of the book to be smooth sailing in comparison, building readily on what you have already learned. Since you've worked so hard on the methods and statistics parts, here's another nice bear picture for you to look at while we take a well-earned break.

 

 

 

 

 

 


 

 

 

 

 

 

 

 

 

 

 

<cn> Chapter 4

<ct> Just Enough Statistics

<cout>

<h1>Learning Objectives

<h1> Key Terms

<h2>Boxplots

<h2>Data Range

<h3>Range of Means

<h2>Variance and Standard Deviation

Figure 4-1. The Ptolemaic model. Ptolemy’s observations of the sky led him to conclude that the planets and the sun rotated around the Earth.

Figure 4-2. Epicycles. Ptolemy added the concept of epicycles, or loops, to his concept of perfect circular orbits to account for the backward motion of planets observed in the night sky.

Figure 4-3: Birth Weight as a normal distribution. The majority of babies are born weighing 7 to 8 pounds. Values representing that majority form the “bump” in the bell curve.

Figure 4-4. Shapes of bell curves. The steeply-sloped bell curve (left) means that its data is clustered more closely together. The gently sloped bell curve (right) represents data points that are more spread out from each other.

Figure 4-5. Mean scores as average. Mean sit-and-reach values in young men are shown before and after receiving massage to the hamstring muscles.

Figure 4-6. Boxplot. This boxplot shows study results (Sakurai) in regard to patients’ morphine consumption after surgery for the control group and the acupressure treatment group. Data are plotted as the medians with 25th and 75th percentiles. The asterisk shows an outlier.

Figure 4-7. Comparison of boxplot and bell curve. The box, which spans the middle half of the data (from the 25th to the 75th percentile), corresponds roughly to the central “normal” area of the bell curve.

Figure 4-8. Standard deviation. Standard deviations in a normal distribution.

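Table 4-1 Test Scores as an Example of the Median

\begin{tabular}{lr}
\textbf{Student Name} & \textbf{Score} \\
\hline
John & 94 \\
Jane & 80 \\
Robert & 75 \\
Mary & 70 \\
Susan & 65 (median) \\
Ronald & 55 \\
Sam & 45 \\
Rose & 30 \\
George & 25 \\
\end{tabular}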
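Finding the median is just a matter of sorting and taking the middle value. A minimal Python sketch using the scores from Table 4-1, cross-checked with \texttt{statistics.median}:

```python
from statistics import mean, median

# Scores from Table 4-1.
scores = {"John": 94, "Jane": 80, "Robert": 75, "Mary": 70, "Susan": 65,
          "Ronald": 55, "Sam": 45, "Rose": 30, "George": 25}

values = sorted(scores.values())
# With an odd number of scores (9), the median is the middle one: index 4.
middle = values[len(values) // 2]

print("median:", middle, "| statistics.median:", median(values))
print(f"mean: {mean(values):.1f}")
```

The mean (about 59.9) sits below the median here because the lowest scores are farther from the middle than the highest one.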
 

 

 

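Table 4-2  Median and Percentile in Results from Sakurai Study

\begin{tabular}{lrrrrrrr}
 & & \multicolumn{3}{c}{\textbf{Morphine Requirement}} & \multicolumn{3}{c}{\textbf{Pain Score}} \\
\textbf{Group} & \textbf{No. of Patients} & 25th Percentile & Median (50th Percentile) & 75th Percentile & 25th Percentile & Median (50th Percentile) & 75th Percentile \\
\hline
Control Group & 30 & 27 mg & 47 mg & 58 mg & 16 mm & 29.5 mm & 59 mm \\
Treatment Group Receiving Minute Sphere Acupressure & 23 & 25 mg & 41 mg & 69 mg & 22 mm & 40 mm & 58 mm \\
\hline
Total & 53 & & & & & & \\
\end{tabular}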
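The distance between the 25th and 75th percentiles, called the \emph{interquartile range}, is the length of the box in a boxplot such as Figure 4-6. A minimal Python sketch computing it from the values in Table 4-2; \texttt{iqr} is a hypothetical helper.

```python
# 25th and 75th percentiles from Table 4-2 (Sakurai study).
morphine_mg = {"control": (27, 58), "acupressure": (25, 69)}
pain_mm = {"control": (16, 59), "acupressure": (22, 58)}

def iqr(quartiles):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, q3 = quartiles
    return q3 - q1

for group in ("control", "acupressure"):
    print(f"{group}: morphine IQR = {iqr(morphine_mg[group])} mg, "
          f"pain IQR = {iqr(pain_mm[group])} mm")
```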
    

 
    

 
    

 
    

 
    

 
    

 

 

 

 

 

 

Preparing to discuss standard deviation

This (standard deviation) is probably the hardest concept we are going to cover. But it is worth it, because of the value of the concept and its applicability to so many different situations. So let's break it up into small pieces, tackle one piece at a time, and see how we can use it, not only in reading massage research, but in many other situations as well.

I remember when a sad event in my childhood brought home to me the concept of a population, although I certainly didn't think about it that way at the time.

When I was in fifth grade, a child at my school died. Although I didn't know the child personally, I was sad to hear the news, as was everyone else there. Then I started putting it together with what had happened the year before, when another child had died. I figured out that there must be some kind of rule that every year one child dies at our school, and that next year it could be me. That particular thought was scary enough to keep me awake for a couple of nights.

Although I was on the right track in certain ways, there were some flaws in my analysis; however, as I was 10 years old at the time, I think I can be forgiven for a certain lack of mathematical rigor. The observation that there was a pattern (the death of one child per year) was reasonable for that very short time span, although if I had been paying attention longer, I might well have seen many other years in which no child at that school died.

But from that observation of a pattern, I went a little too far in imagining a ``rule'' that one child died every year. It would be better to think of it as a description of what did happen, rather than as a prescription for what must happen. If you think of it in that way, you can see one function that statistics serves: \emph{descriptive statistics} summarizes the data about a population or a study, and describes in what ways its members are similar (\emph{central tendency}) or different (\emph{variability}). It takes a very diverse group, and tries to convey concisely and efficiently to the audience what the important measures of that group are. The statistical measures we have gone over up until now (mean, median, mode, and percentile) are descriptive statistics.

\emph{Inferential statistics} takes things a step further: it lets us use reasoning to infer, or make predictions, about the group, based on what we already know. It's what I was dimly sensing when I realized that another child could die at my school the next year\footnote{I was, and still am, quite happy to have been proved wrong on that prediction.}, and so came up with my ``rule''. The statistics we are going to talk about now are inferential statistics, and understanding the concepts of normal distribution, standard deviation, types of error, sample size and power, and inter-observer agreement will make a great deal (even most) of the massage research literature accessible to you.

Finally, one more thing about my example, and then we'll let it go. Remember in Chapter 3 when we talked about how science is about what's common to everyone, while spirituality can be about what is unique and special? I've gotten the sense from some of my students, and have felt it myself, that there is something vaguely disturbing about talking about such sad events as a child's death in terms of a population event, and I suspect that some of the aversion I've heard people express to science has something to do with the sense that science somehow sucks out what is special about being human. I would respectfully suggest that the two are not mutually exclusive: it is possible to operate in the two different modes at different times, as appropriate, and in that way to get the best of both, the rigor AND the compassion, as we talked about earlier. AND THAT ....
\section{Standard deviation}

Standard deviation has a lot in common with the averages we discussed earlier, and we will talk about how we can use it as a kind of descriptive statistic. To understand standard deviation, however, we first all need to be on the same page about what \emph{normal distribution} means, so we're going to talk about that first, and then come back to standard deviation.

\subsection{Normal distribution}

We talked earlier in Chapter 2 about how ``normal'' is one of those words that has a specific, neutral meaning in science, yet has very strong connotations in everyday language. It's unfortunate that this word is so heavily loaded, as it names one of the most useful and powerful statistical concepts there is, and serves as a gateway to the world of inferential statistics. The word has been used as a weapon to enforce social and medical agendas: after I have taught a session on massage research and fibromyalgia, I've had people come up to me afterwards and tell me how painful it is to be told they are not ``normal'', where ``normal'' is a prescriptive word for how they should be. Let's be very clear that this is not how we're using the word. Our specific statistical use of the word is defined below.

First of all, think about a situation you've been in with a lot of other people: often a few people are extreme in some value one way or the other, but most people are pretty close to average. We've all been born, so let's take the birth weights of all healthy babies born in the US as our example:

\begin{itemize}
\item A few very big babies: 8.5 to 9 pounds.
\item A few very small babies: 6 to 6.5 pounds.
\item Most babies: somewhere around 7 or 8 pounds, more or less.
\end{itemize}

Taken together, these birth weights form a \emph{normal distribution}.

This is what that normal distribution looks like. The curved line is called a \emph{bell curve}, a pretty descriptive name, because it is indeed shaped like a bell.

Figure 9-2: Bell curve showing normal distribution of birthweights

While all bell curves have the same basic features of a small ``tail'' at either end (representing a few extreme values) and a large ``bump'' in the middle (representing a lot of typical values), there can still be some dramatic differences in how the data the bell curve represents is arranged. The following are both bell curves, but look how different they are from each other:

Figure 9-3: Two different bell curves

Mean

Although you may not have heard it referred to by that name, you're already familiar with the concept of mean: it is the kind of average commonly seen in school grading. To get the mean, you add all the results together, and then divide by the number of results.

Example from the literature:

Barlow (2004) investigated whether a single massage would alter the flexibility of the hamstring in physically active young men, as measured by the value on the sit-and-reach test. He included his data in Table 1, so we can calculate the mean of all the sit-and-reach scores for the subjects (1) before and (2) after the massage by adding all the values in the appropriate column, and then dividing by 11 (the number of subjects in the study):

Figure 9.1: Mean (average) value of 5 final exam grades.

The disadvantage of the mean is that it can't tell you about extreme values in the data, or how any individual compares to the group, except in the most crudely approximate way. To examine this limitation further, let's set up our own table including the mean score (shown in the callouts in Table 1 on the previous page). In the last column, observe how far each subject's score is from the mean score.
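The add-and-divide recipe, and the ``difference from the mean'' column, can be sketched in a few lines of Python. Since Barlow's actual values are in his Table 1 and not reproduced here, this sketch uses five made-up exam grades instead.

```python
from statistics import mean

# Five made-up final exam grades (not data from any study).
grades = [92, 85, 78, 70, 65]

average = sum(grades) / len(grades)  # add them all up, divide by how many
assert average == mean(grades)       # same answer as the library function

for grade in grades:
    print(f"grade {grade}: {grade - average:+.1f} from the mean")
```

Notice that the differences from the mean always add up to zero, which is one reason the SD squares them before averaging.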
 

 

 


Confidence interval and confidence level

In the political season (and at other times, too), we often see poll results reported as a certain set of results, plus or minus a particular margin of error. Although it’s clear from the context that it means a little uncertainty in the exact results, now that we have discussed the normal distribution, we can understand it in a little more depth.
If the poll accurately reflects the population at large, and if we repeated the poll multiple times, we would expect the results to be about the same, with only a little bit of variation. The amount that it can vary—positive or negative, since it can vary either way—is the margin of error. So if Candidate A is preferred by 68% of the population, and Candidate B by 32%, with a margin of error of +/- 5%, that means that either candidate’s number could be as much as 5% too high or too low in this poll. So in reality, Candidate A may have anywhere from 63% to 73%, and Candidate B may have anywhere from 27% to 37%. That positive and negative variation around the reported percentage is the margin of error, which leads us into the concept of a confidence interval. GIGO. Statistical dead heat.
You can think of the confidence interval as a band or a range around the reported value: the "true" number lies somewhere within that band. The confidence level, by contrast, reports how confident we are that the true result lies within that band. For example, a 95% confidence level means that if we repeated the poll many times, about 95% of the bands calculated this way would contain the true value.
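The snippet below is a rough sketch of where a margin of error comes from. It uses the common normal-approximation (Wald) formula for a proportion, margin = z × sqrt(p(1 - p) / n), where z = 1.96 corresponds to a 95% confidence level. The poll size of 350 people is a hypothetical number, picked because it produces roughly the +/- 5 point margin in the example above. Real pollsters may use other methods and adjustments.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    p_hat: proportion observed in the poll (e.g. 0.68)
    n:     number of people polled
    z:     critical value for the confidence level (1.96 ~ 95%, 2.576 ~ 99%)
    Returns (margin_of_error, lower_bound, upper_bound).
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return margin, p_hat - margin, p_hat + margin

# Hypothetical poll of 350 people, 68% of whom prefer Candidate A.
moe95, low95, high95 = proportion_ci(0.68, 350)
moe99, low99, high99 = proportion_ci(0.68, 350, z=2.576)

print(f"95% level: +/- {moe95:.1%} -> {low95:.1%} to {high95:.1%}")
print(f"99% level: +/- {moe99:.1%} -> {low99:.1%} to {high99:.1%}")
```

Running it shows the trade-off between the two ideas. Asking for a higher confidence level (99% instead of 95%) widens the band. Polling more people narrows it, because the margin shrinks with the square root of n.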
 

 

 

Average
The average is an attempt to describe qualities of a group by combining qualities of individual members of the group. The mean, median, and mode describe different ways of averaging, which tell something about the distribution of those individual qualities or values.
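Here is a minimal sketch using Python's standard `statistics` module. It uses a made-up set of nine pain ratings (hypothetical, not from any study) to show how the three kinds of average can disagree.

```python
import statistics

# Hypothetical pain ratings (0-10 scale) from nine clients; not real study data.
ratings = [2, 3, 3, 3, 4, 5, 6, 8, 10]

mean = statistics.mean(ratings)      # arithmetic average: sum divided by count
median = statistics.median(ratings)  # middle value once the ratings are sorted
mode = statistics.mode(ratings)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

Here the mean is about 4.89, the median is 4, and the mode is 3. The two high ratings (8 and 10) pull the mean upward, which is why the three averages differ. That gap is a hint that these values are not symmetrically distributed.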
 
 
The statistics we have covered up until now are useful, but in order to get a clearer picture of what all the data looks like, there are more refined tools we can use to understand the relationships among the values. Standard deviation (SD) is one of those tools.
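To see why the average alone is not enough, compare two hypothetical sets of ratings that share the same mean but differ in spread. This sketch uses `statistics.stdev` from Python's standard library, which computes the sample standard deviation (dividing by n - 1).

```python
import statistics

# Two hypothetical sets of ratings with the same mean but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

for name, data in (("tight", tight), ("wide", wide)):
    # stdev() uses the sample formula (divides by n - 1), the usual choice
    # when the data are a sample drawn from a larger population.
    print(f"{name}: mean={statistics.mean(data)}, SD={statistics.stdev(data):.2f}")
```

Both sets have a mean of 5. The SD, however, is about 0.71 for the tight set and about 3.16 for the wide one, which tells us how far individual values typically fall from the mean.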
 

 

Now that we know what the methods and the most important (for our purposes) statistics are, let's move on to look at how study data is reported in the "Results" section.

On completion of this module students will be able to:

 

1. Describe how measurement and statistics can improve our understanding of the basis for massage therapy practice

 

2. Identify and understand basic concepts such as measurement scales (nominal, ordinal, interval, ratio), range, mean, standard deviation, normal distribution, variable, and statistical significance

 

3. Define the difference between descriptive and inferential statistics

 

4. Define the difference between parametric and nonparametric statistical tests and identify key examples of each type

 

5. Name several common ways statistics can be manipulated to change results

 

* 1. Name one kind of each of the following: a descriptive study, an experimental study, an observational study.

 

* 2. Name three advantages and three disadvantages of the RCT research design.

 

* 3. Define descriptive and inferential statistics.

 

* 4. Write a 1-Minute Paper on why using statistics is important.

 

• Types of studies (experimental, correlational) (slide 9)

 

• Some important descriptive statistical terms (range, mean, standard deviation, normal distribution, variable, dependent variable, independent variable, operational definitions, “n”, “p”, “p value”, hypothesis, null hypothesis) (slides 10-15)

 

• What to look for in a research paper if you are not a statistician: sample size, power, duration of follow-up, completeness of follow-up (slides 16-20)

 

• Common pitfalls to avoid when using statistics (Greenhalgh, 2001) (slides 21-25)

 

• Examples of parametric tests: t-tests, analysis of variance, multiple regression, Pearson’s product moment correlation coefficient (slides 33-38)

 

• Examples of nonparametric tests: Chi-square, Mann-Whitney U, Spearman’s rank correlation coefficient (slides 39-42)

 

• Does statistical significance necessarily mean clinical significance? (slide 43)
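The two kinds of test in the lists above can look very different on paper. Here is a rough sketch, in plain Python, of the arithmetic behind one of each. The paired t statistic is parametric, for interval or ratio data measured twice on the same subjects. The chi-square statistic is nonparametric, for counts in categories. The infant weights and admission counts are hypothetical numbers invented for illustration.

The sketch stops at the test statistic. Converting it to a p-value requires the t or chi-square distribution, which a package such as SciPy handles (`scipy.stats.ttest_rel` and `scipy.stats.chi2_contingency`). Note that `chi2_contingency` applies a continuity correction to 2×2 tables by default, so its statistic would come out smaller than this uncorrected one.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean of the within-subject differences divided by
    its standard error. Returns (t, degrees_of_freedom)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

def chi_square(observed):
    """Pearson chi-square statistic (no continuity correction) for a
    contingency table given as a list of rows.
    Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical infant weights (kg) before and after a feed: paired ratio-scale data.
before = [3.20, 3.45, 2.90, 3.80, 3.10]
after = [3.32, 3.55, 3.05, 3.88, 3.21]
t, df_t = paired_t(before, after)
print(f"paired t = {t:.2f} on {df_t} df")

# Hypothetical admission counts. Rows: born in UK / born elsewhere;
# columns: accepted / rejected. Nominal (categorical) data.
table = [[60, 40],
         [45, 55]]
chi2, df_c = chi_square(table)
print(f"chi-square = {chi2:.2f} on {df_c} df")
```

With these made-up numbers, t is roughly 9.7 on 4 degrees of freedom, and chi-square is about 4.51 on 1 degree of freedom.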

 

Sample Research Statistics Evaluation Form

 

1. Using one of the four articles you found in your electronic literature search in Module 2, fill in the following information:

 

• RESEARCH STUDY TITLE

 

• LIST STATISTICAL TESTS USED; IDENTIFY EACH AS PARAMETRIC OR NONPARAMETRIC.

 

• FOR EACH OF THE IDENTIFIED TESTS, INDICATE WHETHER OR NOT ITS USE WAS APPROPRIATE TO THE DATA COLLECTED.

 

• WAS THE SAMPLE SIZE OKAY OR NOT OKAY, AND WHY?

 

• WAS THE DURATION OF FOLLOW-UP OKAY OR NOT OKAY, AND WHY?

 

• WAS THE COMPLETENESS OF FOLLOW-UP OKAY OR NOT OKAY, AND WHY?

 

2. For the study you chose, list two questions or potential concerns you have about the statistical analyses the authors used and give your rationale for each question or concern:

 

Sample Test Questions

 

1. Outline three assumptions underlying parametric tests. Give an example of a parametric test and describe it.

 

Answer: Measurements of the dependent variable are at the interval or ratio scale level; measurements approximate a normal distribution curve; variances of the samples compared are roughly equal.

 

Example: Paired t-test, which compares two sets of observations in a sample, e.g., comparing the weight of infants before and after they eat.

 

2. Outline three assumptions underlying nonparametric tests. Give an example of a nonparametric test and describe it.

 

Answer: Used to analyze data at the nominal or ordinal scale level; few assumptions are made about the distribution of the population; addresses ranks, medians, or frequencies of data.

 

Example: Chi-square test, which compares observed frequencies within categories to frequencies expected by chance, e.g., assessing whether acceptance into medical school in the UK is more likely if the candidate was born in the UK.

 

3. According to Greenhalgh (2001), describe three pitfalls that should be avoided when using statistics.

 

Answer:

 

1. throwing all your data into a computer and reporting as significant any relationships where “p < 0.05”

 

2. if baseline differences between the groups favor the intervention group, not adjusting for them

 

3. not testing your data to see if they are normally distributed
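The first pitfall is easy to underestimate, so here is a short sketch of why it matters. Suppose none of the relationships you test are real. Each test still has a 5% chance of producing p < 0.05, so if you run enough of them, a "significant" result becomes likely.

The 20 tests below are a hypothetical number. The simulation assumes the tests are independent. It also assumes that, when the null hypothesis is true, p-values are uniformly distributed between 0 and 1, which holds for a valid test of continuous data.

```python
import random

ALPHA = 0.05
N_TESTS = 20  # hypothetical: 20 unrelated variables thrown into the computer

# Exact chance of at least one "significant" result when every null hypothesis
# is true and the tests are independent: 1 - (1 - alpha)^k
family_wise = 1 - (1 - ALPHA) ** N_TESTS
print(f"P(at least one p < {ALPHA}) = {family_wise:.2f}")

# Simulation check: under a true null, each p-value is a uniform draw in [0, 1).
random.seed(42)
trials = 10_000
hits = sum(
    any(random.random() < ALPHA for _ in range(N_TESTS))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.2f}")
```

With 20 independent tests, the chance of at least one spurious "significant" finding is about 64%, and the simulation lands close to that figure.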

 

 


12. Next steps

This chapter and the previous one were really dense in terms of the material we covered, but the hardest part of learning about reading research is now over. If you have stuck with it this far, I promise you that you will find the rest of the book to be relatively smooth sailing in comparison, since it builds on what you have already learned to this point.


13. Evaluate this chapter


14. Figures in this chapter


15. Tables in this chapter


16. Exercises in this chapter

Research Statistics Form (using the four articles each student identified in their literature search in Module 2).


17. Instructor material for this chapter

 

Talking past each other: Reactions to the pseudoscience of an "emotional energy" anatomical diagram, part 1

I got tagged in a Facebook discussion this morning, with the accompanying image that I've labeled, because the image resolution makes the text hard to read in the original:

 

1: Emotional Energy Centers of the Body

2: Burden Area || Burdens & Responsibilities || * Carrying a heavy load || * Weight of the world on shoulders

3: Throat Center || Self-Expression Issues || * Lack of Trust || * Inability to speak feelings || * Lack of Nurturing

4: Burden Area || Burdens & Responsibilities || * Carrying a heavy load || * Weight of the world on shoulders

5: Heart Center || Grief, Sorrow, Sadness, Loss || * Emptiness of Heart - Lack of Love || * Helplessness, Aloneness, Disillusionment || * Embarrassment, Shame, Humiliation || * Repressed feelings, Disappointment || * Genetic or Ancient memory || * Cruelty, Meanness

6: Fear Center || Fears & Phobias || * Loss of Control / Fear of losing control || * Giving our power to another person || * Relationships

7: Anger Center || Anger and Rage || * Anger at others || * Anger at self || * Jealousy || * Resentment

8: Guilt/Shame/Unworthiness Center || * Unacceptance || * Self-judgement (sic); self-criticism || * Not deserving of the good life has for us || * Inability to accept and receive

9: Old Stuff Center || Family Sexual Issues || * Childhood conditioning || * Violation of body or personal space || * Something done to us / Something taken from us without our permission || * Molestation, abuse, rape || * Impotence, frigidity

10: Support Area || * Lack of Financial Support

11: Support Area || * Lack of Emotional Support

12: Rejection Center || Abandonment || * Criticism, judgement (sic) by others || * Self-rejection || * Abandonment - pain in the heart

13: Betrayal Center || * Betrayed by someone we trusted || * Self-betrayal

14: Survival Center || Feeling we won't survive a life-threatening incident || * Violations related to surviving (accidents, abuse, violence, rape) || * Impotence, frigidity || * First year of life / Basic Creativity

 


The discussion went like this (I've shielded the names of everyone except Christopher Moyer and myself, because at POEM, the policy is "no blame, no shame"--this is not about publicly embarrassing individuals for not knowing something; it's about working to solve the problem so that all of us can know something better):

Person A: Emotional ills are stored in the muscles of the body. When an MT releases the tension in the muscle, the body is on its way to better health as long as the client let's the emotion ill go. So yes, our emotions can make us very ill.

 

Person B: Excellent. I've been looking for a diagram or list like this. Thank you!

 

Person C: very interesting indeed.

 

Christopher A Moyer: Sorry, but this is highly inaccurate. While it is true that muscular states and activities can play an indirect role in memory processes, it is definitely not true that "emotional ills are stored in the muscles." An abundance of evidence from psychology, neuroscience, anatomy, and physiology clearly indicate that is not the case.

 

Person B: I completely disagree Christopher.

 

Christopher A Moyer: Fine - but based on what evidence?

 

Christopher A Moyer: ‎Ravensara Travillian will be interested in this discussion if she has the time.

 

Person B: My own personal experiences throughout life; receiving acupuncture, massage and feeling those emotional releases.

 

Ravensara Travillian: ‎"Ravensara Travillian will be interested in this discussion if she has the time."

Possibly I will be interested, and possibly not.

As you point out, I have limited time, and I have decided to invest that limited time with students who are actually interested in learning something new, as opposed to squandering it in useless arguments.

If someone is not interested in learning, engaging is a waste of my time and theirs, and does not interest me in the least. It will inevitably degenerate into "Is so!" "Is not!" "Is so!" "Is not!".

Nothing could interest me less.

So the questions for the advocates of the chart above include:

Is there any evidence that could possibly convince you that your unique feelings might not be an accurate guide to what is universally going on?

Are there any circumstances at all under which you might reconsider what you have decided to believe?

If the answer to the above questions is "yes", then I would be potentially interested in a calm, civil, professional discussion about:

* why the above diagram is an example of simplistic "vending-machine science" that turns its back on evidence from anatomy 101 and neuroscience 101;

* what the actual stories behind the psychophysiological processes in our bodies really are, and why they feel the way they do to us, so convincingly that the above chart seems plausible;

* what are our professional obligations to our clients not to pass along misinformation and pseudoscience; and

* what do we do about our genuine moral distress at learning that our teachers were mistaken in what they taught us.

If the answer to the above questions is "no", then there is not enough common ground intellectually or ethically among us to even begin to discuss these matters, and I'm not interested in arguments for arguments' sake.

 

Person A: As I stated in another post, this confirms muscle issues that I have seen in my clients and myself which in my opinion displayed as a result of an emotional issue(s). While energy work is not always defined or accepted by the scientific community, there is validity when field work sees results and breakthroughs in a clients wellbeing. Muscles are known to stay tense when we're stressed and they can freeze. When released, there is significant healing. A reoccurring issue oftens has a basis in an emotional issue such as stress, loss of job or loved one. Massage is known to release those muscles and promote healing which can deepen intuitive the emotion field. Which is sometimes why it is not unusual for clients to break down and cry on the table. This chart confirms some of my field experience. The field of massage therapy does not hold a patent on this. "Healers" of all peoples know the value of touch and the subtle bodies that massage touches.

 

Ravensara Travillian: You're entitled to believe anything you want to, and that's fine. But there's no point in discussing the evidence, so Chris was wrong on one point--I'm not interested in this discussion.

 

Christopher A Moyer: Very briefly - the muscles seem to do all those things for one reason and one reason only - because they are connected to the brain. The muscles themselves have no capacity to store memory. Memory occurs in the brain. There are so many lines of evidence to establish this that it is not even close to controversial.

Why do I bring this up, even though I know some people will think Raven and I are being killjoys? Because we think it is important for the profession to improve and build itself on sound information.

This graphic is not sound information.

 

Person A: In your opinion and that is fine. Others will disagree. And the client can choose whom they wish to go to. I did not post this for a slap down discussion, but simply an FYI/Sharing which resonated with me. You are most certainly welcomed to accept it or reject it if it does fit within your "learned" responses. Be well!

 

Christopher A Moyer: It's not a matter of opinion, which is why I weighed in. If you'd posted that peanut butter and jelly is the best sandwich of all time, I wouldn't try to convince you otherwise, but this isn't like that. And we're not having a "slap down" discussion - just a regular discussion.

You're entitled to your own beliefs, but in this case they run counter to a tremendous amount of evidence.

 

Person A: BTW, some people think it is their duty to improve the "profession" according to their opinions while forgetting the profession covers a wide choice of modalities that may differ in theory and practice.

 

 


I actually spoke a little carelessly--when I said "I'm not interested in this discussion", what I really meant was "I am not interested in wasting Person A's time nor mine in a discussion that will inevitably get into an endlessly repeating loop at best, and at worst will devolve into name-calling and unprofessional behavior, as these discussions so often do".

But I actually am very interested in the meta-discussion here, or the discussion about the discussion itself, because in a relatively brief exchange, quite a few issues of importance came to light.

Since these issues are directly related to POEM's mission of providing high-quality and evidence-based educational materials for massage stakeholders, let's go through the discussion and examine what they mean for us.

 


Person A: Emotional ills are stored in the muscles of the body. When an MT releases the tension in the muscle, the body is on its way to better health as long as the client let's the emotion ill go. So yes, our emotions can make us very ill.

 

Let's take the last sentence first, so that we can begin with the most correct thing they said.

So yes, our emotions can make us very ill.

 

Emotions can play a large role in physiological and pathological processes in the body; that fact has been recognized for a long time, and we refer to interactions between body and psyche as "psychophysiology".

As we explore evidence-based psychophysiology of massage here at POEM, we'll talk about why it feels as though

Emotional ills are stored in the muscles of the body.

 

Because, in reality, they're not. This is a classic example of where what it feels like to us can mislead us into believing something that is not true.

It's such a convincing illusion, though, that you can see very big names in massage therapy--people whom you would expect to have a great deal more anatomy, physiology, and neuroscience education--fall into that very same trap.

Why is this belief a trap, and what can we do to escape falling into it?

Think back to the first day of "Anatomy 101" in massage school--what was one of the first things you learned?

The names of the four tissue types, right?

Epithelial tissue, connective tissue, muscle tissue, and nervous tissue, right?

But were you taught just to remember the names, or were you taught what the differences between those tissues really mean?

Similarly, when you studied the brain, did you learn just to write words about the brain on anatomy tests? Or did you actually feel the disorientation that can come from the realization that how the brain really works is so breathtakingly different from how it feels like it works, and that the reason our senses inevitably mislead us on that fact is because we're "inside" them--using the very brain processes that we're trying to understand, in order to actually understand them?

If you truly understand the difference between muscle tissue and nervous tissue, and you truly understand the way the brain operates in multiple complex ways on the sensory information it processes, then you know why muscles can't store emotions: muscles simply are incapable of that same kind of complex processing.

No blame, no shame--it's not your fault if you were taught anatomy and the body in a rote memorization way. We're all in this together, and the only question is, now, what are we going to do to fix the situation?

When an MT releases the tension in the muscle, the body is on its way to better health as long as the client let's the emotion ill go.

 

This is an example of what I call "vending-machine science"--someone understands the body to be like a vending machine, where you always get the same things out when you put the same amount of money in.

In this example, MT pushes on muscle ==> emotional release ==> better health: in other words, a vending-machine model of psychophysiology.

Instead, examining it through a systems-science lens yields a much better understanding of what is really going on in the interactions between massage and psychophysiology. Systems science tells us that people are complex and can experience the same input in very different ways, rather than producing one same output for one same input. And yet, they do not need to follow exactly the same process as one another in order for each individual to derive a great deal of psychophysiological benefit from massage in their own unique ways.

I would recommend that approach, rather than relying on an outmoded vending-machine model that actively contradicts centuries of evidence from anatomy, psychophysiology, and neuroscience.

Having critiqued the errors in Person A's understanding of psychophysiology, I'd now like to switch gears, and praise them unreservedly for their willingness to share information. It is very generous, if you think you have something of value, to want to freely share it with others, and if I am going to point out where Person A is wrong, it is only fair that I also point out where they're right.

There are many massage teachers in Person A's situation--they are teaching to clients and students in good faith exactly what their teachers taught them in good faith, and so on up through generations of teachers.

One of the important decision points facing massage now is how we encourage teachers to look at the quality of the information they are passing along to students, and how we support them in developing better information and understanding.

We have to face this question head-on, and plan how to deal with it in a client-centered way, if we truly want to become a healthcare profession.

 

 

 
