# Trends in statistical methods in articles published in *Archives of Plastic Surgery* between 2012 and 2017

## Abstract

This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the *Archives of Plastic Surgery* (*APS*) from 2012 to 2017. We reviewed 388 original articles published in *APS* between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in *APS* between 2012 and 2017 used one or more statistical methods. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcomes. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was predominantly used in the articles that presented statistical analyses. We found that the use of statistical methods in *APS* has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in *APS*.

## INTRODUCTION

Evidence-based medicine (EBM) is an approach to medical practice intended to optimize decision-making by emphasizing the use of evidence from well-designed and well-conducted research [1]. The ability to understand sources of bias in the medical literature is required to practice EBM properly. Bias can occur from study design and/or the inappropriate use of statistical tests. Study design affects the choice of statistical methods. Moreover, statistics is deeply involved in all stages of research, from data collection to analysis and interpretation. An improper choice of study design and statistical techniques may lead to improper results and conclusions. Therefore, statistics plays a crucial role in evaluating the evidence of medical research.

*Archives of Plastic Surgery* (*APS*) adheres to the guidelines and best practices of the International Committee of Medical Journal Editors (ICMJE). The ICMJE recommendations for statistics state that authors should describe statistical analyses with enough detail to enable a reader to verify the reported results and that authors need to provide appropriate indicators of measurement errors or uncertainty, such as confidence intervals, beyond the P-value [2,3]. Furthermore, they recommend specifying the statistical software program(s) and versions used.

The aim of this article was to assess trends in statistical methods, and to evaluate their appropriateness, in papers published in *APS* from 2012 to 2017. *APS* is the official journal of the Korean Society of Plastic and Reconstructive Surgeons and is published 6 times per year. Since 2012, it has continued the *Journal of the Korean Society of Plastic and Reconstructive Surgeons*, which was launched in 1974. This review article provides an overview of recent trends in the statistical methodology used in *APS*.

## METHODS

Because this study is a retrospective literature analysis, neither approval from the Institutional Review Board nor informed consent was required.

We collected 388 original articles published in *APS* from 2012 to 2017. Case reports, ideas and innovations, review articles, and letters were excluded. Of these articles, 230 (59.3%) used statistical methods to analyze data and to report results. We classified them according to the types of statistical methods and software used, and checked whether there were errors in the description of statistical methods and results. We counted the number of statistical methods applied. When multiple statistical analyses were used in a study, each method was counted separately. The Cochran-Armitage trend test was used to assess the presence of linear trends in the percentages of statistical methods and statistical software used in the published articles by year from 2012 to 2017. R version 3.4.2 (R Foundation for Statistical Computing, Vienna, Austria) was used to perform the statistical tests, and P-values <0.05 were considered to indicate statistical significance.
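As a rough illustration of the trend test described above, the following sketch implements a two-sided Cochran-Armitage test with a normal approximation, using only the Python standard library. The function name and the yearly counts are hypothetical and chosen for illustration only. They are not the *APS* data, and the original analysis was performed in R.

```python
import math

def cochran_armitage_trend(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions."""
    k = len(successes)
    if scores is None:
        scores = list(range(k))  # equally spaced scores, e.g. consecutive years
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion under the null hypothesis
    # Numerator: score-weighted deviation of observed from expected counts
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, successes, totals))
    # Variance of the numerator under the null hypothesis of no trend
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return z, p_value

# Hypothetical yearly counts (NOT the APS data):
# articles using statistics / all original articles, by year
years = [2012, 2013, 2014, 2015, 2016, 2017]
using_stats = [30, 32, 38, 40, 44, 46]
all_articles = [60, 62, 64, 66, 68, 68]

z, p = cochran_armitage_trend(using_stats, all_articles, scores=years)
print(f"Z = {z:.3f}, P for trend = {p:.3f}")
```

Because the scores are equally spaced, using the calendar years or the indices 0 to 5 gives the same Z statistic. Only the spacing between the scores matters.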

### Statistical methods according to the objective of the analysis

Table 1 lists statistical methods according to the objective of the analysis and whether they involve continuous or ordinal, categorical, or time-to-event outcomes. We classified the statistical methods into 1 of 3 commonly used objectives: comparisons, correlations, and regression analyses.

Comparisons can be performed using paired or independent samples. Paired data arise from the same individual at different points in time or from different regions of the body, while unpaired (or independent) data arise from distinct individuals. In paired data, the variables to be compared are correlated with each other, so that correlation should be considered in the analysis. In plastic surgery research, clinical assessments before and after surgery or from multiple regions within the same subject can be treated as paired (or clustered) data.

The association between two variables can be assessed through either a correlation or regression analysis. Correlation analyses quantify the relationship between the variables, while regression analyses model the relationship between an outcome variable and one or more explanatory variables. Regression can be used to predict an outcome based on one or more predictors.
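To make the idea of quantifying a relationship concrete, the sketch below computes two common correlation coefficients, Pearson and Spearman, on the same small data set. The helper names and the data are hypothetical. The point is only that the two coefficients can differ, for instance when a relationship is monotonic but not linear.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y increases with x, but not along a straight line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Here the Spearman coefficient equals 1 because the ranks agree perfectly, while the Pearson coefficient is lower because the relationship is curved.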

Other statistical analyses that were not listed in Table 1 but were used in articles published in *APS* include the normality test, power analysis, multivariate analysis, and reliability analysis using measures such as the intraclass correlation coefficient, Bland-Altman plots, the Cronbach alpha, and the kappa statistic.

### Parametric or nonparametric methods

The statistical methods for continuous or ordinal outcomes can be further classified as parametric or nonparametric (Table 1). Parametric statistical methods assume a specific parametric form of the distribution for the underlying population, while nonparametric methods do not assume any parametric form. For example, the t-test assumes that the variables are normally distributed. When comparing the central tendency between groups, one should check whether the data can be assumed to be normally distributed before applying parametric tests. Nonparametric methods need fewer assumptions about the underlying distribution. In many cases, nonparametric methods are more appropriate when the sample size is not very large. Nonparametric methods based on ranks are especially useful for testing ordinal scale variables such as the visual analogue scale, which is widely used in the plastic surgery field.

### Errors in reporting statistical methods and results

We assessed whether there were errors in reporting statistical methods and results in the articles published in *APS*. Errors in presenting P-values were observed, such as writing “P=0.00” or “P=1.00” instead of indicating that the P-value was very small or large (e.g., P<0.001 or P>0.999) and insufficient descriptions of the P-value, such as mentioning only the significance of the results without an exact P-value. Errors in describing the statistical methods were evaluated in terms of whether the applied statistical methods were described in the Methods section and whether the description of the applied statistical methods was complete and correct.
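The reporting convention described above can be expressed as a small helper. This is a minimal sketch. The function name and the choice of three decimal places are assumptions made for illustration, not a rule stated by the journal.

```python
def format_p_value(p, decimals=3):
    """Format a P-value for reporting, avoiding 'P=0.000' and 'P=1.000'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    lower = 10 ** -decimals  # smallest value shown exactly, e.g. 0.001
    upper = 1 - lower        # largest value shown exactly, e.g. 0.999
    if p < lower:
        return f"P<{lower:.{decimals}f}"
    if p > upper:
        return f"P>{upper:.{decimals}f}"
    return f"P={p:.{decimals}f}"

# Values that software might print as 0.000 or 1.000, plus two ordinary cases
for value in (0.00004, 0.023, 0.2, 0.99996):
    print(format_p_value(value))
```

Applied to raw software output, such a helper keeps exact P-values where they are informative and replaces only the extreme values with inequalities.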

### Statistical software

We recorded which statistical software package was used in each article and counted how often each package was used.

## RESULTS

### Frequency and types of statistical methods

Of the 388 articles published in *APS* between 2012 and 2017, 230 (59.3%) used one or more statistical methods. Fig. 1 shows a statistically significant increase in the number of articles that used statistical methods over 6 years (P for trend=0.023). In 2012 and 2013, the percentage of articles using statistics was around 50%. In 2017, 64.7% of the articles published in *APS* used statistical methods. The number of statistical methods used per article in *APS* was 1.87±1.06 (mean±standard deviation). Almost half of the articles (47.1%) using statistics employed one method (Table 2). One article used six statistical methods.

Table 3 shows the frequencies of the statistical methods applied in the articles published in *APS* by year. There were 261 applications of statistical methods for continuous or ordinal outcomes, and 139 applications of statistical methods for categorical outcomes. Statistical methods for comparisons of independent samples were most commonly used. Among the methods of comparison, the Pearson chi-square test (17.4%) and the Fisher exact test (11.3%) for categorical outcomes and the Mann-Whitney U test (14.4%) and independent t-test (13.7%) for continuous or ordinal outcomes were the most frequently used methods. The Wilcoxon signed-rank test, paired t-test, Kruskal-Wallis test, and analysis of variance (ANOVA) were also widely used, accounting for more than 7% of the published articles using statistics. Within the category of regression analysis, logistic regression was used almost twice as much as linear regression. More complicated methods, such as repeated-measures ANOVA or linear mixed models, were applied in very few articles.

There were 30 applications of other statistical methods for other outcomes and objectives. Eight articles evaluated questionnaires or inter-rater agreement using reliability statistics. Seven articles checked the assumption of normality using the Kolmogorov-Smirnov test or the Shapiro-Wilk test. Five articles performed a power analysis prior to beginning a study or retrospectively. Survival analyses were applied in three articles.

### Errors in reporting statistical methods and results

We found errors in describing statistical methods and results in 133 of the 230 articles (57.8%). The frequency of various types of errors is presented in Table 4. Errors in P-values were found in 90 articles, with 25 instances of P-values inadequately reported as equal to 0 or 1, and 67 instances of not presenting the exact P-values. For example, reporting “P=NS (not significant)”, or “P<0.05” or “P<0.01” instead of an exact P-value was the most common error in presenting P-values. Although such errors are not critical, they are worth mentioning and can be easily corrected.

Twenty-one articles did not state which statistical methods were applied, and 32 articles presented incomplete or wrong descriptions and applications. The statistical methods used in the article should be described in the Methods section, but some articles only reported P-values, along with the statistical methods used, in the Results section. For correlation analyses, for instance, the method for estimating a correlation coefficient (e.g., Pearson or Spearman) should be described. Another example is just mentioning the “t-test” without providing further details. Whether the independent or paired t-test was used should be explicitly stated, because that choice depends on the study design and the data structure. An example of an incorrect description of statistical methods was the use of the Wilcoxon signed-rank test for the comparison of independent observations. The Wilcoxon signed-rank test is used for paired comparisons, while the Wilcoxon rank-sum test (the same as the Mann-Whitney U test) is used for independent comparisons. These two methods were sometimes misused or misstated, due to confusion arising from the similar names.
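The difference between the two similarly named Wilcoxon tests is easiest to see from their inputs. The sketch below computes only the test statistics (no P-values) with the Python standard library. The function names and the visual analogue scale scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank_statistic(before, after):
    """Paired design: sum of the ranks of the positive differences (before - after)."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'before' and one 'after' value per subject")
    diffs = [b - a for b, a in zip(before, after) if b != a]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)

def mann_whitney_u_statistic(group_a, group_b):
    """Independent design: U for group_a, derived from its rank sum in the pooled sample."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2

# Hypothetical pain scores for the same five patients before and after surgery (paired)
before = [7, 6, 8, 5, 7]
after = [4, 5, 5, 5, 3]
print("Signed-rank W+ =", wilcoxon_signed_rank_statistic(before, after))

# Hypothetical pain scores from two different groups of patients (independent)
group_a = [7, 6, 8, 5, 7]
group_b = [4, 5, 3]
print("Mann-Whitney U =", mann_whitney_u_statistic(group_a, group_b))
```

The signed-rank statistic requires one pair of values per subject, whereas the rank-sum (Mann-Whitney) statistic accepts two groups of different sizes. Checking which of these data structures a study has is a quick way to confirm which test name belongs in the Methods section.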

### Statistical software packages

Among the 230 articles published in *APS* that used statistical methods, 165 (71.7%) provided details about the statistical software programs used for analyses. Sixty-five articles did not provide any such information. The percentage of articles presenting information about the statistical software used increased by over 10 percentage points, from 71.9% in 2012 to 84.8% in 2017, although a statistically significant increasing trend was not observed (P for trend=0.597) (Fig. 2).

SPSS (SPSS Inc., Chicago, IL, USA or IBM Corp., Armonk, NY, USA) was predominantly used in the articles that presented statistical analyses (141 of 168 cases, 83.9%). SAS (SAS Institute Inc., Cary, NC, USA) was used in roughly 4% of the articles. Other programs, such as GraphPad PRISM (GraphPad Software Inc., La Jolla, CA, USA), STATA (StataCorp LLC, Lakeway, TX, USA), and SigmaPlot (Systat Software Inc., Point Richmond, CA, USA), were only employed in one to four articles.

## DISCUSSION

In this review, we analyzed articles published in *APS* from 2012 to 2017 with respect to the use and type of statistical methods and statistical software packages. The results showed an increasing trend in the application of statistical methods and the use of statistical software packages.

Two relevant articles—one review and one editorial—regarding statistics have been published in specialized plastic surgery journals [4,5]. Januszyk and Gurtner [4] presented a practical overview of statistics in medicine, ranging from basic principles in statistics to descriptive and inferential statistical methods, with detailed guidelines for the interpretations of statistical tests. Freshwater [5] published a letter pleading for improvements in statistical analyses in plastic surgery. Some medical journals have published articles like this one, providing a systematic review and/or analysis of trends in the statistical methods applied [6-9]. However, we did not find such articles in the field of plastic surgery. To our knowledge, based on PubMed and KoreaMed (http://koreamed.org), this is the first article to report and summarize trends in the application of statistical methods in a plastic surgery journal.

Altman [10] reviewed the statistical contents of medical research published in the journal *Statistics in Medicine*. He found a considerable increase in the use of statistics and reported that a much greater use of complex statistical methodology in medical research was detected. The review articles regarding the use of statistics in medical journals [6-9] reflect Altman’s findings. Altman [10] also said as a final comment, “Reviewing medical papers is difficult, time-consuming, occasionally frustrating, and educational. Many journals are desperate for expert statistical help.” *APS* invited a statistical editor to join the editorial team in 2012, and started having statistical reviewers assess the submitted articles to improve the quality of statistical applications.

Despite the increasing use of statistics in *APS*, there were some statistical errors in the articles, including errors in the presentation of P-values and in the description of statistical methods and/or statistical software used. Some authors stated whether the results were statistically significant without providing exact P-values, especially for non-significant results, which were frequently presented as “P=NS.” Moreover, some authors did not report P-values anywhere in the article, even for significant results, and only stated whether the results were statistically significant. Exact P-values are useful information for interpreting the statistical results of hypothesis testing. A very small P-value indicates that the data that have been collected are highly incompatible with the null hypothesis [11-13]. Some software packages output results with the P-value listed as 0.000 or 1.000. Researchers often copy and paste the P-value into the paper as is; however, such values should be presented as “P<0.001” or “P>0.999.” “P=0.000” would mean that there is absolutely zero chance of getting the observed results (or more extreme results) if the null hypothesis is true. However, there is always some chance of such an outcome, and we cannot definitively say that the probability is either 0 or 1. Some authors reported P-values without details regarding the data (e.g., summary estimates such as mean±standard deviation, number [%], or odds ratio). The P-value does not measure the magnitude or the importance of an observed effect [11,12]. For example, a difference in the visual analogue scale for pain assessment before and after surgery of 0.1 with a P-value of 0.2 would be interpreted as a non-significant difference, while a difference of 0.01 with a P-value of 0.003 would be presented as significant, even though the second difference is ten times smaller. As argued by Wasserstein and Lazar [13], statistical significance is not equivalent to scientific, human, or economic significance. Recently, statements about the misuse of P-values were issued by a statistical society [13] and presented in a major medical journal [14]. To provide a broad and appropriate interpretation of the results of research, authors should report not only P-values with summary estimates, but also uncertainty measures such as the 95% confidence interval and/or standard error of estimates.
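As one way of reporting an estimate together with its uncertainty, the sketch below computes a 95% Wilson score confidence interval for a proportion, using the 133 of 230 articles with reporting errors as the example. The interval is an illustration added here and was not part of the original analysis. The function name and the choice of the Wilson method are assumptions.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# 133 of the 230 articles using statistics contained reporting errors
low, high = wilson_interval(133, 230)
print(f"133/230 = {133 / 230:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Reporting the proportion with its interval shows both the size of the estimate and how precisely it was measured, which a P-value alone cannot convey.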

No indication of which statistical software package was used was provided in 65 of the 230 articles using statistical methods (28.3%). Different statistical programs can produce different statistical results. For example, the median values computed in SPSS and R are not the same, because different algorithms are employed to calculate the median in the default settings. Another salient difference is the default definition of the event for a binary dependent variable in logistic regression: SAS models the smaller value as the event by default, while SPSS models the higher value. Data interpretation can be influenced by these defaults, so authors should understand the statistical software they use in detail and indicate which statistical software was used in the article.

The statistical methods used should be presented in the Methods section of the article. We noticed that some articles presented the results without mentioning statistical methods. We included these articles in the category of articles that used statistics. However, for these articles it is impossible to know which statistical methods or software were used and how the significance level was set. The instructions for authors in *APS* state that “methods of statistical analysis and criteria for statistical significance should be described” in the Methods section. Not only the names of the statistical analyses, but also the objectives of the study for using statistical methods should be described in detail in the Methods section.

The inclusion of a small number of subjects could limit the use of statistical analysis. Plastic surgery is a predominantly clinical field, so many plastic surgeons have focused their efforts on improving clinical results, and particularly on improving surgical techniques [15]. Assessments of newly updated surgical techniques or preliminary studies to generate an idea based on an animal experiment generally have small sample sizes. In some cases, neither statistical tests nor regression analysis might be necessary. Indeed, sophisticated statistical techniques are not always needed. Nonetheless, good data summarization using appropriate descriptive statistics can be very helpful for understanding the data. If statistical tests are required for a study with a small sample size, nonparametric statistical methods may be useful.

Less familiar statistical methods, such as reliability analyses and power analysis, were infrequently but consistently applied in the articles published in *APS*. Reliability analyses for evaluating internal consistency, test-retest repeatability, or inter-rater agreement are performed to assess reproducibility or repeatability among techniques/modalities or human readers. Power analysis is needed when planning a prospective study to achieve an adequate number of subjects. One may want to perform power analysis if non-significant results are obtained due to a small sample size.
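For planning purposes, a power analysis often reduces to a sample size formula. The sketch below uses the standard normal-approximation formula for comparing two independent means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group. The function name and the input values (a 1-point difference on a pain scale with a standard deviation of 2) are hypothetical, and an exact calculation based on the t-distribution would give a slightly larger number.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up to whole subjects

# Hypothetical planning values: detect a 1-point difference when the SD is 2 points
print(sample_size_per_group(delta=1.0, sd=2.0))
```

The formula makes the trade-off explicit: halving the difference to be detected roughly quadruples the required number of subjects per group.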

This article can serve as the first step for obtaining a better understanding of the statistical methods frequently used in *APS*. In conclusion, the use of statistical methods has increased in *APS* over the last 6 years. Although there is room for improvement, researchers have been paying more attention to the proper use of statistics in recent years. These positive trends in *APS* are expected to continue in the future.

## Notes

No potential conflict of interest relevant to this article was reported.

## References

*Archives of Plastic Surgery*. Arch Plast Surg 2017;44:359–60.