class: title-slide, left, middle

# Presentation of data & statistical tests

----

<br>

.right[
### Ghana Medical Journal Workshop
### August 1, 2023

Dr Samuel Blay Nguah .title-t[FWACP FGCPS]
]

---
class: inverse middle center

# No conflict of interest to declare.

----

---

# Outline

----

- Determinants of appropriate analysis
- Common pitfalls in analysis
- Descriptive vs Inferential statistics
- Parametric vs Non-parametric tests
- Analysis and presentation
  - Continuous variables
  - Categorical variables
  - Time to event analysis
- Confounding & Interaction

---

# Determinants of appropriate statistical analysis

----

- What is/are the research question(s)?
- What is/are the objective(s)?
- What type of variables were collected?
- Relate the variables required to each objective
- What is the study methodology?
- What is the sample size?
- How many .red[**missing values**]?
- How do you intend to deal with missing data?

---

# Common pitfalls

----

.pull-left[
## Statistical analysis
- Poorly written
- Does not conform to the analysis done
- Wrong statistical tests applied
- Wrongful dichotomization
- Assumptions of the test not evaluated
- Missing data: how much there was and how it was dealt with
- Analysis does not conform to the basis for the sample size determination
]

.pull-right[
## Reporting
- P-value not reported exactly (e.g. "NS")
- Wrong interpretation of P-value (especially combo p-values)
- Data dredging (p-hacking)
- Ignoring confounders and effect modifiers
- Obvious bias: "Almost significant"
- Transcription errors
]

---

# An example statistical analysis

> Data Analysis: Statistical analysis was done with the Statistical Package for Social Sciences version 21.0 (Chicago Ill, IBM). Qualitative variables were summarized as frequencies (percentages), while quantitative variables were summarized as means (standard deviation). Student’s t-test, analysis of variance, or Chi-square test was used, as appropriate, to determine the statistical significance of the difference between the two groups. P <0.05 was taken as statistically significant.

---

# What is the research question?

----

Some study questions may appear "the same" but are very different!

--

.pull-left[
.red[Is there a difference in height between male and female residents?]
]

.pull-right[
.red[Requires a **two-sided** test.]
]

--

.pull-left[
.green[Are female residents taller than male residents?]
]

.pull-right[
.green[Requires a **one-sided** test.]
]

--

.pull-left[
.blue[Are male residents taller than female residents?]
]

.pull-right[
.blue[Requires a **one-sided** test.]
]

---

# What are the variables required?

----

- **Dependent** vs. **Independent** variables

.pull-left[
## Categorical
- Nominal
- Binary
- Ordinal
- Are some data missing?
- Are the groups balanced?
- What are the proportions of the outcome and predictors?
]

.pull-right[
## Interval or ratio
- Continuous
- Discrete
- What is the distribution, normal or non-normal?
- Are there outliers?
]

---

# Methodology

.panelset[
.panel[
.panel-name[Descriptive]

.pull-left[
## Scenario
- Describe what pertains
- **What is the prevalence of hypertension among staff of the Komfo Anokye Teaching Hospital?**
]

.pull-right[
## Descriptive Analysis
### Categorical
- Frequency, percentages

### Numeric
- Mean(SD)
- Median(IQR,Range)
- Percentiles
]
]

.panel[
.panel-name[Predictive]

## Scenario
- What is the relationship between two or more variables?
- Can one variable predict the other?

>**E.g.: What is the relationship between age and incidence of arthritis?**

## Method of Analysis
.pull-left[
- Agreement
- Correlation
]

.pull-right[
- Regression
- Survival
]
]

.panel[
.panel-name[Comparative]

## Scenario
- Look for differences between groups

> **Is the prevalence of low back pain higher among Surgeons than Physicians?**

## Analysis
.pull-left[
### Compare averages
- T-test
- ANOVA
- etc.
]

.pull-right[
### Compare proportions
- Chi-square test
- etc.
]
]
]

---

# Descriptive statistics

----

.pull-left[
## Categorical variable
- Frequency tables – univariate
- Contingency tables – bivariate
  - Row percentage
  - Column percentage
- Graphical representations
  - Bar chart
  - Pie Chart
  - Others
- Odds & Odds Ratio
- Risk & Risk Ratio
]

.pull-right[
## Continuous Variable
- Measures of central tendency
  - Mean
    - Arithmetic Mean
    - Geometric mean
    - Harmonic mean
  - Median
  - Mode
- Measures of dispersion
  - Standard deviation
  - Variance
  - Interquartile range
  - Range
]

---

# Inferential statistics

----

## Statistical tests
- p-value

## Estimates
- Point estimates
- Interval estimates
  - Confidence interval

> If the study were repeated many times, about 95% of the 95% confidence intervals computed this way would contain the population parameter.

---

# Parametric vs. Non-parametric test?

.panelset[
.panel[.panel-name[Outline]

.pull-left[
## Parametric
- Must be interval or ratio data
- Approximately normally distributed
- Or approximately normally distributed after transformation
- Enough sample size?

E.g.: T-test, ANOVA, etc.
]

.pull-right[
## Non-parametric
- Alternative to parametric
- Does not require a specific distribution
- Can handle small sample sizes
- Interpretation can be difficult
]
]

.panel[.panel-name[Assumption]

- Statistical tests have assumptions
- More stringent in parametric tests
- These should be evaluated before using the test
- E.g.: For a Student's T-test:
  1. Independence of observations
  2. Normality of data
  3. Homogeneity of Variances
  4. Random Sampling methodology
  5. Adequacy of sample size
]
]

---

# Analysis - Single categorical

.panelset[
.panel[.panel-name[Descriptive]

.pull-left[
Example: **What is the proportion of workers with diabetes mellitus in my study?**

## Presentation
- Frequency/Count
- Proportion
- Percentages
]

.pull-right[
**Table 1**: Univariate categorical table

| Characteristic | N = 50 |
|:---|:---:|
| **Sex, n (%)** | |
| Female | 26 (52.0) |
| Male | 24 (48.0) |
| **Age Grouping, n (%)** | |
| Middle age | 17 (34.0) |
| Elderly | 33 (66.0) |
]
]

.panel[.panel-name[Inferential]

.pull-left[
Example: **What is the proportion of workers with diabetes mellitus in Ghana?**

## Presentation
- Frequency/Count
- Proportion
- Percentages
- Confidence intervals (Binomial)
]

.pull-right[
**Table 1**: Univariate categorical table

| Characteristic | N = 50 | 95% CI<sup>1</sup> |
|:---|:---:|:---:|
| **Sex, n (%)** | | |
| Female | 26 (52.0) | 38%, 66% |
| Male | 24 (48.0) | 34%, 62% |
| **Age Grouping, n (%)** | | |
| Middle age | 17 (34.0) | 22%, 49% |
| Elderly | 33 (66.0) | 51%, 78% |

<sup>1</sup> CI = Confidence Interval
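
The slide does not say which binomial interval produced the limits in the table. As a minimal sketch, assuming a Wilson score interval without continuity correction (the function name `wilson_ci` is illustrative), the limits may differ from the table by about a percentage point:

```python
import math
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 26 of the 50 participants are female (first row of the table)
low, high = wilson_ci(26, 50)
print(f"{26 / 50:.1%} (95% CI {low:.1%} to {high:.1%})")
```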
]
]
]

---

# Analysis - Two categorical

.panelset[
.panel[.panel-name[Descriptive]

.pull-left[
Example: **What is the proportion of males who have DM in my study?**

## Presentation
- Frequency/Count
- Proportion
- Percentages
  - Row vs. Column
- Contingency table
]

.pull-right[
Bivariate categorical table

| Characteristic | Female, N = 26 | Male, N = 24 |
|:---|:---:|:---:|
| **Age Grouping, n (%)** | | |
| Middle age | 9 (34.6) | 8 (33.3) |
| Elderly | 17 (65.4) | 16 (66.7) |
]
]

.panel[.panel-name[Inferential]

.pull-left[
Example: **Is having diabetes related to sex in Ghana?**

## Presentation
- Confidence intervals
- p-value from appropriate statistical test
  - Chi-square test
  - Fisher's exact test
  - Etc.
]

.pull-right[
Bivariate categorical table

| Characteristic | Female, N = 26 | Male, N = 24 | p-value<sup>1</sup> |
|:---|:---:|:---:|:---:|
| **Age Grouping, n (%)** | | | 0.924 |
| Middle age | 9 (34.6) | 8 (33.3) | |
| Elderly | 17 (65.4) | 16 (66.7) | |
| **Treatment Type, n (%)** | | | 0.802 |
| Old Drug | 11 (42.3) | 11 (45.8) | |
| New Drug | 15 (57.7) | 13 (54.2) | |

<sup>1</sup> Pearson’s Chi-squared test
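
For a 2 × 2 table, Pearson's chi-squared statistic can be worked out by hand from the observed and expected counts. The sketch below is illustrative only (the function name `chi_square_2x2` is an assumption, and no continuity correction is applied); for one degree of freedom the p-value reduces to `erfc(sqrt(chi2 / 2))`:

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared test for a 2x2 table of counts, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the table: rows are the characteristic levels, columns are Female, Male
age_by_sex = [[9, 8], [17, 16]]          # Middle age, Elderly
treatment_by_sex = [[11, 11], [15, 13]]  # Old Drug, New Drug

for label, table in [("Age Grouping", age_by_sex), ("Treatment Type", treatment_by_sex)]:
    chi2, p = chi_square_2x2(table)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p:.3f}")
```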
] .panel[.panel-name[Effect: OR]
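| Characteristic | Female, N = 26<sup>1</sup> | Male, N = 24<sup>1</sup> | Odds Ratio<sup>2</sup> | 95% CI<sup>2,3</sup> | p-value<sup>2</sup> |
|:---|:---:|:---:|:---:|:---:|:---:|
| **Treatment Type** | | | 0.87 | 0.25, 3.06 | >0.999 |
| Old Drug | 11 / 26 (42%) | 11 / 24 (46%) | | | |
| New Drug | 15 / 26 (58%) | 13 / 24 (54%) | | | |
| **Age Grouping** | | | 1.06 | 0.28, 4.03 | >0.999 |
| Middle age | 9 / 26 (35%) | 8 / 24 (33%) | | | |
| Elderly | 17 / 26 (65%) | 16 / 24 (67%) | | | |

<sup>1</sup> n / N (%)

<sup>2</sup> Fisher’s Exact Test for Count Data

<sup>3</sup> CI = Confidence Interval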
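
The odds ratios above come from Fisher's exact test. As a rough cross-check, the sketch below computes the simple cross-product odds ratio with a Woolf (log-based) interval; this is an assumed, simpler method, so the point estimate should agree to two decimals while the interval will be narrower than the exact one in the table. The name `odds_ratio_woolf` is illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Cross-product odds ratio (a*d)/(b*c) with a Woolf log-scale confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Treatment Type: Old Drug (Female 11, Male 11), New Drug (Female 15, Male 13)
or_trt, low_trt, high_trt = odds_ratio_woolf(11, 11, 15, 13)
print(f"Treatment Type: OR = {or_trt:.2f} (95% CI {low_trt:.2f} to {high_trt:.2f})")

# Age Grouping: Middle age (Female 9, Male 8), Elderly (Female 17, Male 16)
or_age, low_age, high_age = odds_ratio_woolf(9, 8, 17, 16)
print(f"Age Grouping: OR = {or_age:.2f} (95% CI {low_age:.2f} to {high_age:.2f})")
```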
]
]
]

---

# Analysis: Single numeric

.panelset[
.panel[.panel-name[Descriptive]

.pull-left[
Example: **What is the average hemoglobin of doctors in KATH?**

## Presentation
- Mean(SD)
- Median(IQR)
- Median(Range)
]

.pull-right[
**Table**: Univariate numeric table

| Characteristic | N = 50 |
|:---|:---:|
| Age in years, Median (Minimum,Maximum) | 63 (45,75) |
| Initial blood pressure (mmHg), Mean (SD) | 98.3 (5.2) |
| Blood Pressure after treatment, Median (IQR) | 88.2 (6.9) |
]
]

.panel[.panel-name[Inferential]

.pull-left[
Example: **What is the average Hemoglobin of Ghanaians?**

## Presentation
- Mean(SD)
- Median(IQR)
- Median(Range)
- Confidence interval
]

.pull-right[
**Table 1**: Univariate continuous

| Characteristic | N = 50 | 95% CI<sup>1</sup> |
|:---|:---:|:---:|
| Age in years, Median (Minimum,Maximum) | 63 (45,75) | 60, 63 |
| Initial blood pressure (mmHg), Mean (SD) | 98.3 (5.2) | 97, 100 |
| Blood Pressure after treatment, Median (IQR) | 88.2 (6.9) | 87, 90 |

<sup>1</sup> CI = Confidence Interval
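
The confidence interval for a mean can be approximated from the summary statistics alone. The sketch below is a simplification: it assumes a normal approximation (a t-based interval is slightly wider at N = 50), and the function name `mean_ci` is illustrative. It covers only the mean row; intervals for medians need other methods.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = sd / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


# Initial blood pressure: mean 98.3, SD 5.2, N = 50 (from the table)
low, high = mean_ci(98.3, 5.2, 50)
print(f"98.3 (95% CI {low:.1f} to {high:.1f})")
```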
]
]
]

---

# Analysis: Two continuous

.panelset[
.panel[
.panel-name[Difference]

Example: **What is the difference in weight between males and females?**

.pull-left[
## Tests - Unpaired
- Parametric:
  - T-test (Unequal variance)
  - T-test (Equal variance)
- Non-parametric:
  - Mann-Whitney U test
]

.pull-right[
## Tests - Paired
- Parametric:
  - Paired T-test
- Non-parametric:
  - Wilcoxon signed-rank test
]
]

.panel[
.panel-name[Difference 2]
**Table**: Bivariate continuous

| Characteristic | Old Drug, N = 22 | New Drug, N = 28 | Difference<sup>1</sup> | 95% CI<sup>1,2</sup> | p-value<sup>1</sup> |
|:---|:---:|:---:|:---:|:---:|:---:|
| Age in years, Mean (SD) | 62 (7) | 61 (7) | 1.1 | -2.7, 4.9 | 0.562 |
| Initial blood pressure (mmHg), Mean (SD) | 97.1 (3.6) | 99.2 (6.0) | -2.1 | -4.9, 0.68 | 0.135 |
| Blood Pressure after treatment, Mean (SD) | 92.2 (3.3) | 85.8 (3.3) | 6.4 | 4.5, 8.3 | <0.001 |

<sup>1</sup> Welch Two Sample t-test

<sup>2</sup> CI = Confidence Interval
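
The Welch test statistic and its degrees of freedom can be recovered from the group means, SDs and sizes. The sketch below uses the rounded values in the table, so it is only approximate, and the name `welch_t` is illustrative. The p-value and confidence limits also need a t-distribution quantile, which the Python standard library does not provide.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    var_term1 = sd1**2 / n1
    var_term2 = sd2**2 / n2
    standard_error = math.sqrt(var_term1 + var_term2)
    t_stat = (mean1 - mean2) / standard_error
    df = (var_term1 + var_term2) ** 2 / (
        var_term1**2 / (n1 - 1) + var_term2**2 / (n2 - 1)
    )
    return t_stat, df


# Blood pressure after treatment: Old Drug 92.2 (3.3), N = 22; New Drug 85.8 (3.3), N = 28
t_stat, df = welch_t(92.2, 3.3, 22, 85.8, 3.3, 28)
print(f"difference = {92.2 - 85.8:.1f}, t = {t_stat:.2f}, df = {df:.1f}")
```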
]

.panel[
.panel-name[Correlation]

Example: **Does the serum lactate in children with severe malaria correlate with their hemoglobin concentration?**

.pull-left[
## Parametric
- Pearson correlation coefficient
- Confidence interval
- p-value
]

.pull-right[
## Non-parametric
- Spearman correlation coefficient
- Kendall's tau
- Confidence interval
- p-value
]
]

.panel[.panel-name[Relationship]

Example: **What is the relationship between age and mean blood pressure?**

.pull-left[
## Parametric
- Linear regression
]

.pull-right[
## Non-parametric
- Non-linear regression
- Quantile regression
]
]
]

---

# Regression

.panelset[
.panel[.panel-name[Confounding]

.pull-left[
# Confounder
- Extraneous or nuisance pathway that the investigator hopes to rule out
- If present, it has to be adjusted for

>E.g.: Relationship between birth order, maternal age and Down syndrome
]

.pull-right[
# Effect modifier
- The effect differs for the various groups
- Simply put: it depends.
- If present, the groups will have to be reported separately

>E.g.: Salt intake, hypertension and age.
]
]

.panel[.panel-name[Linear]

.pull-left[
Univariate Linear regression

| Characteristic | N | Beta | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|:---:|
| **Sex** | 50 | | | 0.684 |
| Female | | — | — | |
| Male | | 0.61 | -2.4, 3.6 | |
| **Age in years** | 50 | 0.10 | -0.13, 0.33 | 0.370 |

<sup>1</sup> CI = Confidence Interval
] .pull-right[
Multivariate Linear regression

| Characteristic | Beta | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|
| **Sex** | | | 0.759 |
| Female | — | — | |
| Male | 0.46 | -2.5, 3.5 | |
| **Age in years** | 0.10 | -0.13, 0.33 | 0.397 |

<sup>1</sup> CI = Confidence Interval
] ] .panel[.panel-name[Logistic] .pull-left[
Univariate Logistic regression

| Characteristic | N | exp(Beta) | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|:---:|
| **Sex** | 50 | | | 0.592 |
| Female | | — | — | |
| Male | | 1.08 | 0.82, 1.43 | |
| **Age in years** | 50 | 0.99 | 0.97, 1.01 | 0.337 |

<sup>1</sup> CI = Confidence Interval
] .pull-right[
Multivariate logistic regression

| Characteristic | OR<sup>1</sup> | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|
| **Sex** | | | 0.505 |
| Female | — | — | |
| Male | 1.47 | 0.47, 4.71 | |
| **Age in years** | 0.95 | 0.87, 1.04 | 0.295 |

<sup>1</sup> OR = Odds Ratio, CI = Confidence Interval
] ] .panel[.panel-name[Interaction] .pull-left[
Multivariate Linear regression - Interaction

| Characteristic | Beta | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|
| **Sex** | | | 0.330 |
| Female | — | — | |
| Male | 14 | -15, 43 | |
| **Age in years** | 0.22 | -0.12, 0.55 | 0.206 |
| **Sex \* Age in years** | | | 0.343 |
| Male \* Age in years | -0.22 | -0.69, 0.24 | |

<sup>1</sup> CI = Confidence Interval
] .pull-right[
Multivariate logistic regression - interaction

| Characteristic | OR<sup>1</sup> | 95% CI<sup>1</sup> | p-value |
|:---|:---:|:---:|:---:|
| **Sex** | | | 0.815 |
| Female | — | — | |
| Male | 0.27 | 0.00, 21,466 | |
| **Age in years** | 0.94 | 0.81, 1.07 | 0.347 |
| **Sex \* Age in years** | | | 0.761 |
| Male \* Age in years | 1.03 | 0.86, 1.24 | |

<sup>1</sup> OR = Odds Ratio, CI = Confidence Interval
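
With an interaction term, the age effect has to be read separately for each sex. The sketch below shows the arithmetic, using the rounded coefficients from the two interaction tables (variable names are illustrative): linear-model slopes add, while odds ratios multiply.

```python
# Linear model: Beta for age (reference group, females) and for Male * Age
beta_age = 0.22
beta_male_by_age = -0.22

slope_female = beta_age                    # change in outcome per year of age, females
slope_male = beta_age + beta_male_by_age   # change in outcome per year of age, males

# Logistic model: OR for age (females) and for Male * Age
or_age = 0.94
or_male_by_age = 1.03

or_age_female = or_age                     # odds ratio per year of age, females
or_age_male = or_age * or_male_by_age      # odds ratio per year of age, males

print(f"Linear: slope per year = {slope_female:.2f} (female), {slope_male:.2f} (male)")
print(f"Logistic: OR per year = {or_age_female:.2f} (female), {or_age_male:.2f} (male)")
```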
]
]

.panel[.panel-name[Cox's]
<img src="Images/cox-regression.jpg" style="width: 70%" />
]
]

---
class: inverse

# Summary of tests: Categorical

<img src="Images/categorical-tests.png" style="width: 70%" />

.footnote[https://www.researchgate.net/figure/Common-statistical-tests-to-compare-categorical-data-for-difference_fig1_305213637]

---
class: inverse

# Summary of tests: Numeric

<img src="Images/statistical_tests.jpg" style="width: 70%" />

.footnote[https://www.slideshare.net/ShefaliJain74/overview-of-different-statistical-tests-used-in-epidemiological]

---

# Summary

----

- Determinants of appropriate analysis
- Common pitfalls in analysis
- Descriptive vs Inferential statistics
- Parametric vs Non-parametric tests
- Analysis and presentation
  - Continuous variables
  - Categorical variables
  - Time to event analysis
- Confounding & Interaction

---
background-image: url("Images/take_home_message.jpg")
background-position: bottom right
background-size: 35%, 35%

# Take home ...

>Good results from a study come from planning, an appropriate sample size, application of the right statistical techniques and appropriate presentation of the data.

---
class: inverse middle center

<style>
.bye{
  font-size: 3em;
  font-weight: bold;
  /*font-style: italic;*/
  color: white;
}
</style>

.bye[
Thank you!!!
]