Statistic Epidemiology Fundamentals 3
Video Transcription
Hi, everyone. I am Tatiana Palacios, a gynecologic oncologist from Bogotá, Colombia. Today we are going to cover the Statistic Epidemiology Fundamentals module: probability, sampling and population, the normal distribution, measures of central location and measures of variability, hypotheses, and statistical tests.

We have to know that the probability of an outcome E in a sample space S is a number p between 0 and 1 that measures the likelihood that E will occur in a single trial of the corresponding random experiment. The value p = 0 corresponds to the outcome E being impossible, and the value p = 1 corresponds to the outcome E being certain. In ordinary language, probabilities are frequently expressed as percentages. For example, we would say that there is a 40% chance of rain tomorrow, meaning that the probability of rain is 0.4.

When we talk about probability in epidemiology or in our clinical practice, we have a population: a group of patients, for example with ovarian cancer, that we want to study. We may want to know how ovarian cancer affects all the women in the world, but we cannot evaluate the whole population, so we have to take a representative sample of the population, conduct our study, and obtain results that can be generalized to the whole population.

Here is what this is about. A population is any specific collection of objects of interest. A sample is any subset or subcollection of the population. A measurement is a number or attribute computed for each member of the population or of the sample, and the measurements of the sample elements are collectively called the sample data. A parameter is a number that summarizes some aspect of the population as a whole, and a statistic is a number computed from the sample data. So we have the population, the sampling of that population gives us a sample, and the parameters are the values we are going to study.

As an example, consider the study by Dr. Anna Fagotti, published in the International Journal of Gynecological Cancer, on transvaginal ultrasound-guided biopsy in patients with suspected primary advanced tubo-ovarian carcinoma. There were 278 potentially eligible patients, but 158 were enrolled. The 278 are the population we are talking about; the 158 are the sample we actually study, and the data will describe them.

When we talk about the population, we have to know that there are different kinds of population. Here is an example based on the Fagotti study. The first is the reference population: in this study, all women with preoperative suspicion of primary advanced tubo-ovarian carcinoma. The target population is the same group of women, but adding the location: women studied at the Policlinico Universitario Agostino Gemelli. The study population, the one the whole study actually describes, is women with preoperative suspicion of primary advanced tubo-ovarian carcinoma presenting at the Gynecologic Oncology Unit of the Policlinico Universitario Agostino Gemelli between July 2019 and September 2021.

When we conduct research on a group of people, it is rarely possible to collect data from every person in the group. Instead, we select a sample.
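To make the population/sample/parameter/statistic distinction concrete, here is a minimal Python sketch. Everything except the 278/158 counts echoed from the Fagotti study is invented for illustration:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of the 278 potentially eligible patients
population = [random.gauss(62, 11) for _ in range(278)]

# Representative sample: a simple random sample of 158 "enrolled" patients
sample = random.sample(population, 158)

parameter = statistics.mean(population)  # summarizes the whole population
statistic = statistics.mean(sample)      # computed from the sample data

print(f"Population parameter (mean age): {parameter:.1f}")
print(f"Sample statistic (mean age):     {statistic:.1f}")
```

The sample statistic approximates the population parameter precisely because the sample was drawn at random from the whole population.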
The sample is the group of individuals who actually participate in the research. To draw valid conclusions from your results, you have to carefully decide how you will select a sample that is representative of the group as a whole. This is called the sampling method. There are two primary types of sampling methods that you can use in research. The first is probability sampling: it involves random selection, allowing you to make strong statistical inferences about the whole group. The other is non-probability sampling: it involves non-random selection based on convenience or other criteria, allowing you to collect data easily.

Probability sampling means that every member of the population has a chance of being selected. It is mainly used in quantitative research, and if you want to produce results that are representative of the whole population, probability sampling techniques are the most valid choice. There are four main types of probability sampling, all of which are illustrated in the sketch after this section.

First is simple random sampling. In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population. To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

Second is systematic sampling. It is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.

Third is stratified sampling. Stratified sampling involves dividing the population into subgroups that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample. To use this sampling method, you divide the population into subgroups, called strata, based on the relevant characteristics, for example gender identity, age range, or job role. Based on the overall proportions of the population, you calculate how many people should be sampled from each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.

And the last one is cluster sampling. Cluster sampling also involves dividing the population into subgroups, but each subgroup should have characteristics similar to those of the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups. If it is practically possible, you might include every individual from each sampled cluster; if the clusters themselves are large, you can also sample individuals within each cluster using one of the techniques above.

The other sampling method is non-probability sampling. Here, individuals are selected based on non-random criteria, and not every individual has a chance of being included. This type of sample is easier and cheaper to obtain, but it carries a higher risk of sampling bias. That means the inferences you can make about the population are weaker than with probability sampling, and your conclusions may be more limited. If you use a non-probability sample, you should still aim to make it as representative of the population as possible. Non-probability sampling techniques are often used in exploratory or qualitative research. In this type of research, the aim is not to test a hypothesis about a broad population, but to develop an initial understanding of a small or under-researched population.
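Here is a minimal sketch, assuming an invented sampling frame of 100 patient IDs with invented strata and clusters, of the four probability sampling techniques just described:

```python
import random

random.seed(0)
patients = list(range(1, 101))  # hypothetical sampling frame: patient IDs 1..100

# 1. Simple random sampling: every member has an equal chance of selection.
simple = random.sample(patients, 10)

# 2. Systematic sampling: a random start, then every k-th member.
k = len(patients) // 10          # sampling interval
start = random.randrange(k)
systematic = patients[start::k]

# 3. Stratified sampling: proportional random samples within each stratum.
strata = {"<50 years": patients[:30], ">=50 years": patients[30:]}
stratified = []
for stratum in strata.values():
    n = round(10 * len(stratum) / len(patients))  # proportional allocation
    stratified += random.sample(stratum, n)

# 4. Cluster sampling: randomly select entire subgroups (e.g., clinics).
clusters = [patients[i:i + 10] for i in range(0, len(patients), 10)]
cluster_sample = [p for c in random.sample(clusters, 2) for p in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

Note how only the first two methods treat the frame as one undivided list; the last two first impose a subgroup structure and then randomize within or across it.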
The first example of non-probability sampling methods is convenience sampling. A convenience sample simply includes the individuals who happen to be most accessible to the researcher. This is an easy and inexpensive way to gather initial data, but there is no way to tell whether the sample is representative of the population, so it cannot produce generalizable results. Convenience samples carry a risk of both sampling bias and selection bias.

The second one is voluntary response sampling. Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the researcher choosing participants and directly contacting them, people volunteer themselves, for example by responding to a public online survey. Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to volunteer than others, leading to self-selection bias.

The third one is purposive sampling. This type of sampling, also known as judgment sampling, involves the researcher using their expertise to select the sample that is most useful to the purposes of the research. It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences, or where the population is very small and specific.

And the last one is snowball sampling. If the population is hard to access, snowball sampling can be used to recruit participants via other participants. The number of people you have access to snowballs as you get in contact with more people. The downside here is, again, representativeness: you have no way of knowing how representative your sample is, because of the reliance on participants recruiting other participants. This can lead to sampling bias.

Once we have our study population, we have to define the variables that we are going to analyze. This is another chapter, but we have to know that there are quantitative or numerical variables and categorical or qualitative variables. This is very important because they carry the information our study is going to talk about. Quantitative variables are divided into discrete variables, such as number of children, which do not have decimals, and continuous variables, which do have decimals, such as body mass index. The qualitative variables are divided into ordinal variables, which, as the name indicates, have an order (1, 2, 3, 4), and nominal variables, such as male or female. Quantitative variables with a normal distribution are going to be described with the mean and standard deviation, and the ones without a normal distribution are going to be reported with the median and range.

When we analyze quantitative variables, we have to analyze their distribution, that is, whether they have a normal distribution, known as the bell curve, in which the mean, the median, and the mode fall in the middle. This is important because the distribution of the data determines which measure of central location we choose. When we have atypical data, for example one patient aged 1 year and another aged 80 years while most of the patients in our study are around 40, we cannot use the mean, because the extreme values are going to affect the result. If we do have a normal distribution, we can choose the mean. Here we can see in the graphic that the normal distribution has the median, mode, and mean in the center.
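The rule just stated (mean and standard deviation for normally distributed data, median and range otherwise) can be seen with two small invented data sets, one roughly symmetric and one with an extreme value:

```python
import statistics

ages_symmetric = [37, 38, 39, 40, 40, 41, 42, 43]  # roughly normal
ages_skewed    = [38, 39, 40, 40, 41, 42, 43, 80]  # one extreme value

for label, data in [("symmetric", ages_symmetric), ("skewed", ages_skewed)]:
    print(f"{label}: mean={statistics.mean(data):.1f} "
          f"(SD {statistics.stdev(data):.1f}), "
          f"median={statistics.median(data)} "
          f"(range {min(data)}-{max(data)})")
```

The single extreme value pulls the mean from 40.0 up to about 45.4 while the median barely moves (40.0 to 40.5), which is why the median is the safer summary for non-normal data.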
Now we are going to talk about measures of central location. When we describe the data of a study, we must summarize the information in a number that represents the whole data set. We have different measures. The measures of central location summarize the data around a central value. The mean is the sum of all the data divided by the total number of data points. The median is the middle value of the ordered data. And the mode is easy to remember because it is the value most repeated in our cohort.

The measures of variability are needed to understand the distribution of the data and complement the measures of central location discussed before. These measures give information about the dispersion, or variability, of the data. We have the range, the median deviation, the standard deviation, and the coefficient of variation (the sketch after this section computes these measures on invented data). In this study we can see the variables; age, for example, is described as a median of 66 years with a range between 30 and 90. This is how we have to interpret the data we see in the different research papers.

Here we are going to talk about the relative position of data. The significance of one observed value in a data set strongly depends on how that value compares to the other observed values in the data set. Therefore, we wish to attach to each observed value a number that measures its relative position. A summary of the position of the data that is independent of the location of the distribution is called a measure of position. These make equal divisions: for example, quartiles divide the data into four equal parts, quintiles into five, deciles into ten, and percentiles into a hundred. In addition to the three quartiles, as in the example, the two extreme values, the minimum (xmin) and the maximum (xmax), are also useful to describe the entire data set. As the graphic shows, this five-number summary is used to construct a box plot.

Once we have the data collected and organized, we must test the hypothesis of our study. This is what inferential statistics does. When we are going to analyze the data, we have to analyze the hypothesis. A hypothesis is a testable statement that tries to explain relationships, and it can be accepted or rejected through scientific research. We have two hypotheses. The null hypothesis, denoted H0, is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary. The alternative hypothesis, denoted H1, is a statement about the population parameter that contradicts the null hypothesis and is accepted as true only if there is convincing evidence in favor of it. Here we can see the graphic.

In a test of hypotheses, we can make two types of error. A type I error is the decision to reject H0 when it is in fact true, and a type II error is the decision not to reject H0 when it is in fact not true. The number alpha that is used to determine the rejection region is called the level of significance of the test, and it is the probability that the test procedure will result in a type I error. This value tells us how likely it is that the result of our investigation arose by chance; it does not determine the magnitude of the result. An example of a type I error is a false positive: the doctor tells a man, "You are pregnant." We know that is a false positive. A type II error is a false negative: the doctor sees a pregnant woman and says, "Ma'am, you are not pregnant."
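As promised above, here is a sketch of the descriptive measures just discussed, using invented ages chosen only so that the median (66) and range (30-90) echo the study table cited earlier:

```python
import statistics

ages = [30, 45, 52, 60, 62, 66, 66, 70, 74, 81, 90]  # invented data

mean = statistics.mean(ages)                 # sum of data / number of data points
median = statistics.median(ages)             # middle value of the ordered data
mode = statistics.mode(ages)                 # most repeated value
data_range = max(ages) - min(ages)
sd = statistics.stdev(ages)
cv = sd / mean                               # coefficient of variation

q1, q2, q3 = statistics.quantiles(ages, n=4)       # quartiles: four equal parts
five_numbers = (min(ages), q1, q2, q3, max(ages))  # basis of a box plot

print(f"mean={mean:.1f}, median={median}, mode={mode}")
print(f"range={data_range}, SD={sd:.1f}, CV={cv:.2f}")
print(f"five-number summary: {five_numbers}")
```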
Those are examples of the two types of error. Now we are going to talk about the p-value. The observed significance, or p-value, of a specific test of hypotheses is the probability, on the supposition that H0 is true, of obtaining a result at least as contrary to H0 and in favor of H1 as the result actually observed in the sample data. The significance threshold is set by the researchers and is predetermined before the study. For example, in the statistical analysis of Dr. Fagotti's study, it is determined before the analysis that the type I error margin alpha is 0.05, and all calculated p-values less than 0.05 are considered statistically significant.

When we talk about the confidence interval, a 95% confidence interval means that the same characteristic of the study would be reproduced in 95% of cases; the chance of reproducibility of the result is 95%. For statistical significance, the interval should not cross the unity (1).

Now that we have all the data, we are going to talk about parametric and non-parametric tests. They let us test the hypothesis against the statistical results. When we see all of these names, chi-squared, Kolmogorov-Smirnov, Mann-Whitney, Wilcoxon, it can be a little bit scary, but we have to know that the results are going to be produced by a statistical program; what we have to learn is how to interpret the results and which test to use when we conduct our research. Here we have parametric tests and non-parametric tests, and the choice depends on the sample, one sample or two samples; the table lists the different tests for the different situations. For example, to make it more dynamic, in the statistical analysis of this research the continuous variables were dichotomized and compared using the chi-squared, Fisher's exact, and McNemar's tests that appear in the graphic. And, as we discussed before, a p-value less than 0.05 was considered statistically significant. These results are given by the statistical programs, and you only have to interpret them; the sketch after this transcript shows how such a test is run and read.

And this is what we were going to talk about today. Now we know how to interpret the data, and with this we can apply all the information to our research. I hope everything was clear, and have a nice day.
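To close, here is a hedged sketch of how such a test looks in practice. The 2x2 counts are invented, and the example assumes the scipy library is installed; it runs two of the tests named above (chi-squared and Fisher's exact) and interprets the p-values against the predefined alpha of 0.05:

```python
from scipy.stats import chi2_contingency, fisher_exact

alpha = 0.05  # type I error margin, fixed before the analysis

# Hypothetical 2x2 contingency table (e.g., adequate biopsy yes/no by group)
table = [[45, 12],
         [30, 25]]

chi2, p_chi2, _, _ = chi2_contingency(table)
_, p_fisher = fisher_exact(table)

for name, p in [("chi-squared", p_chi2), ("Fisher's exact", p_fisher)]:
    verdict = ("statistically significant (reject H0)" if p < alpha
               else "not significant (fail to reject H0)")
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

The program computes the statistic and the p-value; the researcher's job, as the lecture stresses, is choosing the right test beforehand and interpreting the p-value against the predefined threshold.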
Video Summary
In this video, Dr. Tatiana Palacios, a gynecology oncologist, discusses the fundamentals of statistical epidemiology. She explains the concept of probability sampling and population, normal distribution, measures of central location and variability, hypothesis, and statistical tests. She emphasizes the importance of selecting a representative sample from a population when conducting a study. Dr. Palacios also discusses different types of probability sampling methods, including random sampling, systematic sampling, stratified sampling, and cluster sampling, as well as non-probability sampling methods like convenience sampling and snowball sampling. She explains the difference between quantitative and qualitative variables and highlights the need to analyze their distributions. She also explores measures of central location, such as mean, median, and mode, and measures of variability, including range, median deviation, standard deviation, and coefficient of variation. Dr. Palacios introduces measures of position, such as quartiles and percentiles, to analyze data independently of its distribution. She discusses the significance of hypothesis testing and explores type I and type II errors. She also explains the concept of p-value and confidence intervals. Finally, Dr. Palacios briefly mentions parametric and non-parametric tests and their application in statistical analysis.
Asset Subtitle
Ana Tatiana Palacios Torres
Keywords
statistical epidemiology
probability sampling
measures of central location
measures of variability
hypothesis testing
p-value
Contact education@igcs.org for assistance.