Sample Size
Video Transcription
Okay, so hello everybody. We are going to continue with the Statistics and Epidemiology Fundamental course. My name is Angelica Fletcher. I'm a gynecologist, oncologist from the Colombian National Cancer Institute, and I'm also a clinical epidemiologist. So today we're going to talk about a bit of a hard subject: sample size. The objective of this session is to give you insight into two main things. The first one is that, when you read studies, you should be very critical and analyze whether the study is well done regarding sample size, because if the sample size is well calculated and it is met, then you can trust the study. It's one of the important parts of being critical when reading the literature. And the other thing is, if you're participating in an investigation, to know what your role as a clinician is in determining this sample size. Why are you so important? We're going to point this out. So, okay, let's start.
So, the sample. The aim of the sample is to study what happens in a subset of the population in order to estimate what really happens in the whole population. So you have a whole population, whatever it is, say cervical cancer in all the women in the world, and you're going to do a sampling, right? There are many sampling methods. We're not going to talk about them. They depend on the type of study, your time, your resources, but the important thing is that you're going to get some individuals from that population, and that is going to be your sample. And from studying this sample, you're going to have some data, and you're going to be able to make inferences about what happens with this disease, or whatever you're studying, in the whole population. That's the objective of the sample.
So why should you take a sample and not study the whole population? Well, there are three main issues. One is logistical issues. 
If you want to study the whole population, it's going to be a very demanding process, with too many people to study, right? And the administrative side is going to be very hard. All of those logistical issues also come together with time issues. The more subjects you study, the more time you are going to take to study this population. And you can never really study the whole population, because by the time you get all the subjects studied, too many years will have passed and your results just won't mean anything. And of course, if you have a lot of patients to study, this comes with costs, which are connected to the time of the study and to these logistical issues. So that's why you have to have a sample. And that's why every study, or at least every analytical study, has to have a calculated sample size.
So in order to understand sample size, you may imagine that you should have the largest sample you can get, right? Should it be as large as possible according to the population? Well, this is not really the case. And we are going to try to understand why the sample size is really independent of the size of the population from which the sample is obtained. So we have this very simple example. This is the example of a cook. Imagine two restaurants, right? You have one restaurant that is really small, where you're going to serve dinner for, I don't know, 10 people. And you have a larger restaurant where you serve, I don't know, 100 people. So if you're going to serve a soup, for example, how does the cook make the soup? He takes the same ingredients, in the same proportions, with more ingredients, of course, for the bigger restaurant. And he makes it in a pot, a bigger pot for the bigger restaurant. And he just mixes them very well. 
He mixes the ingredients very well. And he's going to try the soup in both restaurants to see if the soup tastes good, right? So how does he do this? He just takes one spoon and has one taste. He doesn't take 10 spoons if he's going to serve 10 diners. And he doesn't taste 100 spoons in order to try it for the 100 diners he's going to give the soup to, right? He just takes one in both restaurants. And that is the same principle for sample size. You stir very well, you mix the ingredients very well, and you take just one sample. And with that sample, you are going to know if the soup in this example is good, if it tastes delicious or whatever.
So this is the same principle for sample size. You just have to take a very good sample, and that means you do the sampling very well. And this is what the absolute size means. The spoon you take to try the soup is the absolute size. And you have to base the sample size on this absolute sample size, not on the relative size. The relative size would be the ratio between this spoon and the size of the pot, and that's not what matters, right? So if taken correctly, a sample that is small in both absolute and relative terms can correctly infer the findings of the population.
So over here is another example for you to understand this concept. This is a table that illustrates, for a survey, the proportion of voters supporting, I don't know, politician X. And over here on the y-axis, you have the sample size. So you have the percentage of voters supporting candidate X, and the numbers you see in the table are the error of the estimate. How much error do you have in the survey? So you see that as your sample size increases, this error is going to diminish. For example, if you look over here, let's say 50% of this population is supporting this candidate. 
And you see that when you interview 100 people, you have a 5% error. This error is going to diminish to 2.2% if you survey 500 people instead. So as your sample size gets bigger, your error is going to diminish. But if you look at the table, as this error gets smaller and smaller, there's going to be a point where the difference is really, really small. And maybe it's just not worth it to have such a big sample just to make a difference of 0.1 or 0.2 in this error. So for instance, if you have 1,000 voters, you are going to see that there is an error of 1.6%. But if you have 2,000 voters, this error is going to be 1.1%. So is it worth it to survey an additional 1,000 people, with the logistical issues, time, and costs we talked about, for only about a 0.5% difference? Well, you have to balance that, and you have to analyze whether that's worth it for your study.
So the sample size of a survey will be determined by the desired level of confidence, the desired degree of precision, and the nature of the population. The fact that the population size does not play a role in this process is another proof that in surveys, the concern should be the absolute sample size and not the population size, just as we talked about in the earlier example of the cook. So if the sample size is correct, the number of subjects should be adequate to meet the study objectives. And this is going to decrease the standard error, and it's going to narrow the confidence interval. You're going to get a narrower interval, right? And this means your point estimate is more precise. So if you have a small sample size, then maybe your study won't answer its objectives, and maybe it will be inaccurate. But on the other hand, if you have a very big sample size, you can waste resources. So that's why I say you have to balance these two things. And there are other practical considerations you should take into account, right? 
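The figures quoted from the table can be reproduced with a short sketch. It assumes the slide reports the standard error of a proportion under simple random sampling, which is an assumption, since the slide itself is not part of the transcript; the function name `standard_error_pct` is illustrative. Note that the population size does not appear anywhere in the formula.

```python
import math

def standard_error_pct(p, n):
    """Standard error of a sample proportion p with n respondents,
    expressed in percentage points (assumes simple random sampling)."""
    return 100 * math.sqrt(p * (1 - p) / n)

# 50% support for the candidate, as in the spoken example
for n in (100, 500, 1000, 2000):
    print(n, "respondents ->", round(standard_error_pct(0.5, n), 1), "% error")
```

Doubling the sample from 1,000 to 2,000 respondents reduces the error by only about half a percentage point, which is the diminishing return described above.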
So now we're going to talk about the determining factors of the sample size. There are eight determining factors. The first one is the hypothesis. For every analytical study, you are going to have a hypothesis. This doesn't happen for case series or for descriptive studies; you don't have a hypothesis there. But the hypothesis is going to be important, and it matters whether you are doing a two-tail study or a one-tail study. Maybe you're going to see this in other modules. The thing you have to understand is that a two-tail study is a study where you don't know where the effect is going to go, whether the drug is going to perform better or worse. And a one-tail study is a study where you do know. It's typically when we're talking about non-inferiority trials that you go for a one-tail study. And why is this important? Because this is going to impact the Z score, that is, the standardized score you use when you're working with the normal distribution, which is one of the most common distributions for your statistics. And further on, when I show you the formula, you're going to see that this is one of the things we take into account.
And this is also important because it is affected by the alpha and the beta error. So type 1 and type 2 error, that is, alpha and beta error, are also determining factors. Over here, you have this table. I'm sure you've seen it, where you cross what happens in reality with your study findings. So you're going to know what the true positives and true negatives are, and when your study just doesn't reflect reality, that is, the false positive and false negative results. Those turn out to be the alpha error, when you find a difference in your study that doesn't really exist, and the beta error, when you say that there is no difference, but in reality, there is, right? 
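A minimal sketch of how the choice of tails changes the Z score, using only the Python standard library. The alpha of 0.05 and the power of 80% are the conventional values, used here as illustrative inputs; the helper names `z_alpha` and `z_beta` are hypothetical.

```python
from statistics import NormalDist

def z_alpha(alpha, two_tailed=True):
    """Critical Z value for a given type 1 error (alpha).
    A two-tail test splits alpha between both tails of the normal curve."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

def z_beta(power):
    """Z value for the desired power (power = 1 - beta)."""
    return NormalDist().inv_cdf(power)

print("two-tail, alpha 0.05:", round(z_alpha(0.05, two_tailed=True), 3))
print("one-tail, alpha 0.05:", round(z_alpha(0.05, two_tailed=False), 3))
print("power 80%:", round(z_beta(0.80), 3))
```

The one-tail value is smaller than the two-tail value for the same alpha, which is why a one-tail design needs fewer subjects, all else being equal.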
And for these types of errors, the values are almost always already set. You almost always set the alpha error at 0.05. But this could change, right? You can choose to be more strict. For example, you can say, no, I want my alpha error to be 0.01. And if you do that, you have to take into account that this is going to increase your sample size. And the same goes for the beta error. It's almost always 0.2, but you can always say you want a bigger or a lower beta error, and this is also going to impact your sample size. Another determinant that goes hand in hand with the alpha and beta error is the study power, the statistical power. And what is the power? The power is the probability of detecting an effect if it actually exists. It's desirable for it to be between 80 and 90 percent. Almost all studies set it at 80. 90 is a bit ambitious, but you can see very well done studies with this kind of power. So, okay, those are four factors.
The fifth one is going to be variability. The variability depends on the variable of interest you have, and on whether it's a continuous variable or a categorical variable. You're going to take this data from the literature, or, if it's something very new you're studying, you might have to do a pilot study or use estimates from experts, though that is less recommended.
The follow-up losses are another determining factor. Follow-up losses are very important. If you've done studies, or whenever you read them, you know that once you have set your study population, you might lose subjects along the way, right? Maybe some people decided to go into the trial and then they say, I don't want to be in the trial. Or you just lost follow-up, you just couldn't contact them. All of those are losses, and you have to take them into account, because the sample size that is going to be the final one, right? 
After you subtract those losses, it's the sample size that has to give you the right statistics. So you have to take this into account. You say, okay, to determine whether the thing I'm studying works, for instance to see the difference in disease-free survival if I'm comparing a surgical procedure versus a medical treatment, I need, I don't know, 100 subjects. So you have to recruit 10% or 20% more, so that if you lose those people along the way, at the end you still have those 100 subjects and your results are really going to reflect reality. Because if you don't do it like that, your power is going to diminish and your results won't be worth it, okay? They're just not going to reflect reality. So this is very, very important.
And, okay, we also have the type of study. The type of study is also going to be a determinant. It's not the same to do a clinical trial, a case-control study, or a cohort study. For example, in cohort studies, your sample size is going to depend on the rate of the outcome rather than on the prevalence of exposure. And the opposite happens in case-control studies. As the rate of the outcome is almost always smaller, this is going to require a much greater sample size. So you're going to see that in cohort studies, the sample size is almost always bigger than in case-control studies. That's why it's also a determinant.
And last but not least, the relevance of the effect size and the statistical significance. This is really the most important determinant of the sample size: the magnitude of the difference in the effect to be detected between the evaluated groups. That is what you want to see, right? So you have to point out what this difference you want to detect is. For example, okay, I have two interventions in my study, and I want to see the disease-free survival rate, right? 
What is the difference? With this intervention, I have X disease-free survival, and with this other intervention, I have Y disease-free survival. That difference in magnitude that you want to see is the most important factor, and I'm going to give you some examples of why it's such a determinant. And this is what the statisticians are going to ask you as a clinician, right? So say we have ovarian cancer, and you're going to give this drug, and you have to determine what difference is going to have an impact for you, okay? Is it going to be relevant for you that this drug gives a disease-free survival difference of one week? Is that important for you? Or maybe one month, or one year? That difference is the one that you have to determine.
The statistical significance, as I said, is almost always 0.05 by convention, right? That's the P value. But you have to be very clear about whether the clinical relevance is going to be equal to the statistical relevance, because they're not always the same. For example, say you have a study on hypertension. You are examining drug A versus drug B, and you have a difference of two millimeters of mercury in the control of your hypertensive patients. Maybe those two millimeters of mercury are going to be statistically significant, but are they clinically significant? Does a difference of two millimeters of mercury have any relevance in the clinic? Maybe not, right? So you have to determine the clinically relevant difference rather than the statistical relevance, okay? And how do you get this data? How do you say what the difference in magnitude is? 
Well, you take it, again, from the literature, from previous studies, or maybe, again, you will have to do a pilot study if you don't have data on this, and, less recommended, it can also be done with expert estimates. So those are the determining factors.
Over here, not for you to get scared or anything, we have some examples of the formulas you use to determine the sample size, and it's just for you to see that the determinants we talked about are the ones you are going to plug into the formula, right? And the formulas are going to change depending on whether you're comparing two proportions or two means, and this is not something you have to do yourself. Almost always, you're going to have an epidemiologist or, I don't know, a statistician, a person for you to rely on to do the math, but what you do have to know is what the determinants are, and especially what clinical data you have to provide in order to get the sample size. Of course, you're not going to work through these formulas by hand, and that's why technology is very convenient. You can just have your programs make those calculations for you.
So here, I brought you some aids. These are some programs that you can download, either on your smartphone or on your PC, in order to calculate the sample size. You can also do it in the statistical programs, IBM, right, Stata, but I brought you three examples. Here, we have Epi Info. This is a CDC calculator. We have OpenEpi. It's also an epidemiologic resource. It is public, and it comes in a lot of languages, so you can download it. And we also have EpiDat. This one is specifically for Latin American people because it's in Spanish. These are really simple tools you can use to calculate the sample size, so I'm going to show you some examples with some studies. For instance, we have over here the Uterus-11 trial. 
You may know this, of course. This is a clinical trial comparing surgical versus clinical staging prior to chemoradiation in patients with locally advanced cervical cancer, right? So when you read this trial, you always have to go to the methods, where they explain to you how they calculated the sample size. Here, they say, okay, we are assuming a disease-free survival of 54% for the experimental arm, that was the surgical staging group, and 36.5% for the control arm, that was the clinical staging group. So where did they take that data from? They always tell you where, and here they tell you it came from the literature. With this data, they calculated 250 patients, including the dropouts that we were talking about, and with this, you should expect 129 events and a power of 80%.
So when you have the data, you can go to the programs I showed you. For example, over here, we have Epi Info, and we're going to do a simulation of this calculation. You go into Epi Info, you click on StatCalc over here, and it's going to ask you what you want to do. So you go ahead, and under sample size, you're going to select depending on the study. For this one, you are going to select cohort studies, because it's the same thing you use for clinical trials, and it is going to give you the calculator. So what do you have to do? 
It's very simple. You just have to fill in the data, so you're going to put the data over here. They told you an 80% power, so you put that in. The ratio of unexposed to exposed is one. The percentage outcome in the unexposed group, that would be the clinical staging group, right? They said a disease-free survival of 36.5%. And the percentage outcome in the exposed group would be 54%; that would be the surgical arm. The other data, the risk ratio and the odds ratio, are calculated for you, and over here, you have the sample size it calculates. So if you look, it's very similar. It's going to change depending on the method you use, whether you use the Kelsey method or Fleiss, okay, that's not important, but you can see that it's rather similar, right? So this is very simple. You just have to fill in the data.
Over here, we have another example. This is another platform, OpenEpi. It's also very simple. You go over here to the menu on your left side, you look for sample size, cohort or randomized controlled trial, and again, you just fill in the data, the same thing I did before. You click start, you go to enter the data, and you fill in the data, right? And it's going to give you, again, over here, the sample size, depending on the method used: 256, 254, 276. It's very similar.
So, okay, with that data, you see how you calculate, but is this the important thing? Well, when you're reading a study, and if you're reading critically, what do you have to look for? Was that sample size really the final sample size, as we were talking about? If you keep on reading the article, you are going to notice that, no, they only reached 240 patients, and there were only 102 events. So what happened with this? Well, they couldn't get the 80% power; they only got 70% power. That is something you have to analyze about the sample size, right? And this is a great example of how sample size can be affected if you do not 
select very well that determining difference. And I would advise you, in order to understand this a bit more, to read this editorial from Michael Frumovitz, to understand what happened in this study, because it's really important. What happened is that they assumed a very ambitious result for the difference between the two arms. They assumed a 17.5% difference, and that's huge, right? If they had had more accurate estimates of the disease-free survival of the clinically staged patients, and a more modest prediction of how much surgical staging would improve survival, then they would have needed a bigger sample size, but perhaps their results would have had the right power, and maybe this would have been a positive study, and it was negative. So that is why it's really important, and every time you read a study, you should do this exercise, right? You should think, okay, did they select the data from the literature well? Is it reasonable to choose this difference in disease-free survival, or in the recurrence rate, or whatever? Okay, so this is a great example.
Over here, we have another example, the LACC trial. I know everybody knows this one: minimally invasive surgery compared to open surgery in radical hysterectomy for cervical cancer. We can do this other exercise with the data they give us. They tell us that the sample size was based on an expected disease-free survival of 90% in the open surgery group at 4.5 years. This is a non-inferiority trial, so for the margin, they said, okay, we will assume that minimally invasive surgery is not inferior to open surgery if the difference stays within a margin of 7.2 percentage points. Why? Because the literature tells you that a difference of between 6 and 8% is clinically acceptable. So they calculated that they needed 740 patients, and this would give a very good power, 87% power. So we do the same thing. 
Over here, I have an example from Epi Info, and I wanted you to see two different settings. On the left side, you have the view you would have on your smartphone if you download it, and on the right side, it's the view you would have on your PC, but it's really the same thing. On the left side, you just fill in the data. You put the percentage outcome in the unexposed group, set at 90%, and in the exposed group, at 82.8%, which gives the difference of 7.2, and you just tell it to calculate the sample. And I want to point out that here, I set it with 80% power, okay. On the right view, you can see that it's the same thing. I just inverted the data, so I didn't talk in terms of disease-free survival, I talked about the rate of recurrence, but it's just the same. So you would say the percentage outcome in the unexposed group would be 10% recurrence, and in the exposed group, 17.2%. It's the same thing, and you are going to have over here again a very similar estimate of the actual sample size. Remember, it was 740, so it's similar.
And over here, I have the OpenEpi platform. It's the same thing, but I want to show you something. By default, this uses a two-sided confidence level for alpha, so this platform gives you just the default of two tails. But remember, this is a non-inferiority trial, so it is one tail. Because of this, you can see that if I fill in the same data, and here I did use 87% as the power, the sample size is much bigger. So you have to analyze this. Maybe there are going to be some changes, because over here we have two-sided, right? But anyhow, this is just for you to understand how this works, and the important thing is not that you just go and fill in numbers, because in the computer, you just give it data, and it will do any type of calculation, right, but 
you have to make sure that you're entering the data correctly. You really have to enter the data well. So, okay, to point out one last thing about this trial: remember, this trial had to end its enrollment earlier, so they didn't have the 740 patients. They only had 631 patients, and this was because they had to finish the study early. There were safety issues, and the monitoring committee said this had to stop, because the rates of recurrence and death with minimally invasive surgery were just too high. But even though that happened, you have to keep on reading, and you will see that the power was still over 80 percent; it was 84 percent. So you can say, okay, this study works, it meets the determinants that the sample size has to have, so it's a good study, right? And actually, you know, it's a practice-changing study. So that is how you analyze this kind of study, and it's the same thing you would do if you are going to do your own study. This is how you have to think about it in order to get the sample size.
So giving the data to the computer is simple, right? But what is the most important thing? The most important thing is that you, as a clinician, define the difference, as we saw. Remember how in Uterus-11, if you don't define that difference well, and you are very ambitious, that might compromise your results, and maybe your study can't give good results because of this. So you are the important part of the investigation when the statisticians ask you, okay, doctor, what is the important difference I have to see in the study? What is the point or the difference I want to detect? 
That is what your job is, to define the right data here so that your study really works and your findings are going to be valuable, right? That is the most, most, most important thing you have to take into account. So this was about all I wanted to share with you. I hope it's a little bit more understandable now, and I want you to download these programs and to practice with them, because if you practice, you are going to see that this gets easier and easier every time. Thank you very much for your attention.
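As a rough companion to the calculator walkthrough, the sketch below computes a sample size for comparing two proportions from the determinants discussed in the talk: alpha, power, number of tails, the two assumed outcome rates, and follow-up losses. It uses a textbook normal-approximation formula with pooled variance and equal group sizes. That choice is an assumption; Epi Info and OpenEpi implement several methods, so their outputs will not match exactly. The function names are hypothetical, the 10-point "modest" difference is an invented illustration rather than a figure from the trial, and dividing by (1 - loss) is just one common convention for inflating the sample.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80, two_tailed=True):
    """Subjects needed per group to detect p1 vs p2 (equal group sizes),
    using the normal approximation with pooled variance."""
    tail = alpha / 2 if two_tailed else alpha
    z_a = NormalDist().inv_cdf(1 - tail)   # Z score for the alpha error
    z_b = NormalDist().inv_cdf(power)      # Z score for the power (1 - beta)
    p_bar = (p1 + p2) / 2                  # pooled outcome proportion
    n = (z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2
    return math.ceil(n)

def inflate_for_losses(n, expected_loss):
    """Recruit enough subjects that n remain after the expected losses."""
    return math.ceil(n / (1 - expected_loss))

# Uterus-11 assumptions quoted in the talk: 54% vs 36.5% disease-free survival
ambitious = n_per_group(0.54, 0.365)
print("17.5-point difference, total:", 2 * ambitious)

# Hypothetical, more modest 10-point difference (illustration only)
modest = n_per_group(0.465, 0.365)
print("10-point difference, total:", 2 * modest)

# Allowing for 10% follow-up losses on the first calculation
print("with 10% losses:", inflate_for_losses(2 * ambitious, 0.10))
```

With the Uterus-11 inputs, the result lands in the same range as the roughly 250 patients mentioned in the talk. Shrinking the assumed difference raises the required sample size sharply, which is why the clinician's choice of the difference to detect carries so much weight.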
Video Summary
In this video, Dr. Angelica Fletcher, a gynecologist and oncologist from the Colombian National Cancer Institute, discusses the importance of sample size in research studies. She highlights two main objectives: to be critical and analyze the quality of studies based on sample size, and to understand the role of clinicians in determining sample size. Dr. Fletcher explains that a sample is a subset of a population used to estimate what happens in the entire population. She emphasizes that studying the entire population is not feasible due to logistical and time constraints. Instead, a well-calculated sample size can provide accurate insights. Dr. Fletcher clarifies that sample size is independent of the population size and should be based on the desired absolute sample size, not the relative size. A small sample size can still accurately reflect the population if selected correctly. Dr. Fletcher discusses determinants of sample size, including hypothesis, type 1 and type 2 errors, statistical power, variability, follow-up losses, study type, and the relevance of effect size and statistical significance. She stresses the importance of understanding the clinical relevance of the effect being studied. Dr. Fletcher provides examples of how to calculate sample size using different platforms and discusses the importance of accurately defining the difference to be detected in a study. She concludes by encouraging clinicians to practice sample size calculations using available tools and to be actively involved in determining the important difference to detect in their research studies. The video was produced by Dr. Angelica Fletcher.
Asset Subtitle
Angelica Viviana Fletcher Prieto
Keywords
sample size
research studies
clinicians
population
absolute sample size
statistical power
clinical relevance
Contact
education@igcs.org
for assistance.