Continuous vs. Discrete Data
 This topic has 21 replies, 14 voices, and was last updated 17 years, 4 months ago by Jonathon L. Andell.


July 12, 2004 at 3:59 pm #36131
Michael Miller (Participant, @MichaelMiller)
A debate has been ongoing among MBBs here as to the essential difference(s) between discrete and continuous data. We are certainly aware of popular descriptions such as discrete being countable and indivisible vs. continuous being measurable. However, can this be applied when the metric is a rate? Suppose you have a discrete countable activity which can be counted as having occurred so many times within a certain timeframe. Is a time-based discrete metric such as Xs/hr discrete or continuous, and why?
Also, discrete data are countable in that we are counting items which have a particular attribute. A reasonable definition of attribute is a constraint whereby objects or individuals can be distinguished. For something to be in the classification of having a particular attribute “X”, is it not true that it must be distinguishable from something that has the attribute “Not X?”
I look forward to responses; thanks.
Mike
July 12, 2004 at 10:55 pm #103307
John H. (Participant, @JohnH.)
Michael
If c is a category, rate, because it involves the time variable, is considered in most mathematical models (ex: Poisson distributed) to be the instantaneous (calculus) rate of change R = dc/dt, thereby making it continuous (ex: Reliability Engineering, Chemical Kinetics, etc.).
I hope this helps
John H.
July 12, 2004 at 10:59 pm #103308
It is usually easy to distinguish until you get to the ratio that you mention. While you can sometimes “fake it” and assume continuous, the criterion I use is to look at the underlying characteristic you are actually measuring. If the numerator and denominator are both continuous, then I treat the ratio as continuous. If the characteristic you are measuring is discrete (counts), I treat the ratio as discrete even though the denominator might be time, as in your example. Then again, in a practical sense: what are you trying to do, what tool are you planning on using, and does it really, really matter which you assume? Common sense might dictate.
July 13, 2004 at 1:47 am #103312
Sigmordial (Member, @Sigmordial)
John H started out nicely, but went awry with his conclusion: the Poisson is a discrete distribution, not a continuous distribution. Michael’s post (“a discrete countable activity which can be counted as having occurred so many times within a certain timeframe”) does suggest a Poisson distribution. I should note that there is an asymptotic relationship between the Poisson (discrete) and the Normal (continuous) distributions.
If the prespecified number of occurrences is of interest, then Michael may want to consider the Negative Binomial distribution.
July 13, 2004 at 3:48 am #103325
Gabriel (Participant, @Gabriel)
“Discrete” means “not continuous”. And “continuous” means that you can always find a possible value between any two values.
“Countable in that we are counting items which have a particular attribute” is not a good definition of “discrete”. Even though an item count is always discrete, discrete data is not necessarily an item count.
In fact, the data is ALWAYS DISCRETE, even when the characteristic you are measuring is continuous. That is because there is a difference between the data and the characteristic itself: the data is “truncated” by the finite resolution of your measuring and data recording systems.
For example, say that the characteristic is a diameter. It is clearly a continuous characteristic, since between any two diameters you can always find another possible value for a diameter (Laxman, don’t jump in saying that between any two diameters there is only a limited number of possible diameters since the atoms of the material are of finite size).
The question is: how will you measure the diameter? With a digital caliper to the 0.01 mm? Then the data (the record of the measurement) is discrete, because you don’t have any possible value for a diameter between two consecutive values of the measuring scale, like 10.12 mm and 10.13 mm.
Am I splitting hairs? Maybe. If all the data is distributed among, say, 10.12, 10.13 and 10.14, then the resolution of the data is the same as what you would get using a go/no-go gage at 10.125-10.135. Would you say that the outcome of checking with a go/no-go gage can be continuous? No.
At the other extreme, if the data is distributed over a range from 10 mm to 15 mm, then saying that the data is discrete would be nearly like saying that the diameter itself is discrete due to the finite size of the atoms. In this case you still have no possible data between 10.12 and 10.13, so what’s the difference? That you now have the data distributed in about 500 classes (5 mm at 0.01 mm resolution), not in 3. So the data can safely be taken as continuous.
We’ve been discussing a case where the characteristic itself was continuous. What if what I am measuring is intrinsically discrete? For example, I am counting occurrences in a time frame. Let’s be more specific and say that they are customer complaints per month.
It is the same case. The data will always be discrete. But if it has enough resolution I can take it as continuous. If I always have 0, 1 or 2 complaints in a month, then I cannot use it as continuous (note that the same is true if I always had 999, 1000 or 1001, but it is very unlikely that with such an average number of complaints per month the variation would be small enough to keep the number within such a narrow range). Now, if I always have something between 50 and 200 complaints per month, then “continuity” will be a very good model.
One final remark: note that making the number a ratio changes neither its nature nor the suitability of the continuity approach.
To give an example, imagine that I’m counting defectives every day, but to be able to make comparisons I divide the number of defective units by the number of units produced. The rate of defectives ranges from 0.1 to 0.2%, while production ranges from 1000 to 2000. In this case, the number of defectives ranges from 1 to 4. So never mind that it is a rate: there are too few possible values to treat this as “continuous” data. Now, if the defectives ranged from 5 to 10% with production from 10,000 to 20,000, it would be perfectly suitable to consider it continuous (500 to 2000 defectives, 1500 possible values). As a rule, the denominator is the one that defines whether the data (which, again, is always DISCRETE) can be treated as continuous or not. A rule of thumb is: 4 or fewer, no; 10 or more, yes; in the middle, the “continuity” model is marginal (but I would give it a try and see if I find some problems; probably I won’t).
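The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: it assumes the thresholds (4 and 10) refer to the number of distinct values the recorded data actually takes, and the function name `continuity_verdict` and the sample data are hypothetical.

```python
def continuity_verdict(values, low=4, high=10):
    """Classify a data set by how many distinct values it actually takes.

    Assumption: the "4 or fewer / 10 or more" rule of thumb is applied to
    the count of distinct recorded values.
    """
    distinct = len(set(values))
    if distinct <= low:
        return "treat as discrete"
    if distinct >= high:
        return "treat as continuous"
    return "marginal"


# Diameters recorded to 0.01 mm that only ever land on three readings.
print(continuity_verdict([10.12, 10.13, 10.14, 10.13, 10.12]))

# Monthly complaint counts spread over a wide range (ten distinct values).
print(continuity_verdict([52, 87, 140, 199, 63, 171, 95, 120, 78, 155]))
```

The check looks only at the recorded values, not at whether the underlying characteristic is a measurement or a count, which is the point being made here.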
And to your last question, yes: if you count red and blue, then you must classify each item either as red or blue (defective / not defective, conforming / not conforming…). Note, however, that there is a difference between “classifying them” and “classifying them well”. But this is the scope of Measurement System Analysis. And it holds true for continuous characteristics too. You can classify parts by their diameter, but doing it and doing it well are two different things.
July 13, 2004 at 11:25 am #103340
Michael Miller (Participant, @MichaelMiller)
Thanks for all the responses; I will digest and see how these hit the mark.
mpm
July 13, 2004 at 12:42 pm #103347
Hmnnn… (Participant, @Hmnnn...)
Is this the same as what drives the different methods/calculations for control charting of variable (measured) vs. attribute (counts) data? E.g., shouldn’t attribute p, np, c or u control charts be used for daily yield/defect rates (counts) rather than variable SPC control charts?
July 14, 2004 at 12:56 am #103387
John H. (Participant, @JohnH.)
Sigmordial
Re: Your Comments on the Poisson as a Discrete Distribution
Wrong! Not always in its application as a mathematical model.
If a cumulative distribution function is continuous everywhere and possesses a continuous derivative (except maybe at a certain number of interval finite points), then the stochastic variable and distribution are continuous (Statistical Theory with Engineering Applications, Hald).
Example: If the total time T is split into N short intervals, the probability of an equipment failure in any one interval is assumed to be KT/N (K a constant), and the events are independent, then the probability of no failures within the N intervals translates to
P(T) = (1 - KT/N)^N = EXP(-KT) as N approaches infinity.
A similar model would apply with respect to density fluctuations in a gas, with the time interval being replaced by a volume interval.
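A quick numerical check of that limit, as a sketch only. It assumes the total time T is split into N equal slices, each with failure probability KT/N; the values chosen for K and T are hypothetical.

```python
import math


def no_failure_probability(k, t, n):
    """Probability of no failure across n independent slices of length t/n,
    each with failure probability k*t/n."""
    return (1.0 - k * t / n) ** n


K, T = 0.5, 2.0  # hypothetical failure rate and elapsed time
for n in (10, 100, 10_000):
    print(n, no_failure_probability(K, T, n))
print("EXP(-KT) =", math.exp(-K * T))
```

As N grows, the printed values move toward EXP(-KT), which is the continuous curve referred to here.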
John H.
July 14, 2004 at 3:41 am #103395
Sigmordial (Member, @Sigmordial)
Hi John,
Michael’s situation suggests the number of events in an interval of time. Michael has described this as “a discrete countable activity which can be counted as having occurred so many times within a certain timeframe.” This is a tailor-made scenario for the Poisson distribution, which is a discrete distribution. I did mention that under certain conditions the Poisson (as the rate gets large) approaches the Normal distribution.
Now, your quote: “If a cumulative distribution function is continuous everywhere and possesses a continuous derivative (except maybe at a certain number of interval finite points) then the stochastic variable and distribution is continuous” (Statistical Theory with Engineering Applications, Hald).
The cdf for the Poisson is not continuous everywhere. The cdf is P(X <= x); we are placing these individual probabilities at discrete masses (x = 0, 1, ...). As a recommendation, be wary of tossing definitive quotes that have qualifiers such as “except maybe”. Plus, maybe Hald was not referring to the Poisson distribution. By no means is this a dig on you; stochastic modeling, though exciting, can get a tad bit challenging.
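To see the jumps in the Poisson cdf directly, here is a minimal sketch using only the Python standard library. The helper name `poisson_cdf` and the mean of 3 events per interval are assumptions made for illustration.

```python
import math


def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson variable; x may be any real number."""
    if x < 0:
        return 0.0
    term = math.exp(-mean)      # P(X = 0)
    total = term
    for k in range(1, math.floor(x) + 1):
        term *= mean / k        # P(X = k) computed from P(X = k - 1)
        total += term
    return total


mean = 3.0  # hypothetical average number of events per interval
for x in (1.0, 1.5, 1.999, 2.0):
    print(x, round(poisson_cdf(x, mean), 4))
```

The cdf stays flat between 1 and 2 and then jumps at x = 2 by exactly P(X = 2), so it is a step function rather than a continuous curve.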
One last comment: if Michael was interested in the time between events, then we are definitely dealing with a continuous distribution (Exponential). Your example is closer to this scenario: reliability.
July 15, 2004 at 3:52 am #103453
John H. (Participant, @JohnH.)
Hi Sigmordial
Re: Your Comments
The exponential function illustrated in my example is a special case of a Poisson-distributed process with Pn(T) = (EXP(-KT) (KT)^n)/n!, n = 0, 1, 2, …, subject to the probability constraints P0(0) = 1 (initial condition) and Pn(0) = 0 for n >= 1, thus generating P0(T) = EXP(-KT), which has the familiar applications in reliability engineering, physical chemistry and nuclear decay processes. That is, in this model the probability is assumed to be a function of a time interval of length T, hence the representation by continuous curves. I hope that this also clarifies my original post with regard to rate. As regards Hald’s statement, he did not include the Poisson distribution as an exception to his statement.
I apologize for the abbreviated responses but I hate typing and usually am not “long winded” .
John H.
July 15, 2004 at 11:45 am #103465
Sigmordial (Member, @Sigmordial)
Hi John H,
No worries on the abbreviated response. Looks like we were in agreement.
July 22, 2004 at 6:32 am #104032
arihalos (Participant, @arihalos)
I’d say that in this case, the most appropriate chart to use is a c chart. And we know that a c chart is an attribute chart. I can’t imagine how we can use the X-mR and other variable control charts in this case.
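For reference, a minimal sketch of c chart limits in Python. It assumes Poisson-distributed counts with a constant area of opportunity (for example, the same number of hours in every subgroup); the function name and the sample counts are hypothetical.

```python
import math


def c_chart_limits(counts):
    """Return (LCL, center line, UCL) for a c chart using 3-sigma limits."""
    c_bar = sum(counts) / len(counts)
    sigma = math.sqrt(c_bar)              # Poisson: variance equals the mean
    lcl = max(0.0, c_bar - 3 * sigma)     # a count cannot fall below zero
    ucl = c_bar + 3 * sigma
    return lcl, c_bar, ucl


counts = [4, 7, 3, 5, 6, 2, 8, 5]  # hypothetical events per hour
lcl, center, ucl = c_chart_limits(counts)
print(f"LCL={lcl:.3f}  CL={center:.3f}  UCL={ucl:.3f}")
```

With an average this low the lower limit is clipped at zero, which is typical for c charts on small counts.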
July 22, 2004 at 8:44 am #104042
Read Don Wheeler’s books on SPC for info on using XmR for rates.
GG
July 22, 2004 at 12:27 pm #104058
Gabriel (Participant, @Gabriel)
Simple: each rate is an X, and the difference between two consecutive rates is an mR.
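That recipe can be sketched as follows. This is an illustration only: it uses the commonly tabulated XmR constants 2.66 and 3.267 (for moving ranges of two points), and the function name and the sample rates are hypothetical.

```python
def xmr_limits(values):
    """Compute individuals (X) and moving range (mR) chart limits."""
    # Each moving range is the absolute difference of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "x_center": x_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


rates = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical events per hour
for name, value in xmr_limits(rates).items():
    print(f"{name}: {value:.3f}")
```

The limits come from the observed point-to-point variation rather than from a Poisson or binomial assumption, which is why the chart can be applied to rates.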
July 22, 2004 at 12:51 pm #104061
Just had to destroy a nice complicated purely theoretical thread with a straightforward, practical example, didn’t you?
July 22, 2004 at 1:59 pm #104065
Don,
I suspect you were kidding Gabriel, but Gabriel nicely addressed the practical core issue.
That’s the difference between engineering and science, engineering being the application of science to the practical solution of problems. And what I believe a lot of people, including many SS practitioners, don’t get is that SS is intended as engineered improvement.
It is fun to debate theoreticals. Unfortunately we sometimes mystify or turn off people by debating theoreticals when what they need is a practical solution and understanding.
Gabriel, good explanation. Mark
July 22, 2004 at 3:01 pm #104071
Gabriel (Participant, @Gabriel)
Don,
Sorry for spoiling it. I can’t help jumping in when practitioners talk about things such as normality, continuity, stability, etc. as if those things actually existed in real-life problems. Not that I don’t like theory. In fact, I LOVE theory, but it is important to understand how it applies to real problems. In fact, my post has a lot of theory, only explained with examples instead of theorems. The theory is: sometimes you can use a discrete variable as if it were continuous, and sometimes you cannot use a continuous variable as a continuous one and you have to use it as if it were discrete.
As someone said (now I can’t recall who): “No model is right. But some do work”.
July 22, 2004 at 3:44 pm #104080
I believe you paraphrased George Box.
July 22, 2004 at 4:47 pm #104082
All models are wrong; some models are useful. – George Box
In my opinion, the Official Quote for this site – for Six Sigma – should be:
What we have to learn to do, we learn by doing.
Aristotle
July 22, 2004 at 5:03 pm #104083
Or
"One has to be extraordinarily lucky, in our society, to meet one nymphomaniac in a lifetime."
Alex Comfort in “Darwin and the Naked Lady..”
July 22, 2004 at 6:36 pm #104087
Amen.
July 23, 2004 at 8:52 pm #104190
Jonathon L. Andell (Participant, @JonathonL.Andell)
In the strictest sense one should use nonparametric statistics for discrete events. Realistically, however, we can regard the countable events as approximating continuous if 1) there are fairly high counts, like >100 per subgroup, and 2) there are many “shades of gray” among different subgroups, like at least 10 to 20 individual counts.
Thus, we readily could approximate continuous data if we counted calls to a phone center ranging anywhere between 100 and 150 calls per day. If the number was more like 30 to 40 per day, the approximation would be more tenuous.
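One way to put a number on “more tenuous” is to compare a Poisson cdf with its normal approximation at different means. The sketch below does that using only the standard library. It assumes the daily counts are Poisson, uses a normal curve with the same mean and variance plus a 0.5 continuity correction, and picks the means 125 and 35 as hypothetical midpoints of the two ranges mentioned above.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable and a non-negative integer k."""
    term = math.exp(-mean)      # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i        # P(X = i) computed from P(X = i - 1)
        total += term
    return total


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def worst_gap(mean):
    """Largest gap between the Poisson cdf and its normal approximation."""
    sd = math.sqrt(mean)
    upper = int(mean + 10 * sd)  # far enough into the upper tail
    return max(
        abs(poisson_cdf(k, mean) - normal_cdf(k + 0.5, mean, sd))
        for k in range(upper + 1)
    )


for mean in (125, 35, 3):
    print(mean, worst_gap(mean))
```

The gap shrinks as the mean count grows, which is consistent with the normal approximation being more comfortable at 100 to 150 calls per day than at 30 to 40.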
Also, if your events are quite rare, like work-time accidents, consider tracking person-hours per accident as a quasi-continuous variable.
Hope this helps.