Question: Finding a Percentile in an array of numbers
I want to find out which number is the nth percentile of a group of numbers.
For example:
If I have the numbers (1,2,3,4), what number would be the 75th percentile?
The answer is 3.25. I can easily do this calculation with the "Percentile" function in Excel. Does anyone have an idea how to do this with SQL?
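For reference, the 3.25 comes from linear interpolation between neighbouring values. The Python sketch below assumes Excel's PERCENTILE (PERCENTILE.INC) semantics: k is a fraction between 0 and 1, and the result is interpolated at zero-based rank k*(n-1) of the sorted data. The function name `excel_percentile` is hypothetical.

```python
import math

def excel_percentile(values, k):
    """Linear-interpolation percentile, assumed to match Excel's PERCENTILE for 0 <= k <= 1."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    data = sorted(values)
    # Zero-based fractional rank into the sorted data
    rank = k * (len(data) - 1)
    lower = math.floor(rank)
    frac = rank - lower
    if lower + 1 >= len(data):
        # k == 1: the largest value, nothing to interpolate with
        return float(data[lower])
    # Interpolate between the two neighbouring ordered values
    return data[lower] + frac * (data[lower + 1] - data[lower])

print(excel_percentile([1, 2, 3, 4], 0.75))  # 3.25
```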
Answer: Finding a Percentile in an array of numbers
Percentiles are a nasty thing to do in SQL...
1) What is the 75th percentile? Percentiles divide ordered data into portions. The 75th percentile is the value in the data set that 75% of the data values are equal to or less than.
For example, the 33rd percentile of the data (1,2,3,4) is 2.
The algorithm is described here: http://www.staffs.ac.uk/sands/buss/bscal/mandev/m_qm/t_mld/mld_o.htm , but in brief:
For percentile P:
1) order the data and count the values (n = number of values);
2) find the position i = (P*n)/100;
3) if i is not a whole number, the percentile is the next value in the ordered data, i.e. the value at position i rounded up;
4) if i is a whole number, the percentile is calculated from the values at positions i and i+1. As far as I know, it should be the average of these two values, but it looks like Microsoft thinks otherwise ;-) (for the (1,2,3,4) example this rule gives (3+4)/2 = 3.5, while Excel's PERCENTILE returns 3.25).
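The four steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not code from the linked page: the function name `percentile` is hypothetical, P is assumed to lie in (0, 100], and for P = 100 (where no (i+1)-th value exists) it simply returns the largest value. `Fraction` keeps (P*n)/100 exact, so the whole-number test is not fooled by floating-point rounding.

```python
import math
from fractions import Fraction

def percentile(values, p):
    """Percentile p (0 < p <= 100) using the four steps above."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    # Step 1: order the data and count the values
    data = sorted(values)
    n = len(data)
    # Step 2: position i = (P * n) / 100, computed exactly
    i = Fraction(p) * n / 100
    if i.denominator != 1:
        # Step 3: i is not whole -> the value at 1-based position ceil(i)
        return data[math.ceil(i) - 1]
    # Step 4: i is whole -> average the values at 1-based positions i and i+1
    i = int(i)
    if i == n:
        return data[-1]  # assumption: P = 100 returns the maximum
    return (data[i - 1] + data[i]) / 2

print(percentile([1, 2, 3, 4], 33))  # 2
print(percentile([1, 2, 3, 4], 75))  # 3.5
```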
Anyway, one way to do this in SQL is the following:
A) get the count of the data and store it in a local variable;
B) calculate i from the formula above;
C) test whether i is a whole number;
D) select the required column, ORDERed BY that column ASCending;
E) declare a cursor over this result set;
F) in a loop, fetch rows until you reach the value at position i rounded up (or, if i is a whole number, the values at positions i and i+1);
G) if i is a whole number, calculate the percentile from the i-th and (i+1)-th values;
H) return the result.
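Steps A) to H) can be tried against SQLite from Python's standard `sqlite3` module, whose DB-API cursor plays the role of the SQL cursor. This is a sketch under assumptions: the function `sql_percentile` and the table `samples(v)` are hypothetical, table and column names are pasted into the SQL text and must come from trusted code, and NULLs are ignored.

```python
import math
import sqlite3
from fractions import Fraction

def sql_percentile(conn, table, column, p):
    """Percentile p (0 < p <= 100) of a column, following steps A) to H)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    cur = conn.cursor()
    # A) get the count of the (non-NULL) data
    n = cur.execute(f"SELECT COUNT({column}) FROM {table}").fetchone()[0]
    if n == 0:
        return None
    # B) i = (P * n) / 100, kept exact
    i = Fraction(p) * n / 100
    # C) is i a whole number?
    whole = i.denominator == 1
    position = int(i) if whole else math.ceil(i)  # 1-based
    # D) + E) ordered select; the cursor walks this result set
    cur.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"ORDER BY {column} ASC"
    )
    # F) fetch rows until we reach the wanted position
    for _ in range(position - 1):
        cur.fetchone()
    value = cur.fetchone()[0]
    # G) for a whole i, average with the (i+1)-th value if there is one
    if whole:
        following = cur.fetchone()
        if following is not None:
            value = (value + following[0]) / 2
    # H) return the result
    return value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (v REAL)")
conn.executemany("INSERT INTO samples (v) VALUES (?)", [(1,), (2,), (3,), (4,)])
print(sql_percentile(conn, "samples", "v", 75))  # 3.5
```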
As I said, it is a messy thing to do in SQL, and it will scan the whole table ;-(
P.S. You can also modify this algorithm to scan the table from the bottom values (for percentiles <= 50) and from the top values (for percentiles > 50), so that fewer rows have to be fetched.
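The P.S. variant can be sketched the same way: for P <= 50 the rows are read in ascending order, otherwise in descending order, so at most roughly half the rows are fetched. The database still has to sort the column (or walk an index on it); the saving is only in the rows fetched. As before, `sql_percentile_short_scan` and the `samples` table are hypothetical names, and identifiers must come from trusted code.

```python
import math
import sqlite3
from fractions import Fraction

def sql_percentile_short_scan(conn, table, column, p):
    """Percentile p (0 < p <= 100), reading from the nearer end of the ordered data."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    cur = conn.cursor()
    n = cur.execute(f"SELECT COUNT({column}) FROM {table}").fetchone()[0]
    if n == 0:
        return None
    i = Fraction(p) * n / 100
    whole = i.denominator == 1
    # 1-based ascending positions of the value(s) we need
    lo = int(i) if whole else math.ceil(i)
    hi = lo + 1 if whole and lo < n else lo
    if p <= 50:
        order, to_fetch = "ASC", hi           # walk up from the smallest value
    else:
        order, to_fetch = "DESC", n - lo + 1  # walk down from the largest value
    cur.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"ORDER BY {column} {order}"
    )
    rows = [cur.fetchone()[0] for _ in range(to_fetch)]
    if order == "ASC":
        a, b = rows[lo - 1], rows[hi - 1]
    else:
        # ascending position k is row n - k + 1 from the top (index n - k)
        a, b = rows[n - lo], rows[n - hi]
    return a if hi == lo else (a + b) / 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (v REAL)")
conn.executemany("INSERT INTO samples (v) VALUES (?)", [(x,) for x in range(1, 11)])
print(sql_percentile_short_scan(conn, "samples", "v", 90))  # 9.5
```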