About Ranking


A column's value is ranked according to how far its value lies from its best. For any given parameter, say Price/Earnings in column C, the companies are sorted so that column C appears in numerically ascending order. The rank of any company is then the number of the row in which that company's name appears for this parameter. The same principle is used to determine the column rank for Debt/Equity and Price/Book.
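The sort-then-take-the-row-number idea above can be sketched in a few lines of Python. This is only an illustration: the function name column_rank, the descending flag, and the company names and figures are all made up for the example and are not taken from the actual spreadsheet.

```python
def column_rank(values, descending=False):
    """Return {company: 1-based row number} after sorting one column.

    values maps a company name to its number for a single parameter.
    Leave descending=False when smaller values are better (for example
    Price/Earnings); set descending=True when larger values are better.
    """
    ordered = sorted(values, key=values.get, reverse=descending)
    return {name: row for row, name in enumerate(ordered, start=1)}


# Hypothetical Price/Earnings figures, for illustration only.
price_earnings = {"Acme": 12.0, "Bolt": 8.5, "Cord": 20.1}
print(column_rank(price_earnings))
```

Ties and missing values are not handled in this sketch; equal values simply keep their input order.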

Similarly, the Return on Equity in column D is sorted in descending order, and each company's rank for this parameter is the row number where that company's name appears. The same holds true for the other parameters for which larger values are better, namely:

For both types of parameter, descending and ascending, there is the problem of missing data: any row and column for which the data is blank (displayed as N/A in the source of the data).

Two solutions are typically possible. First, one could just ignore that data. Clearly, if one is trying to find correlations between 2 parameters, it makes little sense to compare NA to NA, or NA to some piece of data that is available. For purposes of Rank, however, it seems more valid to interpolate the missing data into the set of valid data. To do this, the NA's are placed together at the worst end of the ordering and share a single rank. In other words, if there were 200 companies in a sector and 23 have a blank value for P/E, then the 177 companies with data are ranked 1 through 177, and the 23 without are all assigned a rank of max = 177 + ( 200-177 )/2 = 188.5 for the Price/Earnings column rank.
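As a rough sketch of the missing-data rule above, the following Python ranks the companies that have data and then gives every blank the shared value from the formula in the text. The name rank_with_missing, the use of None to stand for N/A, and the sample figures are assumptions made for this example.

```python
def rank_with_missing(values):
    """Rank one ascending column, giving every missing entry a shared rank.

    values maps a company name to a number, or to None when the source
    shows N/A. Companies with data get ranks 1..n_valid; the rest all get
    n_valid + (n_total - n_valid) / 2, the formula quoted in the text.
    """
    valid = {name: v for name, v in values.items() if v is not None}
    n_total = len(values)
    n_valid = len(valid)
    ordered = sorted(valid, key=valid.get)
    ranks = {name: float(row) for row, name in enumerate(ordered, start=1)}
    shared = n_valid + (n_total - n_valid) / 2
    for name in values:
        if name not in valid:
            ranks[name] = shared
    return ranks


# Hypothetical data: 5 companies, 2 of them with no P/E reported.
pe = {"Acme": 12.0, "Bolt": None, "Cord": 20.1, "Dyne": 8.5, "Echo": None}
print(rank_with_missing(pe))  # Bolt and Echo both get 3 + 2/2 = 4.0
```

With 200 companies of which 23 are blank, the same function returns 188.5 for each blank, matching the example above.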

Finally, a composite, equal weighted, so called H-Rank for all the parameters in a given row is created by summing all the individual column ranks for that row. The 4 Value and 2 Growth parameters are not quite of equal rank because of the tie conditions created by N/A values (as in the example above). However, each column rank is normalized by dividing by the sample size ( max as defined above ) of the collection before summation. So the summation is a number between an idealized 0 (there are no perfect companies) and 6 (it is also beyond belief that a company that bad is still in business). To further normalize, it is divided by 6 and multiplied by 100. That idealized rank is now a value between 0 and 100. This is the so called H-Rank, an average of the individual parameter ranks.

A second ranking method, called C-Rank, treats each of those 6 normalized ranks (each in the range [0,1]) as a vector along a 30 degree arc of a circle. It is the area created by these 6 vectors (forming a polygon) that is now the C-Rank, a number between 0 and pi. That too is typically displayed as an integer in the range [0,100].
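The H-Rank arithmetic described above (normalize each column rank, sum, divide by the number of parameters, multiply by 100) might look like this in Python. The function name h_rank and the sample numbers are invented for illustration, and the divisor for each column is passed in by the caller, since this sketch does not try to settle exactly which value of max a given column uses.

```python
def h_rank(column_ranks, column_max):
    """Average of normalized column ranks, scaled to the range 0..100.

    column_ranks holds one company's rank in each parameter column, and
    column_max holds the value each of those ranks is divided by.
    """
    if len(column_ranks) != len(column_max):
        raise ValueError("need one divisor per column rank")
    normalized = [rank / top for rank, top in zip(column_ranks, column_max)]
    return sum(normalized) / len(normalized) * 100


# Hypothetical company: 6 column ranks in a sector of 200 companies.
ranks = [10, 50, 188.5, 20, 100, 5]
print(round(h_rank(ranks, [200] * 6)))
```

A lower result means the company sits nearer the top of its columns on average.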

The implicit assumption is that the 6 parameters which we are tracking for any company are equally important. This is clearly not so. For example, financial companies (banks) often do not calculate Debt/Equity, yet insurance companies, which are also in the financial Sector, do. Does this mean one can impose the Debt/Equity statistics of insurance companies on banks?

Another problem occurs with certain descending parameters like

For these, lower positive numbers are better until they become negative, and negative numbers are in fact worse than large positive numbers. One solution is to change ALL negative numbers to the max value. A better, more complicated solution is to sort the positive numbers ascending, followed by the negative numbers descending, and then take each company's rank from its row number in the resulting array.
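The second solution above, positives ascending followed by negatives descending, can be expressed as a single sort with a two-part key. The sketch below is one possible way to write it in Python; the name signed_rank and the sample values are made up, and placing zero with the positive numbers is an assumption, since the text does not say where zero belongs.

```python
def signed_rank(values):
    """Rank a column where small positives are best and negatives are worst.

    Non-negative numbers come first in ascending order, then negative
    numbers in descending order (-2 before -40). The rank is the 1-based
    row number in that combined ordering. Zero is grouped with the
    positives here, which is an assumption.
    """
    def sort_key(name):
        v = values[name]
        # False sorts before True, so non-negatives precede negatives;
        # abs(v) then orders each group by distance from zero.
        return (v < 0, abs(v))

    ordered = sorted(values, key=sort_key)
    return {name: row for row, name in enumerate(ordered, start=1)}


# Hypothetical values, for illustration only.
column = {"Acme": 12.0, "Bolt": -2.0, "Cord": 5.0, "Dyne": -40.0, "Echo": 30.0}
print(signed_rank(column))
```

Here Cord, Acme and Echo take ranks 1 to 3, and the two companies with negative values follow at ranks 4 and 5.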


George Elgin