Descriptive Statistics
Statistics covers several different topics, but no topic is considered more important than descriptive statistics. Descriptive statistics is used to summarize data in a way that everyone can understand and interpret clearly. Data can be summarized in two ways: graphically and numerically. Using the numerical approach, one would calculate, for example, the mode, median, and mean. Using the graphical approach, one would create a pie chart, scatter plot, or histogram. Each approach has advantages and disadvantages. The graphical approach is well suited to finding patterns, while the numerical approach is more detailed. The two approaches should be used together for a better understanding of the data. Descriptive statistics has many uses, and this paper discusses only some of them. The paper explains what descriptive statistics is, why it is important, the different levels at which data are measured, how to use it in Microsoft Excel, and an example of descriptive statistics in the real world. By the end of the paper the reader should have a general understanding of descriptive statistics. More attention should be paid to descriptive statistics because it is a powerful tool that can be used even for everyday events.
Business Schools and Statistics. Descriptive statistics, and statistics in general, are important subjects in business schools at both the undergraduate and graduate levels. Statistics remained important even after 1993, when the AACSB (American Assembly of Collegiate Schools of Business) ruled that at least 50% of an undergraduate business student's classes should be general education classes. Most schools had to require fewer business courses to meet this new general education requirement. Many schools considered dropping statistics courses and offering more communication courses, but the AACSB required that schools offer foundation courses in statistics. In 1992, 90% of MBA programs required a core-level course in statistical methods (Parker, 51). That percentage is not much different today. Among the many statistics topics, descriptive statistics is the 4th most popular, taught in 93.7% of courses. In undergraduate introductory statistics courses, descriptive statistics is the most popular topic. A survey concluded that college professors perceived descriptive statistics as the most important topic.
Why Descriptive Statistics. The biggest use of descriptive statistics is to understand research data. They assist with understanding how the data are distributed across the possible range of values; with knowing whether or not the shape of a variable is normal; and with understanding whether one’s subjects tend to clump together in one spot on the distribution, or if they are widely scattered throughout the possible range of values (McHugh, 35). Descriptive statistics are well suited to summarizing and organizing data. Which descriptive statistics are appropriate depends on the level at which a variable is measured. There are four levels of measurement: nominal, ordinal, interval, and ratio. Descriptive statistics also fall into four groups: shape, form, or normality statistics; measures of central tendency; measures of dispersion or variation; and quartile and percentile measures.
Nominal Scale. Another word for nominal is name. In nominal measurement a variable is split into categories, and the categories are given names. Each time the variable is measured, the case is assigned to exactly one category; in other words, each case receives only one value on the scale. In a nominal scale all points on the scale are characterized as being equivalent (e.g., sex, race, country of origin) (Vahter, 39). The different categories in a nominal scale are called levels. Any numbers assigned to the levels of a nominal variable are just labels. One example of nominal measurement is type of pain. There are many types of pain, but each can only be described by a category. If, for example, we want to record dull, sharp, or burning pain, then the levels would be dull, sharp, and burning. Another example of a nominal scale is gender or marital status. For gender the levels would be male or female, while for marital status they would be single, married, or divorced. A nominal scale is the least powerful of the four scales. The scale wastes any information a sample element might share about varying degrees of the property being measured (Cooper, 284). Even though nominal scales are weak, they are still useful in certain cases. Nominal scales are widely used in surveys. For example, a question about gender limits the respondent to choosing between two options; this type of scale is the simple category scale. There are also more complex scales, such as the multiple choice single response scale, where a single response is chosen among multiple options, and the multiple choice multiple response scale, where multiple responses are chosen among multiple options.
Ordinal Scale. Ordinal is the second level of measurement. An ordinal scale is like a nominal scale, except that in an ordinal scale order is indicated. In other words, ordinal scales have magnitude. For example, pain can be rated from 1 to 10, with 10 indicating the most pain and 1 the least. Ordinal scales do not need to use numbers; they can use greater than or less than comparisons. While ordinal measurement speaks of greater than and less than measurements, other descriptors may be used: “superior to,” “happier than,” “poorer than,” or “important than” (Cooper, 285). Many books say that the Likert scale is an interval scale, but according to the article “Scientific Inquiry,” the Likert scale is an example of ordinal data. A Likert scale is a rating in which a respondent chooses whether they strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree. The drawback of ordinal scales is that the distance between adjacent levels is not always the same; ordinal scales do not have equal intervals. For example, the difference between mild pain and moderate pain can be large, but the ordinal scale will not measure that. In other words, the ordinal scale has categories like the nominal scale, with the addition of magnitude, but does not have equal intervals. An ordinal scale is more powerful than a nominal scale.
Interval Scale. The third level of measurement is the interval scale. Interval measures have nominal characteristics because they have categories and ordinal characteristics because they have magnitude. What interval measures add is equal intervals. An example of equal intervals is a ruler: the distance between 1 and 2 inches is the same as the distance between 5 and 6 inches. One thing that interval measures do not have is an absolute zero. A good example of a scale without an absolute zero is temperature, since even at zero degrees Fahrenheit the zero stands for a certain temperature. Absolute zero means that there is none of whatever is being measured, and there is always a temperature (McHugh, 36). An example of an interval scale is the semantic differential scale. This scale has 7 points on which each respondent rates whether something is important or unimportant. If someone were to rate a car repair as important, then it would receive a 7 or somewhere near 7, while an unimportant repair would receive a 1 or somewhere near 1. Some scales used in surveys can be labeled as either ordinal or interval, for example the numerical scale, the Likert scale mentioned earlier, or the Stapel scale.
Ratio Scale. Without an absolute zero, ratio comparisons (such as “twice as much”) are not meaningful on an interval scale. This is where the fourth level of measurement, the ratio scale, comes in. The ratio scale has an absolute zero. Because of the absolute zero length, it is meaningful to say, “Four inches is twice as long as 2 inches” (McHugh, 37). The ratio scale, unlike the other scales, has categories, magnitude, equal intervals, and an absolute zero point. Because of this, multiplication and division are meaningful only with the ratio scale. The ratio scale will usually give more information; going from a ratio scale to any of the other scales results in a loss of information. Some examples of ratio scale measures are money values, population counts, distances, return rates, productivity rates, and amounts of time (Cooper, 286). The ratio scale is the most powerful of the four levels of measurement. An example of a ratio scale used in surveys is the constant sum scale, which helps in finding proportions. Let's say a customer taking part in a survey comes across a constant sum scale. The customer assigns numbers that total 100 across the items, for example, rating a repair on speed and price. If the customer was happy with the speed of the repair they might give it a number like 70, while if the customer was unhappy with the price they would give it 30. No matter what numbers are chosen, they have to total 100, or else it is not a constant sum scale. There is one interesting scale that can be considered an ordinal, interval, or ratio scale. In this scale the respondent marks an X along a continuous line that is not numbered in any way, depending on how strongly they prefer one thing over another. The drawback of this scale is that it can be difficult to measure the exact point on the line. This scale is mostly used with children, since pictures can replace words.
Shape, Form, or Normality Statistics. Measurements can be plotted on a graph, and the graph then takes a certain shape or form. It is the shape of the data when the data are graphed so that the levels of the variable are on the x-axis and the number of cases found at each data point constitutes the y-axis (McHugh, 111). The most common shape for ordinal, interval, or ratio variables is the normal distribution, usually called the bell-shaped curve. Skew measures whether the two halves of a distribution are symmetrical. A distribution in which most cases fall at the low end and the counts then decline in a long tail toward higher values is called positively skewed. An example is a graph of survival among children with cancer: very few children die in the first couple of years, but later on fewer children survive. The opposite of a positively skewed graph is a negatively skewed graph, in which the long tail runs toward lower values. Neither a positively nor a negatively skewed graph is a normal distribution. A graph that peaks very high in the middle and looks flat on both sides is called leptokurtic. Its opposite is platykurtic: a platykurtic graph is flat on the sides and low in the middle. Neither leptokurtic nor platykurtic graphs can be considered normally curved. A mesokurtic graph, which is neither too high nor too low in the middle, is a normally curved graph.
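Skew and kurtosis can also be expressed as numbers. The Python sketch below is only an illustration, not a method taken from the sources cited here: it assumes the common moment-based formulas (the third and fourth central moments scaled by the variance), and both data sets are invented. A positive skewness value indicates a long right tail, and an excess kurtosis near zero corresponds to a normally curved graph.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: positive for a long right tail, negative for a long left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data: most scores are low and a few are very high (long right tail).
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]
# Hypothetical data: evenly spread and symmetrical around 4.
symmetric = [1, 2, 3, 4, 5, 6, 7]

print("skewness, right-tailed data:", skewness(right_tailed))
print("skewness, symmetric data:   ", skewness(symmetric))
print("excess kurtosis, symmetric: ", excess_kurtosis(symmetric))
```

The evenly spread data set has zero skewness and a negative excess kurtosis, which is the flat shape described above.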
Measures of Central Tendency. Measures of central tendency are sometimes called measures of location. They determine where most of the data lie in the distribution. The first measure is the mode. The mode is the value that occurs most frequently in a data set. An example of the mode in use is diagnostic decision making for a patient: before a diagnosis is made, common illnesses should be considered before searching for rare ones, because the majority of patients will have an illness that is common. The second measure is the median. The median is the middle value. For example, if there are five numbers 1, 2, 4, 6, 8, the median of these numbers is 4. The median is usually the most accurate indicator of central tendency (McHugh, 113). The median and the mean are compared later in this section. The third measure is the mean. The mean is the average of all the numbers. If, for example, there are three numbers, the three numbers are added and the sum is divided by 3 to find the mean. When one of the values is very high or very low, the value of the mean gets distorted. This is especially a problem when the sample size is small. The median, by contrast, is not affected if one of the values is extremely high or low. Even though the mean is the most useful measure of central tendency, it is not useful for everything. One problem is using the mean with nominal or ordinal data. Since the numbers in nominal data are only labels, the mean cannot be used. The mean also does not mean much for ordinal data, since the intervals between points are not equal. The advantage of the mean over the median is that it can be used in powerful statistical techniques. An example of the choice between the mean and the median arises in union negotiations. Management prefers to use the mean salary to represent all workers, while unions prefer to use the median. This is because managers make more; some chief executive officers make millions of dollars, while regular union employees usually do not make more than 50,000 dollars a year. Because of this the mean salary would be distorted and would not represent a typical salary for the majority of union workers, while the median would be a better representation of the typical salary. If the mean were used, a raise for employees would likely not be approved, while with the median there is a possibility. In particular, if the mean were very high compared to the median, questions would be raised about why a raise is being requested. Sometimes one paper will report the median salary, another the mean, and a third the mode. That is when the public thinks that one or both sides are lying. As this example shows, great care must be taken in choosing among the mode, median, and mean.
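The effect of one very high salary on the mean can be illustrated with a short Python sketch. The salary figures are invented for the example (five workers and one executive) and do not come from any of the cited sources; the functions are from Python's standard statistics module.

```python
import statistics

# Hypothetical annual salaries: five union workers and one chief executive.
salaries = [40_000, 42_000, 45_000, 45_000, 50_000, 2_000_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the one very high salary
median_salary = statistics.median(salaries)  # middle value, unaffected by the outlier
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"mode:   {mode_salary:,.0f}")
```

With these figures the mean is several times larger than the median, even though five of the six people earn 50,000 dollars or less.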
Measures of Dispersion or Variation. Variability is another property that researchers measure. Variability describes how cases tend to be scattered throughout the entire range of the variable (McHugh, 114). There are three ways to measure variability: the range, the variance, and the standard deviation. The range is the least precise and the easiest to calculate of the three. It is calculated by subtracting the lowest score from the highest score. For example, suppose there are four numbers: 5, 8, 15, and 20. To calculate the range, the lowest number, 5, is subtracted from the highest, 20, so the range is 15. The range tells us how widely the values are spread. Its drawback is that it does not tell us whether the values are evenly spread, clumped together, or separated by empty spots. The second way to measure variability is the variance. The variance measures how close the scores are to the middle, the middle being the mean. The mean is subtracted from each score to get a deviation. If the deviations were simply added up, they would sum to zero. To avoid this, the deviations are squared, which turns them into positive numbers. The squared deviations are then summed and divided by the sample size to get the variance (when the variance of a population is estimated from a sample, the sum is usually divided by the sample size minus one). The third measure, the standard deviation, is found by taking the square root of the variance. The standard deviation is useful for finding which values are not in line with the mean. A small standard deviation (relative to the mean) means that most of the scores cluster tightly around the mean (McHugh, 115). With a large standard deviation it is the opposite: the scores are widely scattered.
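The calculations in this section can be followed step by step in a short Python sketch that uses the numbers 5, 8, 15, and 20 from the range example. It is an illustration only; it shows both the divide-by-n variance described here and the divide-by-(n - 1) version that many textbooks and software packages use for samples.

```python
import math
import statistics

scores = [5, 8, 15, 20]  # the numbers used in the range example

data_range = max(scores) - min(scores)       # highest score minus lowest score
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]      # distance of each score from the mean
squared = [d ** 2 for d in deviations]       # squaring removes the minus signs

population_variance = sum(squared) / len(scores)      # divide by n
sample_variance = sum(squared) / (len(scores) - 1)    # divide by n - 1
std_dev = math.sqrt(population_variance)              # square root of the variance

# The standard library functions give the same two variances.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_variance, statistics.variance(scores))

print("range:", data_range)
print("mean:", mean)
print("sum of deviations:", sum(deviations))
print("variance (n):", population_variance)
print("variance (n - 1):", sample_variance)
print("standard deviation:", round(std_dev, 2))
```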
Quartile and Percentile Measures. A percentile measures where a score falls compared to the entire distribution. All the scores are arranged in order, and then the percentage of scores that fall below an individual score is calculated. An example is national school achievement tests. When a child gets results from a national school achievement test, the report will give a score like 74. The 74 means that 74% of the children scored below that child. With this number it is easy to find how many children scored at or above that level by subtracting 74 from 100: 26 percent of the children did. Quartiles divide the percentile scale into four equal sections. The lowest quartile contains scores from the 1st to the 25th percentile, the second quartile the 26th to the 50th percentile, the third quartile the 51st to the 75th percentile, and the top quartile the 76th to the 99th percentile. The interquartile range covers the scores in the second and third quartiles, that is, the middle half of the distribution, and describes how spread out the central scores are. In short, quartile and percentile measures show placement in a population.
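A minimal Python sketch of these measures follows. The test scores are invented (one child at each score from 1 to 100), and percentile_rank is a hypothetical helper written for this illustration; statistics.quantiles is part of the standard library from Python 3.8 onward, and different software may place the quartile cut points slightly differently.

```python
import statistics


def percentile_rank(scores, score):
    """Percentage of scores that fall strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical test results: one child at each score from 1 to 100.
scores = list(range(1, 101))

rank = percentile_rank(scores, 75)  # share of children who scored below 75
above = 100 - rank                  # share who scored 75 or higher

# Three cut points that divide the ordered scores into four quartiles.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                       # width of the middle half of the scores

print("percentile rank of a score of 75:", rank)
print("percent scoring at or above:", above)
print("quartile cut points:", q1, q2, q3)
print("interquartile range:", iqr)
```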
Descriptive Statistics in Microsoft Excel. Descriptive statistics can be calculated and displayed using Microsoft Excel spreadsheet software. The advantage of Microsoft Excel is that the user does not need to be good at math; most of the math, especially the difficult math, is done by the software. The first step is to enter the data into the spreadsheet. If, for example, we are entering student test scores, then each student should have a separate row. Even if a student has multiple test scores, the scores should be entered in one row; a student should not have multiple rows on the spreadsheet. After the data are entered, Excel can be used to display the data visually. Interpreting visual representations is much more intuitive than interpreting statistics alone, although using the two formats together provides the greatest clarity (Carr, 46). Excel can display different types of graphs, such as frequency distributions, frequency polygons, and many others. Before starting anything complicated, a new user should attempt something easy like a frequency distribution. The new user can then move on to tougher graphs like the frequency polygon and the histogram, which will be discussed next.
Histograms in Excel. A histogram is a type of graph displayed with bars, where each bar shows how many data values fall within a certain interval. Some interesting facts about histograms are worth noting before explaining how to create one in Microsoft Excel. As the sample size increases, the shape of the histogram grows smoother; for normally distributed data, it looks more like a normal curve. Whenever histograms are compared to one another, they should be on the same scale. Before beginning to use Excel to make a histogram, a new user should note that there are many different Excel versions, and some versions have different features than others. Other programs, such as version 15 of SPSS, can also be used, but this paper focuses only on Microsoft Excel. To make a histogram in Excel, the Analysis ToolPak add-in needs to be installed if it is not already installed. Sometimes a software update is enough to make it available. The first thing to do in the Excel spreadsheet is to set up bins. Bin numbers are numbers that mark the boundaries of the intervals into which the input data are counted. Excel can determine bin numbers on its own, but the result is often a histogram that does not look right. Bin numbers should be evenly divisible by the interval width being used (for example, 10, 20, 30 for a width of 10). After the bin numbers and the data have been entered, several steps are taken to create the histogram, such as selecting Tools, then the histogram option, and then the data that will make up the histogram. That level of detail will not be covered in this paper, which mentions only the most important features in Excel. Once the histogram is created, it can be further developed. One drawback of Excel is that if the histogram is created with a mistake, there is no way to undo it; the histogram would need to be deleted and started over. Other than that, Excel has some neat features. For example, histograms in Excel can be made larger or smaller and can have a title if needed. Labels can be added, for example, to label the bins. The direction of the text can be changed. Colors can also be changed, which means the histogram can be black and white. Another useful feature is that Excel can add trendlines. Trendlines are upward or downward lines on the graph that indicate movements in the data. Trendlines can be very useful, depending on what is needed from the graph. A histogram can also easily be changed to a frequency polygon in Microsoft Excel. A frequency polygon draws a line connecting the midpoints of the tops of the bars in the histogram, which approximates the form of the data distribution.
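The idea of bins can be illustrated outside Excel with a short Python sketch. This is not Excel's own procedure; it assumes that each bin counts the values up to and including its bin number and that values above the last bin number go into a final overflow bin. The test scores, the bin numbers, and the function name bin_counts are invented for the illustration.

```python
from bisect import bisect_left


def bin_counts(values, upper_bounds):
    """Count the values in each bin.

    A value is placed in the first bin whose upper bound is greater than or
    equal to the value. Values above the last bound go in a final overflow bin.
    upper_bounds must be sorted in increasing order.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in values:
        counts[bisect_left(upper_bounds, value)] += 1
    return counts


# Hypothetical test scores and bin numbers that are multiples of the width 10.
scores = [55, 62, 68, 70, 71, 75, 79, 80, 84, 88, 91, 97]
bins = [60, 70, 80, 90, 100]

counts = bin_counts(scores, bins)

# Print a simple text histogram, one row of marks per bin.
for bound, count in zip(bins, counts):
    print(f"up to {bound:>3}: {'#' * count}")
print(f"above {bins[-1]:>3}: {'#' * counts[-1]}")
```

Using bin numbers that are multiples of the interval width keeps every bar the same width, which is what makes histograms comparable when they are placed side by side.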
Affordable Housing Case. One real world example of descriptive statistics is the article “Towards an Accurate Description of Affordability.” The article is about low income families and affordable housing in England. Low income families have had problems finding an affordable home to live in. Because of this, since the 1980s affordable housing options for low income families have been on the rise. Organization for Economic Cooperation and Development (OECD) countries strive to provide low income housing, and most low income families in OECD countries rent. The study focuses on the measurement and description of affordability among tenants. Affordable housing meets a certain standard of living without putting too much financial pressure on families with low incomes. The focus of this study is on the relationship between a household’s housing expenditure, income and any housing allowance entitlement and housing standards (Chaplin, 1949). Affordability is measured in two ways: the rent to income ratio and the residual income measure. The former examines rent as a proportion of income to arrive at a percentage which represents that part of income which must be paid out as rent; the latter deducts rent and a normative allowance for non-housing goods and services (food, clothing etc.) from income to arrive at a residual income, that part of income that may be “freely disposed of” (Chaplin, 1949).
Ratio Formula. The rent to income ratio is the affordability measure used most often throughout the world. The article argues that the mean and headcount statistics, when used with the rent to income ratio, do not describe affordability correctly. The rent to income ratio is calculated by the formula ratio = rent / net income and gives the proportion of income spent on rent. This formula has been criticized by many, and there are three main criticisms. The first criticism is that the ratio does not always describe affordability. The second criticism is that the ratio does not distinguish between different levels of income and rent. For example, one family has an income of 100 pounds a week and spends 20 pounds a week on rent. A second family has an income of 1000 pounds a week and rent of 200 pounds a week. Both families' ratios will be 20 percent. The second family is better off, but the ratio does not indicate that. The third criticism is that the ratio does not take into account regional differences in housing and non-housing costs.
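The two-family example can be worked through in a short Python sketch. The incomes and rents are the ones given in this section; the "income left after rent" line is a simplification added for illustration and leaves out the normative allowance for non-housing goods that the residual income measure in the article deducts.

```python
def rent_to_income_ratio(rent, net_income):
    """Share of net income that is spent on rent."""
    return rent / net_income


# Weekly figures in pounds, taken from the example in the text.
families = {
    "first family": {"income": 100, "rent": 20},
    "second family": {"income": 1000, "rent": 200},
}

ratios = {}
left_after_rent = {}
for name, f in families.items():
    ratios[name] = rent_to_income_ratio(f["rent"], f["income"])
    # Simplified: income minus rent only, with no allowance for other needs.
    left_after_rent[name] = f["income"] - f["rent"]
    print(f"{name}: ratio {ratios[name]:.0%}, left after rent {left_after_rent[name]} pounds")
```

Both ratios come out at 20 percent, while the money left after rent differs by a factor of ten, which is the point of the second criticism.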
Headcount and Average Ratio. The headcount and the mean are the two main statistics used with the rent to income ratio. A benchmark rent to income ratio is used to determine who can and who cannot afford housing; the share of households above the benchmark is the headcount statistic, which is used in the US, UK, and Australia. In the US, the National Affordable Housing Act of 1990 counts how many families spend more than 30 percent of their gross income on rent and utilities. In the UK, the National Housing Federation (NHF) identifies low income families whose rent to income ratio exceeds 25 percent. In Australia, the National Housing Strategy of 1991 set benchmark ratios between 25 and 30 percent. The mean, on the other hand, is mostly used in European countries, where average ratios are calculated to place individuals in a level for a housing allowance.
Affordability. Poverty has been difficult to measure and describe. There has been much discussion of this, and great strides have been made in description. A poverty line is set, and whoever is below the poverty line is counted as poor. For example, to get the headcount, the number of poor is divided by the total population. Alternatively, one can find the “poverty gap” that is the gap between an individual (household)’s poverty level and the poverty line, add these together for all the poor households and divide this sum by the number of poor and the level of the poverty line, this creates the income gap ratio (Chaplin, 1951). Both the headcount and the income gap ratio have problems, however. The headcount is good at finding the poor but does not say how poor they are, while the income gap ratio is good at finding how poor people are but does not give the number of poor.
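The headcount and the income gap ratio described in this section can be computed as in the Python sketch below. The household incomes and the poverty line of 100 are invented, and the function names are chosen only for this illustration.

```python
def headcount_ratio(incomes, poverty_line):
    """Number of poor households divided by the total number of households."""
    poor = [y for y in incomes if y < poverty_line]
    return len(poor) / len(incomes)


def income_gap_ratio(incomes, poverty_line):
    """Sum of the poverty gaps divided by (number of poor * poverty line)."""
    poor = [y for y in incomes if y < poverty_line]
    if not poor:
        return 0.0  # no poor households, so there is no gap to report
    total_gap = sum(poverty_line - y for y in poor)
    return total_gap / (len(poor) * poverty_line)


# Hypothetical weekly household incomes and a hypothetical poverty line.
incomes = [40, 60, 80, 120, 200]
poverty_line = 100

headcount = headcount_ratio(incomes, poverty_line)
gap_ratio = income_gap_ratio(incomes, poverty_line)

print("headcount ratio:", headcount)    # how many are poor
print("income gap ratio:", gap_ratio)   # how far below the line the poor are, on average
```

Here three of the five households are poor, and on average they fall 40 percent short of the poverty line; neither number alone gives both facts.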
Poverty Axioms. Descriptive statistics for poverty should satisfy three axioms: the monotonicity axiom, the transfer axiom, and the transfer sensitivity axiom. Each is stated ceteris paribus, that is, with everything else held equal. The monotonicity axiom states that whenever a poor household’s income declines, the poverty measure must increase. The transfer axiom states that when income is transferred from a poor household to a richer household, the poverty measure must increase. The transfer sensitivity axiom states that such a transfer increases the poverty measure, but the increase is smaller the higher the poor household's original income. There is a statistic that satisfies all three axioms. It is called the FGT (Foster-Greer-Thorbecke) statistic, and it measures the depth of poverty, in other words the gap between a household’s income and the poverty line. The FGT statistic can also be adapted to measure affordability, which would be very useful for the National Housing Federation (NHF).
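As an illustration, the Python sketch below uses the standard Foster-Greer-Thorbecke form, in which each poor household's gap is divided by the poverty line, raised to a power alpha, and averaged over all households. This is an assumption about the general form only; the version adapted to rent to income ratios in the article may differ in detail. The two sets of incomes and the poverty line are invented.

```python
def fgt(incomes, poverty_line, alpha):
    """Foster-Greer-Thorbecke poverty measure (standard form).

    alpha = 0 gives the headcount ratio, alpha = 1 the average normalized gap,
    and alpha = 2 gives extra weight to the households furthest below the line.
    """
    gaps = [(poverty_line - y) / poverty_line for y in incomes if y < poverty_line]
    return sum(g ** alpha for g in gaps) / len(incomes)


poverty_line = 100

# Two hypothetical regions with the same number of poor and the same total gap.
region_a = [50, 50, 150, 150]   # both poor households are moderately poor
region_b = [10, 90, 150, 150]   # one poor household is very far below the line

for alpha in (0, 1, 2):
    print(f"alpha={alpha}: region A {fgt(region_a, poverty_line, alpha):.3f}, "
          f"region B {fgt(region_b, poverty_line, alpha):.3f}")
```

With alpha equal to 0 or 1 the two regions look identical, but with alpha equal to 2 the region containing the very poor household scores higher, because larger gaps receive more weight.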
National Housing Federation Policy. The National Housing Federation, or NHF, is responsible for providing social housing units in England. The Housing Act of 1988 introduced mixed funding: the government provides money through the Social Housing Grant, and associations raise private money. The associations are responsible for setting their own rents so that they can eventually repay the loans. Because of this, rents have risen to the point where many low income families cannot afford them. The government has not told the associations what an affordable rent should be. Because of this, the NHF has come up with its own policy on affordability. The NHF policy does not follow the axioms mentioned earlier; instead it follows the headcount rule. The NHF labels rents affordable if fewer than 50 percent of working households have a rent to income ratio that exceeds 25 percent. The Continuous Recording system, or CORE, is used by the NHF to calculate the headcount. Every time an association gets a new tenant, a CORE form is filled out and the data are entered into the database. The CORE form asks tenants for details such as household income and rent amounts. This information is then used to calculate affordability for the tenant.
Case Study Method and Results. For the case study in the article, each region was tested. The FGT statistic, the headcount percentage, and the mean rent to income ratio were calculated. Data for the study came from an Existing Tenants’ Survey, which is similar to the CORE data. The CORE data could not be used because of access restrictions. While the CORE data cover only new tenants, the Existing Tenants’ Survey covered all tenants in July 1995. The data included 420 working two parent families. With this data, Housing Benefit eligibility was calculated. Each formula described affordability in its own way. The results were then put into a table according to region and the type of formula used. The statistics showed that the Eastern and South East regions received similar results for the headcount and the mean because families in these regions have similar values. According to the FGT values, however, the depth of the problem is much different, which means that affordability is different. The Eastern region had the worse affordability of the two. It had 2 households whose rent to income ratios were larger than 45 percent. The FGT statistic picked this up and gave a higher affordability statistic. The FGT statistic thus reveals a weakness in the headcount and the mean ratio. This is where the NHF made a mistake.
Case Conclusion. The point of the article was to show that the descriptive statistics currently used to describe affordability are not adequate. The three axioms must be satisfied or some families will suffer. The statistic that satisfies all three axioms is the FGT statistic, which should be used in the future to describe not only poverty but also affordability. Misallocation of resources will occur if only the headcount and mean ratios are used, because the households with the most severe affordability problems will not be noticed. The FGT statistic describes the gap between individual households’ rent to income ratios and prescribed benchmark and it weights households according to the size of this gap (Chaplin, 1956). This is why the FGT statistic is superior for measuring affordability and should be used in the future not only in the UK but in the US, Australia, and throughout the world. The FGT statistic would direct low income housing to the families that really need it and away from the ones that do not. It is a more accurate form of measurement.
Conclusion
Taken together, it is clear that descriptive statistics is important. For example, although there are many topics in a business statistics class, descriptive statistics is regarded as the most important. Data are measured at four levels: nominal, ordinal, interval, and ratio. Descriptive statistics fall into four major groups: shape, form, or normality statistics; measures of central tendency; measures of dispersion or variation; and quartile and percentile measures. Furthermore, with Microsoft Excel descriptive statistics can be calculated and graphs created with ease, which makes descriptive statistics a great tool for office use. The real world example on low income housing discussed earlier also shows why descriptive statistics is important. As shown, choosing a better descriptive statistic can reveal problems that other statistics failed to catch. Descriptive statistics has many applications; some of them have been discussed here, and there are many others that are not mentioned in this paper. More attention should be paid to descriptive statistics because it is a powerful tool that can be used even for everyday events.
References
Beachkofski, B. (2009, January). Comparison of Descriptive Statistics for Multidimensional Point Sets. Monte Carlo Methods and Applications, 15(3), 211-228.
Candido, C. (2005, January). Service Quality Strategy Implementation: A Model and the Case of the Algarve Hotel Industry. Total Quality Management and Business Excellence, 16(1), 3-14.
Carr, N. (2008, January). Using Microsoft Excel to Calculate Descriptive Statistics and Create Graphs. Language Assessment Quarterly: An International Journal, 5(1), 43-62.
Chaplin, R., & Freeman, A. (1999, October). Towards an Accurate Description of Affordability. Urban Studies, 36(11), 1949-1957.
Cooper, D., & Schindler, P. (2010). Business Research Methods (10th ed.). Boston: McGraw-Hill/Irwin.
Kaiser, M. (2008, April). Economic Limit of Offshore Structures in the Gulf of Mexico – Descriptive Statistics. Energy Sources, Part B: Energy, Economics, and Planning, 3(2), 203-214.
Kalmijn, W., & Veenhoven, R. (2005, December). Measuring Inequality of Happiness in Nations: In Search for Proper Statistics. Journal of Happiness Studies, 6(4), 357-396.
McHugh, M. (2003, January). Descriptive Statistics, Part I: Level Measurement. Journal for Specialists in Pediatric Nursing, 8(1), 35-37.
McHugh, M. (2003, July). Descriptive Statistics, Part II: Most Commonly Used Descriptive Statistics. Journal for Specialists in Pediatric Nursing, 8(3), 111-116.
Parker, S., & Pettijohn, C. (1999, September-October). The Nature and Role of Statistics in the Business School Curriculum. Journal of Education for Business, 75(1), 51.
Shatz, M. (1985). The Greyhound Strike: Using a Labor Dispute to Teach Descriptive Statistics. Teaching of Psychology, 12(2), 85-86.
Spina, D. (2007, October). Statistics in Pharmacology. British Journal of Pharmacology, 152(3), 291-293.
Vahter, P. (2006). Descriptive Statistics About the Heterogeneity of Productivity Among Firms. Working Papers of Eesti Pank, 7, 15-50.
