Calculate covariance

Author: Judy Howell
Date Of Creation: 2 July 2021
Update Date: 1 July 2024

Covariance is a statistical calculation that makes the relationship between two data sets more transparent. For example, suppose anthropologists study the height and weight of a population within a particular culture. For each person in the study, height and weight can be recorded as a data pair (x, y). These values can be used in a standard formula to calculate the covariance. This article first explains how to calculate the covariance of a data set by hand. Next, two automated ways of getting the result are discussed.

Steps

Method 1 of 4: Calculate the covariance by hand using the standard formula

  1. Learn the standard covariance formula and its parts. The standard formula for calculating covariance is $\Sigma (x_i - x_{\text{avg}})(y_i - y_{\text{avg}}) / (n-1)$. Here $x_i$ and $y_i$ are the individual data values and $x_{\text{avg}}$ and $y_{\text{avg}}$ are the means of the x and y data. The number of data pairs is n, and Σ means that you add the products for all pairs together.
  2. Construct your data table. Before getting started, it is helpful to collect your data. This article uses a sample data set of nine pairs: x = 1, 3, 2, 5, 8, 7, 12, 2, 4 and y = 8, 6, 9, 4, 3, 3, 2, 7, 7. Each x is paired with the y in the same position. Create a table with five columns and label them as follows:
    • x, for the x data points
    • y, for the y data points
    • $(x_i - x_{\text{avg}})$, for the difference between each x value and the mean of x
    • $(y_i - y_{\text{avg}})$, for the difference between each y value and the mean of y
    • Product, for the product of the two differences in that row
  3. Calculate the mean of the x data points. This sample data set contains 9 numbers. To find the mean, add them together and divide the sum by 9. The sum is 1 + 3 + 2 + 5 + 8 + 7 + 12 + 2 + 4 = 44. Dividing this by 9 gives a mean of about 4.89. This is the value you will use as x (avg) in the calculations that follow.
  4. Calculate the mean of the y data points. The y column must also consist of 9 data points, each paired with the corresponding x data point. For this sample data set, the sum is 8 + 6 + 9 + 4 + 3 + 3 + 2 + 7 + 7 = 49. Divide this total by 9 to get a mean of about 5.44. You will use 5.44 as the value of y (avg) in the calculations that follow.
  5. Calculate the values $(x_i - x_{\text{avg}})$. In the third column, subtract the mean of x from each x value. For the first row, this gives 1 - 4.89 = -3.89.
  6. Calculate the values $(y_i - y_{\text{avg}})$. In the fourth column, subtract the mean of y from each y value. For the first row, this gives 8 - 5.44 = 2.56.
  7. Calculate the products for each data row. Fill in the rows of the last column by multiplying the numbers in the two previous columns, $(x_i - x_{\text{avg}})$ and $(y_i - y_{\text{avg}})$. For the first row, this gives -3.89 × 2.56 ≈ -9.96.
  8. Find the sum of the values in the last column. This is where the Σ symbol comes in. For this sample data set, you should now have nine values in the last column. Add those nine numbers together, and pay close attention to whether each number is positive or negative.
    • For this sample data set, the sum should come to -64.57. If you do not round the intermediate values, it comes to about -64.56. Write this total at the bottom of the column. This is the numerator of the standard covariance formula.
  9. Calculate the denominator of the covariance formula. The denominator is (n-1), which is one less than the number of data pairs in your data set.
    • In this example problem, there are nine data pairs, so n is 9. Therefore (n-1) is equal to 8.
  10. Divide the numerator by the denominator. The last step in calculating the covariance is to divide the numerator, $\Sigma (x_i - x_{\text{avg}})(y_i - y_{\text{avg}})$, by the denominator, (n-1). For the sample data, -64.57 / 8 ≈ -8.07. This is the covariance.

Method 2 of 4: Using Excel to calculate covariance

  1. Note how repetitive the calculations are. It is worth calculating covariance by hand a few times so that you understand what the result means. However, if you are going to use covariance routinely to interpret data, you need a faster and more automated way to get the results. Even the small sample data set of just nine data pairs takes 31 small calculations:
    • two means
    • eighteen separate subtractions
    • nine multiplications
    • one summation
    • a final division
     Along the way you run the risk of dropping a negative sign or copying a result incorrectly, so that the answer is no longer correct.
  2. Create a worksheet for calculating the covariance. If you are familiar with Excel (or another spreadsheet program), you can easily set up a table for determining the covariance. Label the headings of the five columns as you did for the calculation by hand: x, y, (x(i) - x(avg)), (y(i) - y(avg)), and Product.
    • To simplify naming, you can call the third column something like "x difference" and the fourth column "y difference". Just remember what the data in each column means.
    • If the table starts in the top left corner of the worksheet, cell A1 is labeled x, and the other labels continue up to cell E1.
  3. Enter the data points. Enter the data values in the x and y columns. The order of the data points matters, so make sure each y value is in the same row as its corresponding x value.
    • The x values start in cell A2 and continue down for as many data points as you have.
    • The y values start in cell B2 and continue down in the same way.
  4. Determine the means of the x and y values. Excel calculates the averages for you very quickly. In the first blank cell below each column of data, type the formula =AVERAGE(A2:A___). Fill in the blank with the row number of your last data point.
    • For example, if you have 100 data points, cells A2 through A101 are filled. In cell A102, you type: =AVERAGE(A2:A101).
    • For the y data, type the formula =AVERAGE(B2:B101) in cell B102.
    • Remember that a formula in Excel starts with an "=" sign.
  5. Type the formula for the (x(i) - x(avg)) column. In cell C2, enter the formula for the first subtraction: =A2-___. Fill in the blank with the address of the cell containing the mean of the x data. Write it as an absolute reference, with $ signs, so that it does not shift when you copy the formula down.
    • For example, with 100 data points the mean is in cell A102, so your formula becomes: =A2-$A$102.
  6. Repeat the formula for the (y(i) - y(avg)) column. Following the same example, enter =B2-$B$102 in cell D2.
  7. Type the formula for the Product column. In cell E2, type the formula that multiplies the two preceding cells: =C2*D2.
  8. Copy the formulas to fill the table. So far, you have only entered formulas for the first data point, in row 2. Using your mouse, select cells C2, D2 and E2. Move your cursor over the small square in the lower right corner of the selection until a plus sign appears. Click and hold the mouse button and drag down to the last row of data. This copies the three formulas from cells C2, D2 and E2 into every row, so the table fills with all the calculations.
  9. Calculate the sum of the last column. You need the sum of the values in the Product column. In the first blank cell below the last data point in that column, type the formula =SUM(E2:E___). Fill in the blank with the row number of the last data point.
    • In the example with 100 data points, this formula goes into cell E102: =SUM(E2:E101).
  10. Determine the covariance. Excel can perform the final calculation for you as well. In our example, the sum in cell E102 is the numerator of the covariance formula. In the cell right below it, type =E102/___. Fill in the blank with (n-1), which is one less than the number of data points. In our example, this is 99, so the formula is =E102/99. The result is the covariance of your data.
    • As a check, Excel's built-in function =COVARIANCE.S(A2:A101,B2:B101) should return the same value.
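The same five-column table can also be built in a few lines of code. The following Python script is only an illustrative sketch, and the helper name `sample_covariance` is hypothetical, not a standard API. It follows the steps of Methods 1 and 2 on the sample data set: compute the two means, form the differences and their products, add the products, and divide by (n - 1).

```python
# Sample data set from Method 1: each x is paired with the y in the same position.
x = [1, 3, 2, 5, 8, 7, 12, 2, 4]
y = [8, 6, 9, 4, 3, 3, 2, 7, 7]


def sample_covariance(xs, ys):
    """Hypothetical helper: sum of (x_i - x_avg)(y_i - y_avg), divided by (n - 1)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("x and y must contain the same number of data points")
    if n < 2:
        raise ValueError("at least two data pairs are needed")

    x_avg = sum(xs) / n  # mean of the x column
    y_avg = sum(ys) / n  # mean of the y column

    # Columns 3 and 4 are the differences from the means; column 5 is their product.
    products = [(xi - x_avg) * (yi - y_avg) for xi, yi in zip(xs, ys)]

    numerator = sum(products)  # the Σ step
    return numerator / (n - 1)  # divide by the denominator (n - 1)


cov = sample_covariance(x, y)
print(f"covariance = {cov:.2f}")  # covariance = -8.07
```

The script does not round the means, so its intermediate sum is about -64.56 rather than -64.57. The covariance still rounds to -8.07.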

Method 3 of 4: Using online covariance calculators

  1. Search online for covariance calculators. Various schools, companies and other sources have websites that calculate covariance for you very easily. Use the search term "covariance calculator" in a search engine.
  2. Enter your data. Read the instructions on the website carefully to make sure you enter the information correctly. It is important to keep your data pairs in order; otherwise the result will be an incorrect covariance. Websites use different styles of data entry.
    • For example, the website http://ncalculators.com/statistics/covariance-calculator.htm has one horizontal box for entering the x values and a second horizontal box for the y values. You enter your data separated by commas. The x data from the sample data set calculated earlier in this article would then be entered as 1,3,2,5,8,7,12,2,4. The y data would be entered as 8,6,9,4,3,3,2,7,7.
    • On another site, https://www.thecalculator.co/math/Covariance-Calculator-705.html, you will be asked to enter the x data in the first box. Data is entered vertically, with one item per line. Therefore the entry on this site looks like:
    • 1
    • 3
    • 2
    • 5
    • 8
    • 7
    • 12
    • 2
    • 4
  3. Calculate your results. The attractive thing about these online calculators is that after you enter the data, you usually only have to click the "Calculate" button and the results appear automatically. Many sites also show the intermediate values x (avg), y (avg) and n. Some sites offer both a sample and a population covariance. The sample covariance is the one that matches the (n-1) formula used in this article.

Method 4 of 4: Interpreting the results of the covariance

  1. Look for a positive or negative relationship. The covariance is a single statistical number that indicates the relationship between one data set and another. In the example from the introduction, height and weight are measured. You would expect that as people get taller, their weight also increases, which leads to a positive covariance. As another example, suppose data is collected on the number of hours someone practices golf and the score he or she achieves. In this case you would expect a negative covariance: as the number of practice hours increases, the golf score tends to decrease. (In golf, a lower score is better.)
    • Consider the sample data set calculated above. The resulting covariance is -8.07. The minus sign means that as the x values increase, the y values tend to decrease. You can confirm this by looking at some of the values. For example, the x values of 1 and 2 correspond to the y values of 7, 8, and 9. The x values of 8 and 12 are paired with the y values of 3 and 2, respectively.
  2. Interpret the magnitude of the covariance. Suppose the covariance is large, either a large positive number or a large negative number. You can then interpret this to mean that the two data sets are strongly connected, either positively or negatively. The size of the covariance depends on the units and scale of the data, so judge it against the range of your data values.
    • The covariance of -8.07 for the sample data set is fairly large, considering that the data values only range from 1 to 12. This indicates a fairly strong relationship between the x and y data sets.
  3. Understand the lack of a relationship. If your covariance is equal to or very close to 0, you can conclude that there is little or no linear relationship between the data sets. That is, an increase in one value may or may not be accompanied by an increase in the other. The two are linked almost randomly.
    • Suppose you relate shoe sizes to exam grades. Because there are so many factors that influence a student's exam grades, a covariance score close to 0 can be expected. This indicates that there is almost no relationship between the two values.
  4. View the relationship graphically. To understand covariance visually, you can plot your data points on an x, y graph. If you do this for the sample data, the points will not lie exactly on a straight line. They do tend to cluster along a diagonal running from the top left to the bottom right, which is the pattern of a negative covariance. The covariance of -8.07 is also quite large compared to the data values. This matches the fairly linear shape of the plotted points and suggests a fairly strong relationship.
    • To review how to plot points, read wikiHow's articles on drawing points in a coordinate system.

Warnings

  • On its own, covariance has limited use in statistics. It is often a step toward calculating correlation coefficients or other statistics. Be careful not to draw overly bold conclusions from a covariance score alone.