How to Calculate Outliers

Author: Janice Evans
Date Of Creation: 23 July 2021
Update Date: 1 July 2024

In statistics, outliers are values that differ sharply from the other values in a collected dataset. An outlier can indicate an anomaly in the data distribution or a measurement error, so outliers are often excluded from the dataset. Excluding outliers can lead to unexpected or more accurate conclusions. It is therefore important to be able to identify and evaluate outliers in order to interpret the statistics properly.

Steps

  1. Learn to recognize potential outliers. Before excluding outliers from a dataset, you need to identify the potential outliers. Outliers are values that differ greatly from most of the values in the dataset; in other words, they fall outside the trend followed by most values. They are easy to spot in tables of values and (especially) in graphs. If the values are plotted, the outliers will lie far from most of the other points. If, for example, most of the values fall on a straight line, the outliers will lie well off that line, on either side of it.
    • For example, consider a dataset of the temperatures of 12 different objects in a room. If 11 objects are at approximately 70 degrees but the twelfth (possibly a furnace) is at 300 degrees, a quick look at the values suggests that the furnace is a likely outlier.
  2. Sort the data in ascending order. The first step in determining outliers is to calculate the median of the dataset. This is much easier if the values are arranged in ascending order (from smallest to largest).
    • Continuing with the example above, consider the following dataset representing the temperatures of multiple objects: {71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69}. This set should be ordered as follows: {69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300}.
  3. Calculate the median of the dataset. The median is the value in the middle of the sorted dataset. If the dataset contains an odd number of values, the median is the value that has the same number of values before and after it. If the dataset contains an even number of values, the median is the arithmetic mean of the two middle values. Note that when calculating outliers, the median is usually referred to as Q2, because it lies between Q1 and Q3, the lower and upper quartiles, which we will define later.
    • Don't worry about datasets with an even number of values: the arithmetic mean of the two middle values will often be a number that is not in the dataset, and this is normal. If the two middle values are the same number, the arithmetic mean is simply that number, which is also fine.
    • In the example above, the two middle values are 70 and 71, so the median is (70 + 71) / 2 = 70.5.
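The sorting and median steps can be sketched in a few lines of Python. This is only an illustration: the function name `median_of` and the variable names are assumptions chosen for this example, not part of the original method.

```python
def median_of(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)      # ascending order, as in step 2
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle value
        return ordered[mid]
    # even count: arithmetic mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
q2 = median_of(temperatures)
print(sorted(temperatures))  # [69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300]
print(q2)                    # 70.5
```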
  4. Calculate the lower quartile. This value, referred to as Q1, is the point below which 25% of the values in the dataset lie. In other words, it is the median of the values that come before the median. If there is an even number of values before the median, find the arithmetic mean of the two middle values to calculate Q1 (just as when calculating the median).
    • In our example, 6 values lie before the median and 6 values lie after it. To calculate the lower quartile, we therefore need the arithmetic mean of the two middle values among the six that lie before the median. These middle values are 70 and 70, so Q1 = (70 + 70) / 2 = 70.
  5. Calculate the upper quartile. This value, referred to as Q3, is the point above which 25% of the values in the dataset lie. Q3 is calculated in the same way as Q1, but using the values after the median.
    • In the example above, the two middle values of the six after the median are 71 and 72, so Q3 = (71 + 72) / 2 = 71.5.
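A possible Python sketch of the quartile steps follows. It takes the median of the values before and after the overall median, as described above. The function name `quartiles` is an assumption for illustration, and leaving the middle value out of both halves when the count is odd is also an assumption, since the text does not cover that case. Statistical software often uses other quartile conventions, so results from a library function may differ slightly.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]    # values before the median
    upper = ordered[-half:]   # values after the median
    # For an odd count the middle value is left out of both halves
    # (an assumption; the article only covers the even case).
    return median(lower), median(ordered), median(upper)


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
q1, q2, q3 = quartiles(temperatures)
print(q1, q2, q3)  # 70.0 70.5 71.5
```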
  6. Calculate the interquartile range. Once Q1 and Q3 are known, find the distance between them by subtracting Q1 from Q3. The interquartile range is essential for determining the boundaries of the values that are not outliers.
    • In our example, Q1 = 70 and Q3 = 71.5. The interquartile range is 71.5 - 70 = 1.5.
    • Note that this also applies to negative Q1 and Q3 values. For example, if Q1 = -70, then the interquartile range is 71.5 - (-70) = 141.5.
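As a quick check of the subtraction, including the negative case, here is a minimal sketch. The function name `interquartile_range` is chosen only for this example.

```python
def interquartile_range(q1, q3):
    """Distance between the lower and upper quartiles."""
    return q3 - q1


iqr_example = interquartile_range(70, 71.5)
iqr_negative = interquartile_range(-70, 71.5)
print(iqr_example)   # 1.5
print(iqr_negative)  # 141.5
```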
  7. Find the "inner boundaries" of the dataset. Outliers are identified by checking whether values fall within the so-called "inner boundaries" and "outer boundaries". A value outside the inner boundaries is classified as a "minor outlier", while a value outside the outer boundaries is classified as a "significant outlier". To find the inner boundaries, multiply the interquartile range by 1.5, then add the result to Q3 and subtract it from Q1. The two numbers obtained are the inner boundaries of the dataset.
    • In our example, the interquartile range is 71.5 - 70 = 1.5, and 1.5 * 1.5 = 2.25. Add this number to Q3 and subtract it from Q1 to find the inner boundaries:
      • 71.5 + 2.25 = 73.75
      • 70 - 2.25 = 67.75
      • Thus, the inner boundaries are 67.75 and 73.75.
    • In our example, only the furnace temperature of 300 degrees lies outside these boundaries, so it is at least a minor outlier. But don't jump to conclusions: we still have to determine whether this temperature is a significant outlier.
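The inner-boundary calculation can be sketched as follows. The helper name `boundaries` and its `factor` parameter are assumptions made for this illustration; the quartile values are the ones computed in the example.

```python
def boundaries(q1, q3, factor):
    """Return (lower, upper) boundaries at factor * IQR beyond the quartiles."""
    spread = factor * (q3 - q1)
    return q1 - spread, q3 + spread


temperatures = [69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300]
inner_low, inner_high = boundaries(70, 71.5, 1.5)
outside_inner = [t for t in temperatures if t < inner_low or t > inner_high]

print(inner_low, inner_high)  # 67.75 73.75
print(outside_inner)          # [300]
```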
  8. Find the "outer boundaries" of the dataset. This is done in the same way as for the inner boundaries, except that the interquartile range is multiplied by 3 instead of 1.5. Add the result to Q3 and subtract it from Q1. The two numbers obtained are the outer boundaries of the dataset.
    • In our example, multiply the interquartile range by 3: 1.5 * 3 = 4.5. Calculate the outer boundaries:
      • 71.5 + 4.5 = 76
      • 70 - 4.5 = 65.5
      • So the outer boundaries are 65.5 and 76.
    • Any value that falls outside the outer boundaries is considered a significant outlier. In our example, the furnace temperature of 300 degrees is a significant outlier.
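Putting the two sets of boundaries together, a value can be labelled roughly like this. The function name `classify` and the returned label strings are assumptions for this sketch. Values lying exactly on a boundary are treated as inside it, which is a choice the text does not specify.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 * IQR and 3 * IQR boundaries."""
    iqr = q3 - q1
    # Check the outer boundaries first: anything outside them is also
    # outside the inner boundaries.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "significant outlier"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "minor outlier"
    return "not an outlier"


temperatures = [69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300]
labels = {t: classify(t, 70, 71.5) for t in temperatures}
print(labels[300])  # significant outlier
print(labels[73])   # not an outlier
```

With these quartiles, a hypothetical reading of 75 degrees would fall between the inner boundary (73.75) and the outer boundary (76) and would be labelled a minor outlier.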
  9. Use a qualitative assessment to decide whether outliers should be excluded from the dataset. The method described above lets you determine whether a value is an outlier (minor or significant). Make no mistake, though: a value classified as an outlier is only a candidate for exclusion, so you do not have to exclude it. The cause of the outlier is the main factor in the decision. As a rule, outliers caused by errors (in measurement, recording, and so on) are excluded. Outliers that are not due to errors but reflect new information or a new trend are usually left in the dataset.
    • It is equally important to assess the effect of outliers on the mean of the dataset (whether they distort it or not). This matters especially when you draw conclusions from the mean.
    • In our example, it is extremely unlikely that the furnace actually reached 300 degrees (barring some natural anomaly). It can therefore be concluded, with a high degree of certainty, that this temperature is a measurement error that should be excluded from the dataset. Moreover, if you keep the outlier, the mean of the dataset is (69 + 69 + 70 + 70 + 70 + 70 + 71 + 71 + 71 + 72 + 73 + 300) / 12 = 89.67 degrees, but if you exclude it, the mean is (69 + 69 + 70 + 70 + 70 + 70 + 71 + 71 + 71 + 72 + 73) / 11 = 70.55 degrees.
      • Because this outlier is most likely the result of an error and it strongly distorts the average, it should be excluded from this dataset.
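The effect of the 300-degree reading on the average can be checked directly. This is a small sketch using Python's standard `statistics` module; the variable names are assumptions for illustration. The median is printed alongside for comparison.

```python
from statistics import mean, median

temperatures = [69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300]
without_outlier = [t for t in temperatures if t != 300]

mean_with = round(mean(temperatures), 2)
mean_without = round(mean(without_outlier), 2)

print(mean_with)     # 89.67
print(mean_without)  # 70.55
print(median(temperatures), median(without_outlier))  # 70.5 70
```

The arithmetic mean shifts by almost 20 degrees when the outlier is removed, while the median barely changes.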
  10. Understand why outliers are (sometimes) left in the dataset. Some outliers should be excluded because they are caused by errors and technical problems; others should be kept. If, for example, an outlier is not the result of an error and/or offers new insight into the phenomenon being studied, it should be left in the dataset. Scientific experiments are especially sensitive to outliers: by mistakenly eliminating one, you may miss a new trend or discovery.
    • For example, suppose we are developing a new drug to increase the size of fish in fisheries. We will reuse the old dataset ({71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69}), but this time each value represents the weight of a fish (in grams) after taking a different experimental drug. In other words, the first drug raises the fish's weight to 71 g, the second to 70 g, and so on. In this situation 300 is a significant outlier, but we must not exclude it; assuming there were no measurement errors, this outlier represents a significant success in the experiment. The drug that raised the fish's weight to 300 grams works much better than the others, so 300 is the most important value in the dataset.

Tips

  • When outliers are found, try to explain their presence before excluding them from the dataset. They can indicate measurement errors or distribution anomalies.

What you need

  • Calculator