The value of values - Introduction to Experimentation in Physics

What is, in science, the value of a value? In physics, this depends on how certain you are of this value. As quantities are determined using experiments, and these experiments are subject to error and uncertainties, the value can only be determined to some degree. We are never 100% sure of the exact value. Therefore, values are given with their uncertainty: $g$ = 9.81 ± 0.01 m/s $^2$ . So how do we determine to what extent a value is certain?

Errors and uncertainty¶

Uncertainties in values can arise from the precision of the instruments used in the experiment, errors made by the person doing the experiment, vibrations, temperature effects, and fundamental errors related to the phenomenon being studied. Some of these uncertainties can be reduced, others just have to be accepted. Whatever their cause, these effects influence experiments and their outcomes of the experiments and therefore influence the uncertainty in the quantities we want to determine.

Gaussian noise¶

Random errors will usually conform to a Gaussian distribution. The probability of an error of some sort occurring can be calculated through:

P(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}

(1)

In this function, $\mu$ is the average value of the error, $\sigma$ the standard deviation, a measure of the spread of the error, $x$ is the value of the error. In section Repeating measurements we assume that the errors we have to deal with are following a Gaussian distribution. The graph below shows what Gaussian noise looks like, following the probability density function shown in the second graph.

mu = 0
sigma = 25
N = 10000 #number of data points

y = np.random.normal(mu,sigma,N) #noise creation
x = np.linspace(1,N,N)

plt.figure()
plt.plot(x[:1000],y[:1000],'k.')
plt.show()
print("""(a) A scatter plot of the noise. Although most data are in between -25 and 25, one can find data points with values >50.""")

plt.hist(y,bins = 50) #bins chosen relatively large, but at random
plt.show()
print("""(b) A histogram of the noise. One can easily see that roughly 2/3 of the data is within µ ± σ and 90% """)

(a) A scatter plot of the noise. Although most data are in between -25 and 25, one can find data points with values >50.

(b) A histogram of the noise. One can easily see that roughly 2/3 of the data is within µ ± σ and 90%

Systematic error¶

If there is, e.g., a calibration error of the instrument, a systematic error occurs with each and every measurement. If your ruler starts at 0.2 cm but you didn’t notice, this will cause a systematic error.

If you suspect a systematic error, you can look for it using e.g. Python. If the to be fitted function is $F = \frac{\alpha}{r^4}$ and you suspect a systematic error in the distance, $r$ , you can try to fit the function $F =\frac{\alpha}{(r+\Delta r)^4}$ . You still have to validate whether the systematic error is within a sensible range and whether there is indeed a systematic error.

Figuring out whether you have the problem of a systematic error can be done by analysing the residuals. If $M(x)$ are the values of your measurements at certain point $x$ , and $F(x)$ is the value of the fitted function at the same point $x$ , the residuals are defined by:

R(x)=M(x)-F(x)

(2)

# Code to show the influence of a systematic error (in x)
#create data
x = np.linspace(0,10,11)

def quadratic(x):
    return 4.3*(x+.6)**2 + np.random.normal(0,1)

y= quadratic(x)

#curvefit without systematic error
def solvex1(x,a):
    return a*x**2

pval_1, pcov_1 = curve_fit(solvex1,x,y)
y2 = solvex1(x,pval_1[0])

      
plt.plot(x,y,'r.')
plt.plot(x,y2,'--')
plt.show()

print("(a) A curve fit using least square method without compensation for a systematic error $y = a \cdot x^2$")

#curvefit with system
def solvex2(x,a,b):
    return a*(x+b)**2

pval_2, pcov_2 = curve_fit(solvex2,x,y)
y3 = solvex2(x,pval_2[0],pval_2[1])

plt.plot(x,y,'r.')
plt.plot(x,y3,'--')
plt.show()
print(f"(b) A curve fit using least square method with compensation for a systematic error $y = a \cdot (x + \Delta x)^2$")

(a) A curve fit using least square method without compensation for a systematic error $y = a \cdot x^2$

(b) A curve fit using least square method with compensation for a systematic error $y = a \cdot (x + \Delta x)^2$

Repeating measurements¶

Repeating a measurement helps us determine the ‘exact’ value. The best estimate of the exact value is the mean:

\overline{x}=\frac{\sum_{i=1}^{N} x_i}{N}

(3)

in which $x_i$ is a measurement and $N$ is the number of repeated measurements.

In the analysis of experimental data, an important parameter is the standard deviation, $\sigma$ . If the experiment is done again, the chance that the value is between $\overline{x}\pm\sigma$ is 2/3. The standard deviation is calculated by

\sigma = \sqrt{\frac{\sum_{i=1}^N (x_i-\overline{x})^2}{N-1}}

(4)

There is another way to determine the standard deviation more quickly, but this is somewhat less accurate (the rough-and-ready approach). The standard deviation is roughly $\frac{2}{3}(x_{max}-\overline{x})$ .

If the same experiment is repeated, the average value will differ slightly each time. This means that their is an uncertainty within the average. This parameter is called the standard deviation of the mean, $\alpha$ :

\alpha = \frac{\sigma}{\sqrt{N}}

(5)

This uncertainty tells you that if the entire experiment is repeated, there is a 2/3 change that the average value is within $\overline{x} \pm \alpha$ .

The average, standard deviation and standard error as function of $N$ noise samples. It can be seen that the average and standard deviation do not change much and only get better determined. The standard error decreases with $\sqrt{N}$ .

mu = 0
sigma = 25
N = 10000 #number of data points

y = np.random.normal(mu,sigma,N) #noise creation
x = np.linspace(1,N,N)

#creates and fills list for average,std, and error against the number of data points used
av_n = []
std_n = []
error_n = []
for n in range(1,N+1):
    av_n.append(np.mean(y[:n]))
    std_n.append(np.std(y[:n]))
    error_n.append(std_n[-1]/np.sqrt(n))
    
plt.figure()

plt.xlabel('N')
plt.ylabel('$\overline{x}$')

plt.plot(x, av_n, "b.")
plt.show()
print("""(a) The average value of N noise samples. It can be clearly seen that the value converges to 0.""")


plt.figure()

plt.xlabel('N')
plt.ylabel('$\sigma_x$')

plt.plot(x, std_n, "b.")
plt.show()
print("""(b) The standard deviation of N noise samples. It can be clearly seen that the value converges to 25.""")


plt.figure()

plt.xlabel('N')
plt.ylabel('$\mu_x$')

plt.plot(x, error_n, "b.")
plt.show()
print(""" (c) The standard error of N noise samples. It can be seen that the uncertainty decreases.""")

(a) The average value of N noise samples. It can be clearly seen that the value converges to 0.

(b) The standard deviation of N noise samples. It can be clearly seen that the value converges to 25.

 (c) The standard error of N noise samples. It can be seen that the uncertainty decreases.

Chauvenet’s criterion¶

What if a measurement is repeated ten times, and one value is very different from the rest? Can it just be discarded? To decide whether a value can be discarded, one can use the theory above and extend it. To start, you calculate the mean and the standard deviation. Subsequently you calculate the occurrence of the outlier $P_{out}$ , using the error function: $P_{out}=2Erf(x_{out},\overline{x},\sigma)$ . You can use this site to use the error function

If $N\cdot P_{out}$ is smaller than 0.5, the measurement may be discarded. Disregarding a measurement should be mentioned in the report! You also have to calculate a new mean value and uncertainty as the data set has changed.

Error Function(s)¶

There are two types of error functions, the $erf(x)$ and the $Erf(x,\overline{x},\sigma)$ . Which are defined like this:

erf(x) = \frac{2}{\sqrt{\pi}} \int_0 ^x e^{-t^2} dt

(6)

Erf(x,\overline{x},\sigma_x) = \frac{1}{2} \left[1+erf\left(\frac{x-\overline{x}}{\sqrt{2}\sigma_x}\right)\right]

(7)

The easiest way is to use the error function is to import it from the scipy package. A plot of both functions can be seen in the graph below.

#erf(x)
x = np.linspace(-3, 3, 1000)
plt.plot(x, special.erf(x))
plt.xlabel('$x$')
plt.ylabel('$erf(x)$')
plt.xlim(-3,3)
plt.show()

#Erf(x_out, x_bar, sigma)
# parameters
sigma = 5
x_bar = 15

def Erf(x, x_bar, sigma):
    return 0.5*(1+ special.erf((x-x_bar)/(np.sqrt(2)*sigma)))

x = np.linspace(0, 30, 10000)
plt.plot(x, Erf(x, x_bar, sigma))
plt.xlabel('$x$')
plt.ylabel(r'$Erf(x,\bar{x},\sigma)$')
plt.xlim(0,30)
plt.show()

What can be seen is that the function $Erf(x,\overline{x},\sigma)$ is basically a shifted version of the the $erf(x)$ function. As by dividing it by two and adding 0.5 shifts it up and makes it smaller. The $x_{out}-\overline{x}$ shifts it so that the mean is at the center of the function and dividing it by $\sigma$ ‘stretches’ the function so that it is in the right range. What one also can see if that when you have an outlier which is higher than the mean the $Erf$ will return a value higher than 0.5 which will always result in a value which cannot be discarded. When this happens you have to do $1-Erf(x_{out},\overline{x},\sigma)$ , this you multiply by $2N$ , or one can use:

1 - erf\left(\frac{|x_{out} - \overline{x}|}{\sigma}\right)

(8)

This you still have to multiply by $N$ (not by 2).

Significant figures¶

Significant figures are essential to physics. Most of you are probably already familiar with them. Significant figures are important because they indicate the uncertainty of a value. The number of figures after the comma of $\overline{x}$ and $\alpha$ are always the same!

20 \pm 1

(9)

0.25 \pm 0.02

(10)

10.25 \pm 0.20

(11)

A brief reminder on how to determine the number of significant figures:

All non-zero digits are significant: $2.998 \cdot 10^8$ m/s has four significant figures.
All zeroes between non-zero digits are significant: $6.02214179 \cdot 10^{23}$ mol $^{-1}$ has nine significant figures.
Zeroes to the left of the first non-zero digits are not significant: 0.51 MeV has two significant figures.
Zeroes at the end of a number to the right of the decimal point are significant: $1.60 \cdot 10^{-19}$ C has three significant figures.
If a number ends in zeroes without a decimal point, the zeroes might be significant: 270 might have two or three significant figures.

Rules¶

For most numbers, it is not hard to round of to the correct number of significant figures:

6.62 → 6.6

(12)

5.67 → 5.7

(13)

However, always rounding up 0.5 to 1 will result in a higher rounded value. So the rule is: even numbers before a 5 are cut, odd numbers before a 5 are rounded:

3.45 → 3.4 \text{ (since 4 is even)}

(14)

3.55 → 3.6 \text{ (since 5 is odd)}

(15)

With adding and/or subtracting the least number of figures after the comma is decisive. This also means that the total number of significant figures might change:

1.23 + 45.6 = 46.8

(16)

8.2 + 3.5 = 11.7

(17)

100.5 - 2.5 = 98.0

(18)

With multiplication or division the least number of significant figures is decisive:

1.2 \cdot 345.6 = 4.1 \cdot 10^2

(19)

5 / 2.00 = 2

(20)

This last example needs perhaps a further explanation: 5 / 2.00 = 2.5. However, one figure is allowed. 2 is an even number, so the last decimal is cut.

Error propagation¶

If the uncertainty is known for one value, and that value is used in an equation, the result of that equation will also have some degree of uncertainty. Often, we have multiple variables each having their own degree of uncertainty. There are two ways to calculate this uncertainty: the functional approach, which involves propagating $P \pm \alpha$ throughout the function, and the calculus approach, which is a linearization of the function.

Functional approach¶

When a function $Z$ only depends on a single variable $A$ , the uncertainty $\alpha_{Z}$ can be calculated using:

\alpha_Z = \frac{f(A+\alpha_A) - f(A - \alpha_A)}{2}

(21)

When a function $Z$ depends on multiple independent variables, like for instance (P = UI), the uncertainty needs to be calculated separately for each value using (22), (23) and (24). This method can be used for any number of independent variables.

\alpha_{P,U} = \frac{P(U+\alpha_U, I) - P(U-\alpha_U,I)}{2}

(22)

\alpha_{P,I} = \frac{P(U, I+\alpha_I) - P(U,I-\alpha_I)}{2}

(23)

\alpha_P = \sqrt{ {\alpha_{P,U}}^2+{\alpha_{P,I}}^2}

(24)

Note

You measured the voltage over and the current through a light bulb. The voltage is 6.0 ± 0.2 V, the current 0.25 ± 0.01 A. What is the electrical power of the light bulb?

\alpha_{P,U} = \frac{P(U+\alpha_U, I) - P(U-\alpha_U,I)}{2}=\frac{6.2 \cdot .25 - 5.8 \cdot .25}{2}=0.05 W

(25)

\alpha_{P,I} = \frac{P(U, I+\alpha_I) - P(U,I-\alpha_I)}{2}=\frac{6.0 \cdot .26 - 6.0 \cdot .24}{2}=0.06 W

(26)

\alpha_P = \sqrt{ {\alpha_{P,U}}^2+{\alpha_{P,I}}^2} = \sqrt{0.05^2+0.06^2} = 0.08 W

(27)

$P$ = 1.50 ± 0.08 W

Calculus approach¶

The calculus approach uses a linearization to determine the effect a measured value $A$ has on the value $Z$ . For a single variable function $Z(A)$ the uncertainty in $Z$ is given by:

\alpha_Z= \frac{\partial Z}{\partial A}\alpha_A

(28)

The linearization will result in an error that becomes noticeable when the error of $A$ is relatively big and there is a lot of curve in the function in that area of $A$ .

The general form for the calculus approach of Z( $A, B$ ,...) with uncertainties ( $\alpha_A$ ), ( $\alpha_B$ ),... is given in (29).

\alpha_Z = \sqrt{ \left({\frac{\partial Z}{\partial A}\cdot \alpha_A}\right)^2+\left(\frac{\partial Z}{\partial B}\cdot \alpha_B\right)^2 + ...}

(29)

An example of the calculus approach for $(P = UI)$ is given in (30).

\alpha_P^2 = \left(\frac{\partial P}{\partial U}\cdot \alpha_U\right)^2 + \left(\frac{\partial P}{\partial I}\cdot \alpha_I\right)^2

(30)

Resulting in (31).

\alpha_p^2 = (I\cdot \alpha_U)^2 + (U\cdot \alpha_I)^2

(31)

Dividing both sides of the equation by $P(U,I)^2$ yields us a simple and direct equation for $\alpha_P$ :

\left(\frac{\alpha_P}{P}\right)^2 = \left(\frac{\alpha_U}{U}\right)^2+\left(\frac{\alpha_I}{I}\right)^2

(32)

More general, for a function $f(y,x)=cy^nx^m$ the uncertainty in $f$ can be calculated by:

\left(\frac{\alpha_f}{f}\right)^2 = \left(\frac{\alpha_c}{c}\right)^2+n^2\left(\frac{\alpha_y}{y}\right)^2+m^2\left(\frac{\alpha_x}{x}\right)^2

(33)

Advantages of both methods¶

The advantage of the functional approach is that it does not use the approximation of a linearization and is therefore more accurate. This is usually only noticeable when the uncertainty of a measured value is large and the function has a lot of bend. The advantage of the calculus approach is that it gives a clearer relation between the uncertainty of a variable and how it propagates into the uncertainty of the determined value

Error in function fit¶

You have plotted your data ( $y_i$ ) with their uncertainties and are now looking for a function $y(x_i)$ that best describes the data set (Note: earlier we talked about a measurement $M$ and function $F$ , this is the same principle). You make an educated guess (or use a theoretical framework) to predict the function. An estimate of how well your function predicts the data is given by:

\chi^2 = \sum (y_i-y(x_i))^2

(34)

If there is a perfect match between data and fit, the sum will be 0. Most curve fitting tools use this principle and are looking for the values for the variables for which this sum is minimal. A good fit goes at least through 2/3 of the error bars.

We can learn a lot about our data by looking at the residuals: $y_i-y(x_i)$ . There might still be a detectable pattern, hidden in the noise. Using Python is an excellent way to find out what your data is telling you...

from ipywidgets import interact
import ipywidgets as widgets
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.linspace(0, 11, 10)
y = np.random.normal(0, 2, 10)

# Function
def f(a):
    chi_square = np.sum((y - a) ** 2)
    
    # Subplots
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Left: Data and fit
    axes[0].plot(x, a * np.ones(len(x)), 'r--', label=f'a = {a:.1f}')
    axes[0].plot(x, y, 'k.', label='Data')
    axes[0].set_title("Data en lijn")
    axes[0].legend()
    
    # Right: Chi-square as function of a
    a_values = np.linspace(-5, 5, 100)
    chi_square_values = [np.sum((y - a_val) ** 2) for a_val in a_values]
    axes[1].plot(a_values, chi_square_values, 'b-', label="chi^2(a)")
    axes[1].plot(a, chi_square, 'ro', label=f'Current chi^2 = {chi_square:.2f}')
    axes[1].set_title("chi^2 als functie van a")
    axes[1].set_xlabel("a")
    axes[1].set_ylabel("chi^2")
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()

# Interactieve widget
interact(f, a=widgets.FloatSlider(min=-5, max=5, step=0.1, value=0))

Loading...

Poisson distribution¶

We covered so far the Gaussian distribution. However, there is also the Poisson distribution which is important when counting, e.g., radioactive decay. The Poisson distribution is a discrete probability distribution.

The chance of $k$ counting events in a certain amount of time, with the expected value $\lambda$ is given by:

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}.

(35)

The standard deviation of the Poisson distribution is given by the square root of the expected value: ( $\sigma = \sqrt{\lambda}$ ).

The shape of the function resembles the Gaussian distribution. This is not really strange as for large numbers of $\lambda$ the Poisson distribution indeed becomes similar to the Gaussian distribution. However, for small numbers of $\lambda$ , the Poisson distribution is not symmetrical.

#poison plot
lamb = 20 #lambda

k = np.arange(0,2*lamb+1,1)

def Poisson_prob(k, lamb):
    return  np.exp(-lamb)*np.power(lamb, k, dtype = "float")/special.factorial(k)

plt.figure()
plt.plot(k, Poisson_prob(k, lamb), "b.")
plt.xlim(0,2*lamb)
plt.show()

Practice¶

These questions will help you digest the information described above. The answers can be found at the end of this manual. Do not forget to look at the python assignments!

Exercise 1

Determine the mean, standard deviation and standard error of the following data sets:

0.10; 0.15; 0.18; 0.13
25; 26; 30; 27; 19
3.05; 2.75; 3.28; 2.88

Exercise 2

Use Chauvenet’s criterium to find out whether 19 in the previous task can be considered a real outlier.

Exercise 3

Eric weighed a small cubic box. Its mass was 56 ± 2 grams. The box has sides of 3.0 ± 0.1 cm.

Determine the gravitational force working on this box.
Determine the volume of this box.
Determine the density of this box.
Evaluate whether the density is determined precisely enough to determine the material of the box.

Exercise 4

During an experiment the electrical power of a light bulb is determined by measuring the voltage over and current through the light bulb. The measurements are displayed in the table below:

$U$ (V)	$\alpha_U$ (V)	$I$ (mA)	$\alpha_I$ (mA)
6.0	0.2	0.25	0.01

Measurements on a light bulb

Determine the electrical power of the light bulb.
Determine the resistance of the light bulb.
If you could determine either voltage or current more precise, which would you choose and why?

Exercise 5

In the table below are the values given for constants $A, B$ and $C$ . Determine in each of the following exercises the value and uncertainty of $Z$ , use both the calculus and the functional approach. Describe in each exercise which uncertainty has the biggest influence on $\alpha_Z$ .

$A$	$\alpha_A$	B	$\alpha_B$	C	$\alpha_C$
5	0.1	500	2	1	0.1

$Z = \frac{A}{BC^2}$
$Z = \frac{A}{BC^4}$
$Z = C\sqrt{AB}$

Exercise 6

In the table below are values given for $a$ and $x$ .

$a$	$\alpha_a$	x	$\alpha_x$
$5\cdot10^{-8}$	$1\cdot10^{-9}$	$5\cdot10^{-3}$	$1\cdot10^{-3}$

Calculate the value and uncertainty of $F = a \cdot x^{-4}$ .
Do the same as in the previous exercise, but now calculate $\alpha_F$ for $\alpha_a = 1 \cdot 10^{-9}$ and $\alpha_x = 1 \cdot 10^{-4}$ . What do you notice?
Now let $\alpha_x$ be $1\cdot10^{-3}$ and take $\alpha_x = 1\cdot 10^{-3}$ and $\alpha_a = 1\cdot 10^{-10}$ , what do you notice?

Assignment¶

The final assignment for Measurement and uncertainty consists of a Python Jupyter Notebook with questions. You are allowed to bring code, this manual, notes. If it concerns a morning session the assignment is available from 8:45. Handing in your work, using Brightspace, can be done until 11:30. If it concerns a afternoon session, the assignment is available from 13:45. Handing in your work, using Brightspace, can be done until 16:00. This information will be provided before the test as well.