When Maths are Interesting

Compute the dielectrict constant for the Bruggeman Effective Medium Approximation

2016-09-04T12:50:00.001-07:00

The Effective Medium Approximation (EMA) is used to obtain the effective dielectric constant \(\epsilon\) of a material that is formed by a mix of different materials in the limit of large wavelengths when the geometry of the mixture can be ignored.

Among its applications the EMA is used is often used in reflectometry and ellipsometry to model the surface roughness of film or to model voids or other kind of inclusions.

In this context the Bruggeman Effective Medium Approximation (BEMA) is a specific recipe to compute the approximation of \(\epsilon\). The formula that should be satisfied for \(\epsilon\) is:\[\sum_i f_i \frac{\epsilon_i - \epsilon}{\epsilon_i + 2 \epsilon}=0\]where \(f_i\) are the fractions of each material and \(\epsilon_i\) the corresponding dielectric constant. The terms \(f_i\) obey to the equation\[\sum_i f_i = 1\]
The \(\epsilon\) of the Bruggeman model is defined implicitly by this first equation but computing the solution is not as simple as it may appear.

The problem is that the unknown \(\epsilon\) appear both on the numerator and on the denominator of each fraction and this make the equation non-trivial to solve. In the attempt of solving the equation one can observe that it can be transformed into a polynomial equation of degree n where n is the number of fractions. If one consider the the terms \(\epsilon_i\) can have an imaginary part it turns out that the polynomial has complex coefficients.

One problem of transforming the equation in polynomial form is that is that the coefficients are difficult to compute in the general case. In addition the numerical solution of a complex polynomial of degree n is computationally expensive and we will have an additional problem. The polynomial equation will have n complex roots while we are looking for just one solution, the only one that is physically meaningful.

So the problem is, how the equation can be practically solved in a reasonable way and how we can be sure to obtain the physical solution ?

My first guess was to use a simple iterative approach. I make the ansatz that the solution will be close to the weighted average of \(\epsilon_i\), i.e. \(\epsilon = \sum f_i \epsilon_i\) which is the solution the equation would have if the denominators were all equals. As we know that the \epsilon given will not be a solution we can just assume that it is just an approximation and solve the equation above replacing the \(\epsilon\) in the denominator by our guess. In this way we will obtain a new estimation of \(\epsilon\) and we can iterate using therefore the following schema:\[\epsilon^{(i+1)} = \frac{\sum \frac{f_i \epsilon_i}{\epsilon_i + 2 \epsilon^{(i)}}}{\sum \frac{f_i}{\epsilon_i + 2 \epsilon^{(i)}}}\]
It turns out that the idea is not so bad and it will converge to the solution in many cases. In addition the approach above can be replaced using the Newton-Raphson method that may converge more rapidly.

The problem with the approach above in that is some cases, when mixing a dielectric with an absorbing material like a metal or silicon at some wavelengths, the iterative procedure converges very slowly or it can converge to the wrong solution so the method is not reliable.

What happens is that when mixing an highly absorbing material the initial estimate we use, the weighted average of epsilons, is a really bad estimation of the actual solution and this explain the convergence problem of the iterative approach.

The requirement I had for Regress Pro, an ellipsometry and reflectometry application, was to have a reliable computation of epsilon that converge in all cases. It was not acceptable to say, "sorry, in this case I was unable to found a solution" !

By thinking about the problem I finally had a good idea that let me solve reliably the equation with the guarantee that it will converge to the real physical solution. In addition the amount computations involved is quite reasonable which is also important.

The idea is quite simple. Let us consider the \(\epsilon\) that satisfies the first equation but with a set of fractions that depends on a parameter we will call \(t\):\[\sum_i f_i(t) \frac{\epsilon_i - \epsilon}{\epsilon_i + 2 \epsilon}=0\]by dropping the constraint that the fractions add up to one.

The fractions are defined with \(t\) as follow:\[\begin{array}{ll}f_1(t) = f_1 & \\ f_i(t) = t \, f_i & \textrm{for} \, i = 2 \dots n \end{array}\]In practice this schema can be seen like if the fractions \(f_2, \dots f_n\) were gradually turned on when \(t\) grows from 0 to 1. One interesting thing is that we know the solution for \(t = 0\), it is just \(\epsilon = \epsilon_1\) since all the other fractions are zero. On the other hand, for \(t = 1\) \(\epsilon(t)\) will be just the solution to the BEMA equation we are searching.

The point now is, ok, we know \(\epsilon(t)\) for \(t=0\) but we need to obtain \(\epsilon(t)\) for \(t=1\) so we need some way to bring our solution for \(t = 0\) to \(t = 1\). How it can be done ? Well, it happens that a simple solution exists, one just needs to find out the the differential equations to which \(\epsilon(t)\) obey over the variations of the parameter \(t\) and solve the differential equation numerically using \(\epsilon(0)\) as our starting point.

In the language on numerical computation the differential equation we want to solve is an Ordinary Differential Equation (ODE) of a complex variable \(\epsilon\). In most case a solver can be found for arrays of real variables, let say \(y_i(t)\) so in our case we can just pretend that the real and imaginary part of \(\epsilon\) are two real-valued components of an array of size two.

The differential equation for \(\epsilon(t)\) can be found just by differentiating wrt \(t\) the BEMA equation. One easily founds:\[\epsilon'(t) = \frac{1}{3} \, \frac{\sum_{i=2}^n f_i \frac{\epsilon_i - \epsilon}{\epsilon_i + 2 \epsilon}}{\sum_{i=1}^n f_i \, \alpha_i(t) \, \frac{\epsilon_i}{(\epsilon_i + 2 \epsilon)^2}}\]where the coefficients \(\alpha_i(t)\) are defined as:\[\begin{array}{ll}\alpha_1(t) = 1 & \\ \alpha_i(t) = t & \textrm{for} \, i = 2 \dots n \end{array}\]
I have opted to solve the differential equation working directly on the complex \(\epsilon\) and using the Runge-Kutta-Fehlberg (4, 5) method by adapting of the code found in the GNU Scientific Library. My implementation can be found in the Regress Pro's github repository in the disp-bruggeman.c file.

I have also added a final step, after the ODE integration, to "refine" the solution using the Newton-Raphson method. In this way we can use a more relaxed requirement on the accuracy of the ODE integrator and refine the solution using the quickly converging Newton-Raphson method. In this approach we are actually using the ODE integration to obtain an \(\epsilon\) which is just close to the solution and leave the final determination of \(\epsilon\) to the Newton-Raphson iterations.

The formula for the Newton-Raphson iterations is:\[\epsilon^{\textrm{(next)}} = \epsilon + \frac{1}{3} \, \frac{\sum_i f_i \frac{\epsilon_i - \epsilon}{\epsilon_i + 2 \epsilon}}{\sum_i f_i \frac{\epsilon_i}{(\epsilon_i + 2 \epsilon)^2}}\]

Some Final Remarks

The solution of the differential equation exists and is unique under the condition that \(f_1 \neq 0\). To ensure the condition of \(f_1\) and also to have more treatable equation I have chosen to reorder the fractions so the \(f_1\) is always the biggest coefficient.

It should be noted that \(\epsilon(t)\) has a physical meaning for each value of \(t\) even if the fractions are not normalized. One just need to observe that the equation of \(\epsilon\) can be multiplied by a normalizing constant without changing its solution.

Final Thoughts

The solution I present here for the Bruggeman EMA equation can have a real interest for people implementing the Bruggeman model but it is otherwise a very specific application. What I found more interesting and gave me the motivation to write a post was the elegance and the effectiveness of the approach that I found. It does leverage the knowledge we have about the equation and is able to bring a solution by starting from a configuration where a solution is known.

Matrix representation for tridimensional space geometric algebra

2013-05-11T15:44:00.002-07:00

In my previous post I wrote about Geometric Algebra generalities. We saw that the tridimensional space generate a geometric algebra of dimension \(2^3 = 8 = 1 + 3 + 3 + 1\) composed of four linear spaces: scalars, vectors, bivectors and pseudo-scalars.

The elements of the subspaces can be used to describe the geometry of euclidean space. The vectors are associated to a direction in the space, bivectors are associated to rotations and the pseudo-scalar corresponds to volumes.

The product of two vectors is composed of a scalar part, their scalar product, and a bivector part. The bivector part correspond to the oriented area of the parallelogram constructed with the two vectors like is shown in the figure below:

An illustration of a vector, a bivector and a volume, equivalent to a pseudo scalar (image from Wikipedia)

GA computations in euclidean space

If you want to use the GA space to make computations you will need to "represent" the elements of each space. With my surprise I discovered that a generic multivector for tridimensional space can be represented by a 2x2 complex matrix.

A rapid calculation shows that the dimensions of the space is fine: 2x2 complex matrix have dimension 8 just like tridimensional GA space.

The interest of this representation is that the GA product corresponds to the ordinary matrix product.

What is the actual matrix representation ?

Well, the easy one is the unit scalar. We can easily guess that it does correspond to the unitary matrix\[\begin{pmatrix}1 & 0 \\ 0 & 1 \end{pmatrix}\] but things get slightly more interesting for vectors.

If we call \(\hat{\mathbf{x}}\), \(\hat{\mathbf{y}}\), \(\hat{\mathbf{z}}\) the basis vectors, their matrix counterparts should satisfy the following equalities\[\hat{\mathbf{x}}^2 = \hat{\mathbf{y}}^2 = \hat{\mathbf{z}}^2 = 1\] It can be verified that the properties above are verified by hermitian matrices so that we can write \[\hat{\mathbf{x}} = \begin{pmatrix}0 & i \\ -i & 0 \end{pmatrix} , \qquad \hat{\mathbf{y}} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad \hat{\mathbf{z}} = \begin{pmatrix}1 & 0 \\ 0 & -1 \end{pmatrix} \] Once we have the vectors we can derive the matrices for bivectors by taking their products to obtain \[\hat{\mathbf{x}} \hat{\mathbf{y}} = \mathbf{i} \hat{\mathbf{z}} = \begin{pmatrix}i & 0 \\ 0 & -i \end{pmatrix} , \qquad \hat{\mathbf{y}} \hat{\mathbf{z}} = \mathbf{i} \hat{\mathbf{x}} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} , \qquad \hat{\mathbf{z}} \hat{\mathbf{x}} = \mathbf{i} \hat{\mathbf{y}} = \begin{pmatrix}0 & i \\ i & 0 \end{pmatrix} \]
Finally the pseudo-scalar is obtained as the product of the three basis vector\[ \mathbf{i} = \hat{\mathbf{x}} \hat{\mathbf{y}} \hat{\mathbf{z}} = \begin{pmatrix}i & 0 \\ 0 & i \end{pmatrix} \]
Since this latter matrix is equal to \(i I\) we can simply identify the imaginary unit with the unitary pseudo-scalar \(\mathbf{i}\).

And so, what ?

From the practical point of view it means that you can represent a vector of components \((v_x, v_y, v_z)\) with the matrix \[\mathbf{v} = \begin{pmatrix} v_z & v_y + i v_x \\ v_y - i v_x & -v_z \end{pmatrix}\]The vector addition can be performed simply by taking the matrix sum.

The multiplication of two vectors yield a bivector whose general matrix representation is \[\mathbf{w} = \begin{pmatrix} i w_z & i w_y - w_x \\ i w_y + w_x & -i w_z \end{pmatrix}\]

These relation let us compute the matrix representation of any vector or bivector but we need also to perform the opposite operation: given a complex matrix, extract its scalar, vector, bivector and imaginary parts.

This is actually quite trivial to work out. For any complex matrix\[\begin{pmatrix} s & u \\ v & t \end{pmatrix}\] the scalar part, real plus imaginary, is simply \(\frac{s + t}{2}\). The vector part is given by\[\begin{pmatrix} v_x + i w_x \\ v_y + i w_y \\ v_z + i w_z \end{pmatrix} = \begin{pmatrix} \frac{u - v}{2i} \\ \frac{u + v}{2} \\ \frac{s - t}{2} \end{pmatrix}\]

With the relation above you can extract the scalar and vector part for any given complex matrix.

Rotations

Rotations of a vector \(\mathbf{v}\) can be expressed in GA using the relation\[e^{\mathbf{i} \, \hat{\mathbf{u}} \, \theta / 2} \, \mathbf{v} \, e^{-\mathbf{i} \, \hat{\mathbf{u}} \, \theta / 2}\] where \(\hat{\mathbf{u}}\) is the versor oriented along the rotation axis.

So if you want to compute rotations you can do it by using the exponential of a bivector. This can be obtained very easily using the relation\[e^{\mathbf{i} \, \hat{\mathbf{u}} \, \theta} = \cos(\theta) + \mathbf{i} \, \sin(\theta) \hat{\mathbf{u}}\]

The relation above does not let you compute the exponential of any 2x2 complex matrix. It only work for matrices that represents bivectors. You can easily extend the formula to take into account a scalar component but things get complicated for the vector part.

Actually the exponential of a vector involves the hyperbolic sinus and cosinus as given by the relation\[e^{\hat{\mathbf{u}} \, \theta} = \cosh(\theta) + \sinh(\theta) \hat{\mathbf{u}}\]but the exponential of a combination of vector and bivectors is more complicated (I confess I do not know the explicit formula).

Anyway this is not really a problem since for rotations you only need to take the exponential of bivectors plus real numbers.

A silly example...

Let us make an example to compute a rotation of 30 degree of a vector \(\mathbf{b} = 1/2 \, \hat{\mathbf{x}}+\hat{\mathbf{z}}\) around the z axis. In matrix representation we have\[\mathbf{b} = \begin{pmatrix} 1 & i/2 \\ -i/2 & -1 \end{pmatrix}\]The rotation of 30 degree is given by\[U \mathbf{b} = e^{i \pi/12 \hat{\mathbf{z}}} \, \mathbf{b} \, e^{-i \pi/12 \hat{\mathbf{z}}}\] which, in matrix representation becames\[\begin{multline}U \mathbf{b} = \begin{pmatrix}\cos(\pi/12) + i \sin(\pi/12) & 0 \\ 0 & \cos(\pi/12) - i \sin(\pi/12) \end{pmatrix} \begin{pmatrix} 1 & i/2 \\ -i/2 & -1 \end{pmatrix} \\ \begin{pmatrix}\cos(\pi/12) - i \sin(\pi/12) & 0 \\ 0 & \cos(\pi/12) + i \sin(\pi/12) \end{pmatrix} \end{multline}\]and by carrying out the products we obtain\[U \mathbf{b} = \begin{pmatrix}1 & -1/2 \sin(\pi/6)+i/2 \cos(\pi/6) \\ -1/2 \sin(\pi/6)-i/2 \cos(\pi/6) & -1 \end{pmatrix} \]which is the expected results, since rotation around the z axis transform the x component into a mix of x and y with coefficients \(\cos(\pi/6)\) and \(\sin(\pi/6)\).

Le coup de scène

We said above that to generate rotations we only need bivectors and real numbers. It turns out that they constitute a subalgebra of the GA euclidean space. It is called the even subalgebra and it is isomorphic to quaternions. This shows that quaternions and bivectors are essentially equivalent representation and both are inherently related to rotations.

To terminate this post, the reader who already know quantum physics have probably noted that the basis vectors in the matrix representation are actually the Pauli matrices. What is interesting is that in Geometric Algebra they appear naturally as a representation of the basis vector and they are not inherently related to quantum phenomenons.

From the practical point of view we have seen that 2x2 complex matrix can be used to compute geometrical operations in an elegant and logical way. The matrix representation is also reasonable for practical computations even if it is somewhat redundant since it always represent the most general multi-vector with 8 components.

Geometric Algebra, the wonderful revelation

2013-05-04T14:56:00.001-07:00

Some time ago I discovered on Hacker News the Oersted Medal lecture of David Hestenes, Reforming the mathematical language of physics . The text describe among other things the field of geometric algebra with a very didactic and easy to read style.

For me it was a revelation, I was fascinated by elegance of this theory and how naturally it did explain a lot of things that were otherwise unconnected.

Hestenes does explain that you can sum scalar and vectors and it make sense and you can multiply vectors in a natural and meaningful way. The Algebra that you obtain, the Geometric Algebra, is incredibly rich and suggests a lot of ideas that could have been otherwise unseen.

The central idea of the geometric algebra is the product of two vectors written as \(\mathbf{a} \mathbf{b}\). It is not commutative and it does embody the concepts of both scalar and extern product.

One of the most fascinating things to me was the fact that each space of dimension \(n\) is associated to a multi-vector space of dimension of \(2^n\). In turn the multi-vector space is composed of n + 1 subspaces each of dimension \[\binom{n}{k} \qquad \textrm{for} \, \, k = 0, \ldots, n \]
For example the space has dimension 3 so it is broken down to 4 subspaces of dimensions 1, 3, 3, 1. The first one is the space of scalars which are just real numbers. Then we have the familiar vector space of dimension 3. What is interesting is the other two subspaces of dimensions 3 and 1. They are the bivectors and the pseudo-scalars proportional to the imaginary unit \(i\).

Actually one of the first things you discover when you learn geometric algebra is that the imaginary unit is naturally introduced by geometric algebra for each space of dimensions greater then one. It is quite fascinating to me to discover that a purely geometric theory naturally requires the imaginary unit whereas in ordinary geometry it is an extraneous concept.

The imaginary unit is actually related to the vectors by the relation\[\mathbf{e}_1 \mathbf{e}_2 \mathbf{e}_3 = \mathbf{i}\]

Now you can wonder what bivectors are and what is their meaning. Probably their properties are better described by the geometric algebra itself but you can get an idea of what they are quite easily. You can think to them as product of two orthogonal vectors or equivalently as purely imaginary vector.

Actually the following equality holds for the tridimensional space \[\mathbf{e}_1 \mathbf{e}_2 = \mathbf{i} \, \mathbf{e}_3\]
where the vectors \(\mathbf{e}_i\) are the base vectors. The expression above is actually true for any pair permutation of the three indexes.

The image above, taken from Wikipedia, illustrates the basis vectors and their product, including the pseudo-scalar depicted as a volume.

From the relation above it is quite clear that there is a one-to-one relation between vectors and bivectors. The transformation is done simply by multiplying the vector by the imaginary unit like in the right hand side of the equation above.

The bivectors explain the mysterious axial vectors that you probably already know. Actually axial vector are nothing else that bivectors that you represent by an ordinary vector using the biunivocal relation between them.

One of the most wonderful simplification stemming from GA is that the electromagnetic field can be thought as composed of a vector and a bivector part as described by the relation\[F=\mathbf{E}+\mathbf{i}\mathbf{B}\]
so that all the Maxwell equations are expressed by a single equation:
\[\left(\frac{1}{c}\partial_t + \nabla \right) F = \rho - \frac{1}{c} \mathbf{J}\]
For example the equation above, in the electrostatic case, becomes:
\[\nabla F = \rho \] that correspond to the two equations \[\nabla \cdot \mathbf{E} = \rho \qquad \nabla \wedge B = 0\]

Another fascinating thing about Geometric Algebra is that the non-commuting rules of basis vectors in tridimensional space is the same of relations of the Pauli matrices. The difference is that, in geometric algebra the relations are a simple properties of the geometry and does not need to be introduced ad hoc like it happens in quantum mechanics.

Last but not least one of the most clear advantage of GA is to describe spatial rotations directly in term of vectors. For example is \(\hat{\mathbf{u}}\) is an unitary vector a rotation of a vector \(\mathbf{v}\) of \(\theta\) degrees around \(\hat{\mathbf{u}}\) is described by \[e^{\mathbf{i} \, \hat{\mathbf{u}} \, \theta / 2} \, \mathbf{v} \, e^{-\mathbf{i} \, \hat{\mathbf{u}} \, \theta / 2}\]

For the interested people I strongly recommend to read the lecture of Hestenes for an excellent, complete introduction to this fascinating subject.

Data Frames for GSL Shell

2013-04-30T13:57:00.001-07:00

Lately I've made a lot of work to implement "General Data Tables" in GSL Shell. I choose this name to designate what is otherwise called DataFrame in GNU R or other environments.

The difference between data tables and matrices are:

each column is identified by a name
the data in each cell can be a number but also a string or be undefined

The fact that you can store strings in each cell is very useful, I guess everyone can understand the reasons, not all data is numeric.

In addition the fact that each column has a name greatly simplifies a lot operations since you can refer to the data by name instead of having anonymous columns identified by an index.

Here an example taken from the excellent |STAT user manual of Gary Pearlman.

student	teacher	sex	m1	m2	final
S-1	john	male	56	42	58
S-2	john	male	96	90	91
S-3	john	male	70	59	65
S-4	john	male	82	75	78
S-5	john	male	85	90	92
S-6	john	male	69	60	65
S-7	john	female	82	78	60
S-8	john	female	84	81	82
S-9	john	female	89	80	68
S-10	john	female	90	93	91
S-11	jane	male	42	46	65
S-12	jane	male	28	15	34
S-13	jane	male	49	68	75
S-14	jane	male	36	30	48
S-15	jane	male	58	58	62
S-16	jane	male	72	70	84
S-17	jane	female	65	61	70
S-18	jane	female	68	75	71
S-19	jane	female	62	50	55
S-20	jane	female	71	72	87

The data above can be used to show some of plotting functions.

What is very interesting is that, having the data in tabular format, many operations becomes very easy. For example to create an histogram of the "final" column you can simply type:

> gdt.hist(ms, "final")

to obtain the following plot:

Given the data above you may wish to have a more expressive plot based on the teacher and the sex of the students. Here come to help the "gdt.plot" function I'm very proud of. You can use it very simply:

> gdt.plot(ms, "final ~ teacher, sex, student")

to obtain the following plot:

The function "gdt.plot" use a sort of mini language that let you specify what should be plotted (y variables) in term of which variables.

Something interesting is that the function figure out by himself if the x variable is a numeric variable of an enumeration like in the example above. In addition you can "layer up" more enumeration variables just like you can do with Excel's pivot tables.

The following form can be also used:

> gdt.plot(ms, "final ~ sex, student | teacher")

to create many lines grouped by the field teacher.

The mini language is actually quite flexible. You can use arbitrary mathematical expression, not just variable names. If you want you can try to discover yourself its possibilities. There is a specific chapter in the GSL Shell's user manual.

I hope this is interesting for you. In the next post I will talk about the linear regression function modelled after the GNU R's function "lm"...