Robust mathematical modeling

Société de Calcul Mathématique, SA

We continue here the description of our joint Research Program with several Companies, Institutions and Universities.

What is a model ?

A model is a set of rules, or formulas, which try to represent the behavior of a given phenomenon. For instance, if you throw an object upwards, you may wish to know how long it will take before it hits the ground and where it will fall: this will be given by mathematical formulas.
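For the thrown object, the simplest such formulas can be written down directly. The sketch below assumes constant gravity, no air resistance and flat ground; the names `G`, `flight_time` and `landing_distance` are ours, chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def flight_time(speed, angle_deg, height=0.0):
    """Time in seconds before the object hits the ground (no air resistance)."""
    vy = speed * math.sin(math.radians(angle_deg))
    # Positive root of: height + vy*t - G*t^2/2 = 0
    return (vy + math.sqrt(vy * vy + 2.0 * G * height)) / G

def landing_distance(speed, angle_deg, height=0.0):
    """Horizontal distance in metres travelled before impact."""
    vx = speed * math.cos(math.radians(angle_deg))
    return vx * flight_time(speed, angle_deg, height)
```

For example, `flight_time(9.81, 90)` gives 2 seconds for an object thrown straight up at 9.81 m/s.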

Another example is the propagation of a disease: you may wish to know how many people will be infected after a certain number of days. This is likely to depend upon a large number of parameters: type of disease, category of population, habits, temperature, and so on.
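As an illustration only, here is a minimal sketch of one classical way to model such a propagation, the so-called SIR model (susceptible, infected, recovered) with daily steps. It is not claimed to be the model used in this program; `contact_rate` and `recovery_rate` are hypothetical parameters standing in for the many factors listed above.

```python
def simulate_sir(population, initially_infected, contact_rate, recovery_rate, days):
    """Return the number of currently infected people for day 0, 1, ..., days.

    Assumes a closed, well-mixed population and 0 <= recovery_rate <= 1.
    Recovered people are implicitly population - susceptible - infected.
    """
    susceptible = float(population - initially_infected)
    infected = float(initially_infected)
    history = [infected]
    for _ in range(days):
        # New infections are proportional to contacts between the two groups,
        # and cannot exceed the number of people still susceptible.
        new_infections = min(susceptible, contact_rate * susceptible * infected / population)
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        history.append(infected)
    return history
```

Calling `simulate_sir(10000, 10, 0.4, 0.1, 30)` returns 31 values, one per day, starting from the 10 initial cases.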

Who needs mathematical models ?

Usually, there is already a good deal of empirical knowledge around any given phenomenon: mankind was not born yesterday. So why should we build mathematical models ? There are three reasons:

It gives a better understanding of the phenomenon, which leads to a more precise tuning of the parameters;
It warns you if you get "off limits" : for instance, this device works fine if the current is between 1 and 10 A, but what if the current is 0.1 or 50 A ? Empirical knowledge is usually not enough to answer such questions;
It allows you to find the values of the parameters which lead to a given result. For instance, you want your object to fall at 5 km from you; with what angle and speed should it be thrown ?

So, building a mathematical model usually means better control over the phenomenon, which, in turn, means more precision, cheaper results and better quality output.
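The third reason, finding the parameters which lead to a given result, can be sketched for the object that should fall 5 km away. The code assumes the crudest model (constant gravity, no air resistance, launch and landing at the same height), so the range is speed² · sin(2·angle) / g; the function names are ours.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def speed_for_range(target_range, angle_deg):
    """Launch speed (m/s) needed to land target_range metres away at a given angle."""
    if not 0.0 < angle_deg < 90.0:
        raise ValueError("angle must be strictly between 0 and 90 degrees")
    return math.sqrt(G * target_range / math.sin(2.0 * math.radians(angle_deg)))

def angles_for_range(target_range, speed):
    """Low and high launch angles (degrees) reaching target_range, or None if out of reach."""
    ratio = G * target_range / speed ** 2
    if ratio > 1.0:
        return None  # even a 45 degree launch falls short at this speed
    low = 0.5 * math.degrees(math.asin(ratio))
    return low, 90.0 - low
```

Under these assumptions, `speed_for_range(5000, 45)` gives about 221 m/s, and `angles_for_range(5000, 200)` returns `None` because 200 m/s is not enough at any angle.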

Several classes of models

Three classes appear :

Models which come from laws of physics: this is the case for the laws of gravitation, Maxwell's equations (electromagnetic waves), Navier-Stokes equations (fluid dynamics), and so on;
Models which come from empirical laws, such as air resistance to a moving object: these laws are derived from observation rather than from first principles;
Models which use statistical laws, for instance those that fit a line through several points and assume the response to be linear.
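The statistical case, fitting a line through several points, can be sketched as follows. Ordinary least squares is assumed here as the fitting rule; the function name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and as many x values as y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical: the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The fit always returns a line, whether or not the response is really linear: the linearity is an assumption of the model, not something the computation checks.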

Warning : These classes are not as distinct as they look. There is no such thing as a "law of physics". All our knowledge is largely empirical. For instance, for the fall of an object, one starts with the usual law of gravitation, but the Earth is not reduced to a single point, so a second model is built, considering it as a sphere. But it is not exactly a sphere either, so one has to take into account the shape near the poles : this is a third model. Also, air resistance should come in, and this is a fourth model. If the speed of the object is high, a relativistic correction should be added: fifth model. And finally, of course, the Earth rotates when the object is in the air, so this also has to enter: sixth model. All these models are largely empirical. For instance, NASA and ESA use “models of atmosphere” (air density as a function of altitude) which are statistical in nature.
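To see how much one step in this hierarchy can change the answer, the sketch below compares the first model (gravity only) with a crude version of the fourth (air resistance added). The quadratic drag law, the coefficient `drag_coefficient` and the simple Euler time stepping are all assumptions made for illustration; real drag depends on the object and on the atmosphere.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def range_with_drag(speed, angle_deg, drag_coefficient, dt=0.001):
    """Horizontal range (m) with drag acceleration -drag_coefficient * |v| * v.

    drag_coefficient is in 1/m; a value of 0.0 gives the gravity-only model.
    """
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        v = math.hypot(vx, vy)
        ax = -drag_coefficient * v * vx
        ay = -G - drag_coefficient * v * vy
        new_x, new_y = x + vx * dt, y + vy * dt
        if new_y < 0.0:
            # Interpolate linearly to the instant the ground is crossed
            fraction = y / (y - new_y)
            return x + fraction * (new_x - x)
        x, y = new_x, new_y
        vx, vy = vx + ax * dt, vy + ay * dt
```

Comparing `range_with_drag(50, 45, 0.0)` with `range_with_drag(50, 45, 0.001)` shows the range shrinking once drag is included; how much it shrinks depends entirely on the assumed coefficient.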

The three questions that lead to a good model

What are the objectives ? (what do we want ?)
What do we know ? (what are the known laws ?)
What are the data ? (what did we observe ?)

These questions should be addressed in this order. Indeed, "what do we want ?" comes first, because it determines the whole structure of the model: should it be very precise ? should it be coarse ? For the fall of an object, for instance, you will not build the same model at all if you want to predict the arrival point with a precision of 100 m or with a precision of 1 mm. In the first case, a rough gravity model will be enough, and it will take you five minutes to complete; in the second, you will likely need very precise properties of the atmosphere (pressure, temperature at various heights, wind speed) which you will never obtain: you might spend years at it, and it won't work ; no model, at present, is capable of predicting with a precision of 1 mm the fall of an object thrown, say, from 1 km away.

Then, the laws come second: once the objectives have been defined, one tries to figure out which parameters come into play. It might be the wind, for a falling object, the age for a disease, and so on. How do these parameters act ? Under what laws ?

Finally, the data come third. In order to build and validate a model, numerical data are of course necessary, but these data should not be collected until the first two questions are answered : what objectives ? what laws ?

Indeed, if you start collecting data from scratch, without thinking of what you want to do, very likely you will always find that you do not have enough, so you will never start thinking. And you will end up with an enormous amount of useless data.

Who should build mathematical models ?

Anyone should, and indeed anyone does. When you make some simulations about your taxes, and find out you had better donate something to your children, or pretend you are taking care of your grandparents, this is genuine mathematical modeling, and you can legitimately be proud of yourself.

It's just like plumbing : you can buy a big drill and start making some big holes. But in some cases, it is better to leave it to professionals.

Numerical implementation ; computer implementation

Once the model is built in theory, the numerical part comes : for instance, for the propagation of pollution, you would divide the zone into squares of 1 km on a side, and work out how each square receives some amount of pollution, adds to it, and passes it to its neighbors. Then, the whole thing is put into a computer, which will allow some visualization : you might see a map on the screen, showing how the pollution propagates over a whole country.
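The square-by-square scheme just described can be sketched in a few lines. The transfer rule (a fixed fraction of each square's load goes to each of its four neighbors) and the names `spread_step`, `transfer_fraction` and `sources` are assumptions for illustration; a real model would take its transfer law from the physics of the pollutant.

```python
def spread_step(grid, transfer_fraction=0.1, sources=None):
    """Advance the pollution map by one time step.

    grid[r][c] is the amount of pollution in square (r, c).
    Each square passes transfer_fraction of its load to each existing neighbor
    (keep transfer_fraction <= 0.25 so that no square goes negative).
    sources, if given, is a grid of the amounts emitted locally during the step.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    amount = transfer_fraction * grid[r][c]
                    new_grid[r][c] -= amount
                    new_grid[nr][nc] += amount
    if sources is not None:
        for r in range(rows):
            for c in range(cols):
                new_grid[r][c] += sources[r][c]
    return new_grid
```

Starting from a 3 by 3 map with 100 units in the central square, one step leaves 60 units in the center and 10 in each of its four neighbors; without sources, the total amount is conserved.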

These two steps, numerical implementation and computer implementation, are just as important as the initial theoretical model. They should receive exactly the same amount of attention. If one of them is poorly made, the whole process will be affected. For instance, if the numerical implementation is too coarse, it won't reveal some locally important details that might be required. On the other hand, if it is too fine, if for instance the zone is divided into squares of 1 m instead of 1 km (a million times as many squares), the computer will take hours, for a result which will not be more precise, if the laws do not permit this precision (what do we know about contamination ?).

So the whole procedure is an art : the art of mathematical modeling, just as Donald Knuth spoke of "The Art of Computer Programming". It is far from being a science.

If one wants a mathematical model to be effective, one cannot afford to be careless about any of its aspects. Let's say it very clearly : Nature is always very complicated, and even if we do our best all the way through, being very careful at each step, we hardly succeed in producing anything satisfactory. Let's always remember to be modest.

Robust mathematical models

There is a natural tendency to build precise models, which can end up as theorems. A theorem is a well-proven edifice: if the assumptions are exactly this, I can prove that the outcome will be exactly that. For instance, if I can prove that the solutions to a problem, such as the propagation of a blast, tend to zero at infinity, I will not have to worry about protection if I am far enough away: this is satisfactory, intellectually speaking. But what were the assumptions and what do I mean by "far enough" ?

In real life, the three requirements we mentioned earlier are never correctly fulfilled: the objectives are unclear or contradictory, the laws are unknown, the data are missing or corrupted.

A robust mathematical model (in short RMM) is, by definition, a model which takes these uncertainties into account. It will work, it will give something, even if the objectives are unclear, even if the laws are uncertain, even if the data are corrupted.

You might think of garbage collecting, and see it as a variation of the "traveling salesman" problem: find the shortest path through all houses in a city. So you would try to locate precisely each container (perhaps using a GPS), each road which is closed for repair (which requires real-time information), find the present position of each truck, and then you would launch some gigantic algorithm, in order to find the shortest path, or perhaps the quickest : this would take hours. And this would be totally useless, because the true problem of the companies which do garbage collecting is the total cost of ownership of the trucks over a year.

You might want to pack oranges in a clever way: a problem well known to the National Science Foundation in the US ! Then you would measure precisely each orange (they are not spherical and they are not equal), and you would launch some gigantic algorithm, which would tell you, within 12 hours of computation, that indeed you can put 68 oranges in a box where a packer working by hand puts 67 in 10 seconds. That's great mathematics.

No, precisely, this is not what robust mathematical models will do. A robust mathematical model will tell you in seconds :

How many trucks and drivers you will need for garbage collecting in a given city, and how long it will take. So, when you finish collecting in the morning, you can lend the trucks to the next city, in the afternoon, and you save money;
How many oranges you can put in a given box, taking into account some possible variation in their sizes;
How many people might be infected by a disease, depending on the way it disseminates.
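The first of these answers can be sketched as a back-of-the-envelope computation that runs instantly. Everything here is hypothetical: the function name, the idea of summarizing a round by a time per container, and the numbers in the usage note are illustrations, not data from any real city.

```python
import math

def trucks_needed(containers, minutes_per_container_low, minutes_per_container_high, shift_hours):
    """Return (fewest, most) trucks needed to empty all containers within one shift.

    The time per container (driving included) is only known to lie between
    the two given bounds, so the answer is a range rather than a single value.
    """
    shift_minutes = shift_hours * 60.0
    fewest = math.ceil(containers * minutes_per_container_low / shift_minutes)
    most = math.ceil(containers * minutes_per_container_high / shift_minutes)
    return fewest, most
```

With 1200 containers, 2 to 3 minutes per container and a 6-hour shift, `trucks_needed(1200, 2.0, 3.0, 6)` returns `(7, 10)`: no precise map of the city is needed to know the order of magnitude.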

Robust models in general do not only give values, they give intervals: the number of infected people may be between this number and that number, depending on the circumstances.

More refined models may also give probabilities, which say more than an interval alone. An interval is built from extreme values, but you may be happy to know, for instance, that in 95 % of the cases, a smaller interval will suffice : this is the probabilistic approach.
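The difference between the two kinds of answer can be sketched on the disease example. The growth law (a fixed daily growth rate), the bounds on that rate and the triangular distribution used to sample it are all assumptions made for illustration; the point is only that the 95 % interval sits inside the interval of extreme values.

```python
import random

def infected_after(days, initial, daily_growth):
    """Number of infected people after a given number of days (simple compound growth)."""
    return initial * (1.0 + daily_growth) ** days

def extreme_interval(days, initial, growth_low, growth_high):
    """Interval built from the two extreme values of the growth rate."""
    return (infected_after(days, initial, growth_low),
            infected_after(days, initial, growth_high))

def central_interval(days, initial, growth_low, growth_high,
                     coverage=0.95, samples=10000, seed=0):
    """Interval containing roughly `coverage` of the simulated outcomes.

    The growth rate is drawn from a triangular distribution peaking midway
    between the bounds: an assumption, not a property of any real disease.
    """
    rng = random.Random(seed)
    outcomes = sorted(
        infected_after(days, initial, rng.triangular(growth_low, growth_high))
        for _ in range(samples)
    )
    tail = (1.0 - coverage) / 2.0
    lower = outcomes[int(tail * samples)]
    upper = outcomes[int((1.0 - tail) * samples) - 1]
    return lower, upper
```

For instance, `extreme_interval(30, 10, 0.05, 0.15)` and `central_interval(30, 10, 0.05, 0.15)` can be compared directly: the second is narrower, at the price of having assumed a distribution.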
