Reading dymolas mat-files with python

INFO: This post is outdated! Please have a look here.

Dymola stores its simulation results in regular mat-files, but the program uses a special variable structure to store the data efficiently. An easy way to access the data is MATLAB, but I was looking for a more pythonic solution.

If you want to access the data from python you can use the function scipy.io.loadmat() from the scipy package. Some conversion is necessary get access to the data as simple time rows. The format of the result file dsres.mat is explained dymola users guide. Short version: the file contains matrices of different shape, every matrix has its own abscissa. The names of all dymola variables are stored in a special matrix. Every dymola variable name points to a column of one of the data matrices. Usually a lot of names point to the same column, possibly using a different sign (+/-).

The following python script can be imported as a module or run as a standalone script.

http://download.j-raedler.de/Python/DyMat/DyMat.zip

The first argument is the name of the mat-file. If no second argument is given, it prints  the names of all variables in the file. When you provide a variable name as the second argument, the values are printed with the corresponding time values in a format that can be directly printed with gnuplot. You may need to quote the variable name for the shell in some cases.

$ python DyMat.py dsres.mat
x
der(x)
t
$ python DyMat.py dsres.mat "der(x)" # Time | der(x)
0.000000 0
200.000000 0.0597325
400.000000 0.117873
600.000000 0.172892
800.000000 0.223386
1000.000000 0.268128
...
$ python DyMat.py dsres.mat "der(x)" > tmp
$ gnuplot
gnuplot> plot "tmp"

Outlook

The module needs more testing. I played a bit with the conversion to netCDF, XLS and CSV, but it's not easy to transfer the space-efficient data structure to these formats. The conversion to netCDF with the same structure is trivial, but has no advantage over the original format. If you expand all variables the size of the file will explode. This needs to be checked in detail.

When I find the time I will implement a handler for pydap. A plotting tool based on PyQwt would be nice too.

How to convert a csv file to netCDF with 7 lines of python code

When preparing files for the ncDataReader2 I often have to convert text files that contain values separated by TABs or commas. In most simple cases this can be done with some lines of python code. An example is here:

  1. import sys, pupynere
  2. ll = [l.strip().split(',') for l in open(sys.argv[1]) if not l.startswith('#')]
  3. vv = zip(*[map(float, l) for l in ll])
  4. nc = pupynere.netcdf_file(sys.argv[1]+'.nc', 'w')
  5. nc.createDimension('dim', None)
  6. for i in range(len(vv)):
  7. nc.createVariable('var_%02d' % i, 'd', ('dim',))[:] = vv[i]

This short program will read the column data separated by commas, skip comment lines starting with # and write a netCDF file, all in 7 lines of code! Pupynere is one of the netCDF-modules available for python. It is written in pure python and has an API which is compatible with 'Scientific.IO.NetCDF'. This simple approach works for small, well-formed files without headers. For all other cases I'm working on more sophisticated tool which uses the CSV module of python.

ncDataReader2 | 2.2.4

A new release 2.2.4 of ncDataReader2 is out. Some bugs were fixed and the API has slightly changed. This version should work both with Dymola and OpenModelica on Linux and Windows. When you have a DAP-enabled netcdf library  it will even fetch data from the web in background. I already did some Dymola simulations with on-demand loading of data from a server, a post about this topic will follow.

Changes include:

  • the meaning of offset and scale factor for a ncDatSet1D was changed
  • chunk loading was implemented
  • Akima interpolation was changed heavily, works much faster now
  • better support for newer netcdf libraries
  • new helper functions for setting attributes
  • changed some char* to const char* to work with OpenModelica
  • a lot of bugs were fixed

Polygon | reading and writing gpc files

A user pointed me to a problem when reading and writing gpc files. GPC understands two kinds of files, with or without hole-flags. Polygon supports both formats, but this was not yet documented. When using the Polygon() constructor with a filename or the read() or write() methods you may specify an optional  second boolean argument. It defaults to True (with hole-flags), with False you may read and write the old-style formats without hole-flags.

Caution: the file format will not be checked automatically! When using the wrong settings the coordinates read from file may be incorrect!

The next release of Polygon will contain updated documentation and a function to manually check the format of a gpc file.

Accessing external data from Modelica with ncDataReader2

This example demonstrates the use of the library ncDataReader2 to access data sets from Modelica in a way suitable for the simulation system. The access is handled via external functions from Modelica which read and interpolate the data stored in netCDF files.

Dymola is used as the simulation system here. As ncDataReader2 is a simple, platform-independant  library implemented in ANSI-C this approach should work (in theory) with other Modelica implementations too. As soon as I have tested other simulation systems the results will be documented here.

Why?

When working with complex models you often have to access external data. This may be measured time rows of climate parameters or friction factors for different plant components. Some advantages of using ncDataReader2 in such situations:

  • data is presented as continous functions of one or two variables with automatic interpolation
  • can handle very large data sets that will not even fit into memory at once (load on demand)
  • offers different interpolation and extrapolation methods
  • data is stored in netCDF format which offers very efficient access to multi-dimensional data
  • data files can be easily exchanged without changing or recompiling your model
  • data files can contain many different data sets (scalar, 1D, 2D) at the same time
  • when you store interrelated data sets in one file, the chance of accidently mixing inconsistent data is minimized

A typical use is reading weather files into a thermal building simulation. The weather files contain different climate parameters like air temperature and humidity, radiation, wind speed as time rows for one year as well as latitude, longitude and the timezone for one location. You can easily exchange the data file and simulate the building in a totally different location without touching the model. ncDataReader2 is used in the Modelica library FluidFlow.

Prerequisites

If you don't already have Dymola installed, you can download a demo version here. It has some restrictions but will work with the simple model used in this example.

The simple model

The following model is used as a starting point for our example. The equation for t will be exchanged with a function call in the next steps. x is just a dummy to have something to integrate.

http://download.j-raedler.de/Modelica/ncDataReader2/NcTest1.mo

Installing ncDataReader2

There is no special installation procedure. Just put the necessary header and library files into the working directory (usually where you saved the model file). Which files are actually needed depends on the Dymola version and the compiler (gcc, msvc). The easiest way is to extract the whole contents of precompiled binary package (Win32 zip) into the working directory. It includes everything needed by Dymola and the executables ncgen and ncdump which are used in the next step. If you compile ncDataReader by yourself, you should at least copy ncDataReaderEA.h and the netCDF and ncDataReader2 library files (dll and/or lib).

Defining Modelica functions

The  next file makes the C functions of the library ncDataReader2 available in Modelica. It contains descriptions of the most important functions of the library (actually of the easy api / EA) that can be read by Modelica. Just put it in the working directory.

http://download.j-raedler.de/Modelica/ncDataReader2/NcDataReader2.mo

Generating the data file in netCDF format

The data must be stored in file in netCDF format. There are many different ways to convert data to this format. My favourite way is a python script that uses one of the netCDF-modules.

But in this example we will write a CDL text file and convert it to netCDF. This step does not require additional software. Use the command:

ncgen -o datafile.nc datafile.cdl

to convert the following file or download the example file here. ncgen is usually shipped with netCDF.

http://download.j-raedler.de/Modelica/ncDataReader2/datafile.cdl

This file contains the two variables time and temperature . The temperature will be interpolated with cubic Akima splines and time will be handled periodically. A netCDF file can be converted back to a CDL file with ncdump.

Adjust the model and simulate

The variable t as a function of time will now contain values from the file instead of the functions used above. An import statement is necessary to use the functions from the library. The call to NcEasyGet1D will return a value of the variable temperature in the file datafile.nc for the current value of the variable time.

http://download.j-raedler.de/Modelica/ncDataReader2/NcTest2.mo

That's it! Just run the simulation and look at the results.

The temperature is read and interpolated from the file. It can be used as a normal variable in models.

Hints

This was only a very basic example. Of course you can read and interpolate a large number of variables using different methods of interpolation and extrapolation. Scalar parameters can be read as well. Please have a look at the documentation of ncDataReader2 for more information on the naming conventions, attributes and more.

The conversion of data to the netCDF formats looks confusing to the beginner but is not hard to understand. A typical CSV file can usually be converted with a couple of lines of Python code which is my favourite method.

ncDataReader2 works both with netCDF-3 and netCDF-4, but you should not mix the tools and libraries between both versions.

Making Windows work – free software essentials

Although working mostly with linux  I need to install or to use a MS Windows machine from time to time. Being spoiled by the large amount of software available for linux I really miss a lot of functionality on a "naked" windows computer. After the installation and update process of windows I usually install a bunch of packages which are open source or at least free (as in beer). Here's the list of my favourites. It's not meant as a complete list for typical users but more as a reminder for myself.

Office / Productivity / Multimedia

Programming / Math

  • python(x,y) - scientific python distribution with lots of additional packages
  • processing - programming language for graphical applications and sketches
  • TortoiseSVN - client for version control with svn
  • Scite - text editor for programmers (shipped with python(x,y))
  • gnuplot - plotter for data sets and functions (shipped with python(x,y))

Tools

Smooth transition between functions with tanh()

The problem to get a smooth transition between two curves appears very often. I had this problem a lot when working with simulations of DAE systems where some parameters were defined by empiric functions that did not fit well at the intersection point. An approach that is very easy to implement is the use of the tanh() function (hyperbolic tangent). This function is available in many of the modern programming languages.

Let's exercise a simple example: f(x) = 1/x and  g(x) = x². Obviously both functions intersect at x=1. We want a function that follows f(x) left of the intersection and g(x) for values right of this point.

To get a smooth transition at the intersection (x=1) we need a third function that smoothly "switches" between two values at a defined point. The pure tanh() function shows this behaviour, it returns values close to -1 for x<<0 and values close to 1 for x>>0. We define our new function s(x) = 0.5+0.5 tanh((x-a)/b). The function was shifted to return values between 0 and 1 and the parameters a and b can be used to define the switch point and the smoothing level. With a=1 (intersection point between f(x) and g(x)) and b=1 it shows the following curve:

With b=3 it is much smoother:And with b=0.1 it looks more like a step function:

In the last step we only need to combine the new switch function s(x) with the original functions: h(x) = s(x) g(x) + (1-s(x)) f(x). The result is a smooth curve that meets our requirements:

The smoothing level can be adjusted with the parameter b. The example above uses b=0.1. The following chart shows the results with b=0.05:

This method works even at points where both functions  do not intersect. The smoothing function s(x) can fill the gap between f(x) and g(x). For the next chart the parameter a was changed from 1 to 1.2 (with b=0.01).

The result may not be perfect for further calculations, but it is a smooth, continous function (given that f(x) and g(x) have the same quality). Since s(x) will never reach 0 or 1, the resulting function h(x) will differ from f(x) and g(x) over the whole range. This difference can be minimized with small values of b, but it will never disappear. If you can't tolerate this, you need to replace s(x) with another function. One possibility is a piece of the sin()-function combined with a constant part, like the SinSteps interpolation method used in ncDataReader2.

You can play a bit with the example above using this gnuplot script. It reproduces all pictures shown in this post. In gnuplot press ENTER to continue.

set samples 1000
set xtics 1

f(x) = 1/x
g(x) = x*x
plot f(x), g(x)
pause -1

s(x) = 0.5+0.5*tanh((x-a)/b)
a = 1
b = 1
plot s(x)
pause -1

b = 3
replot
pause -1

b = 0.1
replot
pause -1

set xrange [0.5:1.5]
set xtics 0.1

plot f(x), g(x), s(x), s(x)*g(x) + (1-s(x))*f(x)
pause -1

b = 0.05
replot
pause -1

a = 1.2
b = 0.01
replot
pause -1