Parent Design: designs/formak_v0.md
Overview
FormaK aims to combine symbolic modeling for fast, efficient system modeling with code generation to create performant code that is easy to use.
The values (in order) are:
- Easy to use
- Performant
The Five Key Elements the library provides to achieve this (see parent) are:
- Python Interface to define models
- Python implementation of the model and supporting tooling
- Integration to scikit-learn to leverage the model selection and parameter tuning functions
- C++ and Python to C++ interoperability for performance
- C++ interfaces to support a variety of model uses
This design provides the initial implementation of the second of the Five Keys, "Python implementation of the model and supporting tooling". It also prepares for the third of the Five Keys, "Integration to scikit-learn to leverage the model selection and parameter tuning functions". Considering the scikit-learn integration at this stage informs the design of the tooling so that it won't create any big hurdles for the next steps.
Solution Approach
The basic step for this feature is translating from SymPy to Python code that does not depend on SymPy. SymPy already provides this functionality, so getting the basics working wasn't too hard. The follow-on refactoring work will be important to make sure that the library remains easy to use.
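The translation step can be illustrated with SymPy's `pycode` printer, which renders an expression as Python source that only needs the standard `math` module. This is a minimal sketch, not FormaK's actual code generator; the helper `render_function` and the unicycle-style model are hypothetical.

```python
import math

from sympy import cos, pycode, symbols


def render_function(name, args, expr):
    """Render a SymPy expression as the source of a plain Python function."""
    arg_list = ", ".join(str(arg) for arg in args)
    return f"def {name}({arg_list}):\n    return {pycode(expr)}\n"


# Hypothetical model: x position advanced by speed v along heading theta over dt.
x, v, theta, dt = symbols("x v theta dt")
source = render_function("next_x", [x, v, theta, dt], x + dt * v * cos(theta))

# The generated source references only the standard math module, so it can be
# executed (or written to a file) without importing SymPy.
namespace = {"math": math}
exec(source, namespace)
next_x = namespace["next_x"]
```

Writing the source out as text, rather than keeping a SymPy object around, is what lets the generated model drop the SymPy dependency at runtime.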
The key classes involved are:
- ui.Model: User interface class encapsulating the information required to define the model
- py.Model: (new) Class encapsulating the model for running a model efficiently in Python code
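The split between the two classes can be sketched as a symbolic description on one side and a plain-Python callable built once from it on the other. The classes below (`UiModel`, `PyModel`, the `model(dt, state)` method) are hypothetical stand-ins, not FormaK's API; the sketch assumes `sympy.lambdify` with `modules="math"` as the conversion step.

```python
from dataclasses import dataclass

import sympy


@dataclass(frozen=True)
class UiModel:
    """Hypothetical stand-in for ui.Model: symbols plus state update expressions."""

    dt: sympy.Symbol
    state: tuple  # state symbols, in a fixed order
    state_model: dict  # state symbol -> expression for its next value


class PyModel:
    """Hypothetical stand-in for py.Model: a plain-Python callable built once."""

    def __init__(self, ui_model):
        next_state = [ui_model.state_model[s] for s in ui_model.state]
        # modules="math" keeps the generated function free of SymPy at call time.
        self._fn = sympy.lambdify(
            [ui_model.dt, *ui_model.state], next_state, modules="math"
        )

    def model(self, dt, state):
        """Advance a state (ordered like ui_model.state) by dt."""
        return tuple(self._fn(dt, *state))
```

Doing the symbolic-to-numeric conversion in the constructor means each later call to `model` runs only the generated Python.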
Tooling
Along with the py.Model encapsulation, the code generation tooling provides an Extended Kalman Filter implementation to quantify variance (based on the best fit of a Kalman filter to data) and outliers (innovation as a function of variance). This part of the design is focused more on use with the upcoming scikit-learn integration.
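The design doesn't fix how "innovation as a function of variance" becomes an outlier score. One common choice, assumed here, is the normalized innovation squared (NIS), $y^\top S^{-1} y$, compared against a threshold such as a chi-square quantile. The function names and the threshold are illustrative assumptions.

```python
import numpy as np


def normalized_innovation_squared(innovation, innovation_covariance):
    """Squared Mahalanobis distance y^T S^-1 y of innovation y under covariance S."""
    y = np.atleast_1d(np.asarray(innovation, dtype=float))
    S = np.atleast_2d(np.asarray(innovation_covariance, dtype=float))
    # Solve S z = y rather than forming S^-1 explicitly.
    return float(y @ np.linalg.solve(S, y))


def is_outlier(innovation, innovation_covariance, threshold):
    """Flag a reading whose NIS exceeds the chosen threshold."""
    return normalized_innovation_squared(innovation, innovation_covariance) > threshold
```

For a one-dimensional sensor, a threshold of about 3.841 corresponds to the 95% chi-square quantile with one degree of freedom, so an innovation of 3 standard deviations (NIS 9) would be flagged.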
The key classes involved are:
- py.Model: (new) Class encapsulating the model for running a model efficiently in Python code
- py.ExtendedKalmanFilter: (new)
  - Looking ahead to model fitting, characterize model quality and data variance by fitting an EKF
  - Constructor accepts state type, state-to-state process model (as a ui.Model), process noise, sensor types, state-to-sensor models and sensor noise
  - Process Model Function: take in current state, current variance, dt/update time. Return new state, new variance
  - Sensor Model Function: take in current state, current variance, sensor id, sensor reading
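The two EKF functions listed above map onto the standard predict and update steps. This is a minimal sketch under stated assumptions: the process and sensor models are passed as plain callables with explicit Jacobians (rather than as a ui.Model), process noise is added once per step, sensors are keyed by id, and the sensor update is assumed to return the new state and covariance. The class name mirrors the design but the interface is hypothetical.

```python
import numpy as np


class ExtendedKalmanFilter:
    """Hypothetical sketch of the py.ExtendedKalmanFilter interface.

    process: (f, F) with f(state, dt) -> next state and F(state, dt) -> Jacobian of f.
    sensors: sensor_id -> (h, H, R) with h(state) -> predicted reading,
             H(state) -> Jacobian of h, and R the sensor noise covariance.
    """

    def __init__(self, process, process_noise, sensors):
        self._f, self._F = process
        self._Q = np.atleast_2d(np.asarray(process_noise, dtype=float))
        self._sensors = sensors

    def process_model(self, state, covariance, dt):
        """Predict step: return (new state, new covariance) after dt."""
        F = self._F(state, dt)
        new_state = self._f(state, dt)
        # Assumption: process noise is applied once per step, not scaled by dt.
        new_covariance = F @ covariance @ F.T + self._Q
        return new_state, new_covariance

    def sensor_model(self, state, covariance, sensor_id, reading):
        """Update step: return (new state, new covariance) after one reading."""
        h, H_fn, R = self._sensors[sensor_id]
        H = H_fn(state)
        innovation = np.atleast_1d(reading) - h(state)
        S = H @ covariance @ H.T + np.atleast_2d(R)
        # K = P H^T S^-1; with symmetric P and S this equals solve(S, H P)^T.
        K = np.linalg.solve(S, H @ covariance).T
        new_state = state + K @ innovation
        new_covariance = (np.eye(len(state)) - K @ H) @ covariance
        return new_state, new_covariance
```

Keeping the Jacobians as separate callables is one way to let generated code supply them; in FormaK they would presumably come from the symbolic model.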
These two classes will likely share a lot under the hood because they both need to run Python efficiently; however, they'll remain independent classes to start, for a separation of concerns. At this point the EKF class is aimed more at use under the hood of the scikit-learn integration, whereas the py.Model class is aimed at the FormaK user (easy to use first, performant second).
The Cherry On Top - Transparent Compilation
In addition to merely repackaging the model defined in the ui.Model, this design integrates with Python compiler tooling (Numba) so that the py.Model class can be written in Python while its high-use model functions are JIT compiled.
This has some trade-offs (increased implementation complexity, increased startup time), but it is likely to bring performance benefits, especially for longer-running analysis use cases (e.g. running with a long sequence of data). Numba was selected because it could easily be adapted to work with the generated code, whereas some other compilers (for example Cython) require code annotations or other changes that would be more involved than I wanted to pursue at this stage.
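The point about annotations can be made concrete: `numba.njit` accepts unannotated Python of the kind the generator emits. A minimal sketch, assuming a hypothetical unicycle process model and treating Numba as optional (without it, `njit` falls back to a no-op so the code still runs as plain Python):

```python
import math

try:
    from numba import njit
except ImportError:
    # Numba is optional in this sketch: without it, njit returns fn unchanged.
    def njit(fn):
        return fn


def process_model(dt, x, y, theta, v):
    """Unannotated Python in the style of generated model code (hypothetical model)."""
    return (
        x + dt * v * math.cos(theta),
        y + dt * v * math.sin(theta),
        theta,
        v,
    )


# With Numba, compilation happens on the first call (the startup cost); later
# calls with the same argument types reuse the compiled machine code.
fast_process_model = njit(process_model)


def run(dt, state, steps):
    """Repeatedly apply the model: the long-running case where JIT pays off."""
    for _ in range(steps):
        state = fast_process_model(dt, *state)
    return state
```

The same function body works compiled or not, which is what makes the compilation transparent to the user.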
Notes
- In the spirit of "don't pay for what you don't use", the compiler option motivated the creation of a common configuration pattern. We want to be able to selectively enable or disable compilation at conversion time. Continuing to put thought into a common configuration pattern will make it easier to reuse in future designs (e.g. selecting configuration for other model optimizations).
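One possible shape for such a configuration pattern is a frozen dataclass passed at conversion time, with compilation off by default and Numba imported only when requested. `Config` and `maybe_compile` are hypothetical names for illustration, not FormaK's API.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical conversion-time options; compilation is opt-in."""

    compile: bool = False


def maybe_compile(fn, config):
    """Return fn JIT compiled only when config.compile is set."""
    if not config.compile:
        return fn  # don't pay for what you don't use: Numba is never imported
    from numba import njit  # imported lazily, only when compilation is requested

    return njit(fn)


# Overrides derive a new immutable config instead of mutating a shared one.
default_config = Config()
compiled_config = dataclasses.replace(default_config, compile=True)
```

New options for future optimizations would become additional fields with defaults, so existing call sites keep working unchanged.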