Common Subexpression Elimination¶
- Author:
Buck Baskin @buck@fosstodon.org
- Created:
2023-06-15
- Updated:
2023-06-25
- Parent Design:
- See Also:
- Status:
Merged
Overview¶
FormaK aims to combine symbolic modeling for fast, efficient system modelling with code generation to create performant code that is easy to use.
The Five Key Elements the library provides to achieve this user experience are:
Python Interface to define models
Python implementation of the model and supporting tooling
Integration to scikit-learn to leverage the model selection and parameter tuning functions
C++ and Python to C++ interoperability for performance
C++ interfaces to support a variety of model uses
This design provides an extension to the second of the Five Keys “Python implementation of the model…” and the fifth of the Five Keys “C++ interfaces to support a variety of model uses” to support the performance goals of the library: Common Subexpresion Elimination (CSE).
CSE is a transformation on the compute graph of the underlying model (for
either Python or C++) to remove duplicate compuation at multiple scales. For
example, if a sensor model computes a + b
multiple times, CSE identifies this
subexpression and computes it once. This could also apply to a more complicated
expression like 9.81 * sin(theta + bias)
.
One of the nicest benefits of this transformation is that it will provide a performance benefit to the model without a compromise to the user interface.
Solution Approach¶
The basic approach will be to apply sympy
’s common subexpression elimination
implementation to the sympy model before generating the output. This will be an
overhaul of both the Python and C++ implementations, but for different reasons.
The C++ implementation doesn’t do any CSE now. The Python implementation is
more subtle. Currently the CSE algorithm is applied for each field of the
sensor model, but the correct full process is to eliminate subexpressions
across all fields. This should offer improved performance (and at least no
regression in performance).
The key classes in the implementation are:
Basic Block: Basic Block wraps uninterrupted sequences of assignments, so CSE can be applied across all assignments in the CSE (as long as the output is ordered so all sub-expressions are computed before they are used). The Python implementation may also adopt the Basic Block pattern.
Config: Adding/Updating a new feature flag for CSE
Feature Tests¶
The feature tests for this design are based on generating models with many common subexpressions, then comparing the performance of the model with and without CSE.
Generate a tree of
2 * sin(l * r) * cos(l * r)
At its leaves,
l
andr
are input symbolsAt the inner nodes,
l
andr
are left and right subexpressions in the above pattern
In each feature test, the time to execute should grown in log(N) where N is the number of nodes for the CSE implementation; however, without CSE it should grow proportional to N. This should be a 10x performance improvement for between 30 and 40 nodes. In practice it may take more nodes to demonstrate a difference if the compute time is small for all cases.
In unit testing, the difference can be more precisely tested by generating a common subexpression then asserting it gets removed in the output.
Road Map and Process¶
Write a design
Write a feature test(s)
Build a simple prototype
Pass feature tests
Refactor/cleanup
Build an instructive prototype (e.g. something that looks like the project vision but doesn’t need to be the full thing)
Add unit testing, etc
Refactor/cleanup
Write up successes, retro of what changed (so I can check for this in future designs)