Back when I was a PhD student working on worst-case execution-time (WCET) analysis, one of the leading groups researching the topic was the “Saarbrücken gang” led by Professor Reinhard Wilhelm. Last year, Professor Wilhelm published a retrospective look on their work on WCET in the Communications of the ACM. It is a really interesting history write-up from the perspective of the Saarbrücken group.
Doing WCET as Right as Possible
Arguably, the research group that Professor Wilhelm led in Saarbrücken (at the Universität des Saarlandes) was the leading group in WCET at the time (and most likely even to this day), with the best technology for static WCET analysis. They applied a background in static program analysis and in particular abstract interpretation, using an existing and stable set of tools that had previously been applied to rather different problems in the field of static program analysis.
The static approach, which was also pursued by the research group I was part of, is based on the concept of building models of hardware that allow us to determine the maximum execution time of a piece of code on the hardware. The models have to cover the processor pipeline and supporting mechanisms like cache memories and memory systems, and combine this knowledge with an analysis of the possible program execution paths. The group in Saarbrücken used their program analysis tools to build up models for such hardware, starting the with the easiest one, caches.
Static and Dynamic Analysis
The static analysis approach is always contrasted with dynamic analysis – also known as measurement. There were a few research groups at the time that determined that trying to build hardware models was a rather Quixotic task since the hardware just kept becoming more complex and it was difficult to get information on it. These groups instead tried to create methods that would measure segments of code running on the hardware, and then combine these segments in a way that would create an estimate for the WCET (along with some estimate of the uncertainty of the result). In a way, this approach is a logical extension of pre-existing practice of measuring a program’s runtime under various inputs and using that to guesstimate its maximum possible execution time.
Professor Wilhelm was not particularly fond of this approach as it was impossible to prove anything definite about its safety, and this shines through in the article. Like this quote:
Worse yet, industry “best practice” was, and unfortunately partly still is, to do some end-to-end measurements, ignore some unwelcome outliers, if optimism prevailed, or add some safety margin, if more problem awareness dominated…
As well as:
The results of a sound WCET analysis are conservative, that is, they will never be exceeded by an execution. We consider being conservative as a Boolean property. Often conservative is used as a metric, being more conservative meaning being less accurate. For an unsound method, however, it does not make sense to speak about being more or less conservative.
Working out the WCET
The article nicely recounts how the research group built up their understanding for the WCET analysis problem. They started with caches, creating a highly elegant analysis that was easy to explain and prove thanks to building on the program analysis generation tools that had previously been built. They went on to untangle further issues like processor pipelines, even if that was a much harder problem than caches. And then they hit the system issues like memory controllers and resource sharing between processors and DMA devices.
I am bit uncertain what happened there as this was mostly after I graduated (and joined the startup Virtutech and started doing simulation instead of WCET analysis). The article basically just points out that this is a big problem. The telling of the history agrees with my recollection and I recognize the names of the PhD students and recall their inspiring papers from real-time conferences around they year 2000.
During my years working with Simics I have noticed that computer architecture modeling and design shows a similar pattern of difficulty: doing cache analysis is relatively simple and cheap to compute, while modeling a pipeline easily requires 10x-100x more work for each target instruction.
AbsInt, the Startup
The work I did in WCET analysis never got to the point that it was useful for commercialization. However, the Saarbrücken group turned their work into a commercial company, AbsInt. As Professor Wilhelm tells it in the article, they lucked out on the market timing and went commercial just when there was funding available as well as a receptive user, Airbus. Airbus was the perfect partner for a startup pursuing difficult technological goals. As the article puts it:
Some words about our long-time cooperation partner, Airbus, in Toulouse. We have experienced the group in charge of developing safety-critical software as problem-aware, highly competent, and extremely cooperative. They wanted to have this problem solved and they trusted us to solve it, and they kept us focused on the real problems, and thus prevented us from solving simplified, self-posed problems, a tendency academic researchers often are inclined to follow.
Could not agree more, and finding such a partner is really rare.
In the decades since (it really is decades, time flies) I have gone over to the AbsInt booth at the Embedded World, and it has been great to see them survive and succeed in a very tough market. Beginning with their timing analysis tools, they have turned into a company that picks up software tools with a strong theoretical foundation and commercializes them. Tools that require many PhDs to build, and arguably a rather senior technologist to fully see the value of.
As Professor Wilhelm puts it:
However, our development of a sound method that actually solved a real problem of real industry was considered a major success story for the often disputed formal-methods domain. AbsInt became the favorite partner for the industrialization of academic prototypes.
My favorite example of such tools (which is mentioned in the article) is CompCert, a formally verified C compiler – using CompCert, you can avoid testing to show the equivalence between a C-language source program and the generated assembly code, which is a huge time-saver in certifying safety-critical systems. I remember back in the 1990s when I worked at IAR and we developers were rather dismissive of the idea that such a compiler was even possible. Turns out it was. It is a not a tool for most programmers and most problems, but AbsInt picked it up and it appears to be doing well.
I must say that the team in Saarbrücken was also lucky (or skilled at recruiting) in that many team members turned naturally from PhD researchers into business people. In particular Christian Ferdinand who has been the CEO for AbsInt from the start. Daniel Kästner is still at AbsInt too, and managing to produce real research papers while building commercial products. Henrik Theiling appears to have gone to SysGo, and also produces research papers.
Is WCET Analysis Practical?
The WCET analysis case study that Professor Wilhelm recounts is a successful research project resulting in real-world applications, products, and a spin-off company that has survived and prospered for more than twenty years. That is a great success by any measure.
Despite this, the use of static WCET analysis is very limited.
There are multiple reasons for this.
First of all, the number of systems that really need this kind of analysis is rather limited. Making a safety case based on the fact that a certain computation cannot ever take more than a certain amount of time is a comparatively rather rare task. Static WCET analysis does not really help with systems where the focus is not on worst-case behaviors and being safe under all circumstances. When it is needed it is definitely needed, and doing the required analysis by hand is unthinkable (see above). It just does not happen to all that many programmers and projects.
Second, the analysis has to be tailored to the precise hardware that is used in project. This reduces the amount of reuse possible across multiple users/customers, and makes the business model one of selling modeling (customization) services rather than a standard product. It means that you cannot just pick up a WCET tool off the shelf to try it, unless you are totally lucky and happens to be using the precisely same hardware as some previous customer. It means increased friction for use, and a rather high cost of entry.
Third, and probably most importantly, hardware trends are not exactly going in the direction of static analyzability. In the 2000s, there were several research papers that tried to define and describe what predictable, easily analyzable, low-execution-time variable hardware would look like. My PhD thesis was in effect part of this research (and I found that even rather simple pipelines with simplistic memory systems had the potential for tricky behaviors).
The AbsInt analyzer originally targeted the PowerPC 755, which by today’s standards is a fairly simple processor core tied to a rather simple cache system. It is by no means a trivial processor design, and analyzing its pipeline using abstract interpretation is an amazing feat of computer science. However, today’s mid-to-high-performing cores are way more complex, and most crucially feature a much higher possible variation from best cases to worst cases. A conservative WCET (if it could be produced) might be so high as to be practically useless, even if it was successfully found.
For WCET analysis to be useful, hardware needs to expose a limited range of possible execution times for a computation. Even when I was a PhD student, it seemed striking that some people in industry both wanted a nice tight WCET estimate… and also wanted to go and buy the latest greatest general-purpose hardware to cheaply get a lot of computing power. That general-purpose hardware that was designed to optimize the average case with very little concern for the rare corner cases resulting in very long execution times.
If having predictable and bounded worst-case behavior offers a useful simplification of certification and validation processes, it would seem that it would make sense for the companies building such systems to build their own hardware tailored for it. Instead of complaining about the average-case optimizations in an off-the-shelf design, why not just build something optimized for the worst case? Professor Wilhelm mentions this towards the end of the article:
In Wilhelm et al we collected our wisdom concerning timing predictability of several types of architectural components. It heavily influenced the design of the Kalray MPPA. One could at this point remark that a future automatically driven car will employ a GPU executing learned-pattern recognition, which is controlled by an 8-core-ARM architecture whose design contradicts under almost all aspects of this collected wisdom of ours.
Rambling Speculation
Speculating and rambling wildly, I can see a few ways that WCET analysis could become practical in the future:
- Design and build a general-purpose architecture for analyzability from the ground up – which is very hard since you have to forego a lot of standard computer architecture tricks. On the other hand, if you want to have a provably safe system, having hardware that supports it from the ground up makes a lot of sense.
- Maybe we can focus the analysis and safety arguments on important computational sub-blocks? Simple massively parallel compute accelerators seem like a rather good target for WCET analysis, once a computation has been set up. Could we combine GPU-style compute with super-simple control processors? And then have some big random cores doing stuff that is not timing-critical and does not have to be proven as tightly? This kind of system architecture might simplify the problem.
- Or should we drop all pretense of knowing precisely and instead make the argument that given a sufficiently complex processor we a sufficiently high number of moving parts, we can treat timing as a statistical property? This is not quite the same as the industrial criticized by Professor Wilhelm… as those tried to use measurement on simple architectures within the reach of static analysis. Instead, we need orders of magnitude more noise and randomness in the system, like what you might get in an SoC with dozens of modern high-end cores. It might even help to intentionally randomize aspects of the system to ensure that bad corner cases are unlikely to happen.
In any case, the article was a fun read, especially if you have a background in the field. Excellent work by Professor Wilhelm!