Performance-risk driven design: minimising the chances of late-stage project failure
Neil Davies
PNSol (Chief Scientist & Co-Founder)
ABSTRACT:
The modern world depends on large-scale software systems to run its critical infrastructure: telecommunications, energy, finance, transport, government, the military, and major multinational companies all rely on correct and reliable software operating over a global scope to carry out their day-to-day operations, as do individuals for many aspects of their daily lives. The issue of assuring functional correctness is addressed by a variety of programming/testing/verification paradigms, but the question of reliability cannot be answered purely by inspecting the code. Systems must do more than simply function correctly; they must also provide suitable performance when deployed at scale – and by ‘performance’ we mean delivering the outcomes that users need, within the timeliness they require, with a sufficiently low rate of failure.
Unfortunately, the transition from initial design/coding through small-scale testing to wide-scale deployment is often fraught with difficulties, leading to cost overruns or even late-stage abandonment of projects when initial designs turn out to be unable to deliver the necessary performance at the required scale. Our observation is that one cause of such problems is modern software development practice that emphasises speed and flexibility but fails to adequately consider whether a system can actually meet its resource/performance constraints, particularly when deployed at scale. Thus, in distributed system development, there is a tendency for cost/performance hazards to appear late in the System Development Life Cycle (SDLC): scaling hazards are not exposed early enough in the development flow to be mitigated at a reasonable cost.
To prevent this, performance needs to be a first-class element in the SDLC, rather than an afterthought, which implies a framework for capturing, refining, reasoning about and assuring performance risks. One candidate for such a framework is ∆QSD, which we will outline in this talk.
BIOGRAPHY:
Neil Davies is an expert in resolving the practical and theoretical challenges of large scale distributed and high-performance computing, particularly scalability effects in large distributed systems, their operational quality, and how to manage their degradation gracefully under saturation and adverse operational conditions. He is a computer scientist, mathematician and hands-on software developer who builds rigorously engineered working systems and scalable demonstrators of new computing and networking concepts.
Throughout his 20-year career at the University of Bristol he was involved with early developments in networking, its protocols and their implementations. He collaborated with organisations such as NATS, Nuclear Electric, HSE, ST Microelectronics and CERN on issues relating to scalable performance and operational safety. He was also technical lead on several large EU Framework collaborations relating to high performance switching, and has mentored PhD candidates at CERN.
He co-founded Degree2 Innovations in 2000, commercialising network QoS research, and went on to found Predictable Network Solutions in 2003. He has worked on performance aspects of the Department of Defense’s Future Combat Systems project, and has developed approaches to delivering consistent video telephony for sign language users over retail broadband. He is the co-author of several patent families.
DATES:
March 1st, 2025 – Abstract submission deadline
March 8th, 2025 – Paper submission deadline
April 14th, 2025 – Accept/Reject notification
May 21–23, 2025 – Netys Conference