Performance-risk driven design: minimising the chances of late-stage project failure

Dr Neil Davies

PNSol (Chief Scientist & Co-Founder)

ABSTRACT:

The modern world depends on large-scale software systems to run its critical infrastructure: telecommunications, energy, finance, transport, government, the military, and major multinational companies all rely on correct and reliable software operating at global scope to carry out their day-to-day operations, as do individuals for many aspects of their daily lives. Functional correctness is addressed by a variety of programming, testing and verification paradigms, but reliability cannot be assured purely by inspecting the code. Systems must do more than simply function correctly; they must also provide suitable performance when deployed at scale. By ‘performance’ we mean delivering the outcomes that users need, within the time they need them, with a sufficiently low rate of failure.

Unfortunately, the transition from initial design and coding, through small-scale testing, to wide-scale deployment is often fraught with difficulties. Initial designs turn out to be unable to deliver the necessary performance at the required scale, leading to cost overruns or even late-stage abandonment of projects. In our observation, one cause of such problems is that modern software development practices emphasise speed and flexibility but fail to adequately consider whether a system can actually meet its resource and performance constraints, particularly when deployed at scale. As a result, in distributed-system development, cost/performance hazards tend to appear late in the System Development Life Cycle (SDLC): scaling hazards are not exposed early enough in the development flow to be mitigated at reasonable cost.

To prevent this, performance needs to be a first-class element of the SDLC rather than an afterthought. This requires a framework for capturing, refining, reasoning about and assuring performance risks. One candidate for such a framework is ∆QSD, which we will outline in this talk.

BIOGRAPHY:

Neil Davies is an expert in resolving the practical and theoretical challenges of large-scale distributed and high-performance computing, particularly scalability effects in large distributed systems, their operational quality, and how to manage their degradation gracefully under saturation and adverse operational conditions. He is a computer scientist, mathematician and hands-on software developer who builds rigorously engineered working systems and scalable demonstrators of new computing and networking concepts.

Throughout his 20-year career at the University of Bristol, he was involved in early developments in networking, its protocols and their implementations. He collaborated with organisations such as NATS, Nuclear Electric, HSE, ST Microelectronics and CERN on issues relating to scalable performance and operational safety. He was also technical lead on several large EU Framework collaborations relating to high-performance switching, and has mentored PhD candidates at CERN.

He co-founded Degree2 Innovations in 2000, commercialising network QoS research, and went on to found Predictable Network Solutions in 2003. He has worked on performance aspects of the Department of Defense’s Future Combat Systems project, and has developed approaches to delivering consistent video telephony for sign language users over retail broadband. He is the co-author of several patent families.

Dates

March 1st, 2025: Abstract submission deadline

March 8th, 2025: Paper submission deadline

April 14th, 2025: Accept/Reject notification

May 21-23, 2025: Netys Conference

Proceedings

Revised selected papers will be published as post-proceedings in Springer's Lecture Notes in Computer Science (LNCS) series.

Partners & Sponsors (TBA)