Nathan Scott
Nathan Scott's contributions
Lightweight Application Instrumentation with PCP
Nathan Scott
Wait... what? I was involved in diagnosing a production system performance problem: a web application serving thousands of interactive users was acting up. Symptoms included significant time running kernel code on behalf of the application (unexpectedly), and at those times substantial delays were observed by end users. As someone with a systems programming background, I figured I had a decent shot at figuring this one out. Naively I reached for strace(1), the system call and signal tracer, to provide insights...
Performance Regression Analysis with Performance Co-Pilot [video]
Nathan Scott
In an earlier post we looked into using the Performance Co-Pilot toolkit to explore performance characteristics of complex systems. While surprisingly rewarding, and often unexpectedly insightful, this kind of analysis can be rightly criticized for being "hit and miss". When a system has many thousands of metric values it is not feasible to manually explore the entire metric search space in a short amount of time. Or the problem may be less obvious than the example shown - perhaps we...
Exploratory Performance Analysis with Performance Co-Pilot [video]
Nathan Scott
Investigating performance in a complex system is a fascinating undertaking. When that system spans multiple, closely cooperating machines and has open-ended input sources (shared storage, or faces the Internet, etc.) then the degree of difficulty of such investigations ratchets up quickly. There are often many confounding factors, with many things going on all at the same time. The observable behaviour of the system as a whole can be frequently changing even while at a micro level things may appear the same...