Using Chakra execution traces for benchmarking and network performance optimization

- Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarking and network performance optimization.
- Chakra execution traces represent key operations, such as compute, memory, and communication, along with data and control dependencies, timing, and resource constraints.
- In collaboration with MLCommons, we are seeking industry-wide adoption for benchmarking.
- Meta has open-sourced a set of tools to enable the collection, analysis, generation, and adoption of Chakra execution traces by a broad range of simulators, emulators, and replay tools.
At Meta, our efforts are geared not only toward pushing the boundaries of AI/ML but also toward optimizing the vast networks that enable these computations. An agile, reproducible, and standardized benchmarking system plays an important role in this. Through our collaboration with MLCommons, and drawing on our deep insight into the constraints of traditional benchmarking, we have initiated Chakra execution traces: a graph-based representation of AI/ML workloads. This approach aims to unify diverse execution trace schemas and seeks industry-wide adoption for enhanced AI efficiency analysis tools and holistic performance benchmarking.
The limitations of traditional AI benchmarking methodology
Traditionally, benchmarking AI systems has largely relied on running full ML workloads. Established benchmarking approaches, such as MLPerf, have provided invaluable insights into the behavior and performance of AI workloads and systems. However, traditional full-workload benchmarking presents several challenges:
- Difficulty in forecasting future system performance: When designing an AI system, engineers frequently face the challenge of predicting the performance of future systems. Such predictions become even more complex when the compute engines aren't ready or when changes in network topology and bandwidth become necessary. Relying on full workloads to evaluate the performance of these not-yet-realized systems is simply not feasible.
- High compute cost: Executing full-workload benchmarks comes at a substantial compute cost. Given that training contemporary ML models often requires thousands of graphics processing units (GPUs), these benchmarks would ideally be executed on a similarly vast number of GPUs. Moreover, gauging the performance of a system with this methodology can be time-consuming.
- Inability to adapt to evolving workloads: The landscape of ML workloads and their requirements is rapidly evolving. Traditional full-workload benchmarks fall short when it comes to addressing these changing needs, primarily because they require significant effort to standardize workloads as benchmarks.
An overview of Chakra
Building on our insights into the constraints of traditional benchmarking, we present Chakra execution traces. This new approach provides an open, interoperable, graph-based depiction of AI/ML workload execution. The Chakra execution trace captures core operations, including compute, memory, and communication, along with their dependencies, timing, and metadata.
Though execution traces are a valuable representation of an ML task, the structure and metadata of the resulting traces can differ based on the ML framework used. Recognizing this, Chakra introduces a standardized schema for performance modeling, termed the Chakra execution trace. The figure below outlines the Chakra ecosystem, with execution traces as its central component. As depicted in the figure, Chakra also offers a range of tools to convert, visualize, generate, and simulate these execution traces.
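To make the graph-based idea concrete, here is a minimal sketch of what such a trace might look like in code. The node types, field names, and helper below are simplified assumptions for illustration, not the actual Chakra schema (which is defined in protobuf by the MLCommons working group):

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only: field names and node-type strings are
# assumptions, not the real Chakra execution trace definition.
@dataclass
class TraceNode:
    node_id: int
    name: str
    node_type: str          # e.g., "COMPUTE", "MEMORY", "COMM_COLL"
    duration_us: float      # timing metadata recorded at collection time
    deps: List[int] = field(default_factory=list)  # data/control dependencies

def topo_order(nodes):
    """Return node ids in an order that respects the dependency edges."""
    by_id = {n.node_id: n for n in nodes}
    visited, order = set(), []
    def visit(nid):
        if nid in visited:
            return
        visited.add(nid)
        for dep in by_id[nid].deps:
            visit(dep)
        order.append(nid)
    for n in nodes:
        visit(n.node_id)
    return order

# Example: a matmul followed by an all-reduce that depends on its output.
trace = [
    TraceNode(0, "matmul", "COMPUTE", 120.0),
    TraceNode(1, "all_reduce", "COMM_COLL", 300.0, deps=[0]),
]
print(topo_order(trace))  # -> [0, 1]
```

Because dependencies are explicit in the graph, any downstream tool (a simulator, an emulator, or a replayer) can schedule the operations correctly without knowledge of the framework that produced the trace.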
How Meta leverages Chakra execution traces
At Meta, we collect execution traces from our production servers every day. These execution traces serve several purposes: benchmarking, visualization, and performance optimization.
Benchmarking
Benchmarking is essential for improving current AI systems and planning future networks, and we use Chakra execution traces specifically for this task. We have developed several benchmarking tools, including Mystique and PARAM. Mystique allows us to replicate the performance of an ML workload by replaying both the compute and communication operators found in execution traces. It leverages the Chakra execution trace to record a model's runtime details at the operator level and then replays them to reproduce the original performance. In line with our vision, the MLCommons Chakra working group is curating the 'Chakra trace benchmark suite' by gathering execution traces from various industry players.
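The replay idea can be sketched as follows. This is a toy model of operator-level replay, not Mystique's implementation: a real replayer re-issues the recorded GPU kernels and collectives, whereas this sketch only honors each node's recorded duration and dependency order. The trace format here is an assumption for illustration.

```python
import time

def replay(trace_nodes):
    """Replay trace nodes (dicts with id/type/duration_us/deps) in
    dependency order, returning the order in which they were issued."""
    done, log = set(), []
    pending = list(trace_nodes)
    while pending:
        for node in list(pending):
            # Issue a node only once all of its dependencies have completed.
            if all(dep in done for dep in node["deps"]):
                if node["type"] == "COMPUTE":
                    time.sleep(node["duration_us"] / 1e6)  # stand-in for a kernel
                elif node["type"] == "COMM_COLL":
                    time.sleep(node["duration_us"] / 1e6)  # stand-in for a collective
                done.add(node["id"])
                log.append(node["id"])
                pending.remove(node)
    return log

# A two-operator trace: compute, then a collective that depends on it.
trace = [
    {"id": 0, "type": "COMPUTE", "duration_us": 100, "deps": []},
    {"id": 1, "type": "COMM_COLL", "duration_us": 200, "deps": [0]},
]
print(replay(trace))  # -> [0, 1]
```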
Visualization and performance optimization
One example of visualization and performance optimization is the analysis of collective message sizes. We analyze production execution traces using an automated pipeline. The resulting visualizations help us identify any balance or imbalance in collective message sizes across different ranks. Our visualization tool can precisely highlight these imbalances, as shown in the figure below.
With this information at hand, Meta engineers are equipped to craft appropriate solutions, ensuring a balanced message size, as demonstrated in the figure below.
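The core of such an analysis is simple to sketch: aggregate the bytes each rank contributes to collective operations and flag skew. The record format and imbalance threshold below are assumptions for illustration, not Meta's internal tooling:

```python
def message_size_imbalance(records, threshold=1.5):
    """records: iterable of (rank, message_size_bytes) taken from the
    collective-communication nodes of a trace. Returns per-rank totals,
    the max/min ratio, and whether it exceeds the (assumed) threshold."""
    totals = {}
    for rank, size in records:
        totals[rank] = totals.get(rank, 0) + size
    ratio = max(totals.values()) / min(totals.values())
    return totals, ratio, ratio > threshold

# Toy data: rank 2 contributes far more bytes than the others.
records = [(0, 4096), (0, 4096), (1, 4096), (2, 16384)]
totals, ratio, imbalanced = message_size_imbalance(records)
print(totals)       # per-rank byte totals
print(imbalanced)   # True: rank 2 sends 4x the bytes of rank 1
```

In production, the per-rank totals would feed a plotting layer rather than a print statement; the point is that the trace's communication metadata is sufficient to localize the imbalance to specific ranks.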
Future plans
Enhancing the benchmarking capabilities of Chakra execution traces
While the execution trace replayer enables the replay of execution traces, it also poses challenges. A primary challenge is the intrinsic linkage of collected execution traces to specific systems. Because traces are gathered from actual machine runs, the kernels executed are optimized for the particular system in use. As a result, traces sourced from one system may not simulate accurately on another system with a different GPU, network topology, or bandwidth.
We are addressing this constraint in collaboration with the MLCommons Chakra working group. We aim to gather execution traces prior to the operator optimization phase for any target system, as shown in the figure. These are termed pre-execution traces. In parallel, to enable benchmarking of next-generation AI systems, we are streamlining the process from trace collection to simulation on a simulator.
Using AI to generate representative execution traces
Chakra execution traces are capable of identifying network bottlenecks in ML workload execution. However, optimizing SW/HW stacks with production execution traces presents a practical challenge. The main difficulty arises when attempting to globally optimize our production systems. Given the sheer volume of production traces, exhaustively running them for system optimization is neither feasible nor efficient; doing so would be both time-consuming and computationally expensive. Thus, selecting a representative subset of production execution traces becomes critical.
However, there is a risk: the selected traces might not holistically represent the global characteristics, potentially skewing optimization efforts toward only specific ML workloads. We envision a generative AI model that can identify and generate execution traces that are representative of the primary characteristics observed. We also plan to incorporate an obfuscation mechanism within the AI model. This will facilitate trace sharing without jeopardizing intellectual property, fostering SW/HW co-design between different companies.
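As a much simpler baseline than a generative model, representative selection can be sketched as a coverage problem: summarize each trace as a feature vector and greedily pick traces that are far from those already chosen, so the subset spans the observed behaviors. The feature set and selection strategy below are illustrative assumptions, not the envisioned AI model:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_representatives(features, k):
    """Greedy farthest-point selection: start from the first trace, then
    repeatedly add the trace farthest from all chosen representatives."""
    chosen = [0]
    while len(chosen) < k:
        far = max(range(len(features)),
                  key=lambda i: min(dist(features[i], features[c]) for c in chosen))
        chosen.append(far)
    return sorted(chosen)

# Toy feature vectors: (compute_ms, communication_ms) per production trace.
# Traces 0-2 are one workload family; traces 3-4 are another.
feats = [(10, 1), (11, 1), (10, 2), (50, 40), (52, 41)]
print(select_representatives(feats, 2))  # -> [0, 4]: one trace per family
```

A generative model improves on this in two ways the sketch cannot: it can synthesize traces that capture aggregate characteristics without exposing any single production trace, and it can extrapolate toward future workloads.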
Taking the leap with industry collaboration
For such an ecosystem to flourish, industry consensus is paramount. Our collaboration with the MLCommons consortium, an open engineering consortium of more than 50 leading companies, is a testament to our commitment. This collaboration aims to establish Chakra within its fold, providing a framework for broad adoption.
Chakra's working group under MLCommons will spearhead efforts to create and develop:
- A standardized schema that can capture and convert execution traces from various frameworks.
- ML models for creating representative Chakra execution traces, protecting proprietary information while also projecting future AI workloads.
- An open ecosystem of tools for benchmarks, simulations, and emulations.
- Comprehensive benchmarks with Chakra execution traces based on MLCommons/MLPerf guidelines.
Join us on this journey
Our vision is to forge an agile, reproducible benchmarking and co-design system for AI. Collaboration with peers, academic institutions, and consortiums will be pivotal. We invite individuals and companies to become part of the Chakra working group and help contribute to the paradigm shift in benchmarking and network performance optimization.
Read the research paper
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces
Acknowledgements
We would like to thank all contributors to the Chakra project within Meta: Taekyung Heo, Srinivas Sridharan, Brian Coutinho, Hiwot Kassa, Matt Bergeron, Parth Malani, Shashi Gandham, Omar Baldonado, our external partners at Georgia Tech and MLCommons, as well as external collaborators at AMD, CMU, Cornell, Enfabrica, Google, Harvard, HP Labs, Intel, Keysight Technologies, Microsoft, NVIDIA, OCP, and Stanford.