Beyond Efficiency: Scaling AI Sustainably
Abstract
Barroso’s seminal contributions in energy-proportional warehouse-scale computing launched an era where modern datacenters have become more energy efficient and cost effective than ever before. At the same time, modern AI applications have driven ever-increasing demands in computing, highlighting the importance of optimizing efficiency across the entire deep learning model development cycle. This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from datacenter construction and hardware manufacturing. We highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operations and end-of-life processing for the hardware.
I From Warehouse Scale Computing to AI
Large-scale computing infrastructures today are extremely efficient [45, 27], with a Power Usage Effectiveness (PUE) of roughly 1.1. The performance-per-watt energy efficiency of microprocessors is also steadily improving. For GPUs, the theoretical GFLOPS performance-per-watt doubles every 3 to 4 years [56]. Figure 1 illustrates the notable GPU performance improvement (in FP32 GFLOPS) as a combined effect of higher transistor density, higher frequency, and larger die size. Decades of efficiency optimization across the various dimensions of computer systems have led to orders-of-magnitude energy efficiency improvements for computing.
As a result, global datacenter energy use increased by only an estimated 6% between 2010 and 2018 [43], despite a 550% increase in global datacenter compute instances. Thanks to Barroso's foundational contributions in energy-proportional datacenter computing [9] and the tail at scale [18], warehouse-scale computing infrastructures have become extremely energy efficient and significantly more cost-effective [8]. This, in turn, drove the seismic shift in computation demand from personal computers and traditional datacenters to warehouse-scale computing infrastructures.
This is a pre-print of the article that has been accepted and to appear at the IEEE Micro Special Issue on The Past, Present, and Future of Warehouse-Scale Computing.
It is based on industry experience and key lessons learned from the Green AI journey to which many colleagues at Meta have contributed. We pinpoint new opportunities to scale AI computing sustainably, beyond the decades of focus on computing efficiency.
Despite the significant advancements in efficiency and cost, as digital technologies become an essential part of humanity, their growing prominence is reflected in computing's energy footprint. Between 2017 and 2021, the electricity consumption of Google, Meta, and Microsoft grew by 2 to 3 times, with AI as the most important application driver of computing's growth at scale. The International Energy Agency (IEA) projected global datacenter electricity use to more than double, from 460 TWh in 2022 to over 1,000 TWh by 2026 [25].

Workloads shape computing infrastructure design. New application drivers, such as artificial intelligence (AI), are introducing an enormous change to the overall computing industry, at a rate not seen before in the history of computing. Generative AI models, such as ChatGPT [48], LLaMA [58], DALL-E [47], and Sora [13], have demonstrated impressive results on generation tasks across a wide range of modalities: images, videos, language text, and speech. Today's chatbots and AI assistants are only scratching the surface of generative AI technologies as we consider personalized AI assistants that learn and recommend videos to users, write their own articles, solve math problems, and create novel music and art. While unlocking the potential to fuel significant economic growth and boost productivity, scaling modern AI technologies to billions of people demands a prohibitive amount of computing capability, energy, and environmental resources.
AI's system infrastructure requirements are paramount: developing a foundation model requires from a few to hundreds of thousands of training accelerators, and many more accelerators are needed for inference deployments serving real-time queries at millisecond latencies. Depending on the machine learning task, the number of GPUs used per model can vary dramatically. For example, training a state-of-the-art text-to-image/video generation model may use 14 times more GPUs per model parameter than large language models (LLMs) for industry-scale use cases [24]. Optimizing efficiency across the entire deep learning model development cycle is particularly important now to scale modern AI cost-effectively and to sustainably accelerate the realization of artificial general intelligence.


II Understanding the Carbon Impact of AI
To scale AI sustainably, we must understand the carbon impact of AI quantitatively across its lifecycle. AI's overall lifecycle carbon footprint comes from two sources: manufacturing (the embodied carbon footprint), the emissions from building the infrastructure and hardware used for AI, and product use (the operational carbon footprint), the emissions from running AI workloads.
Operational Carbon Footprint: Training Figure 2(a) presents the operational carbon footprint of model training for key open-source AI technologies: GPT-3 [14], DLRM and Universal LM [61], GLaM, BLOOM-175B [21], NLLB [57], and Llama-65/33/13/7B [58]. Training a universal language translation model (LM) produces 45.2 metric tons of carbon emissions, whereas training a state-of-the-art deep learning recommendation model (DLRM), on average, produces over 4 times the carbon emissions of the language model [61]. We continue to observe that newer generations of machine learning models come with better model quality. For example, Meta's Llama3-70B, a pre-trained and instruction-tuned generative text model, comes with more than 6 times the training carbon footprint of Llama2-70B. The higher training carbon footprint is due to a larger training corpus and an increased context window size, which yield higher machine learning capabilities. However, it is also important to note that the training carbon footprint of open-source models, such as the family of Llama models, can be amortized more effectively over all use cases worldwide, significantly reducing the energy demand and environmental footprint of individual entities repeating the development of their own models. In addition to the open-source AI models, Figure 2(b) presents the cumulative resource usage for training the top three DLRM tasks, normalized to that of an LLM. DLRM tasks demand up to 3 times more training resources than the LLM.
Operational Carbon Footprint: Inference Once a model is trained, it is further optimized for deployment, incurring additional carbon footprint. When inference is taken into account, the operational carbon footprint of AI across the model development cycle can increase by 2-3 times for the DLRM and LM tasks [61]. The exact ratio depends on the need for and frequency of new model development.
Embodied Carbon Footprint AI's carbon footprint goes beyond operational energy use. Systems used for model training and deployment carry manufacturing carbon emissions that are embedded in the hardware used for AI. Figure 3(a) illustrates the carbon footprint breakdown for the multi-lingual language translation model (Universal LM in Figure 2(a)): the ratio of operational to embodied carbon footprint is approximately 2 to 1. Optimizing the operational and/or embodied carbon footprint can each translate into improvement in the overall lifecycle carbon impact of AI.
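The operational/embodied split above can be sketched as a simple accounting identity: operational emissions are energy used times the grid's carbon intensity, while embodied emissions are amortized over the hardware's lifetime. A minimal illustration, with all numbers hypothetical:

```python
# Hypothetical sketch of AI lifecycle carbon accounting following the
# operational + embodied split described above. All figures are illustrative.

def operational_co2(energy_kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Operational emissions: energy used times carbon intensity of electricity."""
    return energy_kwh * grid_intensity_kg_per_kwh

def amortized_embodied_co2(hardware_embodied_kg: float,
                           job_hours: float,
                           hardware_lifetime_hours: float) -> float:
    """Embodied emissions attributed to one job, amortized over hardware lifetime."""
    return hardware_embodied_kg * (job_hours / hardware_lifetime_hours)

def lifecycle_co2(energy_kwh, grid_intensity, embodied_kg, job_hours, lifetime_hours):
    return (operational_co2(energy_kwh, grid_intensity)
            + amortized_embodied_co2(embodied_kg, job_hours, lifetime_hours))

# Illustrative: a training run drawing 50 MWh on a 0.3 kgCO2e/kWh grid, on
# hardware with 10 tCO2e embodied carbon, used 30 days of a 4-year lifetime.
total = lifecycle_co2(50_000, 0.3, 10_000, 30 * 24, 4 * 365 * 24)
print(f"{total / 1000:.1f} tCO2e")
```

Real accounting must also choose an amortization policy (by time, utilization, or energy share), which materially changes how embodied carbon is attributed to individual models.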


Embodied carbon in hardware has become one of the most significant sources of carbon footprint for warehouse-scale computing infrastructures [44, 30]. The carbon footprint of datacenter construction and hardware manufacturing increased from 41% to over 60% of Meta's Scope 3 Greenhouse Gas (GHG) emissions between 2019 and 2022. The Scope 3 GHG emission category is similarly a significant factor for Google's computing infrastructures [26].
For battery-operated consumer electronics, the stringent low-power and low-latency requirements of smartphones mean that significant effort has been put into optimizing operational efficiency. This has led to an ever-increasing number of accelerators, both general-purpose programmable and domain-specific, in smartphone hardware [34]. Using advanced semiconductor manufacturing technologies further improves operational efficiency. However, more accelerators translate into larger dies, and more advanced semiconductor technologies translate into higher manufacturing GHG emissions [7]. The embodied carbon footprint is the dominant source of such a system's overall lifecycle emissions. Taking the iPhone 15 Pro, released in 2023, as an example, the embodied carbon footprint accounts for 83% of the product's lifecycle emissions and the operational carbon footprint for only 15%, with the remaining 2% from transportation and end-of-life processing [5]. This carbon footprint breakdown highlights the hidden cost of dark silicon: embodied carbon emissions.
III Efficiency and Beyond
To reduce the significant carbon impact of AI, we have much to learn from Barroso's foundational work on energy proportionality for computing. To bend the ever-increasing resource demand of AI, we must accelerate efficiency optimization across all layers of the AI system stack. At the same time, key differences exist between strictly optimizing for energy efficiency or total cost of ownership (TCO) and optimizing for carbon footprint.
Where carbon and TCO align, hardware-software co-design and optimization can effectively reduce AI's operational energy and carbon footprint. Taking the multi-lingual language model as an example, a combination of data locality optimization, GPU acceleration, low-precision data formats, and algorithmic optimization can bring over 800 times energy efficiency improvement [61]. Efficiency optimization across the following key dimensions leads to multiplicative improvement in the operational carbon impact of AI.
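The multiplicative nature of these gains is what makes an 800x-scale improvement reachable: independent optimizations at different layers compound. A toy sketch, with all individual factors hypothetical and chosen only for illustration:

```python
# Illustrative sketch: efficiency gains across stack layers compound
# multiplicatively. The per-layer factors below are made up, chosen only to
# show how independent optimizations can reach an 800x-scale combined gain.
import math

gains = {
    "data locality optimization": 2.0,
    "GPU acceleration": 10.0,
    "low-precision data formats": 4.0,
    "algorithmic optimization": 10.5,
}

combined = math.prod(gains.values())
print(f"combined efficiency gain: {combined:.0f}x")  # 840x with these factors
```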
• Data: AI models train on massive amounts of data. Model performance increases with data scale, following a power-law scaling [37]. Recent works have shown that power-law scaling can be significantly improved if data is selected for model training; furthermore, the benefits of data pruning grow as data sizes increase [55, 54]. When designed well, data scaling, sampling, and curation strategies can result in substantial hardware resource efficiency improvement while achieving faster training times and higher model quality. Complementary to AI data optimization, the data storage and ingestion pipeline for AI demands significant power capacity [64]. An optimized composite data storage infrastructure using novel application-aware cache policies can absorb more than 3 times the IO of a baseline LRU flash cache, reducing power demand in a petabyte-scale production AI training cluster by 29% [65].
Figure 4: (a) Across the key generative AI tasks, spanning language, image, video, and speech, Attention, Feedforward Network (FFN), and Convolution operations are the targets for timing performance optimization. Compared to deep learning recommendation models, which contributed 79% of AI inference cycles in Meta's datacenter fleet in 2018 [31], we expect a shift in where time is spent from embedding to the other key operators as generative AI tasks become increasingly deployed at scale. (b) Communication contributes significantly to overall model training time. Optimizing communication is increasingly important as compute is improved through model-hardware co-design.
• Models: Parameter-efficient machine learning models can contribute to significant carbon footprint reduction. Taking the family of foundational language models, Llama [58], as an example, Llama-13B outperforms GPT-3 (175B) on a variety of tasks while consuming approximately 24 times less energy. As a parameter-efficient model for language tasks, Llama is superior across the key design dimensions of accuracy, training time, energy, and carbon footprint.
• Systems: Systems designed to efficiently accelerate common computation primitives for AI further reduce operational energy and carbon footprint. Figure 4(a) illustrates that the Attention module [59] is a common operator across generative AI tasks, from language [58] to image, video, and speech [40, 24]. Accelerating the execution of Attention via new system innovations, such as FlashAttention [17], can translate into substantial latency and energy footprint reductions.
As the performance of GPUs continues to improve, overall model execution time will increasingly depend on the effectiveness of the communication collectives and the underlying networking technologies. Figure 4(b) illustrates that, across Deep Learning Recommendation Models (DLRMs), LLMs, and Multi-Modal (MM) AI technologies, communication optimization can also lead to substantial training throughput improvement [35]. However, we must balance operational efficiency improvement against the embodied carbon stemming from newer generations of hardware to optimize total lifecycle emissions.
• Datacenters: As machine learning model training scales out to tens of thousands of AI accelerators housed in warehouse-scale computing infrastructures, it can benefit from hyperscale datacenters that leverage economies of scale. For example, more efficient cooling and power delivery solutions can be deployed at the datacenter scale, shared by all hardware in the same datacenter building, instead of server- or blade-level cooling. As model training continues to scale out, we must design datacenters with reliability and fault tolerance in mind. Failures in hardware, be it GPUs, memory, storage, or network equipment, directly impact the lifecycle emissions of AI. In particular, failure probability can increase exponentially as the scale of model training increases. While reliability through hardware redundancy comes with additional embodied carbon, the lifecycle emissions can be a net win when factoring in the operational efficiency improvement.
• Energy: Minimizing the carbon intensity of the electricity powering computing infrastructures is another lever to reduce the carbon impact of AI. When spare computing capacity is available, mapping computations, such as model training or AI-powered services, to where renewable energy is abundant can lower the environmental impact. However, this comes with unique design challenges. For model training, data protection regulations mean AI datasets may not always be available in datacenters located in different geographic regions; for inference with real-time latency constraints, product quality may be impacted. Ultimately, if electricity from the power grid is clean, it benefits datacenters and all other consumers, but that must be realized with minimal manufacturing carbon and in a cost-effective way [1].
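The energy lever above amounts to a constrained placement problem: prefer the lowest-carbon region that still satisfies data-residency rules. A minimal sketch, with region names and carbon intensities entirely made up:

```python
# Hypothetical sketch of carbon-aware placement: run a delay-tolerant training
# job in the candidate region with the lowest-carbon electricity, subject to a
# data-residency constraint. Region names and intensities are illustrative.

def pick_region(regions, dataset_allowed):
    """regions: {name: carbon intensity in gCO2e/kWh}. Returns the greenest
    region where the training dataset is allowed to reside."""
    candidates = {r: ci for r, ci in regions.items() if r in dataset_allowed}
    if not candidates:
        raise ValueError("no region satisfies the data-residency constraint")
    return min(candidates, key=candidates.get)

regions = {"us-east": 390, "nordics": 40, "asia-south": 630}
# Data-protection rules may exclude the otherwise-greenest region:
print(pick_region(regions, dataset_allowed={"us-east", "asia-south"}))  # us-east
```

The same skeleton extends to inference, where a latency bound on each candidate region becomes an additional filter before the minimization.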
That said, there are notable instances where TCO-neutral or TCO-negative design decisions are actually positive from a carbon perspective, such as the choice of whether and how much to leverage clean energy, battery storage, time-shifting compute, or other tactics. Therefore, expanding our approach to balance these competing interests is key.
Sustainability means "[d]evelopment that meets the needs of the present without compromising the ability of future generations to meet their own needs," as defined in Our Common Future by the United Nations in 1987. Despite the importance of optimizing energy efficiency across AI model development and hardware life cycles, AI's resource demand is growing rapidly. While algorithmic efficiency and domain-specific hardware systems can improve the operational energy footprint of AI model training by more than 90% [61, 52], efficiency improvement has encouraged higher use, leading to ever-increasing computing resource consumption. This phenomenon is known as Jevons paradox. Thus, to scale AI technologies sustainably, we must understand this rebound effect's implications for the power grid infrastructure and the environment.
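The rebound effect can be made concrete with back-of-the-envelope arithmetic: a 90% per-job energy reduction is outweighed whenever usage grows by more than 10x. The growth numbers below are illustrative only:

```python
# Back-of-the-envelope sketch of the Jevons-paradox rebound effect described
# above. All growth numbers are hypothetical.

energy_per_job = 1.0          # normalized baseline energy per training job
efficiency_reduction = 0.90   # 90% less energy per job after optimization
usage_growth = 20.0           # 20x more jobs run once training is cheaper

before = energy_per_job * 1.0
after = energy_per_job * (1 - efficiency_reduction) * usage_growth
print(f"total energy vs. baseline: {after / before:.0f}x")  # 2x despite 90% savings
```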
• Power capacity is the limiting factor at scale. The rapid growth in computing is putting significant stress on power grids. Some grids, such as EirGrid in Ireland, may not have the energy production capacity to source large datacenter computing loads [36]. Datacenters used around 4,000 GWh of electricity in Ireland in 2021, corresponding to 14% of the country's total electricity use [15], and datacenter electricity demand is likely to grow to 30% of Ireland's annual electricity supply by 2029 [23]. For power grids that can support the large power capacity requirements of datacenters, doing so cost-effectively and environmentally sustainably is a key challenge. In northern Virginia, home to the largest concentration of datacenters in the world, Dominion Energy (the Virginia Electric and Power Company) considered grid infrastructure designs that balance cost and carbon emissions [20]. It does so by considering a wide variety of energy generation options, from building renewable energy and storage for the power grid to extending the life of coal and natural gas generation infrastructure, while also considering small modular nuclear reactors [10].
• Embodied carbon is becoming the dominant source of the lifecycle carbon footprint of computing. From the iPhone 3 (2008) to the iPhone 15 (2023), the share of lifecycle carbon emissions due to hardware manufacturing increased from 49% to 83%: the operational carbon footprint decreased by 2.39 times while the manufacturing carbon footprint increased by 2.2 times. Reducing the embodied carbon stemming from more complex hardware designs and advanced semiconductor manufacturing is key to sustainable computer systems [12]. Developing expandable hardware and software stacks that facilitate significantly longer lifetimes helps mitigate embodied carbon as well.
However, it is extremely challenging to optimize the lifecycle carbon emissions of AI in the presence of fast-evolving algorithmic innovations. Even though application-specific hardware comes with significant operational efficiency improvement, the operational carbon footprint improvement can be overshadowed by the additional embodied carbon emissions from manufacturing. An optimal hardware refresh period that minimizes lifecycle carbon emissions depends on the system hardware design and its operational efficiency benefits.
• Hundreds of millions of servers and other IT hardware in the cloud and billions of consumer electronics in the world will reach end-of-life in less than 10 years, with an average lifetime of 3-4 years for consumer electronics. Designing computer systems with modularity and right-to-repair in mind can lead to effective e-waste reduction and upcycling opportunities. Modular systems enable component-level upgrades without having to decommission the system in its entirety, reducing the overall planetary impact. Designing systems with repairability in mind increases product lifetime, leading to better amortization of the embodied carbon footprint. Finally, mining raw materials and metals, such as copper, lithium, silver, gold, nickel, and aluminum, for electronics produces significant GHG emissions. Extracting precious metals, such as gold, from old electronic circuit boards has already been demonstrated with significant business potential [39], while upcycling aluminum for, e.g., future iPhones reduces the product's overall lifecycle emissions as well as the amount of e-waste from consumer electronics.
IV Looking Forward
Understanding state-of-the-art carbon emission quantification methodologies based on the Greenhouse Gas (GHG) Protocol [28] and Life Cycle Assessment (LCA) [6] is an important first step. The GHG Protocol defines an accounting standard for the carbon emissions and equivalents (CO2e) of a company. There are three GHG categories: Scope 1 (direct emissions), Scope 2 (indirect emissions from purchased energy), and Scope 3 (indirect emissions from upstream/downstream activities).


From the perspective of a datacenter operator, Scope 1 emissions come from fuel combustion in offices and datacenters, Scope 2 emissions come from purchased energy produced at the respective power grids, and Scope 3 emissions come from all other activities, such as IT hardware manufacturing and datacenter construction (Figure 5(a)). Here, the operational footprint of a datacenter is the sum of its Scope 1 and Scope 2 emissions, whereas the embodied carbon footprint is the part of its Scope 3 emissions from datacenter construction and hardware manufacturing. Based on publicly-available sustainability reports using the GHG protocol, in 2022, 67.9%, 47.4%, 40.1%, and 51.7% of the datacenter carbon emissions of Meta [44], Google [26], Microsoft [46], and Oracle [50], respectively, came from datacenter construction and hardware manufacturing. Here, embodied carbon emissions account for the emissions from Capital Goods, Upstream Transportation and Distribution, and Purchased Goods and Services.
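The scope arithmetic above reduces to a small aggregation: operational footprint is Scope 1 plus Scope 2, and the embodied footprint is the construction and hardware-manufacturing share of Scope 3. A sketch with entirely hypothetical figures:

```python
# Sketch of datacenter-operator GHG accounting as described above:
# operational = Scope 1 + Scope 2; embodied = the construction and
# hardware-manufacturing share of Scope 3. All figures are hypothetical.

emissions = {  # tCO2e per category
    "scope1_fuel": 5_000,
    "scope2_purchased_energy": 120_000,
    "scope3_capital_goods": 900_000,      # datacenter construction, hardware
    "scope3_purchased_goods": 450_000,
    "scope3_other": 150_000,
}

operational = emissions["scope1_fuel"] + emissions["scope2_purchased_energy"]
embodied = emissions["scope3_capital_goods"] + emissions["scope3_purchased_goods"]
total = operational + embodied + emissions["scope3_other"]

print(f"embodied share of total: {embodied / total:.1%}")
```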
In addition to organization-level analysis using the GHG protocol, LCA is a standard practice used to evaluate the environmental impact of individual hardware products. There are four major phases over a product's life: Manufacturing, Transport, Use, and End-of-life Processing (Figure 5(b)). Using the LCA methodology, the operational footprint of a computer is the product of its energy consumption and the carbon intensity of electricity during the use phase, whereas the embodied carbon footprint covers procuring raw materials, manufacturing wafers, fabricating integrated circuits, and packaging and assembling the system. Based on publicly-available LCA reports, 49.8% of the lifecycle emissions of the Dell PowerEdge R740, a general-purpose rack server, comes from hardware manufacturing [19], 80% of which is due to IC manufacturing. In comparison, 83% of the lifecycle emissions of the iPhone 15 Pro (128GB) comes from hardware manufacturing, with 15% from product use and the remaining 2% from transportation and end-of-life processing. Having a framework for sustainability in computing elucidates the problem space.
Taking a step beyond this first-principles analysis, we need more granular, higher-quality carbon telemetry, datasets, and metrics to enable sustainability as a computer system design principle:
Carbon Telemetry: We cannot reduce what we cannot measure. We need high-fidelity tools to measure the lifecycle emissions of a computer system, covering both the operational and the embodied carbon footprint. However, it is challenging to do so systematically today: characterizing and analyzing carbon emissions is a complex process compared to performance measurement or power and energy modeling.
To enable and accelerate environmentally sustainable computing, recent research studies explored and built carbon modeling frameworks such as ACT [29], Carbon Explorer [1], GreenChip [38], and FOCAL [38]. Expanding upon these early carbon modeling research frameworks, imec.netzero (https://netzero.imec-int.com/) is developing an industry-grade carbon quantification framework to further advance semiconductor manufacturing and integrated circuit design by making carbon emissions and equivalents a first-class design principle alongside performance, power, and cost.
Carbon Dataset: At the datacenter fleet level, Meta started developing and scaling a first-of-its-kind carbon dataset based on the best available embodied carbon estimates at the scale of the hundreds of millions of components in Meta's datacenter hardware [44]. This dataset lays the foundation for embodied carbon reductions by enabling Meta to make data-driven decisions to lower value-chain carbon emissions. At the same time, a carbon dataset characterizing the manufacturing and operational carbon emissions of AWS EC2 instances is being developed [11]. Making more carbon datasets available for cloud computing helps advance the sustainability of the hardware-software ecosystem. Furthermore, having a dataset of the emissions embodied in hardware components opens up new opportunities for optimization, such as designing ML model architectures and hardware jointly to reduce an AI model's overall carbon footprint. Hardware-aware neural architecture search techniques can include embodied carbon costs as part of the multi-objective optimization search process.
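One way to picture folding embodied carbon into a hardware-aware search, as suggested above, is a scalarized multi-objective score. The candidate tuples, weights, and scoring function below are hypothetical, not from any real search:

```python
# Hypothetical sketch of including embodied carbon in a hardware-aware
# neural-architecture-search objective. Candidates and weights are made up.

def nas_score(accuracy, latency_ms, operational_kg, embodied_kg,
              w_acc=1.0, w_lat=0.01, w_carbon=0.001):
    """Higher is better: reward accuracy, penalize latency and total carbon."""
    return (w_acc * accuracy
            - w_lat * latency_ms
            - w_carbon * (operational_kg + embodied_kg))

candidates = [
    # (name, accuracy, latency ms, operational kgCO2e, embodied kgCO2e)
    ("wide-model", 0.82, 40.0, 900.0, 400.0),
    ("slim-model", 0.80, 15.0, 300.0, 250.0),
]
best = max(candidates, key=lambda c: nas_score(*c[1:]))
print(best[0])  # slim-model: slightly lower accuracy, far lower carbon
```

In practice one would report the full Pareto front rather than a single scalarized winner, since the weights encode a policy choice.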
Carbon Impact Disclosure: As AI's demand for computing and energy capacity increases, the impact on the environment can be significant. Measuring and reporting the carbon impact of AI is an important step toward responsible technology development [33]. When the carbon impact of AI is disclosed transparently, it drives progress. Just within the past three years, many key AI breakthroughs have been published with Carbon Impact Statements, e.g., Hugging Face's BLOOM (176B) [42], Meta's Open Pre-trained Transformer (OPT) language model [63], the Llama2 Open Foundation and Fine-Tuned Chat Models [58], Meta's No Language Left Behind (NLLB) machine translation model [57], and Google's GLaM [51]. Carbon impact assessment and disclosure, once ingrained in the AI technology development process, drives responsible innovation.


Carbon Metrics: Many carbon-related metrics have been explored to align sustainability optimization with computer system design and management. CFE (Carbon-Free Energy) is used to guide Google's 24/7 carbon-free datacenter computing, whereas Exergy is proposed as a measure of the available energy in a system, used to guide the sustainability design space for computer systems [16]. In addition, new metrics, such as CDP (Carbon-Delay Product), CEP (Carbon-Energy Product) [29], and tCDP (total Carbon-Delay Product) [22], are proposed to align design and optimization. Knowing which carbon metric to use, and when, is still at a nascent stage, although it is intuitive that optimizing carbon directly is more effective than optimizing energy, since carbon metrics measure emission impacts directly, unlike energy-based metrics such as megawatt-hours (MWh).
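The product-style metrics named above can be read schematically as follows; these are simplified readings of CDP and CEP for illustration, and the cited papers should be consulted for the precise formulations:

```python
# Schematic sketch of carbon-product metrics. Definitions are simplified
# readings of CDP and CEP; see the cited papers for exact formulations.

def cdp(carbon_kg: float, delay_s: float) -> float:
    """Carbon-Delay Product: penalizes designs that are slow *and* dirty."""
    return carbon_kg * delay_s

def cep(carbon_kg: float, energy_kwh: float) -> float:
    """Carbon-Energy Product: couples emissions with energy consumption."""
    return carbon_kg * energy_kwh

# Two hypothetical designs with a carbon/latency trade-off:
a = cdp(carbon_kg=10.0, delay_s=100.0)  # 1000.0
b = cdp(carbon_kg=14.0, delay_s=60.0)   # 840.0: lower CDP despite more carbon
print(min(("A", a), ("B", b), key=lambda t: t[1])[0])  # B
```

As the example shows, a product metric can prefer a higher-carbon but much faster design, which is exactly the kind of trade-off choosing a metric encodes.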
From the perspective of power grids, a key factor in the carbon intensity of the energy produced is which energy source is on the margin: the marginal generation unit. If wind generators are on the margin, additional demand translates to a low carbon emission rate; on the other hand, if a coal unit is on the margin, any additional energy demand on the grid comes with higher carbon emissions. The Marginal Emission Rate (MER) is defined as the emission rate of the marginal generation unit at a given location and time. To minimize carbon emissions, the power grid must minimize the likelihood of activating generation units of high carbon intensity, thus lowering MER. In contrast, the Average Emission Rate (AER), which simply measures the average emissions across all generator units, is more commonly used for carbon optimization.
Figure 6 shows the difference between the MER and AER metrics at two different locations. First, MER is higher in magnitude than AER. This is important, as it may change the cost/benefit trade-offs in multi-objective optimization. Second, the standard deviation of MER is larger than that of AER at CAISO. This is a result of solar energy generation, which is much higher in CAISO than in PJM during daytime hours. The variability in MER shows the importance of coping with intermittent renewable energy generation, which AER does not reflect as significantly.
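The AER/MER distinction can be sketched with a toy dispatch: AER averages over all running generators, while MER is the emission rate of the single unit that would serve the next increment of demand. The generator data and the marginal-unit rule below are made up and greatly simplify real economic dispatch:

```python
# Illustrative sketch of AER vs. MER. Generator data is hypothetical, and
# picking the marginal unit as the highest-marginal-cost dispatched generator
# is a simplification of real economic dispatch.

generators = [
    # (name, output MWh, emission rate tCO2e/MWh, marginal cost $/MWh)
    ("wind", 500.0, 0.00, 0.0),
    ("gas", 300.0, 0.45, 40.0),
    ("coal", 200.0, 0.95, 30.0),
]

total_mwh = sum(g[1] for g in generators)
aer = sum(g[1] * g[2] for g in generators) / total_mwh  # generation-weighted mean

# The marginal unit: the dispatched generator that would ramp for extra load.
marginal = max(generators, key=lambda g: g[3])
mer = marginal[2]

print(f"AER = {aer:.3f}, MER = {mer:.2f} tCO2e/MWh")  # MER > AER here
```

Consistent with the figure discussion, MER exceeds AER in this toy grid because the zero-carbon wind output pulls the average down while a fossil unit sits on the margin.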
While which metric to use for demand-response scheduling or renewable energy procurement strategies is still an open question for at-scale deployment, it is clear that coordination between datacenters and the grid is necessary to achieve a clean energy future in a cost-effective and reliable way [32].

Sustainability as a Computer System Design Principle: Figure 7 depicts sources of carbon footprint improvement for warehouse-scale computing. Carbon telemetry, high-quality carbon datasets, and metrics open new design and optimization opportunities across the system stack, from workload management in warehouse-scale computing infrastructures, programming languages, and runtime management to system architecture and hardware design. At the datacenter level, what new features need to be introduced to WSC infrastructures for datacenter operators to cooperate with power grids and reduce their operational carbon footprint? Recent endeavors are already exploring carbon-aware datacenter design and management [2], with advanced planning [62, 41] or in real time [53].
In addition, we envision significant potential for operational carbon footprint reduction in datacenter computing if the delay tolerance of computations is made visible. At a coarser granularity, services can be designed with feature tiers in mind. By embedding modularity into cloud services, the power consumption of services at the datacenter scale can be modulated seamlessly depending on power capacity availability or the carbon intensity of electricity. At a finer granularity, application software can be designed with energy and carbon in mind to enable design space trade-offs between execution time performance and carbon emissions, such as in Treehouse [4].
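When delay tolerance is visible, the carbon win comes from time-shifting: a job that can wait runs in the lowest-carbon window inside its deadline. A minimal sketch, with a made-up hourly carbon-intensity trace for a solar-heavy grid:

```python
# Hypothetical sketch of delay-tolerant, carbon-aware scheduling: pick the
# start hour minimizing total carbon within a deadline. The hourly intensity
# trace (gCO2e/kWh) is made up to mimic a mid-day solar dip.

def schedule(intensity_by_hour, duration_h, deadline_h):
    """Return (start hour, total intensity) minimizing carbon by the deadline."""
    best_start, best_carbon = None, float("inf")
    for start in range(0, deadline_h - duration_h + 1):
        carbon = sum(intensity_by_hour[start:start + duration_h])
        if carbon < best_carbon:
            best_start, best_carbon = start, carbon
    return best_start, best_carbon

trace = [520, 510, 480, 300, 160, 120, 110, 180, 320, 470, 500, 520]
start, carbon = schedule(trace, duration_h=3, deadline_h=12)
print(start)  # 4: the 160 + 120 + 110 mid-day window
```

A production scheduler would additionally weigh queueing delay, capacity limits, and MER-style marginal signals rather than average intensity alone.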
The design space that treats carbon as a first-class optimization principle brings new challenges.
• Higher energy efficiency does not always translate into lower carbon emissions. The carbon intensity of electricity varies over time and depends on how electricity is generated at power grids in different geographic regions.
• The supply chain of electronics manufacturing spans many geographic regions with distinct energy generation characteristics that vary over time, leading to a wide design space for greener semiconductor manufacturing.
• Optimizing the efficiency of the semiconductor manufacturing process and improving the chemical gas abatement process are effective ways to further reduce the embodied carbon emissions of AI systems.
V Conclusion
To ensure an environmentally-sustainable growth of AI, we must focus on efficiency optimization holistically across the entire AI system stack: data, algorithms and models, systems, and infrastructures at scale. Efficiency optimization helps bend the ever-increasing resource demands of AI. AI also plays an important role in the solution space, demonstrating significant potential to discover new catalysts addressing energy storage efficiency challenges [66], to unlock the potential of renewable energy generation [3], to accelerate the discovery of greener chemical gases for semiconductor manufacturing abatement, and more. Furthermore, a circular economy for computing [49], from consumer electronics to infrastructures at scale, that supports the sustainability principles of reduce, reuse, repair, and recycle would deliver a step-function change in reducing computing's environmental impact. Looking forward, sustainability for computing means more than delivering first-class computing capabilities with minimal impact on the environment: large-scale computing infrastructures also need to become more reliable, secure, and resilient to extreme weather events [60]. It is upon us, each and every one of us, to contribute to a sustainable future for computing and society.
Acknowledgements
This work is based on the experience and key lessons learned during the Green AI journey that many colleagues at Meta have contributed to. The original paper on Sustainable AI: Environmental Implications, Challenges and Opportunities was published at the 2022 Conference on Machine Learning and Systems [61]. In addition, we would like to highlight the important efficiency research being carried out across the overall AI model development cycle by Newsha Ardalani, Zachary DeVito, Mostafa Elhoushi, Jeff Johnson, Samuel Hsia, Basil Hosmer, Michael Kuchnik, and Yejin Lee. We would also like to thank our partners: Justin Meza, Raghu Prabhu, Thote Gowda, Katherine Hurrell, Ricky Ghoshroy, Bruce McLeish, Brian White, Janaki Vamaraju, Alex Bruefach, Tobias Tiecke, Jordan Tse, Frances Amatruda, Sylvia Lee, Nikky Avila, Holly Lahd, Aynsley Kretschmar, Urvi Parekh, Brent Morgan, Peter Freed, and Jeff Bladen for their inputs and collaboration on energy analytics and carbon emission modeling; Doug Carmean and Larry Zitnik for brainstorming optimization opportunities at the intersection of AI, computing, and sustainability; and Joelle Pineau for her leadership and vision, without which this work would not have been possible.
References
- [1] B. Acun, B. Lee, F. Kazhamiaka, K. Maeng, U. Gupta, M. Chakkaravarthy, D. Brooks, and C.-J. Wu, “Carbon explorer: A holistic framework for designing carbon aware datacenters,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2023.
- [2] B. Acun, B. Lee, F. Kazhamiaka, A. Sundarrajan, K. Maeng, M. Chakkaravarthy, D. Brooks, and C.-J. Wu, “Carbon dependencies in datacenter design and management,” SIGENERGY Energy Inform. Rev., vol. 3, no. 3, 2023.
- [3] B. Acun, B. Morgan, H. Richardson, N. Steinsultz, and C.-J. Wu, “Unlocking the potential of renewable energy through curtailment prediction,” in NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning, 2023. [Online]. Available: https://www.climatechange.ai/papers/neurips2023/123
- [4] T. Anderson, A. Belay, M. Chowdhury, A. Cidon, and I. Zhang, “Treehouse: A case for carbon-aware datacenter software,” SIGENERGY Energy Inform. Rev., 2023.
- [5] Apple, “Product Environmental Report iPhone 15 Pro and iPhone 15 Pro Max,” https://www.apple.com/environment/pdf/products/iphone/iPhone_15_Pro_and_iPhone_15_Pro_Max_Sept2023.pdf, 2023.
- [6] R. U. Ayres, “Life cycle analysis: A critique,” Resources, Conservation and Recycling, vol. 14, no. 3, pp. 199–223, 1995.
- [7] M. Bardon, P. Wuytens, L.-A. Ragnarsson, G. Mirabelli, D. Jang, G. Willems, A. Mallik, A. Spessot, J. Ryckaert, and B. Parvais, “DTCO including sustainability: Power-performance-area-cost-environmental score (PPACE) analysis for logic technologies,” in Proceedings of the IEEE International Electron Devices Meeting, 2020.
- [8] L. A. Barroso, “A brief history of warehouse-scale computing,” IEEE Micro, vol. 41(02), pp. 78–83, 2021.
- [9] L. A. Barroso and U. Hölzle, “The case for energy-proportional computing,” IEEE Computer, vol. 40, 2007.
- [10] M. Barthel, “Northern virginia’s data center industry is booming. but is it sustainable?” https://dcist.com/story/23/09/01/northern-virginia-data-center-report/, 2023.
- [11] B. Davy, “Building an AWS EC2 carbon emissions dataset,” https://medium.com/teads-engineering/building-an-aws-ec2-carbon-emissions-dataset-3f0fd76c98ac, 2021.
- [12] D. S. Berger, D. Brooks, F. Kazhamiaka, M. D. Hill, R. Bianchini, C.-J. Wu, K. Strauss, K. Frost, J. Wang, K. Martins, S. Gillett, E. Choukse, D. Ernst, R. Fonseca, K. Lio, B. Narayanasetty, P. Patel, C. Irvene, A. Sriraman, G. Porter, A. Jones, U. Gupta, B. Acun, K. Hazelwood, and D. Carmean, “Reducing embodied carbon is important,” https://www.sigarch.org/reducing-embodied-carbon-is-important/, 2023.
- [13] T. Brooks, B. Peebles, C. Holmes, W. DePue, Y. Guo, L. Jing, D. Schnurr, J. Taylor, T. Luhman, E. Luhman, C. Ng, R. Wang, and A. Ramesh, “Video generation models as world simulators,” OpenAI Research, 2024. [Online]. Available: https://openai.com/research/video-generation-models-as-world-simulators
- [14] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” 2020.
- [15] J. Campbell, “Data centres used 14% of republic of ireland’s electricity use,” https://www.bbc.com/news/world-europe-61308747, 2022.
- [16] J. Chang, J. Meza, P. Ranganathan, A. Shah, R. Shih, and C. Bash, “Totally green: evaluating and designing servers for lifecycle environmental impact,” in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, 2012.
- [17] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, and C. Ré, “FlashAttention: Fast and memory-efficient exact attention with IO-awareness,” 2022.
- [18] J. Dean and L. A. Barroso, “The tail at scale,” Communications of the ACM, vol. 56, pp. 74–80, 2013.
- [19] Dell, “Life cycle assessment of dell poweredge r740,” https://corporate.delltechnologies.com/content/dam/digitalassets/active/en/unauth/data-sheets/products/servers/lca_poweredge_r740.pdf, 2019.
- [20] Dominion Energy, “Planning and investing for our future,” https://www.dominionenergy.com/-/media/pdfs/global/company/2021-de-integrated-resource-plan.pdf, 2021.
- [21] N. Du, Y. Huang, A. M. Dai, S. Tong, D. Lepikhin, Y. Xu, M. Krikun, Y. Zhou, A. W. Yu, O. Firat, B. Zoph, L. Fedus, M. Bosma, Z. Zhou, T. Wang, Y. E. Wang, K. Webster, M. Pellat, K. Robinson, K. Meier-Hellstern, T. Duke, L. Dixon, K. Zhang, Q. V. Le, Y. Wu, Z. Chen, and C. Cui, “GLaM: Efficient scaling of language models with mixture-of-experts,” 2022.
- [22] M. Elgamal, D. Carmean, E. Ansari, O. Zed, R. Peri, S. Manne, U. Gupta, G.-Y. Wei, D. Brooks, G. Hills, and C.-J. Wu, “Carbon-efficient design optimization for computing systems,” in Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023.
- [23] R. Galvin, “Data centers are pushing ireland’s electric grid to the brink,” https://gizmodo.com/data-centers-are-pushing-ireland-s-electric-grid-to-the-1848282390, 2021.
- [24] A. Golden, S. Hsia, F. Sun, B. Acun, B. Hosmer, Y. Lee, Z. DeVito, J. Johnson, G.-Y. Wei, D. Brooks, and C.-J. Wu, “Generative AI beyond LLMs: System implications of multi-modal generation,” in IEEE International Symposium on Performance Analysis of Systems and Software, 2024.
- [25] M. Gooding, “Global data center electricity use to double by 2026 - IEA report,” https://www.datacenterdynamics.com/en/news/global-data-center-electricity-use-to-double-by-2026-report/, 2024.
- [26] Google, “2023 environmental report,” https://sustainability.google/reports/google-2023-environmental-report/.
- [27] Google, “Google Data Centers Efficiency,” https://www.google.com/about/datacenters/efficiency/.
- [28] Greenhouse Gas Protocol, “The greenhouse gas protocol: A corporate accounting and reporting standard,” https://ghgprotocol.org/corporate-standard.
- [29] U. Gupta, M. Elgamal, G. Hills, G.-Y. Wei, H.-H. S. Lee, D. Brooks, and C.-J. Wu, “ACT: Designing sustainable computer systems with an architectural carbon modeling tool,” in Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022.
- [30] U. Gupta, Y. G. Kim, S. Lee, J. Tse, H.-H. S. Lee, G.-Y. Wei, D. Brooks, and C.-J. Wu, “Chasing Carbon: The elusive environmental footprint of computing,” in 2021 IEEE International Symposium on High-Performance Computer Architecture, 2021.
- [31] U. Gupta, C.-J. Wu, X. Wang, M. Naumov, B. Reagen, D. Brooks, B. Cottel, K. Hazelwood, M. Hempstead, B. Jia, H.-H. S. Lee, A. Malevich, D. Mudigere, M. Smelyanskiy, L. Xiong, and X. Zhang, “The architectural implications of Facebook’s DNN-based personalized recommendation,” in IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, pp. 488–501.
- [32] H. He, A. Derenchuk, R. Tabors, and A. Rudkevich, “A comparison of strategies for tackling corporate scope 2 carbon emissions,” https://tcr-us.com/paths-to-carbon-neutrality-tcr-white-paper.html, 2023.
- [33] P. Henderson, J. Hu, J. Romoff, E. Brunskill, D. Jurafsky, and J. Pineau, “Towards the systematic reporting of the energy and carbon footprints of machine learning,” J. Mach. Learn. Res., 2020.
- [34] M. D. Hill and V. J. Reddi, “Accelerator-level parallelism,” 2021.
- [35] S. Hsia, A. Golden, B. Acun, N. Ardalani, Z. DeVito, G.-Y. Wei, D. Brooks, and C.-J. Wu, “Mad max beyond single-node: Enabling large machine learning model acceleration on distributed systems,” 2023.
- [36] P. Judge, “EirGrid pulls plug on 30 irish data center projects,” https://www.datacenterdynamics.com/en/news/eirgrid-pulls-plug-on-30-irish-data-center-projects/, 2022.
- [37] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” 2020.
- [38] D. Kline, N. Parshook, X. Ge, E. Brunvand, R. Melhem, P. K. Chrysanthis, and A. K. Jones, “GreenChip: A tool for evaluating holistic sustainability of modern computing systems,” Sustainable Computing: Informatics and Systems, vol. 22, 2019.
- [39] A. Klotz, “Startups are raking in up to $85,000 per day by recycling gold and copper from electronics thrown in the trash — e-waste ’gold mining’ efforts are expanding,” https://www.tomshardware.com/pc-components/cpus/startups-are-raking-in-up-to-dollar85000-per-day-by-recycling-gold-and-copper-from-electronics-thrown-in-the-trash-e-waste-gold-mining-efforts-are-expanding, 2024.
- [40] Y. Lee, C.-J. Wu, C. Puhrsch, J. Schlosser, D. Guessous, J. Wan, J. Isaacson, C. Balioglu, and J. Pino, “Accelerating generative AI with PyTorch IV: Seamless M4T, fast,” https://pytorch.org/blog/accelerating-generative-ai-4/, 2024.
- [41] L. Lin and A. A. Chien, “Adapting datacenter capacity for greener datacenters and grid,” in Proceedings of the 14th ACM International Conference on Future Energy Systems, 2023.
- [42] A. S. Luccioni, S. Viguier, and A.-L. Ligozat, “Estimating the carbon footprint of BLOOM, a 176B parameter language model,” 2022.
- [43] E. Masanet, A. Shehabi, N. Lei, S. Smith, and J. Koomey, “Recalibrating global data center energy-use estimates,” Science, vol. 367, no. 6481, pp. 984–986, 2020.
- [44] Meta, “2023 Sustainability Report,” https://sustainability.fb.com/2023-sustainability-report/.
- [45] Meta, “Meta Sustainability – Data Centers,” https://sustainability.fb.com/report-page/data-centers/.
- [46] Microsoft, “The 2023 Impact Summary,” https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RW1f1Fv.
- [47] OpenAI, “DALL·E: Creating images from text,” 2021. [Online]. Available: https://openai.com/research/dall-e
- [48] OpenAI, “Introducing ChatGPT,” 2022. [Online]. Available: https://openai.com/blog/chatgpt
- [49] Open Compute Project, “OCP Sustainability: Call for Climate Action & Circularity for Information and Communications Technology (ICT) Industry,” https://www.opencompute.org/documents/ocp-sustainability-2021-industry-whitepaper-pdf, 2021.
- [50] Oracle, “Oracle Social Impact Datasheet,” https://www.oracle.com/a/ocom/docs/social-impact-datasheet.pdf, 2024.
- [51] D. Patterson, J. Gonzalez, U. Hölzle, Q. Le, C. Liang, L.-M. Munguia, D. Rothchild, D. R. So, M. Texier, and J. Dean, “The carbon footprint of machine learning training will plateau, then shrink,” Computer, 2022.
- [52] D. Patterson, J. Gonzalez, Q. Le, C. Liang, L.-M. Munguia, D. Rothchild, D. So, M. Texier, and J. Dean, “Carbon emissions and large neural network training,” in arXiv 2104.10350, 2021.
- [53] A. Radovanović, R. Koningstein, I. Schneider, B. Chen, A. Duarte, B. Roy, D. Xiao, M. Haridasan, P. Hung, N. Care, S. Talukdar, E. Mullen, K. Smith, M. Cottman, and W. Cirne, “Carbon-aware computing for datacenters,” IEEE Transactions on Power Systems, 2023.
- [54] N. Sachdeva, C.-J. Wu, and J. McAuley, “Svp-cf: Selection via proxy for collaborative filtering data,” 2021.
- [55] B. Sorscher, R. Geirhos, S. Shekhar, S. Ganguli, and A. S. Morcos, “Beyond neural scaling laws: beating power law scaling via data pruning,” in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=UmvSlP-PyV
- [56] Y. Sun, N. B. Agostini, S. Dong, and D. Kaeli, “Summarizing cpu and gpu design trends with product data,” ArXiv, vol. abs/1911.11313, 2019.
- [57] NLLB Team, M. R. Costa-jussà, J. Cross, O. Çelebi, M. Elbayad, K. Heafield, K. Heffernan, E. Kalbassi, J. Lam, D. Licht, J. Maillard, A. Sun, S. Wang, G. Wenzek, A. Youngblood, B. Akula, L. Barrault, G. M. Gonzalez, P. Hansanti, J. Hoffman, S. Jarrett, K. R. Sadagopan, D. Rowe, S. Spruit, C. Tran, P. Andrews, N. F. Ayan, S. Bhosale, S. Edunov, A. Fan, C. Gao, V. Goswami, F. Guzmán, P. Koehn, A. Mourachko, C. Ropers, S. Saleem, H. Schwenk, and J. Wang, “No language left behind: Scaling human-centered machine translation,” 2022.
- [58] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “Llama: Open and efficient foundation language models,” in arXiv 2302.13971, 2023.
- [59] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017. [Online]. Available: https://arxiv.org/pdf/1706.03762.pdf
- [60] C.-J. Wu, S. Manne, P. Ranganathan, S. Bird, and S. Greenstein, “Socio-technological challenges and opportunities: Paths forward,” 2021.
- [61] C.-J. Wu, R. Raghavendra, U. Gupta, B. Acun, N. Ardalani, K. Maeng, G. Chang, F. A. Behram, J. Huang, C. Bai, M. Gschwind, A. Gupta, M. Ott, A. Melnikov, S. Candido, D. Brooks, G. Chauhan, B. Lee, H.-H. S. Lee, B. Akyildiz, M. Balandat, J. Spisak, R. Jain, M. Rabbat, and K. Hazelwood, “Sustainable AI: Environmental implications, challenges and opportunities,” in Proceedings of Machine Learning and Systems, 2022.
- [62] J. Xing, B. Acun, A. Sundarrajan, D. Brooks, M. Chakkaravarthy, N. Avila, C.-J. Wu, and B. C. Lee, “Carbon responder: Coordinating demand response for the datacenter fleet,” 2023.
- [63] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin, T. Mihaylov, M. Ott, S. Shleifer, K. Shuster, D. Simig, P. S. Koura, A. Sridhar, T. Wang, and L. Zettlemoyer, “OPT: Open pre-trained transformer language models,” 2022.
- [64] M. Zhao, N. Agarwal, A. Basant, B. Gedik, S. Pan, M. Ozdal, R. Komuravelli, J. Pan, T. Bao, H. Lu, S. Narayanan, J. Langman, K. Wilfong, H. Rastogi, C.-J. Wu, C. Kozyrakis, and P. Pol, “Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product,” in Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022.
- [65] M. Zhao, S. Pan, N. Agarwal, Z. Wen, D. Xu, A. Natarajan, P. Kumar, S. S. P, R. Tijoriwala, K. Asher, H. Wu, A. Basant, D. Ford, D. David, N. Yigitbasi, P. Singh, and C.-J. Wu, “Tectonic-Shift: A composite storage fabric for Large-Scale ML training,” in 2023 USENIX Annual Technical Conference (USENIX ATC 23), 2023.
- [66] C. L. Zitnick, L. Chanussot, A. Das, S. Goyal, J. Heras-Domingo, C. Ho, W. Hu, T. Lavril, A. Palizhati, M. Riviere, M. Shuaibi, A. Sriram, K. Tran, B. Wood, J. Yoon, D. Parikh, and Z. Ulissi, “An introduction to electrocatalyst design using machine learning for renewable energy storage,” in arXiv 2010.09435, 2020.