- ecologits
+ _ecologits
diff --git a/dev/search/search_index.json b/dev/search/search_index.json
index b497682..ae62d8a 100644
--- a/dev/search/search_index.json
+++ b/dev/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to EcoLogits","text":"EcoLogits tracks the energy consumption and environmental impacts of using generative AI models through APIs. It supports major LLM providers such as OpenAI, Anthropic, Mistral AI and more (see supported providers).
"},{"location":"#requirements","title":"Requirements","text":"Python 3.9+
EcoLogits relies on key libraries to provide essential functionalities:
Pydantic for data modeling. Wrapt for function patching. "},{"location":"#installation","title":"Installation","text":"Select providers
Anthropic Cohere Google Gemini Hugging Face Inference Endpoints LiteLLM Mistral AI OpenAI
Run this command
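For example, assuming the optional extras are named after the providers listed above, installing with OpenAI support would look like this:
pip install ecologits[openai]\n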
For detailed instructions on each provider, refer to the complete list of supported providers and features. It is also possible to install EcoLogits without any provider.
"},{"location":"#usage-example","title":"Usage Example","text":"Below is a simple example demonstrating how to use the GPT-3.5-Turbo model from OpenAI with EcoLogits to track environmental impacts.
from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(f\"Energy consumption: {response.impacts.energy.value} kWh\")\nprint(f\"GHG emissions: {response.impacts.gwp.value} kgCO2eq\")\n
Environmental impacts are quantified based on four criteria and across two phases:
Criteria:
Energy (energy): Final energy consumption in kWh, Global Warming Potential (gwp): Potential impact on global warming in kgCO2eq (commonly known as GHG/carbon emissions), Abiotic Depletion Potential for Elements (adpe): Impact on the depletion of non-living resources such as minerals or metals in kgSbeq, Primary Energy (pe): Total energy consumed from primary sources in MJ. Phases:
Usage (usage): Represents the phase of energy consumption during model execution, Embodied (embodied): Encompasses resource extraction, manufacturing, and transportation phases associated with the model's lifecycle. Learn more about environmental impacts assessment in the methodology section.
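As a sketch of how these names map onto the response object (the energy and gwp paths appear in the example above; the adpe and pe paths are assumed to follow the same pattern):
# Criteria follow the attribute names listed above\nprint(f\"Energy: {response.impacts.energy.value} kWh\")\nprint(f\"GWP: {response.impacts.gwp.value} kgCO2eq\")\nprint(f\"ADPe: {response.impacts.adpe.value} kgSbeq\")  # assumed by analogy\nprint(f\"PE: {response.impacts.pe.value} MJ\")  # assumed by analogy\n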
"},{"location":"#license","title":"License","text":"This project is licensed under the terms of the Mozilla Public License Version 2.0 (MPL-2.0) .
"},{"location":"#acknowledgements","title":"Acknowledgements","text":"EcoLogits is actively developed and maintained by GenAI Impact non-profit. We extend our gratitude to Data For Good and Boavizta for supporting the development of this project. Their contributions of tools, best practices, and expertise in environmental impact assessment have been invaluable.
"},{"location":"contributing/","title":"Contribution","text":"Help us improve EcoLogits by contributing!
"},{"location":"contributing/#issues","title":"Issues","text":"Questions, feature requests and bug reports are all welcome as discussions or issues.
When submitting a feature request or bug report, please provide as much detail as possible. For bug reports, please include relevant information about your environment, including the version of EcoLogits and other Python dependencies used in your project.
"},{"location":"contributing/#pull-requests","title":"Pull Requests","text":"Getting started and creating a Pull Request is a straightforward process. Since EcoLogits is regularly updated, you can expect to see your contributions incorporated into the project within a matter of days or weeks.
For non-trivial changes, please create an issue to discuss your proposal before submitting a pull request. This ensures we can review and refine your idea before implementation.
"},{"location":"contributing/#prerequisites","title":"Prerequisites","text":"You'll need to meet the following requirements:
Python 3.9 or above git make poetry pre-commit "},{"location":"contributing/#installation-and-setup","title":"Installation and setup","text":"Fork the repository on GitHub and clone your fork locally.
# Clone your fork and cd into the repo directory\ngit clone git@github.com:<your username>/ecologits.git\ncd ecologits\n\n# Install ecologits development dependencies with poetry\nmake install\n
"},{"location":"contributing/#check-out-a-new-branch-and-make-your-changes","title":"Check out a new branch and make your changes","text":"Create a new branch for your changes.
# Checkout a new branch and make your changes\ngit checkout -b my-new-feature-branch\n# Make your changes and implement tests...\n
"},{"location":"contributing/#run-tests","title":"Run tests","text":"Run tests locally to make sure everything is working as expected.
make test\n
If you have added a new provider, you will need to record your tests with VCR.py through pytest-recording.
make test-record\n
Once your tests are recorded, please check that the newly created cassette files (located in tests/cassettes/...
) do not contain any sensitive information like API tokens. If they do, update the configuration accordingly in conftest.py
and run the recording command again.
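For example, pytest-recording reads a vcr_config fixture from conftest.py; a minimal sketch that strips secrets from recorded requests (the exact fixture in this repository may differ):
import pytest\n\n@pytest.fixture(scope=\"module\")\ndef vcr_config():\n    # Strip secrets from recorded requests before they are written to cassettes\n    return {\"filter_headers\": [\"authorization\", \"x-api-key\"]}\n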
"},{"location":"contributing/#build-documentation","title":"Build documentation","text":"If you've made any changes to the documentation (including changes to function signatures, class definitions, or docstrings that will appear in the API documentation), make sure it builds successfully.
# Build documentation\nmake docs\n# If you have changed the documentation, make sure it builds successfully.\n
You can also serve the documentation locally.
# Serve the documentation at localhost:8000\npoetry run mkdocs serve\n
"},{"location":"contributing/#code-formatting-and-pre-commit","title":"Code formatting and pre-commit","text":"Before pushing your work, run the pre-commit hook that will check and lint your code.
# Run all checks before commit\nmake pre-commit\n
"},{"location":"contributing/#commit-and-push-your-changes","title":"Commit and push your changes","text":"Commit your changes, push your branch to GitHub, and create a pull request.
Please follow the pull request template and fill in as much information as possible. Link to any relevant issues and include a description of your changes.
When your pull request is ready for review, add a comment with the message \"please review\" and we'll take a look as soon as we can.
"},{"location":"contributing/#documentation-style","title":"Documentation style","text":"Documentation is written in Markdown and built using Material for MkDocs. API documentation is build from docstrings using mkdocstrings.
"},{"location":"contributing/#code-documentation","title":"Code documentation","text":"When contributing to EcoLogits, please make sure that all code is well documented. The following should be documented using properly formatted docstrings.
We use Google-style docstrings formatted according to PEP 257 guidelines. (See Example Google Style Python Docstrings for further examples.)
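A minimal, illustrative example of the expected format (the function is a simplified sketch, not the actual API):
def compute_request_energy(server_energy: float, datacenter_pue: float) -> float:\n    \"\"\"\n    Compute the energy consumption of a request.\n\n    Args:\n        server_energy: Energy consumption of the server in kWh.\n        datacenter_pue: PUE of the datacenter.\n\n    Returns:\n        The energy consumption of the request in kWh.\n    \"\"\"\n    return datacenter_pue * server_energy\n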
"},{"location":"contributing/#documentation-style_1","title":"Documentation style","text":"Documentation should be written in a clear, concise, and approachable tone, making it easy for readers to understand and follow along. Aim for brevity while still providing complete information.
Code examples are highly encouraged, but should be kept short, simple and self-contained. Ensure that each example is complete, runnable, and can be easily executed by readers.
"},{"location":"contributing/#acknowledgment","title":"Acknowledgment","text":"We'd like to acknowledge that this contribution guide is heavily inspired by the excellent guide from Pydantic. Thanks for the inspiration!
"},{"location":"faq/","title":"Frequently Asked Questions","text":""},{"location":"faq/#why-are-training-impacts-not-included","title":"Why are training impacts not included?","text":"Even though the training impacts of generative AI models are substantial, we currently do not implement them in our methodologies and tools. EcoLogits is aimed at estimating the impacts of an API request made to a GenAI service. To make the impact assessment complete, we indeed should take into account training impacts. However, given that we focus on services that are used by millions of people, doing billions of requests annually the training impacts are in fact negligible.
For example, looking at Llama 3 70B, the estimated training greenhouse gas emissions are \(1,900\ tCO2eq\). This is significant for an AI model, but when compared to running inference on that model for, say, 100 billion requests annually, the share of impacts induced by training becomes very small: \(\frac{1,900\ \text{tCO2eq}}{100\ \text{billion requests}} = 1.9e-8\ \text{tCO2eq per request}\), or \(0.019\ \text{gCO2eq per request}\). Compare this to a single request to Llama 3 70B, which yields \(1\ \text{to}\ 5\ \text{gCO2eq}\) (calculated with our methodology).
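The same back-of-the-envelope arithmetic in Python:
training_emissions_g = 1_900 * 1e6  # 1,900 tCO2eq expressed in gCO2eq\nannual_requests = 100e9  # 100 billion requests\nprint(training_emissions_g / annual_requests)  # ~0.019 gCO2eq per request\n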
It does not mean that we do not plan to integrate training impacts; it is just not a priority right now due to the difference in orders of magnitude. It is also worth mentioning that estimating the number of requests that will ever be made over the lifespan of a model is very difficult, both for open-source and proprietary models. You can join the discussion on GitHub #70 .
"},{"location":"faq/#whats-the-difference-with-codecarbon","title":"What's the difference with CodeCarbon?","text":"EcoLogits and CodeCarbon are two different tools that do not aim to address the same use case. CodeCarbon should be used when you control the execution environment of your model. This means that if you deploy models on your laptop, your server or in the cloud it is preferable to use CodeCarbon to get energy consumption and estimate carbon emissions associated with running your model (including training, fine-tuning or inference).
On the other hand EcoLogits is designed for scenarios where you do not have access to the execution environment of your GenAI model because it is managed by a third-party provider. In such cases you can rely on EcoLogits to estimate energy consumption and environmental impacts for inference workloads. Both tools are complementary and can be used together to provide a comprehensive view of environmental impacts across different deployment scenarios.
"},{"location":"faq/#how-can-i-estimate-impacts-of-general-use-of-genai-models","title":"How can I estimate impacts of general use of GenAI models?","text":"If you want to estimate the environmental impacts of using generative AI models without coding or making request, we recommend you to use our online webapp EcoLogits Calculator .
"},{"location":"faq/#how-do-we-assess-impacts-for-proprietary-models","title":"How do we assess impacts for proprietary models?","text":"Environmental impacts are calculated based on model architecture and parameter count. For proprietary models, we lack transparency from providers, so we estimate parameter counts using available information. For GPT models, we based our estimates on leaked GPT-4 architecture and scaled parameters count for GPT-4-Turbo and GPT-4o based on pricing differences. For other proprietary models like Anthropic's Claude, we assume similar impacts for models released around the same time with similar performance on public benchmarks. Please note that these estimates are based on assumptions and may not be exact. Our methods are open-source and transparent, so you can always see the hypotheses we use.
"},{"location":"faq/#how-to-reduce-my-environmental-impact","title":"How to reduce my environmental impact?","text":"First, you may want to assess indirect impacts and rebound effects of the project you are building. Does the finality of your product or service is impacting negatively the environment? Does the usage of your product or service drives up consumption and environmental impacts of previously existing technology?
Try to be frugal and question your usages or needs of AI:
Do you really need AI to solve your problem? Do you really need GenAI to solve your problem? (you can read this paper ) Prefer fine-tuning small, existing models over generalist models. Evaluate the environmental impacts before, during, and after the development of your project with tools like EcoLogits or CodeCarbon (see more tools ). Restrict the use case and limit the usage of your tool or feature to the desired purpose. Do not buy new GPUs or hardware. Hardware production for data centers is responsible for around 50% of the impacts compared to usage impacts. The share is even bigger for consumer devices, around 80%.
Use cloud instances that are located in low-emission / high-energy-efficiency data centers (see electricitymaps.com ).
Optimize your models for production use cases. You can look at model compression techniques such as quantization, pruning or distillation. There are also inference optimization tricks available in some inference software.
"},{"location":"why/","title":"Why use EcoLogits?","text":"Generative AI significantly impacts our environment, consuming electricity and contributing to global greenhouse gas emissions. In 2020, the ICT sector accounted for 2.1% to 3.9% of global emissions, with projections suggesting an increase to 6%-8% by 2025 due to continued growth and adoption Freitag et al., 2021. The advent of GenAI technologies like ChatGPT has further exacerbated this trend, causing a sharp rise in energy, water, and hardware costs for major tech companies. [0, 1].
"},{"location":"why/#which-is-bigger-training-or-inference-impacts","title":"Which is bigger: training or inference impacts?","text":"The field of Green AI focuses on evaluating the environmental impacts of AI models. While many studies have concentrated on training impacts [2], they often overlook other critical phases like data collection, storage and processing phases, research experiments and inference. For GenAI, the inference phase can significantly overshadow training impacts when models are deployed at scale [3]. EcoLogits specifically addresses this gap by focusing on the inference impacts of GenAI.
"},{"location":"why/#how-to-assess-impacts-properly","title":"How to assess impacts properly?","text":"EcoLogits employs state-of-the-art methodologies based on Life Cycle Assessment and open data to assess environmental impacts across multiple phases and criteria. This includes usage impacts from electricity consumption and embodied impacts from the production and transportation of hardware. Our multi-criteria approach also evaluates carbon emissions, abiotic resource depletion, and primary energy consumption, providing a comprehensive view that informs decisions like model selection, hardware upgrades and cloud deployments.
"},{"location":"why/#how-difficult-is-it","title":"How difficult is it?","text":"Assessing environmental impacts can be challenging with external providers due to lack of control over the execution environment. Meaning you can easily estimate usage impact regarding energy consumption with CodeCarbon and also embodied impacts with BoaviztAPI, but these tools become less relevant with external service providers. EcoLogits simplifies this by basing calculations on well-founded assumptions about hardware, model size, and operational practices, making it easier to estimate impacts accurately. For more details, see our methodology section.
"},{"location":"why/#easy-to-use","title":"Easy to use","text":"EcoLogits integrates seamlessly into existing GenAI providers, allowing you to assess the environmental impact of each API request with minimal code adjustments:
from ecologits import EcoLogits\n\nEcoLogits.init() \n\n# Then, you can make requests to any supported provider.\n
See the list of supported providers and more code snippets in the tutorial section.
"},{"location":"why/#have-more-questions","title":"Have more questions?","text":"Feel free to ask question in our GitHub discussions forum!
"},{"location":"methodology/","title":"Methodology","text":""},{"location":"methodology/#evaluation-methodologies","title":"Evaluation methodologies","text":"The following methodologies are currently available and implemented in EcoLogits:
Upcoming methodologies (join us to help speed up our progress):
Embeddings Image Generation Multi-Modal "},{"location":"methodology/#methodological-background","title":"Methodological background","text":"EcoLogits employs the Life Cycle Assessment (LCA) methodology, as defined by ISO 14044, to estimate the environmental impacts of requests made to generative AI inference services. This approach focuses on multiple phases of the lifecycle, specifically raw material extraction, manufacturing, transportation (denoted as embodied impacts), usage and end-of-life. Notably, we do not cover the end-of-life phase due to data limitations on e-waste recycling.
Our assessment considers three key environmental criteria:
Global Warming Potential (GWP): Evaluates the impact on global warming in terms of CO2 equivalents. Abiotic Resource Depletion for Elements (ADPe): Assesses the consumption of raw minerals and metals, expressed in antimony equivalents. Primary Energy (PE): Calculates energy consumed from natural sources, expressed in megajoules. Using a bottom-up modeling approach, we assess and aggregate the environmental impacts of all individual service components. This method differs from top-down approaches by allowing precise allocation of each resource's impact to the overall environmental footprint.
Our current focus is on high-performance GPU-accelerated cloud instances, crucial for GenAI inference tasks. While we exclude impacts from training, networking, and end-user devices, we thoroughly evaluate the impacts associated with hosting and running the model inferences.
The methodology is grounded in transparency and reproducibility, utilizing open market and technical data to ensure our results are reliable and verifiable.
"},{"location":"methodology/#licenses-and-citations","title":"Licenses and citations","text":"All the methodologies are licensed under CC BY-SA 4.0
Please ensure that you adhere to the license terms and properly cite the authors and the GenAI Impact non-profit organization when utilizing this work. Each methodology has an associated paper with specific citation requirements.
"},{"location":"methodology/llm_inference/","title":"LLM Inference","text":"Page still under construction
This page is still under construction. If you spot any inaccuracies or have questions about the methodology itself, feel free to open an issue on GitHub.
Early Publication
Beware that this is an early version of the methodology to evaluate the environmental impacts of LLMs at inference. We are still testing and reviewing the methodology internally. Some parts of the methodology may change in the near future.
"},{"location":"methodology/llm_inference/#environmental-impacts-of-llm-inference","title":"Environmental Impacts of LLM Inference","text":"Known limitations and hypotheses Based on a production setup: models are quantized, high-end servers with A100... Current implementation of EcoLogits assumes a fixed and worldwide impact factor for electricity mix. Model architectures are assumed when not dislosed by the provider. Not accounting the impacts of unused cloud resources, data center building, network and end-user devices, model training and data collection... Not tested on multi-modal models for text-to-text generation only. The environmental impacts of a request, \\(I_{request}\\) to a Large Language Model (LLM) can be divided into two components: the usage impacts, \\(I_{request}^u\\), which account for energy consumption, and the embodied impacts, \\(I_{request}^e\\), which account for resource extraction, hardware manufacturing, and transportation.
\\[ \\begin{equation*} \\begin{split} I_{request}&=I_{request}^u + I_{request}^e \\\\ &= E_{request}*F_{em}+\\frac{\\Delta T}{\\Delta L}*I_{server}^e \\end{split} \\end{equation*} \\] Where \\(E_{request}\\) represents the energy consumption of the IT resources associated with the request. \\(F_{em}\\) denotes the impact factor of electricity consumption, which varies depending on the location and time. Furthermore, \\(I_{server}^e\\) captures the embodied impacts of the IT resources, and \\(\\frac{\\Delta T}{\\Delta L}\\) signifies the hardware utilization factor, calculated as the computation time divided by the lifetime of the hardware.
"},{"location":"methodology/llm_inference/#usage-impacts","title":"Usage impacts","text":"To assess the usage impacts of an LLM inference, we first need to estimate the energy consumption of the server, which is equipped with one or more GPUs. We will also take into account the energy consumption of cooling equipment integrated with the data center, using the Power Usage Effectiveness (PUE) metric.
Subsequently, we can calculate the environmental impacts by using the \\(F_{em}\\) impact factor of the electricity mix. Ideally, \\(F_{em}\\) should vary with location and time to accurately reflect the local energy mix.
"},{"location":"methodology/llm_inference/#modeling-gpu-energy-consumption","title":"Modeling GPU energy consumption","text":"By leveraging the open dataset from the LLM Perf Leaderboard, produced by Hugging Face, we can estimate the energy consumption of the GPU using a parametric model.
We fit a linear regression model to the dataset, which models the energy consumption per output token as a function of the number of active parameters in the LLM, denoted as \\(P_{active}\\).
What are active parameters? We distinguish between active parameters and total parameter count for Sparse Mixture-of-Experts (SMoE) models. The total parameter count is used to determine the number of required GPUs to load the model into memory. In contrast, the active parameter count is used to estimate the energy consumption of a single GPU. In practice, SMoE models exhibit lower energy consumption per GPU compared to dense models of equivalent size (in terms of total parameters).
For a dense model: \(P_{active} = P_{total}\) For a SMoE model: \(P_{active} = P_{total} / \text{number of active experts}\) On the LLM Perf Leaderboard dataset filtering We have filtered the dataset to keep relevant data points for the analysis. In particular, we have applied the following conditions:
Model number of parameters >= 7B; dtype set to float16; GPU model is \"NVIDIA A100-SXM4-80GB\"; no optimization; 8-bit and 4-bit quantization excluding bitsandbytes (bnb). Figure: Energy consumption (in Wh) per output token vs. number of active parameters (in billions) \[ \frac{E_{GPU}}{\#T_{out}} = \alpha * P_{active} + \beta \] We found that \(\alpha = 8.91e-5\) and \(\beta = 1.43e-3\). Using these values, we can estimate the energy consumption of a single GPU for the entire request, given the number of output tokens \(\#T_{out}\) and the number of active parameters \(P_{active}\):
\\[ E_{GPU}(\\#T_{out}, P_{active}) = \\#T_{out} * (\\alpha * P_{active} + \\beta) \\] If the model requires multiple GPUs to be loaded into VRAM, the energy consumption \\(E_{GPU}\\) should be multiplied by the number of GPUs \\(\\#GPU_{required}\\) (see below).
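As an illustrative sketch (units follow the figure above: Wh per token, active parameters in billions; the 200-token, 70B example values are arbitrary):
ALPHA, BETA = 8.91e-5, 1.43e-3  # coefficients fitted on the filtered LLM Perf data\n\ndef gpu_energy_wh(output_tokens: float, active_params_billions: float) -> float:\n    # Energy of a single GPU for the whole request, in Wh\n    return output_tokens * (ALPHA * active_params_billions + BETA)\n\nprint(gpu_energy_wh(200, 70))  # ~1.53 Wh for 200 tokens on a 70B dense model\n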
"},{"location":"methodology/llm_inference/#modeling-server-energy-consumption","title":"Modeling server energy consumption","text":"To estimate the energy consumption of the entire server, we will use the previously estimated GPU energy model and separately estimate the energy consumption of the server itself (without GPUs), denoted as \\(E_{server\\backslash GPU}\\).
"},{"location":"methodology/llm_inference/#server-energy-consumption-without-gpus","title":"Server energy consumption without GPUs","text":"To model the energy consumption of the server without GPUs, we consider a fixed power consumption, \\(W_{server\\backslash GPU}\\), during inference (or generation latency), denoted as \\(\\Delta T\\). We assume that the server hosts multiple GPUs, but not all of them are actively used for the target inference. Therefore, we account for a portion of the energy consumption based on the number of required GPUs, \\(\\#GPU_{required}\\):
\\[ E_{server\\backslash GPU}(\\Delta T) = \\Delta T * W_{server\\backslash GPU} * \\frac{\\#GPU_{required}}{\\#GPU_{installed}} \\] For a typical high-end GPU-accelerated cloud instance, we use \\(W_{server\\backslash GPU} = 1\\ kW\\) and \\(\\#GPU_{installed} = 8\\).
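A sketch of this allocation, expressing the latency in seconds and converting to kWh (the 10 s / 2 GPU example values are arbitrary):
SERVER_POWER_KW = 1.0  # fixed power of the server without GPUs, in kW\nSERVER_GPUS = 8  # number of installed GPUs\n\ndef server_energy_kwh(latency_s: float, gpu_required: int) -> float:\n    # Allocate the non-GPU server power to the share of GPUs actually used\n    return (latency_s / 3600) * SERVER_POWER_KW * (gpu_required / SERVER_GPUS)\n\nprint(server_energy_kwh(10.0, 2))  # ~0.0007 kWh\n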
"},{"location":"methodology/llm_inference/#estimating-the-generation-latency","title":"Estimating the generation latency","text":"The generation latency, \\(\\Delta T\\), is the duration of the inference measured on the server and is independent of networking latency. We estimate the generation latency using the LLM Perf Leaderboard dataset with the previously mentioned filters applied.
We fit a linear regression model on the dataset modeling the generation latency per output token given the number of active parameters of the LLM \\(P_{active}\\):
Figure: Latency (in s) per output token vs. number of active parameters (in billions) \\[ \\frac{\\Delta T}{\\#T_{out}} = A * P_{active} + B \\] We found \\(A = 8.02e-4\\) and \\(B = 2.23e-2\\). Using these values, we can estimate the generation latency for the entire request given the number of output tokens, \\(\\#T_{out}\\), and the number of active parameters, \\(P_{active}\\). When possible, we also measure the request latency, \\(\\Delta T_{request}\\), and use it as the maximum bound for the generation latency:
\\[ \\Delta T(\\#T_{out}, P_{active}) = \\#T_{out} * (A * P_{active} + B) \\] With the request latency, the generation latency is defined as follows:
\\[ \\Delta T(\\#T_{out}, P_{active}, \\Delta T_{request}) = \\min[\\#T_{out} * (A * P_{active} + B), \\Delta T_{request}] \\]"},{"location":"methodology/llm_inference/#estimating-the-number-of-active-gpus","title":"Estimating the number of active GPUs","text":"To estimate the number of required GPUs, \\(\\#GPU_{required}\\), to load the model in virtual memory, we divide the required memory to host the LLM for inference, \\(M_{model}\\), by the memory available on one GPU, \\(M_{GPU}\\).
The required memory to host the LLM for inference is estimated based on the total number of parameters and the number of bits used to represent the model weights (the quantization level). We also apply a memory overhead factor of \(1.2\) (see Transformers Math 101 ):
\\[ M_{model}(P_{total},Q)=\\frac{P_{total}*Q}{8}*1.2 \\] We then estimate the number of required GPUs, rounded up:
\\[ \\#GPU_{required}(P_{total},Q,M_{GPU}) = \\lceil \\frac{M_{model}(P_{total},Q)}{M_{GPU}}\\rceil \\] To stay consistent with previous assumptions based on LLM Perf Leaderboard data, we use \\(M_{GPU} = 80\\ GB\\) for an NVIDIA A100 80GB GPU.
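A combined sketch of the latency and GPU-count formulas above (the 70B, 16-bit example values are arbitrary choices):
from math import ceil\n\nA, B = 8.02e-4, 2.23e-2  # latency regression coefficients\nGPU_MEMORY_GB = 80.0  # NVIDIA A100 80GB\n\ndef generation_latency_s(output_tokens: float, active_params_billions: float, request_latency_s: float) -> float:\n    # Bounded above by the measured request latency when available\n    return min(output_tokens * (A * active_params_billions + B), request_latency_s)\n\ndef gpu_required_count(total_params_billions: float, quantization_bits: int) -> int:\n    memory_gb = 1.2 * total_params_billions * quantization_bits / 8  # M_model\n    return ceil(memory_gb / GPU_MEMORY_GB)\n\nprint(gpu_required_count(70, 16))  # 168 GB of weights -> 3 A100 80GB GPUs\n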
"},{"location":"methodology/llm_inference/#complete-server-energy-consumption","title":"Complete server energy consumption","text":"The total server energy consumption for the request, \\(E_{server}\\), is calculated as follows:
\\[ E_{server} = E_{server\\backslash GPU} + \\#GPU_{required} * E_{GPU} \\]"},{"location":"methodology/llm_inference/#modeling-request-energy-consumption","title":"Modeling request energy consumption","text":"To estimate the energy consumption of the request, we multiply the previously computed server energy by the Power Usage Effectiveness (PUE) to account for cooling equipment in the data center:
\\[ E_{request} = PUE * E_{server} \\] We typically use a \\(PUE = 1.2\\) for hyperscaler data centers or supercomputers.
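Putting the energy pieces together for the whole request (a sketch reusing the functions above; note the GPU energy from the earlier sketch is in Wh and would need converting to kWh):
DATACENTER_PUE = 1.2\n\ndef request_energy_kwh(server_energy: float, gpu_required: int, gpu_energy: float) -> float:\n    # Cooling overhead is applied on top of server plus GPU consumption, all in kWh\n    return DATACENTER_PUE * (server_energy + gpu_required * gpu_energy)\n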
"},{"location":"methodology/llm_inference/#modeling-request-usage-environmental-impacts","title":"Modeling request usage environmental impacts","text":"To assess the environmental impacts of the request for the usage phase, we multiply the estimated electricity consumption by the impact factor of the electricity mix, \\(F_{em}\\), specific to the target country and time. We currently use a worldwide average multicriteria impact factor from the ADEME Base Empreinte\u00ae:
\\[ I^u_{request} = E_{request} * F_{em} \\] Some values of \\(F_{em}\\) per geographical area Area or country GWP (\\(gCO2eq / kWh\\)) ADPe (\\(kgSbeq / kWh\\)) PE (\\(MJ / kWh\\)) \ud83c\udf10 Worldwide \\(590.4\\) \\(7.378 * 10^{-8}\\) \\(9.99\\) \ud83c\uddea\ud83c\uddfa Europe (EEA) \\(509.4\\) \\(6.423 * 10^{-8}\\) \\(12.9\\) \ud83c\uddfa\ud83c\uddf8 USA \\(679.8\\) \\(9.855 * 10^{-8}\\) \\(11.4\\) \ud83c\udde8\ud83c\uddf3 China \\(1,057\\) \\(8.515 * 10^{-8}\\) \\(14.1\\) \ud83c\uddeb\ud83c\uddf7 France \\(81.3\\) \\(4.858 * 10^{-8}\\) \\(11.3\\)"},{"location":"methodology/llm_inference/#embodied-impacts","title":"Embodied impacts","text":"To determine the embodied impacts of an LLM inference, we need to estimate the hardware configuration used to host the model and its lifetime. Embodied impacts account for resource extraction (e.g., minerals and metals), manufacturing, and transportation of the hardware.
"},{"location":"methodology/llm_inference/#modeling-server-embodied-impacts","title":"Modeling server embodied impacts","text":"To estimate the embodied impacts of IT hardware, we use the BoaviztAPI tool from the non-profit organization Boavizta. This API embeds a bottom-up multicriteria environment impact estimation engine for embodied and usage phases of IT resources and services. We focus on estimating the embodied impacts of a server and a GPU. BoaviztAPI is an open-source project that relies on open databases and open research on environmental impacts of IT equipment.
"},{"location":"methodology/llm_inference/#server-embodied-impacts-without-gpu","title":"Server embodied impacts without GPU","text":"To assess the embodied environmental impacts of a high-end AI server, we use an AWS cloud instance as a reference. We selected the p4de.24xlarge
instance, as it corresponds to a server that can be used for LLM inference with eight NVIDIA A100 80GB GPU cards. The embodied impacts of this instance will be used to estimate the embodied impacts of the server without GPUs, denoted as \\(I^e_{server\\backslash GPU}\\).
The embodied environmental impacts of the cloud instance are:
Server (without GPU) GWP (\(kgCO2eq\)) \(3000\) ADPe (\(kgSbeq\)) \(0.25\) PE (\(MJ\)) \(39,000\) These impacts do not take into account the eight GPUs (see below).
Example request to reproduce this calculation On the cloud instance route (/v1/cloud/instance) you can POST the following JSON.
{\n \"provider\": \"aws\",\n \"instance_type\": \"p4de.24xlarge\"\n}\n
Or you can use the publicly available demo API with the following command, using curl
and parsing the JSON output with jq
.
curl -X 'POST' \\\n 'https://api.boavizta.org/v1/cloud/instance?verbose=true&criteria=gwp&criteria=adp&criteria=pe' \\\n -H 'accept: application/json' \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"provider\": \"aws\",\n \"instance_type\": \"p4de.24xlarge\"\n}' | jq\n
"},{"location":"methodology/llm_inference/#gpu-embodied-impacts","title":"GPU embodied impacts","text":"Boavizta is currently developing a methodology to provide multicriteria embodied impacts for GPU cards. For this analysis, we use the embodied impact data they computed for a NVIDIA A100 80GB GPU. These values will be used to estimate the embodied impacts of a single GPU, denoted as \\(I^e_{GPU}\\).
NVIDIA A100 80GB GWP (\(kgCO2eq\)) \(143\) ADPe (\(kgSbeq\)) \(5.09 * 10^{-3}\) PE (\(MJ\)) \(1,828\) The GPU embodied impacts will soon be available in the BoaviztAPI tool.
"},{"location":"methodology/llm_inference/#complete-server-embodied-impacts","title":"Complete server embodied impacts","text":"The final embodied impacts for the server, including the GPUs, are calculated as follows. Note that the embodied impacts of the server without GPUs are scaled by the number of GPUs required to host the model. This allocation is made to account for the fact that the remaining GPUs on the server can be used to host other models or multiple instances of the same model. As we are estimating the impacts of a single LLM inference, we need to exclude the embodied impacts that would be attributed to other services hosted on the same server.
\\[ I^e_{server}=\\frac{\\#GPU_{required}}{\\#GPU_{installed}}*I^e_{server\\backslash GPU} + \\#GPU_{required} * I^e_{GPU} \\]"},{"location":"methodology/llm_inference/#modeling-request-embodied-environmental-impacts","title":"Modeling request embodied environmental impacts","text":"To allocate the server embodied impacts to the request, we use an allocation based on the hardware utilization factor, \\(\\frac{\\Delta T}{\\Delta L}\\). In this case, \\(\\Delta L\\) represents the lifetime of the server and GPU, which we fix at 5 years.
\\[ I^e_{request}=\\frac{\\Delta T}{\\Delta L} * I^e_{server} \\]"},{"location":"methodology/llm_inference/#conclusion","title":"Conclusion","text":"This paper presents a methodology to assess the environmental impacts of Large Language Model (LLM) inference, considering both usage and embodied impacts. We model server and GPU energy consumption based on various parameters and incorporate PUE and electricity mix impact factors. For embodied impacts, we use the BoaviztAPI tool to estimate environmental impacts of IT hardware. Our methodology offers a comprehensive understanding of the environmental footprint of LLM inference, guiding researchers and practitioners towards more sustainable AI practices. Future work may involve refining the methodology and exploring the impacts of multi-modal models or RAG applications.
"},{"location":"methodology/llm_inference/#references","title":"References","text":" LLM-Perf Leaderboard to estimate GPU energy consumption and latency based on the model architecture and number of output tokens. BoaviztAPI to estimate server embodied impacts and base energy consumption. ADEME Base Empreinte\u00ae for electricity mix impacts per country. "},{"location":"methodology/llm_inference/#citation","title":"Citation","text":"Please cite GenAI Impact non-profit organization and link to this documentation page.
Coming soon...\n
"},{"location":"methodology/llm_inference/#license","title":"License","text":"This work is licensed under CC BY-SA 4.0
"},{"location":"reference/SUMMARY/","title":"SUMMARY","text":" ecologits electricity_mix_repository exceptions impacts model_repository tracers anthropic_tracer cohere_tracer google_tracer huggingface_tracer litellm_tracer mistralai_tracer openai_tracer utils "},{"location":"reference/ecologits/","title":"ecologits","text":""},{"location":"reference/ecologits/#ecologits.EcoLogits","title":"EcoLogits
","text":"EcoLogits instrumentor to initialize function patching for each provider.
By default, the initialization will be done on all available and compatible providers that are supported by the library.
Examples:
EcoLogits initialization example with OpenAI.
from ecologits import EcoLogits\nfrom openai import OpenAI\n\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(f\"Energy consumption: {response.impacts.energy.value} kWh\")\nprint(f\"GHG emissions: {response.impacts.gwp.value} kgCO2eq\")\n
"},{"location":"reference/ecologits/#ecologits.EcoLogits.init","title":"init()
staticmethod
","text":"Initialization static method.
Source code in ecologits/ecologits.py
@staticmethod\ndef init() -> None:\n \"\"\"Initialization static method.\"\"\"\n if not EcoLogits.initialized:\n init_instruments()\n EcoLogits.initialized = True\n
"},{"location":"reference/electricity_mix_repository/","title":"electricity_mix_repository","text":""},{"location":"reference/exceptions/","title":"exceptions","text":""},{"location":"reference/exceptions/#exceptions.TracerInitializationError","title":"TracerInitializationError
","text":" Bases: EcoLogitsError
Tracer is initialized twice
"},{"location":"reference/exceptions/#exceptions.ModelingError","title":"ModelingError
","text":" Bases: EcoLogitsError
Operation or computation not allowed
"},{"location":"reference/model_repository/","title":"model_repository","text":""},{"location":"reference/impacts/dag/","title":"dag","text":""},{"location":"reference/impacts/llm/","title":"llm","text":""},{"location":"reference/impacts/llm/#impacts.llm.gpu_energy","title":"gpu_energy(model_active_parameter_count, output_token_count, gpu_energy_alpha, gpu_energy_beta)
","text":"Compute energy consumption of a single GPU.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required output_token_count
float
Number of generated tokens.
required gpu_energy_alpha
float
Alpha parameter of the GPU linear power consumption profile.
required gpu_energy_beta
float
Beta parameter of the GPU linear power consumption profile.
required Returns:
Type Description float
The energy consumption of a single GPU in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef gpu_energy(\n model_active_parameter_count: float,\n output_token_count: float,\n gpu_energy_alpha: float,\n gpu_energy_beta: float\n) -> float:\n \"\"\"\n Compute energy consumption of a single GPU.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n output_token_count: Number of generated tokens.\n gpu_energy_alpha: Alpha parameter of the GPU linear power consumption profile.\n gpu_energy_beta: Beta parameter of the GPU linear power consumption profile.\n\n Returns:\n The energy consumption of a single GPU in kWh.\n \"\"\"\n return output_token_count * (gpu_energy_alpha * model_active_parameter_count + gpu_energy_beta)\n
"},{"location":"reference/impacts/llm/#impacts.llm.generation_latency","title":"generation_latency(model_active_parameter_count, output_token_count, gpu_latency_alpha, gpu_latency_beta, request_latency)
","text":"Compute the token generation latency in seconds.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required output_token_count
float
Number of generated tokens.
required gpu_latency_alpha
float
Alpha parameter of the GPU linear latency profile.
required gpu_latency_beta
float
Beta parameter of the GPU linear latency profile.
required request_latency
float
Measured request latency (upper bound) in seconds.
required Returns:
Type Description float
The token generation latency in seconds.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef generation_latency(\n model_active_parameter_count: float,\n output_token_count: float,\n gpu_latency_alpha: float,\n gpu_latency_beta: float,\n request_latency: float,\n) -> float:\n \"\"\"\n Compute the token generation latency in seconds.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n output_token_count: Number of generated tokens.\n gpu_latency_alpha: Alpha parameter of the GPU linear latency profile.\n gpu_latency_beta: Beta parameter of the GPU linear latency profile.\n request_latency: Measured request latency (upper bound) in seconds.\n\n Returns:\n The token generation latency in seconds.\n \"\"\"\n gpu_latency = output_token_count * (gpu_latency_alpha * model_active_parameter_count + gpu_latency_beta)\n return min(gpu_latency, request_latency)\n
"},{"location":"reference/impacts/llm/#impacts.llm.model_required_memory","title":"model_required_memory(model_total_parameter_count, model_quantization_bits)
","text":"Compute the required memory to load the model on GPU.
Parameters:
Name Type Description Default model_total_parameter_count
float
Number of parameters of the model.
required model_quantization_bits
int
Number of bits used to represent the model weights.
required Returns:
Type Description float
The amount of required GPU memory to load the model.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef model_required_memory(\n model_total_parameter_count: float,\n model_quantization_bits: int,\n) -> float:\n \"\"\"\n Compute the required memory to load the model on GPU.\n\n Args:\n model_total_parameter_count: Number of parameters of the model.\n model_quantization_bits: Number of bits used to represent the model weights.\n\n Returns:\n The amount of required GPU memory to load the model.\n \"\"\"\n return 1.2 * model_total_parameter_count * model_quantization_bits / 8\n
"},{"location":"reference/impacts/llm/#impacts.llm.gpu_required_count","title":"gpu_required_count(model_required_memory, gpu_memory)
","text":"Compute the number of required GPU to store the model.
Parameters:
Name Type Description Default model_required_memory
float
Required memory to load the model on GPU.
required gpu_memory
float
Amount of memory available on a single GPU.
required Returns:
Type Description int
The number of required GPUs to load the model.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef gpu_required_count(\n model_required_memory: float,\n gpu_memory: float\n) -> int:\n \"\"\"\n Compute the number of required GPU to store the model.\n\n Args:\n model_required_memory: Required memory to load the model on GPU.\n gpu_memory: Amount of memory available on a single GPU.\n\n Returns:\n The number of required GPUs to load the model.\n \"\"\"\n return ceil(model_required_memory / gpu_memory)\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_energy","title":"server_energy(generation_latency, server_power, server_gpu_count, gpu_required_count)
","text":"Compute the energy consumption of the server.
Parameters:
Name Type Description Default generation_latency
float
Token generation latency in seconds.
required server_power
float
Power consumption of the server in kW.
required server_gpu_count
int
Number of available GPUs in the server.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The energy consumption of the server (GPUs are not included) in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_energy(\n generation_latency: float,\n server_power: float,\n server_gpu_count: int,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the energy consumption of the server.\n\n Args:\n generation_latency: Token generation latency in seconds.\n server_power: Power consumption of the server in kW.\n server_gpu_count: Number of available GPUs in the server.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The energy consumption of the server (GPUs are not included) in kWh.\n \"\"\"\n return (generation_latency / 3600) * server_power * (gpu_required_count / server_gpu_count)\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_energy","title":"request_energy(datacenter_pue, server_energy, gpu_required_count, gpu_energy)
","text":"Compute the energy consumption of the request.
Parameters:
Name Type Description Default datacenter_pue
float
PUE of the datacenter.
required server_energy
float
Energy consumption of the server in kWh.
required gpu_required_count
int
Number of required GPUs to load the model.
required gpu_energy
float
Energy consumption of a single GPU in kWh.
required Returns:
Type Description float
The energy consumption of the request in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_energy(\n datacenter_pue: float,\n server_energy: float,\n gpu_required_count: int,\n gpu_energy: float\n) -> float:\n \"\"\"\n Compute the energy consumption of the request.\n\n Args:\n datacenter_pue: PUE of the datacenter.\n server_energy: Energy consumption of the server in kWh.\n gpu_required_count: Number of required GPUs to load the model.\n gpu_energy: Energy consumption of a single GPU in kWh.\n\n Returns:\n The energy consumption of the request in kWh.\n \"\"\"\n return datacenter_pue * (server_energy + gpu_required_count * gpu_energy)\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_gwp","title":"request_usage_gwp(request_energy, if_electricity_mix_gwp)
","text":"Compute the Global Warming Potential (GWP) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required Returns:
Type Description float
The GWP usage impact of the request in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_gwp(\n request_energy: float,\n if_electricity_mix_gwp: float\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n\n Returns:\n The GWP usage impact of the request in kgCO2eq.\n \"\"\"\n return request_energy * if_electricity_mix_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_adpe","title":"request_usage_adpe(request_energy, if_electricity_mix_adpe)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption in kgSbeq / kWh.
required Returns:
Type Description float
The ADPe usage impact of the request in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_adpe(\n request_energy: float,\n if_electricity_mix_adpe: float\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption in kgSbeq / kWh.\n\n Returns:\n The ADPe usage impact of the request in kgSbeq.\n \"\"\"\n return request_energy * if_electricity_mix_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_pe","title":"request_usage_pe(request_energy, if_electricity_mix_pe)
","text":"Compute the Primary Energy (PE) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required Returns:
Type Description float
The PE usage impact of the request in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_pe(\n request_energy: float,\n if_electricity_mix_pe: float\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n\n Returns:\n The PE usage impact of the request in MJ.\n \"\"\"\n return request_energy * if_electricity_mix_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_gwp","title":"server_gpu_embodied_gwp(server_embodied_gwp, server_gpu_count, gpu_embodied_gwp, gpu_required_count)
","text":"Compute the Global Warming Potential (GWP) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_gwp
float
GWP embodied impact of the server in kgCO2eq.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_gwp
float
GWP embodied impact of a single GPU in kgCO2eq.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The GWP embodied impact of the server and the GPUs in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_gwp(\n server_embodied_gwp: float,\n server_gpu_count: float,\n gpu_embodied_gwp: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) embodied impact of the server\n\n Args:\n server_embodied_gwp: GWP embodied impact of the server in kgCO2eq.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_gwp: GWP embodied impact of a single GPU in kgCO2eq.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The GWP embodied impact of the server and the GPUs in kgCO2eq.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_gwp + gpu_required_count * gpu_embodied_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_adpe","title":"server_gpu_embodied_adpe(server_embodied_adpe, server_gpu_count, gpu_embodied_adpe, gpu_required_count)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_adpe
float
ADPe embodied impact of the server in kgSbeq.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_adpe
float
ADPe embodied impact of a single GPU in kgSbeq.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The ADPe embodied impact of the server and the GPUs in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_adpe(\n server_embodied_adpe: float,\n server_gpu_count: float,\n gpu_embodied_adpe: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the server\n\n Args:\n server_embodied_adpe: ADPe embodied impact of the server in kgSbeq.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_adpe: ADPe embodied impact of a single GPU in kgSbeq.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The ADPe embodied impact of the server and the GPUs in kgSbeq.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_adpe + gpu_required_count * gpu_embodied_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_pe","title":"server_gpu_embodied_pe(server_embodied_pe, server_gpu_count, gpu_embodied_pe, gpu_required_count)
","text":"Compute the Primary Energy (PE) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_pe
float
PE embodied impact of the server in MJ.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_pe
float
PE embodied impact of a single GPU in MJ.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The PE embodied impact of the server and the GPUs in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_pe(\n server_embodied_pe: float,\n server_gpu_count: float,\n gpu_embodied_pe: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) embodied impact of the server\n\n Args:\n server_embodied_pe: PE embodied impact of the server in MJ.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_pe: PE embodied impact of a single GPU in MJ.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The PE embodied impact of the server and the GPUs in MJ.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_pe + gpu_required_count * gpu_embodied_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_gwp","title":"request_embodied_gwp(server_gpu_embodied_gwp, server_lifetime, generation_latency)
","text":"Compute the Global Warming Potential (GWP) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_gwp
float
GWP embodied impact of the server and the GPUs in kgCO2eq.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The GWP embodied impact of the request in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_gwp(\n server_gpu_embodied_gwp: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) embodied impact of the request.\n\n Args:\n server_gpu_embodied_gwp: GWP embodied impact of the server and the GPUs in kgCO2eq.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The GWP embodied impact of the request in kgCO2eq.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_adpe","title":"request_embodied_adpe(server_gpu_embodied_adpe, server_lifetime, generation_latency)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_adpe
float
ADPe embodied impact of the server and the GPUs in kgSbeq.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The ADPe embodied impact of the request in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_adpe(\n server_gpu_embodied_adpe: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the request.\n\n Args:\n server_gpu_embodied_adpe: ADPe embodied impact of the server and the GPUs in kgSbeq.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The ADPe embodied impact of the request in kgSbeq.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_pe","title":"request_embodied_pe(server_gpu_embodied_pe, server_lifetime, generation_latency)
","text":"Compute the Primary Energy (PE) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_pe
float
PE embodied impact of the server and the GPUs in MJ.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The PE embodied impact of the request in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_pe(\n server_gpu_embodied_pe: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) embodied impact of the request.\n\n Args:\n server_gpu_embodied_pe: PE embodied impact of the server and the GPUs in MJ.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The PE embodied impact of the request in MJ.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.compute_llm_impacts_dag","title":"compute_llm_impacts_dag(model_active_parameter_count, model_total_parameter_count, output_token_count, request_latency, if_electricity_mix_adpe, if_electricity_mix_pe, if_electricity_mix_gwp, model_quantization_bits=MODEL_QUANTIZATION_BITS, gpu_energy_alpha=GPU_ENERGY_ALPHA, gpu_energy_beta=GPU_ENERGY_BETA, gpu_latency_alpha=GPU_LATENCY_ALPHA, gpu_latency_beta=GPU_LATENCY_BETA, gpu_memory=GPU_MEMORY, gpu_embodied_gwp=GPU_EMBODIED_IMPACT_GWP, gpu_embodied_adpe=GPU_EMBODIED_IMPACT_ADPE, gpu_embodied_pe=GPU_EMBODIED_IMPACT_PE, server_gpu_count=SERVER_GPUS, server_power=SERVER_POWER, server_embodied_gwp=SERVER_EMBODIED_IMPACT_GWP, server_embodied_adpe=SERVER_EMBODIED_IMPACT_ADPE, server_embodied_pe=SERVER_EMBODIED_IMPACT_PE, server_lifetime=HARDWARE_LIFESPAN, datacenter_pue=DATACENTER_PUE)
","text":"Compute the impacts dag of an LLM generation request.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required model_total_parameter_count
float
Number of parameters of the model.
required output_token_count
float
Number of generated tokens.
required request_latency
float
Measured request latency in seconds.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required model_quantization_bits
Optional[int]
Number of bits used to represent the model weights.
MODEL_QUANTIZATION_BITS
gpu_energy_alpha
Optional[float]
Alpha parameter of the GPU linear power consumption profile.
GPU_ENERGY_ALPHA
gpu_energy_beta
Optional[float]
Beta parameter of the GPU linear power consumption profile.
GPU_ENERGY_BETA
gpu_latency_alpha
Optional[float]
Alpha parameter of the GPU linear latency profile.
GPU_LATENCY_ALPHA
gpu_latency_beta
Optional[float]
Beta parameter of the GPU linear latency profile.
GPU_LATENCY_BETA
gpu_memory
Optional[float]
Amount of memory available on a single GPU.
GPU_MEMORY
gpu_embodied_gwp
Optional[float]
GWP embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_GWP
gpu_embodied_adpe
Optional[float]
ADPe embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_ADPE
gpu_embodied_pe
Optional[float]
PE embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_PE
server_gpu_count
Optional[int]
Number of available GPUs in the server.
SERVER_GPUS
server_power
Optional[float]
Power consumption of the server in kW.
SERVER_POWER
server_embodied_gwp
Optional[float]
GWP embodied impact of the server in kgCO2eq.
SERVER_EMBODIED_IMPACT_GWP
server_embodied_adpe
Optional[float]
ADPe embodied impact of the server in kgSbeq.
SERVER_EMBODIED_IMPACT_ADPE
server_embodied_pe
Optional[float]
PE embodied impact of the server in MJ.
SERVER_EMBODIED_IMPACT_PE
server_lifetime
Optional[float]
Lifetime duration of the server in seconds.
HARDWARE_LIFESPAN
datacenter_pue
Optional[float]
PUE of the datacenter.
DATACENTER_PUE
Returns:
Type Description dict[str, float]
The impacts dag with all intermediate states.
Source code in ecologits/impacts/llm.py
def compute_llm_impacts_dag(\n model_active_parameter_count: float,\n model_total_parameter_count: float,\n output_token_count: float,\n request_latency: float,\n if_electricity_mix_adpe: float,\n if_electricity_mix_pe: float,\n if_electricity_mix_gwp: float,\n model_quantization_bits: Optional[int] = MODEL_QUANTIZATION_BITS,\n gpu_energy_alpha: Optional[float] = GPU_ENERGY_ALPHA,\n gpu_energy_beta: Optional[float] = GPU_ENERGY_BETA,\n gpu_latency_alpha: Optional[float] = GPU_LATENCY_ALPHA,\n gpu_latency_beta: Optional[float] = GPU_LATENCY_BETA,\n gpu_memory: Optional[float] = GPU_MEMORY,\n gpu_embodied_gwp: Optional[float] = GPU_EMBODIED_IMPACT_GWP,\n gpu_embodied_adpe: Optional[float] = GPU_EMBODIED_IMPACT_ADPE,\n gpu_embodied_pe: Optional[float] = GPU_EMBODIED_IMPACT_PE,\n server_gpu_count: Optional[int] = SERVER_GPUS,\n server_power: Optional[float] = SERVER_POWER,\n server_embodied_gwp: Optional[float] = SERVER_EMBODIED_IMPACT_GWP,\n server_embodied_adpe: Optional[float] = SERVER_EMBODIED_IMPACT_ADPE,\n server_embodied_pe: Optional[float] = SERVER_EMBODIED_IMPACT_PE,\n server_lifetime: Optional[float] = HARDWARE_LIFESPAN,\n datacenter_pue: Optional[float] = DATACENTER_PUE,\n) -> dict[str, float]:\n \"\"\"\n Compute the impacts dag of an LLM generation request.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n model_total_parameter_count: Number of parameters of the model.\n output_token_count: Number of generated tokens.\n request_latency: Measured request latency in seconds.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n model_quantization_bits: Number of bits used to represent the model weights.\n gpu_energy_alpha: Alpha parameter of the GPU linear power consumption profile.\n gpu_energy_beta: Beta parameter of the GPU linear power consumption profile.\n gpu_latency_alpha: Alpha parameter of the GPU linear latency profile.\n gpu_latency_beta: Beta parameter of the GPU linear latency profile.\n gpu_memory: Amount of memory available on a single GPU.\n gpu_embodied_gwp: GWP embodied impact of a single GPU.\n gpu_embodied_adpe: ADPe embodied impact of a single GPU.\n gpu_embodied_pe: PE embodied impact of a single GPU.\n server_gpu_count: Number of available GPUs in the server.\n server_power: Power consumption of the server in kW.\n server_embodied_gwp: GWP embodied impact of the server in kgCO2eq.\n server_embodied_adpe: ADPe embodied impact of the server in kgSbeq.\n server_embodied_pe: PE embodied impact of the server in MJ.\n server_lifetime: Lifetime duration of the server in seconds.\n datacenter_pue: PUE of the datacenter.\n\n Returns:\n The impacts dag with all intermediate states.\n \"\"\"\n results = dag.execute(\n model_active_parameter_count=model_active_parameter_count,\n model_total_parameter_count=model_total_parameter_count,\n model_quantization_bits=model_quantization_bits,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n gpu_energy_alpha=gpu_energy_alpha,\n gpu_energy_beta=gpu_energy_beta,\n gpu_latency_alpha=gpu_latency_alpha,\n gpu_latency_beta=gpu_latency_beta,\n gpu_memory=gpu_memory,\n gpu_embodied_gwp=gpu_embodied_gwp,\n 
gpu_embodied_adpe=gpu_embodied_adpe,\n gpu_embodied_pe=gpu_embodied_pe,\n server_gpu_count=server_gpu_count,\n server_power=server_power,\n server_embodied_gwp=server_embodied_gwp,\n server_embodied_adpe=server_embodied_adpe,\n server_embodied_pe=server_embodied_pe,\n server_lifetime=server_lifetime,\n datacenter_pue=datacenter_pue,\n )\n return results\n
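A minimal usage sketch (not from the official docs): the hardware parameters fall back to their defaults, the electricity mix impact factors are the worldwide defaults listed in the Electricity Mix section, and parameter counts are assumed to be expressed in billions, as in the bundled model data.
from ecologits.impacts.llm import compute_llm_impacts_dag\n\n# Illustrative values: a 70B dense model (counts assumed to be in billions),\n# 200 generated tokens, 10 s latency, worldwide electricity mix factors.\nresults = compute_llm_impacts_dag(\n model_active_parameter_count=70,\n model_total_parameter_count=70,\n output_token_count=200,\n request_latency=10.0,\n if_electricity_mix_adpe=7.378e-7, # kgSbeq / kWh\n if_electricity_mix_pe=9.988, # MJ / kWh\n if_electricity_mix_gwp=5.904e-1, # kgCO2eq / kWh\n)\nprint(results[\"request_energy\"]) # kWh, one of the intermediate DAG states\n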
"},{"location":"reference/impacts/llm/#impacts.llm.compute_llm_impacts","title":"compute_llm_impacts(model_active_parameter_count, model_total_parameter_count, output_token_count, if_electricity_mix_adpe, if_electricity_mix_pe, if_electricity_mix_gwp, request_latency=None, **kwargs)
","text":"Compute the impacts of an LLM generation request.
Parameters:
Name Type Description Default model_active_parameter_count
ValueOrRange
Number of active parameters of the model.
required model_total_parameter_count
ValueOrRange
Number of total parameters of the model.
required output_token_count
float
Number of generated tokens.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption in kgSbeq / kWh (Antimony).
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required request_latency
Optional[float]
Measured request latency in seconds.
None
**kwargs
Any
Any other optional parameter.
{}
Returns:
Type Description Impacts
The impacts of an LLM generation request.
Source code in ecologits/impacts/llm.py
def compute_llm_impacts(\n model_active_parameter_count: ValueOrRange,\n model_total_parameter_count: ValueOrRange,\n output_token_count: float,\n if_electricity_mix_adpe: float,\n if_electricity_mix_pe: float,\n if_electricity_mix_gwp: float,\n request_latency: Optional[float] = None,\n **kwargs: Any\n) -> Impacts:\n \"\"\"\n Compute the impacts of an LLM generation request.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n model_total_parameter_count: Number of total parameters of the model.\n output_token_count: Number of generated tokens.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n request_latency: Measured request latency in seconds.\n **kwargs: Any other optional parameter.\n\n Returns:\n The impacts of an LLM generation request.\n \"\"\"\n if request_latency is None:\n request_latency = math.inf\n\n active_params = [model_active_parameter_count]\n total_params = [model_total_parameter_count]\n\n if isinstance(model_active_parameter_count, Range) or isinstance(model_total_parameter_count, Range):\n if isinstance(model_active_parameter_count, Range):\n active_params = [model_active_parameter_count.min, model_active_parameter_count.max]\n else:\n active_params = [model_active_parameter_count, model_active_parameter_count]\n if isinstance(model_total_parameter_count, Range):\n total_params = [model_total_parameter_count.min, model_total_parameter_count.max]\n else:\n total_params = [model_total_parameter_count, model_total_parameter_count]\n\n results = {}\n fields = [\"request_energy\", \"request_usage_gwp\", \"request_usage_adpe\", \"request_usage_pe\",\n \"request_embodied_gwp\", \"request_embodied_adpe\", \"request_embodied_pe\"]\n for act_param, tot_param in zip(active_params, total_params):\n res = compute_llm_impacts_dag(\n model_active_parameter_count=act_param,\n model_total_parameter_count=tot_param,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n **kwargs\n )\n for field in fields:\n if field in results:\n results[field] = Range(min=results[field], max=res[field])\n else:\n results[field] = res[field]\n\n energy = Energy(value=results[\"request_energy\"])\n gwp_usage = GWP(value=results[\"request_usage_gwp\"])\n adpe_usage = ADPe(value=results[\"request_usage_adpe\"])\n pe_usage = PE(value=results[\"request_usage_pe\"])\n gwp_embodied = GWP(value=results[\"request_embodied_gwp\"])\n adpe_embodied = ADPe(value=results[\"request_embodied_adpe\"])\n pe_embodied = PE(value=results[\"request_embodied_pe\"])\n return Impacts(\n energy=energy,\n gwp=gwp_usage + gwp_embodied,\n adpe=adpe_usage + adpe_embodied,\n pe=pe_usage + pe_embodied,\n usage=Usage(\n energy=energy,\n gwp=gwp_usage,\n adpe=adpe_usage,\n pe=pe_usage\n ),\n embodied=Embodied(\n gwp=gwp_embodied,\n adpe=adpe_embodied,\n pe=pe_embodied\n )\n )\n
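A brief sketch of calling it with a parameter interval (illustrative values; Range is documented in the modeling reference, the electricity mix factors are the worldwide defaults, and parameter counts are assumed to be in billions):
from ecologits.impacts.llm import compute_llm_impacts\nfrom ecologits.impacts.modeling import Range\n\n# Illustrative: a model whose parameter count is only known as an interval\n# (assumed to be expressed in billions, as in the bundled model data).\nimpacts = compute_llm_impacts(\n model_active_parameter_count=Range(min=7, max=70),\n model_total_parameter_count=Range(min=7, max=70),\n output_token_count=200,\n if_electricity_mix_adpe=7.378e-7,\n if_electricity_mix_pe=9.988,\n if_electricity_mix_gwp=5.904e-1,\n request_latency=5.0,\n)\nprint(impacts.gwp.value) # a Range, since the inputs were ranges\n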
"},{"location":"reference/impacts/modeling/","title":"modeling","text":""},{"location":"reference/impacts/modeling/#impacts.modeling.Range","title":"Range
","text":" Bases: BaseModel
RangeValue data model to represent intervals.
Attributes:
Name Type Description min
float
Lower bound of the interval.
max
float
Upper bound of the interval.
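A brief construction sketch (same import path as the tutorial's Structure of Impacts model example):
from ecologits.impacts.modeling import Range\n\ninterval = Range(min=0.16, max=0.48)\nprint(interval.min, interval.max) # 0.16 0.48\n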
"},{"location":"reference/impacts/modeling/#impacts.modeling.Impact","title":"Impact
","text":" Bases: BaseModel
Base impact data model.
Attributes:
Name Type Description type
str
Impact type.
name
str
Impact name.
value
ValueOrRange
Impact value.
unit
str
Impact unit.
"},{"location":"reference/impacts/modeling/#impacts.modeling.Energy","title":"Energy
","text":" Bases: Impact
Energy consumption.
Info Final energy consumption \"measured from the plug\".
Attributes:
Name Type Description type
str
energy
name
str
Energy
value
str
Energy value
unit
str
Kilowatt-hour (kWh)
"},{"location":"reference/impacts/modeling/#impacts.modeling.GWP","title":"GWP
","text":" Bases: Impact
Global Warming Potential (GWP) impact.
Info Also, commonly known as GHG/carbon emissions.
Attributes:
Name Type Description type
str
GWP
name
str
Global Warming Potential
value
str
GWP value
unit
str
Kilogram Carbon Dioxide Equivalent (kgCO2eq)
"},{"location":"reference/impacts/modeling/#impacts.modeling.ADPe","title":"ADPe
","text":" Bases: Impact
Abiotic Depletion Potential for Elements (ADPe) impact.
Info Impact on the depletion of non-living resources such as minerals or metals.
Attributes:
Name Type Description type
str
ADPe
name
str
Abiotic Depletion Potential (elements)
value
str
ADPe value
unit
str
Kilogram Antimony Equivalent (kgSbeq)
"},{"location":"reference/impacts/modeling/#impacts.modeling.PE","title":"PE
","text":" Bases: Impact
Primary Energy (PE) impact.
Info Total energy consumed from primary sources.
Attributes:
Name Type Description type
str
PE
name
str
Primary Energy
value
str
PE value
unit
str
Megajoule (MJ)
"},{"location":"reference/impacts/modeling/#impacts.modeling.Phase","title":"Phase
","text":" Bases: BaseModel
Base impact phase data model.
Attributes:
Name Type Description type
str
Phase type.
name
str
Phase name.
"},{"location":"reference/impacts/modeling/#impacts.modeling.Usage","title":"Usage
","text":" Bases: Phase
Usage impacts data model.
Info Represents the phase of energy consumption during model execution.
Attributes:
Name Type Description type
str
usage
name
str
Usage
energy
Energy
Energy consumption
gwp
GWP
Global Warming Potential (GWP) usage impact
adpe
ADPe
Abiotic Depletion Potential for Elements (ADPe) usage impact
pe
PE
Primary Energy (PE) usage impact
"},{"location":"reference/impacts/modeling/#impacts.modeling.Embodied","title":"Embodied
","text":" Bases: Phase
Embodied impacts data model.
Info Encompasses resource extraction, manufacturing, and transportation phases associated with the model's lifecycle.
Attributes:
Name Type Description type
str
embodied
name
str
Embodied
gwp
GWP
Global Warming Potential (GWP) embodied impact
adpe
ADPe
Abiotic Depletion Potential for Elements (ADPe) embodied impact
pe
PE
Primary Energy (PE) embodied impact
"},{"location":"reference/impacts/modeling/#impacts.modeling.Impacts","title":"Impacts
","text":" Bases: BaseModel
Impacts data model.
Attributes:
Name Type Description energy
Energy
Total energy consumption
gwp
GWP
Total Global Warming Potential (GWP) impact
adpe
ADPe
Total Abiotic Depletion Potential for Elements (ADPe) impact
pe
PE
Total Primary Energy (PE) impact
usage
Usage
Impacts for the usage phase
embodied
Embodied
Impacts for the embodied phase
"},{"location":"reference/tracers/anthropic_tracer/","title":"anthropic_tracer","text":""},{"location":"reference/tracers/cohere_tracer/","title":"cohere_tracer","text":""},{"location":"reference/tracers/google_tracer/","title":"google_tracer","text":""},{"location":"reference/tracers/huggingface_tracer/","title":"huggingface_tracer","text":""},{"location":"reference/tracers/litellm_tracer/","title":"litellm_tracer","text":""},{"location":"reference/tracers/mistralai_tracer/","title":"mistralai_tracer","text":""},{"location":"reference/tracers/openai_tracer/","title":"openai_tracer","text":""},{"location":"reference/tracers/utils/","title":"utils","text":""},{"location":"reference/tracers/utils/#tracers.utils.llm_impacts","title":"llm_impacts(provider, model_name, output_token_count, request_latency, electricity_mix_zone='WOR')
","text":"High-level function to compute the impacts of an LLM generation request.
Parameters:
Name Type Description Default provider
str
Name of the provider.
required model_name
str
Name of the LLM used.
required output_token_count
int
Number of generated tokens.
required request_latency
float
Measured request latency in seconds.
required electricity_mix_zone
Optional[str]
ISO 3166-1 alpha-3 code of the electricity mix zone (WOR by default).
'WOR'
Returns:
Type Description Optional[Impacts]
The impacts of an LLM generation request.
Source code in ecologits/tracers/utils.py
def llm_impacts(\n provider: str,\n model_name: str,\n output_token_count: int,\n request_latency: float,\n electricity_mix_zone: Optional[str] = \"WOR\",\n) -> Optional[Impacts]:\n \"\"\"\n High-level function to compute the impacts of an LLM generation request.\n\n Args:\n provider: Name of the provider.\n model_name: Name of the LLM used.\n output_token_count: Number of generated tokens.\n request_latency: Measured request latency in seconds.\n electricity_mix_zone: ISO 3166-1 alpha-3 code of the electricity mix zone (WOR by default).\n\n Returns:\n The impacts of an LLM generation request.\n \"\"\"\n\n model = models.find_model(provider=provider, model_name=model_name)\n if model is None:\n # TODO: Replace with proper logging\n print(f\"Could not find model `{model_name}` for {provider} provider.\")\n return None\n model_active_params = model.active_parameters \\\n or Range(min=model.active_parameters_range[0], max=model.active_parameters_range[1])\n model_total_params = model.total_parameters \\\n or Range(min=model.total_parameters_range[0], max=model.total_parameters_range[1])\n\n electricity_mix = electricity_mixes.find_electricity_mix(zone=electricity_mix_zone)\n if electricity_mix is None:\n # TODO: Replace with proper logging\n print(f\"Could not find electricity mix `{electricity_mix_zone}` in the ADEME database\")\n return None\n if_electricity_mix_adpe=electricity_mix.adpe\n if_electricity_mix_pe=electricity_mix.pe\n if_electricity_mix_gwp=electricity_mix.gwp\n\n return compute_llm_impacts(\n model_active_parameter_count=model_active_params,\n model_total_parameter_count=model_total_params,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n )\n
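A minimal usage sketch (the provider, model name, token count and latency below are illustrative; the function returns None when the model or electricity mix zone cannot be found):
from ecologits.tracers.utils import llm_impacts\n\nimpacts = llm_impacts(\n provider=\"openai\",\n model_name=\"gpt-3.5-turbo\",\n output_token_count=200,\n request_latency=5.0, # seconds\n)\nif impacts is not None:\n print(impacts.energy.value) # kWh, a float or a Range\n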
"},{"location":"tutorial/","title":"Tutorial","text":"The EcoLogits library tracks the energy consumption and environmental impacts of generative AI models accessed through APIs and their official client libraries.
It achieves this by patching the Python client libraries, ensuring that each API request is wrapped with an impact calculation function. This function computes the environmental impact based on several request features, such as the chosen model, the number of tokens generated, and the request's latency. The resulting data is then encapsulated in an Impacts
object, which is added to the response, containing the environmental impacts for a specific request.
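For intuition, here is a simplified sketch of the patching approach, not the actual EcoLogits implementation; the module path and the compute_impacts helper are hypothetical:
import time\nimport wrapt\n\n# Wrap a client method so each call is timed and its response enriched.\n@wrapt.patch_function_wrapper(\"openai.resources.chat.completions\", \"Completions.create\")\ndef wrapper(wrapped, instance, args, kwargs):\n start = time.perf_counter()\n response = wrapped(*args, **kwargs)\n latency = time.perf_counter() - start\n # Estimate impacts from the model, token count and latency (hypothetical helper).\n response.impacts = compute_impacts(response, latency)\n return response\n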
Set up in 5 minutes
Install ecologits
with pip
and get up and running in minutes.
Getting started
Environmental impacts
Understand what environmental impacts and phases are reported.
Tutorial
Supported providers
List of providers and tutorials on how to make requests.
Providers
Methodology
Understand how we estimate environmental impacts.
Methodology
"},{"location":"tutorial/impacts/","title":"Environmental Impacts","text":"Environmental impacts are reported for each request in the Impacts
pydantic model, which features multiple criteria such as energy and global warming potential, reported per phase (usage or embodied) as well as in total.
To learn more about how we estimate the environmental impacts and what our hypotheses are, go to the methodology section.
Structure of Impacts modelfrom ecologits.impacts.modeling import *\n\nImpacts(\n energy=Energy(), # (1)!\n gwp=GWP(),\n adpe=ADPe(),\n pe=PE(),\n usage=Usage( # (2)!\n energy=Energy(),\n gwp=GWP(),\n adpe=ADPe(),\n pe=PE(),\n ),\n embodied=Embodied( # (3)!\n gwp=GWP(),\n adpe=ADPe(),\n pe=PE(),\n )\n)\n
Total impacts for all phases. Usage impacts from electricity consumption. Note that the energy is equal to the \"total\" energy impact. Embodied impacts from resource extraction, manufacturing and transportation of hardware components, allocated to the request. You can extract an impact with:
>>> response.impacts.usage.gwp.value # (1)!\n0.34 # Expressed in kgCO2eq.\n
Assuming you have made an inference and stored the result in a response
object. You could also get a value range instead:
>>> response.impacts.usage.gwp.value\nRange(min=0.16, max=0.48) # Expressed in kgCO2eq (1)\n
Range
objects are used to define intervals. "},{"location":"tutorial/impacts/#criteria","title":"Criteria","text":"To evaluate the impact of human activities on the planet or the climate, we use criteria that each focus on a specific issue, such as GHG emissions for global warming, water consumption and pollution, or the depletion of natural resources. We currently support three environmental impact criteria in addition to the direct energy consumption.
Monitoring multiple criteria is useful to avoid pollution shifting, defined as the transfer of pollution from one medium to another. It is a common pitfall to optimize for only one criterion, like GHG emissions (e.g. buying new, more energy-efficient hardware), which can lead to higher impacts on mineral and metal depletion, for example (see encyclopedia.com).
"},{"location":"tutorial/impacts/#energy","title":"Energy","text":"The Energy
criterion refers to the direct electricity consumption of the GPUs, the server and other equipment in the data center. As defined, the energy criterion is not an environmental impact in itself, but it is used to estimate the other impacts of the usage phase. This criterion is expressed in kilowatt-hours (kWh).
Energy model attributes Attributes:
type
(str
) \u2013 energy
name
(str
) \u2013 Energy
value
(str
) \u2013 Energy value
unit
(str
) \u2013 Kilowatt-hour (kWh)
"},{"location":"tutorial/impacts/#global-warming-potential-gwp","title":"Global Warming Potential (GWP)","text":"The Global Warming Potential (GWP
) criterion is an index measuring how much heat is absorbed by greenhouse gases in the atmosphere compared to carbon dioxide. This criterion is expressed in kilogram of carbon dioxide equivalent (kgCO2eq).
Learn more: wikipedia.org
GWP model attributes Attributes:
"},{"location":"tutorial/impacts/#abiotic-depletion-potential-for-elements-adpe","title":"Abiotic Depletion Potential for Elements (ADPe)","text":"The Abiotic Depletion Potential \u2013 elements (ADPe
) criterion represents the depletion of non-renewable and non-living (abiotic) resources such as metals and minerals. This criterion is expressed in kilogram of antimony equivalent (kgSbeq).
Learn more: sciencedirect.com
ADPe model attributes Attributes:
"},{"location":"tutorial/impacts/#primary-energy-pe","title":"Primary Energy (PE)","text":"The Primary Energy (PE
) criterion represents the amount of energy consumed from natural sources such as raw fuels and other forms of energy, including waste. This criterion is expressed in megajoule (MJ).
Learn more: wikipedia.org
PE model attributes Attributes:
type
(str
) \u2013 PE
name
(str
) \u2013 Primary Energy
value
(str
) \u2013 PE value
unit
(str
) \u2013 Megajoule (MJ)
"},{"location":"tutorial/impacts/#phases","title":"Phases","text":"Inspired from the Life Cycle Assessment methodology we classify impacts is two phases (usage and embodied). The usage phase is about the environmental impacts related to the energy consumption while using an AI model. The embodied phase encompasses upstream impacts such as resource extraction, manufacturing, and transportation. We currently do not support the third phase which is end-of-life due to a lack of open research and transparency on that matter.
Learn more: wikipedia.org
Another pitfall in environmental impact assessment is to look only at the usage phase and ignore upstream and downstream impacts, which can lead to higher overall impacts across the entire life cycle. If you replace old hardware with newer, more energy-efficient hardware, you reduce the usage-phase impacts, but you increase the upstream impacts as well.
"},{"location":"tutorial/impacts/#usage","title":"Usage","text":"The Usage
phase accounts for the environmental impacts of using AI models. We report all criteria, in addition to direct energy consumption, for this phase.
Note that we use the worldwide average electricity mix impact factor by default.
Usage model attributes Attributes:
type
(str
) \u2013 usage
name
(str
) \u2013 Usage
energy
(Energy
) \u2013 Energy consumption
gwp
(GWP
) \u2013 Global Warming Potential (GWP) usage impact
adpe
(ADPe
) \u2013 Abiotic Depletion Potential for Elements (ADPe) usage impact
pe
(PE
) \u2013 Primary Energy (PE) usage impact
"},{"location":"tutorial/impacts/#embodied","title":"Embodied","text":"The Embodied phase accounts for the upstream environmental impacts such as resource extraction, manufacturing and transportation allocated to the request. We report all criteria (excluding energy consumption) for this phase.
Embodied model attributes Attributes:
type
(str
) \u2013 embodied
name
(str
) \u2013 Embodied
gwp
(GWP
) \u2013 Global Warming Potential (GWP) embodied impact
adpe
(ADPe
) \u2013 Abiotic Depletion Potential for Elements (ADPe) embodied impact
pe
(PE
) \u2013 Primary Energy (PE) embodied impact
"},{"location":"tutorial/impacts/#impact-factors","title":"Impact Factors","text":"We use impact factors to quantify environmental harm from human activities, measuring the ratio of greenhouse gases, resource consumption, and other criteria resulting from activities like energy consumption, industrial processes, transportation, waster management and more.
"},{"location":"tutorial/impacts/#electricity-mix","title":"Electricity Mix","text":"We currently assume by default a worldwide average impact factor for electricity consumption. We plan to allow users to change these impact factors dynamically based on a specific country/region or with custom values.
Default values (from ADEME Base Empreinte\u00ae):
Impact criteria Value Unit GWP \\(5.904e-1\\) \\(kgCO2eq / kWh\\) ADPe \\(7.378e-7\\) \\(kgSbeq / kWh\\) PE \\(9.988\\) \\(MJ / kWh\\)"},{"location":"tutorial/providers/","title":"Supported providers","text":""},{"location":"tutorial/providers/#list-of-all-providers","title":"List of all providers","text":"Provider name Extra for installation Guide Anthropic anthropic
Guide for Anthropic Cohere cohere
Guide for Cohere Google Gemini google-generativeai
Guide for Google Gemini Hugging Face Hub huggingface-hub
Guide for Hugging Face Hub LiteLLM litellm
Guide for LiteLLM Mistral AI mistralai
Guide for Mistral AI OpenAI openai
Guide for OpenAI Azure OpenAI openai
Guide for Azure OpenAI"},{"location":"tutorial/providers/#chat-completions","title":"Chat Completions","text":"Provider Completions Completions (stream) Completions (async) Completions (async + stream) Anthropic Cohere Google Gemini HuggingFace Hub LiteLLM Mistral AI OpenAI Azure OpenAI Partial support for Anthropic streams, see full documentation: Anthropic provider.
"},{"location":"tutorial/providers/anthropic/","title":"Anthropic","text":"This guide focuses on the integration of EcoLogits with the Anthropic official python client .
Official links:
Repository: anthropics/anthropic-sdk-python Documentation: docs.anthropic.com "},{"location":"tutorial/providers/anthropic/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Anthropic client, please use the anthropic
extra-dependency option as follows:
pip install ecologits[anthropic]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Anthropic's Python client.
"},{"location":"tutorial/providers/anthropic/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/anthropic/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from anthropic import Anthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Anthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nresponse = client.messages.create(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom anthropic import AsyncAnthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncAnthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nasync def main() -> None:\n response = await client.messages.create(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/anthropic/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated in the last chunk for the entire request.
SyncAsync from anthropic import Anthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Anthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nwith client.messages.stream(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n) as stream:\n for text in stream.text_stream:\n pass\n # Get estimated environmental impacts of the inference\n print(stream.impacts)\n
import asyncio\nfrom anthropic import AsyncAnthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncAnthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nasync def main() -> None:\n async with client.messages.stream(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n ) as stream:\n async for text in stream.text_stream:\n pass\n # Get estimated environmental impacts of the inference\n print(stream.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/cohere/","title":"Cohere","text":"This guide focuses on the integration of EcoLogits with the Cohere official python client .
Official links:
Repository: cohere-ai/cohere-python Documentation: docs.cohere.com "},{"location":"tutorial/providers/cohere/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Cohere client, please use the cohere
extra-dependency option as follows:
pip install ecologits[cohere]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Cohere's Python client.
"},{"location":"tutorial/providers/cohere/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/cohere/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from cohere import Client\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Client(api_key=\"<COHERE_API_KEY>\")\n\nresponse = client.chat(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom cohere import AsyncClient\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncClient(api_key=\"<COHERE_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/cohere/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated in the last chunk for the entire request.
SyncAsync from cohere import Client\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Client(api_key=\"<COHERE_API_KEY>\")\n\nstream = client.chat_stream(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n)\n\nfor chunk in stream:\n if chunk.event_type == \"stream-end\":\n # Get estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom cohere import AsyncClient\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncClient(api_key=\"<COHERE_API_KEY>\")\n\nasync def main() -> None:\n stream = await client.chat_stream(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n )\n\n async for chunk in stream:\n if chunk.event_type == \"stream-end\":\n # Get estimated environmental impacts of the inference\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/google/","title":"Google Gemini","text":"This guide focuses on the integration of EcoLogits with the Google Gemini official python client .
Official links:
Repository: google-gemini/generative-ai-python Documentation: ai.google.dev "},{"location":"tutorial/providers/google/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Google Gemini client, please use the google-generativeai
extra-dependency option as follows:
pip install ecologits[google-generativeai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with the Google Gemini Python client.
"},{"location":"tutorial/providers/google/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/google/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini\ngenai.configure(api_key=\"<GOOGLE_API_KEY>\")\nmodel = genai.GenerativeModel(\"gemini-1.5-flash\")\nresponse = model.generate_content(\"Write a story about a magic backpack.\")\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in async mode\nasync def main() -> None:\n genai.configure(api_key=\"<GOOGLE_API_KEY>\")\n model = genai.GenerativeModel(\"gemini-1.5-flash\")\n response = await model.generate_content_async(\n \"Write a story about a magic backpack.\"\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/google/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in streaming mode\ngenai.configure(api_key=\"<GOOGLE_API_KEY>\")\nmodel = genai.GenerativeModel(\"gemini-1.5-flash\")\nstream = model.generate_content(\n \"Write a story about a magic backpack.\", \n stream=True\n)\n\n# Get cumulative estimated environmental impacts of the inference\nfor chunk in stream:\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in streaming and async mode\nasync def main() -> None:\n genai.configure(api_key=\"<GOOGLE_API_KEY>\")\n model = genai.GenerativeModel(\"gemini-1.5-flash\")\n stream = await model.generate_content_async(\n \"Write a story about a magic backpack.\", \n stream=True\n )\n\n # Get cumulative estimated environmental impacts of the inference\n async for chunk in stream:\n print(chunk.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/huggingface_hub/","title":"Hugging Face Hub","text":"This guide focuses on the integration of EcoLogits with the Hugging Face Hub official python client .
Official links:
Repository: huggingface/huggingface_hub Documentation: huggingface.co "},{"location":"tutorial/providers/huggingface_hub/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Hugging Face Hub client, please use the huggingface-hub
extra-dependency option as follows:
pip install ecologits[huggingface-hub]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Hugging Face Hub's Python client.
"},{"location":"tutorial/providers/huggingface_hub/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/huggingface_hub/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom huggingface_hub import InferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = InferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\nresponse = client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom huggingface_hub import AsyncInferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncInferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\n\nasync def main() -> None:\n response = await client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/huggingface_hub/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom huggingface_hub import InferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = InferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\nstream = client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15,\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom huggingface_hub import AsyncInferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncInferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\n\nasync def main() -> None:\n stream = await client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15,\n stream=True\n )\n\n async for chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/litellm/","title":"LiteLLM","text":"This guide focuses on the integration of EcoLogits with the LiteLLM official Python client .
Official links:
Repository: BerriAI/litellm Documentation: litellm.vercel.app "},{"location":"tutorial/providers/litellm/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with LiteLLM, please use the litellm
extra-dependency option as follows:
pip install ecologits[litellm]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with LiteLLM's Python client.
"},{"location":"tutorial/providers/litellm/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/litellm/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data. Make sure the API key of the provider you use is set in a .env file. Also make sure you call the generation function as \"litellm.completion\" and not just \"completion\", so that the patched function is used.
SyncAsync from ecologits import EcoLogits\nimport litellm\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nresponse = litellm.completion(\n model=\"gpt-4o-2024-05-13\",\n messages=[{ \"content\": \"Hello, how are you?\",\"role\": \"user\"}]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nimport litellm\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nasync def main() -> None:\n response = await litellm.acompletion(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/litellm/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nimport litellm\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nstream = litellm.completion(\n model=\"gpt-3.5-turbo\",\n messages=[{\"role\": \"user\", \"content\": \"Hello World!\"}],\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nimport litellm\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nasync def main() -> None:\n stream = await litellm.acompletion(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n stream=True\n )\n\n async for chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/mistralai/","title":"Mistral AI","text":"This guide focuses on the integration of EcoLogits with the Mistral AI official python client .
Official links:
Repository: mistralai/client-python Documentation: docs.mistral.ai "},{"location":"tutorial/providers/mistralai/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Mistral AI client, please use the mistralai
extra-dependency option as follows:
pip install ecologits[mistralai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Mistral AI's Python client.
"},{"location":"tutorial/providers/mistralai/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/mistralai/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom mistralai.client import MistralClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralClient(api_key=\"<MISTRAL_API_KEY>\")\n\nresponse = client.chat(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom mistralai.async_client import MistralAsyncClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralAsyncClient(api_key=\"<MISTRAL_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/mistralai/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom mistralai.client import MistralClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralClient(api_key=\"<MISTRAL_API_KEY>\")\n\nstream = client.chat_stream(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom mistralai.async_client import MistralAsyncClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralAsyncClient(api_key=\"<MISTRAL_API_KEY>\")\n\nasync def main() -> None:\n stream = client.chat_stream(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n )\n\n async for chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n if hasattr(chunk, \"impacts\"):\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/","title":"OpenAI","text":"This guide focuses on the integration of EcoLogits with the OpenAI official python client .
Official links:
Repository: openai/openai-python Documentation: platform.openai.com "},{"location":"tutorial/providers/openai/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the OpenAI client, please use the openai
extra-dependency option as follows:
pip install ecologits[openai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with OpenAI's Python client.
"},{"location":"tutorial/providers/openai/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/openai/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom openai import AsyncOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncOpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nstream = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[{\"role\": \"user\", \"content\": \"Hello World!\"}],\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom openai import AsyncOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncOpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nasync def main() -> None:\n stream = await client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n stream=True\n )\n\n async for chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/#compatibility-with-azure-openai","title":"Compatibility with Azure OpenAI","text":"EcoLogits is also compatible with Azure OpenAI .
import os\nfrom ecologits import EcoLogits\nfrom openai import AzureOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AzureOpenAI(\n azure_endpoint=os.getenv(\"AZURE_OPENAI_ENDPOINT\"),\n api_key=os.getenv(\"AZURE_OPENAI_API_KEY\"),\n api_version=\"2024-02-01\"\n)\n\nresponse = client.chat.completions.create(\n model=\"gpt-35-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to EcoLogits","text":"EcoLogits tracks the energy consumption and environmental impacts of using generative AI models through APIs. It supports major LLM providers such as OpenAI, Anthropic, Mistral AI and more (see supported providers).
"},{"location":"#requirements","title":"Requirements","text":"Python 3.9+
EcoLogits relies on key libraries to provide essential functionalities:
Pydantic for data modeling. Wrapt for function patching. "},{"location":"#installation","title":"Installation","text":"Select providers
Anthropic Cohere Google Gemini Hugging Face Inference Endpoints LiteLLM Mistral AI OpenAI
Run this command
For detailed instructions on each provider, refer to the complete list of supported providers and features. It is also possible to install EcoLogits without any provider.
"},{"location":"#usage-example","title":"Usage Example","text":"Below is a simple example demonstrating how to use the GPT-3.5-Turbo model from OpenAI with EcoLogits to track environmental impacts.
from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(f\"Energy consumption: {response.impacts.energy.value} kWh\")\nprint(f\"GHG emissions: {response.impacts.gwp.value} kgCO2eq\")\n
Environmental impacts are quantified based on four criteria and across two phases:
Criteria:
Energy (energy): Final energy consumption in kWh, Global Warming Potential (gwp): Potential impact on global warming in kgCO2eq (commonly known as GHG/carbon emissions), Abiotic Depletion Potential for Elements (adpe): Impact on the depletion of non-living resources such as minerals or metals in kgSbeq, Primary Energy (pe): Total energy consumed from primary sources in MJ. Phases:
Usage (usage): Represents the phase of energy consumption during model execution, Embodied (embodied): Encompasses resource extraction, manufacturing, and transportation phases associated with the model's lifecycle. Learn more about environmental impacts assessment in the methodology section.
"},{"location":"#license","title":"License","text":"This project is licensed under the terms of the Mozilla Public License Version 2.0 (MPL-2.0) .
"},{"location":"#acknowledgements","title":"Acknowledgements","text":"EcoLogits is actively developed and maintained by GenAI Impact non-profit. We extend our gratitude to Data For Good and Boavizta for supporting the development of this project. Their contributions of tools, best practices, and expertise in environmental impact assessment have been invaluable.
"},{"location":"contributing/","title":"Contribution","text":"Help us improve EcoLogits by contributing!
"},{"location":"contributing/#issues","title":"Issues","text":"Questions, feature requests and bug reports are all welcome as discussions or issues.
When submitting a feature request or bug report, please provide as much detail as possible. For bug reports, please include relevant information about your environment, including the version of EcoLogits and other Python dependencies used in your project.
"},{"location":"contributing/#pull-requests","title":"Pull Requests","text":"Getting started and creating a Pull Request is a straightforward process. Since EcoLogits is regularly updated, you can expect to see your contributions incorporated into the project within a matter of days or weeks.
For non-trivial changes, please create an issue to discuss your proposal before submitting a pull request. This ensures we can review and refine your idea before implementation.
"},{"location":"contributing/#prerequisites","title":"Prerequisites","text":"You'll need to meet the following requirements:
Python 3.9 or above git make poetry pre-commit "},{"location":"contributing/#installation-and-setup","title":"Installation and setup","text":"Fork the repository on GitHub and clone your fork locally.
# Clone your fork and cd into the repo directory\ngit clone git@github.com:<your username>/ecologits.git\ncd ecologits\n\n# Install ecologits development dependencies with poetry\nmake install\n
"},{"location":"contributing/#check-out-a-new-branch-and-make-your-changes","title":"Check out a new branch and make your changes","text":"Create a new branch for your changes.
# Check out a new branch and make your changes\ngit checkout -b my-new-feature-branch\n# Make your changes and implement tests...\n
"},{"location":"contributing/#run-tests","title":"Run tests","text":"Run tests locally to make sure everything is working as expected.
make test\n
If you have added a new provider you will need to record your tests with VCR.py through pytest-recording.
make test-record\n
Once your tests are recorded, please check that the newly created cassette files (located in tests/cassettes/...
) do not contain any sensitive information such as API tokens. If they do, you will need to update the configuration accordingly in conftest.py
and run the recording command again.
"},{"location":"contributing/#build-documentation","title":"Build documentation","text":"If you've made any changes to the documentation (including changes to function signatures, class definitions, or docstrings that will appear in the API documentation), make sure it builds successfully.
# Build documentation\nmake docs\n# If you have changed the documentation, make sure it builds successfully.\n
You can also serve the documentation locally.
# Serve the documentation at localhost:8000\npoetry run mkdocs serve\n
"},{"location":"contributing/#code-formatting-and-pre-commit","title":"Code formatting and pre-commit","text":"Before pushing your work, run the pre-commit hook that will check and lint your code.
# Run all checks before commit\nmake pre-commit\n
"},{"location":"contributing/#commit-and-push-your-changes","title":"Commit and push your changes","text":"Commit your changes, push your branch to GitHub, and create a pull request.
Please follow the pull request template and fill in as much information as possible. Link to any relevant issues and include a description of your changes.
When your pull request is ready for review, add a comment with the message \"please review\" and we'll take a look as soon as we can.
"},{"location":"contributing/#documentation-style","title":"Documentation style","text":"Documentation is written in Markdown and built using Material for MkDocs. API documentation is build from docstrings using mkdocstrings.
"},{"location":"contributing/#code-documentation","title":"Code documentation","text":"When contributing to EcoLogits, please make sure that all code is well documented. The following should be documented using properly formatted docstrings.
We use Google-style docstrings formatted according to PEP 257 guidelines. (See Example Google Style Python Docstrings for further examples.)
"},{"location":"contributing/#documentation-style_1","title":"Documentation style","text":"Documentation should be written in a clear, concise, and approachable tone, making it easy for readers to understand and follow along. Aim for brevity while still providing complete information.
Code examples are highly encouraged, but should be kept short, simple and self-contained. Ensure that each example is complete, runnable, and can be easily executed by readers.
"},{"location":"contributing/#acknowledgment","title":"Acknowledgment","text":"We'd like to acknowledge that this contribution guide is heavily inspired by the excellent guide from Pydantic. Thanks for the inspiration!
"},{"location":"faq/","title":"Frequently Asked Questions","text":""},{"location":"faq/#why-are-training-impacts-not-included","title":"Why are training impacts not included?","text":"Even though the training impacts of generative AI models are substantial, we currently do not implement them in our methodologies and tools. EcoLogits is aimed at estimating the impacts of an API request made to a GenAI service. To make the impact assessment complete, we indeed should take into account training impacts. However, given that we focus on services that are used by millions of people, doing billions of requests annually the training impacts are in fact negligible.
For example, looking at Llama 3 70B, the estimated training greenhouse gas emissions are \\(1,900\\ tCO2eq\\). This is significant for an AI model, but when amortized over, say, 100 billion inference requests annually, the share of impacts induced by training becomes very small: \\(\\frac{1,900\\ \\text{tCO2eq}}{100\\ \\text{billion requests}} = 1.9e-8\\ \\text{tCO2eq per request}\\), or \\(0.019\\ \\text{gCO2eq per request}\\). Compare this to a single request to Llama 3 70B, which would yield \\(1\\ \\text{to}\\ 5\\ \\text{gCO2eq}\\) (calculated with our methodology).
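This back-of-the-envelope calculation is easy to reproduce (the annual request volume is an assumption, not a measured figure):
training_gwp_tco2eq = 1_900       # estimated training emissions of Llama 3 70B\nannual_requests = 100e9           # assumed annual request volume\nper_request_gco2eq = training_gwp_tco2eq / annual_requests * 1e6  # tCO2eq -> gCO2eq\nprint(per_request_gco2eq)         # ~0.019 gCO2eq per request\n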
This does not mean that we do not plan to integrate training impacts; it is just not a priority right now given the difference in orders of magnitude. It is also worth mentioning that estimating the number of requests that will ever be made over the lifespan of a model is very difficult, for both open-source and proprietary models. You can join the discussion on GitHub #70 .
"},{"location":"faq/#whats-the-difference-with-codecarbon","title":"What's the difference with CodeCarbon?","text":"EcoLogits and CodeCarbon are two different tools that do not aim to address the same use case. CodeCarbon should be used when you control the execution environment of your model. This means that if you deploy models on your laptop, your server or in the cloud it is preferable to use CodeCarbon to get energy consumption and estimate carbon emissions associated with running your model (including training, fine-tuning or inference).
On the other hand EcoLogits is designed for scenarios where you do not have access to the execution environment of your GenAI model because it is managed by a third-party provider. In such cases you can rely on EcoLogits to estimate energy consumption and environmental impacts for inference workloads. Both tools are complementary and can be used together to provide a comprehensive view of environmental impacts across different deployment scenarios.
"},{"location":"faq/#how-can-i-estimate-impacts-of-general-use-of-genai-models","title":"How can I estimate impacts of general use of GenAI models?","text":"If you want to estimate the environmental impacts of using generative AI models without coding or making request, we recommend you to use our online webapp EcoLogits Calculator .
"},{"location":"faq/#how-do-we-assess-impacts-for-proprietary-models","title":"How do we assess impacts for proprietary models?","text":"Environmental impacts are calculated based on model architecture and parameter count. For proprietary models, we lack transparency from providers, so we estimate parameter counts using available information. For GPT models, we based our estimates on leaked GPT-4 architecture and scaled parameters count for GPT-4-Turbo and GPT-4o based on pricing differences. For other proprietary models like Anthropic's Claude, we assume similar impacts for models released around the same time with similar performance on public benchmarks. Please note that these estimates are based on assumptions and may not be exact. Our methods are open-source and transparent, so you can always see the hypotheses we use.
"},{"location":"faq/#how-to-reduce-my-environmental-impact","title":"How to reduce my environmental impact?","text":"First, you may want to assess indirect impacts and rebound effects of the project you are building. Does the finality of your product or service is impacting negatively the environment? Does the usage of your product or service drives up consumption and environmental impacts of previously existing technology?
Try to be frugal and question your usages or needs of AI:
Do you really need AI to solve your problem? Do you really need GenAI to solve your problem? (you can read this paper ) Prefer fine-tuning small, existing models over generalist models. Evaluate the environmental impacts of your project before, during and after development with tools like EcoLogits or CodeCarbon (see more tools ). Restrict the use case and limit the usage of your tool or feature to the desired purpose. Do not buy new GPUs or hardware. Hardware production for data centers is responsible for around 50% of the impacts compared to usage impacts. The share is even higher for consumer devices, at around 80%.
Use cloud instances that are located in low emissions / high energy efficiency data centers (see electricitymaps.com ).
Optimize your models for production use cases. You can look at model compression techniques such as quantization, pruning or distillation. Some inference software also offers built-in optimizations.
"},{"location":"why/","title":"Why use EcoLogits?","text":"Generative AI significantly impacts our environment, consuming electricity and contributing to global greenhouse gas emissions. In 2020, the ICT sector accounted for 2.1% to 3.9% of global emissions, with projections suggesting an increase to 6%-8% by 2025 due to continued growth and adoption Freitag et al., 2021. The advent of GenAI technologies like ChatGPT has further exacerbated this trend, causing a sharp rise in energy, water, and hardware costs for major tech companies. [0, 1].
"},{"location":"why/#which-is-bigger-training-or-inference-impacts","title":"Which is bigger: training or inference impacts?","text":"The field of Green AI focuses on evaluating the environmental impacts of AI models. While many studies have concentrated on training impacts [2], they often overlook other critical phases like data collection, storage and processing phases, research experiments and inference. For GenAI, the inference phase can significantly overshadow training impacts when models are deployed at scale [3]. EcoLogits specifically addresses this gap by focusing on the inference impacts of GenAI.
"},{"location":"why/#how-to-assess-impacts-properly","title":"How to assess impacts properly?","text":"EcoLogits employs state-of-the-art methodologies based on Life Cycle Assessment and open data to assess environmental impacts across multiple phases and criteria. This includes usage impacts from electricity consumption and embodied impacts from the production and transportation of hardware. Our multi-criteria approach also evaluates carbon emissions, abiotic resource depletion, and primary energy consumption, providing a comprehensive view that informs decisions like model selection, hardware upgrades and cloud deployments.
"},{"location":"why/#how-difficult-is-it","title":"How difficult is it?","text":"Assessing environmental impacts can be challenging with external providers due to lack of control over the execution environment. Meaning you can easily estimate usage impact regarding energy consumption with CodeCarbon and also embodied impacts with BoaviztAPI, but these tools become less relevant with external service providers. EcoLogits simplifies this by basing calculations on well-founded assumptions about hardware, model size, and operational practices, making it easier to estimate impacts accurately. For more details, see our methodology section.
"},{"location":"why/#easy-to-use","title":"Easy to use","text":"EcoLogits integrates seamlessly into existing GenAI providers, allowing you to assess the environmental impact of each API request with minimal code adjustments:
from ecologits import EcoLogits\n\nEcoLogits.init() \n\n# Then, you can make request to any supported provider.\n
See the list of supported providers and more code snippets in the tutorial section.
"},{"location":"why/#have-more-questions","title":"Have more questions?","text":"Feel free to ask question in our GitHub discussions forum!
"},{"location":"methodology/","title":"Methodology","text":""},{"location":"methodology/#evaluation-methodologies","title":"Evaluation methodologies","text":"The following methodologies are currently available and implemented in EcoLogits:
Upcoming methodologies (join us to help speed up our progress):
Embeddings Image Generation Multi-Modal "},{"location":"methodology/#methodological-background","title":"Methodological background","text":"EcoLogits employs the Life Cycle Assessment (LCA) methodology, as defined by ISO 14044, to estimate the environmental impacts of requests made to generative AI inference services. This approach focuses on multiple phases of the lifecycle, specifically raw material extraction, manufacturing, transportation (denoted as embodied impacts), usage and end-of-life. Notably, we do not cover the end-of-life phase due to data limitations on e-waste recycling.
Our assessment considers three key environmental criteria:
Global Warming Potential (GWP): Evaluates the impact on global warming in terms of CO2 equivalents. Abiotic Resource Depletion for Elements (ADPe): Assesses the consumption of raw minerals and metals, expressed in antimony equivalents. Primary Energy (PE): Calculates energy consumed from natural sources, expressed in megajoules. Using a bottom-up modeling approach, we assess and aggregate the environmental impacts of all individual service components. This method differs from top-down approaches by allowing precise allocation of each resource's impact to the overall environmental footprint.
Our current focus is on high-performance GPU-accelerated cloud instances, crucial for GenAI inference tasks. While we exclude impacts from training, networking, and end-user devices, we thoroughly evaluate the impacts associated with hosting and running the model inferences.
The methodology is grounded in transparency and reproducibility, utilizing open market and technical data to ensure our results are reliable and verifiable.
"},{"location":"methodology/#licenses-and-citations","title":"Licenses and citations","text":"All the methodologies are licensed under CC BY-SA 4.0
Please ensure that you adhere to the license terms and properly cite the authors and the GenAI Impact non-profit organization when utilizing this work. Each methodology has an associated paper with specific citation requirements.
"},{"location":"methodology/llm_inference/","title":"LLM Inference","text":"Page still under construction
This page is still under construction. If you spot any inaccuracies or have questions about the methodology itself, feel free to open an issue on GitHub.
Early Publication
Beware that this is an early version of the methodology to evaluate the environmental impacts of LLMs at inference. We are still testing and reviewing the methodology internally. Some parts of the methodology may change in the near future.
"},{"location":"methodology/llm_inference/#environmental-impacts-of-llm-inference","title":"Environmental Impacts of LLM Inference","text":"Known limitations and hypotheses Based on a production setup: models are quantized, high-end servers with A100... Current implementation of EcoLogits assumes a fixed and worldwide impact factor for electricity mix. Model architectures are assumed when not dislosed by the provider. Not accounting the impacts of unused cloud resources, data center building, network and end-user devices, model training and data collection... Not tested on multi-modal models for text-to-text generation only. The environmental impacts of a request, \\(I_{request}\\) to a Large Language Model (LLM) can be divided into two components: the usage impacts, \\(I_{request}^u\\), which account for energy consumption, and the embodied impacts, \\(I_{request}^e\\), which account for resource extraction, hardware manufacturing, and transportation.
\\[ \\begin{equation*} \\begin{split} I_{request}&=I_{request}^u + I_{request}^e \\\\ &= E_{request}*F_{em}+\\frac{\\Delta T}{\\Delta L}*I_{server}^e \\end{split} \\end{equation*} \\] Where \\(E_{request}\\) represents the energy consumption of the IT resources associated with the request. \\(F_{em}\\) denotes the impact factor of electricity consumption, which varies depending on the location and time. Furthermore, \\(I_{server}^e\\) captures the embodied impacts of the IT resources, and \\(\\frac{\\Delta T}{\\Delta L}\\) signifies the hardware utilization factor, calculated as the computation time divided by the lifetime of the hardware.
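As a minimal sketch (function and variable names are illustrative, not the library's API), this decomposition translates directly into code:
def request_impacts(e_request_kwh, f_em, delta_t_s, delta_l_s, i_server_embodied):\n    # Usage impacts: energy consumed times the electricity mix impact factor\n    usage = e_request_kwh * f_em\n    # Embodied impacts: share of the hardware footprint, allocated by compute time over lifetime\n    embodied = (delta_t_s / delta_l_s) * i_server_embodied\n    return usage + embodied\n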
"},{"location":"methodology/llm_inference/#usage-impacts","title":"Usage impacts","text":"To assess the usage impacts of an LLM inference, we first need to estimate the energy consumption of the server, which is equipped with one or more GPUs. We will also take into account the energy consumption of cooling equipment integrated with the data center, using the Power Usage Effectiveness (PUE) metric.
Subsequently, we can calculate the environmental impacts by using the \\(F_{em}\\) impact factor of the electricity mix. Ideally, \\(F_{em}\\) should vary with location and time to accurately reflect the local energy mix.
"},{"location":"methodology/llm_inference/#modeling-gpu-energy-consumption","title":"Modeling GPU energy consumption","text":"By leveraging the open dataset from the LLM Perf Leaderboard, produced by Hugging Face, we can estimate the energy consumption of the GPU using a parametric model.
We fit a linear regression model to the dataset, which models the energy consumption per output token as a function of the number of active parameters in the LLM, denoted as \\(P_{active}\\).
What are active parameters? We distinguish between active parameters and total parameter count for Sparse Mixture-of-Experts (SMoE) models. The total parameter count is used to determine the number of required GPUs to load the model into memory. In contrast, the active parameter count is used to estimate the energy consumption of a single GPU. In practice, SMoE models exhibit lower energy consumption per GPU compared to dense models of equivalent size (in terms of total parameters).
For a dense model: \\(P_{active} = P_{total}\\) For a SMoE model: \\(P_{active} = P_{total} / \\text{number of active experts}\\) On the LLM Perf Leaderboard dataset filtering We have filtered the dataset to keep relevant data points for the analysis. In particular we have applied the following conditions:
Model number of parameters >= 7B Keep dtype set to float16 GPU model is \"NVIDIA A100-SXM4-80GB\" No optimization 8bit and 4bit quantization excluding bitsandbytes (bnb) Figure: Energy consumption (in Wh) per output token vs. number of active parameters (in billions) \\[ \\frac{E_{GPU}}{\\#T_{out}} = \\alpha * P_{active} + \\beta \\] We found that \\(\\alpha = 8.91e-5\\) and \\(\\beta = 1.43e-3\\). Using these values, we can estimate the energy consumption of a single GPU for the entire request, given the number of output tokens \\(\\#T_{out}\\) and the number of active parameters \\(P_{active}\\):
\\[ E_{GPU}(\\#T_{out}, P_{active}) = \\#T_{out} * (\\alpha * P_{active} + \\beta) \\] If the model requires multiple GPUs to be loaded into VRAM, the energy consumption \\(E_{GPU}\\) should be multiplied by the number of GPUs \\(\\#GPU_{required}\\) (see below).
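For instance, a hypothetical dense model with 70B active parameters generating 200 output tokens would draw, per GPU:
alpha, beta = 8.91e-5, 1.43e-3    # fitted coefficients (Wh per output token)\np_active, t_out = 70, 200         # active parameters in billions, output tokens\ne_gpu_wh = t_out * (alpha * p_active + beta)\nprint(e_gpu_wh)                   # ~1.53 Wh for this request, per GPU\n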
"},{"location":"methodology/llm_inference/#modeling-server-energy-consumption","title":"Modeling server energy consumption","text":"To estimate the energy consumption of the entire server, we will use the previously estimated GPU energy model and separately estimate the energy consumption of the server itself (without GPUs), denoted as \\(E_{server\\backslash GPU}\\).
"},{"location":"methodology/llm_inference/#server-energy-consumption-without-gpus","title":"Server energy consumption without GPUs","text":"To model the energy consumption of the server without GPUs, we consider a fixed power consumption, \\(W_{server\\backslash GPU}\\), during inference (or generation latency), denoted as \\(\\Delta T\\). We assume that the server hosts multiple GPUs, but not all of them are actively used for the target inference. Therefore, we account for a portion of the energy consumption based on the number of required GPUs, \\(\\#GPU_{required}\\):
\\[ E_{server\\backslash GPU}(\\Delta T) = \\Delta T * W_{server\\backslash GPU} * \\frac{\\#GPU_{required}}{\\#GPU_{installed}} \\] For a typical high-end GPU-accelerated cloud instance, we use \\(W_{server\\backslash GPU} = 1\\ kW\\) and \\(\\#GPU_{installed} = 8\\).
"},{"location":"methodology/llm_inference/#estimating-the-generation-latency","title":"Estimating the generation latency","text":"The generation latency, \\(\\Delta T\\), is the duration of the inference measured on the server and is independent of networking latency. We estimate the generation latency using the LLM Perf Leaderboard dataset with the previously mentioned filters applied.
We fit a linear regression model on the dataset modeling the generation latency per output token given the number of active parameters of the LLM \\(P_{active}\\):
Figure: Latency (in s) per output token vs. number of active parameters (in billions) \\[ \\frac{\\Delta T}{\\#T_{out}} = A * P_{active} + B \\] We found \\(A = 8.02e-4\\) and \\(B = 2.23e-2\\). Using these values, we can estimate the generation latency for the entire request given the number of output tokens, \\(\\#T_{out}\\), and the number of active parameters, \\(P_{active}\\). When possible, we also measure the request latency, \\(\\Delta T_{request}\\), and use it as the maximum bound for the generation latency:
\\[ \\Delta T(\\#T_{out}, P_{active}) = \\#T_{out} * (A * P_{active} + B) \\] With the request latency, the generation latency is defined as follows:
\\[ \\Delta T(\\#T_{out}, P_{active}, \\Delta T_{request}) = \\min[\\#T_{out} * (A * P_{active} + B), \\Delta T_{request}] \\]"},{"location":"methodology/llm_inference/#estimating-the-number-of-active-gpus","title":"Estimating the number of active GPUs","text":"To estimate the number of required GPUs, \\(\\#GPU_{required}\\), to load the model into GPU memory (VRAM), we divide the required memory to host the LLM for inference, \\(M_{model}\\), by the memory available on one GPU, \\(M_{GPU}\\).
The required memory to host the LLM for inference is estimated based on the total number of parameters and the number of bits used to represent the model weights (quantization). We also apply a memory overhead factor of \\(1.2\\) (see Transformers Math 101 ):
\\[ M_{model}(P_{total},Q)=\\frac{P_{total}*Q}{8}*1.2 \\] We then estimate the number of required GPUs, rounded up:
\\[ \\#GPU_{required}(P_{total},Q,M_{GPU}) = \\lceil \\frac{M_{model}(P_{total},Q)}{M_{GPU}}\\rceil \\] To stay consistent with previous assumptions based on LLM Perf Leaderboard data, we use \\(M_{GPU} = 80\\ GB\\) for an NVIDIA A100 80GB GPU.
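As a worked example, for a hypothetical 70B-parameter model with 16-bit weights:
from math import ceil\n\np_total, q_bits, m_gpu = 70, 16, 80    # parameters in billions, weight precision, GPU memory in GB\nm_model = p_total * q_bits / 8 * 1.2   # memory footprint with the 1.2x overhead\ngpu_required = ceil(m_model / m_gpu)\nprint(m_model, gpu_required)           # 168.0 GB -> 3 GPUs\n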
"},{"location":"methodology/llm_inference/#complete-server-energy-consumption","title":"Complete server energy consumption","text":"The total server energy consumption for the request, \\(E_{server}\\), is calculated as follows:
\\[ E_{server} = E_{server\\backslash GPU} + \\#GPU_{required} * E_{GPU} \\]"},{"location":"methodology/llm_inference/#modeling-request-energy-consumption","title":"Modeling request energy consumption","text":"To estimate the energy consumption of the request, we multiply the previously computed server energy by the Power Usage Effectiveness (PUE) to account for cooling equipment in the data center:
\\[ E_{request} = PUE * E_{server} \\] We typically use a \\(PUE = 1.2\\) for hyperscaler data centers or supercomputers.
"},{"location":"methodology/llm_inference/#modeling-request-usage-environmental-impacts","title":"Modeling request usage environmental impacts","text":"To assess the environmental impacts of the request for the usage phase, we multiply the estimated electricity consumption by the impact factor of the electricity mix, \\(F_{em}\\), specific to the target country and time. We currently use a worldwide average multicriteria impact factor from the ADEME Base Empreinte\u00ae:
\\[ I^u_{request} = E_{request} * F_{em} \\] Some values of \\(F_{em}\\) per geographical area Area or country GWP (\\(gCO2eq / kWh\\)) ADPe (\\(kgSbeq / kWh\\)) PE (\\(MJ / kWh\\)) \ud83c\udf10 Worldwide \\(590.4\\) \\(7.378 * 10^{-8}\\) \\(9.99\\) \ud83c\uddea\ud83c\uddfa Europe (EEA) \\(509.4\\) \\(6.423 * 10^{-8}\\) \\(12.9\\) \ud83c\uddfa\ud83c\uddf8 USA \\(679.8\\) \\(9.855 * 10^{-8}\\) \\(11.4\\) \ud83c\udde8\ud83c\uddf3 China \\(1,057\\) \\(8.515 * 10^{-8}\\) \\(14.1\\) \ud83c\uddeb\ud83c\uddf7 France \\(81.3\\) \\(4.858 * 10^{-8}\\) \\(11.3\\)"},{"location":"methodology/llm_inference/#embodied-impacts","title":"Embodied impacts","text":"To determine the embodied impacts of an LLM inference, we need to estimate the hardware configuration used to host the model and its lifetime. Embodied impacts account for resource extraction (e.g., minerals and metals), manufacturing, and transportation of the hardware.
"},{"location":"methodology/llm_inference/#modeling-server-embodied-impacts","title":"Modeling server embodied impacts","text":"To estimate the embodied impacts of IT hardware, we use the BoaviztAPI tool from the non-profit organization Boavizta. This API embeds a bottom-up multicriteria environment impact estimation engine for embodied and usage phases of IT resources and services. We focus on estimating the embodied impacts of a server and a GPU. BoaviztAPI is an open-source project that relies on open databases and open research on environmental impacts of IT equipment.
"},{"location":"methodology/llm_inference/#server-embodied-impacts-without-gpu","title":"Server embodied impacts without GPU","text":"To assess the embodied environmental impacts of a high-end AI server, we use an AWS cloud instance as a reference. We selected the p4de.24xlarge
instance, as it corresponds to a server that can be used for LLM inference with eight NVIDIA A100 80GB GPU cards. The embodied impacts of this instance will be used to estimate the embodied impacts of the server without GPUs, denoted as \\(I^e_{server\\backslash GPU}\\).
The embodied environmental impacts of the cloud instance are:
Server (without GPU) GWP (\\(kgCO2eq\\)) \\(3000\\) ADPe (\\(kgSbeq\\)) \\(0.25\\) PE (\\(MJ\\)) \\(39,000\\) These impacts do not take into account the eight GPUs (see below).
Example request to reproduce this calculation On the cloud instance route (/v1/cloud/instance) you can POST the following JSON.
{\n \"provider\": \"aws\",\n \"instance_type\": \"p4de.24xlarge\"\n}\n
Or you can use the publicly available demo API with this command, using curl
and parsing the JSON output with jq
.
curl -X 'POST' \\\n 'https://api.boavizta.org/v1/cloud/instance?verbose=true&criteria=gwp&criteria=adp&criteria=pe' \\\n -H 'accept: application/json' \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"provider\": \"aws\",\n \"instance_type\": \"p4de.24xlarge\"\n}' | jq\n
"},{"location":"methodology/llm_inference/#gpu-embodied-impacts","title":"GPU embodied impacts","text":"Boavizta is currently developing a methodology to provide multicriteria embodied impacts for GPU cards. For this analysis, we use the embodied impact data they computed for a NVIDIA A100 80GB GPU. These values will be used to estimate the embodied impacts of a single GPU, denoted as \\(I^e_{GPU}\\).
NVIDIA A100 80GB GWP (\\(kgCO2eq\\)) \\(143\\) ADPe (\\(kgSbeq\\)) \\(5.09 * 10^{-3}\\) PE (\\(MJ\\)) \\(1,828\\) The GPU embodied impacts will soon be available in the BoaviztAPI tool.
"},{"location":"methodology/llm_inference/#complete-server-embodied-impacts","title":"Complete server embodied impacts","text":"The final embodied impacts for the server, including the GPUs, are calculated as follows. Note that the embodied impacts of the server without GPUs are scaled by the number of GPUs required to host the model. This allocation is made to account for the fact that the remaining GPUs on the server can be used to host other models or multiple instances of the same model. As we are estimating the impacts of a single LLM inference, we need to exclude the embodied impacts that would be attributed to other services hosted on the same server.
\\[ I^e_{server}=\\frac{\\#GPU_{required}}{\\#GPU_{installed}}*I^e_{server\\backslash GPU} + \\#GPU_{required} * I^e_{GPU} \\]"},{"location":"methodology/llm_inference/#modeling-request-embodied-environmental-impacts","title":"Modeling request embodied environmental impacts","text":"To allocate the server embodied impacts to the request, we use an allocation based on the hardware utilization factor, \\(\\frac{\\Delta T}{\\Delta L}\\). In this case, \\(\\Delta L\\) represents the lifetime of the server and GPU, which we fix at 5 years.
\\[ I^e_{request}=\\frac{\\Delta T}{\\Delta L} * I^e_{server} \\]"},{"location":"methodology/llm_inference/#conclusion","title":"Conclusion","text":"This paper presents a methodology to assess the environmental impacts of Large Language Model (LLM) inference, considering both usage and embodied impacts. We model server and GPU energy consumption based on various parameters and incorporate PUE and electricity mix impact factors. For embodied impacts, we use the BoaviztAPI tool to estimate environmental impacts of IT hardware. Our methodology offers a comprehensive understanding of the environmental footprint of LLM inference, guiding researchers and practitioners towards more sustainable AI practices. Future work may involve refining the methodology and exploring the impacts of multi-modal models or RAG applications.
"},{"location":"methodology/llm_inference/#references","title":"References","text":" LLM-Perf Leaderboard to estimate GPU energy consumption and latency based on the model architecture and number of output tokens. BoaviztAPI to estimate server embodied impacts and base energy consumption. ADEME Base Empreinte\u00ae for electricity mix impacts per country. "},{"location":"methodology/llm_inference/#citation","title":"Citation","text":"Please cite GenAI Impact non-profit organization and link to this documentation page.
Coming soon...\n
"},{"location":"methodology/llm_inference/#license","title":"License","text":"This work is licensed under CC BY-SA 4.0
"},{"location":"reference/SUMMARY/","title":"SUMMARY","text":" _ecologits electricity_mix_repository exceptions impacts model_repository tracers anthropic_tracer cohere_tracer google_tracer huggingface_tracer litellm_tracer mistralai_tracer openai_tracer utils "},{"location":"reference/_ecologits/","title":"_ecologits","text":""},{"location":"reference/_ecologits/#_ecologits.EcoLogits","title":"EcoLogits
","text":"EcoLogits instrumentor to initialize function patching for each provider.
By default, the initialization will be done on all available and compatible providers that are supported by the library.
Examples:
EcoLogits initialization example with OpenAI.
from ecologits import EcoLogits\nfrom openai import OpenAI\n\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(f\"Energy consumption: {response.impacts.energy.value} kWh\")\nprint(f\"GHG emissions: {response.impacts.gwp.value} kgCO2eq\")\n
"},{"location":"reference/_ecologits/#_ecologits.EcoLogits.init","title":"init(providers=None)
staticmethod
","text":"Initialization static method. Will attempt to initialize all providers by default.
Parameters:
Name Type Description Default providers
Union[str, list[str]]
list of providers to initialize.
None
Source code in ecologits/_ecologits.py
@staticmethod\ndef init(providers: Union[str, list[str]] = None) -> None:\n \"\"\"\n Initialization static method. Will attempt to initialize all providers by default.\n\n Args:\n providers: list of providers to initialize.\n \"\"\"\n if providers is None:\n providers = list(_INSTRUMENTS.keys())\n if isinstance(providers, str):\n providers = [providers]\n if not EcoLogits.initialized:\n init_instruments(providers)\n EcoLogits.initialized = True\n
"},{"location":"reference/electricity_mix_repository/","title":"electricity_mix_repository","text":""},{"location":"reference/exceptions/","title":"exceptions","text":""},{"location":"reference/exceptions/#exceptions.TracerInitializationError","title":"TracerInitializationError
","text":" Bases: EcoLogitsError
Tracer is initialized twice
"},{"location":"reference/exceptions/#exceptions.ModelingError","title":"ModelingError
","text":" Bases: EcoLogitsError
Operation or computation not allowed
"},{"location":"reference/model_repository/","title":"model_repository","text":""},{"location":"reference/impacts/dag/","title":"dag","text":""},{"location":"reference/impacts/llm/","title":"llm","text":""},{"location":"reference/impacts/llm/#impacts.llm.gpu_energy","title":"gpu_energy(model_active_parameter_count, output_token_count, gpu_energy_alpha, gpu_energy_beta)
","text":"Compute energy consumption of a single GPU.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required output_token_count
float
Number of generated tokens.
required gpu_energy_alpha
float
Alpha parameter of the GPU linear power consumption profile.
required gpu_energy_beta
float
Beta parameter of the GPU linear power consumption profile.
required Returns:
Type Description float
The energy consumption of a single GPU in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef gpu_energy(\n model_active_parameter_count: float,\n output_token_count: float,\n gpu_energy_alpha: float,\n gpu_energy_beta: float\n) -> float:\n \"\"\"\n Compute energy consumption of a single GPU.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n output_token_count: Number of generated tokens.\n gpu_energy_alpha: Alpha parameter of the GPU linear power consumption profile.\n gpu_energy_beta: Beta parameter of the GPU linear power consumption profile.\n\n Returns:\n The energy consumption of a single GPU in kWh.\n \"\"\"\n return output_token_count * (gpu_energy_alpha * model_active_parameter_count + gpu_energy_beta)\n
"},{"location":"reference/impacts/llm/#impacts.llm.generation_latency","title":"generation_latency(model_active_parameter_count, output_token_count, gpu_latency_alpha, gpu_latency_beta, request_latency)
","text":"Compute the token generation latency in seconds.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required output_token_count
float
Number of generated tokens.
required gpu_latency_alpha
float
Alpha parameter of the GPU linear latency profile.
required gpu_latency_beta
float
Beta parameter of the GPU linear latency profile.
required request_latency
float
Measured request latency (upper bound) in seconds.
required Returns:
Type Description float
The token generation latency in seconds.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef generation_latency(\n model_active_parameter_count: float,\n output_token_count: float,\n gpu_latency_alpha: float,\n gpu_latency_beta: float,\n request_latency: float,\n) -> float:\n \"\"\"\n Compute the token generation latency in seconds.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n output_token_count: Number of generated tokens.\n gpu_latency_alpha: Alpha parameter of the GPU linear latency profile.\n gpu_latency_beta: Beta parameter of the GPU linear latency profile.\n request_latency: Measured request latency (upper bound) in seconds.\n\n Returns:\n The token generation latency in seconds.\n \"\"\"\n gpu_latency = output_token_count * (gpu_latency_alpha * model_active_parameter_count + gpu_latency_beta)\n return min(gpu_latency, request_latency)\n
"},{"location":"reference/impacts/llm/#impacts.llm.model_required_memory","title":"model_required_memory(model_total_parameter_count, model_quantization_bits)
","text":"Compute the required memory to load the model on GPU.
Parameters:
Name Type Description Default model_total_parameter_count
float
Number of parameters of the model.
required model_quantization_bits
int
Number of bits used to represent the model weights.
required Returns:
Type Description float
The amount of required GPU memory to load the model.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef model_required_memory(\n model_total_parameter_count: float,\n model_quantization_bits: int,\n) -> float:\n \"\"\"\n Compute the required memory to load the model on GPU.\n\n Args:\n model_total_parameter_count: Number of parameters of the model.\n model_quantization_bits: Number of bits used to represent the model weights.\n\n Returns:\n The amount of required GPU memory to load the model.\n \"\"\"\n return 1.2 * model_total_parameter_count * model_quantization_bits / 8\n
"},{"location":"reference/impacts/llm/#impacts.llm.gpu_required_count","title":"gpu_required_count(model_required_memory, gpu_memory)
","text":"Compute the number of required GPU to store the model.
Parameters:
Name Type Description Default model_required_memory
float
Required memory to load the model on GPU.
required gpu_memory
float
Amount of memory available on a single GPU.
required Returns:
Type Description int
The number of required GPUs to load the model.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef gpu_required_count(\n model_required_memory: float,\n gpu_memory: float\n) -> int:\n \"\"\"\n Compute the number of required GPU to store the model.\n\n Args:\n model_required_memory: Required memory to load the model on GPU.\n gpu_memory: Amount of memory available on a single GPU.\n\n Returns:\n The number of required GPUs to load the model.\n \"\"\"\n return ceil(model_required_memory / gpu_memory)\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_energy","title":"server_energy(generation_latency, server_power, server_gpu_count, gpu_required_count)
","text":"Compute the energy consumption of the server.
Parameters:
Name Type Description Default generation_latency
float
Token generation latency in seconds.
required server_power
float
Power consumption of the server in kW.
required server_gpu_count
int
Number of available GPUs in the server.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The energy consumption of the server (GPUs are not included) in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_energy(\n generation_latency: float,\n server_power: float,\n server_gpu_count: int,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the energy consumption of the server.\n\n Args:\n generation_latency: Token generation latency in seconds.\n server_power: Power consumption of the server in kW.\n server_gpu_count: Number of available GPUs in the server.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The energy consumption of the server (GPUs are not included) in kWh.\n \"\"\"\n return (generation_latency / 3600) * server_power * (gpu_required_count / server_gpu_count)\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_energy","title":"request_energy(datacenter_pue, server_energy, gpu_required_count, gpu_energy)
","text":"Compute the energy consumption of the request.
Parameters:
Name Type Description Default datacenter_pue
float
PUE of the datacenter.
required server_energy
float
Energy consumption of the server in kWh.
required gpu_required_count
int
Number of required GPUs to load the model.
required gpu_energy
float
Energy consumption of a single GPU in kWh.
required Returns:
Type Description float
The energy consumption of the request in kWh.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_energy(\n datacenter_pue: float,\n server_energy: float,\n gpu_required_count: int,\n gpu_energy: float\n) -> float:\n \"\"\"\n Compute the energy consumption of the request.\n\n Args:\n datacenter_pue: PUE of the datacenter.\n server_energy: Energy consumption of the server in kWh.\n gpu_required_count: Number of required GPUs to load the model.\n gpu_energy: Energy consumption of a single GPU in kWh.\n\n Returns:\n The energy consumption of the request in kWh.\n \"\"\"\n return datacenter_pue * (server_energy + gpu_required_count * gpu_energy)\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_gwp","title":"request_usage_gwp(request_energy, if_electricity_mix_gwp)
","text":"Compute the Global Warming Potential (GWP) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required Returns:
Type Description float
The GWP usage impact of the request in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_gwp(\n request_energy: float,\n if_electricity_mix_gwp: float\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n\n Returns:\n The GWP usage impact of the request in kgCO2eq.\n \"\"\"\n return request_energy * if_electricity_mix_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_adpe","title":"request_usage_adpe(request_energy, if_electricity_mix_adpe)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption in kgSbeq / kWh.
required Returns:
Type Description float
The ADPe usage impact of the request in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_adpe(\n request_energy: float,\n if_electricity_mix_adpe: float\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption in kgSbeq / kWh.\n\n Returns:\n The ADPe usage impact of the request in kgSbeq.\n \"\"\"\n return request_energy * if_electricity_mix_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_usage_pe","title":"request_usage_pe(request_energy, if_electricity_mix_pe)
","text":"Compute the Primary Energy (PE) usage impact of the request.
Parameters:
Name Type Description Default request_energy
float
Energy consumption of the request in kWh.
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required Returns:
Type Description float
The PE usage impact of the request in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_usage_pe(\n request_energy: float,\n if_electricity_mix_pe: float\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) usage impact of the request.\n\n Args:\n request_energy: Energy consumption of the request in kWh.\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n\n Returns:\n The PE usage impact of the request in MJ.\n \"\"\"\n return request_energy * if_electricity_mix_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_gwp","title":"server_gpu_embodied_gwp(server_embodied_gwp, server_gpu_count, gpu_embodied_gwp, gpu_required_count)
","text":"Compute the Global Warming Potential (GWP) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_gwp
float
GWP embodied impact of the server in kgCO2eq.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_gwp
float
GWP embodied impact of a single GPU in kgCO2eq.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The GWP embodied impact of the server and the GPUs in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_gwp(\n server_embodied_gwp: float,\n server_gpu_count: float,\n gpu_embodied_gwp: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) embodied impact of the server\n\n Args:\n server_embodied_gwp: GWP embodied impact of the server in kgCO2eq.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_gwp: GWP embodied impact of a single GPU in kgCO2eq.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The GWP embodied impact of the server and the GPUs in kgCO2eq.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_gwp + gpu_required_count * gpu_embodied_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_adpe","title":"server_gpu_embodied_adpe(server_embodied_adpe, server_gpu_count, gpu_embodied_adpe, gpu_required_count)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_adpe
float
ADPe embodied impact of the server in kgSbeq.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_adpe
float
ADPe embodied impact of a single GPU in kgSbeq.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The ADPe embodied impact of the server and the GPUs in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_adpe(\n server_embodied_adpe: float,\n server_gpu_count: float,\n gpu_embodied_adpe: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the server\n\n Args:\n server_embodied_adpe: ADPe embodied impact of the server in kgSbeq.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_adpe: ADPe embodied impact of a single GPU in kgSbeq.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The ADPe embodied impact of the server and the GPUs in kgSbeq.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_adpe + gpu_required_count * gpu_embodied_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.server_gpu_embodied_pe","title":"server_gpu_embodied_pe(server_embodied_pe, server_gpu_count, gpu_embodied_pe, gpu_required_count)
","text":"Compute the Primary Energy (PE) embodied impact of the server
Parameters:
Name Type Description Default server_embodied_pe
float
PE embodied impact of the server in MJ.
required server_gpu_count
float
Number of available GPUs in the server.
required gpu_embodied_pe
float
PE embodied impact of a single GPU in MJ.
required gpu_required_count
int
Number of required GPUs to load the model.
required Returns:
Type Description float
The PE embodied impact of the server and the GPUs in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef server_gpu_embodied_pe(\n server_embodied_pe: float,\n server_gpu_count: float,\n gpu_embodied_pe: float,\n gpu_required_count: int\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) embodied impact of the server\n\n Args:\n server_embodied_pe: PE embodied impact of the server in MJ.\n server_gpu_count: Number of available GPUs in the server.\n gpu_embodied_pe: PE embodied impact of a single GPU in MJ.\n gpu_required_count: Number of required GPUs to load the model.\n\n Returns:\n The PE embodied impact of the server and the GPUs in MJ.\n \"\"\"\n return (gpu_required_count / server_gpu_count) * server_embodied_pe + gpu_required_count * gpu_embodied_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_gwp","title":"request_embodied_gwp(server_gpu_embodied_gwp, server_lifetime, generation_latency)
","text":"Compute the Global Warming Potential (GWP) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_gwp
float
GWP embodied impact of the server and the GPUs in kgCO2eq.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The GWP embodied impact of the request in kgCO2eq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_gwp(\n server_gpu_embodied_gwp: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Global Warming Potential (GWP) embodied impact of the request.\n\n Args:\n server_gpu_embodied_gwp: GWP embodied impact of the server and the GPUs in kgCO2eq.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The GWP embodied impact of the request in kgCO2eq.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_gwp\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_adpe","title":"request_embodied_adpe(server_gpu_embodied_adpe, server_lifetime, generation_latency)
","text":"Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_adpe
float
ADPe embodied impact of the server and the GPUs in kgSbeq.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The ADPe embodied impact of the request in kgSbeq.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_adpe(\n server_gpu_embodied_adpe: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Abiotic Depletion Potential for Elements (ADPe) embodied impact of the request.\n\n Args:\n server_gpu_embodied_adpe: ADPe embodied impact of the server and the GPUs in kgSbeq.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The ADPe embodied impact of the request in kgSbeq.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_adpe\n
"},{"location":"reference/impacts/llm/#impacts.llm.request_embodied_pe","title":"request_embodied_pe(server_gpu_embodied_pe, server_lifetime, generation_latency)
","text":"Compute the Primary Energy (PE) embodied impact of the request.
Parameters:
Name Type Description Default server_gpu_embodied_pe
float
PE embodied impact of the server and the GPUs in MJ.
required server_lifetime
float
Lifetime duration of the server in seconds.
required generation_latency
float
Token generation latency in seconds.
required Returns:
Type Description float
The PE embodied impact of the request in MJ.
Source code in ecologits/impacts/llm.py
@dag.asset\ndef request_embodied_pe(\n server_gpu_embodied_pe: float,\n server_lifetime: float,\n generation_latency: float\n) -> float:\n \"\"\"\n Compute the Primary Energy (PE) embodied impact of the request.\n\n Args:\n server_gpu_embodied_pe: PE embodied impact of the server and the GPUs in MJ.\n server_lifetime: Lifetime duration of the server in seconds.\n generation_latency: Token generation latency in seconds.\n\n Returns:\n The PE embodied impact of the request in MJ.\n \"\"\"\n return (generation_latency / server_lifetime) * server_gpu_embodied_pe\n
"},{"location":"reference/impacts/llm/#impacts.llm.compute_llm_impacts_dag","title":"compute_llm_impacts_dag(model_active_parameter_count, model_total_parameter_count, output_token_count, request_latency, if_electricity_mix_adpe, if_electricity_mix_pe, if_electricity_mix_gwp, model_quantization_bits=MODEL_QUANTIZATION_BITS, gpu_energy_alpha=GPU_ENERGY_ALPHA, gpu_energy_beta=GPU_ENERGY_BETA, gpu_latency_alpha=GPU_LATENCY_ALPHA, gpu_latency_beta=GPU_LATENCY_BETA, gpu_memory=GPU_MEMORY, gpu_embodied_gwp=GPU_EMBODIED_IMPACT_GWP, gpu_embodied_adpe=GPU_EMBODIED_IMPACT_ADPE, gpu_embodied_pe=GPU_EMBODIED_IMPACT_PE, server_gpu_count=SERVER_GPUS, server_power=SERVER_POWER, server_embodied_gwp=SERVER_EMBODIED_IMPACT_GWP, server_embodied_adpe=SERVER_EMBODIED_IMPACT_ADPE, server_embodied_pe=SERVER_EMBODIED_IMPACT_PE, server_lifetime=HARDWARE_LIFESPAN, datacenter_pue=DATACENTER_PUE)
","text":"Compute the impacts dag of an LLM generation request.
Parameters:
Name Type Description Default model_active_parameter_count
float
Number of active parameters of the model.
required model_total_parameter_count
float
Number of parameters of the model.
required output_token_count
float
Number of generated tokens.
required request_latency
float
Measured request latency in seconds.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required model_quantization_bits
Optional[int]
Number of bits used to represent the model weights.
MODEL_QUANTIZATION_BITS
gpu_energy_alpha
Optional[float]
Alpha parameter of the GPU linear power consumption profile.
GPU_ENERGY_ALPHA
gpu_energy_beta
Optional[float]
Beta parameter of the GPU linear power consumption profile.
GPU_ENERGY_BETA
gpu_latency_alpha
Optional[float]
Alpha parameter of the GPU linear latency profile.
GPU_LATENCY_ALPHA
gpu_latency_beta
Optional[float]
Beta parameter of the GPU linear latency profile.
GPU_LATENCY_BETA
gpu_memory
Optional[float]
Amount of memory available on a single GPU.
GPU_MEMORY
gpu_embodied_gwp
Optional[float]
GWP embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_GWP
gpu_embodied_adpe
Optional[float]
ADPe embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_ADPE
gpu_embodied_pe
Optional[float]
PE embodied impact of a single GPU.
GPU_EMBODIED_IMPACT_PE
server_gpu_count
Optional[int]
Number of available GPUs in the server.
SERVER_GPUS
server_power
Optional[float]
Power consumption of the server in kW.
SERVER_POWER
server_embodied_gwp
Optional[float]
GWP embodied impact of the server in kgCO2eq.
SERVER_EMBODIED_IMPACT_GWP
server_embodied_adpe
Optional[float]
ADPe embodied impact of the server in kgSbeq.
SERVER_EMBODIED_IMPACT_ADPE
server_embodied_pe
Optional[float]
PE embodied impact of the server in MJ.
SERVER_EMBODIED_IMPACT_PE
server_lifetime
Optional[float]
Lifetime duration of the server in seconds.
HARDWARE_LIFESPAN
datacenter_pue
Optional[float]
PUE of the datacenter.
DATACENTER_PUE
Returns:
Type Description dict[str, float]
The impacts dag with all intermediate states.
Source code in ecologits/impacts/llm.py
def compute_llm_impacts_dag(\n model_active_parameter_count: float,\n model_total_parameter_count: float,\n output_token_count: float,\n request_latency: float,\n if_electricity_mix_adpe: float,\n if_electricity_mix_pe: float,\n if_electricity_mix_gwp: float,\n model_quantization_bits: Optional[int] = MODEL_QUANTIZATION_BITS,\n gpu_energy_alpha: Optional[float] = GPU_ENERGY_ALPHA,\n gpu_energy_beta: Optional[float] = GPU_ENERGY_BETA,\n gpu_latency_alpha: Optional[float] = GPU_LATENCY_ALPHA,\n gpu_latency_beta: Optional[float] = GPU_LATENCY_BETA,\n gpu_memory: Optional[float] = GPU_MEMORY,\n gpu_embodied_gwp: Optional[float] = GPU_EMBODIED_IMPACT_GWP,\n gpu_embodied_adpe: Optional[float] = GPU_EMBODIED_IMPACT_ADPE,\n gpu_embodied_pe: Optional[float] = GPU_EMBODIED_IMPACT_PE,\n server_gpu_count: Optional[int] = SERVER_GPUS,\n server_power: Optional[float] = SERVER_POWER,\n server_embodied_gwp: Optional[float] = SERVER_EMBODIED_IMPACT_GWP,\n server_embodied_adpe: Optional[float] = SERVER_EMBODIED_IMPACT_ADPE,\n server_embodied_pe: Optional[float] = SERVER_EMBODIED_IMPACT_PE,\n server_lifetime: Optional[float] = HARDWARE_LIFESPAN,\n datacenter_pue: Optional[float] = DATACENTER_PUE,\n) -> dict[str, float]:\n \"\"\"\n Compute the impacts dag of an LLM generation request.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n model_total_parameter_count: Number of parameters of the model.\n output_token_count: Number of generated tokens.\n request_latency: Measured request latency in seconds.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n model_quantization_bits: Number of bits used to represent the model weights.\n gpu_energy_alpha: Alpha parameter of the GPU linear power consumption profile.\n gpu_energy_beta: Beta parameter of the GPU linear power consumption profile.\n gpu_latency_alpha: Alpha parameter of the GPU linear latency profile.\n gpu_latency_beta: Beta parameter of the GPU linear latency profile.\n gpu_memory: Amount of memory available on a single GPU.\n gpu_embodied_gwp: GWP embodied impact of a single GPU.\n gpu_embodied_adpe: ADPe embodied impact of a single GPU.\n gpu_embodied_pe: PE embodied impact of a single GPU.\n server_gpu_count: Number of available GPUs in the server.\n server_power: Power consumption of the server in kW.\n server_embodied_gwp: GWP embodied impact of the server in kgCO2eq.\n server_embodied_adpe: ADPe embodied impact of the server in kgSbeq.\n server_embodied_pe: PE embodied impact of the server in MJ.\n server_lifetime: Lifetime duration of the server in seconds.\n datacenter_pue: PUE of the datacenter.\n\n Returns:\n The impacts dag with all intermediate states.\n \"\"\"\n results = dag.execute(\n model_active_parameter_count=model_active_parameter_count,\n model_total_parameter_count=model_total_parameter_count,\n model_quantization_bits=model_quantization_bits,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n gpu_energy_alpha=gpu_energy_alpha,\n gpu_energy_beta=gpu_energy_beta,\n gpu_latency_alpha=gpu_latency_alpha,\n gpu_latency_beta=gpu_latency_beta,\n gpu_memory=gpu_memory,\n gpu_embodied_gwp=gpu_embodied_gwp,\n 
gpu_embodied_adpe=gpu_embodied_adpe,\n gpu_embodied_pe=gpu_embodied_pe,\n server_gpu_count=server_gpu_count,\n server_power=server_power,\n server_embodied_gwp=server_embodied_gwp,\n server_embodied_adpe=server_embodied_adpe,\n server_embodied_pe=server_embodied_pe,\n server_lifetime=server_lifetime,\n datacenter_pue=datacenter_pue,\n )\n return results\n
"},{"location":"reference/impacts/llm/#impacts.llm.compute_llm_impacts","title":"compute_llm_impacts(model_active_parameter_count, model_total_parameter_count, output_token_count, if_electricity_mix_adpe, if_electricity_mix_pe, if_electricity_mix_gwp, request_latency=None, **kwargs)
","text":"Compute the impacts of an LLM generation request.
Parameters:
Name Type Description Default model_active_parameter_count
ValueOrRange
Number of active parameters of the model.
required model_total_parameter_count
ValueOrRange
Number of total parameters of the model.
required output_token_count
float
Number of generated tokens.
required if_electricity_mix_adpe
float
ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).
required if_electricity_mix_pe
float
PE impact factor of electricity consumption in MJ / kWh.
required if_electricity_mix_gwp
float
GWP impact factor of electricity consumption in kgCO2eq / kWh.
required request_latency
Optional[float]
Measured request latency in seconds.
None
**kwargs
Any
Any other optional parameter.
{}
Returns:
Type Description Impacts
The impacts of an LLM generation request.
Source code in ecologits/impacts/llm.py
def compute_llm_impacts(\n model_active_parameter_count: ValueOrRange,\n model_total_parameter_count: ValueOrRange,\n output_token_count: float,\n if_electricity_mix_adpe: float,\n if_electricity_mix_pe: float,\n if_electricity_mix_gwp: float,\n request_latency: Optional[float] = None,\n **kwargs: Any\n) -> Impacts:\n \"\"\"\n Compute the impacts of an LLM generation request.\n\n Args:\n model_active_parameter_count: Number of active parameters of the model.\n model_total_parameter_count: Number of total parameters of the model.\n output_token_count: Number of generated tokens.\n if_electricity_mix_adpe: ADPe impact factor of electricity consumption of kgSbeq / kWh (Antimony).\n if_electricity_mix_pe: PE impact factor of electricity consumption in MJ / kWh.\n if_electricity_mix_gwp: GWP impact factor of electricity consumption in kgCO2eq / kWh.\n request_latency: Measured request latency in seconds.\n **kwargs: Any other optional parameter.\n\n Returns:\n The impacts of an LLM generation request.\n \"\"\"\n if request_latency is None:\n request_latency = math.inf\n\n active_params = [model_active_parameter_count]\n total_params = [model_total_parameter_count]\n\n if isinstance(model_active_parameter_count, Range) or isinstance(model_total_parameter_count, Range):\n if isinstance(model_active_parameter_count, Range):\n active_params = [model_active_parameter_count.min, model_active_parameter_count.max]\n else:\n active_params = [model_active_parameter_count, model_active_parameter_count]\n if isinstance(model_total_parameter_count, Range):\n total_params = [model_total_parameter_count.min, model_total_parameter_count.max]\n else:\n total_params = [model_total_parameter_count, model_total_parameter_count]\n\n results = {}\n fields = [\"request_energy\", \"request_usage_gwp\", \"request_usage_adpe\", \"request_usage_pe\",\n \"request_embodied_gwp\", \"request_embodied_adpe\", \"request_embodied_pe\"]\n for act_param, tot_param in zip(active_params, total_params):\n res = compute_llm_impacts_dag(\n model_active_parameter_count=act_param,\n model_total_parameter_count=tot_param,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n **kwargs\n )\n for field in fields:\n if field in results:\n results[field] = Range(min=results[field], max=res[field])\n else:\n results[field] = res[field]\n\n energy = Energy(value=results[\"request_energy\"])\n gwp_usage = GWP(value=results[\"request_usage_gwp\"])\n adpe_usage = ADPe(value=results[\"request_usage_adpe\"])\n pe_usage = PE(value=results[\"request_usage_pe\"])\n gwp_embodied = GWP(value=results[\"request_embodied_gwp\"])\n adpe_embodied = ADPe(value=results[\"request_embodied_adpe\"])\n pe_embodied = PE(value=results[\"request_embodied_pe\"])\n return Impacts(\n energy=energy,\n gwp=gwp_usage + gwp_embodied,\n adpe=adpe_usage + adpe_embodied,\n pe=pe_usage + pe_embodied,\n usage=Usage(\n energy=energy,\n gwp=gwp_usage,\n adpe=adpe_usage,\n pe=pe_usage\n ),\n embodied=Embodied(\n gwp=gwp_embodied,\n adpe=adpe_embodied,\n pe=pe_embodied\n )\n )\n
"},{"location":"reference/impacts/modeling/","title":"modeling","text":""},{"location":"reference/impacts/modeling/#impacts.modeling.Range","title":"Range
","text":" Bases: BaseModel
Range data model to represent intervals.
Attributes:
Name Type Description min
float
Lower bound of the interval.
max
float
Upper bound of the interval.
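Since an impact value can be either a plain float or a Range, a small helper can normalize it to bounds; the to_bounds function below is a hypothetical sketch, not part of EcoLogits:
from ecologits.impacts.modeling import Range\n\ndef to_bounds(value):\n    # Hypothetical helper: return (min, max) whether value is a float or a Range\n    if isinstance(value, Range):\n        return value.min, value.max\n    return value, value\n\nprint(to_bounds(0.34))                       # (0.34, 0.34)\nprint(to_bounds(Range(min=0.16, max=0.48)))  # (0.16, 0.48)\n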
"},{"location":"reference/impacts/modeling/#impacts.modeling.Impact","title":"Impact
","text":" Bases: BaseModel
Base impact data model.
Attributes:
Name Type Description type
str
Impact type.
name
str
Impact name.
value
ValueOrRange
Impact value.
unit
str
Impact unit.
"},{"location":"reference/impacts/modeling/#impacts.modeling.Energy","title":"Energy
","text":" Bases: Impact
Energy consumption.
Info Final energy consumption \"measured from the plug\".
Attributes:
Name Type Description type
str
energy
name
str
Energy
value
ValueOrRange
Energy value
unit
str
Kilowatt-hour (kWh)
"},{"location":"reference/impacts/modeling/#impacts.modeling.GWP","title":"GWP
","text":" Bases: Impact
Global Warming Potential (GWP) impact.
Info Also commonly known as GHG/carbon emissions.
Attributes:
Name Type Description type
str
GWP
name
str
Global Warming Potential
value
ValueOrRange
GWP value
unit
str
Kilogram Carbon Dioxide Equivalent (kgCO2eq)
"},{"location":"reference/impacts/modeling/#impacts.modeling.ADPe","title":"ADPe
","text":" Bases: Impact
Abiotic Depletion Potential for Elements (ADPe) impact.
Info Impact on the depletion of non-living resources such as minerals or metals.
Attributes:
Name Type Description type
str
ADPe
name
str
Abiotic Depletion Potential (elements)
value
ValueOrRange
ADPe value
unit
str
Kilogram Antimony Equivalent (kgSbeq)
"},{"location":"reference/impacts/modeling/#impacts.modeling.PE","title":"PE
","text":" Bases: Impact
Primary Energy (PE) impact.
Info Total energy consumed from primary sources.
Attributes:
Name Type Description type
str
PE
name
str
Primary Energy
value
ValueOrRange
PE value
unit
str
Megajoule (MJ)
"},{"location":"reference/impacts/modeling/#impacts.modeling.Phase","title":"Phase
","text":" Bases: BaseModel
Base impact phase data model.
Attributes:
Name Type Description type
str
Phase type.
name
str
Phase name.
"},{"location":"reference/impacts/modeling/#impacts.modeling.Usage","title":"Usage
","text":" Bases: Phase
Usage impacts data model.
Info Represents the phase of energy consumption during model execution.
Attributes:
Name Type Description type
str
usage
name
str
Usage
energy
Energy
Energy consumption
gwp
GWP
Global Warming Potential (GWP) usage impact
adpe
ADPe
Abiotic Depletion Potential for Elements (ADPe) usage impact
pe
PE
Primary Energy (PE) usage impact
"},{"location":"reference/impacts/modeling/#impacts.modeling.Embodied","title":"Embodied
","text":" Bases: Phase
Embodied impacts data model.
Info Encompasses resource extraction, manufacturing, and transportation phases associated with the model's lifecycle.
Attributes:
Name Type Description type
str
embodied
name
str
Embodied
gwp
GWP
Global Warming Potential (GWP) embodied impact
adpe
ADPe
Abiotic Depletion Potential for Elements (ADPe) embodied impact
pe
PE
Primary Energy (PE) embodied impact
"},{"location":"reference/impacts/modeling/#impacts.modeling.Impacts","title":"Impacts
","text":" Bases: BaseModel
Impacts data model.
Attributes:
Name Type Description energy
Energy
Total energy consumption
gwp
GWP
Total Global Warming Potential (GWP) impact
adpe
ADPe
Total Abiotic Depletion Potential for Elements (ADPe) impact
pe
PE
Total Primary Energy (PE) impact
usage
Usage
Impacts for the usage phase
embodied
Embodied
Impacts for the embodied phase
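As the compute_llm_impacts source above suggests, impacts of the same criterion can be summed with +; a minimal sketch with illustrative values:
from ecologits.impacts.modeling import GWP\n\nusage_gwp = GWP(value=0.002)      # kgCO2eq, illustrative\nembodied_gwp = GWP(value=0.0005)  # kgCO2eq, illustrative\n\n# The total GWP is the sum of the usage and embodied phases\ntotal_gwp = usage_gwp + embodied_gwp\nprint(total_gwp.value)  # 0.0025\n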
"},{"location":"reference/tracers/anthropic_tracer/","title":"anthropic_tracer","text":""},{"location":"reference/tracers/cohere_tracer/","title":"cohere_tracer","text":""},{"location":"reference/tracers/google_tracer/","title":"google_tracer","text":""},{"location":"reference/tracers/huggingface_tracer/","title":"huggingface_tracer","text":""},{"location":"reference/tracers/litellm_tracer/","title":"litellm_tracer","text":""},{"location":"reference/tracers/mistralai_tracer/","title":"mistralai_tracer","text":""},{"location":"reference/tracers/openai_tracer/","title":"openai_tracer","text":""},{"location":"reference/tracers/utils/","title":"utils","text":""},{"location":"reference/tracers/utils/#tracers.utils.llm_impacts","title":"llm_impacts(provider, model_name, output_token_count, request_latency, electricity_mix_zone='WOR')
","text":"High-level function to compute the impacts of an LLM generation request.
Parameters:
Name Type Description Default provider
str
Name of the provider.
required model_name
str
Name of the LLM used.
required output_token_count
int
Number of generated tokens.
required request_latency
float
Measured request latency in seconds.
required electricity_mix_zone
Optional[str]
ISO 3166-1 alpha-3 code of the electricity mix zone (WOR by default).
'WOR'
Returns:
Type Description Optional[Impacts]
The impacts of an LLM generation request.
Source code in ecologits/tracers/utils.py
def llm_impacts(\n provider: str,\n model_name: str,\n output_token_count: int,\n request_latency: float,\n electricity_mix_zone: Optional[str] = \"WOR\",\n) -> Optional[Impacts]:\n \"\"\"\n High-level function to compute the impacts of an LLM generation request.\n\n Args:\n provider: Name of the provider.\n model_name: Name of the LLM used.\n output_token_count: Number of generated tokens.\n request_latency: Measured request latency in seconds.\n electricity_mix_zone: ISO 3166-1 alpha-3 code of the electricity mix zone (WOR by default).\n\n Returns:\n The impacts of an LLM generation request.\n \"\"\"\n\n model = models.find_model(provider=provider, model_name=model_name)\n if model is None:\n # TODO: Replace with proper logging\n print(f\"Could not find model `{model_name}` for {provider} provider.\")\n return None\n model_active_params = model.active_parameters \\\n or Range(min=model.active_parameters_range[0], max=model.active_parameters_range[1])\n model_total_params = model.total_parameters \\\n or Range(min=model.total_parameters_range[0], max=model.total_parameters_range[1])\n\n electricity_mix = electricity_mixes.find_electricity_mix(zone=electricity_mix_zone)\n if electricity_mix is None:\n # TODO: Replace with proper logging\n print(f\"Could not find electricity mix `{electricity_mix_zone}` in the ADEME database\")\n return None\n if_electricity_mix_adpe=electricity_mix.adpe\n if_electricity_mix_pe=electricity_mix.pe\n if_electricity_mix_gwp=electricity_mix.gwp\n\n return compute_llm_impacts(\n model_active_parameter_count=model_active_params,\n model_total_parameter_count=model_total_params,\n output_token_count=output_token_count,\n request_latency=request_latency,\n if_electricity_mix_adpe=if_electricity_mix_adpe,\n if_electricity_mix_pe=if_electricity_mix_pe,\n if_electricity_mix_gwp=if_electricity_mix_gwp,\n )\n
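As a usage sketch, llm_impacts can also be called on its own when you already know the generated token count and request latency (the model name and values below are illustrative):
from ecologits.tracers.utils import llm_impacts\n\nimpacts = llm_impacts(\n    provider=\"openai\",\n    model_name=\"gpt-3.5-turbo\",\n    output_token_count=150,\n    request_latency=3.2,\n    electricity_mix_zone=\"WOR\",  # worldwide average mix (default)\n)\n\nif impacts is not None:  # None when the model or mix zone is unknown\n    print(impacts.energy.value)  # kWh\n    print(impacts.gwp.value)     # kgCO2eq\n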
"},{"location":"tutorial/","title":"Tutorial","text":"The EcoLogits library tracks the energy consumption and environmental impacts of generative AI models accessed through APIs and their official client libraries.
It achieves this by patching the Python client libraries, ensuring that each API request is wrapped with an impact calculation function. This function computes the environmental impact based on several request features, such as the chosen model, the number of tokens generated, and the request's latency. The resulting data is then encapsulated in an Impacts
object, which is added to the response and contains the environmental impacts of that specific request.
Set up in 5 minutes
Install ecologits
with pip
and get up and running in minutes.
Getting started
Environmental impacts
Understand what environmental impacts and phases are reported.
Tutorial
Supported providers
List of providers and tutorials on how to make requests.
Providers
Methodology
Understand how we estimate environmental impacts.
Methodology
"},{"location":"tutorial/#initialization-of-ecologits","title":"Initialization of EcoLogits","text":"To use EcoLogits in your projects, you will need to initialize the client tracers that are used internally to intercept and enrich responses.
The default behavior is to search for and initialize all available providers.
from ecologits import EcoLogits\n\n# Initialize for all available providers\nEcoLogits.init()\n\n# Initialize for `openai` provider only\nEcoLogits.init(\"openai\")\n\n# Initialize for `openai` and `anthropic` providers only\nEcoLogits.init([\"openai\", \"anthropic\"])\n
It is currently not possible to un-initialize a provider at runtime. If you need that capability, do not hesitate to open an issue and explain why it would be necessary for your use case.
"},{"location":"tutorial/impacts/","title":"Environmental Impacts","text":"Environmental impacts are reported for each request in the Impacts
pydantic model, which features multiple criteria such as energy and global warming potential, reported per phase (usage or embodied) as well as in total.
To learn more about how we estimate the environmental impacts and the hypotheses behind them, go to the methodology section.
Structure of Impacts model
from ecologits.impacts.modeling import *\n\nImpacts(\n    energy=Energy(), # (1)!\n    gwp=GWP(),\n    adpe=ADPe(),\n    pe=PE(),\n    usage=Usage( # (2)!\n        energy=Energy(),\n        gwp=GWP(),\n        adpe=ADPe(),\n        pe=PE(),\n    ),\n    embodied=Embodied( # (3)!\n        gwp=GWP(),\n        adpe=ADPe(),\n        pe=PE(),\n    )\n)\n
Total impacts for all phases. Usage impacts from electricity consumption. Note that the energy is equal to the \"total\" energy impact. Embodied impacts from resource extraction, manufacturing and transportation of hardware components, allocated to the request. You can extract an impact with:
>>> response.impacts.usage.gwp.value # (1)!\n0.34 # Expressed in kgCO2eq.\n
Assuming you have made an inference and stored the result in a response
object. Or you could get a value range impact instead:
>>> response.impacts.usage.gwp.value\nRange(min=0.16, max=0.48) # Expressed in kgCO2eq (1)\n
Range
objects are used to define intervals. "},{"location":"tutorial/impacts/#criteria","title":"Criteria","text":"To evaluate the impact of human activities on the planet or on the climate, we use criteria that usually focus on a specific issue, such as GHG emissions for global warming, water consumption and pollution, or the depletion of natural resources. We currently support three environmental impact criteria in addition to the direct energy consumption.
Monitoring multiple criteria is useful to avoid pollution shifting, which is defined as the transfer of pollution from one medium to another. It is a common pitfall to optimize only one criterion like GHG emissions (e.g. by buying new hardware that is more energy efficient), which can lead to higher impacts on mineral and metal depletion, for example (see encyclopedia.com).
"},{"location":"tutorial/impacts/#energy","title":"Energy","text":"The Energy
criterion refers to the direct electricity consumption of GPUs, servers and other equipment in the data center. As defined, the energy criterion is not an environmental impact in itself, but it is used to estimate the other impacts of the usage phase. This criterion is expressed in kilowatt-hour (kWh).
Energy model attributes Attributes:
type
(str
) \u2013 energy
name
(str
) \u2013 Energy
value
(ValueOrRange
) \u2013 Energy value
unit
(str
) \u2013 Kilowatt-hour (kWh)
"},{"location":"tutorial/impacts/#global-warming-potential-gwp","title":"Global Warming Potential (GWP)","text":"The Global Warming Potential (GWP
) criterion is an index measuring how much heat is absorbed by greenhouse gases in the atmosphere compared to carbon dioxide. This criterion is expressed in kilogram of carbon dioxide equivalent (kgCO2eq).
Learn more: wikipedia.org
GWP model attributes Attributes:
"},{"location":"tutorial/impacts/#abiotic-depletion-potential-for-elements-adpe","title":"Abiotic Depletion Potential for Elements (ADPe)","text":"The Abiotic Depletion Potential \u2013 elements (ADPe
) criterion represents the depletion of non-renewable and non-living (abiotic) resources such as metals and minerals. This criterion is expressed in kilogram of antimony equivalent (kgSbeq).
Learn more: sciencedirect.com
ADPe model attributes Attributes:
"},{"location":"tutorial/impacts/#primary-energy-pe","title":"Primary Energy (PE)","text":"The Primary Energy (PE
) criterion represents the amount of energy consumed from natural sources such as raw fuels and other forms of energy, including waste. This criterion is expressed in megajoule (MJ).
Learn more: wikipedia.org
PE model attributes Attributes:
type
(str
) \u2013 PE
name
(str
) \u2013 Primary Energy
value
(ValueOrRange
) \u2013 PE value
unit
(str
) \u2013 Megajoule (MJ)
"},{"location":"tutorial/impacts/#phases","title":"Phases","text":"Inspired from the Life Cycle Assessment methodology we classify impacts is two phases (usage and embodied). The usage phase is about the environmental impacts related to the energy consumption while using an AI model. The embodied phase encompasses upstream impacts such as resource extraction, manufacturing, and transportation. We currently do not support the third phase which is end-of-life due to a lack of open research and transparency on that matter.
Learn more: wikipedia.org
Another pitfall in environmental impact assessment is to look only at the usage phase and ignore upstream and downstream impacts, which can lead to higher overall impacts across the entire life cycle. If you replace old hardware with newer, more energy-efficient hardware, you will reduce the usage impacts, but you will increase the upstream (embodied) impacts as well.
"},{"location":"tutorial/impacts/#usage","title":"Usage","text":"The Usage
phase accounts for the environmental impacts while using AI models. We report all criteria in addition to direct energy consumption for this phase.
Note that we use the worldwide average electricity mix impact factor by default.
Usage model attributes Attributes:
type
(str
) \u2013 usage
name
(str
) \u2013 Usage
energy
(Energy
) \u2013 Energy consumption
gwp
(GWP
) \u2013 Global Warming Potential (GWP) usage impact
adpe
(ADPe
) \u2013 Abiotic Depletion Potential for Elements (ADPe) usage impact
pe
(PE
) \u2013 Primary Energy (PE) usage impact
"},{"location":"tutorial/impacts/#embodied","title":"Embodied","text":"The Embodied phase accounts for the upstream environmental impacts such as resource extraction, manufacturing and transportation allocated to the request. We report all criteria (excluding energy consumption) for this phase.
Embodied model attributes Attributes:
type
(str
) \u2013 embodied
name
(str
) \u2013 Embodied
gwp
(GWP
) \u2013 Global Warming Potential (GWP) embodied impact
adpe
(ADPe
) \u2013 Abiotic Depletion Potential for Elements (ADPe) embodied impact
pe
(PE
) \u2013 Primary Energy (PE) embodied impact
"},{"location":"tutorial/impacts/#impact-factors","title":"Impact Factors","text":"We use impact factors to quantify environmental harm from human activities, measuring the ratio of greenhouse gases, resource consumption, and other criteria resulting from activities like energy consumption, industrial processes, transportation, waster management and more.
"},{"location":"tutorial/impacts/#electricity-mix","title":"Electricity Mix","text":"We currently assume by default a worldwide average impact factor for electricity consumption. We plan to allow users to change these impact factors dynamically based on a specific country/region or with custom values.
Default values (from ADEME Base Empreinte\u00ae):
Impact criteria Value Unit GWP \\(5.904e-1\\) \\(kgCO2eq / kWh\\) ADPe \\(7.378e-7\\) \\(kgSbeq / kWh\\) PE \\(9.988\\) \\(MJ / kWh\\)"},{"location":"tutorial/providers/","title":"Supported providers","text":""},{"location":"tutorial/providers/#list-of-all-providers","title":"List of all providers","text":"Provider name Extra for installation Guide Anthropic anthropic
Guide for Anthropic Cohere cohere
Guide for Cohere Google Gemini google-generativeai
Guide for Google Gemini Hugging Face Hub huggingface-hub
Guide for Hugging Face Hub LiteLLM litellm
Guide for LiteLLM Mistral AI mistralai
Guide for Mistral AI OpenAI openai
Guide for OpenAI Azure OpenAI openai
Guide for Azure OpenAI"},{"location":"tutorial/providers/#chat-completions","title":"Chat Completions","text":"Provider Completions Completions (stream) Completions (async) Completions (async + stream) Anthropic Cohere Google Gemini HuggingFace Hub LiteLLM Mistral AI OpenAI Azure OpenAI Partial support for Anthropic streams, see full documentation: Anthropic provider.
"},{"location":"tutorial/providers/anthropic/","title":"Anthropic","text":"This guide focuses on the integration of EcoLogits with the Anthropic official python client .
Official links:
Repository: anthropics/anthropic-sdk-python Documentation: docs.anthropic.com "},{"location":"tutorial/providers/anthropic/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Anthropic client, please use the anthropic
extra-dependency option as follows:
pip install ecologits[anthropic]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Anthropic's Python client.
"},{"location":"tutorial/providers/anthropic/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/anthropic/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from anthropic import Anthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Anthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nresponse = client.messages.create(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom anthropic import AsyncAnthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncAnthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nasync def main() -> None:\n response = await client.messages.create(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/anthropic/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated in the last chunk for the entire request.
SyncAsync from anthropic import Anthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Anthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nwith client.messages.stream(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n) as stream:\n for text in stream.text_stream:\n pass\n # Get estimated environmental impacts of the inference\n print(stream.impacts)\n
import asyncio\nfrom anthropic import AsyncAnthropic\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncAnthropic(api_key=\"<ANTHROPIC_API_KEY>\")\n\nasync def main() -> None:\n async with client.messages.stream(\n max_tokens=100,\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n model=\"claude-3-haiku-20240307\",\n ) as stream:\n async for text in stream.text_stream:\n pass\n # Get estimated environmental impacts of the inference\n print(stream.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/cohere/","title":"Cohere","text":"This guide focuses on the integration of EcoLogits with the Cohere official python client .
Official links:
Repository: cohere-ai/cohere-python Documentation: docs.cohere.com "},{"location":"tutorial/providers/cohere/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Cohere client, please use the cohere
extra-dependency option as follows:
pip install ecologits[cohere]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Cohere's Python client.
"},{"location":"tutorial/providers/cohere/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/cohere/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from cohere import Client\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Client(api_key=\"<COHERE_API_KEY>\")\n\nresponse = client.chat(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom cohere import AsyncClient\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncClient(api_key=\"<COHERE_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/cohere/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated in the last chunk for the entire request.
SyncAsync from cohere import Client\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = Client(api_key=\"<COHERE_API_KEY>\")\n\nstream = client.chat_stream(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n)\n\nfor chunk in stream:\n if chunk.event_type == \"stream-end\":\n # Get estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom cohere import AsyncClient\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncClient(api_key=\"<COHERE_API_KEY>\")\n\nasync def main() -> None:\n stream = await client.chat_stream(\n message=\"Tell me a funny joke!\", \n max_tokens=100\n )\n\n async for chunk in stream:\n if chunk.event_type == \"stream-end\":\n # Get estimated environmental impacts of the inference\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/google/","title":"Google Gemini","text":"This guide focuses on the integration of EcoLogits with the Google Gemini official python client .
Official links:
Repository: google-gemini/generative-ai-python Documentation: ai.google.dev "},{"location":"tutorial/providers/google/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Google Gemini client, please use the google-generativeai
extra-dependency option as follows:
pip install ecologits[google-generativeai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with the Google Gemini Python client.
"},{"location":"tutorial/providers/google/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/google/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini\ngenai.configure(api_key=\"<GOOGLE_API_KEY>\")\nmodel = genai.GenerativeModel(\"gemini-1.5-flash\")\nresponse = model.generate_content(\"Write a story about a magic backpack.\")\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in async mode\nasync def main() -> None:\n genai.configure(api_key=\"<GOOGLE_API_KEY>\")\n model = genai.GenerativeModel(\"gemini-1.5-flash\")\n response = await model.generate_content_async(\n \"Write a story about a magic backpack.\"\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/google/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in streaming mode\ngenai.configure(api_key=\"<GOOGLE_API_KEY>\")\nmodel = genai.GenerativeModel(\"gemini-1.5-flash\")\nstream = model.generate_content(\n \"Write a story about a magic backpack.\", \n stream=True\n)\n\n# Get cumulative estimated environmental impacts of the inference\nfor chunk in stream:\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nimport google.generativeai as genai\n\n# Initialize EcoLogits\nEcoLogits.init()\n\n# Ask something to Google Gemini in streaming and async mode\nasync def main() -> None:\n genai.configure(api_key=\"<GOOGLE_API_KEY>\")\n model = genai.GenerativeModel(\"gemini-1.5-flash\")\n stream = await model.generate_content_async(\n \"Write a story about a magic backpack.\", \n stream=True\n )\n\n # Get cumulative estimated environmental impacts of the inference\n async for chunk in stream:\n print(chunk.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/huggingface_hub/","title":"Hugging Face Hub","text":"This guide focuses on the integration of EcoLogits with the Hugging Face Hub official python client .
Official links:
Repository: huggingface/huggingface_hub Documentation: huggingface.co "},{"location":"tutorial/providers/huggingface_hub/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Hugging Face Hub client, please use the huggingface-hub
extra-dependency option as follows:
pip install ecologits[huggingface-hub]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Hugging Face Hub's Python client.
"},{"location":"tutorial/providers/huggingface_hub/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/huggingface_hub/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom huggingface_hub import InferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = InferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\nresponse = client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom huggingface_hub import AsyncInferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncInferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\n\nasync def main() -> None:\n response = await client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/huggingface_hub/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom huggingface_hub import InferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = InferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\nstream = client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15,\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom huggingface_hub import AsyncInferenceClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncInferenceClient(model=\"HuggingFaceH4/zephyr-7b-beta\")\n\nasync def main() -> None:\n stream = await client.chat_completion(\n messages=[{\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}],\n max_tokens=15,\n stream=True\n )\n\n async for chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/litellm/","title":"LiteLLM","text":"This guide focuses on the integration of EcoLogits with the LiteLLM official Python client .
Official links:
Repository: BerriAI/litellm Documentation: litellm.vercel.app "},{"location":"tutorial/providers/litellm/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with LiteLLM, please use the litellm
extra-dependency option as follows:
pip install ecologits[litellm]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with LiteLLM's Python client.
"},{"location":"tutorial/providers/litellm/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/litellm/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data. Make sure the API key of the provider you use is available, for example in a .env file. Also make sure you call the LiteLLM generation function as \"litellm.completion\" and not just \"completion\".
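As a sketch of the .env approach (python-dotenv is just one assumed option here, not an EcoLogits requirement):
# Contents of a hypothetical .env file:\n# OPENAI_API_KEY=sk-...\n\nfrom dotenv import load_dotenv\n\n# Load the provider API keys into the environment so LiteLLM can pick them up\nload_dotenv()\n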
SyncAsync from ecologits import EcoLogits\nimport litellm\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nresponse = litellm.completion(\n model=\"gpt-4o-2024-05-13\",\n messages=[{ \"content\": \"Hello, how are you?\",\"role\": \"user\"}]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nimport litellm\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nasync def main() -> None:\n response = await litellm.acompletion(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/litellm/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nimport litellm\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nstream = litellm.completion(\n model=\"gpt-3.5-turbo\",\n messages=[{\"role\": \"user\", \"content\": \"Hello World!\"}],\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nimport litellm\nfrom ecologits import EcoLogits\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nasync def main() -> None:\n    stream = await litellm.acompletion(\n        model=\"gpt-3.5-turbo\",\n        messages=[\n            {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n        ],\n        stream=True\n    )\n\n    async for chunk in stream:\n        # Get cumulative estimated environmental impacts of the inference\n        print(chunk.impacts)\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/mistralai/","title":"Mistral AI","text":"This guide focuses on the integration of EcoLogits with the Mistral AI official python client .
Official links:
Repository: mistralai/client-python Documentation: docs.mistral.ai "},{"location":"tutorial/providers/mistralai/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the Mistral AI client, please use the mistralai
extra-dependency option as follows:
pip install ecologits[mistralai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with Mistral AI's Python client.
"},{"location":"tutorial/providers/mistralai/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/mistralai/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom mistralai.client import MistralClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralClient(api_key=\"<MISTRAL_API_KEY>\")\n\nresponse = client.chat(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom mistralai.async_client import MistralAsyncClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralAsyncClient(api_key=\"<MISTRAL_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/mistralai/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom mistralai.client import MistralClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralClient(api_key=\"<MISTRAL_API_KEY>\")\n\nstream = client.chat_stream(\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ],\n model=\"mistral-tiny\"\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom mistralai.async_client import MistralAsyncClient\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = MistralAsyncClient(api_key=\"<MISTRAL_API_KEY>\")\n\nasync def main() -> None:\n    stream = client.chat_stream(\n        messages=[\n            {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n        ],\n        model=\"mistral-tiny\"\n    )\n\n    async for chunk in stream:\n        # Get cumulative estimated environmental impacts of the inference\n        if hasattr(chunk, \"impacts\"):\n            print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/","title":"OpenAI","text":"This guide focuses on the integration of EcoLogits with the OpenAI official python client .
Official links:
Repository: openai/openai-python Documentation: platform.openai.com "},{"location":"tutorial/providers/openai/#installation","title":"Installation","text":"To install EcoLogits along with all necessary dependencies for compatibility with the OpenAI client, please use the openai
extra-dependency option as follows:
pip install ecologits[openai]\n
This installation command ensures that EcoLogits is set up with the specific libraries required to interface seamlessly with OpenAI's Python client.
"},{"location":"tutorial/providers/openai/#chat-completions","title":"Chat Completions","text":""},{"location":"tutorial/providers/openai/#example","title":"Example","text":"Integrating EcoLogits with your applications does not alter the standard outputs from the API responses. Instead, it enriches them by adding the Impacts
object, which contains detailed environmental impact data.
SyncAsync from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nresponse = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom openai import AsyncOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncOpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nasync def main() -> None:\n response = await client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n )\n\n # Get estimated environmental impacts of the inference\n print(response.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/#streaming-example","title":"Streaming example","text":"In streaming mode, the impacts are calculated incrementally, which means you don't need to sum the impacts from each data chunk. Instead, the impact information in the last chunk reflects the total cumulative environmental impacts for the entire request.
SyncAsync from ecologits import EcoLogits\nfrom openai import OpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = OpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nstream = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[{\"role\": \"user\", \"content\": \"Hello World!\"}],\n stream=True\n)\n\nfor chunk in stream:\n # Get cumulative estimated environmental impacts of the inference\n print(chunk.impacts)\n
import asyncio\nfrom ecologits import EcoLogits\nfrom openai import AsyncOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AsyncOpenAI(api_key=\"<OPENAI_API_KEY>\")\n\nasync def main() -> None:\n    stream = await client.chat.completions.create(\n        model=\"gpt-3.5-turbo\",\n        messages=[\n            {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n        ],\n        stream=True\n    )\n\n    async for chunk in stream:\n        # Get cumulative estimated environmental impacts of the inference\n        print(chunk.impacts)\n\n\nasyncio.run(main())\n
"},{"location":"tutorial/providers/openai/#compatibility-with-azure-openai","title":"Compatibility with Azure OpenAI","text":"EcoLogits is also compatible with Azure OpenAI .
import os\nfrom ecologits import EcoLogits\nfrom openai import AzureOpenAI\n\n# Initialize EcoLogits\nEcoLogits.init()\n\nclient = AzureOpenAI(\n azure_endpoint = os.getenv(\"AZURE_OPENAI_ENDPOINT\"), \n api_key=os.getenv(\"AZURE_OPENAI_API_KEY\"), \n api_version=\"2024-02-01\"\n)\n\n\nresponse = client.chat.completions.create(\n model=\"gpt-35-turbo\",\n messages=[\n {\"role\": \"user\", \"content\": \"Tell me a funny joke!\"}\n ]\n)\n\n# Get estimated environmental impacts of the inference\nprint(response.impacts)\n
"}]}
\ No newline at end of file
diff --git a/dev/sitemap.xml b/dev/sitemap.xml
index 63a59d7..f2560ac 100644
--- a/dev/sitemap.xml
+++ b/dev/sitemap.xml
@@ -36,7 +36,7 @@
daily
- https://ecologits.ai/latest/reference/ecologits/
+ https://ecologits.ai/latest/reference/_ecologits/
2024-08-28
daily
diff --git a/dev/sitemap.xml.gz b/dev/sitemap.xml.gz
index 4c7a770..405aeb3 100644
Binary files a/dev/sitemap.xml.gz and b/dev/sitemap.xml.gz differ
diff --git a/dev/tutorial/impacts/index.html b/dev/tutorial/impacts/index.html
index a0e52c6..b1f1969 100644
--- a/dev/tutorial/impacts/index.html
+++ b/dev/tutorial/impacts/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1146,11 +1146,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/index.html b/dev/tutorial/index.html
index 1cf76ef..7746d6b 100644
--- a/dev/tutorial/index.html
+++ b/dev/tutorial/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -610,6 +610,17 @@
+
+
+
+
+ Introduction
+
+
+
+
+
+
@@ -620,6 +631,34 @@
+
+
+
+
+
+
+
+
+
+
+
+ Table of contents
+
+
+
+
+
@@ -1008,11 +1047,11 @@
-
+
- ecologits
+ _ecologits
@@ -1459,6 +1498,23 @@
+
+
+ Table of contents
+
+
+
@@ -1507,6 +1563,23 @@ Tutorial
+Initialization of EcoLogits
+To use EcoLogits in your projects, you will need to initialize the client tracers that are used internally to intercept and enrich responses.
+
+
Default behavior is to search and initialize all available providers.
+
+from ecologits import EcoLogits
+
+# Initialize for all available providers
+EcoLogits . init ()
+
+# Initialize for `openai` provider only
+EcoLogits . init ( "openai" )
+
+# Initialize for `openai` and `anthropic` providers only
+EcoLogits . init ([ "openai" , "anthropic" ])
+
+It is currently not possible to un-initialize a provider at runtime. If that's the case do not hesitate to open an issue and explain why it could be necessary for your use case.
diff --git a/dev/tutorial/providers/anthropic/index.html b/dev/tutorial/providers/anthropic/index.html
index 6df25e5..4983933 100644
--- a/dev/tutorial/providers/anthropic/index.html
+++ b/dev/tutorial/providers/anthropic/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/cohere/index.html b/dev/tutorial/providers/cohere/index.html
index efed115..5f1c9b0 100644
--- a/dev/tutorial/providers/cohere/index.html
+++ b/dev/tutorial/providers/cohere/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/google/index.html b/dev/tutorial/providers/google/index.html
index cf94947..de02d66 100644
--- a/dev/tutorial/providers/google/index.html
+++ b/dev/tutorial/providers/google/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/huggingface_hub/index.html b/dev/tutorial/providers/huggingface_hub/index.html
index a9eaf35..a4477df 100644
--- a/dev/tutorial/providers/huggingface_hub/index.html
+++ b/dev/tutorial/providers/huggingface_hub/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/index.html b/dev/tutorial/providers/index.html
index ff785c4..9886c7d 100644
--- a/dev/tutorial/providers/index.html
+++ b/dev/tutorial/providers/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1114,11 +1114,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/litellm/index.html b/dev/tutorial/providers/litellm/index.html
index d603be4..8ba569a 100644
--- a/dev/tutorial/providers/litellm/index.html
+++ b/dev/tutorial/providers/litellm/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/mistralai/index.html b/dev/tutorial/providers/mistralai/index.html
index 0a4b864..d6daa2e 100644
--- a/dev/tutorial/providers/mistralai/index.html
+++ b/dev/tutorial/providers/mistralai/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1082,11 +1082,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/tutorial/providers/openai/index.html b/dev/tutorial/providers/openai/index.html
index 68c461d..7acb1d5 100644
--- a/dev/tutorial/providers/openai/index.html
+++ b/dev/tutorial/providers/openai/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1091,11 +1091,11 @@
-
+
- ecologits
+ _ecologits
diff --git a/dev/why/index.html b/dev/why/index.html
index 5a9acf0..61a2f28 100644
--- a/dev/why/index.html
+++ b/dev/why/index.html
@@ -325,7 +325,7 @@
-
+
API Reference
@@ -1080,11 +1080,11 @@
-
+
- ecologits
+ _ecologits