Greener Bytes: Exploring the Environmental Impact and Solutions for LLMs by Meetu Malhotra

Meetu Malhotra
Researcher

Automatic Summary

Understanding Large Language Models (LLMs) and Their Impact on the Environment

In the ever-evolving realm of artificial intelligence, Large Language Models (LLMs) have emerged as a transformative force. This blog post dives into the fundamentals of LLMs, their applications, the challenges they face, and their significant environmental implications.

What are Large Language Models?

Large Language Models are sophisticated deep neural networks that leverage the transformer architecture, initially introduced in the groundbreaking 2017 paper titled "Attention is All You Need". These models possess generative capabilities, allowing them to create various forms of data—including text, audio, and video—based on the patterns learned during training.

Applications of LLMs

LLMs are reshaping numerous industries by enhancing functionalities and optimizing various tasks. Here are some notable applications:

  • Natural Language Processing (NLP): LLMs excel in tasks such as machine translation, which facilitates seamless communication across language barriers.
  • Information Retrieval: Users can obtain specific data from vast documents quickly, making LLMs invaluable for research and decision-making.
  • Recommendation Systems: E-commerce giants like Amazon use LLMs to provide personalized product recommendations based on user behavior.
  • Multimodal Tasks: Advanced models like ChatGPT can process text, images, and audio, enhancing user interaction.
  • Automated Agents: LLMs enable the creation of agents that can perform tasks for users, such as submitting college applications.
  • Evaluation and Feedback: LLMs assist educators in grading assignments and providing course feedback efficiently.

Challenges Faced by LLMs

Despite their advancements, LLMs encounter several significant challenges:

  • Hallucination: LLMs may fabricate information when they lack knowledge, leading to potential misinformation.
  • Bias in Data: Existing biases in training data can result in biased outputs, reflecting societal inequalities.
  • High Computational Costs: Training and utilizing LLMs necessitate substantial computational resources and energy consumption.
  • Data Privacy Risks: Personal data shared during interactions with LLMs may inadvertently be exposed.
  • Context Limitations: Many LLMs struggle to maintain context in lengthy conversations, leading to incomplete responses.
  • Lack of Logical Reasoning: While proficient at generating data, LLMs often falter in complex logical reasoning tasks.
  • Ethical Concerns: The potential misuse of LLMs in creating deepfakes poses a significant ethical dilemma.

The Environmental Impact of LLMs

As we embrace the capabilities of LLMs, it's crucial to address their carbon footprint and energy consumption. Here’s why:

What is Carbon Footprint?

The carbon footprint quantifies the total greenhouse gas emissions, primarily carbon dioxide, produced by an activity, including those involved in operating LLMs. Notably, training a model like BERT emits approximately 284 tons of carbon dioxide, equivalent to the lifetime emissions of five cars.

Energy Consumption During Model Training

The training phase of LLMs demands massive computational resources, leading to extraordinary energy consumption. For instance, training the GPT-3 model (175 billion parameters) requires energy akin to powering an average American household for 120 years. Consequently, this process contributes to a significant carbon footprint.
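That household comparison can be sanity-checked with quick arithmetic. The consumption figure below is an assumption for illustration only (an average U.S. household is often estimated at roughly 10,700 kWh of electricity per year); it is not a number from this post:

```python
# Back-of-envelope check of the "120 household-years" comparison.
# Assumption (illustrative, not from this post): an average American
# household uses roughly 10,700 kWh of electricity per year.
HOUSEHOLD_KWH_PER_YEAR = 10_700

def household_years_to_mwh(years: float) -> float:
    """Convert household-years of electricity into megawatt-hours."""
    return years * HOUSEHOLD_KWH_PER_YEAR / 1_000  # kWh -> MWh

# 120 household-years of electricity:
print(round(household_years_to_mwh(120)))  # prints 1284 (about 1.3 GWh)
```

Under that assumed household figure, "120 years of household electricity" works out to on the order of 1,300 MWh, which conveys the scale of a single large training run.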

Operational Energy Use

After training, the inference phase, where the model is applied to various tasks, also consumes considerable energy. The need for continuous operation further amplifies the energy requirements, heightening the environmental impact.
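To see how continuous operation adds up, here is an illustrative back-of-envelope estimate. Both constants are assumptions, not figures from this post; public per-query energy estimates for chatbots vary widely (roughly 0.3 to 3 Wh):

```python
# Illustrative estimate of inference-phase energy at scale.
# Both constants are assumptions for illustration only.
WH_PER_QUERY = 0.3             # assumed energy per query, in watt-hours
QUERIES_PER_DAY = 100_000_000  # assumed daily query volume

def annual_inference_mwh(wh_per_query: float, queries_per_day: float) -> float:
    """Annual inference energy in megawatt-hours."""
    wh_per_year = wh_per_query * queries_per_day * 365
    return wh_per_year / 1_000_000  # Wh -> MWh

print(round(annual_inference_mwh(WH_PER_QUERY, QUERIES_PER_DAY)))  # prints 10950
```

Even a small per-query cost, multiplied by a hundred million daily queries running around the clock, reaches thousands of megawatt-hours per year, which is why the inference phase matters alongside training.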

Quantifying Carbon Emissions

To measure the environmental costs associated with LLMs, tools like the Code Carbon library can estimate carbon emissions during model operations. Users can track emissions by inputting their code into the system and analyzing the results.
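Under the hood, trackers of this kind estimate energy from hardware power draw and runtime, then multiply by a grid carbon-intensity factor. A minimal pure-Python sketch of that calculation, using assumed (not measured) power draw and grid intensity values:

```python
import time

# Minimal sketch of the estimate a tracker like Code Carbon produces.
# Both constants are assumptions for illustration, not measured values:
GPU_POWER_WATTS = 300.0     # assumed average power draw of one GPU
GRID_KG_CO2_PER_KWH = 0.4   # assumed grid carbon intensity

def estimate_co2_kg(runtime_seconds: float,
                    power_watts: float = GPU_POWER_WATTS,
                    intensity: float = GRID_KG_CO2_PER_KWH) -> float:
    """Estimate kg of CO2 emitted by a workload of the given duration."""
    energy_kwh = power_watts * runtime_seconds / 3600 / 1000
    return energy_kwh * intensity

start = time.perf_counter()
sum(i * i for i in range(1_000_000))  # stand-in for model code
elapsed = time.perf_counter() - start
print(f"estimated emissions: {estimate_co2_kg(elapsed):.9f} kg CO2")
```

With the Code Carbon library itself, the pattern is to wrap the workload between `tracker.start()` and `tracker.stop()`, after which the estimated emissions are written to a CSV report.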

Strategies for Reducing Carbon Emissions

To mitigate the environmental effects of LLMs, several strategies can be implemented:

  • Optimize Algorithms: Techniques such as pruning, quantization, and distillation shrink models, reducing the compute needed for training and inference.
  • Use Green Data Centers: Powering data centers with cleaner energy sources (for example, Amazon's announced shift toward nuclear power) reduces reliance on coal-based electricity.
  • Integrate Carbon Tracking Tools: Libraries such as Code Carbon make emissions measurable, the first step toward managing them.
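One algorithm-level optimization mentioned in the talk is quantization. The toy sketch below is illustrative only, not how any production model is quantized: it stores float weights as int8 values plus one scale factor, cutting memory per weight by roughly 4x at a small cost in precision.

```python
# Toy sketch of post-training (symmetric, per-tensor) quantization:
# store float32 weights as int8 plus a single scale factor.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats into [-127, 127] integers via a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is close to, but not exactly, the original:
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Pruning (dropping near-zero weights) and distillation (training a small model to mimic a large one) pursue the same goal by different means: less compute per inference, hence less energy.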

Video Transcription

So, yeah, the topics that I'm going to cover include an introduction to LLMs and their use cases. I'll start with the generic challenges of LLMs, then I'll boil down to just the energy consumption, or the carbon footprint, that we all are leaving, and then how we can quantify it and the strategies to reduce carbon emissions. So I'll start with the introduction: large language models. What does "large language model" mean? These are deep neural networks that utilize the transformer architecture. The transformer architecture was introduced in 2017, in a paper titled "Attention Is All You Need", and that was the base architecture which every large language model utilizes. At their core, large language models are generative in nature, meaning, in GenAI terms, generative artificial intelligence.

Meaning, they are able to generate any kind of data, whether it's text data, video data, or audio data, based on the patterns they learn from the data they are trained on. So they can be used for different formats of data. For example, we all must have used the GPT models, the ChatGPT series introduced by OpenAI. There is the Gemini series by Google. Then we have the Llama series by Facebook, the recent one being Llama 4. And based on different forms of data, there are different kinds of large language models available: audio data, video data, images, time series data. For images, we have Stable Diffusion, and we also have DALL-E from OpenAI. For time series data, we have different models for that as well.

Next, on this slide you see the remarkable journey of innovation and growth over these years in large language models. I have taken this image from an arXiv paper, which gives a comprehensive overview of large language models. So it all started with the initial GPT version, which was trained on millions of parameters with a 512-token context length, all the way to the recent GPT series, which is trained on billions of parameters, meaning there is an increase in the sheer size of the training parameters.

And then the context length is more than 8,000 tokens. So that's how they have grown. In every subsequent year, we see changes in these models. They are getting better and better in terms of size, in terms of training, in terms of scalability, in terms of the kind of data they can handle, and in terms of the challenges faced by the initial versions that have since been improved.

Of course, those challenges are overcome by the latest versions. So that's how we see immense growth in these large language models. Here, I'm talking about the transformer architecture. The image on the left is quite complex; it is taken from the "Attention Is All You Need" paper, the very first paper where transformers were introduced. They were initially introduced for a very basic task, which was machine translation, and the idea was to capture the contextual meaning present in our tokens. But eventually, they led to the development of large language models, which are not only encoder-decoder based: some of them are just encoder based, some decoder based, and some include both. The ChatGPT model that you interact with in your daily life is a decoder-only architecture. In the next slide, I'm talking about the various applications of large language models that we see. First, NLP tasks; NLP stands for natural language processing.

Here, I'm talking about just the text part of it, which means machine translation, something very common that we use on a day-to-day basis. If I have to interact with somebody who does not speak my language, I can just pass an audio message into my mobile and get the response back in the desired language so that I can interact with the person. That is machine translation in its live form.

That is one of the applications of large language models doing the work in the back end. LLMs are also used for information retrieval. Information retrieval, as the name suggests, means you want to retrieve information from some kind of document. For example, I can provide PDFs to a large language model and say, hey, I want this specific information out of these PDFs. Another example could be if you have a meeting transcript and you don't want to go through the whole transcript: it can provide you a summarized form of those meeting notes, and you can just go through the bullet points. So that is the information retrieval that LLMs are used for. Next is recommendation. Netflix, we are all using. Amazon services, we are all using.

So whatever product we put in our cart, or whatever we shop for based on historical patterns, whatever movies we have seen, these organizations are using recommendation models in the back end that can suggest, okay, this is the closest product that you might like. So they are used in recommendation also. Then they are used in multimodal tasks too. Take ChatGPT, for example. Initially, you provided it text data and it would respond back in text data. But now it's not like that. You can provide audio data, meaning you can talk, or, like Google Lens, you can provide image data. You can provide a picture and then ask, okay, tell me something about this picture, either in audio format, or in just the text format, or in video format, with more information.

So multimodal means they deal with multiple forms of data, and that's another application. Then there are LLM-based agents. LLMs are, I would say, the backbone, meaning they do the logical reasoning, but agents are something that perform actions. There are agents out there that can do work for you. In a simple way, you can relate it to Siri or Alexa. Or, to give you an example, say you're submitting applications to different colleges. What you would do is go to the individual website of each of these colleges, enter all the details, and submit the form. What if there is an agent that does that for you? You provide generic information, all the information, to this agent, saying, hey.

This is my name, last name, email ID, and whatever information you use while submitting the college application form. The agent will do that part for you, do that job for you, meaning submitting the application to these multiple colleges for you. So this is the power of LLMs: you just provide them clear, precise instructions, and they will do the job for you. And the logical part of it will be taken care of by the language models. They are also used in evaluation. Teachers and professors are using these large language models to grade assignments or to create courses and get feedback on those courses. So LLMs are providing all those services. For example, the University of San Diego School of Law is using large language models as an experiment for their classes.

Next, I will talk about the challenges in LLMs. Here, I'm talking about the general challenges that we face, for example, hallucination. And, again, a lot of research is being done on tackling these challenges. So hallucination is one of the challenges we face where large language models fabricate information. We all know they are trained on an immense amount of data, right? So if we ask a question, it is quite possible that the large language model does not know the answer. But, surprisingly, it will never tell you, hey, I don't know this sort of information. It will try its best to answer any kind of question you provide to it. And that's where the fabrication of information comes in, and that leads to misinformation. So that is one of the challenges. Next, bias in data.

So bias means: whatever data we train these large language models on, if there is bias in that training data, it is automatically going to be reflected in the answers generated by these large language models. For example, I tried DALL-E, asking, hey, generate me the image of a construction worker. By default, it will always generate the image of a man. Generate me the image of a nurse, or generate me the image of a cook, and it will generate an image of a woman. So what I'm trying to say is, these kinds of gender bias exist in our society too. And those societal norms, or that bias, are reflected in the training data, and as a result, they are reflected in the answers provided by these large language models.

So this is just one of the examples showing that LLMs have bias in their answers. Then there is high computational and energy cost. Of course, if I need to work on a large language model, I cannot use my normal computer. I will have to rely on GPUs. Basically, high computational resources are needed to train them and, of course, to get inference out of them. Next, data privacy. Definitely, there are some data-privacy-related risks associated. If you accidentally put any of your personal information in while chatting with large language models, it is going to be exposed to the outside world.

The point is, whatever chat we do with these large language models, like on ChatGPT, whatever text you are entering, it goes back into the pool of data that is used to train that model. So whatever information you are putting in, you are actually exposing it to the outside world. That's the data privacy concern there. Next, context limitations. Context length is how much the model remembers from the past to give you the answer. For example, if I start a story, once upon a time, there was a princess named blah blah blah, and then I keep on moving, right? After a certain point, that was the main challenge with these neural networks: they forget. Okay, what was the story about? What was the name of that princess? Meaning, it will never remember where it all started.

So we need to expand the context length so that the large language model, or any model in the process, remembers what it talked about. Increasing that context length, of course, comes at a price: it comes with higher demands on computational resources. So, again, a lot of research has been going on here. The latest GPT series models show a context length of 8,000 tokens. But FINCH, for example, is a research effort where researchers are trying to increase the context length without putting pressure on the computational resources or compromising accuracy.

Next is the lack of logical reasoning. No doubt, they are good at generating data. But when it comes to logical reasoning, meaning breaking down a problem into logical steps, large language models are definitely not good at it. So, again, prompt engineering is something where, to some extent, these things can be taken care of, with chain-of-thought kinds of prompt-engineering techniques. But a lot of the time, more complex problems are not handled well by large language models. And then there are ethical concerns. We all know about the deepfake videos that are out there to spread misinformation. That is one of the use cases related to the ethical concerns around large language models.

Although there are tools like Microsoft Video Authenticator, definitely a lot of work needs to be done in this area to detect, and to stop, how large language models can be misused to spread misinformation. Next is the carbon footprint. So what exactly is it? It is the total amount of gas emissions, specifically carbon dioxide rather than generic gas emissions, that is put out by any activity we do. I have placed here a link that I will share in the chat later that you can go through. It talks about how a normal query that we put to a large language model is actually creating a lot of carbon footprint. This is another stat that will give you an idea of how much carbon emission is emitted by these large language models.

So training the BERT model emits about 284 tons of carbon dioxide, which is equivalent to the carbon dioxide emitted by five cars in their entire lifetime. That's the kind of carbon being emitted. And another stat, which I have taken from this link, shows that training the GPT-3 model, which has 175 billion parameters, consumed energy equivalent to what can power an average American household for about 120 years. So where is this energy actually emitted? In large language models, we have two phases. One is the training phase, where a lot of computational resources are used and carbon footprints are emitted. And then there is the inference phase.

That is when the model is trained and you are now asking it, okay, perform such-and-such a task for me, like machine translation or sentiment analysis or whatever. So, the training phase. These large language models, as we discussed in earlier slides, are trained on a massive amount of data. The GPT models, you could say, are trained on literally every piece of information that you see on the Internet. So with that amount of data, of course, high power usage is required. There is a simple formula to it: more parameters, or a larger amount of training data, means a larger model.

A larger model means more energy is required, which eventually means a greater carbon footprint. That's the simple understanding I have with respect to large language models. And then the environmental impact. To power these large language models, usually fossil fuels, coal-based electricity, are used instead of the greener options. In fact, on that note, Amazon recently made an announcement that they will switch to nuclear energy to power their data centers instead of using coal-based energy sources. Cooling requirements: of course, a lot of heat is generated whenever a query is put to these large language models and they answer us back. During this process, a lot of energy is released, and to keep the devices cool, we need cooling systems that, again, consume energy.

In fact, recently, when I think the Ghibli-images trend was so popular online, Sam Altman said that our servers are literally melting because of the usage. A lot of people were using that feature of GPT, because of which there was a lot of pressure on the servers, a lot of heat was generated, and to keep them cool, a lot of resources were being used. The next phase, like I said, is the inference phase, or you can say the testing phase. Once the training is done, you are now using the dataset, or checking the model's capabilities, on different tasks.

For that, there is an operational cost involved, meaning you have the model with you, but now you want to deploy that large language model. Again, it is energy-intensive. You need hardware resources; you need a lot of computational resources to make it run. Hugging Face is one of the platforms that showcases a lot of large language models which are open source. One can use them there and see how much energy is involved. They don't run on your normal computer. I specifically use a Google Colab Pro subscription, but the point is, I need to buy separate GPU resources to make it run. Then there's scalability: to put these large language models into production, of course, they should be scalable, because they are continuously evolving. Today, you see 175K tokens of context.

Tomorrow, there will be more. It keeps on increasing, right? So the point is, we need the kind of setup that can handle the increasing needs and increasing size of these large language models. And then these run 24/7. Any time of day, you can put a query to ChatGPT and you will get the answer. So at the back end, the large language model is running continuously, consuming a lot of energy, and of course there is a lot of carbon footprint associated with it. Then, how can you quantify how much energy is being used by these large language models?

So Code Carbon is one of the Python libraries that can be used to estimate carbon emissions. For example, if I need to check how much carbon is being emitted, I will put my code in a chunk. I will say tracker.start() and tracker.stop(), and in between, I run my code. At the end, I get a CSV file which shows, okay, this much carbon has been emitted. There are also two websites where you can put in information about your hardware, your data, and the kind of inference problem you are trying to solve, text classification or machine translation, and they will give you an idea of how much carbon is being emitted by your specific activity.

Strategies include optimizing the algorithms: instead of using large language models as-is, you can make them smaller with pruning, quantization, and distillation; these are techniques one can use. You can use green data centers; like I said, Amazon is moving to nuclear energy. And you can integrate carbon-tracking tools, like the Code Carbon library I just suggested. So those are the key takeaways.