How Democratized Large Language Models Boost AI Development


Each month, Intel VP and GM for Open Ecosystem Arun Gupta holds a live Twitter Spaces conversation with changemakers in the open source community. Follow him on Twitter to tune into the next conversation. 

In this month’s conversation, Gupta talks with Julien Simon, the chief evangelist at Hugging Face*. The startup began life as a chatbot for teens but has long outgrown its cute façade, recently releasing HuggingChat, an open source version of ChatGPT in a bid for “transparency, inclusivity, accountability and distribution of power” in artificial intelligence. 

Gupta and Simon discuss large language models (LLMs), Hugging Face’s commitment to democratizing AI, and the evolution of generative AI. This conversation has been edited and condensed for brevity and clarity.

What Slows AI Down

Arun Gupta: Hugging Face advances open source libraries while also offering commercial services to help companies start production quicker.

Julien Simon: Exactly. Production is what really slows projects down. Everything looks fine in the sandbox, but once you start deploying models, the latency budget and hosting costs can be a problem.

Arun Gupta: Can you tell us more about latency budgets?

Julien Simon: Let’s say an e-commerce site has a chatbot and a product search. For the best user experience, you want the product search and chatbot to respond in less than X milliseconds, and you work backward from there. It’s not easy to optimize large models out of the box. We spend a lot of time with our hardware partners like Intel on custom accelerators to optimize those large models.
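Working backward from a response-time target is simple arithmetic. As a rough illustration (the stage names and millisecond figures below are hypothetical, not from a real profile), you subtract the fixed overheads from the end-to-end budget to see how much time is left for model inference:

```python
# Toy latency-budget sketch: subtract fixed pipeline overheads from the
# end-to-end response-time target to find the time left for model inference.
# All stage names and timings are made-up illustrative numbers.

def remaining_inference_budget_ms(total_budget_ms, overhead_ms):
    """Return the milliseconds left for the model after the other stages."""
    used = sum(overhead_ms.values())
    if used >= total_budget_ms:
        raise ValueError("overheads alone exceed the latency budget")
    return total_budget_ms - used

# Suppose search results must come back within 100 ms end to end.
overheads = {"network": 20, "feature_lookup": 10, "ranking_glue": 15}
budget = remaining_inference_budget_ms(100, overheads)
print(budget)  # 55 -> the model must answer in under 55 ms
```

Any optimization work — smaller models, quantization, custom accelerators — is then judged against that remaining budget.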

The Power of LLMs

Arun Gupta: Can you explain what an LLM is?

Julien Simon: An LLM is a deep learning model based on the transformer architecture. LLMs aren’t trained for a particular objective; they’re trained to extract content from the huge quantity of available language data, like Wikipedia* and YouTube* captions.

You can’t do much with those LLMs on their own; they’re just a starting point for building task-specific models. Hugging Face helps companies specialize LLMs on their own data for a particular task (a process called fine-tuning), such as building a summarization model for documents in your particular context, like financial services or life sciences.

Arun Gupta: So LLMs pick up data from the public domain, put it into a model, and allow users to fine-tune it for a specific purpose.

Julien Simon: Exactly. That’s why transformer models are popular, because companies can take them off the shelf and specialize them for a particular problem with limited compute effort. You stand on the shoulders of the LLM.  

Arun Gupta: That’s the open source philosophy, standing on the shoulders of giants to apply someone’s work and contextualize it for your organization.  

Julien Simon: It’s a fascinating model. Top research organizations—like Google*, Microsoft*, and Intel—are actively researching new architectures, pretraining those models, and making them available for everyone to use. Large companies do the work once, and now those models are available to the community on our hub.  

Arun Gupta: Can you explain what the hub is? 

Julien Simon: On the Hugging Face website, we host close to 200,000 models and over 30,000 data sets, all open source. You can grab those models with one line of code and evaluate them, test them, and customize them. The models are pretrained and ready to go, so you can experiment with them in a matter of hours—not days, weeks, or months. 

Arun Gupta: Can LLMs only come from corporations, or do individual developers have the compute resources to create them? 

Julien Simon: Since models usually train on terabytes of data, pretraining can cost several million dollars. Individual developers can certainly contribute to architectural advances, but pretraining an LLM requires a large company with both the research team and the funding.
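That “several million dollars” figure can be sanity-checked with a back-of-envelope estimate. The sketch below uses the common ~6 × parameters × tokens rule of thumb for training FLOPs; every number in it (model size, token count, GPU throughput, hourly price) is a hypothetical assumption, not a figure from the interview:

```python
# Back-of-envelope pretraining cost estimate. All inputs are illustrative
# assumptions; real costs depend heavily on hardware efficiency and pricing.

def pretraining_cost_usd(params, tokens, flops_per_gpu_s, usd_per_gpu_hour):
    total_flops = 6 * params * tokens                # ~6*N*D training-FLOPs rule of thumb
    gpu_hours = total_flops / flops_per_gpu_s / 3600
    return gpu_hours * usd_per_gpu_hour

# Hypothetical run: 175B parameters, 300B tokens, 100 TFLOP/s effective
# throughput per GPU, $2 per GPU-hour.
cost = pretraining_cost_usd(175e9, 300e9, 1e14, 2.0)
print(f"${cost / 1e6:.2f}M")  # $1.75M in raw GPU time alone
```

Even this optimistic estimate ignores failed runs, data preparation, and engineering salaries, which is why pretraining stays with well-funded organizations.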

Open vs. Closed Models: Choice is Everything

Arun Gupta: Basically, you’re providing those transformers on the hub so the community can use LLMs to derive business value from them. 

Julien Simon: It’s a fascinating space. OpenAI* decided to make GPT-3 and subsequent versions closed source. We believe open, transparent models lead to more innovation and more safety. While OpenAI does great work, customers are concerned about privacy and intellectual property—what happens to the data you send to closed models? We believe there’s room for both open and closed models, and at the end of the day customers decide what works best for them.

Arun Gupta: That makes sense. The richness of a model is based not just on the volume of data, but the diversity of the data as well. Some of that diversity can only happen in open source domains. But as you say, each path has advantages. 

Julien Simon: We’re working with the open source community to create alternatives, and we see amazing new open source models appear every day. They’re a fraction of the size of the closed source models, and you can deploy them at a reasonable cost. Hardware optimization techniques like quantization help accelerate the models even more. If you think you need expensive GPUs and a big hosting budget, think again.
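Quantization is one of those optimization techniques: weights stored as 32-bit floats are mapped to 8-bit integers, shrinking the model and speeding up arithmetic at a small cost in precision. Here is a minimal pure-Python sketch of absmax int8 quantization (an illustration of the idea, not how any particular library implements it):

```python
# Minimal absmax int8 quantization sketch: map floats into [-127, 127]
# integers with one shared scale, then map back. Real libraries quantize
# per tensor or per channel and fuse this into optimized kernels.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 1.0, -0.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value differs from the original by at most scale / 2,
# while each weight now fits in one byte instead of four.
```

The accuracy loss is usually small enough that the speed and memory savings are worth it, especially on CPUs with native int8 instructions.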

Arun Gupta: Right, most organizations can’t afford to build and manage big deployments. The value of democratized AI is that everybody has access to hardware and generative AI they can use to drive business value. 

Julien Simon: That’s what democratization means to us. Companies should be able to run state-of-the-art models on hardware that is easy to procure and repurpose. There’s room for companies using expensive GPUs, but choice is everything. 

The Evolution of Generative AI

Arun Gupta: Let’s talk about generative AI. Can you define it and tell us how it’s different from previous models? 

Julien Simon: Generative AI models generate new data from existing data. Stable Diffusion* is one example: you input a text prompt, and the AI outputs an entirely new image based on it. That’s different from extractive AI models, which simply extract or classify information already present in the original data. With extractive AI you already know roughly what the output is going to be, whereas the results of generative AI can be surprising, as we’ve seen with ChatGPT.

Arun Gupta: So that’s why generative AI requires an LLM on the backend because it’s generating new data. 

Julien Simon: Yes. For text-to-image generation, the first step is to understand the context of the text prompt, and then use that context to generate an image. That’s a complicated process. Any time text is involved, you need an LLM to understand the underlying context and perform whatever the task requires.  

Arun Gupta: You mentioned GPT, an LLM created by OpenAI. The architecture behind GPT, which stands for generative pretrained transformer, has been around for a while. But GPT-3 became more interactive and able to understand more context. What made it so popular? 

Julien Simon: Everybody is impressed by the human-like quality of the answers. Not long ago, generative models had a robotic feel to them; you would never have mistaken a chatbot for a human. Reinforcement learning from human feedback (RLHF) changed that. With RLHF, once a model provides an answer, humans score its quality and supply a better answer when the generated one isn’t good enough. That feedback is folded back into the training data, and the model is trained to maximize the quality score of its answers. That technique really pushed generative models to the next level. If you want to see an end-to-end example, check out our blog post on using RLHF.
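The loop Simon describes can be caricatured in a few lines. In the toy sketch below, a hand-written scoring function stands in for human raters, and a trivial weight update stands in for the actual reinforcement-learning step; everything here is an illustrative assumption, not Hugging Face’s training code:

```python
# Toy RLHF-style loop: a stand-in "human" reward scores candidate answers,
# and the policy's weights drift toward the highest-scored one.

def human_reward(answer):
    """Stand-in for human raters: reward polite, substantive answers."""
    score = 0.0
    if "please" in answer:
        score += 1.0
    if len(answer.split()) > 3:
        score += 1.0
    return score

def update_policy(weights, rewards, lr=0.5):
    """Nudge each answer's weight toward its reward (toy update rule)."""
    return {a: w + lr * rewards[a] for a, w in weights.items()}

weights = {
    "no": 1.0,
    "ok": 1.0,
    "sure, please follow these three steps": 1.0,
}
rewards = {a: human_reward(a) for a in weights}
for _ in range(3):                      # a few rounds of feedback
    weights = update_policy(weights, rewards)
best = max(weights, key=weights.get)
print(best)  # the polite, detailed answer wins
```

In real RLHF the scoring function is itself a learned reward model and the update is a policy-gradient step (e.g., PPO), but the shape of the feedback loop is the same.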

Arun Gupta: Pretend I’m a Python* developer and I want to build the next ChatGPT. What would I need to do?

Julien Simon: That blog post is a good starting point. We also released our own chat application, HuggingChat*. It’s based on the Open Assistant* model, which is open source, and so are the UI for the chat application and the inference server, called Text Generation Inference. These are all open source building blocks you can use to build a similar application and fine-tune it on your own data.

Hugging Face in Action: Who’s Using Transformers

Arun Gupta: What about the transformers available on Hugging Face? Can you give some fun examples of how people are using them?

Julien Simon: We see use cases everywhere. You’re probably familiar with Musixmatch*. They use our transformers to bring lyrics to Spotify*. We also worked with a healthcare startup in France called Synapse Medicine* to build a medication chatbot integrated into the leading government website for healthcare in France…It’s not just Fortune 500 companies, we see startups getting their models to production in a matter of weeks. The best way to demystify this technology is to play around with it, and all it takes is some basic Python skills.

Arun Gupta: That goes back to the philosophy of democratized AI. A handful of companies invest millions in building LLMs, and now everyone can leverage them.  

Julien Simon: That’s exactly right. I tell customers that if they believe AI is transformative—and it’s probably even more transformative than the cloud was—how could you not own it? You don’t want someone else to be in control of your future. You want to be in control of your future.

Follow Arun Gupta on Twitter for monthly Twitter Spaces live sessions with changemakers in the open source community.