November 4, 2022

FLAN-T5, a yummy model superior to GPT-3

By Sofía Sánchez González

Some artificial intelligence models go unnoticed despite their value. This is the case with FLAN-T5, a model developed by Google with a name as appetizing as its NLP power. With it, the California company has produced a new example of the democratization of artificial intelligence, and here we explain why.

What is new about FLAN-T5?

First, some background: Google T5 (Text-to-Text Transfer Transformer). T5 is a transformer-based encoder-decoder model that uses a text-to-text approach, meaning every task takes a string as input and produces a string as output. It is one of the best-known encoder-decoder models in natural language processing (NLP).
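To make the text-to-text idea concrete, here is a minimal Python sketch that frames two different tasks as plain (input text, target text) pairs. It follows the task-prefix convention from the original T5 paper ("translate English to German:", "summarize:"); the function name `to_text_to_text` and the task keys are hypothetical, and the code only builds strings, it does not call any model.

```python
from typing import Tuple

# Hypothetical task names mapped to T5-style task prefixes.
PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}

def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Frame a task as an (input text, target text) pair, T5-style."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # Every task becomes the same problem: read a string, produce a string.
    return PREFIXES[task] + source, target

pair = to_text_to_text("translate_en_de", "That is good.", "Das ist gut.")
print(pair)  # ('translate English to German: That is good.', 'Das ist gut.')
```

Because translation, summarization and classification all share this string-in, string-out shape, one model and one training objective can cover them all.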

But you may be wondering: didn't Google T5 come out a few years ago? What makes the FLAN-T5 model so yummy and unique?

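The main novelty is that the model can reason about its answers. FLAN-T5 is T5 after instruction fine-tuning: further training on a large collection of more than 1,800 tasks phrased as natural-language instructions, some of which include step-by-step (chain-of-thought) reasoning. As a result, we can say, “Tell me yes or no to this question and also explain your answer,” just like in school exams. I wish we had FLAN-T5 back then.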

It is not a mechanical model that simply memorizes. Thanks to the datasets used in its training, it can break down a text and reason about it, instead of just returning an answer like a typical question-answering model.
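A minimal Python sketch of this kind of interaction, assuming we want a machine-readable verdict alongside the explanation. The prompt wording and both helper names (`yes_no_prompt`, `extract_yes_no`) are hypothetical, and the parser is only a heuristic: since step-by-step answers usually state their conclusion at the end, it takes the last standalone "yes" or "no".

```python
import re
from typing import Optional

def yes_no_prompt(question: str) -> str:
    """Build an instruction asking for a yes/no answer with step-by-step reasoning."""
    # Hypothetical wording in the style of instruction prompts.
    return f"Answer the following yes/no question by reasoning step-by-step. {question}"

def extract_yes_no(response: str) -> Optional[str]:
    """Heuristic: return the last standalone 'yes' or 'no' in a response, if any."""
    # \b word boundaries keep words like "cannot" or "nothing" from matching.
    matches = re.findall(r"\b(yes|no)\b", response, flags=re.IGNORECASE)
    return matches[-1].lower() if matches else None

print(yes_no_prompt("Can you write a whole haiku in a single tweet?"))
```

The explanation stays readable for a human, while `extract_yes_no` gives a program something simple to act on; a stricter output format would make parsing more reliable.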

Flan-T5 model architecture

Flan-T5 keeps T5's encoder-decoder architecture. The encoder reads the whole input text at once using self-attention, and the decoder generates the output one token at a time, attending both to the tokens it has already produced and to the encoder's representation of the input. This design suits tasks that turn one text into another, such as language translation and text summarization.
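The encoder-decoder flow can be illustrated in Python under heavy simplification: the encoder runs once over the whole input, and the decoder is called repeatedly, each time seeing the encoder output plus the tokens generated so far, until it emits an end-of-sequence marker. The functions `greedy_decode`, `toy_encode` and `toy_decode_step` are hypothetical; the toy functions stand in for the neural networks, which in the real model score every vocabulary token at each step (greedy decoding keeps the most likely one).

```python
from typing import Callable, List, Sequence

EOS = "</s>"  # end-of-sequence marker (T5 tokenizers also use "</s>")

def greedy_decode(
    encode: Callable[[Sequence[str]], List[str]],
    decode_step: Callable[[List[str], List[str]], str],
    source: Sequence[str],
    max_len: int = 20,
) -> List[str]:
    """Encode the input once, then generate output tokens one at a time."""
    memory = encode(source)  # the encoder sees the entire input at once
    output: List[str] = []
    for _ in range(max_len):
        # The decoder conditions on the encoder output and on its own prefix,
        # and returns the next token (in a real model: the most likely one).
        token = decode_step(memory, output)
        if token == EOS:
            break
        output.append(token)
    return output

# Hypothetical toy stand-ins, not Flan-T5 components:
# the "encoder" lowercases tokens; the "decoder" emits them in reverse order.
def toy_encode(tokens: Sequence[str]) -> List[str]:
    return [t.lower() for t in tokens]

def toy_decode_step(memory: List[str], prefix: List[str]) -> str:
    i = len(prefix)
    return memory[-1 - i] if i < len(memory) else EOS

print(greedy_decode(toy_encode, toy_decode_step, ["Hola", "Mundo"]))  # ['mundo', 'hola']
```

Reversing the input is only possible because the decoder can see the whole encoded sequence from the first step, which is the key difference from a model that reads and writes left to right in a single stream.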

Flan-T5 vs. GPT-3: A comparative analysis

Flan-T5 and GPT-3 are both state-of-the-art models in the AI landscape. GPT-3 is a decoder-only model with 175 billion parameters and is known for its extensive language capabilities, whereas even the largest Flan-T5 checkpoint (XXL) has about 11 billion parameters. This more compact encoder-decoder design makes Flan-T5 efficient for specific tasks like translation and summarization.

Why is it superior to GPT-3?

For several reasons:

  1. GPT-3 is very popular, but testing and using it properly requires a huge computing budget that is seldom found in a regular home, along with hardware that is not easy to get. FLAN-T5, by contrast, does not need large devices, because its smaller checkpoints are designed with ordinary users in mind.
  2. It detects sarcasm and is very intuitive: it is able to reinterpret questions.
  3. With five examples in the input (5-shot), FLAN-T5 XL, a model with only 3 billion parameters, outperforms GPT-3. In fact, it does not need many examples at all: it also performs very well on zero-shot tasks, where it gets none.
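"5-shot" and "zero-shot" describe how the prompt is built rather than how the model is trained. A minimal Python sketch of that prompt construction follows; the `few_shot_prompt` helper and its "Q:/A:" layout are hypothetical, and the exact format used in published evaluations may differ.

```python
from typing import List, Tuple

def few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Place k solved examples before the new question (k-shot prompting).

    With five examples this is a 5-shot prompt; with none it is zero-shot.
    """
    # Hypothetical "Q:/A:" layout, chosen only for illustration.
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    blocks.append(f"Q: {query}\nA:")  # the model is expected to complete this answer
    return "\n\n".join(blocks)

shots = [("What is 2 + 2?", "4"), ("What is 3 + 5?", "8")]
print(few_shot_prompt(shots, "What is 7 + 6?"))
print(few_shot_prompt([], "What is 7 + 6?"))  # zero-shot version
```

No weights change in either case: the examples only live in the input text, which is why a model that does well zero-shot is convenient, since it needs shorter prompts.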

NLP engineer Manuel Romero, who has already tested the model, sums it up like this: “It is one of the smallest models (3B parameters) with the most natural language understanding that I have seen in my years of experience in the world of NLP.”

All the tasks you can imagine

Google has developed and released this model in five versions (see here):

  • Flan-T5 small
  • Flan-T5 base
  • Flan-T5 large
  • Flan-T5 XL
  • Flan-T5 XXL
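The size differences translate directly into hardware requirements. As a back-of-the-envelope Python sketch, the memory needed just to hold the weights is the number of parameters times the bytes per parameter (4 for 32-bit floats, 2 for 16-bit formats such as bfloat16, 1 for 8-bit quantization). The parameter counts below are approximate rounded figures, and the estimate ignores activations and framework overhead, so real usage is higher.

```python
# Approximate parameter counts (rounded); treat them as rough figures.
PARAMS = {
    "flan-t5-small": 80e6,
    "flan-t5-base": 250e6,
    "flan-t5-large": 780e6,
    "flan-t5-xl": 3e9,
    "flan-t5-xxl": 11e9,
    "gpt-3": 175e9,
}

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weights_gb(n_params: float, dtype: str = "fp32") -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for name, n in PARAMS.items():
    print(f"{name:>14}: {weights_gb(n, 'fp32'):7.1f} GB fp32, {weights_gb(n, 'bf16'):7.1f} GB bf16")
```

For example, FLAN-T5 XL needs roughly 3e9 × 2 bytes = 6 GB in bfloat16, while GPT-3's 175 billion parameters would need roughly 350 GB, far beyond a single consumer GPU.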

If you want concrete examples of what you can do with FLAN-T5, here they are:

  • Translate between more than 60 languages. Spanish is particularly well covered, because it is the second most used language in the training data.
  • Summarize texts.
  • Answer general questions, for example: “How many minutes should I cook my egg?”
  • Answer historical questions or even questions about the future.
  • Solve math problems when the prompt guides it through the reasoning.
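Unlike the fixed task prefixes of the original T5, an instruction-tuned model can be addressed in plain language. Here is a small Python sketch of how the tasks in this list might be phrased as prompts; the template wording and the `make_prompt` helper are hypothetical, and the "Let's think step by step." cue for math is a common chain-of-thought phrasing, not a required format.

```python
from typing import Dict

# Hypothetical instruction templates for the tasks listed above.
TEMPLATES: Dict[str, str] = {
    "translate": "Translate English to Spanish: {text}",
    "summarize": "Summarize the following text:\n{text}",
    "question": "Answer the following question: {text}",
    # Chain-of-thought style cue asking the model to spell out its reasoning.
    "math": "{text}\nLet's think step by step.",
}

def make_prompt(task: str, text: str) -> str:
    """Fill in the instruction template for the given task."""
    if task not in TEMPLATES:
        raise ValueError(f"unsupported task: {task}")
    return TEMPLATES[task].format(text=text)

print(make_prompt("question", "How many minutes should I cook my egg?"))
```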

Of course, not everything is an advantage. FLAN-T5 does not calculate results very well when the problem's format deviates from what it has seen. And the smaller the checkpoint, the less general knowledge it retains. Still, this Google model offers many strong capabilities.

Want to try it?

No problem. We tested it in Google Colab and found it very powerful, especially since we did not have to fine-tune it.

We include the Colab in this link so you can try it and do your own research.

You can also try the Hugging Face demo here.
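If you prefer to run it locally instead of in Colab, here is a minimal Python sketch using the Hugging Face `transformers` library. It assumes `transformers` and PyTorch are installed and that the machine can download the public `google/flan-t5-base` checkpoint from the Hugging Face Hub; the `load_flan_t5` and `ask_flan_t5` helper names are ours, not part of the library.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_flan_t5(model_name: str = "google/flan-t5-base"):
    """Download (once) and cache a Flan-T5 tokenizer and model from the Hugging Face Hub."""
    # Imported here so this file can be loaded even where transformers is not installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model

def ask_flan_t5(prompt: str, model_name: str = "google/flan-t5-base", max_new_tokens: int = 64) -> str:
    """Run one prompt through Flan-T5 and return the decoded answer."""
    tokenizer, model = load_flan_t5(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")  # PyTorch tensors
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `ask_flan_t5("Translate English to Spanish: How old are you?")` returns the model's translation. Caching the loader avoids reloading the weights on every call; on a machine without a GPU, `google/flan-t5-small` or `google/flan-t5-base` are the practical choices, since the XL and XXL checkpoints need considerably more memory.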

Tell us what you think!

Learn more about AI models.

About Narrativa

Narrativa® Agentic AI solutions unlock a faster, smarter future for life sciences organizations, helping them to efficiently produce complex, high-volume documentation for regulatory and commercialization workflows. By automating content creation, Narrativa® delivers greater speed, accuracy, and consistency—while ensuring full compliance in highly regulated environments.

The Narrativa® Navigator platform provides secure and specialized Agentic AI-powered automation features. It includes complementary user-friendly tools such as Clinical Atlas for CSR and Protocol generation, Narrative Pathway, TLF Voyager, and Redaction Scout, which operate cohesively to transform clinical data into submission-ready documents for regulatory and commercialization. From database to delivery, pharmaceutical sponsors, biotech firms, and contract research organizations (CROs) rely on Narrativa® to streamline workflows, decrease costs, and reduce time-to-market across the clinical lifecycle and, more broadly, throughout their entire businesses.

Explore www.narrativa.com and follow on LinkedIn, Facebook, Instagram, and X.