What if we could teach machines how to think? Or better, what if they could dream? A few years ago, this notion would have been dismissed as absurd. Yet here we are, watching the idea take shape in our reality. We live in a world of miracles where technology changes everyday lives. And now, in the era of artificial intelligence, generative pre-trained transformers are a stepping stone, a foundation that paves the way for the advancements to come.
What is a generative pre-trained transformer, if you are not a geeky nerd? The short answer is that generative pre-trained transformers, or GPTs, are models capable of understanding and generating human-like language. Simply put, it means teaching machines to understand and respond like humans.
The technology behind the generative pre-trained transformer is a phenomenal work of art. Imagine teaching a machine that only understands 0s and 1s what you are saying, or better, a machine that generates responses to your questions. It sounds like something straight out of a science fiction movie. But as fascinating as the technology sounds, the work behind its development is even more fascinating.
A generative pre-trained transformer is built from two main kinds of blocks: the attention block and the multilayer perceptron. We will discuss each in detail, but first we need to understand some fundamental terms.
Natural language processing starts by turning human language into a form a machine can read. The general idea is that sentences are broken into small chunks called tokens, usually whole words or pieces of words. Each token is then converted into an embedding, a list (vector) of numbers, and the embeddings of all the tokens in the input are stacked into a matrix and passed on to the transformer. The goal of using embeddings is to capture the meaning of each token in numbers, so that the model can work with contextual information and resolve any ambiguity that may arise.
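To make this concrete, here is a minimal Python sketch of the two steps, under loud assumptions: the tiny vocabulary, the whitespace tokenizer, and the names `tokenize` and `embed` are all hypothetical, and the embedding values are random rather than learned. Real GPT models use a subword tokenizer (byte-pair encoding) and embedding tables learned during training.

```python
import random

# Hypothetical toy vocabulary; real GPT tokenizers split text into subword
# pieces and have tens of thousands of entries.
VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

EMBED_DIM = 4  # real models use hundreds or thousands of dimensions

# Embedding table: one row of numbers per vocabulary entry.
# Random here as a stand-in; in a trained model these values are learned.
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(sentence: str) -> list[int]:
    """Split on whitespace and map each word to its ID (unknown words -> 0)."""
    return [TOKEN_TO_ID.get(word, TOKEN_TO_ID["<unk>"]) for word in sentence.lower().split()]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up one embedding row per token; together the rows form the input matrix."""
    return [EMBEDDINGS[i] for i in token_ids]


ids = tokenize("The cat sat on the mat")
matrix = embed(ids)
print(ids)                          # [1, 2, 3, 4, 1, 5]
print(len(matrix), len(matrix[0]))  # 6 4
```

Note that the same token always maps to the same row, so both occurrences of "the" start out with identical embeddings; it is the attention layers that follow which make each occurrence context-aware.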
Once the transformer receives the input embeddings, they are sent to the attention block, also called the self-attention block, where the model weighs the importance of each word in the sentence relative to the other words. This helps the model understand the context of the input query, which in turn helps it generate the next word.
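In essence, each token's embedding is multiplied by learned weight matrices to produce three vectors: a query, a key, and a value. The query of each token is compared with the keys of all the tokens via dot products, and the resulting scores are scaled (divided by the square root of the key dimension) and normalised with a softmax, so that each token's attention weights are positive and sum to 1. These weights are then used to take a weighted sum of the value vectors, giving each token a context-aware representation. The resulting matrix is then passed on to the multilayer perceptron block for further processing.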
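The steps above can be sketched in plain Python as single-head scaled dot-product attention. This is an illustration under assumptions, not GPT's actual code: the helper names are hypothetical, the projection matrices `w_q`, `w_k`, and `w_v` would normally be learned (identity matrices stand in for them here), and real models run many attention heads in parallel on much larger matrices.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the embedding matrix x."""
    q = matmul(x, w_q)  # queries: what each token is looking for
    k = matmul(x, w_k)  # keys: what each token offers
    v = matmul(x, w_v)  # values: the information each token passes on
    d_k = len(k[0])
    # Score every query against every key, then scale by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # one distribution per token
    return matmul(weights, v), weights


# Three tokens with 2-dimensional embeddings; identity projections keep it readable.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` says how much one token attends to every token in the sequence. The third token's query `[1, 1]` matches its own key best, so it puts the most weight on itself and splits the rest evenly between the other two.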
The multilayer perceptron is a fully connected feed-forward neural network with one or more hidden layers. The input is multiplied by the weights of each layer, and an activation function is applied along the way to obtain the output. The goal of using a multilayer perceptron is to introduce non-linearity into our model. Why non-linearity? Because in real-world scenarios, data rarely follows linear patterns, and a stack of purely linear layers collapses into a single linear transformation that cannot capture them. This block also provides feature extraction: it projects the input matrix into a higher dimension, which helps the model learn patterns that are not visible in lower dimensions.
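A minimal sketch of such a block, assuming the two-layer shape used in GPT-2 and GPT-3: a hidden layer four times wider than the embedding, with a GELU activation between the two layers. The weights are random stand-ins for learned values, and the function names are hypothetical.

```python
import math
import random


def gelu(x: float) -> float:
    """GELU activation (tanh approximation, as used in GPT-2)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec, weights, bias):
    """Compute vec @ weights + bias for a single token vector."""
    return [sum(v * w for v, w in zip(vec, col)) + b for col, b in zip(zip(*weights), bias)]


def mlp(vec, w1, b1, w2, b2):
    """Expand to a wider hidden layer, apply the non-linearity, project back."""
    hidden = [gelu(h) for h in linear(vec, w1, b1)]
    return linear(hidden, w2, b2)


d_model, d_hidden = 4, 16  # hidden layer 4x wider than the embedding
rng = random.Random(0)
w1 = [[rng.gauss(0, 0.5) for _ in range(d_hidden)] for _ in range(d_model)]
b1 = [0.0] * d_hidden
w2 = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_hidden)]
b2 = [0.0] * d_model

out = mlp([0.5, -1.0, 0.25, 2.0], w1, b1, w2, b2)
print(len(out))  # 4: same size as the input embedding
```

Without `gelu`, the two `linear` calls could be merged into one matrix multiplication, which is exactly why the activation function is what gives this block its extra expressive power.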
The attention and multilayer perceptron blocks are stacked and repeated many times (GPT-3, for example, has 96 such layers), and at the end we get the transformer's output embeddings. The final embedding is converted into a probability distribution over all the words (tokens) that might come next after the input query, by multiplying it with an output matrix and applying a softmax. The next word is then picked from this distribution, either by taking the most likely word or by sampling.
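The last step can be sketched as follows. The candidate words and their scores (logits) are made up for illustration; in a real model the logits come from multiplying the final embedding by an output matrix, one score per vocabulary token. `pick_next` is a hypothetical helper showing both greedy selection and sampling.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next(vocab, logits, greedy=True, rng=None):
    """Choose the next word: the most likely one, or a random draw weighted by probability."""
    probs = softmax(logits)
    if greedy:
        best = max(range(len(vocab)), key=lambda i: probs[i])
        return vocab[best], probs
    rng = rng or random.Random()
    return rng.choices(vocab, weights=probs, k=1)[0], probs


vocab = ["mat", "sofa", "moon", "car"]
logits = [3.2, 2.1, 0.3, -1.0]  # made-up scores for "The cat sat on the ..."
word, probs = pick_next(vocab, logits)
print(word)  # mat
print([round(p, 3) for p in probs])
```

Greedy selection always returns the same word for the same input, while sampling trades some predictability for variety, which is why chat-style models often sample.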
The input query, now extended with the new word, is sent through the transformer again to generate the following word. This process repeats until the desired output is obtained, typically when the model produces a special end-of-sequence token or reaches a length limit. That is why this type of transformer is called a generative transformer: it quite literally generates one word per iteration.
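The generation loop itself is simple. In this sketch, `predict_next` is a hypothetical stand-in for the whole transformer: a lookup table that only looks at the last word, whereas a real GPT conditions on the entire sequence. The `<start>` and `<end>` markers are likewise assumptions for illustration.

```python
# Hypothetical stand-in for a trained transformer: a fixed table that
# maps the most recent word to the most likely next word.
TOY_MODEL = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "a",
    "a": "mat",
    "mat": "<end>",
}


def predict_next(tokens: list[str]) -> str:
    """A real GPT looks at the whole sequence; this toy only uses the last token."""
    return TOY_MODEL.get(tokens[-1], "<end>")


def generate(prompt: list[str], max_new_tokens: int = 20) -> list[str]:
    """Autoregressive loop: predict a token, append it, feed the sequence back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # one new token per iteration
        if next_token == "<end>":          # stop at the end-of-sequence marker
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate(["<start>"])[1:]))  # the cat sat on a mat
```

The `max_new_tokens` cap plays the role of the length limit, so generation stops even if the model never produces `<end>`.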
The word "pre-trained" refers to the fact that the model is first trained on a large body of text, learning to predict the next word, which tunes its weights before it is used for general purposes. These models are also called large language models (LLMs) because they have a very large number of parameters and are trained on vast datasets. That's a topic for another time.
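As a sketch of what pre-training optimises: the model is typically trained to predict the next token, and its error is measured with a cross-entropy loss, the negative log of the probability it assigned to the true next token. The predicted probabilities below are made up, and `next_token_loss` is a hypothetical helper; in real pre-training the probabilities come from the model itself, and gradient descent adjusts the weights to lower this loss.

```python
import math


def next_token_loss(predicted_probs: list[dict[str, float]], targets: list[str]) -> float:
    """Average cross-entropy: -log of the probability given to each true next token."""
    losses = [-math.log(probs[target]) for probs, target in zip(predicted_probs, targets)]
    return sum(losses) / len(losses)


# Made-up model outputs for the text "the cat sat":
predictions = [
    {"cat": 0.5, "dog": 0.25, "sat": 0.25},  # after "the", the true next word is "cat"
    {"cat": 0.25, "dog": 0.25, "sat": 0.5},  # after "the cat", it is "sat"
]
loss = next_token_loss(predictions, ["cat", "sat"])
print(round(loss, 4))  # 0.6931 (= ln 2)
```

A perfect model would assign probability 1 to every true next word and reach a loss of 0; the more confident it is in the wrong words, the higher the loss.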
The future of generative pre-trained transformers looks compelling, as companies such as OpenAI, Google, and Meta, to name just a few, have rapidly expanded their successful use. OpenAI's GPT-3 model took the world by storm and proved how these generative models can impact our day-to-day lives. We saw it write poetry, code, and articles, help us with our academic assignments, and much more. The future that comes with generative pre-trained transformers is quite promising, but at the same time quite frightening.
Just like a coin has two sides, GPT has both good and bad sides, and judging it by only one of them is a crime in itself.
As we stand on the brink of a new era in AI, it’s up to us to shape the future of GPT technology for the benefit of all.
You can also follow me on Medium for more blogs like this.
Thank you for reading. I hope you have a great day!