What is an LLM?
An LLM (Large Language Model) is a model trained on vast amounts of text from sources such as books, articles, and websites like Wikipedia, Reddit, and Stack Overflow.
LLMs can perform tasks like text generation, translation, text summarization, and question answering.
How Is an LLM Trained?
The text from all sources is first pre-processed: explicit or biased content is removed, the text is labeled and parsed, and words are converted into numbers called word embeddings.
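A toy sketch of that pre-processing idea, assuming a simple word-level cleanup and an integer lookup table (real pipelines use learned embeddings and far more sophisticated filtering; the blocklist and vocabulary here are invented for illustration):

```python
import re

def preprocess(raw_text, blocklist=("badword",)):
    # Lowercase and keep only word characters -- a crude stand-in
    # for the real cleaning and parsing steps.
    words = re.findall(r"[a-z']+", raw_text.lower())
    # Drop anything on the blocklist (stand-in for content filtering).
    return [w for w in words if w not in blocklist]

def build_vocab(words):
    # Assign each unique word a numeric ID, in order of first appearance.
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab

words = preprocess("The model learns from text. The text is cleaned first.")
vocab = build_vocab(words)
ids = [vocab[w] for w in words]  # the text, as numbers
```

The numeric IDs play the role that word embeddings play in a real model: they turn text into something arithmetic can operate on.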
The model starts with an empty brain, like a newborn child. It is then shown the numerical version of the data (word embeddings) and tries to learn context by adjusting its parameters (we often hear parameter counts mentioned, e.g. GPT-3 was trained with 175 billion parameters).
Initially it produces wrong outputs, but during training it keeps improving its performance by adjusting those parameters.
Once trained, it is deployed and made available for public use.
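The "adjust parameters to reduce error" loop can be sketched with a single made-up parameter; real LLMs apply the same idea, via backpropagation, to billions of parameters at once:

```python
# One trivial "model": output = w * input. The parameter w starts at zero
# (the "empty brain") and is nudged step by step against its error.
target = 4.0
w = 0.0
for step in range(100):
    prediction = w * 2.0          # model output for input 2.0
    error = prediction - target   # how wrong the output is
    w -= 0.1 * error * 2.0        # adjust w to shrink the error

# After training, w * 2.0 is very close to the target of 4.0.
```

Early in the loop the predictions are badly wrong, and each adjustment makes the next one slightly less wrong, which mirrors how training gradually improves an LLM's outputs.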
Receiving and Processing Input from the User
Suppose the user enters an input, for example: "I am a beginner at programming, which language is it better to start at?"
The model first splits the input sentence into pieces, often called tokens:
["I", "am", "a", "beginner", "at", "programming", "which", "language", "is", "it", "better", "to", "start", "at", "?"]
Since computers only understand numbers, each token is then converted into a numerical vector. For example, the word "I" might become [0.35, -0.12, 0.77, 0.44, …], and similarly for the other tokens.
Using these vectors, the model calculates attention scores (this is often referred to as the attention mechanism) to determine which words in the input prompt are most important (e.g. "beginner," "programming," "language," "start").
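Those two steps can be sketched in a few lines, assuming a crude word-level tokenizer and random 4-dimensional vectors (real LLMs use subword tokenizers like BPE and learn embeddings with hundreds of dimensions):

```python
import random

prompt = "I am a beginner at programming, which language is it better to start at?"

# Crude word-level tokenizer: split punctuation off as its own token.
tokens = prompt.replace(",", " ,").replace("?", " ?").split()

# Made-up embeddings: one fixed random 4-dimensional vector per token.
random.seed(0)
embeddings = {tok: [round(random.uniform(-1, 1), 2) for _ in range(4)]
              for tok in set(tokens)}
vectors = [embeddings[tok] for tok in tokens]
```

Every token now has a vector, which is the numerical form the rest of the model works with.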
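The core of the attention idea can be sketched as a dot product followed by a softmax: each token's vector is scored against a query vector, and the scores are normalized into weights so the most relevant tokens get the largest share. The vectors and query below are invented toy values:

```python
import math

def softmax(xs):
    # Turn raw scores into probabilities that sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

tokens  = ["beginner", "at", "programming", "language"]
vectors = [[0.9, 0.1], [0.0, 0.2], [0.8, 0.3], [0.7, 0.4]]  # toy embeddings
query   = [1.0, 0.0]  # toy query vector

# Dot product of the query with each token vector = attention score.
scores  = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
weights = softmax(scores)
```

With these toy numbers, content words like "beginner" end up with larger weights than filler words like "at", which is exactly the effect attention is meant to produce.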
Drawing on what it learned during training (it has already seen Stack Overflow, Medium, and many other articles), the model recalls which programming languages beginners most often choose when starting out, such as Python and JavaScript.
Generating Output
The model then generates the output word by word using next-word probabilities. For the first position the candidate words might be ["A", "The", …]; since sentences often begin with "The", it may have the highest probability, so the model chooses it.
It then predicts the next word from candidates like ["most", "good", "best", …], may choose "best" as the second word, and repeats this process for the whole output.
Because words that are more relevant to beginner programmers receive higher probabilities, this probabilistic process steers the generated output toward the user's needs.
This is how LLMs generate human-like responses.
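The word-by-word loop above can be sketched with a hand-written probability table (the table and its values are entirely invented; a real model computes these probabilities from its billions of parameters at every step):

```python
# Made-up next-word probability tables for illustration only.
next_word_probs = {
    "<start>":  {"The": 0.6, "A": 0.3, "Python": 0.1},
    "The":      {"best": 0.7, "most": 0.2, "good": 0.1},
    "best":     {"language": 0.8, "choice": 0.2},
    "language": {"<end>": 1.0},
}

word, output = "<start>", []
while word != "<end>":
    candidates = next_word_probs[word]
    word = max(candidates, key=candidates.get)  # pick the most likely word
    if word != "<end>":
        output.append(word)

sentence = " ".join(output)  # "The best language"
```

Real models usually sample from these probabilities rather than always taking the top word, which is why the same prompt can produce different responses.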