Davide Santangelo

Building a Tiny Language Model (LLM) in Ruby: A Step-by-Step Guide - V3 "Integrating Reasoning into the Tiny LLM"

1. Understanding the Role of Reasoning in Language Models

Recent advancements in LLMs have shown that explicit "chain-of-thought" reasoning (i.e. generating intermediate steps before the final answer) can help models solve complex tasks by breaking them into smaller steps. In our context, we’ll simulate this process using our Markov Chain model by adding a new method that:

  • Generates a chain-of-thought: A short sequence of intermediate tokens (or "thoughts") that reflect the model’s internal "reasoning" about the text.
  • Uses the reasoning as context: The final text generation then uses the last token(s) of the reasoning phase as a seed, ideally guiding the output toward more coherent or context-rich responses.

2. Implementing Reasoning in the Markov Chain Model

We’ll extend the original MarkovChain class by adding a new method, generate_with_reasoning. The overall strategy is as follows:

  • Phase 1: Reasoning Generation
    We initiate a “reasoning” phase by choosing a seed (or a random key from the chain if none is provided). We then loop for a configurable number of reasoning steps, sampling one next token per step from the chain. This token sequence forms our chain-of-thought.

  • Phase 2: Final Output Generation
    Once the chain-of-thought is built, we extract its last key (or a suitably derived seed) and pass it to our original generate method.
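If you are starting from this part without the code from the earlier installments, here is a minimal, reconstructed sketch of the base class that the extension below assumes. This is not the author's exact V1/V2 code, but it is consistent with how generate_with_reasoning uses it: @order, a @chain whose keys are order-word strings mapping to arrays of possible next words, and a generate(max_words, seed) method.

```ruby
# Minimal sketch of the MarkovChain base class (reconstructed, not the
# exact code from V1/V2). Keys in @chain are strings of @order words;
# values are arrays of the words observed to follow that context.
class MarkovChain
  def initialize(order = 2)
    @order = order
    @chain = Hash.new { |h, k| h[k] = [] }
  end

  # Record, for every order-word context, the words that follow it.
  def train(text)
    text.split.each_cons(@order + 1) do |*context, next_word|
      @chain[context.join(" ")] << next_word
    end
  end

  # Random walk over the chain, starting from the seed when it is a known key.
  def generate(max_words = 50, seed = nil)
    return "" if @chain.empty?
    current = seed && @chain.key?(seed) ? seed : @chain.keys.sample
    output = current.split
    (max_words - output.size).times do
      candidates = @chain[output.last(@order).join(" ")]
      break if candidates.empty?
      output << candidates.sample
    end
    output.join(" ")
  end
end
```

With this in place, the code below simply reopens the class to add the new method.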

# Extend the existing MarkovChain class
class MarkovChain
  # (The original initialize, train, and generate methods remain unchanged)

  # New method: generate_with_reasoning
  # max_words: maximum words for the final output
  # reasoning_steps: number of intermediate reasoning tokens to generate
  # seed: an optional seed for the reasoning phase
  def generate_with_reasoning(max_words = 50, reasoning_steps = 5, seed = nil)
    # PHASE 1: Generate chain-of-thought reasoning
    reasoning_tokens = []

    # Use the provided seed if valid, else choose a random key from the chain
    current_seed = seed && @chain.key?(seed) ? seed : @chain.keys.sample
    reasoning_tokens << current_seed

    reasoning_steps.times do
      current_context = current_seed.split.last(@order).join(" ")
      possible_next = @chain[current_context]
      break if possible_next.nil? || possible_next.empty?

      next_word = possible_next.sample
      reasoning_tokens << next_word

      current_words = (current_seed.split + [next_word]).last(@order)
      current_seed = current_words.join(" ")
    end

    chain_of_thought = "Chain-of-Thought: " + reasoning_tokens.join(" ")

    # PHASE 2: Final output generation
    # Seed the final generation with the last @order words of the reasoning
    # text, so the seed matches the shape of the chain's keys. (A single
    # last token would rarely be a valid key for order > 1.)
    final_seed = reasoning_tokens.join(" ").split.last(@order).join(" ")
    final_output = generate(max_words, final_seed)

    "#{chain_of_thought}\n\nFinal Output: #{final_output}"
  end
end
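To make the Phase 1 loop concrete, here is a stand-alone walk-through of the same sampling logic with a hand-built order-2 chain (illustrative data, plain hash instead of the class), so the sliding context window is easy to trace by hand:

```ruby
# Stand-alone walk-through of the Phase 1 reasoning loop, using a
# hand-built order-2 chain so each context lookup can be traced by hand.
order = 2
chain = {
  "it was"    => ["the"],
  "was the"   => ["best", "worst"],
  "the best"  => ["of"],
  "the worst" => ["of"]
}
tokens = ["it was"]                     # seed, as in generate_with_reasoning
3.times do
  context = tokens.join(" ").split.last(order).join(" ")
  next_word = chain[context]&.sample    # nil when the context is unseen
  break if next_word.nil?
  tokens << next_word
end
puts tokens.join(" ")                   # e.g. "it was the best of"
```

Note how the context is always rebuilt from the last two words of everything generated so far; that is the entire "memory" the model has.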

3. Testing the Enhanced Model

# Sample text data for training
sample_text = <<~TEXT
  "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness..."
  "A tale of two cities" by Charles Dickens is a classic example of contrast and deep reasoning in literature.
TEXT

# Create and train the model
model = MarkovChain.new(2)
model.train(sample_text)
puts "Training complete!"

# Generate text using the reasoning-enhanced method
puts "\nGenerating text with reasoning:\n\n"
puts model.generate_with_reasoning(50, 5, "it was")

4. Discussion and Limitations

  • Simplicity vs. Complexity:
    Our method is an illustrative example. Unlike neural LLMs that are trained to perform multi-step reasoning, our Markov Chain-based approach simply samples from a fixed probability distribution.

  • Enhancement Opportunities:
    You might consider:

    • Training a separate chain on a corpus of "reasoning" texts.
    • Incorporating feedback loops where the final output is evaluated and used to refine the reasoning phase.
    • Experimenting with different numbers of reasoning steps and orders to observe how the final output changes.

  • Educational Value:
    Despite its limitations, this approach helps illustrate how additional processing layers (like reasoning or chain-of-thought) can be conceptually integrated into language models.
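The first enhancement above can be sketched in a few lines. This is a hypothetical illustration using plain hashes and made-up helper names (build_chain, walk) rather than the MarkovChain class, with tiny illustrative corpora: one chain is trained on "reasoning-style" text and drives the chain-of-thought, while the main chain produces the final answer.

```ruby
# Hypothetical sketch: a separate order-1 chain trained on "reasoning" text
# drives the chain-of-thought; the main chain generates the final answer.
# build_chain/walk are illustrative helpers, and both corpora are made up.
def build_chain(text, order = 1)
  chain = Hash.new { |h, k| h[k] = [] }
  text.split.each_cons(order + 1) { |*ctx, nxt| chain[ctx.join(" ")] << nxt }
  chain
end

def walk(chain, seed, steps)
  tokens = seed.split
  steps.times do
    nxt = chain[tokens.last]&.sample   # order-1: the context is the last word
    break if nxt.nil?
    tokens << nxt
  end
  tokens.join(" ")
end

reasoning_chain = build_chain("consider the question then check the facts then answer")
thought = walk(reasoning_chain, "consider", 6)

answer_chain = build_chain("it was the best of times it was the worst of times")
# Seed the answer with the thought's last word only if the main chain knows it.
seed = answer_chain.key?(thought.split.last) ? thought.split.last : answer_chain.keys.sample
answer = walk(answer_chain, seed, 10)
puts "Thought: #{thought}\nAnswer:  #{answer}"
```

Because the two corpora rarely share vocabulary, the hand-off between chains usually falls back to a random seed; making that hand-off meaningful is exactly the interesting open problem.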

Conclusion

In this third part of our tiny LLM guide, we expanded our Ruby-based Markov Chain model by simulating a reasoning phase. This additional “chain-of-thought” generation provides a glimpse into how modern LLMs incorporate intermediate reasoning to enhance coherence and context in generated text.
