
selvakumar palanisamy

How to run a Llama model locally on a MacBook Pro, and function calling in LLMs: Llama web search agent breakdown

Easily install open-source Large Language Models (LLMs) locally on your Mac with Ollama. On a basic M1 Pro MacBook with 16GB of memory, the setup takes approximately 10 to 15 minutes. Once the setup is finished, the model itself starts up in less than ten seconds.

  1. Go to https://ollama.com/download/mac and download the installer for macOS.


  2. Open the Ollama application and move it to Applications.


  3. Open a terminal and run the following command, which downloads the model on first use and then starts an interactive session:

ollama run llama3

Depending on your network speed, the download and build take ten to fifteen minutes to finish. Ollama is working properly if the browser displays "Ollama is running" when you open http://localhost:11434.
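The same check can be scripted instead of done in the browser. This is a minimal sketch using only the Python standard library; the function name `is_ollama_running` is made up for illustration, while the URL and the "Ollama is running" banner are the ones quoted above.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if the server at base_url replies with the Ollama banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, and so on
        return False
    return "Ollama is running" in body
```

Calling `is_ollama_running()` with no arguments checks the default local address and returns False rather than raising when nothing is listening.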


(venv) admin@admins-MacBook-Pro selvapal % ollama run llama3
>>> list the climate chnage reasons 
Here are some of the main reasons contributing to climate change:

1. **Greenhouse gases**: The burning of fossil fuels such as coal, oil, and gas releases carbon dioxide (CO2), methane (CH4), and other greenhouse gases into the atmosphere, trapping heat 
and leading to global warming.
2. **Deforestation**: The clearance of forests for agriculture, urbanization, and logging reduces the ability of trees to absorb CO2, a key greenhouse gas, contributing to climate change.
3. **Land use changes**: Changes in land use, such as the conversion of natural habitats into agricultural lands or cities, can lead to increased emissions of greenhouse gases and loss of 
carbon sequestration capabilities.
4. **Agriculture**: The production of meat, especially beef, and other animal products leads to increased methane emissions and deforestation for livestock grazing and feed crop 
production.
5. **Industrial processes**: Industrial activities such as cement production, steel manufacturing, and chemical processing release large amounts of greenhouse gases into the atmosphere.
6. **Population growth**: As the global population grows, so does the demand for energy, food, and resources, leading to increased emissions and consumption.
7. **Consumption patterns**: The way we live our daily lives, including transportation, diet, and consumer choices, contributes to climate change through energy use, waste generation, and 
resource depletion.
8. **Waste management**: Inadequate waste management practices, such as the disposal of plastic waste in landfills or oceans, can lead to methane emissions and contribute to climate 
change.
9. **Fugitive emissions**: The extraction, processing, and transportation of fossil fuels can release large amounts of greenhouse gases into the atmosphere, often unnoticed or unreported.
10. **Aerosol emissions**: The burning of biomass, such as wood or agricultural waste, releases aerosols that can trap heat and contribute to climate change.
11. **Nitrous oxide (N2O) emissions**: Agricultural practices, such as fertilizer use and manure management, release N2O, a potent greenhouse gas.
12. **Chlorofluorocarbons (CFCs)**: The use of CFC-containing products, such as refrigerants and propellants, can lead to the destruction of stratospheric ozone and contribute to climate 
change.
13. **Methane emissions from natural sources**: Natural sources, such as wetlands, rice paddies, and termites, release methane into the atmosphere, which contributes to climate change.
14. **Ocean fertilization**: The addition of nutrients to the ocean to stimulate phytoplankton growth can lead to increased CO2 absorption but also releases other greenhouse gases and 
alters marine ecosystems.
15. **Geoengineering**: Large-scale engineering projects aimed at mitigating climate change, such as solar radiation management or carbon capture and storage, can have unintended 
consequences and potentially exacerbate climate change.

These are some of the main reasons contributing to climate change. It's essential to understand these factors to develop effective strategies for reducing greenhouse gas emissions and 
mitigating the impacts of climate change.

>>> Send a message (/? for help)

Benchmark Llama3 performance

admin@admins-MacBook-Pro ~ % git clone https://github.com/selvakumarsai/llm-benchmark.git
Cloning into 'llm-benchmark'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 28 (delta 12), reused 6 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 16.26 KiB | 555.00 KiB/s, done.
Resolving deltas: 100% (12/12), done.
admin@admins-MacBook-Pro ~ % cd llm-benchmark
admin@admins-MacBook-Pro llm-benchmark % pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting ollama (from -r requirements.txt (line 1))
  Downloading ollama-0.2.1-py3-none-any.whl.metadata (4.2 kB)
Requirement already satisfied: pydantic in /Users/admin/Library/Python/3.9/lib/python/site-packages (from -r requirements.txt (line 2)) (2.8.0)
Requirement already satisfied: httpx<0.28.0,>=0.27.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from ollama->-r requirements.txt (line 1)) (0.27.0)
Requirement already satisfied: annotated-types>=0.4.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic->-r requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: pydantic-core==2.20.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic->-r requirements.txt (line 2)) (2.20.0)
Requirement already satisfied: typing-extensions>=4.6.1 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic->-r requirements.txt (line 2)) (4.6.2)
Requirement already satisfied: anyio in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (4.4.0)
Requirement already satisfied: certifi in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (2023.5.7)
Requirement already satisfied: httpcore==1.* in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (1.0.5)
Requirement already satisfied: idna in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (3.4)
Requirement already satisfied: sniffio in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (1.3.1)
Requirement already satisfied: h11<0.15,>=0.13 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpcore==1.*->httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (0.14.0)
Requirement already satisfied: exceptiongroup>=1.0.2 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from anyio->httpx<0.28.0,>=0.27.0->ollama->-r requirements.txt (line 1)) (1.2.1)
Downloading ollama-0.2.1-py3-none-any.whl (9.7 kB)
Installing collected packages: ollama
Successfully installed ollama-0.2.1

admin@admins-MacBook-Pro llm-benchmark % python3.10 -m pip install ollama                                        
Collecting ollama
  Using cached ollama-0.2.1-py3-none-any.whl.metadata (4.2 kB)
Requirement already satisfied: httpx<0.28.0,>=0.27.0 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from ollama) (0.27.0)
Requirement already satisfied: anyio in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx<0.28.0,>=0.27.0->ollama) (4.4.0)
Requirement already satisfied: certifi in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx<0.28.0,>=0.27.0->ollama) (2024.6.2)
Requirement already satisfied: httpcore==1.* in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx<0.28.0,>=0.27.0->ollama) (1.0.5)
Requirement already satisfied: idna in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx<0.28.0,>=0.27.0->ollama) (3.7)
Requirement already satisfied: sniffio in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx<0.28.0,>=0.27.0->ollama) (1.3.1)
Requirement already satisfied: h11<0.15,>=0.13 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpcore==1.*->httpx<0.28.0,>=0.27.0->ollama) (0.14.0)
Requirement already satisfied: exceptiongroup>=1.0.2 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from anyio->httpx<0.28.0,>=0.27.0->ollama) (1.2.1)
Requirement already satisfied: typing-extensions>=4.1 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from anyio->httpx<0.28.0,>=0.27.0->ollama) (4.12.2)
Using cached ollama-0.2.1-py3-none-any.whl (9.7 kB)
Installing collected packages: ollama
Successfully installed ollama-0.2.1

admin@admins-MacBook-Pro llm-benchmark % python3.10 benchmark.py --verbose --prompts "how to do organic farming"

Verbose: True
Skip models: []
Prompts: ['how to do organic farming']
Evaluating models: ['llama3:latest', 'gemma:latest']



Benchmarking: llama3:latest
Prompt: how to do organic farming
Organic farming is a type of agriculture that avoids the use of synthetic fertilizers, pesticides, and genetically modified organisms (GMOs). Instead, organic farmers rely on natural methods to control pests and diseases, and use compost and other natural amendments to improve soil fertility. Here are some key principles and practices of organic farming:

1. **Soil conservation**: Organic farmers focus on building healthy soil through the use of crop rotations, cover crops, and organic amendments like compost.
2. **Crop rotation**: Rotating crops helps to break disease cycles, improves soil structure, and increases biodiversity.
3. **Composting**: Composting is a key process in organic farming that turns waste into nutrient-rich fertilizer.
4. **Manure management**: Organic farmers use animal manure as a natural fertilizer and to improve soil health.
5. **Integrated pest management (IPM)**: Instead of relying on pesticides, organic farmers use IPM techniques like crop rotation, biological control, and cultural controls to manage pests.
6. **Cover cropping**: Planting cover crops in the off-season helps to prevent erosion, adds nutrients to the soil, and provides habitat for beneficial insects.
7. **Biodiversity conservation**: Organic farming aims to conserve biodiversity by promoting ecosystem services and supporting beneficial insects and microorganisms.
8. **Minimum tillage or no-till**: Reducing soil disturbance helps to preserve soil structure, reduce erosion, and promote soil biota.
9. **Organic amendments**: Using natural amendments like compost, manure, or green manure instead of synthetic fertilizers.
10. **Record keeping**: Organic farmers keep detailed records of their farming practices, including crop rotations, pest management, and soil health.

Some specific techniques used in organic farming include:

1. **Biodynamic agriculture**: A holistic approach that considers the moon's cycles and uses natural preparations to promote soil fertility and plant growth.
2. **Permaculture**: A design system that aims to create self-sustaining ecosystems by mimicking nature's patterns and relationships.
3. **Agroforestry**: Integrating trees into agricultural landscapes to improve soil health, reduce erosion, and increase biodiversity.
4. **Green manure**: Planting legumes or other cover crops to fix nitrogen in the soil and add organic matter.
5. **Crop rotation with legumes**: Incorporating legume crops like beans or peas into crop rotations to improve soil fertility.

To get started with organic farming, you can:

1. **Research local regulations**: Familiarize yourself with national and regional laws regarding organic farming practices and certifications.
2. **Start small**: Begin by converting a small portion of your land to organic methods and gradually scale up as you gain experience.
3. **Join an organic farm network**: Connect with other organic farmers, attend workshops, and share knowledge to learn from their experiences.
4. **Develop a business plan**: Create a plan for marketing and selling your organic products, including pricing and target markets.
5. **Monitor and adjust**: Continuously monitor your soil health, crop yields, and pest management strategies, and make adjustments as needed.

Remember that transitioning to organic farming takes time, patience, and dedication. Start by making small changes and gradually build up your knowledge and skills.Response: 

----------------------------------------------------
        llama3:latest
            Prompt eval: 13.14 t/s
            Response: 6.23 t/s
            Total: 6.31 t/s

        Stats:
            Prompt tokens: 16
            Response tokens: 654
            Model load time: 2.06s
            Prompt eval time: 1.22s
            Response time: 105.04s
            Total time: 108.32s
----------------------------------------------------



Benchmarking: gemma:latest
Prompt: how to do organic farming
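The throughput figures in the llama3 report can be roughly reproduced from the Stats block. This sketch assumes the benchmark script simply divides token counts by the matching durations, which is a guess from the numbers rather than something read from its source. The printed times are rounded, so the recomputed prompt rate differs slightly from the reported 13.14 t/s.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput in tokens per second; 0.0 if no time was recorded."""
    return tokens / seconds if seconds > 0 else 0.0


# Values copied from the llama3:latest Stats block above
prompt_tokens, response_tokens = 16, 654
prompt_eval_s, response_s = 1.22, 105.04

prompt_rate = tokens_per_second(prompt_tokens, prompt_eval_s)
response_rate = tokens_per_second(response_tokens, response_s)
# Assumption: "Total" covers prompt evaluation plus response generation
total_rate = tokens_per_second(prompt_tokens + response_tokens,
                               prompt_eval_s + response_s)

print(f"Prompt eval: {prompt_rate:.2f} t/s")
print(f"Response:    {response_rate:.2f} t/s")
print(f"Total:       {total_rate:.2f} t/s")
```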

Function Calling - Llama web search agent breakdown

Larger models such as GPT-4 and Anthropic's Claude are fine-tuned for function calling: they can be given tools such as online search, decide whether a tool is needed, and then execute it.

Smaller models, such as Llama 3 running on Ollama, need precise definitions and explicit routing.

LangChain, together with LangGraph, lets you define conditional routing to create a directed graph flow in the form of a state machine.


  1. The web research agent first goes through a routing step, where it looks at the user's query and determines whether or not it needs context. This is the first LLM call.

  2. If it determines that context is not needed, it goes to the generation step, where it generates its final output.

  3. If it determines that context is needed, it goes to a transform query state, where it takes the user's initial question and optimises it for a web search.

  4. The optimised search query then goes into the web search step, and all of the context from the web search step is used to generate the final report.
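The four steps above can be sketched in plain Python before bringing in any framework. Everything here is a hypothetical stand-in: `run_flow` and the `fake_*` functions replace the LLM calls and the web search so the control flow can be run without a model or network access.

```python
from typing import Callable, Dict

State = Dict[str, str]


def run_flow(
    question: str,
    route: Callable[[str], str],
    transform: Callable[[str], str],
    search: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> State:
    """Route, optionally transform and search, then generate."""
    state: State = {"question": question, "context": ""}
    # Step 1: the router decides whether extra context is needed
    if route(question) == "web_search":
        # Step 3: rewrite the question as a search query
        state["search_query"] = transform(question)
        # Step 4: collect web context for the report
        state["context"] = search(state["search_query"])
    # Step 2 (or the end of step 4): produce the final output
    state["generation"] = generate(question, state["context"])
    return state


# Hypothetical stand-ins for the model and the search tool
def fake_route(q: str) -> str:
    return "web_search" if "recent" in q.lower() else "generate"


def fake_transform(q: str) -> str:
    return q.rstrip("?").lower()


def fake_search(query: str) -> str:
    return f"[results for: {query}]"


def fake_generate(q: str, context: str) -> str:
    return f"Answer to {q!r} using context {context!r}"


result = run_flow("Any recent Tesla news?", fake_route, fake_transform,
                  fake_search, fake_generate)
print(result["generation"])
```

The rest of the article implements this same flow as a LangGraph state machine, with each stand-in replaced by an LLM chain or the search tool.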

First, load the LangChain dependencies

# Displaying final output format
from IPython.display import display, Markdown, Latex
# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph 
from typing_extensions import TypedDict
import os

Setup the environment variables

# Environment Variables
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxxx" 
os.environ["LANGCHAIN_PROJECT"] = "L3 Research Agent"

Next define the LLM

# Defining LLM
local_llm = 'llama3'
llama3 = ChatOllama(model=local_llm, temperature=0)
llama3_json = ChatOllama(model=local_llm, format='json', temperature=0)

Install the web search API

pip install -U duckduckgo-search

Then create the web search tool

# Web Search Tool

wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)

Next, create the prompts used for function-call routing and for the graph nodes

First, the report generation prompt

# Generation Prompt

generate_prompt = PromptTemplate(
    template="""

    <|begin_of_text|>

    <|start_header_id|>system<|end_header_id|> 

    You are an AI assistant for Research Question Tasks, that synthesizes web search results. 
    Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know. 
    keep the answer concise, but provide all of the details you can in the form of a research report. 
    Only make direct references to material if provided in the context.

    <|eot_id|>

    <|start_header_id|>user<|end_header_id|>

    Question: {question} 
    Web Search Context: {context} 
    Answer: 

    <|eot_id|>

    <|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()
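The template above follows the Llama 3 chat layout: a system block, a user block, then an open assistant header for the model to complete. As a rough illustration of that layout, here is a small helper; `build_llama3_prompt` is a hypothetical name, not part of LangChain or Ollama.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt using Llama 3's special tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Left open so the model writes the assistant turn
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(build_llama3_prompt("You are a research assistant.",
                          "Question: What is Ollama?"))
```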

Second, define the router prompt

# Router

router_prompt = PromptTemplate(
    template="""

    <|begin_of_text|>

    <|start_header_id|>system<|end_header_id|>

    You are an expert at routing a user question to either the generation stage or web search. 
    Use the web search for questions that require more context for a better answer, or recent events.
    Otherwise, you can skip and go straight to the generation phase to respond.
    You do not need to be stringent with the keywords in the question related to these topics.
    Give a binary choice 'web_search' or 'generate' based on the question. 
    Return the JSON with a single key 'choice' with no preamble or explanation. 

    Question to route: {question} 

    <|eot_id|>

    <|start_header_id|>assistant<|end_header_id|>

    """,
    input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's up?"
print(question_router.invoke({"question": question}))

{'choice': 'generate'}
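A small model will not always return exactly this shape, so it can help to validate the router's JSON before acting on it. This is a sketch with the standard `json` module only; `parse_choice` and its default of 'generate' are assumptions for illustration, not part of the original agent.

```python
import json

VALID_CHOICES = {"web_search", "generate"}


def parse_choice(raw: str, default: str = "generate") -> str:
    """Extract the 'choice' key from router output, falling back to default."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return default
    choice = data.get("choice") if isinstance(data, dict) else None
    if isinstance(choice, str) and choice in VALID_CHOICES:
        return choice
    return default


print(parse_choice('{"choice": "web_search"}'))
print(parse_choice("not json at all"))
```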

Third, define the query transformation prompt, which rewrites the user's question into an optimised query for the web search state

# Query Transformation

query_prompt = PromptTemplate(
    template="""

    <|begin_of_text|>

    <|start_header_id|>system<|end_header_id|> 

    You are an expert at crafting web search queries for research questions.
    More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format. 
    Reword their query to be the most effective web search string possible.
    Return the JSON with a single key 'query' with no preamble or explanation. 

    Question to transform: {question} 

    <|eot_id|>

    <|start_header_id|>assistant<|end_header_id|>

    """,
    input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's happened recently with Tesla?"
print(query_chain.invoke({"question": question}))

{'query': 'Tesla recent news'}

Define the graph state and nodes for the conditional routing

# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question : str
    generation : str
    search_query : str
    context : str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state.get("context") or ""  # empty when the router skips web search

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    choice = output.get('choice')
    if choice == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    # 'generate', or any unexpected router output, falls through to generation
    # so the function never returns None
    print("Step: Routing Query to Generation")
    return "generate"
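Each node above returns only the keys it changed, and the graph merges that partial update into the shared state. The plain-Python sketch below imitates that merge, assuming the default behaviour where a returned key overwrites the previous value (no custom reducers). `apply_nodes` and the two demo nodes are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Node = Callable[[State], State]


def apply_nodes(state: State, nodes: List[Node]) -> State:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        update = node(state)
        # Keys in the update overwrite existing ones; other keys are kept
        state = {**state, **update}
    return state


def demo_transform(state: State) -> State:
    return {"search_query": state["question"].lower()}


def demo_search(state: State) -> State:
    return {"context": f"results for {state['search_query']}"}


final = apply_nodes({"question": "Tesla News"}, [demo_transform, demo_search])
print(final)
```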

Define and compile the workflow

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()
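Note that the route key "websearch" returned by `route_question` maps to the `transform_query` node, not directly to the `websearch` node. To make the wiring easier to follow, the edges above can be written as plain dictionaries and walked by hand. This is only an illustration of the topology; `node_path` and the `END` placeholder string are assumptions, not LangGraph API.

```python
from typing import List

END = "__end__"  # placeholder standing in for langgraph.graph.END

# Conditional entry point: route key -> first node
ENTRY_ROUTES = {"websearch": "transform_query", "generate": "generate"}
# Fixed edges: node -> next node
EDGES = {"transform_query": "websearch", "websearch": "generate", "generate": END}


def node_path(route_key: str) -> List[str]:
    """List the nodes visited for a given routing decision."""
    node = ENTRY_ROUTES[route_key]
    path = []
    while node != END:
        path.append(node)
        node = EDGES[node]
    return path


print(node_path("websearch"))  # ['transform_query', 'websearch', 'generate']
print(node_path("generate"))   # ['generate']
```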

Finally, define the function that runs a query through the agent

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    display(Markdown(output["generation"]))
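`run_agent` relies on IPython's `display`, so it is meant for a notebook. For a plain script, a variant that returns the text may be more convenient. The sketch below is an assumption about how that could look: `answer` is a hypothetical helper that accepts any object with an `invoke` method, and `EchoAgent` is a stub used only to show the call.

```python
from typing import Any, Dict, Protocol


class Invokable(Protocol):
    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...


def answer(agent: Invokable, query: str) -> str:
    """Run the agent on a query and return the generated text."""
    output = agent.invoke({"question": query})
    return output.get("generation", "")


class EchoAgent:
    """Stub agent for demonstration; a compiled graph would be used instead."""

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"generation": f"You asked: {inputs['question']}"}


print(answer(EchoAgent(), "How are you doing today?"))
```

With the compiled graph from earlier, the call would be `print(answer(local_agent, "..."))`.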

Test the different flows

# Test it out!
run_agent("What's been up with Tesla recently?")


Step: Routing Query
Step: Routing Query to Web Search
Step: Optimizing Query for Web Search
Step: Searching the Web for: "Tesla recent news"
Step: Generating Final Response

Based on the provided web search context, here's what's been up with Tesla recently:

Tesla is reportedly preparing to build a $25,000 electric car built on its next-generation engineering platform.
Elon Musk has announced that Tesla will launch new EVs in 2025, including affordable ones, which will blend current and next-generation platforms.
The company has set a goal to start production of a new mass-market electric vehicle codenamed "Redwood" in mid-2025.
Tesla has produced its 20 millionth 4680 cell at Gigafactory Texas, a key step for its new vehicle programs. The 4680 cell is designed to reduce battery cost by over 50% and has a capacity of about 100 Wh.
Tesla reported first-quarter adjusted earnings per share of $0.45, below the estimated $0.52, on revenue of $21.30 billion, which missed forecasts for $22.3 billion.
The company set a new delivery record for the fourth quarter and met its 2023 delivery target, shaking off a third-quarter miss and assuaging investors concerned with any hiccups as it prepares to launch new products.
Tesla has issued two recalls on the Cybertruck, its third and fourth since the model was introduced late last year. The latest recall affects almost all of the nearly 12,000 trucks on the road.
The company is currently testing approximately 35 trucks for its long-range Semi, which will have a range of up to 500 miles.
Tesla's stock rose as much as 4.5% on Tuesday after the company delivered a record number of vehicles in the three months to the end of June.
Overall, it seems that Tesla is preparing to launch new products and expand its offerings, including affordable electric cars and long-range Semi trucks. The company has also faced some challenges, including recalls and missed earnings estimates, but has managed to set new delivery records and meet its 2023 delivery target.


# Test it out!
run_agent("How are you doing today?")


Step: Routing Query
Step: Routing Query to Generation
Step: Generating Final Response

I'm just an AI, I don't have feelings or emotions like humans do. I am functioning properly and ready to assist with any research questions you may have. I don't have personal experiences or opinions, so I won't be able to provide a subjective assessment of my "mood" or "well-being."

Ref: https://www.youtube.com/watch?v=-lnR9oU0Jl8&t=0s
