DEV Community

huizhou92
huizhou92

Posted on

Go Program pattern 04:Map-Reduce

Map-Reduce is a programming paradigm used for processing large-scale datasets. It helps simplify the process of parallel computation and improves computational efficiency.

This article is first published in the medium MPP plan. If you are a medium user, please follow me in medium. Thank you very much.

First, let's understand the concepts of Map and Reduce.

  • Map: In the Map phase, the input dataset is divided into a series of key-value pairs, and the same operation is applied to each key-value pair. This operation can be a function or a code block used to process each key-value pair and generate intermediate results.
  • Reduce: In the Reduce phase, the intermediate results generated in the Map phase are combined and processed to obtain the final output result. In the Reduce phase, we can aggregate, summarize, or perform other operations on intermediate results with the same key.

The core idea of the Map-Reduce programming paradigm is "divide and conquer." It allows us to break down complex computational tasks into multiple independent subtasks, process these subtasks in parallel, and then merge the results to obtain the final result.

Basic Example

Here is a simple example demonstrating the workflow of Map-Reduce:

func MapFunction(arr []string, fn func(string) string) <-chan string {
    ch := make(chan string)
    go func() {
        for _, v := range arr {
            ch <- fn(v)
        }
        close(ch)
    }()
    return ch
}

func ReduceFunction(ch <-chan string, fn func(string, string) string) string {
    var res string
    for v := range ch {
        res = fn(res, v)
    }
    return res
}

func main() {
    // generate 10 random strings
    arr := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i"}
    // map
    ch := MapFunction(arr, func(s string) string {
        return strings.ToUpper(s)
    })
    // reduce
    res := ReduceFunction(ch, func(s1, s2 string) string {
        return s1 + s2
    })
    fmt.Println(res)
}
Enter fullscreen mode Exit fullscreen mode

go.dev

In this example, we define a MapFunction that takes a string array and converts each element to uppercase using a custom function fn, returning a channel. The ReduceFunction takes a channel and a custom function fn to concatenate the results and print them out.

The following image provides a metaphor that vividly illustrates the business semantics of Map-Reduce, which is very useful in data processing.

Pasted image 20240129172925

You may understand that Map/Reduce is just a control logic, and the real business logic is defined by the data and the function passed to them. Yes, this is a classic programming pattern of separating "business logic" from "control logic." Now let's take a look at a code example with meaningful business logic to reinforce the understanding of separating "control logic" and "business logic."

Business Example

Employee Information

First, we have an employee object and some data:

type Employee struct {
    Name     string
    Age      int
    Vacation int
    Salary   int
}

var list = []Employee{
    {"Hao", 44, 0, 8000},
    {"Bob", 34, 10, 5000},
    {"Alice", 23, 5, 9000},
    {"Jack", 26, 0, 4000},
    {"Tom", 48, 9, 7500},
    {"Marry", 29, 0, 6000},
    {"Mike", 32, 8, 4000},
}
Enter fullscreen mode Exit fullscreen mode
Related Reduce/Filter Functions
func EmployeeCountIf(list []Employee, fn func(e *Employee) bool) int {
    count := 0
    for i, _ := range list {
        if fn(&list[i]) {
            count += 1
        }
    }
    return count
}

func EmployeeFilterIn(list []Employee, fn func(e *Employee) bool) []Employee {
    var newList []Employee
    for i, _ := range list {
        if fn(&list[i]) {
            newList = append(newList, list[i])
        }
    }
    return newList
}

func EmployeeSumIf(list []Employee, fn func(e *Employee) int) int {
    var sum = 0
    for i, _ := range list {
        sum += fn(&list[i])
    }
    return sum
}
Enter fullscreen mode Exit fullscreen mode

Here's a brief explanation:

  • EmployeeCountIf and EmployeeSumIf are used to count the number of employees or calculate the total based on a certain condition. They represent the semantics of Filter + Reduce.
  • EmployeeFilterIn filters the employees based on a certain condition. It represents the semantics of Filter.

Now we can have the following code:

1) Count the number of employees over 40 years old:

old := EmployeeCountIf(list, func(e *Employee) bool {
    return e.Age > 40
})
fmt.Printf("Old people: %d\n", old)
//Old people: 2
Enter fullscreen mode Exit fullscreen mode

2) Count the number of employees with a salary greater than 6000:

highPay := EmployeeCountIf(list, func(e *Employee) bool {
    return e.Salary >= 6000
})
fmt.Printf("High Salary people: %d\n", highPay)
//High Salary people: 4
Enter fullscreen mode Exit fullscreen mode

3) List employees who have not taken any vacation:

noVacation := EmployeeFilterIn(list, func(e *Employee) bool {
    return e.Vacation == 0
})
fmt.Printf("People with no vacation: %v\n", noVacation)
Enter fullscreen mode Exit fullscreen mode

The Map-Reduce programming paradigm divides the computational task into Map and Reduce phases. Although writing single-machine code may not be faster than a simple for loop and may appear complex, in the era of cloud-native computing, we can leverage parallel computation and shared data access to improve computational efficiency. It is a powerful tool suitable for handling large-scale data and parallel computing scenarios, such as the original Google PageRank algorithm. The main purpose of learning it is to understand its mindset.

Top comments (0)