Sourjaya Das

Build a CLI tool : Generating hex dumps with Golang

In this article, we will look at how to build a CLI tool using Go standard packages. We will be building our own version of xxd. This project challenge was originally posted here.

So, what exactly is xxd?
xxd is a CLI tool for Linux that creates a hex dump of a given file or of standard input. It can also convert a hex dump back to its original binary form.

To demonstrate what xxd does:

  • Create a file or take any existing file from your system, and run the following command:
# xxd <filename>
# xxd <filepath>
# To dump the hex dump to another file
# xxd <filename/filepath> > <filename/filepath> 
xxd file.txt
  • Suppose the file contents were:
Hey! How you doing?
I hope you are doing fine.
Believe in yourself.
  • When we run the command mentioned above, we get the following output:

[Image: Sample1]

To understand what was printed, let's look at the first line of the output:

00000000: 4865 7921 2048 6f77 2079 6f75 2064 6f69  Hey! How you doi
  • The first part: 00000000 is the file offset, that is, the position of the first byte printed on this line.
  • The second part: 4865 7921 2048 6f77 2079 6f75 2064 6f69 is the actual data in hexadecimal form, with two octets (bytes) grouped together (48 is the hex value of one byte, 65 is the hex value of the next byte in the group, and so on).
  • The third part: Hey! How you doi is the actual content (text) of the bytes printed on this line.

📝
Any byte with a hexadecimal value less than 20 or greater than 7e is replaced by '.' when the original text is printed in the third part.
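To make the three parts concrete, here is a minimal sketch that builds one full row in the default layout. The function name formatLine is made up for this illustration and is not part of the tool we build later; it also skips the padding that a short final row would need.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// formatLine (a hypothetical helper) renders one full row in the default
// layout: offset, two-octet hex groups, then the printable text.
func formatLine(offset int, row []byte) string {
	// Second part: hex of the bytes, two octets per group.
	var groups []string
	for i := 0; i < len(row); i += 2 {
		end := i + 2
		if end > len(row) {
			end = len(row)
		}
		groups = append(groups, hex.EncodeToString(row[i:end]))
	}
	// Third part: the text, with non-printable bytes replaced by '.'.
	text := make([]byte, len(row))
	for i, b := range row {
		if b < 0x20 || b > 0x7e {
			text[i] = '.'
		} else {
			text[i] = b
		}
	}
	// First part: the offset as 8 hex digits.
	return fmt.Sprintf("%08x: %s  %s", offset, strings.Join(groups, " "), text)
}

func main() {
	fmt.Println(formatLine(0, []byte("Hey! How you doi")))
}
```

For the sample row this reproduces the line shown above.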

Now, why do we need such a tool? A hex dump has many use cases, including binary file analysis and comparison, data transformation, digital forensics and security.

At first glance it seems easy to code such a tool: read the file or the standard input, convert each byte to its hexadecimal form, and print the list of hexadecimal numbers next to the original text in the proper arrangement. The challenge lies in the different edge cases that pop up due to the use of additional options.

xxd has its own flags that help to manipulate the output. We will look at the behaviour of these flags when we discuss the edge cases.

Now that we have a brief overview of what we need to build, let's dive deep into the intricacies of this tool.


Table of contents

1. Prerequisites

2. Understanding the behaviour of the flags

3. Writing and Understanding the code

4. What's next


1. Prerequisites

As we will be using Go for our project, we should make sure that:

1.1. Go is installed on our system.

To check whether Go is installed on your system, run the command go version in your terminal. If it is not installed, check the official installation steps.

1.2. Use a code editor of your choice.

And that's about all that we need to start coding.


2. Understanding the behaviour of the flags

Before we start writing our code, we have to make sure we have a good understanding of what each flag does and how the output changes with the values of these flags.
As mentioned in the challenge, we will be focusing on the functionality of six flags:

Flag  Description
-e    Output in little-endian format.
-g    Specify the number of bytes per group.
-l    Specify the number of octets to dump from the input.
-c    Specify the number of octets to print on each line.
-s    Specify the offset to start printing from.
-r    Revert from a hexadecimal dump to the original binary form.

2.1. -e flag

When we enter the command with the -e flag:

xxd -e file.txt

we get an output like:

[Image: Sample2]

If we look at each group, it now holds 4 octets printed in reverse order. So -e changes both the default group size (from 2 to 4 octets) and the byte order within each group.
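As a rough illustration of that reordering, the sketch below reverses the bytes inside each group before hex-encoding them. The name littleEndianHex is an assumption for this example, and a short final group is simply reversed as-is, without any alignment padding.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// littleEndianHex (hypothetical name) splits row into groups of `group`
// octets and hex-encodes each group with its bytes in reverse order.
func littleEndianHex(row []byte, group int) string {
	var groups []string
	for i := 0; i < len(row); i += group {
		end := i + group
		if end > len(row) {
			end = len(row)
		}
		// Copy the group back to front.
		rev := make([]byte, 0, group)
		for j := end - 1; j >= i; j-- {
			rev = append(rev, row[j])
		}
		groups = append(groups, hex.EncodeToString(rev))
	}
	return strings.Join(groups, " ")
}

func main() {
	// "Hey!" is 48 65 79 21, so the first group becomes 21796548.
	fmt.Println(littleEndianHex([]byte("Hey! How you doi"), 4))
}
```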

2.2. -g flag

When we use the -g flag:

xxd -g 5 file.txt

we get:

[Image: Sample3]

In the output, octets are grouped 5 at a time until the columns of each row are filled.

If we then use both -e and -g together:

xxd -e -g 5 file.txt

we get:

xxd: number of octets per group must be a power of 2 with -e.

So when -g is combined with the -e flag, the number of octets per group has to be a power of 2.
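One common way to test for a power of 2 is the bit trick n&(n-1) == 0. The sketch below uses it to reject invalid combinations; the names isPowerOfTwo and checkGroup are assumptions for this example, not functions from the final tool.

```go
package main

import (
	"errors"
	"fmt"
)

// isPowerOfTwo reports whether n is 1, 2, 4, 8, ...
// A power of two has exactly one bit set, so n&(n-1) clears it to zero.
func isPowerOfTwo(n int) bool {
	return n > 0 && n&(n-1) == 0
}

// checkGroup (hypothetical) validates the -g value when -e is set.
func checkGroup(group int, littleEndian bool) error {
	if littleEndian && !isPowerOfTwo(group) {
		return errors.New("number of octets per group must be a power of 2 with -e.")
	}
	return nil
}

func main() {
	fmt.Println(checkGroup(5, true))  // rejected
	fmt.Println(checkGroup(4, true))  // accepted
	fmt.Println(checkGroup(5, false)) // no restriction without -e
}
```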

2.3. -c flag

When we use the -c flag:

xxd -c 5 file.txt

the result is:

[Image: Sample4]

Notice that there are at most 5 octets per row (line).

2.4. -l flag

For the command:

xxd -l 20 f.txt

the output will be:

[Image: Sample5]

A total of 20 octets are displayed.

2.5. -s flag

If we write:

xxd -s 3 file.txt

we get:

[Image: Sample6]

The dump starts at offset 3, so the first three bytes of the input file are skipped. If the value of the -s flag is negative, the offset is set relative to the end of the file.
Another edge case to consider: if the value of the -s flag is negative and no file name is given, the seek cannot happen.

xxd -s -3
#Output
xxd: Sorry, cannot seek.

This is also true for inputs like xxd -s -0 and xxd -s +-5.
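The difference between a file and a stream can be sketched with Go's io.Seeker interface. The helper names skipTo and startOffset and the error value errCannotSeek are made up for this illustration; the in-memory reader stands in for a real file.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errCannotSeek is a made-up error value for this sketch.
var errCannotSeek = errors.New("cannot seek")

// skipTo positions r according to an -s style value and returns the
// absolute offset at which reading will start.
func skipTo(r io.Reader, s int64) (int64, error) {
	if seeker, ok := r.(io.Seeker); ok {
		if s < 0 {
			return seeker.Seek(s, io.SeekEnd) // relative to the end
		}
		return seeker.Seek(s, io.SeekStart)
	}
	if s < 0 {
		// The end of a stream is unknown, so a negative offset is impossible.
		return 0, errCannotSeek
	}
	// A stream can still skip forward by discarding s bytes.
	return io.CopyN(io.Discard, r, s)
}

// startOffset tries skipTo on an in-memory text, optionally hiding Seek.
func startOffset(text string, s int64, seekable bool) (int64, error) {
	var r io.Reader = strings.NewReader(text)
	if !seekable {
		r = struct{ io.Reader }{r} // only Read stays visible
	}
	return skipTo(r, s)
}

func main() {
	fmt.Println(startOffset("Hey! How you doing?", -3, true))
	fmt.Println(startOffset("Hey! How you doing?", -3, false))
	fmt.Println(startOffset("Hey! How you doing?", 3, false))
}
```

Note that os.Stdin is an *os.File and therefore satisfies io.Seeker at the type level; on a pipe or terminal the Seek call itself returns an error, which a real implementation would have to handle.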

2.6. -r flag

This flag is used to revert back from a hex dump to its original binary form. Suppose file.hex contains the hex dump of file.txt. If we want to get the text content back we do:

xxd -r file.hex
#or
xxd -r file.hex > file2.txt

The output will be:

Hey! How you doing?
I hope you are doing fine.
Believe in yourself.

📝
We can use decimal, octal or hexadecimal values for the flags.
Octal values are written with a leading 0, like 014, and hexadecimal values are written like 0x0c or 0x0C.

It is important to mention that if we pass a non-numeric value like abcd as a flag value when the file name is not provided, the default flag values will be used. Also, if a value like 5jkl is given as a flag value, it will be read as 5.
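In Go, the three notations can be handled by strconv.ParseInt with base 0, which picks the base from the prefix. The wrapper name parseValue is an assumption for this small sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseValue (hypothetical) parses a flag value written in decimal,
// octal (leading 0) or hexadecimal (leading 0x or 0X).
func parseValue(s string) (int64, error) {
	// Base 0 tells ParseInt to infer the base from the prefix.
	return strconv.ParseInt(s, 0, 64)
}

func main() {
	for _, in := range []string{"12", "014", "0x0c", "0X0C", "5jkl"} {
		v, err := parseValue(in)
		fmt.Println(in, v, err)
	}
}
```

The first four inputs all parse to 12. ParseInt rejects 5jkl outright, so reading such a value as 5 requires extracting the numeric part before converting it.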

The return values are as follows:

Value  Description
0      No errors encountered.
-1     Operation not supported.
1      Error while parsing options.
2      Problems with input file.
3      Problems with output file.
4,5    Desired seek position is unreachable.

3. Writing and Understanding the code

Before starting with the code, it's important to have an idea of how we will tackle the problem. In my first attempt at building this tool, I took a naive path: I read the file a few bytes at a time and stored them in a slice of bytes. Then I printed out each byte in its hex format, one by one. This solution worked fine when there were no flags involved and the output format did not depend on flag inputs. But when I started to build the logic for all the edge cases, the code became messy and unreadable.

That's when I had to switch the way I was processing those bytes. Instead of directly converting each individual byte to its hex representation, I converted the whole chunk of bytes to a string of hex values. This change helped in tackling most of the edge cases I talked about earlier.

3.1. Folder Structure

└── 📁sdxxd
    ├── 📁xxd
    │   └── xxd.go
    ├── main.go
    ├── go.mod
    └── go.sum

3.2. Create your Go project

Before writing your code, you need to set up the directory that will house your project. Then open a terminal in that directory and enter the following command to initialize your project.

# go mod init <your_module_path>
go mod init github.com/Sourjaya/sdxxd

The go mod init command creates a go.mod file to track your code's dependencies. Using your own GitHub repository provides a unique module path for the project.

Now, in main.go write the following code:

package main

import "github.com/Sourjaya/sdxxd/xxd"

func main() {
    xxd.Driver()
}

Here we call the Driver function from the xxd package.

3.3. Utility functions and the structs in use

In the xxd folder, create a new Go file, xxd.go:



Here we declare three structs: Flags, ParsedFlags and IsSetFlags. In the function NewFlags() we initialize the flags and check whether certain flag values have been provided or not.

📝
To parse the flags from the terminal, we are not going to use Go's standard flag package, because it does not support the input form xxd -s5 -g3, where there is no gap between a flag and its value. Instead we are using the pflag package.
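The limitation is easy to reproduce with the standard library alone. In the sketch below, parseSeek is a made-up helper that defines only an -s flag; it is there to show the behaviour, not to be part of the tool.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseSeek (hypothetical) reads an -s value from args using the
// standard flag package.
func parseSeek(args []string) (int, error) {
	fs := flag.NewFlagSet("xxd", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the usage text out of the demo output
	s := fs.Int("s", 0, "offset to start printing from")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *s, nil
}

func main() {
	fmt.Println(parseSeek([]string{"-s", "5"})) // separated form is accepted
	fmt.Println(parseSeek([]string{"-s5"}))     // attached form is treated as an unknown flag named "s5"
}
```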

Now, let's look at some of the helper functions we are going to need and why we need them.

  • numberParse()

This function is used to parse the flag values: with the help of a regular expression, it extracts the numerical value from the input string.

// Function to parse number from a string using regular expression
func numberParse(input string) (res int64, err error) {
    // regular expression
    re := regexp.MustCompile(`-?0[xX][0-9a-fA-F]+|-\b0[0-7]*\b|-\b[1-9][0-9]*\b|0[xX][0-9a-fA-F]+|\b0[0-7]*\b|\b[1-9][0-9]*\b`)
    // Find the match
    s := re.FindString(input)
    // if a certain match is found convert into decimal, octal or hexadecimal and return. else return 0.
    if s != "" {
        return strconv.ParseInt(s, 0, 64)
    }

    return 0, nil
}
  • reverseString()

This function is for reversing a hex string input. This function is exclusively used when the output should be in little-endian format.

// Function to reverse a string
// input: The input hex string to be reversed.
// Returns the reversed hex string.
func reverseString(input string) string {
    // Decode hex string to byte slice
    hexStr := strings.ReplaceAll(input, " ", "")
    bytes, _ := hex.DecodeString(hexStr)
    // Reverse the byte slice
    for i, j := 0, len(bytes)-1; i < j; i, j = i+1, j-1 {
        bytes[i], bytes[j] = bytes[j], bytes[i]
    }
    // Encode the reversed byte slice back to hex string
    reversed := hex.EncodeToString(bytes)
    whitespace := strings.Repeat(" ", len(input)-len(reversed))

    return whitespace + reversed
}
  • byteToHex()

Before printing the result we will need to convert the slice of bytes to a hex string. This function is for this purpose.

// Function to convert a byte slice to a hex string with specified grouping.
// byteBuffer: The input byte slice to be converted.
// count: The number of bytes per group.
// Returns the hex string representation of the byte slice.
func byteToHex(byteBuffer []byte, count int) string {
    // encode byte slice to string
    encodedString := hex.EncodeToString(byteBuffer)
    // add extra whitespaces
    for i := 0; i < (count-(len(byteBuffer)%count))*2; i++ {
        encodedString = fmt.Sprint(encodedString, " ")
    }

    return encodedString
}
  • bytesToString()

To display the third section of the result, we need to convert the byte slice to its text form. This function will do exactly that.

// input: The input byte slice to be converted.
// Returns the string representation of the byte slice.
func bytesToString(input []byte) string {
    output := make([]byte, len(input))
    // convert ASCII byte slice to its equivalent character string
    for i, b := range input {
        if b < 0x20 || b > 0x7e {
            output[i] = '.'
        } else {
            output[i] = b
        }
    }

    return string(output)
}
  • size()

The size of the chunk of bytes to read depends on the columns value: the function rounds SIZE up to the nearest multiple of the column count, so every chunk holds a whole number of rows. Any base value would work; I used an arbitrary value of 2048 for SIZE. It's essential to read the bytes in chunks, because a large file can then be processed piece by piece instead of being loaded into memory as a whole.

// calculate size of chunk to read for each iteration
func size(cols int) int {
    div := SIZE / cols
    if SIZE%cols != 0 {
        return (div + 1) * cols
    }

    return div * cols
}
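A few concrete values make the rounding easier to follow. The block below repeats size() so that it runs on its own, and declares SIZE as 2048 based on the value mentioned in the text; where the constant is declared in the real package is an assumption.

```go
package main

import "fmt"

// SIZE is the base chunk size; 2048 is the value mentioned in the text.
const SIZE = 2048

// size rounds SIZE up to the nearest multiple of cols.
func size(cols int) int {
	div := SIZE / cols
	if SIZE%cols != 0 {
		return (div + 1) * cols
	}

	return div * cols
}

func main() {
	for _, cols := range []int{16, 5, 3} {
		n := size(cols)
		fmt.Printf("cols=%d -> chunk of %d bytes (%d full rows)\n", cols, n, n/cols)
	}
}
```

With 16 columns the chunk stays at 2048 bytes (128 rows); with 5 columns it grows to 2050 bytes (410 rows), and with 3 columns to 2049 bytes (683 rows).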
  • trimBytes()

This function will be needed when the reverse conversion takes place, that is from a hex dump to the original content.

// Helper function to trim the spaces from a line
func trimBytes(s string) string {
    words := strings.Fields(s)

    return strings.Join(words, "")
}

3.4. Structuring the code

Now that we have written the helper functions, it's time to put them to use. We will start with the Driver() function.

// Driver function to use the functionalities of this package
func Driver() int {
    f, setFlags, args := NewFlags()
    // if no file name is provided read from standard input
    if len(args) == 0 || args[0] == "-" {
        return f.processStdIn(setFlags)
    }

    return f.processFile(args[0], setFlags)
}

Here, the flag structs are set, and the first thing that is checked is whether there is a file name in the list of arguments.

📝
args is the list of arguments that remain after all the flag inputs.

If the user has provided a file name, we call the (*Flags).processFile() method. If the file name is absent or is given as -, we call (*Flags).processStdIn().

  • (f *Flags).processFile()

In this method, we first open the file. If the -r flag is set, we call the revert() function; we will look at what revert() does shortly. If the flag is not present, we read a set number of bytes at a time from the file and pass them to InputParse().

  • (f *Flags).processStdIn()

Here, we check whether the -r flag is set and call revert() accordingly. Otherwise, we scan the standard input and print the resulting hex dump. Here we have to consider additional edge cases: only rows whose columns have been filled completely are displayed, and the prompt then waits for more input. The program keeps running until it is interrupted or, when -l is set, until the -l value is reached.
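The chunked read described for processFile() can be sketched roughly as follows. This is a simplified illustration built on io.ReadFull, not the code from the repo; readChunks and countChunks are made-up names, and an in-memory string stands in for the opened file.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readChunks reads r in blocks of chunkSize bytes and calls handle with
// each block and the offset at which that block starts.
func readChunks(r io.Reader, chunkSize int, handle func(offset int, chunk []byte)) error {
	buf := make([]byte, chunkSize)
	offset := 0
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			handle(offset, buf[:n])
			offset += n
		}
		// io.EOF: nothing left; io.ErrUnexpectedEOF: last block was short.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// countChunks runs the loop on an in-memory string and reports how many
// blocks were seen and how many bytes they held in total.
func countChunks(text string, chunkSize int) (chunks, total int) {
	_ = readChunks(strings.NewReader(text), chunkSize, func(offset int, chunk []byte) {
		chunks++
		total += len(chunk)
	})
	return chunks, total
}

func main() {
	_ = readChunks(strings.NewReader("Hey! How you doing?"), 8, func(offset int, chunk []byte) {
		fmt.Printf("offset %d: %q\n", offset, chunk)
	})
	fmt.Println(countChunks("Hey! How you doing?", 8))
}
```

Passing the running offset to the handler is what lets each chunk be printed with the correct line offsets.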

The code for these two functions is given below:

Now if you look at the code, you will see three functions:

  • revert(): This function is used to convert a hexadecimal dump back to the original binary form. It can receive two types of input: an *os.File when a file is given, and a *bufio.Scanner when reading from standard input.
  • (f *Flags).checkFlags(): This method parses each flag value (originally a string) into a numerical value, which can then be used by the InputParse() method. It is also responsible for terminating the program if there is any error while parsing the flags.
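To give a feel for the reverse conversion, here is a minimal sketch that turns one line of a default (big-endian) dump back into bytes. The name revertLine is an assumption, and the real revert() has to deal with more input variations than this.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// revertLine (hypothetical) converts one line of a default hex dump back
// to the bytes it describes.
func revertLine(line string) ([]byte, error) {
	// Drop the offset, which ends at the first ": ".
	_, rest, ok := strings.Cut(line, ": ")
	if !ok {
		return nil, fmt.Errorf("no offset separator in %q", line)
	}
	// The hex area ends where two consecutive spaces begin.
	hexPart, _, _ := strings.Cut(rest, "  ")
	// Remove the single spaces between groups, then decode.
	return hex.DecodeString(strings.Join(strings.Fields(hexPart), ""))
}

func main() {
	b, err := revertLine("00000000: 4865 7921 2048 6f77 2079 6f75 2064 6f69  Hey! How you doi")
	fmt.Printf("%q %v\n", b, err)
}
```

The strings.Fields and strings.Join pair does the same job as the trimBytes() helper shown earlier.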

The code for these two functions:

  • (flags *ParsedFlags).InputParse(): All that is left is to use the helper functions appropriately to generate the proper hex dump. To do that, we call this function.
func (flags *ParsedFlags) InputParse(s []byte, offset int, length int) string {
    // convert byte slice to hex string
    buffer := byteToHex(s, flags.C)
    // function to generate hex dump output string
    return flags.dumpHex(offset, length, buffer, s)
}

Here, we first convert the slice of bytes to a hexadecimal string and then call dumpHex(), passing in the offset (this helps in proper indexing of lines), the flag values, the original slice of bytes and the buffer (hex string).

So, finally we reach a point where only the conversion is left. To convert the original input to its hex dump, we use the dumpHex() method.

Since each octet is represented by two hexadecimal characters, we loop up to twice the length of the input bytes. First we print the offset. The next step is to print the grouped octets. The number of octets per group depends on the value of -g, and the number of octets per row on the value of -c. If little-endian mode is set, we have to reverse each group before printing it.

Once the octets are printed, the text equivalent of those octets is displayed beside them. This three-part process is repeated for each row (line) until the end of the file or input.

📝
Make sure that if the -l flag value is set, the number of octets printed does not exceed that value.
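The three-part loop can be sketched as below. This is a simplified stand-in that works on the raw bytes rather than on a pre-built hex string, and it leaves out little-endian mode; the function name dump and its parameters are assumptions, with cols, group and limit playing the roles of -c, -g and -l.

```go
package main

import (
	"fmt"
	"strings"
)

// dump renders data as rows of cols octets, grouped `group` octets at a
// time, and stops after limit octets when limit > 0.
// It assumes cols >= 1 and group >= 1.
func dump(data []byte, cols, group, limit int) string {
	if limit > 0 && limit < len(data) {
		data = data[:limit]
	}
	var sb strings.Builder
	for off := 0; off < len(data); off += cols {
		end := off + cols
		if end > len(data) {
			end = len(data)
		}
		row := data[off:end]
		// Part 1: the offset.
		fmt.Fprintf(&sb, "%08x:", off)
		// Part 2: grouped octets, padded so the text column stays aligned.
		for i := 0; i < cols; i++ {
			if i%group == 0 {
				sb.WriteByte(' ')
			}
			if i < len(row) {
				fmt.Fprintf(&sb, "%02x", row[i])
			} else {
				sb.WriteString("  ")
			}
		}
		// Part 3: the text, with non-printable bytes shown as '.'.
		sb.WriteString("  ")
		for _, b := range row {
			if b < 0x20 || b > 0x7e {
				b = '.'
			}
			sb.WriteByte(b)
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	fmt.Print(dump([]byte("Hey! How you doing?\n"), 5, 2, 12))
}
```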

The complete code can be found in this repo.

3.5. Building the Go binary and testing the tool

Once we have finished writing our code, we will run the command go mod tidy to make sure all the dependencies are in order. Now, let's build the binary executable:

go build

Once the binary has been generated successfully, we can finally test our tool. To test it, we will first create a tar file:

echo "File 1 contents" >> file1.txt
echo "File 2 contents" >> file2.txt
echo "File 3 contents" >> file3.txt
tar -cf files.tar file1.txt file2.txt file3.txt

Now, we will use files.tar to check it out:

[Image: Demo]


4. What's next

As you may have noticed, this code reads and processes the file (or the input) sequentially, with no parallel or concurrent processing involved. For the sake of simplicity, I have not used concurrency. Therefore, this tool will work but may struggle when large files are involved.

Also when it comes to the options that the original xxd tool has, we have implemented only 6 of the options. There are other options as well that we haven't looked at yet.

So there is always room to improve and optimize the code adding to its list of functionalities.

If you would like to share your feedback please feel free to drop a comment below.
