DEV Community

dev_neil_a

Python How-To: Create Sample Data Using Faker

YouTube Video

If you would prefer to watch rather than read, a video version of this article is available on YouTube.

Introduction

Faker is a third-party Python library that generates random but realistic-looking data, such as names and addresses, for any purpose that needs it.

A common use case is creating 'fake' user data for tests. For example, a test suite written with pytest might need sample records to insert into, and read back from, a SQL database.

Creating A List Of Randomised Addresses

Faker is not part of the Python standard library, so it needs to be installed before it can be used. To do this, run the following pip command in the terminal:

pip install faker

The following example generates three names and addresses. Each one is stored as a dictionary and added to a list.

# --- 1. Import the required libraries:
from faker import Faker
from random import randint


# --- 2. Instantiate an instance of faker:
fake = Faker(locale="en_GB")


# --- 3. Create an empty list to hold the addresses:
address_book = []


# --- 4. Create three addresses and add each one to the address_book list:
for _ in range(3):
    # --- 5. Create a dictionary with the required key-value pairs:
    address_entry = {
        "name": fake.name(),
        "house_number": randint(a=1, b=250),
        "street_name": fake.street_name(),
        "county": fake.county(),
        "city": fake.city(),
        "post_code": fake.postcode(),
        "country": "United Kingdom"
    }

    # --- 6. Add the dictionary to the address_book list:
    address_book.append(address_entry)


# --- 7. Display the address_book list
print(address_book)

Output:

[
  {
    "name": "Elliott Robertson",
    "house_number": 77,
    "street_name": "Williams stream",
    "county": "Dumfries and Galloway",
    "city": "Davismouth",
    "post_code": "SP2R 4UQ",
    "country": "United Kingdom"
  },
  {
    "name": "Rosie Blake",
    "house_number": 102,
    "street_name": "Lucas manors",
    "county": "Wiltshire",
    "city": "East Marilyn",
    "post_code": "S22 7FG",
    "country": "United Kingdom"
  },
  {
    "name": "Dr Grace Greenwood",
    "house_number": 176,
    "street_name": "Rees forest",
    "county": "West Berkshire",
    "city": "West Jeremy",
    "post_code": "RH4R 5NQ",
    "country": "United Kingdom"
  }
]

Note: print() displays the list on a single line, so the actual output will look different. The output above has been formatted to make it clearer for this article.
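If similarly readable output is wanted directly from Python, one option is the standard library's json module. The sketch below is only one way to do it and is not necessarily how the output above was produced. It uses a hard-coded entry copied from that output so that it runs without faker installed.

```python
import json

# One entry copied from the sample output above (not freshly generated).
address_book = [
    {
        "name": "Elliott Robertson",
        "house_number": 77,
        "street_name": "Williams stream",
        "county": "Dumfries and Galloway",
        "city": "Davismouth",
        "post_code": "SP2R 4UQ",
        "country": "United Kingdom",
    }
]

# indent=2 puts each key on its own line, indented by two spaces per level.
formatted = json.dumps(address_book, indent=2)
print(formatted)
```

This works here because every value in the dictionary is a string or an integer, both of which json can serialise.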

Now, let's walk through the code to go over what it does:

  1. Import Faker from the faker library and randint from the random module in the standard library. These are the only two imports that the example needs.
  2. Create an instance of Faker and assign it to the variable fake. The locale argument limits the data that faker provides to a specific language / region. In this case, it is set to use data for en_GB (English (en), Great Britain (GB)) only.
    1. A list of Localisation ID (LCID) codes is linked in the References section. Note: The - between the language code and the country code will need to be replaced with a _. Faker only ships data for some of these locales.
  3. Create an empty list called address_book. Each address that is created will be added to this list.
  4. Using a for loop, create three addresses and add each one to the address_book list.
  5. Inside the for loop, a dictionary with the required key-value pairs is created. The only values that don't come from faker are house_number (a random integer between 1 and 250) and country, which is set manually to United Kingdom. The reason is that fake.country() returns a random country rather than the one that matches the locale.
  6. The last part of the for loop appends (adds) the address_entry dictionary to the address_book list. Once the loop finishes, the address_book list will contain three dictionaries.
  7. Finally, display the contents of the address_book list.

Conclusion

Faker is a very useful library when randomly generated data is needed quickly, and it offers many more options than the ones used here. Running the following in Python lists everything that is available on the fake instance:

print(dir(fake))

I hope that this article was useful. Have a nice day!

References

Documentation for Faker:
https://faker.readthedocs.io/en/master/

LCID (Localisation ID) Codes list:
https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/6c085406-a698-4e12-9d4d-c3b0ee3dbc4a (Remember, change the - to an _)
