PostgreSQL with Fake Data (:

Abdulmohsen Alenazi
4 min readOct 8, 2021

All developers at this time are looking for an easy way to launch their application. However, creating a large group manually could be tedious and time wasting. With PostgreSQL it’s easy to create a test dataset that links randomly generated values using a few simple commands.

I’ll explain more about how to generate PostgreSQL test data using the psql command line interface:

First you need setup PostgreSQL database cluster that is running on your machine.

You can use the command on a Linux machine to check if it is installed or not.

systemctl status postgresql

You also need a docker container (we’ll use a container to hold our PostgreSQL test data.

Postgres Docker Container:

A Docker container is used for the development and deployment of applications. With a container, applications can run faster and become portable and scalable.

First run this command line for Docker container

docker run — name objectrocket -e POSTGRES_PASSWORD=1234 -p 5432:5432 -d postgres

The Docker container will automatically pull the image from PostgreSQL , and will be used as our container.

Now to see a list of available running container

docker ps

The output will look something like this:

Now that we’ve reviewed some Docker fundamentals, let’s create a table in the psql command-line console for PostgreSQL

sudo -u postgres psql -h localhost -p 5432

The table we’re going to create will hold our generated PostgreSQL test data.

To create a table in PostgreSQL, The basic syntax is shown below:

create table test(complete BOOLEAN NOT NULL, "end" BOOLEAN NOT NULL);

Here is the statement we’ll use to create the table for our test data

CREATE table test01(
id BIGINT GENERATED ALWAYS AS IDENTITY,
PRIMARY KEY(id),
hash_firstname TEXT NOT NULL,
hash_lastname TEXT NOT NULL,
gender VARCHAR(6) NOT NULL CHECK (gender IN (‘male’, ‘female’))
);

PostgreSQL Test Data

Here, we’ll INSERT values into the table using a random function to generate the data:

INSERT INTO test01(hash_firstname, hash_lastname, gender)
SELECT md5(RANDOM()::TEXT), md5(RANDOM()::TEXT), CASE WHEN RANDOM() < 0.5 THEN 'male' ELSE 'female' END FROM generate.series(1, 10000);

PostgreSQL generate_series()

generate_series() is a built-in PostgreSQL function that makes it easy to create ordered tables of numbers or dates.

The function is simple and easy to get started with, taking at least two required arguments to specify the start and stop parameters for the generated data.

SELECT * FROM generate_series(1,5);

To increasing dataset scale

generate_series() works and how you can use functions to add additional dynamic content to the output. But how do we quickly get to the scale of tens of millions of rows (or more)

The database outputs every row from the first table with the value of the first row from the second table. It does this over and over until all rows in both tables have been iterated. (and yes, if you join more than two tables this way, the process just keeps multiplying, tables processed left to right)

This a small example using two generate_series()in the same select statement.

SELECT * from generate_series(1,10) a, generate_series(1,2) b;

Conclusion

In this article, I walked you through the process of creating a test dataset in PostgreSQL. With the examples I provide to guide you, you will be able to create your own dataset for use in testing and development.

--

--