PostgreSQL with Fake Data (:
All developers at this time are looking for an easy way to launch their application. However, creating a large group manually could be tedious and time wasting. With PostgreSQL it’s easy to create a test dataset that links randomly generated values using a few simple commands.
I’ll explain more about how to generate PostgreSQL test data using the psql command line interface:
First you need setup PostgreSQL database cluster that is running on your machine.
You can use the command on a Linux machine to check if it is installed or not.
systemctl status postgresql
You also need a docker container (we’ll use a container to hold our PostgreSQL test data.
Postgres Docker Container:
A Docker container is used for the development and deployment of applications. With a container, applications can run faster and become portable and scalable.
First run this command line for Docker container
docker run — name objectrocket -e POSTGRES_PASSWORD=1234 -p 5432:5432 -d postgres
The Docker container will automatically pull the image from PostgreSQL , and will be used as our container.
Now to see a list of available running container
docker ps
The output will look something like this:
Now that we’ve reviewed some Docker fundamentals, let’s create a table in the psql
command-line console for PostgreSQL
sudo -u postgres psql -h localhost -p 5432
The table we’re going to create will hold our generated PostgreSQL test data.
To create a table in PostgreSQL, The basic syntax is shown below:
create table test(complete BOOLEAN NOT NULL, "end" BOOLEAN NOT NULL);
Here is the statement we’ll use to create the table for our test data
CREATE table test01(
id BIGINT GENERATED ALWAYS AS IDENTITY,
PRIMARY KEY(id),
hash_firstname TEXT NOT NULL,
hash_lastname TEXT NOT NULL,
gender VARCHAR(6) NOT NULL CHECK (gender IN (‘male’, ‘female’))
);
PostgreSQL Test Data
Here, we’ll INSERT
values into the table using a random function to generate the data:
INSERT INTO test01(hash_firstname, hash_lastname, gender)
SELECT md5(RANDOM()::TEXT), md5(RANDOM()::TEXT), CASE WHEN RANDOM() < 0.5 THEN 'male' ELSE 'female' END FROM generate.series(1, 10000);
PostgreSQL generate_series()
generate_series()
is a built-in PostgreSQL function that makes it easy to create ordered tables of numbers or dates.
The function is simple and easy to get started with, taking at least two required arguments to specify the start and stop parameters for the generated data.
SELECT * FROM generate_series(1,5);
To increasing dataset scale
generate_series()
works and how you can use functions to add additional dynamic content to the output. But how do we quickly get to the scale of tens of millions of rows (or more)
The database outputs every row from the first table with the value of the first row from the second table. It does this over and over until all rows in both tables have been iterated. (and yes, if you join more than two tables this way, the process just keeps multiplying, tables processed left to right)
This a small example using two generate_series()
in the same select statement.
SELECT * from generate_series(1,10) a, generate_series(1,2) b;
Conclusion
In this article, I walked you through the process of creating a test dataset in PostgreSQL. With the examples I provide to guide you, you will be able to create your own dataset for use in testing and development.