Every coder will have to generate some dummy data for their database or randomize / de-identify existing data for testing purposes at some point in their career.
But most programmers fall into the trap of reinventing randomization functions and end up with something that isn't random at all. I would suggest you stick to the built-in functions.
It's pretty amazing what you can do out of the box with SQL.
That is why I've created a list of pretty much every use case I can think of in which you would have to create Random Data, all being built on the basic RAND()
function.
Note that I tested the more complex code in MySQL and that RAND()
has to be replaced according to your SQL flavor.
See the first table below which shows how to generate a random value between 0.0 <= x < 1.0 in almost every SQL Flavor - you can use this to adapt my code for your use case.
MySql | RAND() |
SQLAnywhere (SAP SyBase) | RAND() |
PostgreSQL | RANDOM() |
Oracle (PL/SQL) | DBMS_RANDOM.VALUE() |
MonetDB | RAND() / 2147483648 |
MicrosoftSQLServer | RAND() |
SQLite | ABS(RANDOM() / 9223372036854775807) |
MariaDB | RAND() |
InformixDynamicServer | dbms_random_random() |
Firebird SQL | RAND() |
DB2 SQL | RANDOM(:HRAND) |
CUBRID | RAND() / 2147483648 |
Basic random value
SELECT RAND() as random;
random |
0.8997652594929237 |
This function returns a random value between 0.0 <= x < 1.0.
Building on this simple function, i will now create a random value in a known range.
Random value in a given range
SELECT (1 + (100 - 1) * RAND())
numeric |
29.257653945365597 |
This will create a random Number between 1.0 <= x < 100.0.
Let's convert it to a function in order to make this reusable for other scenarios that I will build on:
DELIMITER // CREATE OR REPLACE FUNCTION randomNumber(lowestValue FLOAT, highestValue FLOAT) RETURNS FLOAT DETERMINISTIC BEGIN DECLARE randomValue FLOAT; SELECT (lowestValue + (highestValue - lowestValue) * RAND()) into randomValue; RETURN (randomValue); END// DELIMITER ;
Now, let's test it out:
SELECT randomNumber(1,10), randomNumber(10,100), randomNumber(99.00,99.99);
Result:
randomNumber(1,10) | randomNumber(10,100) | randomNumber(99.00,99.99) |
9.098647117614746 | 51.53215408325195 | 99.60171508789062 |
How about getting full Integers instead of Floats?
SELECT CAST(randomNumber(1,10) AS INTEGER), CAST(randomNumber(10,100) AS INTEGER), CAST(randomNumber(99.00,99.99) AS INTEGER)
Result:
CAST (randomNumber(1,10) AS INTEGER) | CAST(randomNumber(10,100) AS INTEGER) | CAST(randomNumber(99.00,99.99) AS INTEGER) |
2 | 89 | 99 |
Now that we've got the basics out of the way, let's get into the more complex things.
Random boolean values
There is not much to do, to get a boolean instead of a normal integer, since a boolean is just a tinyint(1)
in MySQL. Note that you would have to cast your Integer as a Boolean in some other flavors, as in this example for PostgreSQL:
SELECT CAST(ROUND(randomNumber(0,1))::INTEGER AS BOOLEAN);
Result:
round |
False |
But since I will keep using MySQL for now, this is the code you would need for MySQL:
SELECT ROUND(randomNumber(0,1));
Result:
ROUND(randomNumber(0,1)) |
0 |
Let's make this into a function for later use:
DELIMITER // CREATE OR REPLACE FUNCTION randomBoolean() RETURNS TINYINT DETERMINISTIC BEGIN DECLARE bool tinyint; SELECT ROUND(randomNumber(0,1)) into bool; RETURN (bool); END// DELIMITER ;
Let’s test it:
SELECT randomBoolean(), randomBoolean(), randomBoolean();
Result:
randomBoolean() | randomBoolean() | randomBoolean() |
1 | 0 | 0 |
Weighted random boolean
Often times, you need to simulate genuine variation in your seed data, and weighting is one way to achieve this.
For example, I may want to generate data where the majority of values should be False
.
For this I am going to create the following function where I can input a weight which will then be used to generate a Boolean:
DELIMITER // CREATE OR REPLACE FUNCTION randomWeightedBoolean(trueWeight FLOAT) RETURNS TINYINT DETERMINISTIC BEGIN DECLARE bool TINYINT; SELECT ROUND(randomNumber(0,1) < trueWeight) into bool; RETURN (bool); END// DELIMITER ;
Now lets test it by simply passing in a trueWeight:
select randomWeightedBoolean(0.75);
Result:
randomWeightedBoolean(0.75) |
1 |
Now it generates a 1 (TRUE
) 75% of the time.
Random row(s) from a table
You could also use the RAND()
function in the ORDER BY
clause to order them randomly.
This is what it would look like without randomizing the order:
SELECT id FROM employees
id |
1 |
2 |
3 |
4 |
5 |
This is what it would like with randomizing the order:
SELECT id FROM employees ORDER BY RAND();
id |
5 |
1 |
3 |
4 |
2 |
Building upon that I can now get a random single row from a Table using the LIMIT
clause:
SELECT id FROM employees ORDER BY RAND() LIMIT 1
id |
2 |
Random value from a list
I need randomized data, but the values should be restricted to an enumerated list that I know in advance.
For example, say I wanted to return a random value from the list BMW, Mercedes, VW, Audi.
This can easily be done with the following code:
SELECT ELT(RAND() * 4,"BMW", "Mercedes", "VW", "Audi") AS randomValueFromList
Result:
randomValueFromList |
BMW |
Random text
This is an interesting one. I can easily generate a random string like this:
SELECT MD5(RAND()) AS randomString;
Result:
randomString |
fb763b576159b3c5a6af0b2775496091 |
I could shorten it to a desired length as well (note, obviously using MD5 will only generate a max. 32-byte string):
SELECT SUBSTR(MD5(RAND()),1,5) AS randomString;
Result:
randomString |
ae2ab |
I could also manipulate the text case:
SELECT UPPER(SUBSTR(MD5(RAND()),1,5)) AS randomUpperCaseString;
Result:
randomUpperCaseString |
9CE4C |
This will only provide limited value, especially if you need longer strings or strings with certain characters.
To limit our statement to certain characters we can use this SQL Code which will get random Letter/Number which is in the list:
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', rand()*36+1, 1) as letter
letter |
H |
But since I also want to be able to specify a length I will create the following function which takes in the possible characters and length and then returns a random string:
DELIMITER // CREATE OR REPLACE FUNCTION randomString(characters LONGTEXT, length INTEGER) RETURNS LONGTEXT DETERMINISTIC BEGIN DECLARE randomString LONGTEXT DEFAULT ""; DECLARE i INT; SET i = 0; WHILE i < length DO SET randomString = CONCAT(randomString, substring(characters, rand()*LENGTH(characters)+1, 1)); SET i = i + 1; END WHILE; RETURN (randomString); END// DELIMITER ;
Let’s test it using the following statement:
select randomString('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',5) as string1, randomString('ABCD',5) as string2, randomString('ABCDEFGHIJKLMNOPQRSTUVWXYZ',10) as string3
Result:
string1 | string2 | string3 |
NZMV7 | DABDC | LLSJNPHSSI |
As you can see, now I can easily choose if I want numbers in the string or just letters, what letters, and the length of the string.
Other
You can generate random values for a number of interesting use cases using a combination of some things I have shown so far:
Temperature
SELECT ROUND(randomNumber(-10,45),1) AS degreeCelsius
Result:
degreeCelsius |
29.4 |
Phone Number
SELECT ROUND(randomNumber(1111111111,9099999999)) AS phoneNumber;
Result:
phoneNumber |
4800151552 |
Future date in the next 60 days
SELECT DATE_ADD(CURRENT_DATE, INTERVAL randomNumber(1, 60) DAY) as futureDate;
Result:
futureDate |
2020-10-24 |
Random date within the past 12 months
SELECT DATE_SUB(CURRENT_DATE, INTERVAL randomNumber(1, 365) DAY) as futureDate;
Result:
futureDate |
2019-11-26 |
Random date of birth for a person under the age of 25
SELECT DATE_SUB(CURRENT_DATE, INTERVAL (randomNumber(1, 25)*365) DAY) as birthdate;
Result:
birthdate |
2002-12-30 |