Generating Fake Data for Postgres in an Elixir script

By Mitchell Henke on October 11th 2016

When playing with new features in databases, it's often helpful to be able to have a meaningful amount of data, to get closer to at least approximating a real workload. My use case revolved around needing to generate random phrases, as well as a random nearby coordinate.

I'm a huge fan of Elixir, and so it was my language of choice for this. I didn't want to deal with the overhead of adding any dependencies, so communicating directly with Postgres through Elixir was not an option. I decided I would have the script generate a bunch of INSERT statements that I could then pass to Postgres directly.

It ended up looking like this:

# words.exs

# random words generated from https://www.randomlists.com/random-words
words = ["general", "sweet", "boot", "enormous", "thumb", "rinsing", "soda", "bells", "hurry", "class", "wrap", "face", "creature", "taller", "angle", "pin", "purple", "driving", "fun", "elderly", "guitar", "magical", "pie", "acoustics", "pretend", "electricity", "cheese", "turkey", "tea"]

central_latitude = 43.0384303
central_longitude = -87.9079503

# file to create
file_path = "./sql.sql"

stream = Stream.repeatedly(fn ->
  # get coordinate within .033 degrees in each direction
  latitude = central_latitude + :random.uniform / 30.0
  longitude = central_longitude + :random.uniform / 30.0

  # Grab 5 random words from the list, and join them with a space to create one string
  words = Enum.take_random(words, 5) |> Enum.join(" ")

  # Emit desired string
  "INSERT INTO test (words, words_vector, geo) VALUES ('#{words}', to_tsvector('#{words}'), ST_GeomFromText('POINT(#{longitude} #{latitude})', 4326));"
end)

# Lazily take as many rows as we'd like to generate, and then join them with a line.
sql = Stream.take(stream, 250000)
      |> Enum.join("\n")
File.write!(file_path, sql, [:write])

# run with:
# >elixir words.exs

Once it's run, I have a file with 250,000 insert statements that can be run through Postgres with:

psql -d database_name --quiet -f sql.sql

It wasn't the most efficient way to solve the issue, but it didn't need to be :)

Feel free to contact me on Twitter at @mitchellhenke with any questions or clarifications. Thanks for reading!

RokkinCat

is a software engineering agency. We build applications and teach businesses how to use new technologies like machine learning and virtual reality. We write about software, business, and business software culture.