The Key=Value Store That Nobody Used

4 min read

Yes, I am aware that your "persistent" Key Value store makes you extremely happy. Did anyone, however, notice that Hstore is included with PostgreSQL? I think PostgreSQL is a really good RDBMS that is frequently disregarded. Even better, it has a fantastic publisher/subscriber system that many people may have implemented using Redis, RabbitMQ, etc. (or LISTEN/NOTIFY in terms of PostgreSQL). I would simply recommend Postgres to anyone who has never used anything other than MySQL.

I'll be concentrating on a key value store that is actually ACID compliant rather than looking at benchmarks. Postgres utilizes its storage engine and has an addition for key-value storage on top. The idea is to have a table with one or more columns with the datatype “hstore” which in turn has a “structure-free storage” in it.

When considering this model, numerous analogies pop into mind. Similar to Cassandra, it may be a Column Family Store where the row key is the table's primary key, each column of the hstore type acts as a supercolumn, and the key for each hstore entry is a column name. Similar to Redis's Hash structures (HSET, HDEL), you can imagine it as a two- or three-level MongoDB store. (few modifications required). it allows me the chance to show you some incredibly simple examples, despite being similar (when little tricks are applied) to your NoSQL store structures.

Let's first configure our system. I'll compile Postgres from source and use the most recent version for my experiment. After entering the source directory, type:

./configure & & make install

to install your Postgres. Don't forget to install the extensions in the contrib directory:

cd ./contrib & & make install

Once it is configured, you can start the server and create your own PostgreSQL database cluster (Tip: Use initdb and pg_ctl). Launch psql after that, and make sure your hstore extension is installed:

CREATE EXTENSI ON hstore;
SELECT 'foo=>bar' ::hstore;

If everything goes according to plan, a table output should be visible. We are now prepared to begin some DDL. I made a table called my_store and defined it as follows:

CREATE TABLE my_store (
	id character varying(1024) NOT NULL,
	doc hstore,
	CONSTRAINT my_store_pkey PRIMARY KEY (id)
)WITH (
	OIDS=FALSE
);

CREATE INDEX my_store_doc_idx_gist ON my_store
USING gist(doc);

You can see that I've created a table with a hstore column type and a GiST index (for operators like?,?&,?|, etc.). To learn more about the various operators that are offered, consult the documentation. I now have a database and tables created, and I use a straightforward script to fill it with about 115K rows of data from a Twitter stream. Remember that this is real-world data, and I was hoping to find out a few fundamental things by running some simple queries on it.

How many people, for instance, use hashtags, make mentions, or include links in their tweets? To accomplish this, I created a straightforward Python script using psycopg2 and tweepy, which I ran for a few hours. If there were any hashtags in the tweet, I added a key value pair of "has_hashtags: t" to each tweet in my store.

Similar to that, I added "has_urls" and "has_mentions" if they appeared in the tweet. Later, I will query my table using these keys and my GiST index.

The Key=Value Store That Nobody Used

Defining Your Database Schema: A Guide to Different Methods

Streamline Your Database Schema Management with Schema Guard

PostgreSQL is More Than Just a Boring Database!

Combining The Power Of Relational And Key=Value Style Under One Store