Archive for the ‘PostgreSQL’ Category

Postgres Session Variables – Neat.

After futzing around a bit, and once again having my suspicions confirmed, I came up with the following solution to my session variables problem: temporary tables. They are dropped at the end of the session, so it all pans out nicely. I still didn’t like having to make front-ends do all the work (plus I was all excited after figuring it out), so I slapped together a pretty basic couple of wrapper functions:

/*
 * Session Variables in PostgreSQL via PL/pgSQL
 * Written/tested on version 8.4.2, but should work anywhere
 *
 * Code written by Stephen "sycobuny" Belcher
 *
 * Free to use, I enjoy writing stuff like this.
 * Just give me some props if you do.
 *
 */

/* -- These lines may not be necessary, lang/schema may exist

   -- Remove commenting only if they need to be created

CREATE
  LANGUAGE 'plpgsql';

CREATE
  SCHEMA SV;

 */

-- Make sure the session_variables temporary table exists

-- TODO: Make sure it has the right columns, too
CREATE OR REPLACE
  FUNCTION SV.ensure_session_table_exists()
  RETURNS VOID AS
$BODY$BEGIN
  PERFORM *
    FROM pg_catalog.pg_class
    WHERE relname = 'session_variables' AND
          relnamespace = pg_catalog.pg_my_temp_schema();

  IF NOT FOUND THEN
    CREATE
      TEMPORARY TABLE session_variables (
        "key" TEXT PRIMARY KEY,
        "value" TEXT
      );
    RETURN;
  END IF;
END;$BODY$ LANGUAGE 'plpgsql';

-- Set a variable. Yep.
CREATE OR REPLACE
  FUNCTION SV.set(IN xKey TEXT, INOUT xValue TEXT) AS
$BODY$BEGIN
  PERFORM SV.ensure_session_table_exists();
  PERFORM *
    FROM session_variables
    WHERE "key" = xKey;

  IF FOUND THEN
    UPDATE session_variables
      SET "value" = xValue
      WHERE "key" = xKey;
  ELSE
    INSERT
      INTO session_variables ("key", "value")
      VALUES (xKey, xValue);
  END IF;

  RETURN;
END;$BODY$ LANGUAGE 'plpgsql';

-- Get a variable's value. It's just that easy!
CREATE OR REPLACE
  FUNCTION SV.get(IN xKey TEXT, OUT xValue TEXT) AS
$BODY$BEGIN
  PERFORM SV.ensure_session_table_exists();
  PERFORM "value"
    FROM session_variables
    WHERE "key" = xKey;

  IF NOT FOUND THEN
    RAISE WARNING 'Variable % does not exist', xKey;
  END IF;

  SELECT "value"
    INTO xValue
    FROM ession_variables
    WHERE "key" = xKey;

  RETURN;
END;$BODY$ LANGUAGE 'plpgsql';

From this point, you just do:

SELECT SV.set('my session variable', 'its value');
SELECT SV.get('my session variable');

It handles creating and updating the table and values independently, without any hassles.

Well, there’s a couple hassles:

If it’s the first time you access it in a session, it barfs out warnings because PostgreSQL likes to warn you when it creates new indexes and constraints implicitly; whether this causes any libraries such as ActiveRecord to croak, I’m not sure – I will be testing that one shortly at least. Also, the get() function itself throws up a warning if you try to get a variable that hasn’t been defined yet. This is because the value of any variable can actually be NULL, but there’s nothing else to return if it hasn’t been defined yet. It’s an intrinsically implementation-specific concern whether this behavior is desired, so I split the difference: you can do it, but you’re going to be tut-tutted by the database.

So, there you have it. Of course, changing the schema into which I put these functions should be trivial; I put them there because I like the simple syntactic sugar it provides, but, as they are not tied to a permanent table it should be as simple as just changing the function names. I’ll probably wind up doing it myself, as my schemas are thematically named (individual “projects” and their “code names”).

There’s one final gotcha, which I didn’t fully account for because I ran out of ideas how to ensure it continues working right: PostgreSQL processes statements directed towards temporary tables first (if you don’t specify the fully qualified name), before checking the schemas in the search_path. I’m not sure if this is an SQL-standard way of doing things; if it isn’t, I’m sure they’ll correct it at some point, and then my code will be broken. Unfortunately, I’m not entirely sure how best to perform a query on a temporary schema (which is assigned a technically-random name by the database) without constructing the query as string, which totally removes any optimization done by preparing the statements ahead of time. If there’s some way around that, I’d be quite interested to hear it. Oh yeah, and it hides your “session_variables” table, if you’ve made one yourself. Sorry. Qualify your names and it won’t be a problem, though.

Postgres Triggers, Why Do You Hate Me?

Actually, in spite of the melodramatic post title, figuring out triggers in PostgreSQL has been relatively painless. Of course, I had a pretty firm grasp of them in MySQL, and thus the major migration headache is realizing that the code handling the trigger has to be defined separate from the trigger itself.

The issue for me, however, is that (unlike Peter Eisentraut)  I have a sordid love affair with schemas. For me, it’s not just about addressing the potential naming conflicts, but breaking down the tasks our database/frontend performs into more manageable blocks. Put it this way: I have over 100 tables, functions aplenty, and datatypes (as ENUM now has to be a datatype) to spare. While I recognize that it’s not impossible to manage, and there’s many systems that probably have a volume of data far exceeding mine, it still does wonders for my sanity if I can break those down so I only have to look through at most 25 tables at a time. They’re not randomly selected, they are geared towards similar ends and logically fit together, and I think schemas really helps on that front.

My solution to Peter’s problem of localized search paths is obtuse, but it works: I deliberately name the full path in anything that’s going inside a stored procedure. It’s a very tedious and defensive posture, but it has worked pretty well, at least up until now. This is where we get back to triggers: triggers in PostgreSQL are not named according to the same conventions as almost everything else in the DBMS. While most times, in the docs “simple_name” by itself is tantamount to “public”.”simple_name” (as the default search path is ‘”$user”, public’), this is not so for triggers. They are associated with the tables for which they are defined. While in hindsight, this makes sense, it took some time to figure this out (also, Michael Graziano pointed me in the right direction after I bitched about it). What would have been far more simple is if, anywhere in the documentation, they had simply specified this strange behavior. Even a hint, when executing “CREATE TRIGGER my_schema.do_something” other than a bland “syntax error at ‘.’” would be nice.

The other part of my problem with triggers doesn’t have anything to do with the trigger mechanism itself, but an issue I’ve encountered in the database as a whole. There is, as far as I can tell, no mechanism for creating connection-level or database-level custom variables. You can make variables obviously in any of the procedural languages. However, setting a variable, like MySQL’s “SET @@my_custom_variable := ‘some custom value’;” just doesn’t seem to exist. While this may not seem like a feature that would be particularly useful (after all, there are the aforementioned procedural languages), I’ve been finding it quite problematic.

When we were on MySQL, we used wxPerl and wxRuby as front-ends to connect directly to the database using database-level logins, and I was able to write auditing fairly handily: not only could I have information about the table on what data was modified and when, I could also log who modified the data. That doesn’t seem all that amazing, except that this was all database-side. The clients had to change exactly 0 code. You could even optionally issue a “@@COMMENTS := ‘my editing comments’;” and have it apply to all of the changes automatically until you unset it, providing an easy-access way to comment the audit log. I knew this was going to be problematic switching to rails, as ActiveRecord only ever connects as one user, and uses special models to manage logins. However, with MySQL, I could just ensure that the client issued a “@@CURRENT_USER := ‘whomever@address’;” before starting work. This, while obtuse, still has the database doing the majority of the legwork. I could easily reject any statements that occured before that variable was assigned, to make sure that changes made should always include complete auditing data.

PostgreSQL has denied my attempts at a simple solution to this, however. I’ve been searching, but the easiest solution seems to be to make a special table which holds “current connection” information and somehow tie the connection ID to the user ID that way. This seems pretty complicated, and I’m not sure how safe it is to assume that the connection ID will be unique for an arbitrary length of time (as I’m not sure what mechanisms exist for periodically purging the table of stale IDs). If there is any other way around this, I’d love to hear about it.

Return top

About Me

I do software development and database management. I went to school for harp performance and I'm pretty decent at it.
Here's some more, if you're interested.