Posts Tagged ‘connections’

Postgres GUC as Session/Transaction Variables

Edit Feb 8 8:53: After playing around a bit more with the functions, I’ve added another caveat dealing with the function volatility.

Some time ago, I wrote about session variables in PostgreSQL. I’ve been using the solution for some time to address the problem of performing a more-or-less-automatic audit trail for certain important tables when using accounts defined by the system and not the database, and it’s been working pretty well so far. However, I’ve always been concerned about the idea of potentially creating a new table for every transaction, even if it’s temporary. The database in question is a very small low-throughput system just used internally, but being inefficient just cause nobody will notice doesn’t seem like a good enough excuse.

As Rails is our front-end and it uses connection pooling, there’s a possibility that two subsequent pageviews would use two different connections to the DB. While that doesn’t always generate a new table (the second connection could have already had the temporary table created), it does necessitate resetting the values in the table that I use for auditing (current_user and audit_notes) every time, just to be sure. I’m unsure where on the scale of efficiency it would fall to validate that an update must occur before actually doing it, but it seems, due to the fact that I must validate the table as a whole exists before trying to do anything, that this is the least of my problems.

However, I recently had the magic of Tom Lane shine down from on high in this recent thread on the pgsql-sql mailing list, wherein he made mention of a feature of which I had been previously unaware: custom GUC variables. I’m not actually sure what the GUC stands for, even. However, what it provides is a namespace into which you can throw arbitrary variables. It’s designed for modules that are loaded at run time and need configuration (like plperl.use_strict).

If you clicked on the link to the thread, you’ll realize that Tom suggested this as a solution to the very problem I had built my “variable” temporary table method to address. The requested solution involved transaction-level isolation, but as ActiveRecord doesn’t seem to like using them unless you beat it thoroughly about the head, I’m more concerned with connection-level isolation. Fortunately, it looks like this does both! Instead of my big long complicated functions, you can simply include “custom_variable_classes = ‘audit’” in postgresql.conf, reload, and in any connection do “SET audit.”current_user” = ‘whoever’;” There are a couple of small caveats worth noting, however:

  • This is not the usage for which the GUC system was designed; as such, it is somewhat a Bad Thing to do, as it can potentially cause screwing around with modules that are loaded. This is particularly notable if you have multiple databases in your server, as it’s a global setting. Each one of your users will have that variable namespace. As far as I know, that’s not a security concern, but rather a nuisance concern, if they’ve never asked for such a thing. However, I think it’d be charitable to describe the situation where it’s problematic as an extreme edge case. And as far as the loaded modules, it would seem fairly trivial, unless you have millions being loaded and unloaded all the time (I don’t even know if you can unload without a server restart), that assigning a unique name to your variable class should not be a problem.
  • SET statements in PostgreSQL allow for setting string literals with optional quoting. This may not be obvious if you’ve never used the SET command (which I tend only to use for search_path), but it means you can’t set the value using variable substitution, i.e. in a function call. You’ll have to compile the query at runtime of the function using EXECUTE and that can be unpleasant for everybody if you’re not careful about it. That is, use pg_catalog.quote_literal() to make sure your variables are safe, because any characters PostgreSQL can’t figure are part of the string will cause errors. You shouldn’t bother using pg_catalog.quote_nullable(), for why, see the last point.
  • Certain values of variables must be ident-quoted. So far I’ve found that to be true of “user” and “current_user” at least (so, set audit.”user” instead of audit.user), and I presume there are others. Someone smarter than me may have an answer for this.
  • If you want to access the value of a variable via a function, the function must be declared as VOLATILE. IMMUTABLE is clearly out because it doesn’t depend on inputs, and for some reason that I’m unaware of STABLE is also out. This is probably a function of the SHOW command rather than of custom GUC variables in particular. Speaking of SHOW…
  • Retrieving the value of the variable, say in a PL/pgSQL trigger, can be done via “SHOW audit.”current_user” INTO some_variable;” – I’m not sure if there are more efficient ways but that’s the one I’ve found that works. At least, most of the time…
  • Retrieving the value of a configuration variable that has not been set yet causes an exception to be raised. This is not an insurmountable problem, as you can simply trap the exception, but as the documentation warns, an exception-trapping block in PL/pgSQL is far more expensive than a regular block, so it shouldn’t be done if you can avoid it, which would be easy except…
  • SET statements do not allow you to assign NULL values to configuration variables. This is problematic if, like me, you want to allow someone to optionally insert some notes to go along with any auditing for a particular chunk of work (“I just changed the received time on this log because it turns out that was a 2 not a squiggled-out number”), but don’t want a pile of empty strings littering everywhere. You can handle it in one of two ways: have your trigger functions call NULLIF() and always assign the return value of the variable you want to NULL if it’s set to ”, or just trap any exceptions from unused variables and return NULL. While I think NULLIF() is probably the cheaper option (without any benchmarks backing this gut feeling up), the trapping exceptions method is probably the Right Thing to do.

So, for all those things to be kept aware of, the end result can be just as simple as:

CREATE FUNCTION audit_user(OUT TEXT) LANGUAGE PLPGSQL AS
$$BEGIN
    SHOW audit."current_user" INTO $1;
EXCEPTION WHEN OTHERS THEN
    $1 := NULL;
END;$$;

CREATE FUNCTION audit_notes(OUT TEXT) LANGUAGE PLPGSQL AS
$$BEGIN
    SHOW audit.notes INTO $1;
EXCEPTION WHEN OTHERS THEN
    $1 := NULL;
END;$$;

I trapped the exception OTHERS for two reasons: one, I assumed that OTHERS would be faster than comparing against a specific case; and two, I plain just don’t know what specific exception gets raised when this happens. I also did not create a function that would set the auditing variables, as I figured there would be little point to creating a function that would basically just be wrapping a SET call. It’s all clearly much shorter and more sane than the temporary table solution, not to mention it seems it’s likely to be a lot faster.

I think I’ve found myself a winner.

Postgres Session Variables – Neat.

After futzing around a bit, and once again having my suspicions confirmed, I came up with the following solution to my session variables problem: temporary tables. They are dropped at the end of the session, so it all pans out nicely. I still didn’t like having to make front-ends do all the work (plus I was all excited after figuring it out), so I slapped together a pretty basic couple of wrapper functions:

/*
 * Session Variables in PostgreSQL via PL/pgSQL
 * Written/tested on version 8.4.2, but should work anywhere
 *
 * Code written by Stephen "sycobuny" Belcher
 *
 * Free to use, I enjoy writing stuff like this.
 * Just give me some props if you do.
 *
 */

/* -- These lines may not be necessary, lang/schema may exist

   -- Remove commenting only if they need to be created

CREATE
  LANGUAGE 'plpgsql';

CREATE
  SCHEMA SV;

 */

-- Make sure the session_variables temporary table exists

-- TODO: Make sure it has the right columns, too
CREATE OR REPLACE
  FUNCTION SV.ensure_session_table_exists()
  RETURNS VOID AS
$BODY$BEGIN
  PERFORM *
    FROM pg_catalog.pg_class
    WHERE relname = 'session_variables' AND
          relnamespace = pg_catalog.pg_my_temp_schema();

  IF NOT FOUND THEN
    CREATE
      TEMPORARY TABLE session_variables (
        "key" TEXT PRIMARY KEY,
        "value" TEXT
      );
    RETURN;
  END IF;
END;$BODY$ LANGUAGE 'plpgsql';

-- Set a variable. Yep.
CREATE OR REPLACE
  FUNCTION SV.set(IN xKey TEXT, INOUT xValue TEXT) AS
$BODY$BEGIN
  PERFORM SV.ensure_session_table_exists();
  PERFORM *
    FROM session_variables
    WHERE "key" = xKey;

  IF FOUND THEN
    UPDATE session_variables
      SET "value" = xValue
      WHERE "key" = xKey;
  ELSE
    INSERT
      INTO session_variables ("key", "value")
      VALUES (xKey, xValue);
  END IF;

  RETURN;
END;$BODY$ LANGUAGE 'plpgsql';

-- Get a variable's value. It's just that easy!
CREATE OR REPLACE
  FUNCTION SV.get(IN xKey TEXT, OUT xValue TEXT) AS
$BODY$BEGIN
  PERFORM SV.ensure_session_table_exists();
  PERFORM "value"
    FROM session_variables
    WHERE "key" = xKey;

  IF NOT FOUND THEN
    RAISE WARNING 'Variable % does not exist', xKey;
  END IF;

  SELECT "value"
    INTO xValue
    FROM ession_variables
    WHERE "key" = xKey;

  RETURN;
END;$BODY$ LANGUAGE 'plpgsql';

From this point, you just do:

SELECT SV.set('my session variable', 'its value');
SELECT SV.get('my session variable');

It handles creating and updating the table and values independently, without any hassles.

Well, there’s a couple hassles:

If it’s the first time you access it in a session, it barfs out warnings because PostgreSQL likes to warn you when it creates new indexes and constraints implicitly; whether this causes any libraries such as ActiveRecord to croak, I’m not sure – I will be testing that one shortly at least. Also, the get() function itself throws up a warning if you try to get a variable that hasn’t been defined yet. This is because the value of any variable can actually be NULL, but there’s nothing else to return if it hasn’t been defined yet. It’s an intrinsically implementation-specific concern whether this behavior is desired, so I split the difference: you can do it, but you’re going to be tut-tutted by the database.

So, there you have it. Of course, changing the schema into which I put these functions should be trivial; I put them there because I like the simple syntactic sugar it provides, but, as they are not tied to a permanent table it should be as simple as just changing the function names. I’ll probably wind up doing it myself, as my schemas are thematically named (individual “projects” and their “code names”).

There’s one final gotcha, which I didn’t fully account for because I ran out of ideas how to ensure it continues working right: PostgreSQL processes statements directed towards temporary tables first (if you don’t specify the fully qualified name), before checking the schemas in the search_path. I’m not sure if this is an SQL-standard way of doing things; if it isn’t, I’m sure they’ll correct it at some point, and then my code will be broken. Unfortunately, I’m not entirely sure how best to perform a query on a temporary schema (which is assigned a technically-random name by the database) without constructing the query as string, which totally removes any optimization done by preparing the statements ahead of time. If there’s some way around that, I’d be quite interested to hear it. Oh yeah, and it hides your “session_variables” table, if you’ve made one yourself. Sorry. Qualify your names and it won’t be a problem, though.

Postgres Triggers, Why Do You Hate Me?

Actually, in spite of the melodramatic post title, figuring out triggers in PostgreSQL has been relatively painless. Of course, I had a pretty firm grasp of them in MySQL, and thus the major migration headache is realizing that the code handling the trigger has to be defined separate from the trigger itself.

The issue for me, however, is that (unlike Peter Eisentraut)  I have a sordid love affair with schemas. For me, it’s not just about addressing the potential naming conflicts, but breaking down the tasks our database/frontend performs into more manageable blocks. Put it this way: I have over 100 tables, functions aplenty, and datatypes (as ENUM now has to be a datatype) to spare. While I recognize that it’s not impossible to manage, and there’s many systems that probably have a volume of data far exceeding mine, it still does wonders for my sanity if I can break those down so I only have to look through at most 25 tables at a time. They’re not randomly selected, they are geared towards similar ends and logically fit together, and I think schemas really helps on that front.

My solution to Peter’s problem of localized search paths is obtuse, but it works: I deliberately name the full path in anything that’s going inside a stored procedure. It’s a very tedious and defensive posture, but it has worked pretty well, at least up until now. This is where we get back to triggers: triggers in PostgreSQL are not named according to the same conventions as almost everything else in the DBMS. While most times, in the docs “simple_name” by itself is tantamount to “public”.”simple_name” (as the default search path is ‘”$user”, public’), this is not so for triggers. They are associated with the tables for which they are defined. While in hindsight, this makes sense, it took some time to figure this out (also, Michael Graziano pointed me in the right direction after I bitched about it). What would have been far more simple is if, anywhere in the documentation, they had simply specified this strange behavior. Even a hint, when executing “CREATE TRIGGER my_schema.do_something” other than a bland “syntax error at ‘.’” would be nice.

The other part of my problem with triggers doesn’t have anything to do with the trigger mechanism itself, but an issue I’ve encountered in the database as a whole. There is, as far as I can tell, no mechanism for creating connection-level or database-level custom variables. You can make variables obviously in any of the procedural languages. However, setting a variable, like MySQL’s “SET @@my_custom_variable := ‘some custom value’;” just doesn’t seem to exist. While this may not seem like a feature that would be particularly useful (after all, there are the aforementioned procedural languages), I’ve been finding it quite problematic.

When we were on MySQL, we used wxPerl and wxRuby as front-ends to connect directly to the database using database-level logins, and I was able to write auditing fairly handily: not only could I have information about the table on what data was modified and when, I could also log who modified the data. That doesn’t seem all that amazing, except that this was all database-side. The clients had to change exactly 0 code. You could even optionally issue a “@@COMMENTS := ‘my editing comments’;” and have it apply to all of the changes automatically until you unset it, providing an easy-access way to comment the audit log. I knew this was going to be problematic switching to rails, as ActiveRecord only ever connects as one user, and uses special models to manage logins. However, with MySQL, I could just ensure that the client issued a “@@CURRENT_USER := ‘whomever@address’;” before starting work. This, while obtuse, still has the database doing the majority of the legwork. I could easily reject any statements that occured before that variable was assigned, to make sure that changes made should always include complete auditing data.

PostgreSQL has denied my attempts at a simple solution to this, however. I’ve been searching, but the easiest solution seems to be to make a special table which holds “current connection” information and somehow tie the connection ID to the user ID that way. This seems pretty complicated, and I’m not sure how safe it is to assume that the connection ID will be unique for an arbitrary length of time (as I’m not sure what mechanisms exist for periodically purging the table of stale IDs). If there is any other way around this, I’d love to hear about it.

Return top

About Me

I do software development and database management. I went to school for harp performance and I'm pretty decent at it.
Here's some more, if you're interested.