PostgreSQL cheat sheet
Pete Freitag has posted a nice little PostgreSQL Cheat Sheet
PL/Ruby loves RubyGems and DRb
I admit it. I have had a torrid love affair with procedural languages ever since I started playing with PostgreSQL. The ability to share logic amongst all the applications touching the same database server.. was…well… a breath of fresh air.
What is a procedural language in Postgresql?
PostgreSQL docs describe them as, ”…allows user-defined functions to be written in other languages besides SQL and C. “
Well, PostgreSQL has PLs for Perl, Python, Java, C, PHP… and even RUBY!
CREATE FUNCTION ruby_max(int4, int4) RETURNS int4 AS '
if args[0].to_i > args[1].to_i
return args[0]
else
return args[1]
end
' LANGUAGE 'plruby';
PL/PGSQL is nice and all, but it’s not as fun as playing with Ruby. PL/Perl… well is perl, and PL/Python… is python. Both PL/Perl and PL/Python have untrusted variants. You see, they don’t want your PostgreSQL server to do anything harmful to the machine by being able to do stuff like system(‘cat /dev/null > /etc/passwd). But for some people, (like me) they want the flexibility of their language anyways. :-)
Note: Never do this if your system user that runs PostgreSQL has privileges to do anything harmful on your system.
The PL/Ruby documentation is minimal at the moment, but covers enough to get you started. I don’t know if many people are using it out there… but hopefully that is about to change! I’ve played with it a bit, but always wanted to be able to do stuff like require ‘rubygems’, but this is a feature of an untrusted language. I even found myself digging around in C code to see if I could figure out how to hack the plruby language to skip over those checks… but I am not a C programmer and I got lost in some header files.
Then it hit me. “Why haven’t you emailed the author?”
So I emailed the author of PL/Ruby, Guy Decoux, who responded pretty quickly with the answer to my dreams! Okay, I do have bigger dreams than this… but you get the idea.
First of all, some of you might be thinking, ”Why on Earth would you want to do this?”
Well, here is a simple example of how it could be used with RedCloth Let’s say that I want to be able to perform the following query from within SQL.
Why not do this in the application? Well, I do actually have a case where I have an older PHP application that I will be porting to Ruby in the future, but would like to give the application some access to some of the features of Ruby that I will be using, such as RedCloth.
Okay, so show me an example of one of these scary PostgreSQL functions.
CREATE FUNCTION redcloth(text) RETURNS text AS '
require ''rubygems''
require ''redcloth''
content = args[0]
rc = RedCloth.new(content)
return rc.to_html
' LANGUAGE 'plruby';
”Wait! You said this would be scary!?”
Well, PL/Ruby allows you to write… plain ole Ruby within your functions. (do you see where I am getting here?)
PL/Ruby meets RedCloth
rb=# SELECT redcloth('*strong text* and _emphasized text_');
redcloth
------------------------------------------------------------------
<p><strong>strong text</strong> and <em>emphasized text</em></p>
(1 row)
PL/Ruby meets ShortURL
CREATE FUNCTION rubyurlize(text) RETURNS text AS '
require ''rubygems''
require ''shorturl''
return ShortURL.shorten(args[0])
' LANGUAGE 'plruby';
...which allows for
rb=# SELECT
rb-# rubyurlize('http://www.robbyonrails.com/') as link1,
rb-# rubyurlize('http://moulon.inra.fr/ruby/plruby.html') as link2;
link1 | link2
--------------------------+------------------------
http://rubyurl.com/lyoKm | http://rubyurl.com/dTo
(1 row)
PostgreSQL meets DRb
Okay, this is one of the reasons why I wanted to play with PL/Ruby a bit more. Distributed Ruby Objects… from PostreSQL?
What is DRb?
If you don’t know already… per the description in RDOC, “dRuby is a distributed object system for Ruby. It allows an object in one Ruby process to invoke methods on an object in another Ruby process on the same or a different machine.”
It basically allows you to share an object to other machines… at the same time!
mmm…distributed objects…
DRb Object
Here is a simple ruby script that you would run from the shell. It creates a DRb object which accepts connections at localhost:9000.
#!/usr/bin/ruby
require 'drb'
class MyRemoteObject
def say(str)
return "You say #{str}. I say #{str.reverse.upcase}!"
end
end
server = MyRemoteObject.new
DRb.start_service('druby://localhost:9000', server)
DRb.thread.join
Start me up!
$ ruby mydrb.rb
Now that we have DRb running and listening for connections…we need a client to connect to it.
DRb function in PL/Ruby
Here is a very simple DRb client script and I just drop that into a PostgreSQL function.
CREATE FUNCTION drb_test(text) RETURNS text AS '
require ''drb''
DRb.start_service
ro = DRbObject.new(nil, ''druby://localhost:9000'')
return ro.say(args[0])
' LANGUAGE 'plruby';
The result?
rb=# SELECT drb_test('Potato');
drb_test
-------------------------------
You say Potato. I say OTATOP!
(1 row)
Are we having fun yet?
Okay, so how do I manage to get this to work? Well… for that, you will have to read my blog post, Installing untrusted PL/Ruby for PostgreSQL
Let’s all go get some coffee (or tea) and start playing with PL/Ruby today!
Installing untrusted PL/Ruby for PostgreSQL
This is going to be short and sweet.
“PL/Ruby is a loadable procedural language for the Postgres database system that enable the Ruby language to create functions and trigger procedures”
Method 1. The standard, safe, PL/Ruby.
Before running this, you need to have all the PostgreSQL headers installed. (se INSTALL in the postgresql directory)make install-all-headers
To install PL/Ruby, you need to download the tarball from here. As you can see, I download it with wget
and then install like I would any ruby library. (maybe plruby could become a gem?)
cd /usr/local/src
wget ftp://moulon.inra.fr/pub/ruby/plruby.tar.gz
tar zxvf plruby.tar.gz
cd plruby
ruby extconf.rb
make
make install
Method 2: The untrusted, but super cool PL/Ruby.
Guy Decoux, author of PL/Ruby, was kind enough to share a secret about the PL/Ruby install. (from his email…)
Well plruby normally run with $SAFE = 12, this value if fixed at compile time. Now it has an undocumented option, if you compile it with ruby extconf.rb --with-safe-level=0 ... it will run with $SAFE = 0 and you have the equivalent of an untrusted language.
Pretty simple solution, eh?
On my server I was able to run the following:
cd /usr/local/src
wget ftp://moulon.inra.fr/pub/ruby/plruby.tar.gz
tar zxvf plruby.tar.gz
cd plruby
sudo ruby extconf.rb \ --with-pgsql-dir=/usr/local/pgsql-8.0 \ --with-safe-level=0 \ --with-suffix=u
make
make install
Update: the --with-suffix=u
was added after someone commented on this. This allows you to install plruby and plrubyu.
Installing PL/Ruby in PostgreSQL Up until now, you haven’t actually installed the language into the database. We’re close though!
All that you need to do is run the following commands to install it to a specific database in your server.
$ psql template1
template1=# CREATE DATABASE plruby;
CREATE DATABASE
template1=# \c plruby
You are now connected to database "plruby".
plruby=# create function plruby_call_handler() returns language_handler
plruby-# as '/usr/lib/site_ruby/1.8/i386-linux/plruby.so'
plruby-# language 'C';
CREATE FUNCTION
plruby=# create language 'plruby'
plruby-# handler plruby_call_handler
plruby-# lancompiler 'PL/Ruby';
CREATE LANGUAGE
plruby=#
That should be all there is to it!
Where do we go from here?
See my post: PL/Ruby loves RubyGems and DRb
When TSearch2 Met AJAX
Last night, a local PDX.rb-ist, asked about full text searching in PostgreSQL. I pointed him to TSearch2, which is a nice little addon to handle full text searching with indexing, ranking, highlighting, etc. To my knowledge, it’s the closest to a google-like search that you can get with PostgreSQL. Some people in #postgresql (irc.freenode.net), said that you can build custom functions that will allow you to quote content, and do other fun stuff within your search string. We can discuss that another time.
After thinking it over, I thought, “why not put ajax on top of a full text search and see what it can do?”
The first question, where was I going to get a bunch of content that I could search through and have it be somewhat meaningful for the public, if I decide to put it up as a demo page. The RubyOnRails mailing list came to mind, so after seeing that I couldn’t download the full archive from the rails mailman page (at least not that I could tell), I decided that I would just import my Maildir for that mailing list.
This added another initial step. What would be a good way to import the 13,000~ emails that I had in the folder?
I knew that worst case, I could find a module on CPAN and build a perl script to import it… since I didn’t see anything in the standard ruby library. Then I found TMail. Someone said that they think ActionMailer uses TMail as well.
The resulting quick and dirty script became:
#!/usr/bin/env ruby
require 'tmail'
require 'rubygems'
require 'postgres'
require 'dbi'
conn = DBI.connect("DBI:Pg:database=rails_mailinglist;host=localhost;port=5403", "username", "password" )
MAILBOX = ".MailingLists.Ruby.RubyOnRails"
sql = "INSERT INTO archives (sender, recipient, subject, body) VALUES (?,?,?,?)"
@sth = conn.prepare(sql)
box = TMail::Maildir.new(MAILBOX)
box.each do |port|
mail = TMail::Mail.new(port)
p mail.subject
@sth.execute(mail.from, mail.to, mail.subject, mail.body)
end
exit
Not rocket science. :-)
Okay, so I let that start running through the mailing list emails that I have, and opened up another tab in iTerm and typed our friend, rails archives
followed by cd archives
. The next step was to modify the config/database.yml
file.
(you all know how to do that, right?)
Okay, you should still be with me…so far.
After I got my database settings in place, I ran ./script/generate scaffold Archive
and watched it created my new filles to play with.
./script/server
and I am looking at the first several emails that are in my RubyOnRails mailing list folder. I notice that the first one is the confirmation email from the day that I signed up on the mailing list. Mon, 24 Jan 2005 16:00:14 +0000 (GMT) . So, I delete that email and the ‘welcome to..’ one so that no one sees my mailman password/confirm info. ;-)
Installation
So, Rails has no problem with the data. So, I then head over to the Tsearch2 site and look for some installation information. I walked through this walkthrough
Database Structure
For this example, I kept it pretty simple for the database structure. I believe the create script was:
CREATE TABLE archives (
id SERIAL PRIMARY KEY,
sender VARCHAR(255),
recipient VARCHAR(255),
subject VARCHAR(255),
body TEXT
);
The rest was basically following through with those steps and building the triggers and functions around the subject
and body
fields in the table.
To use the tsearch2 functionality, I used find_by_sql
rather than using just find
.
@archives = Archive.find_by_sql("SELECT id, headline(body,q) as headline, body, rank(idxfti,q) as rank, sender, subject FROM archives, to_tsquery('#{@str}') AS q WHERE idxfti @@ q ORDER BY rank(idxfti,q) DESC LIMIT 100")
The @str
variable is a value that I build based on the string(s) that the user is typing in the search field. Tsearch2 requires that you sepeare each string with a pipe (|
). So, I put in a few checks on the string that was being passed to my method in my controller by AJAX. (I’ll let you take the time to figure out how to get AJAX in Rails working and watching a text field… it’s not hard to find info on google. ) :-)
The end result?
I will warn you that this does’t work in all browsers, some IE people said they had issues… and I spent enough time tinkering with it to just settle with this for now. :-)
I present… fulltext searching with PostgreSQL on Rails.
There are approx 13,000 emails in the system, so I put a limit on the number of responses that show up to 100.
My Thoughts
Well, it was an interesting concept. I’m not a big fan of livesearching, it doesn’t really seem to buy us much when working with this sort of data. I do find live auto-completion to be quite useful though. It’s not practical to have AJAX peg the database every second as I type for new content and it’s obvious that a database with that much content is not going to respond as snappy as you would hope. However, I decided to compare the speed to searching in Thunderbird and Evolution. From my sophesticated benchmarking suite (my imaginary stop watch)...
AJAX won!
okay, I should be fair and say, Tsearch2 won as it is doing all the heavy lifting.
Enjoy!
PostgreSQL sequences in Rails
Rails doesn’t support legacy or custom named sequences at the moment. (as far as I am aware). It’s kind of tricky to have it detect the SEQUENCE
name automatically (every time).
$sql = $db->prepare("SELECT seq.relname::text
FROM pg_class src, pg_class seq, pg_namespace, pg_attribute,
pg_depend
WHERE
pg_depend.refobjsubid = pg_attribute.attnum AND
pg_depend.refobjid = src.oid AND
seq.oid = pg_depend.objid AND
src.relnamespace = pg_namespace.oid AND
pg_attribute.attrelid = src.oid AND
pg_namespace.nspname = ? AND
src.relname = ? AND
pg_attribute.attname = ?");
I used this to mimmick the mysql_insert_id function in PHP for PostgreSQL… ( pg_insert_id )
Well, with Rails, I thought that I would build a similar patch, as the current code just assumes the value would be {column}_id_seq
.
After hours of playing around and thinking that I figured it all out ... I decided to run a quick test with a non standard sequence name… like this one:
testingdb=# \d legacy.foobar
Table "legacy.foobar"
Column | Type | Modifiers
-----------+-----------------------+------------------------------------------------------------
foobar_id | integer | not null default nextval('legacy.old_sequence_name'::text)
name | character varying(40) |
Indexes:
"foobar_pkey" PRIMARY KEY, btree (foobar_id)
testingdb=# INSERT INTO legacy.foobar (name) VALUES ('abc')
testingdb-# ;
INSERT 17514 1
testingdb=# SELECT * FROM legacy.foobar ;
foobar_id | name
-----------+------
106 | abc
(1 row)
My patch wouldn’t figure that out because the sequence was not created by SERIAL
. So, my patch started to feel lame and a total waste of time, because I thought that it was fixing a problem.. that works pretty much as effectively as assuming it is _seq
.. but without needing to run a SQL query to determine that. We all (should) know that the field will be named like that when working with SERIAL
. So, my patch didn’t buy us anything.
However, Active Record still doesn’t support those funky sequence names. So, I found this ticket #1273.
Their approach was very similar to what caused me to use my long SQL query in the first place because this was suggested to me well over a year ago and I found it to not work in the following situation.
If I have two seperate schemas with the same table name in each like so:
=# \d legacy.people
Table "legacy.people"
Column | Type | Modifiers
-----------+-----------------------+---------------------------------------------------------------
people_id | integer | not null default nextval('legacy.people_people_id_seq'::text)
name | character varying(50) |
Indexes:
"people_pkey" PRIMARY KEY, btree (people_id)
=# \d foo.people
Table "foo.people"
Column | Type | Modifiers
-----------+-----------------------+------------------------------------------------------------
people_id | integer | not null default nextval('foo.people_people_id_seq'::text)
name | character varying(50) |
Indexes:
"people_pkey" PRIMARY KEY, btree (people_id)
That patch will not work because you can’t call the following query:
# SELECT adsrc FROM pg_attrdef WHERE adrelid = (SELECT oid FROM pg_class WHERE relname = 'people');
ERROR: more than one row returned by a subquery used as an expression
... because there are two tables with the same name! (fun, huh?)
Mine would work… but why bother with that huge query? So, I took my ticket out of [PATCH] and decided that I thought it would be best to just assume that sequences are generated with SERIAL
( link ) by default in AR.
Okay, so what can we do about custom SEQUENCE
names?
Well, I am proposing the following (and mentioned this in the ticket #2016)...
class LegacyTable < ActiveRecord::Base
def self.table_name() "legacy.foobar" end
# new option for this
set_primary_key "foobar_id", :sequence => "legacy.old_sequence_name"
end
(or something along those lines)
With this, I can work around these legacy database scenarios with a quick option. Thoughts/opinions?
I decided to post this on my blog as well, because I do know that there are a few skeptical PostgreSQL people out there who read my blog… I want you to know that I am looking out for you. ;-)
I am sick and tired.. and going to sleep now.
Typo on PostgreSQL
As Typo only supported SQLite and MySQL so far, I submitted a PostgreSQL schema file for the project. This new blog is running on PostgreSQL 8.0 and Rails!
I am going to use this blog to follow my Rails-related projects..and post tips and tricks!
Cheers
UPDATE Links have been updated as the original typo svn/trac is gone.