Sunday, May 31, 2009

CPAN Picks: Variable Dump

This is a series in which I present CPAN modules I often use or otherwise deem noteworthy.
While the first post of the series, Quick'n'Dirty SQL, was about SQL access, this post deals with a debugging module: Data::Dump.
I'm not a big fan of the Perl debugger and although it has its uses, I prefer to debug by logging debug information from my applications. Debug information comes mostly in two flavors: trace info, which indicates that execution has reached a certain stage (usually the point just before or after an interesting source code line), and dump info, which allows the inspection of runtime values held by application variables.
The most venerable and ancient (i.e. primitive) dump info provider is the Data::Dumper module (well, apart from Perl Cookbook recipes). The main drawback is that its output usually starts with $VAR1 which is meant to denote the variable being dumped. You can supply a name to it, but it's cumbersome and non-intuitive.
Data::Dump on the other hand, dispenses with variable names altogether. It just prints the value supplied to it, no matter what the variable was called. It also allows you to import a handy short-named dump function (pp) instead of the name-clashing dump.
So what makes Data::Dump stand out from other dumping modules?
The first thing I've mentioned already, it chooses to ignore variable names. While this might seem a drawback at first, I usually have enough trace info between the variable dumps to allow me to distinguish between dumped variables. In the rare cases when that's not enough, I add some debug info describing the variable.
The second feature is that it's context sensitive. Called in void context (i.e. as a command on its own) it will output the dump to STDERR:
use Data::Dump qw/pp/;
pp $my_complex_data; # this gets printed to STDERR
But in any other context (i.e. when its return value is used) it will just return the dump as a string and skip the output:
use Data::Dump qw/pp/;
$c->log->debug("Stash: " . pp $c->stash );
The final advantage (mostly over Data::Dumper) is that it just seems to work with whatever I throw at it: no need to wrap stuff in anonymous array or hash refs. It doesn't dump subroutine refs, but I don't need to inspect them anyway (the code I have is not that tricky).
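For instance, a minimal sketch (the data is made up) showing pp taking a plain list of values, with no reference-wrapping needed:
use Data::Dump qw/pp/;

# made-up example data: a plain list of hashrefs
my @tasks = (
    { name => 'backup', due => 1243756800 },
    { name => 'report', due => 1243760400 },
);

pp @tasks;   # void context: the whole list is pretty-printed to STDERR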
So that's Data::Dump for me, but there are more output functions and tracing wrappers for you to use (if you need them, of course).
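One of them I find handy is ddx, which prints the dump prefixed with the file name and line number of the call (a bit of trace info and dump info in one go):
use Data::Dump qw/ddx/;

my $config = { retries => 3, timeout => 10 };   # made-up data
ddx $config;   # prints, e.g.: # myscript.pl:4: { retries => 3, timeout => 10 }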

Thursday, May 21, 2009

Heaps made simple

This is the third post in the Perl scheduler series. We'll look at the CPAN modules that support our scheduler. As we've kept the design simple, there are only two modules involved (apart from Moose): Heap::Simple and Event.
Each scheduler must have a priority queue where tasks are kept waiting for their execution. These queues typically use a heap. Our heap is kindly provided by Heap::Simple. Heap::Simple claims to be only an interface, so you must also install one of its implementations (XS or pure Perl). The priority of a task is its scheduled execution time (represented as unix time) and Heap::Simple allows us to provide the priorities as object methods. So we initialize the heap like this:
my $heap = Heap::Simple->new( elements => [Object => 'next_time'] );
This tells the heap that its elements will be objects and the next_time method will provide the priority associated with the element. Since we use unix time which is an integer, the ordering of priorities is straightforward. In fact it's the default (numerical, less-than) order provided by the module.
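If you want to spell that default out, this sketch is equivalent:
# same heap, with the default order made explicit
my $heap = Heap::Simple->new(
    order    => '<',                        # numeric less-than: lowest key comes out first
    elements => [ Object => 'next_time' ],
);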
We can add one or more tasks:
$heap->insert( $task, $another_task, @still_other_tasks );
And we can extract all due tasks from the queue:
$heap->extract_upto( time() );
We might also want to inspect all tasks in the queue, without extracting them (for example to save the queue to disk):
my @tasks = $heap->values;
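Putting the pieces together, here's a minimal sketch of the dispatch loop; run and reschedule are hypothetical task methods of ours, not part of Heap::Simple:
use Heap::Simple;

# @tasks is assumed to have been built from the configuration
my $heap = Heap::Simple->new( elements => [ Object => 'next_time' ] );
$heap->insert(@tasks);

while (1) {
    # everything due or overdue, oldest first
    for my $task ( $heap->extract_upto( time() ) ) {
        $task->run;                                  # hypothetical: execute the task
        $heap->insert($task) if $task->reschedule;   # hypothetical: re-queue recurring tasks
    }
    sleep 1;                                         # one-second resolution
}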

To design a scheduler

Welcome back to the second installment of the Perl scheduler series. The previous post talked about the motivation behind this project, but now it's time for some design decisions. The goal of this first iteration is to have a task scheduler which can execute Perl subroutines, methods and external commands. The scheduler should support both cron-style (i.e. recurring execution at fixed intervals) and at-style (i.e. single execution at a fixed time) scheduling. And we're trying to keep the design as simple as possible.
Let's look at the characteristics of existing schedulers. The standard implementation for Linux and BSD (a.k.a. the Vixie cron) has a time resolution of 1 minute. This means you can only schedule tasks (they call them cron jobs) at a full minute mark. We wish to allow a finer-grained scheduling resolution (one second).
Cron will execute only the tasks which are scheduled at the present moment and will skip tasks which were due earlier but didn't execute (due to downtime, for example). We want to be able to take into account overdue tasks, in case our scheduler crashes or the machine goes down. This implies some sort of persistence for the tasks and their queue.
Cron also allows execution of tasks with a specific user's credentials. That's a good thing to have too, but it requires root privileges, so we'll leave it for later.
And finally, cron can rescan its configuration, looking for new jobs, either periodically or when the user demands it.
To keep the design simple we'll start with a process which wakes up and executes a series of task objects (representing those tasks which are overdue and due, in chronological order, i.e. oldest first). The task list will be stored in a heap with the scheduled execution time as the priority. The scheduler can also wake up on demand (on a HUP signal) and re-read its configuration to rebuild its task list.
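A minimal sketch of that wake-up loop, with rebuild_task_list and run_due_tasks as hypothetical placeholders for the real work:
my $reload = 1;                      # force an initial configuration read
$SIG{HUP} = sub { $reload = 1 };     # only set a flag; do the real work outside the handler

while (1) {
    if ($reload) {
        rebuild_task_list();         # hypothetical: re-read the configuration, refill the heap
        $reload = 0;
    }
    run_due_tasks();                 # hypothetical: extract and execute due/overdue tasks
    sleep 1;
}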

Monday, May 11, 2009

To "cron" in Perl

My first (and to date only) attempt at releasing a Perl module into the wild (a.k.a. CPAN) was the poorly named and implemented Catalyst::Engine::JobQueue::POE. Why poorly named? Well, it has nothing to do with batch processing and job queues. What it does is let you run predefined HTTP requests against your Catalyst application at periodic intervals (à la cron). As for the poor implementation, it has only some basic features and it's based on the POE Catalyst engine, which is deprecated today. So it's a pretty sorry excuse for a cron replacement and, need I say it, highly NOT recommended.
Despite this minor setback, I have not abandoned the idea of writing a task scheduler in Perl. Therefore I'm starting a new project to implement cron and at functionality. It's a good pretext to practice Moose (and try out MooseX::* extensions) and IOC-style programming.
Because I have no chance of beating the unix cron implementation performance-wise, I've chosen to focus on flexibility, extensibility and Perl-specific integration (Catalyst integration, for example).

Tuesday, May 5, 2009

Logging to a database

Usually log messages are sent to a logfile or another host. In certain cases though, it pays off to put your log messages into a database.
We have a webapp which receives a large number of requests. The requests are dispatched to various drivers. For each request we'll store the following data in the request table: the user who made the request, the driver to which it has been dispatched and the status of the request result. In addition, we'll provide the driver with a logger object, which can be used to store messages in another table (message). Each message is associated with its corresponding request via a foreign key.

The Database

Let's use an SQLite database and set up the two tables:
CREATE TABLE request (
    id INTEGER PRIMARY KEY,
    user TEXT NOT NULL,
    driver TEXT NOT NULL,
    status TEXT NOT NULL
);

CREATE TABLE message (
    request_id INTEGER NOT NULL,
    level TEXT NOT NULL,
    msg_text TEXT NOT NULL,
    msg_timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_request_id FOREIGN KEY (request_id) REFERENCES request(id) ON DELETE CASCADE
);
Yes, foreign key constraints aren't enforced in SQLite, but you can enforce them with triggers.
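For instance, a sketch of such a trigger guarding inserts into message (updates and deletes would need similar ones):
-- reject messages that point to a nonexistent request
CREATE TRIGGER fki_message_request_id
BEFORE INSERT ON message
FOR EACH ROW BEGIN
    SELECT RAISE(ABORT, 'insert on message violates foreign key fk_request_id')
    WHERE (SELECT id FROM request WHERE id = NEW.request_id) IS NULL;
END;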
Our webapp already logs to files with the quite wonderful Log::Log4perl module. We want to keep using this module for the database logging.
Here's what we're gonna do: when a request has arrived and has been validated, we'll create a request record and use its id for all the messages logged while handling that request. The only problem is letting the driver know the request id as easily as possible.
As it turns out, Log::Log4perl has just the right tools for our task. They're called Log::Log4perl::Appender::DBI and Log::Log4perl::MDC. The first one maps a log call to an SQL query, which will insert the message into the message table. The second one is called Mapped Diagnostic Context and it provides us with a hash to store data in. This hash can be accessed directly from the appender, so the driver doesn't even need to know the request id and can log as easily to the database as it would to a file.

The Setup

Before we get to the proper Log::Log4perl configuration, let's see how we'll set up the request and the request id:
package WebApp;

use Log::Log4perl;
Log::Log4perl::init('log4perl.conf');

sub handle_request {
    ...
    $db->insert('request',{
        user => $user,
        driver => $driver,
        status => $status,
    });
    my $request_id = $db->last_insert_id;
    Log::Log4perl::MDC->put('req_id', $request_id);
    ...
}
That's all there is to it. We're using DBIx::Simple here; I've already talked about it, but here are some new features. The insert method is quite straightforward: it takes a table name and a hashref and inserts the hash values into the table, using the corresponding hash keys as column names. The last_insert_id method is mapped to the DBI method with the same name and returns the id of the last inserted row, namely our request id. Finally, we store the request id in the MDC hash for the appender to use.

The Configuration

Now we must configure Log::Log4perl to store the messages in the database, more precisely in the message table, with the proper request id. To do so we edit the log4perl.conf configuration file:
# request logger
log4perl.logger.request=DEBUG, request_log

# request appender
log4perl.appender.request_log=Log::Log4perl::Appender::DBI
log4perl.appender.request_log.datasource=DBI:SQLite:dbname=log.db
log4perl.appender.request_log.sql=INSERT INTO message (request_id, level, msg_text) VALUES (?, ?, ?);
log4perl.appender.request_log.params.1=%X{req_id}
log4perl.appender.request_log.params.2=%p
log4perl.appender.request_log.usePreparedStmt=1
... aaand we're done. What we've done is create a new logger (named request) and told it to use the request_log appender. This appender talks to the database we've configured (in the datasource field) and executes a prepared statement (it's faster!) with the given SQL query. Note that the query itself uses placeholders to avoid any SQL injection issues. The first query parameters are filled in from the config: the first one (mapped to request_id) is taken from the MDC hash we set up earlier, and the second one is the log level (debug, info, warn, error, ...). The rest of the parameters (well, there's only one left, msg_text) will be taken from the log call and yes, it's the log message itself.

The Payoff

Every driver can get a database logger object and log information into the database:
package MyDriver;

use Log::Log4perl;
my $logger = Log::Log4perl::get_logger('request');
...
$logger->warn('Invalid frobnitz detected');
What's more, SQLite will kindly record a timestamp for the log message too (in the msg_timestamp column). The messages will be linked to their request and you can select and aggregate them by level, time, user, driver or status.

The Bonus

The log message table structure can be extended easily. Just add these two lines to the Log::Log4perl config:
log4perl.appender.request_log.layout=Log::Log4perl::Layout::NoopLayout
log4perl.appender.request_log.warp_message=0
They will keep Log::Log4perl from concatenating the arguments of a log call. Now you can use more parameters in the log call and map them to other fields in the table. For example, we'd like the driver to provide an error code along with the log message. Just add an error_code column to the message table and change the SQL query config line to:
log4perl.appender.request_log.sql=INSERT INTO message (request_id, level, msg_text, error_code) VALUES (?, ?, ?, ?);
Now the driver can provide the code as the second parameter:
$logger->warn('Frobnitz is invalid', 'INVALID_FROBNITZ');
Even if the driver provides no second parameter, the error_code column will simply be NULL.

The Pitch

That's amazing! You've seen it with your own eyes, folks, yes, Log::Log4perl can log it all!! Even to a database.... with more than one field... and a many-to-one relationship!!! It's true! It logs like no other! Buy, er, install now!

Wednesday, April 29, 2009

Quick'n'Dirty SQL

I'm a convinced DBIx::Class user, but from time to time I find myself in need of a quick'n'dirty alternative. Something that lets me meddle with a database without having to set up result set classes or use loaders, but at the same time something more powerful and expressive than pure DBI.

Therefore, I present to you DBIx::Simple. It's halfway between the thought-out, "enterprisey" DBIx::Class and the raw, "bare-metal" DBI.
Let's see what you can do with it! I tend to use it in conjunction with SQL::Abstract, since I'm familiar with it via the search method of DBIx::Class.
First things first: setting up a DBIx::Simple object is trivial if you already have a connected DBIx::Class schema:
$my_dbs = DBIx::Simple->connect( $my_schema->storage->dbh );
Even if you don't have a schema, you can use the same semantics as with DBIC (namely an array):
$my_dbs = DBIx::Simple->connect( $dsn, $user, $pass, \%options );

Quick'n'Dirty Hashes

Let's say we have a database which describes articles (i.e. there's an articles table with an article_id and a name column) and we want to turn that into a hash with the id as key and the name as value:
%articles = $my_dbs->select('articles')->map;

Note: assumes the table has just these two columns and article_id is the first column.

OK, but what if we want to mark some articles in the database as unavailable and have the hash return only the available articles? Just add an available column and there you go:
$available_articles = $my_dbs->select('articles', [qw/article_id name/], { available => 1 })->map;
Note: now we specify the two columns which make up the hash, along with the select condition so order and additional columns are not relevant anymore.
You might have noticed that the first code snippet creates a hash, while the second one creates a hash reference. DBIx::Simple will detect the context and return the appropriate value.

Quick'n'Dirty Arrays

You can also pull out just a list of all article names, instead of a hash:

@article_list = $my_dbs->select('articles', ['name'])->flat;

Oh, you want just the unavailable articles, with ids instead of names? OK:

$unavailable_articles = $my_dbs->select('articles', ['article_id'], { available => 0 })->flat;

Note: as with map, DBIx::Simple will allow you to create either arrays or array references depending on context.

Quick'n'Dirty Complex Data Structures

Now let's consider something more complex. Let's get all the data pertaining to articles in a big array. Each record will be a hash with column names as keys and record values as values:

@all_articles = $my_dbs->select('articles')->hashes;

That's all there is to it. Of course you can get an array of arrays (with the arrays method, of course). For arrays of arrays it's best to specify a column list so as not to rely on the implicit column order provided by the database:

@articles_as_arrays = $my_dbs->select('articles', [qw/article_id name available supplier/])->arrays;

But let's move on to something even more wonderful, like a hash of hashes. The first-level hash will have the article_id as key and the second-level hashes will have column names as keys (just like in the previous examples):

$article_hash = $my_dbs->select('articles')->map_hashes('article_id');

As you can see, any way you want to slice and dice your database data, DBIx::Simple will lend a helpful hand.

Quick'n'Dirty Updates

Want to quickly mark all articles from the ACME supplier as unavailable, because they're temporarily unable to deliver? No problem:

$my_dbs->update('articles', { available => 0 }, { supplier => 'ACME'} );

The first hash holds the columns to set and the second the columns on which to filter.

Quick'n'Dirty Cleanup

And finally, as a bonus, a method to quickly empty all tables in a MySQL database (hopefully, it's the test database and not the production one):

@table_list = $my_dbs->query('SHOW TABLES')->flat;
$my_dbs->delete($_) foreach @table_list;

That's it! But there are plenty more examples and information available on CPAN.

Thursday, April 23, 2009

How to blog about Perl

In response to the Iron Man Challenge issued by the EPO, here's a new Perl blog. And some background info: I've been working in a Perl shop for almost ten years, doing mostly Web-based apps (yes, the old LAMP stack, where P stands for Perl, not PHP!), I've held a couple of Perl trainings and I like the way Perl is going now. I also like beer.
So how can I blog about Perl? I've tried it before and all the blogs died a quiet death of neglect and lack of motivation. My main problem is a lack of interesting things to say. But, nevertheless, I'm willing to give it one more try. Not for the awards or forfeits, just to help with the Perl marketing. Or Perl evangelism. Or "spreading-the-word", whatever you want to call it.
The name is Perl Ideas, as in "I'm having them", or Perl Ideals, as in "I'm looking for them". Ideals found so far:
That's all I found for now, but don't worry, I'm still looking.