SQLite3 in php — some notes
I’ve been working on the SQLite Object Cache plugin for WordPress, so I’ve had to figure out a few things about using SQLite itself and php’s SQLite3 extension.
Here are some notes.
There are various versions of the SQLite software around. If you will run your code on multiple different hosting providers and server configurations, you need to take that into account.
And, beware when working out your queries: more than one version of the SQLite query engine can access the same database at the same time. When I came to SQLite from the client/server SQL world, I wasn’t aware that was a thing. Maybe your php extension uses SQLite version 3.7, and your command-line sqlite3 client uses version 3.31. They can have the same file open at the same time, but SQL that works in one doesn’t always work in the other. It happened to me! So, working out queries in a SQLite command-line client and then building them into php code doesn’t necessarily do what you want. Here are some details of the differences.
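Because of this, it’s worth checking the library version at runtime before relying on newer syntax. In php that’s `SQLite3::version()`; here’s the same idea sketched with Python’s `sqlite3` module (the feature thresholds match the ones discussed below; comparing numeric tuples is safer than comparing version strings):

```python
import sqlite3

# The SQLite library version compiled into *this* client, which may
# differ from the version your other tools link against.
print(sqlite3.sqlite_version)

# Feature checks as numeric tuples, not strings ("3.10" < "3.9" as strings!).
version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))
supports_upsert = version >= (3, 24)            # ON CONFLICT ... DO UPDATE
supports_without_rowid = version >= (3, 8, 2)   # WITHOUT ROWID tables
```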
Upsert
Version 3.24 was the first version with UPSERT. It’s like MariaDB’s ON DUPLICATE KEY UPDATE
clause, with slightly different syntax. But some hosting providers’ php extensions ship older versions of SQLite, so portable software can’t rely on UPSERT unconditionally; we have to use it conditionally.
With SQLite’s flavor of UPSERT we can use SQL like this to upsert into a table keyed on name.
INSERT INTO object_cache (name, value) VALUES ('hello', 'world' )
ON CONFLICT(name) DO UPDATE SET value=excluded.value;
Working without UPSERT requires a sequence of two queries, plus some php application logic. We first try the UPDATE. Then, if the UPDATE changed zero rows (because the desired row wasn’t already in the table), we do the INSERT. And the overall operation must be wrapped in a transaction. This simplified code shows how to handle both versions correctly.
if ( version_compare( $sqlite_version, '3.24' ) >= 0 ) {
    $sqlite->exec(
        "INSERT INTO object_cache (name, value) VALUES ('hello', 'world')
         ON CONFLICT(name) DO UPDATE SET value=excluded.value" );
} else {
    $sqlite->exec( 'BEGIN' );
    $sqlite->exec( "UPDATE object_cache SET value='world' WHERE name='hello'" );
    if ( 0 === $sqlite->changes() ) {
        $sqlite->exec(
            "INSERT INTO object_cache (name, value) VALUES ('hello', 'world')" );
    }
    $sqlite->exec( 'COMMIT' );
}
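The same update-then-insert fallback can be exercised in any SQLite client. Here’s a sketch using Python’s `sqlite3` module, where the cursor’s `rowcount` plays the role of `SQLite3::changes()` (table and values mirror the php example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")

def upsert(con, name, value):
    # Portable upsert: try UPDATE first; INSERT only if nothing matched.
    # The transaction keeps the two statements atomic.
    with con:  # BEGIN ... COMMIT
        cur = con.execute(
            "UPDATE object_cache SET value = ? WHERE name = ?", (value, name))
        if cur.rowcount == 0:
            con.execute(
                "INSERT INTO object_cache (name, value) VALUES (?, ?)",
                (name, value))

upsert(con, "hello", "world")   # first call inserts
upsert(con, "hello", "again")   # second call updates in place
print(con.execute(
    "SELECT value FROM object_cache WHERE name = 'hello'").fetchone()[0])
```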
Clustered Primary Keys
Version 3.8.2 was the first version to offer user-declared clustered primary keys (WITHOUT ROWID tables), rather than always clustering on ROWID. For my application, a simple name/value table, this is a clear performance win. (The bottleneck query patterns contain WHERE name='something', so a clustered PK on name is perfect.) Earlier versions required a workaround: a plain table with a unique index. Here’s the code I used to create compatible tables.
/* later versions of SQLite3 have clustered primary keys, "WITHOUT ROWID" */
if ( version_compare( $sqlite_version, '3.8.2' ) < 0 ) {
    $t = "
        CREATE TABLE IF NOT EXISTS object_cache (
            name TEXT NOT NULL COLLATE BINARY,
            value BLOB,
            expires INT
        );
        CREATE UNIQUE INDEX IF NOT EXISTS name ON object_cache (name);
        CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);";
} else {
    $t = "
        CREATE TABLE IF NOT EXISTS object_cache (
            name TEXT NOT NULL PRIMARY KEY COLLATE BINARY,
            value BLOB,
            expires INT
        ) WITHOUT ROWID;
        CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);";
}
$sqlite->exec( $t );
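The same version-dependent DDL can be sketched in Python’s `sqlite3` module; `executescript()` runs the multi-statement string the way `SQLite3::exec()` does in php (schema matches the php sketch above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))

if version >= (3, 8, 2):
    # Clustered primary key: rows are stored in name order, so the
    # bottleneck lookup WHERE name = ? touches the index alone.
    ddl = """
    CREATE TABLE IF NOT EXISTS object_cache (
        name    TEXT NOT NULL PRIMARY KEY COLLATE BINARY,
        value   BLOB,
        expires INT
    ) WITHOUT ROWID;
    CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);
    """
else:
    # Older SQLite: ordinary ROWID table plus a separate unique index.
    ddl = """
    CREATE TABLE IF NOT EXISTS object_cache (
        name    TEXT NOT NULL COLLATE BINARY,
        value   BLOB,
        expires INT
    );
    CREATE UNIQUE INDEX IF NOT EXISTS name ON object_cache (name);
    CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);
    """
con.executescript(ddl)
```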
Exceptions
I should have enabled exceptions in place of old-school php warnings before I wrote my first line of SQLite code. I should have opened my database file with these lines of code.
$sqlite = new SQLite3(
    'path/to/db.sqlite',
    SQLITE3_OPEN_READWRITE | SQLITE3_OPEN_CREATE,
    '' );
$sqlite->enableExceptions( true );
Instead, I wasted a lot of programming time doing if ( false === $result )
testing of various function results. Once enabled, the extension’s exceptions work correctly, and I should have trusted them. Keep in mind that the extension throws plain Exception
instances rather than a specific Exception subclass. If you intersperse SQLite3 code with other code that throws its own exceptions, you’ll have to be careful with your catch clauses.
Timeouts
In my simple, but concurrently accessed, CRUD web application of SQLite, a data-manipulation operation sometimes took a really long time to finish. Most operations on small data values take less than 100 microseconds, but a very rare operation took half a second or more. (I don’t know why.) How rare? One or two among thousands of operations. You can see a summary of one set of operations here.
So I arbitrarily set the timeout to five seconds.
$sqlite->busyTimeout( 5000 );
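For comparison, Python’s `sqlite3` module takes the same setting as a `connect()` parameter, in seconds rather than milliseconds (the in-memory path here is just for illustration; the timeout only matters for an on-disk database shared between connections):

```python
import sqlite3

# Equivalent of $sqlite->busyTimeout( 5000 ): when another connection
# holds a conflicting lock, retry for up to five seconds before
# raising "database is locked".
con = sqlite3.connect(":memory:", timeout=5.0)
```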
My application has a scenario where it gets requests to upsert multiple rows of data; sometimes these requests implicate hundreds of rows or more. Now, it’s obviously good to wrap these multiple operations in BEGIN / COMMIT transactions to save time. In a concurrent system, though, we don’t want to hold write transactions open too long. So, every 32 rows I COMMIT the open transaction and BEGIN the next one. This is a common trick for database bulk loading. But, obviously, if your application requires ACID semantics for your multiple-row data-manipulation operations, you’ll need just one transaction.
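The chunked-commit trick looks like this, sketched with Python’s `sqlite3` module (batch size 32 as in the text; the table name follows the earlier examples, and plain INSERTs stand in for the upserts since the rows here are fresh):

```python
import sqlite3

BATCH = 32  # commit every 32 rows so one bulk load can't starve other writers

con = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")

rows = [(f"key-{i}", f"value-{i}") for i in range(100)]

con.execute("BEGIN")
for i, (name, value) in enumerate(rows, start=1):
    con.execute(
        "INSERT INTO object_cache (name, value) VALUES (?, ?)", (name, value))
    if i % BATCH == 0:
        con.execute("COMMIT")   # close this write transaction ...
        con.execute("BEGIN")    # ... and immediately open the next
if con.in_transaction:
    con.execute("COMMIT")       # commit the final partial batch, if any
```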
In my application, when I arranged to checkpoint the write log once in a while, the timeouts went away. The reason I had to checkpoint that log was that the .sqlite file and .sqlite-wal file both grew very large on a busy system. Force-checkpointing the write log kept the file sizes from getting ridiculously large. That probably makes queries and updates faster. (This is speculation, I don’t actually know how all that write-ahead-log of changed blocks works; if you know how I could instrument SQLite to tell me more, please write a comment, or point me to some advice.)
Network-attached storage
Some users have had intermittent trouble with timeouts. We are chasing the hypothesis that this is because their site files are on network-attached storage (NFS) rather than locally mounted HDD / SSD hardware. SQLite is well known to misbehave on network-attached storage, especially under high concurrent load, because it relies on advisory file locking to manage concurrency, and many NFS implementations get that wrong.
VACUUM
Avoid the VACUUM
operation if possible. It works by copying all the data from the old .sqlite file into a new one, and it blocks concurrent access while it runs.
close()
It looks like it’s OK to omit a call to $sqlite3_object->close()
at the end of a pageview; php seems to do the close operation automatically upon script termination.
Well, that statement above couldn’t be more wrong. The correct statement is:
Be sure to close() your SQLite3 connection or you’ll get a file-descriptor leak.
A user (Matt Jones, no relation; thanks, Matt) reported, and sent me a PR showing, that my application was leaving file descriptors open in a busy php-fpm setup, where pooled php worker processes each handle many requests. The file descriptors built up until the process ran out of them.
Matt fixed the problem with a __destruct()
method on the class that handles the connection.
DELETE … LIMIT
The SQLite database software has a compilation option (SQLITE_ENABLE_UPDATE_DELETE_LIMIT) without which statements of the form
DELETE FROM some_table WHERE something LIMIT number
and similar UPDATE statements with LIMIT clauses don’t work. So it’s necessary to use a query shape like this to do that sort of thing.
DELETE FROM some_table
WHERE primary_key IN
(SELECT primary_key FROM some_table WHERE something LIMIT number)
Beware: some but not all SQLite3 binaries have this compilation option set. I happened to test code against several binaries. By chance they all had this option set, and I didn’t know about the limitation.
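You can watch the subquery workaround in action with Python’s `sqlite3` module (the bundled library is typically built without the LIMIT option, so the subquery shape is the portable one; table and column names here are just illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, expires INT)")
con.executemany(
    "INSERT INTO object_cache (name, expires) VALUES (?, ?)",
    [(f"key-{i}", i) for i in range(10)])

# Delete the 3 soonest-to-expire rows. DELETE ... ORDER BY ... LIMIT only
# works when SQLite is compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT,
# so select the victim keys in a subquery instead.
con.execute("""
    DELETE FROM object_cache
     WHERE name IN
           (SELECT name FROM object_cache ORDER BY expires LIMIT 3)
""")
con.commit()
print(con.execute("SELECT COUNT(*) FROM object_cache").fetchone()[0])  # 7
```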
Checkpointing the write-ahead log
If you have a busy site, you may have many php instances using your SQLite3 database at the same time. SQLite cannot checkpoint its write-ahead log until all instances close it. The write-ahead log keeps growing unless it is checkpointed. On a busy site it could grow very large, if some instance always has it open.
Checkpointing the write-ahead log copies all changes in that log back to the main database, then resets the log file so new information is written to its beginning.
There’s a PRAGMA that briefly blocks new writers, waits for existing readers to finish with the log, and then does the checkpoint. It’s inexpensive on quiet systems.
Do PRAGMA wal_checkpoint(RESTART) when cleaning up, and occasionally in between, to prevent the write-ahead log from growing without bound on busy systems. This helps prevent slowdowns and crashes, and has been proven out in production on a few thousand sites.
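Here’s the checkpoint call sketched with Python’s `sqlite3` module. It needs an on-disk database, since in-memory databases have no write-ahead log (the temp-file path is just for the demo):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "db.sqlite")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode=WAL")  # switch to write-ahead logging
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")
con.execute("INSERT INTO object_cache VALUES ('hello', 'world')")
con.commit()

# RESTART copies the WAL's changed pages back into the main database
# file, then arranges for the next writer to restart the log from the
# beginning instead of appending. Returns (busy, wal frames, frames
# checkpointed); busy == 0 means the checkpoint completed.
busy, log_frames, checkpointed = con.execute(
    "PRAGMA wal_checkpoint(RESTART)").fetchone()
```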
Using a RAM Disk
I did a quick test of putting the .sqlite files on /dev/shm
on a big server. It didn’t noticeably help performance.