SQLite3 in php — some notes
I’ve been working on the SQLite Object Cache plugin for WordPress, so I’ve had to figure out a few things about using SQLite itself and php’s SQLite3 extension.
Here are some notes.
There are various versions of the SQLite software around. If you will run your code on multiple different hosting providers and server configurations, you need to take that into account.
And, beware when working out your queries: more than one version of the SQLite query engine can access the same database at the same time. When I came to SQLite from the client/server SQL world, I wasn’t aware that was a thing. Maybe your php extension uses SQLite version 3.7, and your command-line sqlite3 client uses version 3.31. They can have the same file open at the same time, but SQL that works in one doesn’t always work in the other. It happened to me! So, working out queries in a SQLite command-line client and then building them into php code doesn’t necessarily do what you want. Here are some details of the differences.
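Because of this, it’s worth checking the library version at runtime before relying on newer syntax. In php that’s `SQLite3::version()`; here’s the same idea sketched with Python’s `sqlite3` module (the feature thresholds match the ones discussed below; comparing numeric tuples is safer than comparing version strings):

```python
import sqlite3

# The SQLite library version compiled into *this* client, which may
# differ from the version your other tools link against.
print(sqlite3.sqlite_version)

# Feature checks as numeric tuples, not strings ("3.10" < "3.9" as strings!).
version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))
supports_upsert = version >= (3, 24)            # ON CONFLICT ... DO UPDATE
supports_without_rowid = version >= (3, 8, 2)   # WITHOUT ROWID tables
```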
Upsert
Version 3.24 was the first version with UPSERT. It’s like MariaDB’s ON DUPLICATE KEY UPDATE
clause, with slightly different syntax. But some hosting providers’ php extensions ship older versions of SQLite, so portable software can’t rely on UPSERT unconditionally; we have to use it conditionally.
With SQLite’s flavor of UPSERT we can use SQL like this to upsert into a table keyed on name.
INSERT INTO object_cache (name, value) VALUES ('hello', 'world' )
ON CONFLICT(name) DO UPDATE SET value=excluded.value;
Working without UPSERT requires a sequence of two queries, plus some php application logic. We first try the UPDATE. Then, if the UPDATE changed zero rows (because the desired row wasn’t already in the table), we do the INSERT. And the overall operation must be wrapped in a transaction. This simplified code shows how to handle both versions correctly.
if ( version_compare( $sqlite_version, '3.24' ) >= 0 ) {
    $sqlite->exec(
        "INSERT INTO object_cache (name, value) VALUES ('hello', 'world')
         ON CONFLICT(name) DO UPDATE SET value=excluded.value" );
} else {
    $sqlite->exec( 'BEGIN' );
    $sqlite->exec( "UPDATE object_cache SET value='world' WHERE name='hello'" );
    if ( 0 === $sqlite->changes() ) {
        $sqlite->exec(
            "INSERT INTO object_cache (name, value) VALUES ('hello', 'world')" );
    }
    $sqlite->exec( 'COMMIT' );
}
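The same update-then-insert fallback can be exercised in any SQLite client. Here’s a sketch using Python’s `sqlite3` module, where the cursor’s `rowcount` plays the role of `SQLite3::changes()` (table and values mirror the php example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")

def upsert(con, name, value):
    # Portable upsert: try UPDATE first; INSERT only if nothing matched.
    # The transaction keeps the two statements atomic.
    with con:  # BEGIN ... COMMIT
        cur = con.execute(
            "UPDATE object_cache SET value = ? WHERE name = ?", (value, name))
        if cur.rowcount == 0:
            con.execute(
                "INSERT INTO object_cache (name, value) VALUES (?, ?)",
                (name, value))

upsert(con, "hello", "world")   # first call inserts
upsert(con, "hello", "again")   # second call updates in place
print(con.execute(
    "SELECT value FROM object_cache WHERE name = 'hello'").fetchone()[0])
```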
Clustered Primary Keys
Version 3.8.2 was the first version to offer user-declared clustered primary keys (WITHOUT ROWID tables), rather than always clustering on ROWID. For my application, a simple name/value table, this is a clear performance win. (The bottleneck query patterns contain WHERE name='something', so a clustered PK on name is perfect.) Earlier versions required a workaround: a plain table with a unique index. Here’s the code I used to create compatible tables.
/* later versions of SQLite3 have clustered primary keys, "WITHOUT ROWID" */
if ( version_compare( $sqlite_version, '3.8.2' ) < 0 ) {
    $t = "
        CREATE TABLE IF NOT EXISTS object_cache (
            name TEXT NOT NULL COLLATE BINARY,
            value BLOB,
            expires INT
        );
        CREATE UNIQUE INDEX IF NOT EXISTS name ON object_cache (name);
        CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);";
} else {
    $t = "
        CREATE TABLE IF NOT EXISTS object_cache (
            name TEXT NOT NULL PRIMARY KEY COLLATE BINARY,
            value BLOB,
            expires INT
        ) WITHOUT ROWID;
        CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);";
}
$sqlite->exec( $t );
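The same version-dependent DDL can be sketched in Python’s `sqlite3` module; `executescript()` runs the multi-statement string the way `SQLite3::exec()` does in php (schema matches the php sketch above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))

if version >= (3, 8, 2):
    # Clustered primary key: rows are stored in name order, so the
    # bottleneck lookup WHERE name = ? touches the index alone.
    ddl = """
    CREATE TABLE IF NOT EXISTS object_cache (
        name    TEXT NOT NULL PRIMARY KEY COLLATE BINARY,
        value   BLOB,
        expires INT
    ) WITHOUT ROWID;
    CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);
    """
else:
    # Older SQLite: ordinary ROWID table plus a separate unique index.
    ddl = """
    CREATE TABLE IF NOT EXISTS object_cache (
        name    TEXT NOT NULL COLLATE BINARY,
        value   BLOB,
        expires INT
    );
    CREATE UNIQUE INDEX IF NOT EXISTS name ON object_cache (name);
    CREATE INDEX IF NOT EXISTS expires ON object_cache (expires);
    """
con.executescript(ddl)
```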
Exceptions
I should have enabled exceptions in place of old-school php warnings before I wrote my first line of SQLite code. I should have opened my database file with these lines of code.
$sqlite = new SQLite3(
    'path/to/db.sqlite',
    SQLITE3_OPEN_READWRITE | SQLITE3_OPEN_CREATE,
    '' );
$sqlite->enableExceptions( true );
Instead, I wasted a lot of programming time doing if ( false === $result )
testing of various function results. Once enabled, the extension’s exceptions work correctly, and I should have trusted them. Keep in mind that the extension throws plain Exception
instances rather than a specific Exception subclass. If you intersperse SQLite3 code with other code that throws its own exceptions, you’ll have to be careful with your catch clauses.
Timeouts
In my simple, but concurrently accessed, CRUD web application of SQLite, a data-manipulation operation sometimes took a really long time to finish. Most operations on small data values take less than 100 microseconds, but a very rare operation took half a second or more. (I don’t know why.) How rare? One or two among thousands of operations. You can see a summary of one set of operations here.
So I arbitrarily set the timeout to five seconds.
$sqlite->busyTimeout( 5000 );
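For comparison, Python’s `sqlite3` module takes the same setting as a `connect()` parameter, in seconds rather than milliseconds (the in-memory path here is just for illustration; the timeout only matters for an on-disk database shared between connections):

```python
import sqlite3

# Equivalent of $sqlite->busyTimeout( 5000 ): when another connection
# holds a conflicting lock, retry for up to five seconds before
# raising "database is locked".
con = sqlite3.connect(":memory:", timeout=5.0)
```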
My application has a scenario where it gets requests to upsert multiple rows of data; sometimes these requests implicate hundreds of rows or more. Now, it’s obviously good to wrap these multiple operations in BEGIN / COMMIT transactions to save time. In a concurrent system, though, we don’t want to hold write transactions open too long. So, every 32 rows I COMMIT the open transaction and BEGIN the next one. This is a common trick for database bulk loading. But, obviously, if your application requires ACID semantics for your multiple-row data-manipulation operations, you’ll need just one transaction.
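The chunked-commit trick looks like this, sketched with Python’s `sqlite3` module (batch size 32 as in the text; the table name follows the earlier examples, and plain INSERTs stand in for the upserts since the rows here are fresh):

```python
import sqlite3

BATCH = 32  # commit every 32 rows so one bulk load can't starve other writers

con = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")

rows = [(f"key-{i}", f"value-{i}") for i in range(100)]

con.execute("BEGIN")
for i, (name, value) in enumerate(rows, start=1):
    con.execute(
        "INSERT INTO object_cache (name, value) VALUES (?, ?)", (name, value))
    if i % BATCH == 0:
        con.execute("COMMIT")   # close this write transaction ...
        con.execute("BEGIN")    # ... and immediately open the next
if con.in_transaction:
    con.execute("COMMIT")       # commit the final partial batch, if any
```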
In my application, when I arranged to checkpoint the write log once in a while, the timeouts went away. The reason I had to checkpoint that log was that the .sqlite file and .sqlite-wal file both grew very large on a busy system. Force-checkpointing the write log kept the file sizes from getting ridiculously large. That probably makes queries and updates faster. (This is speculation, I don’t actually know how all that write-ahead-log of changed blocks works; if you know how I could instrument SQLite to tell me more, please write a comment, or point me to some advice.)
Network-attached storage
Some users have had intermittent trouble with timeouts. We are chasing the hypothesis that this is because their site files are on network-attached storage (NFS) rather than locally mounted HDD / SSD hardware. SQLite is well known to misbehave on network-attached storage, especially under high concurrent load, because it relies on advisory file locking to manage concurrency, and many NFS implementations get that wrong.
VACUUM
Avoid the VACUUM
operation if possible. It works by copying all the data from the old .sqlite file into a new one, and it blocks concurrent access while it runs.
close()
It looks like it’s OK to omit a call to $sqlite3_object->close()
at the end of a pageview; php seems to do the close operation automatically upon script termination.
Well, that statement above couldn’t be more wrong. The correct statement is:
Be sure to close() your SQLite3 connection or you’ll get a file-descriptor leak.
A user (Matt Jones, no relation; thanks, Matt) reported, and sent me a PR showing, that my application was leaving file descriptors open in a busy php-fpm setup, where pooled php worker processes each handle many requests. The file descriptors built up until the process ran out of them.
Matt fixed the problem with a __destruct()
method on the class that handles the connection.
DELETE … LIMIT
The SQLite database software has a compilation option (SQLITE_ENABLE_UPDATE_DELETE_LIMIT) without which statements of the form
DELETE FROM some_table WHERE something LIMIT number
and similar UPDATE statements with LIMIT clauses don’t work. So it’s necessary to use a query shape like this to do that sort of thing.
DELETE FROM some_table
WHERE primary_key IN
(SELECT primary_key FROM some_table WHERE something LIMIT number)
Beware: some but not all SQLite3 binaries have this compilation option set. I happened to test code against several binaries. By chance they all had this option set, and I didn’t know about the limitation.
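You can watch the subquery workaround in action with Python’s `sqlite3` module (the bundled library is typically built without the LIMIT option, so the subquery shape is the portable one; table and column names here are just illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, expires INT)")
con.executemany(
    "INSERT INTO object_cache (name, expires) VALUES (?, ?)",
    [(f"key-{i}", i) for i in range(10)])

# Delete the 3 soonest-to-expire rows. DELETE ... ORDER BY ... LIMIT only
# works when SQLite is compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT,
# so select the victim keys in a subquery instead.
con.execute("""
    DELETE FROM object_cache
     WHERE name IN
           (SELECT name FROM object_cache ORDER BY expires LIMIT 3)
""")
con.commit()
print(con.execute("SELECT COUNT(*) FROM object_cache").fetchone()[0])  # 7
```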
Checkpointing the write-ahead log
If you have a busy site, you may have many php instances using your SQLite3 database at the same time. SQLite cannot checkpoint its write-ahead log until all instances close it. The write-ahead log keeps growing unless it is checkpointed. On a busy site it could grow very large, if some instance always has it open.
Checkpointing the write-ahead log copies all changes in that log back to the main database, then resets the log file so new information is written to its beginning.
There’s a PRAGMA that briefly blocks new writers, waits for existing readers to finish with the log, and then does the checkpoint. It’s inexpensive on quiet systems.
Do PRAGMA wal_checkpoint(RESTART) when cleaning up, and occasionally in between, to prevent the write-ahead log from growing without bound on busy systems. This helps prevent slowdowns and crashes, and has been proven out in production on a few thousand sites.
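Here’s the checkpoint call sketched with Python’s `sqlite3` module. It needs an on-disk database, since in-memory databases have no write-ahead log (the temp-file path is just for the demo):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "db.sqlite")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode=WAL")  # switch to write-ahead logging
con.execute("CREATE TABLE object_cache (name TEXT PRIMARY KEY, value BLOB)")
con.execute("INSERT INTO object_cache VALUES ('hello', 'world')")
con.commit()

# RESTART copies the WAL's changed pages back into the main database
# file, then arranges for the next writer to restart the log from the
# beginning instead of appending. Returns (busy, wal frames, frames
# checkpointed); busy == 0 means the checkpoint completed.
busy, log_frames, checkpointed = con.execute(
    "PRAGMA wal_checkpoint(RESTART)").fetchone()
```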
Using a RAM Disk
I did a quick test of putting the .sqlite files on /dev/shm
on a big server. It didn’t noticeably help performance.