PHP and SQLite FTS4: How to process the matchinfo function part II

Understanding the output from the matchinfo() function

Part I: Reading the binary output from the matchinfo() function

To analyze what the output of the matchinfo() function means, we’ll use a different example dataset than the one from the SQLite FTS4 documentation. Since I’m a bird photographer, let’s create a virtual FTS4 table holding information about photos of birds. It consists of the five columns id, title, description, species and common name (note: id is the image id and not the FTS4 rowid). Then we fill the table with four records making up our example data:

// create the example database
try {
  $db = new PDO('sqlite:example.sqlite');
} catch (PDOException $error) {
  echo $error->getMessage();
}

// create a virtual fts4 table and populate it with example data
try {
  $db->exec("CREATE VIRTUAL TABLE images USING fts4(imgId, title, description, species, speciesEn);
    INSERT INTO images VALUES(1, 'Great Spotted Woodpecker', 'A great spotted woodpecker with a caterpillar in its beak.', 'Dendrocopos major', 'Great Spotted woodpecker');
    INSERT INTO images VALUES(2, 'Woodpecker at the pond', 'A green woodpecker drinks water.', 'Picus viridis', 'Green Woodpecker');
    INSERT INTO images VALUES(3, 'Woodpecker', 'A middle spotted woodpecker is looking for food on an oak tree.', 'Dendrocopos medius', 'Middle Spotted Woodpecker');
    INSERT INTO images VALUES(4, 'Woodpecker', 'A lesser yellownape showing its green wings.', 'Picus chlorolophus', 'Lesser Yellownape');");
} catch (PDOException $error) {
  echo $error->getMessage().'<br>';
}

idNext we perform a FTS4 SELECT query to search for green woodpecker using the matchinfo() function. We also use a helper function to format the output visually in groups of three integers (for the sake of this example). Note that the query uses an implicit AND operator. If explicit AND queries are supported (e.g. green AND woodpecker) depends on whether enhanced query syntax is available or not. Wrap words in double quotes to search for the whole phrase, e.g “green woodpecker”.

// use matchinfo when searching for green woodpecker using an implicit AND operator
$data = $db->query("SELECT imgId, MATCHINFO(images) info FROM images WHERE images MATCH 'green woodpecker'");

// convert the binary output to integers and format integers in groups of three
while ($row = $data->fetch(PDO::FETCH_ASSOC)) {
  // matchinfo returns a blob of 32-bit unsigned integers in machine byte-order
  $arrInt32 = unpack('L*', $row['info']);
  echo 'row '.$row['imgId'].': '.formatOutput($arrInt32).'<br>';
}

/**
 * Format the array of integers with spaces between according to the example in the SQLite FTS documentation
 * @param array $arrInt32
 */
function formatOutput($arrInt32)
{
  $str = '' ;
  foreach ($arrInt32 as $i => $int) {
    $str.= ($i % 3 === 0 ? ' ' : '').$int;
  }
  
  return $str;
}

Our FTS4 search query will match records with id 2 and id 4. These are the only two rows of our table, where both words green and woodpecker occur. The converted and formatted output of the matchinfo function looks like the following:

row 2: 25 000 000 232 000 111 000 144 133 000 133
row 4: 25 000 000 132 000 011 000 144 033 000 033

The first two integers are simply the total number of search words of the query (green + woodpecker = 2) and the total number of columns in the table (id + title + description + species + speciesEn = 5).

Next, follow ten groups of three integers each. Each group represents a combination of the search words with a column (e.g. 2 * 5 = 10).

The first integer in the group gives the number of times the word (or phrase) matched within the content of the column of the current row. The second gives the number of times the word (or phrase) matched in the column of all rows in the table. The last number gives the number of rows where the word (or phrase) matched at least once.

For example, let’s look at the third group of integers from the output of matchinfo for row 2, e.g. 232. It represents the combination of the word green by the table column description:

FTS4
green woodpecker
id title description species speciesEn id title description species speciesEn
000 000 232 000 111 000 144 133 000 133
000 000 132 000 011 000144 033 000 033
Matrix of the search words green woodpecker by table columns as returned by the matchinfo function. See text to understand the meaning of the color coding

The word green matched in the column description twice within the content of the current row, a total of three times in all rows and it matched two rows overall:

idtitledescriptionspeciesspeciesEn
1Great Spotted WoodpeckerA great spotted woodpecker drinks water.Dendrocopos majorGreat Spotted Woodpecker
2A Woodpecker at the pondA green woodpecker with a green caterpillar in its beak.Picus viridisGreen Woodpecker
3WoodpeckerA middle spotted woodpecker is looking for food on an oak tree.Dendrocopos mediusMiddle Spotted Woodpecker
4WoodpeckerA lesser yellownape showing its green wings.Picus chlorolophusLesser Yellownape
Tabular representation of our example data in the virtual table images containing four records about photos with birds. The color coding shows the different number of times the word green matched within the column description of the search query green woodpecker.

To complete the second part of this series, we will interpret another integer triplet: integers 144 of the seventh group representing woodpecker by title of either the first or second result of the search query. The word woodpecker matched in the column title once within the content of the current row, four times in the content of all rows and it matched within four rows overall.

PHP and SQLite FTS4: How to process the matchinfo function part I

Reading the binary output from the matchinfo() function

SQLite’s MATCHINFO() function provides metrics that are useful for filtering or sorting results of a query according to relevance. The function returns a binary string representing a variable number of 32-bit unsigned integers in machine byte-order. It can be read into an array of integers using the PHP unpack() function passing 'L*' as the format parameter.

Continue reading “PHP and SQLite FTS4: How to process the matchinfo function part I”

SQLite FTS4: Standard vs Enhanced Query Syntax

The SQLite documentation talks about two ways to perform full text queries: standard or enhanced query syntax. Be aware: which one is available depends on how SQLite was compiled. You might, like me, run into troubles when developing locally using one syntax and then publishing remotely only to find out your query does not return anything. You can check which version is supported by your server by querying the PRAGMA compile_options, e.g. for PHP:

Continue reading “SQLite FTS4: Standard vs Enhanced Query Syntax”

Minify CSS with PhpStorm automatically

Installation guide for PhpStorm 2019.3.1 to minify CSS files automatically after editing using the command line interface for CSSO as a file watcher. You could also use yuicompressor instead, but version 2.4.8 chokes on CSS @keyframes rules.

Continue reading “Minify CSS with PhpStorm automatically”

BIOS update with Linux Mint

Updating your BIOS on Linux Mint 19 can be as easy as on Windows. If your computer is listed on the Linux Vendor Firmware Service, then all you have to do is:

~ $ sudo apt install fwupd
~ $ fwupdmgr get-updates
~ $ fwupdmgr update

The firmware-manager might even get included in the next version of Linux Mint 20.

Linux Mint 19.1: Lightning Calendar in English, but dates in the 24-hour time format

I like having the user interface of my Linux Mint 19.1 in English, but the regional settings such as numbers and currency in a different language such as Swiss German.

Screenshot of the Language Settings dialog in Linux Mint 19.1
Continue reading “Linux Mint 19.1: Lightning Calendar in English, but dates in the 24-hour time format”

VirtualBox 6: How to enable symlinks for shared folders

For security reasons, creating symbolic links in a shared folder is disabled in the guest OS (ticket 10085 and manual 5.3 Shared Folders). If you trust your Linux guest OS, you can enable symlinking from the host OS with the following command:

~ $ VBoxManage setextradata "VM_Name" VBoxInternal2/SharedFoldersEnableSymlinksCreate/Folder_Name 1

Replace VM_Name and Folder_Name with your own names. To get a list of names of your virtual machines, execute in the host console:

~ $ VBoxManage list vms

You can also list information about a specific machine with:

~ $ VBoxManage showvminfo "VM_Name"

Now you can start your Linux guest OS and create the symbolic link, in my case  I wanted to point my shared folder to the apache directory /var/www/html, e.g.

~ $ sudo ln -s /media/sf_Websites /var/www/html

The search for the Okinawa Rail

The Okinawa Rail is a flightless bird endemic to the Yambaru region in the north of Okinawa. The Japanese call it Yambaru Kuina, and though stylized versions of it are ubiquitous on packaging, advertising or road signs, most people have never actually seen it.

photo of street sign with a stylized Okinawa rail photo of a warning sign depicting an Okinawa rail
Okinawa rail on street signs.
Continue reading “The search for the Okinawa Rail”

How to install ExifTool with Composer

I’m working on a PHP project, that uses Phil Harveys excellent ExifTool to read Exif and XMP tags from photos. Since ExifTool is written in Perl, there is of course no composer.json to install this dependency directly with Composer. But you can create your own package information in your master composer.json as follows:

{
    ...
    "repositories": [
        ...        
        {
            "type": "package",
            "package": {
                "name": "philharvey/exiftool",
                "version": "10.75",
                "dist": {
                    "url": "https://sno.phy.queensu.ca/~phil/exiftool/Image-ExifTool-10.75.tar.gz",
                    "type": "tar"
                }
            }
        }
    ],
    "require": {
        ...
        "philharvey/exiftool": "10.*"
    }
}

This will automatically download ExifTool version 10.75 and extract it into the vendor folder /vendor/philharvey/exiftool/

Note, that the package type ‘tar’ should extract the file completely. If that is not the case, make sure you have the latest composer version 1.6.2 installed. Some Linux distributions (e.g. Linux Mint 18.3 and probably Ubuntu 16.4 too) are still on version 1.0.0-beta2, which does not support extracting ‘tar.gz’. Read my quick tutorial on how to upgrade composer on Linux Mint 18.3.

How to update Composer on Linux Mint 18.3

When you install Composer (the dependency manager for PHP) globally on Linux Mint 18.3 with the package manager…

$ sudo apt install composer
$ composer -V
Composer version @package_branch_alias_version@ (1.0.0-beta2) 2016-03-27 16:00:34

…you’ll get the totally outdated version 1.0.0-beta2. The problem is that this version does not support extracting compressed package files ‘tar.gz’. It also does not support updating itself with:

$ composer selfupdate

So, you need to download the latest composer version (currently 1.6.2) manually and then put it into the directory /usr/local/bin/composer (which overrides /usr/bin/composer)

$ sudo mv composer.phar /usr/local/bin/composer
$ chmod +x /usr/local/bin/composer

now login – logout to apply the changes, voila:

$ composer -V
Composer version 1.6.2 2018-01-05 15:28:41