{"id":1822,"date":"2020-08-30T11:47:47","date_gmt":"2020-08-30T09:47:47","guid":{"rendered":"https:\/\/www.speich.net\/articles\/?p=1822"},"modified":"2021-08-21T15:37:22","modified_gmt":"2021-08-21T13:37:22","slug":"php-and-sqlite-fts4-how-to-process-the-matchinfo-function-2","status":"publish","type":"post","link":"https:\/\/www.speich.net\/articles\/en\/2020\/08\/30\/php-and-sqlite-fts4-how-to-process-the-matchinfo-function-2\/","title":{"rendered":"PHP and SQLite FTS4: How to process the matchinfo function part II"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Understanding the output from the matchinfo() function<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.speich.net\/articles\/en\/2020\/07\/04\/php-and-sqlite-fts4-how-to-process-the-matchinfo-function\/\">Part I: Reading the binary output from the matchinfo() function<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To analyze what the output of the matchinfo() function means, we&#8217;ll use a different example dataset than the one from the SQLite FTS4 documentation. Since I&#8217;m a <a href=\"https:\/\/www.speich.net\/photo\/photodb\/photo-en.php?theme=1\">bird photographer<\/a>, let&#8217;s create a virtual FTS4 table holding information about photos of birds. It consists of the five columns id, title, description, species and common name (note: id is the image id and not the FTS4 rowid). Then we fill the table with four records making up our example data:<\/p>\n\n\n\n<pre><code class=\"language-php\">\/\/ create the example database\ntry {\n  $db = new PDO('sqlite:example.sqlite');\n} catch (PDOException $error) {\n  echo $error-&gt;getMessage();\n}\n\n\/\/ create a virtual fts4 table and populate it with example data\ntry {\n  $db-&gt;exec(\"CREATE VIRTUAL TABLE images USING fts4(imgId, title, description, species, speciesEn);\n    INSERT INTO images VALUES(1, 'Great Spotted Woodpecker', 'A great spotted woodpecker with a caterpillar in its beak.', 'Dendrocopos major', 'Great Spotted woodpecker');\n    INSERT INTO images VALUES(2, 'Woodpecker at the pond', 'A green woodpecker drinks water.', 'Picus viridis', 'Green Woodpecker');\n    INSERT INTO images VALUES(3, 'Woodpecker', 'A middle spotted woodpecker is looking for food on an oak tree.', 'Dendrocopos medius', 'Middle Spotted Woodpecker');\n    INSERT INTO images VALUES(4, 'Woodpecker', 'A lesser yellownape showing its green wings.', 'Picus chlorolophus', 'Lesser Yellownape');\");\n} catch (PDOException $error) {\n  echo $error-&gt;getMessage().'&lt;br&gt;';\n}<\/code><\/pre>\n\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">Next we perform a FTS4 SELECT query to search for <em>green woodpecker<\/em> using the <code>matchinfo()<\/code> function. We also use a helper function to format the output visually in groups of three integers (for the sake of this example). Note that the query uses an implicit AND operator. If explicit AND queries are supported (e.g. <em>green AND woodpecker<\/em>) depends on whether <a href=\"https:\/\/www.speich.net\/articles\/en\/2020\/07\/03\/sqlite-fts4-standard-vs-enhanced-query-syntax\/\">enhanced query syntax<\/a> is available or not. Wrap words in double quotes to search for the whole phrase, e.g <em>&#8220;green woodpecker&#8221;<\/em>.<\/p>\n\n\n\n<pre><code class=\"language-php\">\/\/ use matchinfo when searching for green woodpecker using an implicit AND operator\n$data = $db-&gt;query(\"SELECT imgId, MATCHINFO(images) info FROM images WHERE images MATCH 'green woodpecker'\");\n\n\/\/ convert the binary output to integers and format integers in groups of three\nwhile ($row = $data-&gt;fetch(PDO::<em>FETCH_ASSOC<\/em>)) {\n  \/\/ matchinfo returns a blob of 32-bit unsigned integers in machine byte-order\n  $arrInt32 = <em>unpack<\/em>('L*', $row['info']);\n  echo 'row '.$row['imgId'].': '.formatOutput($arrInt32).'&lt;br&gt;';\n}\n\n\/**\n * Format the array of integers with spaces between according to the example in the SQLite FTS documentation\n * @param array $arrInt32\n *\/\nfunction formatOutput($arrInt32)\n{\n  $str = '' ;\n  foreach ($arrInt32 as $i =&gt; $int) {\n    $str.= ($i % 3 === 0 ? ' ' : '').$int;\n  }\n  \n  return $str;\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Our FTS4 search query will match records with id 2 and id 4. These are the only two rows of our table, where both words <em>green<\/em> and <em>woodpecker<\/em> occur. The converted and formatted output of the matchinfo function looks like the following:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">row 2: <code class=\"language-php\">25 000 000 232 000 111 000 144 133 000 133<\/code><br>row 4: <code class=\"language-php\">25 000 000 132 000 011 000 144 033 000 033<\/code><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first two integers are simply the total number of search words of the query (<em>green<\/em> + <em>woodpecker<\/em> = 2)  and the total number of columns in the table (<em>id<\/em> + <em>title<\/em> + <em>description<\/em> + <em>species<\/em> + <em>speciesEn<\/em> = 5).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next, follow ten groups of three integers each. Each group represents a combination of the search words with a column (e.g. 2 * 5 = 10).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first integer in the group gives the number of times the word (or phrase) matched within the content of the column of the current row. The second gives the number of times the word (or phrase) matched in the column of all rows in the table. The last number gives the number of rows where the word (or phrase) matched at least once.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For example, let&#8217;s look at the third group of integers from the output of matchinfo for row 2, e.g. <code class=\"language-php\">232<\/code>. It represents the combination of the word <em>green<\/em> by the table column <em>description<\/em>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n            <thead>\n            <tr>\n                <th colspan=\"5\">green<\/th>\n                <th colspan=\"5\">woodpecker<\/th>\n            <\/tr>\n            <tr>\n                <th>id<\/th>\n                <th>title<\/th>\n                <th>description<\/th>\n                <th>species<\/th>\n                <th>speciesEn<\/th>\n                <th>id<\/th>\n                <th>title<\/th>\n                <th>description<\/th>\n                <th>species<\/th>\n                <th>speciesEn<\/th>\n            <\/tr>\n            <\/thead>\n            <tbody>\n            <tr>\n                <td>000<\/td>\n                <td>000<\/td>\n                <td><span class=\"ellipse color1\">2<\/span><span class=\"ellipse color2\">3<\/span><span class=\"ellipse color3\">2<\/span><\/td>\n                <td>000<\/td>\n                <td>111<\/td>\n                <td>000<\/td>\n                <td>144<\/td>\n                <td>133<\/td>\n                <td>000<\/td>\n                <td>133<\/td>\n            <\/tr>\n            <tr>\n                <td>000<\/td>\n                <td>000<\/td>\n                <td>132<\/td>\n                <td>000<\/td>\n                <td>011<\/td>\n                <td>000<\/td>FTS4\n                <td>144<\/td>\n                <td>033<\/td>\n                <td>000<\/td>\n                <td>033<\/td>\n            <\/tr>\n            <\/tbody>\n        <\/table><figcaption>Matrix of the search words <em>green woodpecker<\/em> by table columns as returned by the matchinfo function. See text to understand the meaning of the color coding<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The word <em>green<\/em> matched in the column <em>description<\/em> <span class=\"color1\"><strong>twice<\/strong><\/span> within the content of the current row, a total of <span class=\"color2\"><strong>three<\/strong><\/span> times in all rows and it matched <span class=\"color3\"><strong>two<\/strong><\/span> rows overall:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>id<\/th><th>title<\/th><th>description<\/th><th>species<\/th><th>speciesEn<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>Great Spotted Woodpecker<\/td><td>A great spotted woodpecker drinks water.<\/td><td>Dendrocopos major<\/td><td>Great Spotted Woodpecker<\/td><\/tr><tr><td>2<\/td><td>A Woodpecker at the pond<\/td><td style=\"border: 0.2em solid #ffdb0b;\">A <span class=\"ellipse color2\"><span class=\"ellipse color1\">green<\/span><\/span> woodpecker with a <span class=\"ellipse color2\"><span class=\"ellipse color1\">green<\/span><\/span> caterpillar in its beak.<\/td><td>Picus viridis<\/td><td>Green Woodpecker<\/td><\/tr><tr><td>3<\/td><td>Woodpecker<\/td><td>A middle spotted woodpecker is looking for food on an oak tree.<\/td><td>Dendrocopos medius<\/td><td>Middle Spotted Woodpecker<\/td><\/tr><tr><td>4<\/td><td>Woodpecker<\/td><td style=\"border: 0.2em solid #ffdb0b;\">A lesser yellownape showing its <span class=\"ellipse color2\">green<\/span> wings.<\/td><td>Picus chlorolophus<\/td><td>Lesser Yellownape<\/td><\/tr><\/tbody><\/table><figcaption>Tabular representation of our example data in the virtual table images containing four records about photos with birds. The color coding shows the different number of times the word <em>green<\/em> matched within the column <em>description<\/em> of the search query <em>green woodpecker<\/em>.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">To complete the second part of this series, we will interpret another integer triplet: integers <code class=\"language-php\">144<\/code> of the seventh group representing <em>woodpecker<\/em> by <em>title<\/em> of either the first or second result of the search query. The word <em>woodpecker<\/em> matched in the column <em>title<\/em> once within the content of the current row, four times in the content of all rows and it matched within four rows overall.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understanding the output from the matchinfo() function Part I: Reading the binary output from the matchinfo() function To analyze what the output of the matchinfo() function means, we&#8217;ll use a different example dataset than the one from the SQLite FTS4 documentation. Since I&#8217;m a bird photographer, let&#8217;s create a virtual FTS4 table holding information about &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.speich.net\/articles\/en\/2020\/08\/30\/php-and-sqlite-fts4-how-to-process-the-matchinfo-function-2\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;PHP and SQLite FTS4: How to process the matchinfo function part II&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[285,284,73,82],"class_list":["post-1822","post","type-post","status-publish","format-standard","hentry","category-php","tag-fts3","tag-fts4","tag-php","tag-sqlite","entry"],"_links":{"self":[{"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/posts\/1822","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/comments?post=1822"}],"version-history":[{"count":83,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/posts\/1822\/revisions"}],"predecessor-version":[{"id":2028,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/posts\/1822\/revisions\/2028"}],"wp:attachment":[{"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/media?parent=1822"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/categories?post=1822"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.speich.net\/articles\/wp-json\/wp\/v2\/tags?post=1822"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}