This course is currently in draft.

Introduction

This course is for implementers and solution designers and covers Funnelback implementation and design when integrating with Squiz Matrix.

Implementers should complete the FUNL201-203 courses and solution designers should complete FUNL301 before attempting this course.

Tips and hints appear throughout the exercises to provide extra knowledge on the current topic. They will look like:

This box is used to provide links to further reading available on the current topic.
This box is used to provide tips or hints related to the current topic.
This box is used to provide important advice relating to the current topic. This includes advice to be aware of to avoid common errors.

What this workshop will cover:

  • Designing Squiz Matrix for Funnelback

  • Crawling Squiz Matrix

  • Using Squiz Matrix with push collections

  • Search results integration

  • Auto-completion integration

  • Recommendations

  • Debugging the user interface

  • Using Funnelback as a data source

Prerequisites to completing the course:

  • FUNL201, FUNL202, FUNL203, FUNL301

1. Designing Squiz Matrix for Funnelback

This section looks at the things that should be considered during the requirements planning and design phases of any Squiz Matrix site design to plan and prepare the site for integration with Funnelback.

Some of these items can be retro-fitted to existing implementations and others require consideration during the design phase.

1.1. Control what is indexed

1.1.1. Robots.txt

Use robots.txt to prevent well behaved crawlers from accessing complete sections of a site, based on the URL path. Funnelback will honour robots.txt directives.

When configuring robots.txt always disallow the search results pages, and also consider preventing access to any pages where Funnelback is being used to generate the page content (such as document/file browse sections).

The Funnelback user agent can be used to create Funnelback specific rules. E.g. The following robots.txt prevents web robots from crawling the /login and /search content but allows access to Funnelback.

User-agent: *
Disallow: /search
Disallow: /login

User-agent: FunnelBack
Disallow:

Note: robots.txt only allows wildcards in the User-agent directive, and there is no Allow: directive (only Disallow:).
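The effect of the rules above can be checked with a short script before running a crawl. The following is a minimal sketch that uses Python's standard-library robots.txt parser; it shows how a standards-following parser reads the example file and is not Funnelback's own implementation. The bot name SomeOtherBot and the helper can_crawl are hypothetical names used for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /login

User-agent: FunnelBack
Disallow:
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # SomeOtherBot is a hypothetical crawler name.
    for agent in ("FunnelBack", "SomeOtherBot"):
        for path in ("/search", "/login", "/about"):
            url = "http://www.example.com" + path
            print(agent, path, can_crawl(agent, url))
```

An empty Disallow: line places no restriction on the named user agent, which is why the FunnelBack entry is allowed to fetch paths that are blocked for everyone else.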

Tutorial: Create a robots.txt
  1. Follow the steps outlined in the article below to set up a robots.txt. The rules should prevent access to the search results page for all user agents and allow the Funnelback user agent access to some content that is disallowed for other user agents.

  2. Crawl and index the site and observe that Funnelback honours the rules defined in the robots.txt.

1.1.2. Robots meta tags

Page level robots directives can be defined within the <head> of an HTML document.

These directives take the form of HTML <meta> elements and allow page-level control of well-behaved robots.

Funnelback will honour the following robots meta tags:

  • index, follow: this tells the web crawler to index the page and also follow any links that are found on the page. This is the default behaviour.

  • index, nofollow: this tells the web crawler to index the page but to not follow any links that are found on the page. This only applies to links found on this page, and doesn’t prevent the link being indexed if it’s found somewhere else on the site.

  • noindex, follow: this tells the web crawler to not include the page in the search index, but to still extract and follow any links that are found within the page. This is good if you wish to exclude home page(s) from a crawl as you need to crawl through the page to get to child pages.

  • noindex, nofollow: this tells the web crawler to not include the page in the index and also to not follow any links from this page. This has the same end effect of having the page listed in the robots.txt file, but is controllable at the page level.

A nofollow tag will only mean that Funnelback doesn’t add the links found on the page to the crawl frontier. If the link is found elsewhere without any form of exclusion then it will still appear in the index.
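To make the four combinations concrete, the sketch below reads a page's robots meta tag and reports whether a well-behaved crawler should index the page and follow its links. It is an illustration written with Python's standard-library HTML parser, not Funnelback's crawler code; the names RobotsMetaParser and robots_policy are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_policy(html: str) -> dict:
    """Return the index/follow decision; both default to True when no tag is set."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
    print(robots_policy(page))
```

A page with no robots meta tag falls through to the default of index, follow, matching the first bullet above.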

The following directives can be added to the (no)index/(no)follow directives to provide further control of crawler behaviour.

  • noarchive: this tells supporting web crawlers to index the page but to not include the document in the cache (so cache links won’t work)

  • nosnippet: this tells supporting web crawlers to index the page but to not return a snippet when rendering results. This also prevents the caching of the page (as defined by the noarchive directive above).

There are a number of additional robots meta tags that have been defined by third parties such as Google (for example noimageindex and nocache). Please note that these tags are not currently supported by Funnelback.

Tutorial: Prevent a home page from appearing in the search index
  1. Add a meta robots noindex,follow tag to your site home page so that the site is indexed but the home page is excluded from the search results.

  2. Crawl and index the site and observe that the home page is automatically excluded from the search results.

1.1.3. HTML rel="nofollow"

The HTML anchor (A) tag includes an attribute that can be used to prevent a web crawler from following a link. This is the same as the robots meta nofollow tag, but applies only to the link on which it is set.

To prevent Funnelback from following a specific link on a page, add a rel attribute to the anchor tag. E.g. don’t follow this link: <a href="mylink.html" rel="nofollow">My link</a>

A nofollow attribute will only mean that Funnelback doesn’t add the specific links to the crawl frontier. If the link is found elsewhere without any form of exclusion then it will still appear in the index.
  1. Create a new standard page that doesn’t display in the site navigation (i.e. type 2 linking)

  2. Add a link to this page from your home page and add a rel="nofollow" attribute to the link to prevent Funnelback from following it.

  3. Crawl the site and observe that the new page is excluded from the search index.

1.1.4. noindex tags

Controlling which parts of a web page are considered for search results relevance is a very important and simple process that can be applied to both existing and forthcoming Squiz Matrix sites.

Noindex tags are HTML comment tags that should be included in site templates (and wherever else is appropriate). Areas of the page code that don’t contain page content should be excluded from consideration by the indexer. This means that when indexing the page only the relevant content is included in the index.

This has a few benefits. The most obvious one is that search result relevance will immediately improve due to the removal of a lot of noise from the search results. For example a search for contact information won’t potentially return every page on a site just because contact us happens to appear in the site navigation.

A secondary benefit is that the search result summaries will become much more relevant, because snippet text will only include the indexed content.

Applying noindex tags is as simple as adding <!-- noindex --> and <!-- endnoindex --> comment tags to your site templates. e.g.

<body>
 ... This section is indexed ...
 <!--noindex-->
 ... Text in this section is not indexed, but the link graph information is recorded by Funnelback for ranking purposes ...
 <!--endnoindex-->
 ... This section is indexed ...
</body>

The idea is that you put noindex tags around all templated site navigation, headers and footers. This prevents Funnelback returning every page in response to queries such as about and contact, and also ensures that navigation and headers are excluded from contextual search summaries.

Noindex tags do not have to be specified in matching noindex/endnoindex pairs - the document is parsed from top to bottom and indexing switches whenever a tag is encountered.

If you’ve used any <!-- noindex --> tags, don’t forget to put a <!-- endnoindex --> before the start of any content, otherwise Funnelback will have nothing to index.
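The top-to-bottom switching behaviour described above can be sketched in a few lines. This is an illustration of the rule as stated in this section, not Funnelback's indexer: a noindex comment turns indexing off, an endnoindex comment turns it back on, and unmatched or repeated tags simply set the state again. The function name indexable_text is hypothetical.

```python
import re

# Matches <!-- noindex --> and <!-- endnoindex -->, with or without inner spaces.
_TAG = re.compile(r"<!--\s*(noindex|endnoindex)\s*-->")


def indexable_text(html: str) -> str:
    """Return only the parts of html that fall outside noindex regions."""
    # With one capturing group, split() alternates: text, tag name, text, ...
    parts = _TAG.split(html)
    indexing = True
    kept = []
    for position, part in enumerate(parts):
        if position % 2 == 1:
            # A tag was encountered: switch the indexing state.
            indexing = part == "endnoindex"
        elif indexing:
            kept.append(part)
    return "".join(kept)


if __name__ == "__main__":
    sample = "<nav>menu</nav><!--noindex-->skip<!--endnoindex--><p>content</p>"
    print(indexable_text(sample))
```

Running the sketch against a template is a quick way to spot a missing endnoindex tag, which shows up as an empty or truncated result.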

There are some similar Google-specific tags that provide equivalent noindex functionality.

Funnelback sees the following as aliases for the noindex/endnoindex tags:

Funnelback native tag | Google equivalent 1 | Google equivalent 2
<!-- noindex --> | <!-- googleoff: index --> | <!-- googleoff: all -->

The googleoff/googleon anchor and snippet tags are not supported by Funnelback.

Tutorial: Hide parts of a page from Funnelback
  1. Index your site using Funnelback and observe the search results when you search for something in your site navigation.

  2. Modify your site template to wrap the header, footer and site navigation in noindex tags.

  3. Clear the Matrix cache

  4. Run a full update of the search index and rerun the query for something in the site navigation. Observe that there are now fewer results, and that the summaries don’t show any content from the header/footer or navigation.

1.2. Canonical URLs

Use canonical URLs to instruct Funnelback to index a page under a specific URL, regardless of the URL that was used to request the page. For example, this can be used to handle URLs that contain dynamic elements, or ?a URLs.

Use canonical URLs with care. Funnelback uses the canonical URL as the key under which the document is stored, and this key is also used for duplicate detection, so incorrectly specified canonical URLs may result in content missing from the index. A common error is to set the canonical URL to the home page for the entire site.

1.3. Planning your metadata

Funnelback can make use of metadata in a number of ways to enhance the user’s search experience.

1.3.1. Custom search summaries

The search result entries can be customised to display any available metadata. This can be used to produce better looking and more useful summaries:

  • Enhance the results by printing a metadata description or abstract in place of an automatically generated summary.

  • Augment the results with additional fielded information, such as a thumbnail (using a URL to a thumbnail image).

  • Enrich the result by dynamically returning JavaScript that uses metadata to populate variables.

When designing the search result wireframes, think about what fielded content would be appropriate in the search result item layout and what metadata field will be used to provide the data to fill this field.

1.3.2. Result collapsing (grouping)

Metadata can also be used as a key for the grouping or collapsing of search results. It’s possible to configure multiple result collapsing keys - these can be a single field, or several fields which are combined. These keys are similar in concept to a primary key in a database.

Once result collapsing is configured Funnelback is instructed to collapse the results based on one of the defined keys. This will cause all results with an identical key value to be grouped together.

Result collapsing can also be turned on and off at query time, so collapsing/grouping can be built in to the search UI as a control (similar to a sort by control).
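The idea of a collapsing key can be illustrated with a small sketch. This is a conceptual model only and says nothing about how Funnelback implements collapsing internally; the result dictionaries and the function name collapse are hypothetical.

```python
def collapse(results, key_fields):
    """Group results whose values for key_fields are identical.

    results is a list of dicts holding metadata values; key_fields is the
    list of metadata fields that are combined to form the collapsing key.
    Groups are returned in the order their first member appeared.
    """
    groups = {}
    for result in results:
        key = tuple(result.get(field, "") for field in key_fields)
        groups.setdefault(key, []).append(result)
    return list(groups.values())


if __name__ == "__main__":
    results = [
        {"url": "/policy-v1.pdf", "title": "Leave policy"},
        {"url": "/guide.pdf", "title": "Style guide"},
        {"url": "/policy-v2.pdf", "title": "Leave policy"},
    ]
    for group in collapse(results, ["title"]):
        print([r["url"] for r in group])
```

Passing more than one field in key_fields corresponds to a key made from several combined fields, as described above.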

When designing a site, think about the sort of groupings that could be useful in the results. For example you might wish to group results by:

  • Subject (e.g. Documents with identical subject metadata)

  • Version (e.g. Documents with identical title metadata)

1.3.3. Faceted navigation

When designing a site consider how users should ideally filter the search. Consider both the different filters (that you might apply concurrently) as well as the choices that are offered within each of the filters.

Thinking about this will influence the metadata that needs to be specified as part of the design.

Facets should ideally be based on metadata fields, with the available categories sourced from the set of values defined within the field.

In the example below there are three facets defined (genre, play, character).

[Figure: faceted navigation example showing the genre, play and character facets]

To facilitate this faceted navigation structure all pages should have three metadata fields that correspond to the facets.

There should be a field that is used to populate the genre facet and the values would be comedy, tragedy and so on (e.g. <meta name="play.genre" content="history" />).

You can include a page in multiple categories by applying multiple metadata keyword values. E.g. if an item should be included in both comedy and tragedy categories then you would need a field similar to <meta name="play.genre" content="comedy|tragedy" />.

A page can be included in multiple filters by applying the appropriate fields. E.g. you might have something like:

<meta name="play.genre" content="comedy|tragedy" />
<meta name="play.title" content="Romeo and Juliet" />
<meta name="play.character" content="Romeo|Juliet" />
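The relationship between vertical bar delimited metadata values and facet categories can be sketched as follows. This is a simplified illustration that counts categories over a small in-memory list of pages; Funnelback produces its own counts at query time. The page dictionaries and the function name facet_categories are hypothetical.

```python
from collections import Counter


def facet_categories(pages, field):
    """Count how many pages fall into each category of a metadata field.

    Multiple values within a field are delimited with a vertical bar, so a
    page with "comedy|tragedy" is counted in both categories.
    """
    counts = Counter()
    for page in pages:
        for value in page.get(field, "").split("|"):
            value = value.strip()
            if value:
                counts[value] += 1
    return counts


if __name__ == "__main__":
    pages = [
        {"play.genre": "comedy|tragedy", "play.title": "Romeo and Juliet"},
        {"play.genre": "comedy", "play.title": "Twelfth Night"},
    ]
    print(facet_categories(pages, "play.genre"))
```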

Hierarchy can be supported for faceted navigation (when using single select facets, with drilldown). This creates facets with categories and sub-categories. (e.g. you could have a location facet with hierarchy - Location: Australia > Vic > St Kilda)

For this you would have metadata similar to:

<meta name="country" content="Australia" />
<meta name="state" content="Vic" />
<meta name="suburb" content="St Kilda" />

Facet counts are produced automatically for the facets when a search is run. The counts are an estimate of the number of results that will be returned after the facet is applied. Because the estimate is produced based on the result set the value can change after selecting a facet.

There are several types of faceted navigation available including single and multi-select options.

1.4. Knowledge graph

Knowledge graph relies on good quality metadata to populate the knowledge graph widget.

Squiz Matrix should be configured to produce two knowledge graph specific fields for any URL that should be included as a node in the knowledge graph. The two metadata fields should define:

  • a node type which contains a single value indicating the type. e.g. <meta name="fb.kg-node-type" content="Policy document" />.

  • a field containing the set of identifiers (delimited with vertical bar characters) for the URL. e.g. <meta name="fb.kg-node-identifiers" content="John Smith|Smith, John|jsmith@example.com|u22241444" />.

The metadata field names are not important, but these two fields will then need to be mapped to the FUNkgNodeLabel and FUNkgNodeNames metadata classes within Funnelback.

Additional metadata fields (again with multiple values delimited using vertical bar characters) should be included for each property you wish to display in the knowledge graph widget.

All metadata classes within Funnelback are available to be displayed within the knowledge graph widget.

1.5. Metadata for binary content

Every site indexed by Funnelback should include asset listings that expose metadata that is stored within Squiz Matrix relating to binary documents (PDF, RTF and MS Office documents).

When a document is uploaded to Squiz Matrix metadata is captured about the document - title, author and so on. Users will expect that the information that was entered when uploading the document will be reflected in the search results.

When Funnelback crawls a Squiz Matrix site it follows all links it encounters and downloads the documents. For PDF/office documents an additional operation to extract the text from the document runs. This is all Funnelback has access to when creating the index.

The additional metadata entered when uploading a document to Squiz Matrix is stored within the Squiz Matrix database (and not within the metadata fields contained within the PDF or office document). This means that Funnelback won’t see these and the titles will be whatever happens to be stored in the document’s internal metadata - often something quite useless such as 'Template no. 3'.

This can be rectified by creating an asset listing that lists both the document’s URL and any associated metadata that you wish to include in the index. This listing is formatted in Funnelback’s external_metadata.cfg format.

Tutorial: Creating an external metadata asset listing
  1. Create an asset listing asset underneath the corresponding site asset in Squiz Matrix named external metadata. The asset listing should be created in a section of the site where other Funnelback assets are stored, normally within a type 2 folder asset titled Funnelback which sits amongst the first level of assets in a site. This will create an asset listing with a URL similar to: http://www.example.com/funnelback/external-metadata

  2. Configure the asset types and the root nodes of the asset listing, to target the file assets you would like external metadata applied to in your Funnelback collection’s index. Common practice is to target both the content site and media site as root nodes, and to select the following asset types:

    • PDF File

    • MS Excel Document

    • MS PowerPoint Document

    • MS Word Document

  3. Remove the contents of the empty results body copy, and ensure that the content type of all content containers in the asset listing have been set to raw html.

  4. In the default format of the asset listing, use the asset URL keyword (% asset_url %) to list the asset’s URL, then for each external metadata element you would like applied, use the following format:

    http://example.com/path/to/document.pdf Funnelback_metadata_class:"corresponding asset metadata keyword" Funnelback_metadata_class2:"corresponding asset metadata keyword2"
  5. Your default format will look similar to:

    % asset_url % c:"% asset_metadata_description %" s:"% asset_metadata_dc.subject %" d:"% asset_published_short %" <LINE BREAK>

    Resulting in output like:

    http://site/url1.html c:"This is the description" s:"keyword1|keyword2|keyword3" d:"2015-01-24"
    http://site/url2.html c:"This is another description" s:"keyword1|keyword2|keyword3" d:"2015-04-14"

    You should ensure that each of the printed fields is cleaned to remove any quotes and line breaks, and that fields that contain multiple values are delimited with a vertical bar character '|'.
  6. Ensure the design parse file is configured to return the external metadata file as plain text. To set this up, create a design asset titled Plain text format, and on its parse file screen, enter the following mark up:

    <MySource_PRINT id_name="__global__" var="content_type" content_type="text/plain" />
    <MySource_AREA id_name="page_body" design_area="body" />
  7. Apply this design to the asset listing asset so that it is returned as plain text.

  8. Preview your external metadata listing. The listing should be returned as UTF-8 plain text and should be similar to the example below. You may need to view the page source to see the text with line breaks. Dates should be formatted as short ISO-8601 dates (e.g. 2010-12-16).

    http://www.example.com.au/__data/assets/pdf_file/0009/10314/example-1.pdf description:"example description" keyword:"example subjects" d:"example date"
    http://www.example.com.au/__data/assets/pdf_file/0017/10466/example-2.pdf description:"example description" keyword:"example subjects" d:"example date"
    http://www.example.com.au/__data/assets/pdf_file/0003/10596/example-3.pdf description:"example description" keyword:"example subjects" d:"example date"
    http://www.example.com.au/__data/assets/pdf_file/0019/10945/example-4.pdf description:"example description" keyword:"example subjects" d:"example date"
  9. Ensure that access to the external metadata listing is disallowed in robots.txt. For the URL above (http://www.example.com/funnelback/external-metadata) you would add the following to your robots.txt:

    Disallow: /funnelback/
  10. Log in to the Funnelback administration interface and switch to the collection that contains the Squiz Matrix site content.

  11. Review the metadata mappings for the Funnelback collection. Select the administer tab then click on edit metadata mappings.

  12. Ensure that each of the Funnelback metadata class fields from the external metadata file has a corresponding definition in the metadata mappings, so that the type of each metadata field is explicitly defined. If the fields only exist via external metadata, create entries in the metadata mappings and type a source field similar to EXTERNAL_METADATA:

  13. Edit the collection.cfg and add a pre-index workflow command to retrieve the external metadata from Squiz Matrix, and save it to the collection’s configuration folder so that the metadata is available to Funnelback when indexing. Add the following as a pre_index_command (update the URL to the external metadata listing in Squiz Matrix. $SEARCH_HOME and $COLLECTION_NAME can be left as these are special variables, similar to Matrix keyword modifiers, that will be filled in by Funnelback):

    pre_index_command=curl --connect-timeout 60 --retry 3 --retry-delay 20 'http://www.example.com/funnelback/external-metadata/_nocache' -o $SEARCH_HOME/conf/$COLLECTION_NAME/external_metadata.cfg

It is a good idea to access an uncached version of the URL when downloading the external metadata file to ensure that the external metadata includes all the latest information.

Funnelback expects the external metadata configuration file to be valid. If the file includes any errors in formatting the update will fail.
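The cleaning rules from step 5 (no quotes or line breaks inside values, multiple values delimited with a vertical bar) can be sketched as follows. In the tutorial this cleaning is done inside Squiz Matrix; the code below only illustrates the same logic for anyone who post-processes the listing in a script. The function names clean_value and external_metadata_line are hypothetical.

```python
def clean_value(value: str) -> str:
    """Remove double quotes (straight and curly) and collapse line breaks."""
    for quote in ('"', "\u201c", "\u201d"):
        value = value.replace(quote, "")
    # split() with no arguments also removes newlines and repeated spaces.
    return " ".join(value.split())


def external_metadata_line(url, fields):
    """Build one line: the URL followed by class:"value" pairs.

    fields maps a Funnelback metadata class name to a string, or to a list
    of strings that is joined with the vertical bar delimiter.
    """
    parts = [url]
    for metadata_class, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(clean_value(v) for v in value)
        else:
            value = clean_value(value)
        parts.append(f'{metadata_class}:"{value}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(external_metadata_line(
        "http://site/url1.html",
        {"c": 'This is the\n"description"', "s": ["keyword1", "keyword2"], "d": "2015-01-24"},
    ))
```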

1.5.1. Common external metadata errors

Syntax errors in the external_metadata.cfg file

Funnelback expects any external metadata configuration to be valid. If any errors are detected indexing will fail with an error similar to the following appearing in the Step-Index.log:

Using external metadata from /opt/funnelback/data/shakespeare/live/tmp/external_metadata.cfg6294184464704914621
Error: Missing double quote at line 1
Ext. metadata requested but not available in required form
 - taking early retirement.
Command finished with exit code: 1

The error message will attempt to provide information on the cause.

Common causes of syntax errors include:

  • Unclosed quoted metadata values. e.g.

    Author:"Shakespeare Type:Classics
  • Missing quotes on multi-word metadata values. e.g.

    Author:William Shakespeare Type:Classics
  • Fancy quote characters. e.g.

    Author:“William Shakespeare” Type:Classics
  • Line breaks or double quotes appearing within a metadata field. e.g.

    Author:"John Smith" Description:"This book draws together all of Shakespeare's output into a single volume.
    "The collected works of Shakespeare" is something that should be included in all libraries." Type:Classics

Syntax errors can be mitigated by cleaning the variables as they are printed to the template using Squiz Matrix keyword modifiers.
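A quick check for the error types listed above can be run before the file reaches the indexer. The sketch below is not Funnelback's parser; it assumes the line grammar seen in the examples in this section (a URL followed by space-separated class:value pairs, with multi-word values in straight double quotes). The function name check_line is hypothetical.

```python
import re

# Assumed grammar: URL, then zero or more class:"quoted value" or class:word pairs.
_LINE = re.compile(r'\S+(?:\s+[^\s:"]+:(?:"[^"]*"|[^\s"]+))*\s*')
_CURLY_DOUBLE_QUOTES = "\u201c\u201d"


def check_line(line: str):
    """Return a description of the problem, or None if the line looks valid."""
    if any(ch in line for ch in _CURLY_DOUBLE_QUOTES):
        return "contains fancy (curly) quote characters"
    if line.count('"') % 2 != 0:
        return "unbalanced double quotes"
    if not _LINE.fullmatch(line):
        return "unquoted multi-word value or stray text"
    return None


if __name__ == "__main__":
    samples = [
        'http://site/a.pdf Author:"William Shakespeare" Type:Classics',
        'http://site/a.pdf Author:"Shakespeare Type:Classics',
        'http://site/a.pdf Author:William Shakespeare Type:Classics',
    ]
    for sample in samples:
        print(check_line(sample), "<-", sample)
```

Because each physical line is checked on its own, a value that contains a line break is reported as unbalanced quotes on the line where the value starts.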

Squiz Matrix and large asset listings

If there are going to be a large number of items that result from the asset listing it may be necessary to paginate the listings to generate external metadata.

This will require additional work on the Funnelback side to grab each of the pages, stitch the external metadata lines together and construct a single external_metadata.cfg. This could be implemented in the Funnelback pre_index_command by writing a shell or perl script to compile all of the pages into a single external_metadata.cfg that is written to the collection’s configuration folder.
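The text above suggests a shell or perl script; the same stitching logic is sketched here in Python for illustration. The list of page URLs is an assumption, because the pagination URL format depends on how the asset listing is configured in Squiz Matrix, and the function names fetch and stitch_pages are hypothetical.

```python
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Download one page of the listing as UTF-8 text."""
    with urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8")


def stitch_pages(page_urls, fetch_page=fetch) -> str:
    """Concatenate the non-empty lines of every page into one file body."""
    lines = []
    for url in page_urls:
        for line in fetch_page(url).splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Stand-in for real downloads so the sketch runs without a network.
    fake_pages = {
        "page-1": 'http://site/1.pdf t:"One"\n\n',
        "page-2": 'http://site/2.pdf t:"Two"\n',
    }
    print(stitch_pages(["page-1", "page-2"], fetch_page=fake_pages.get), end="")
```

In a real workflow the returned text would be written to the collection's external_metadata.cfg, in the same location used by the curl command in the tutorial above.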

1.6. Using HTTP headers instead of asset listings to provide metadata

Note: the following is not tested but is possible in theory.

  1. Configure Squiz Matrix to manage all binary document types (i.e. so that PDFs and MS Office documents are served by Matrix instead of Apache and receive Squiz Matrix URLs instead of __data URLs).

  2. Configure Squiz Matrix to send the associated PDF metadata as a series of HTTP headers returned with the PDF. E.g.

    X-Document-Title: PDF document title
    X-Document-Keywords: Keyword 1|Keyword 2|Keyword 3
    X-Document-Description: This is the description
    X-Document-Author: John Smith
    X-Document-Date: 2015-07-22
  3. Access a PDF document via HTTP and view the HTTP headers to confirm that they are being sent when the document is requested.

  4. Configure metadata mappings in Funnelback to index the HTTP headers.

  5. Run a full update in Funnelback and check to see that the metadata is detected and indexed.

1.7. Using metadata to produce structured autocompletion

Another powerful use of metadata is to generate structured auto-completion.

In Funnelback structured auto-completion is generated from CSV data, which provides the trigger, suggestion and action for each auto-complete suggestion.

When producing the CSV file it is important to think about these three attributes of every suggestion.

This CSV file is commonly generated and ingested by Funnelback as part of an update.

1.7.1. Auto-complete triggers

The trigger column specifies the trigger keyword for the suggestion.

As a user types in a query the partially entered query is submitted to Funnelback’s query completion web service and matched against the triggers in the auto-completion database.

The matching is anchored to the left of the complete trigger term. E.g. if the following triggers are defined in an auto-completion CSV file:

  • joe taylor

  • john smith

  • jon doe

  • derrick johnson

When a user types jo the following will be returned as suggestions

  • joe taylor

  • john smith

  • jon doe

derrick johnson isn’t returned because the match is left anchored from the start of the trigger string. If you wish the derrick johnson suggestion to be returned for jo, you would need an additional entry for the same suggestion that uses johnson as the trigger.
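Left-anchored matching can be sketched in a couple of lines. This only illustrates the matching rule described above and is not the query completion web service; treating the comparison as case-insensitive is an assumption, and the function name suggestions is hypothetical.

```python
def suggestions(partial_query, triggers):
    """Return the triggers that start with the partially typed query."""
    prefix = partial_query.lower()
    return [trigger for trigger in triggers if trigger.lower().startswith(prefix)]


if __name__ == "__main__":
    triggers = ["joe taylor", "john smith", "jon doe", "derrick johnson"]
    print(suggestions("jo", triggers))
```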

When producing a CSV file it can be good practice to apply some logic to the trigger generation, removing stop words and accounting for accented character normalisation.

e.g. We might decide to use a title for a trigger. For the following trigger:

the meaning of life

You may wish to produce several identical suggestions within the CSV file with the following triggers - here we remove the leading word and any leading stop words for each subsequent trigger. This allows the suggestion to be returned for queries starting with any of the words in the title (except for 'of').

  • the meaning of life

  • meaning of life

  • life
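The trigger list above can be generated with a short routine. This is a minimal sketch of the approach described in this section: the full title is always kept, then each later word starts a new trigger unless it is a stop word. The stop word list and the function name triggers_for are illustrative assumptions, and accented character normalisation is not covered.

```python
# Illustrative stop word list; a real list would be longer.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def triggers_for(title):
    """Return the triggers for a title, one per non-stop-word starting point."""
    words = title.lower().split()
    triggers = []
    for start in range(len(words)):
        # Always keep the full title; skip later starts that begin with a stop word.
        if start > 0 and words[start] in STOP_WORDS:
            continue
        triggers.append(" ".join(words[start:]))
    return triggers


if __name__ == "__main__":
    print(triggers_for("the meaning of life"))
```

Each trigger returned would be written to the CSV file as a separate row that carries the same suggestion and action.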

There is an extensive exercise in FUNL203 covering the process of generating an autocompletion CSV file using the Funnelback search index as the source. 

You can generate the auto-completion CSV file using an asset listing that formats the output as compatible auto-completion CSV and have this downloaded as part of an update using a similar approach as described for external metadata.

1.8. Knowledge graph

There is a detailed workshop (TW01) that covers the setup of knowledge graph - this should be completed prior to attempting to set up knowledge graph with a Matrix site.

Much of the complexity of that workshop can be removed by carefully planning the knowledge graph then ensuring that appropriate pages inside the Matrix site have the metadata required to support your knowledge graph. By designing this carefully you should be able to avoid all of the custom filtering and scraping that was required for the training workshop.

For knowledge graph to work best with Matrix ensure every URL that should be treated as an entity in the knowledge graph has the following metadata:

  • A unique type - this metadata field needs to be mapped to the FUNkgNodeLabel metadata class and is treated as the entity type.

  • A set of identifying names - this metadata field needs to be mapped to the FUNkgNodeNames metadata class and is treated as the entity’s set of names. Multiple values should be delimited with a vertical bar character.

  • Any other properties that you wish to display in the graph specified as metadata fields.

If the metadata is applied correctly then the graph setup should be a simple process.

2. Crawling Squiz Matrix

This section covers the configuration of Funnelback (and Squiz Matrix) to build a basic search of a public Matrix site.

This doesn’t cover the case where search results need to be filtered based on the logged in user’s permissions - this is dealt with in the Matrix collections section later in the course.

Crawling a Squiz Matrix site is fundamentally the same as crawling any other website - at the end of the day the Squiz Matrix site is just another set of HTML pages as far as Funnelback is concerned.

There are, however, some things that can be done to improve the search of the Squiz Matrix site - these will make use of attributes that are specific to Squiz Matrix sites. This includes:

  • Squiz Matrix specific exclude patterns

  • Crawler settings that are designed to make the crawling of Squiz Matrix more reliable

  • The use of Funnelback external metadata to ensure that binary content that is indexed from Squiz Matrix includes any relevant metadata sourced from the Squiz Matrix database.

2.1. Configuring the Funnelback web crawler

A web crawler works by accessing a start URL (or list of start URLs) and recursively following all the links that are found within the page.

Funnelback’s web crawler works in the same way - once a start URL is downloaded all the links will be extracted and each is compared to a list of include/exclude patterns to determine if this link should be accessed. If the link passes the include/exclude test it’s added to a list of uncrawled URLs (called the crawl frontier). The crawler will work through this list until all the URLs are crawled (repeating the process of extracting links and adding to the frontier) or a timeout is reached.
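The frontier loop described above can be sketched as follows. It is a simplified model for illustration, not Funnelback's crawler: pages come from an in-memory dictionary rather than HTTP requests, and a page limit stands in for the timeout. The names crawl, get_links, accept and max_pages are hypothetical.

```python
from collections import deque


def crawl(start_urls, get_links, accept, max_pages=1000):
    """Visit URLs breadth-first, starting from start_urls.

    get_links(url) returns the links found on a page; accept(url) is the
    include/exclude test applied to each link before it joins the frontier.
    """
    frontier = deque(start_urls)   # uncrawled URLs
    seen = set(start_urls)         # everything ever added to the frontier
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in get_links(url):
            if link not in seen and accept(link):
                seen.add(link)
                frontier.append(link)
    return crawled


if __name__ == "__main__":
    home = "http://www.example.com/"
    about = "http://www.example.com/about"
    site = {
        home: [about, "http://www.example.com/search?q=x"],
        about: [home, "http://other.example.org/"],
    }
    in_scope = lambda u: "www.example.com" in u and "/search" not in u
    print(crawl([home], lambda u: site.get(u, []), in_scope))
```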

2.1.1. Funnelback start URLs

A list of start URLs must be provided to Funnelback when crawling a website. This tells Funnelback where to start crawling. This is normally just the home page of the Squiz Matrix site, but can be any set of URLs.

2.1.2. Funnelback include/exclude patterns

Funnelback provides configuration for the web crawler to include and exclude URLs that are found based on pattern matching against the URL.

You always need to provide a basic level of configuration for the include/exclude patterns - this controls what URLs Funnelback will consider when crawling a site.

An include pattern is required - this tells Funnelback what pages can be considered for inclusion (before any exclude patterns are applied). This is usually just set to the site’s domain name (or top level domain if you wish to include other subdomains).

The exclude patterns duplicate the functionality that can be provided by setting appropriate robots.txt / robots meta tags / rel=nofollow attributes as discussed above.

Funnelback’s default exclude patterns include some Matrix specific control pages such as SQ_ACTION. Generally speaking, it is better to define appropriate robots exclusion using the Squiz Matrix techniques described earlier as these will apply to all web robots and not just Funnelback.

The pattern matching is normally a substring match against the URL, but the full set of include or exclude patterns can be compared using a regular expression for finer control.
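The include/exclude test can be sketched as follows. This illustrates the matching rules described in this section and is not Funnelback's implementation; treating each setting as a comma-separated list of substrings is an assumption, and the function names are hypothetical.

```python
import re


def _matches(setting: str, url: str) -> bool:
    """Test url against one include or exclude setting."""
    if setting.startswith("regex:"):
        # The remainder of the setting is treated as one regular expression.
        return re.search(setting[len("regex:"):], url) is not None
    # Otherwise assume a comma-separated list of substrings.
    patterns = [p.strip() for p in setting.split(",") if p.strip()]
    return any(pattern in url for pattern in patterns)


def should_crawl(url, include_patterns, exclude_patterns=""):
    """A URL is crawled if it matches an include and no exclude pattern."""
    return _matches(include_patterns, url) and not _matches(exclude_patterns, url)


if __name__ == "__main__":
    print(should_crawl("http://www.example.com/about", "example.com", "SQ_ACTION,/search"))
    print(should_crawl("http://www.example.com/?SQ_ACTION=login", "example.com", "SQ_ACTION,/search"))
```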

2.1.3. Web crawler timeouts

The web crawler request timeout settings should be adjusted for Squiz Matrix sites to avoid problems caused by connection timeouts.

  • crawler.request_timeout: This should be increased from the default (15000ms) to something like 30s by setting crawler.request_timeout=30000

  • crawler.max_timeout_retries: Funnelback can be configured to retry in the event of a connection timeout. Setting this value is recommended for Squiz Matrix sites. e.g. crawler.max_timeout_retries=3

Tutorial: Crawl a Squiz Matrix site
  1. Create a web collection in Funnelback to hold the search for the Matrix site

  2. If authenticated access is required to Matrix (e.g. during development) then create a user with read access for Funnelback in Matrix and configure form interaction within Funnelback. However don’t forget to remove this once the authenticated access is no longer required.

    1. Add the following to the collection configuration (edit collection configuration) then save it

      • crawler.form_interaction.pre_crawl.matrixlogin.url: http://<MATRIX-SERVER>/home

      • crawler.form_interaction.pre_crawl.matrixlogin.form_number: 1

      • crawler.form_interaction.pre_crawl.matrixlogin.cleartext.sq_username: <USER-NAME>

      • crawler.form_interaction.pre_crawl.matrixlogin.encrypted.password: <PASSWORD>

  3. Define start URL(s) - this is normally just the URL of the home page.

  4. Define relevant include/exclude patterns.

    1. Always include the domain of your site as an include pattern.

    2. The default exclude patterns include some Squiz Matrix specific URLs such as ?SQ_ACTION.

    3. Consider excluding ?a.

    4. Remember that everything listed in the exclude patterns is substring matched (unless the exclude pattern starts with regex:) against the URL of links that are found during the crawl.

  5. Increase crawler timeouts and retries - this is a good idea with Squiz Matrix sites as timeouts can cause problems with crawls.

    1. Increase the request timeout - add the following to the collection configuration to increase the request timeout to 30s:

      • crawler.request_timeout: 30000

    2. Set retries (on timeout) - add the following to the collection configuration to retry timeouts three times:

      • crawler.max_timeout_retries: 3

  6. Configure metadata mappings (including any metadata that’s added via external metadata).

  7. Add workflow to download external metadata from the Matrix site.

  8. Crawl the Matrix site and debug as necessary.

When crawling with authentication keep in mind that Funnelback will index whatever the user can see - this includes any in-page personalisation as well as non-live or safe edit content that the user currently has access to and binary documents with Matrix managed URLs.

2.1.4. Funnelback included file types

Funnelback will index a set of binary documents that it encounters during the crawl. The default set of documents includes Microsoft Office, RTF, PDF and text documents. This is normally all that is required for a web crawl.

This can be extended to many other document types - as long as there is a method to convert the document into text, or to associate the document URL with some relevant text. For some types the text equivalent will be metadata only (e.g. ID3 tags for an MP3), or derived from an external source (e.g. a video or audio transcript).

Funnelback uses the Apache Tika library to convert the supported file types. Tika supports many file types beyond the standard set.

The full list of file types that Tika can handle is outlined in the Tika documentation.

2.1.5. Funnelback web crawler configuration

Other crawler settings to think about configuring include:

  • crawler.max_download_size: Funnelback will reject any document that is larger than 3MB unless you increase this value.

  • crawler.max_link_distance: This tells Funnelback how deep to crawl from the start URL.

  • server_alias.cfg: This can be used to set canonical server names - otherwise you can end up with variant names depending on what was found during the crawl - e.g. http://www.example.com vs https://www.example.com vs http://example.com vs https://example.com

  • site_profile.cfg: Can be used to set some per-domain settings.

2.2. Configuring in Squiz Matrix vs. Funnelback

The previous section has demonstrated ways of controlling what Funnelback indexes by altering Squiz Matrix or Funnelback configuration.

This raises the question - what is the best practice approach?

As a general rule it is better to control this via Squiz Matrix configuration for a number of reasons:

  • Non user agent specific configuration will apply to all web crawlers, not just Funnelback. This means that you also control what happens in Google etc. in a consistent manner. It also means that more thought is given to search and how it impacts on the site.

  • Because the configuration is attached to the pages on the site it’s much clearer to an administrator what is going on.

Use the Funnelback include/exclude rules:

  • If you are unable to update at the source.

  • For binary documents that are not managed via Squiz Matrix

  • For documents sourced from external sites where you don’t have control over the source.

3. Using Squiz Matrix with push collections

Funnelback includes a collection type that is updated via calls to an API.

A push collection doesn’t contain any logic for the fetching of content (such as a web crawler) - it is purely a data store and an associated set of indexes.

Updates of a search index occur in near-real time, whenever content is pushed in via the API.

A push collection can be hooked up to Squiz Matrix with the index content maintained via various triggers that are defined within Matrix.

These triggers handle the addition, update and deletion of content within the indexes.

Don’t forget that Squiz Matrix is responsible for deleting content from the index. If you forget to do this then the content will stay in the index forever. Squiz Matrix is also responsible for any error handling (e.g. if the push API returns an error).
Tutorial: Using Squiz Matrix with a push collection
  1. Create a push collection from the Funnelback administration interface. Select push from the create collection menu.

  2. Edit the collection settings for the newly created collection. From the indexer tab add the following to the indexer command line options if the content that is to be pushed into Funnelback includes Matrix web pages. This tells Funnelback to take link text and click logs into consideration when building the index.

    -anniemode=3
  3. Log in to the Squiz Matrix administration interface. Create a trigger to add new or updated content by calling the push collection’s API. Locate the trigger manager under system management in the asset map and create a new child trigger element.

    Under the details section choose an appropriate name, description and category

  4. Under the events section check the status changed event

  5. Under the conditions section select asset status being changed to * then click commit.

  6. Update the conditions to select live from the status list

  7. Under the actions section select call REST resource then click commit.

  8. The call REST resource dropdown is replaced with a large form. Update it to add the following

    1. Method: select PUT from the drop down menu

    2. Add the REST URL - replace <FUNNELBACK-SERVER> and <PUSH-COLLECTION-NAME> with the correct values.

      URL(s) = https://<FUNNELBACK-SERVER>:8443/v1/collections/<PUSH-COLLECTION-NAME>/documents?key=<ASSET-URL>&filters=CombinerFilterProvider,TikaFilterProvider,ExternalFilterProvider:JSoupProcessingFilterProvider:DocumentFixerFilterProvider
    3. Check the disable SSL certificate verification on HTTP requests (unless you have installed a signed SSL certificate on your Funnelback server)

    4. Change the cache options to never cache

    5. Set the authentication type to HTTP Basic then click commit.

    6. The HTTP Basic option will update to allow the input of a username and password. Enter the details for an appropriate administration user with access to Funnelback.

    7. In the request headers set a Content-Type header that matches the type of the asset being submitted.

    8. If the document is a binary document that is to be indexed by Funnelback pass relevant fields in the Matrix metadata using HTTP headers. Each metadata value should be prefixed with X-Funnelback-Push-Meta-Data-

    9. In the Request body section pass the body content (check how to do this). For HTML docs pass the document body content, for binary documents submit the binary document?

    10. Check the convert to UTF-8 option

    11. Click commit

  9. Set up a trigger to handle deletions

    1. Follow a similar process to above except call the DELETE /v1/collections/{collection}/documents API when a document is no longer available to users (this could be a delete, or possibly a document being set to under construction).

    2. The DELETE call itself doesn’t require any of the additional HTTP headers etc.
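
The two triggers above boil down to a PUT and a DELETE request against the push API. The Python sketch below builds (but does not send) equivalent requests so that the moving parts are visible outside of Matrix. It is an illustration under stated assumptions: the function names and example values are hypothetical, and the use of the key parameter on the DELETE call is assumed to mirror the PUT call shown in the tutorial.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

# Header prefix for metadata, as described in the tutorial above.
META_PREFIX = "X-Funnelback-Push-Meta-Data-"


def documents_url(server: str, collection: str, params: dict) -> str:
    """Build the push API documents URL (port 8443 as in the tutorial)."""
    path = quote(collection, safe="")
    return f"https://{server}:8443/v1/collections/{path}/documents?{urlencode(params)}"


def basic_auth(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_put_request(server, collection, asset_url, body: bytes,
                      content_type, username, password, metadata=None):
    """Request that adds or updates a document, keyed on the asset URL."""
    headers = {
        "Content-Type": content_type,
        "Authorization": basic_auth(username, password),
    }
    # Each Matrix metadata field is passed as a prefixed HTTP header.
    for name, value in (metadata or {}).items():
        headers[META_PREFIX + name] = value
    url = documents_url(server, collection, {"key": asset_url})
    return Request(url, data=body, headers=headers, method="PUT")


def build_delete_request(server, collection, asset_url, username, password):
    """Request that removes a document (assumes the same key parameter)."""
    url = documents_url(server, collection, {"key": asset_url})
    headers = {"Authorization": basic_auth(username, password)}
    return Request(url, headers=headers, method="DELETE")


if __name__ == "__main__":
    put = build_put_request(
        "funnelback.example.com", "matrix-push",
        "http://www.example.com/page", b"<html>...</html>",
        "text/html", "admin", "secret", metadata={"author": "Jane"},
    )
    print(put.get_method(), put.full_url)
```

Whatever sends these requests is also responsible for checking the response and retrying or logging failures, as noted above for Squiz Matrix.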

4. Search results integration

There are generally four ways to integrate with Funnelback to provide a search results page.

4.1. Nest the search inside a Squiz Matrix search page

This is the most common method of integration between Squiz Matrix and Funnelback and involves configuring a template in Funnelback that returns a well formed HTML fragment (a chunk of HTML code that you can safely nest within a DIV). Squiz Matrix takes the user’s query and constructs a call to Funnelback - Squiz Matrix makes the HTTP request on the user’s behalf. Funnelback returns the HTML fragment to Squiz Matrix, which is then nested inside a page on the Squiz Matrix website. Squiz Matrix is responsible for the styling, overall page structure (headers/footers/navigation etc).

[Diagram: nesting the search inside a Squiz Matrix search page]

This method of integration is achieved using a REST asset.

When embedding the search the whole return from Funnelback should be nested in a single REST asset. If you need dynamic Matrix content to appear within the Funnelback return you can return Matrix snippets within Funnelback’s return.

When using this method of integration:

  • Ensure the X-Forwarded-For HTTP header is set containing the user’s remote address (otherwise Funnelback analytics will show all search result traffic as coming from Matrix).

  • Funnelback’s return is encoded in UTF-8 and Matrix should be configured so that nesting of this content doesn’t result in incorrectly encoded characters.

  • Caching of the search results page should be disabled

  • If using a single REST asset for the search integration you can use the ui.modern.search_link, ui.modern.click_link and the ui_cache_link configuration values within Funnelback to set appropriate links for inclusion in the search results output - this will ensure correct links are returned within the search interface.

  • There is a performance cost from nesting search results within the Matrix page, as the Matrix page render time is added on top of the time taken by Funnelback to run the search and build the results.

Advantages:

  • Centralised control of Matrix-managed CSS, JS, wrapping markup

  • High re-use of existing Funnelback FTL macros

  • Possible caching benefits for static markup components

  • Matrix keywords embedded in FTL responses are translated

  • Matrix can be used to provide authenticated access to the search results

Disadvantages:

  • Response time has added overhead of Squiz Matrix access and rendering.

  • FTL stored on Funnelback server - changes to search results markup must be made within Funnelback

  • May require search-specific template in Matrix

  • May not benefit from Matrix caching

Tutorial: Configure a Funnelback template for REST integration

This involves creating a template in Funnelback that outputs a chunk of HTML suitable for nesting within Matrix.

Before doing this you need to decide if you want Funnelback to return the search form as well or just results.

The exercise below uses the bare bones template that was used for the inventors collection in the FUNL201 course.

  1. Log in to the Funnelback administration interface and select the inventors collection.

  2. From the file manager (administer tab, view collection configuration files) create a new custom.ftl (from under the inventors / presentation (preview) heading). Paste in the following code (this is the contents of the simple.ftl for that collection) and save and publish the template as rest.ftl.

    <#ftl encoding="utf-8" />
    <#import "/web/templates/modernui/funnelback_classic.ftl" as s/>
    <#import "/web/templates/modernui/funnelback.ftl" as fb/>
    <#escape x as x?html>
    <#--
      Sets HTML encoding as the default for this file only - use <#noescape>...</#noescape> around anything which should not be escaped.
      Note that if you include macros from another file, they are not affected by this and must handle escaping themselves
      Either by using a similar escape block, or ?html escapes inline.
    -->
    
    <!DOCTYPE html>
    <html lang="en-us">
    <head>
       <title><@s.AfterSearchOnly>${question.inputParameterMap["query"]!}<@s.IfDefCGI name="query">,&nbsp;</@s.IfDefCGI></@s.AfterSearchOnly><@s.cfg>service_name</@s.cfg>, Funnelback Search</title>
    </head>
    <body>
      <form action="${question.currentProfileConfig.get("ui.modern.search_link")}" method="GET">
        <input type="hidden" name="collection" value="${question.inputParameterMap["collection"]!}">
        <@s.IfDefCGI name="enc"><input type="hidden" name="enc" value="${question.inputParameterMap["enc"]!}"></@s.IfDefCGI>
        <@s.IfDefCGI name="form"><input type="hidden" name="form" value="${question.inputParameterMap["form"]!}"></@s.IfDefCGI>
        <@s.IfDefCGI name="scope"><input type="hidden" name="scope" value="${question.inputParameterMap["scope"]!}"></@s.IfDefCGI>
        <@s.IfDefCGI name="lang"><input type="hidden" name="lang" value="${question.inputParameterMap["lang"]!}"></@s.IfDefCGI>
        <@s.IfDefCGI name="profile"><input type="hidden" name="profile" value="${question.inputParameterMap["profile"]!}"></@s.IfDefCGI>
        <input required name="query" id="query" title="Search query" type="text" value="${question.inputParameterMap["query"]!}" accesskey="q" placeholder="Search <@s.cfg>service_name</@s.cfg>&hellip;">
        <button type="submit">Search</button>
        <@s.FacetScope> Within selected categories only</@s.FacetScope>
      </form>
    
      <@s.AfterSearchOnly>
        <div>
          <#if response.resultPacket.resultsSummary.totalMatching == 0>
            0 search results for <@s.QueryClean />
          </#if>
          <#if response.resultPacket.resultsSummary.totalMatching != 0>
            ${response.resultPacket.resultsSummary.currStart} -
            ${response.resultPacket.resultsSummary.currEnd} of
            ${response.resultPacket.resultsSummary.totalMatching?string.number}
            search results for <@s.QueryClean></@s.QueryClean>
          </#if>
        </div>
    
        <#if (response.curator.exhibits)!?size &gt; 0>
          <#list response.curator.exhibits as exhibit>
            <#if exhibit.messageHtml??>
              <blockquote class="search-curator-message">
                <#noescape>${exhibit.messageHtml}</#noescape>
              </blockquote>
            </#if>
          </#list>
        </#if>
    
        <@s.CheckSpelling prefix="<p>Did you mean <em>" suffix="</em>?</p>" />
    
        <h2>Results</h2>
    
        <#if response.resultPacket.resultsSummary.totalMatching == 0>
            <h3>No results</h3>
            <p>Your search for <strong>${question.originalQuery!}</strong> did not return any results. Please ensure that you:</p>
            <ul>
              <li>are not using any advanced search operators like + - | " etc.</li>
              <li>expect this document to exist within the <em><@s.cfg>service_name</@s.cfg></em> collection <@s.IfDefCGI name="scope"> and within <em><@s.Truncate length=80>${question.inputParameterMap["scope"]!}</@s.Truncate></em></@s.IfDefCGI></li>
              <li>have permission to see any documents that may match your query</li>
            </ul>
        </#if>
    
        <#if (response.resultPacket.bestBets)!?size &gt; 0>
          <ul>
            <#-- Curator exhibits -->
            <#list response.curator.exhibits as exhibit>
              <#if exhibit.titleHtml?? && exhibit.linkUrl??>
                <li>
                  <h4><a href="${exhibit.linkUrl}"><@s.boldicize><#noescape>${exhibit.titleHtml}</#noescape></@s.boldicize></a></h4>
                  <#if exhibit.displayUrl??>${exhibit.displayUrl}</#if>
                  <#if exhibit.descriptionHtml??><p><@s.boldicize><#noescape>${exhibit.descriptionHtml}</#noescape></@s.boldicize></p></#if>
                </li>
              </#if>
            </#list>
            <#-- Old-style best bets -->
            <@s.BestBets>
              <li>
                <#if s.bb.title??><h4><a href="${s.bb.clickTrackingUrl}"><@s.boldicize>${s.bb.title}</@s.boldicize></a></h4></#if>
                <#if s.bb.title??>${s.bb.link}</#if>
                <#if s.bb.description??><p><@s.boldicize><#noescape>${s.bb.description}</#noescape></@s.boldicize></p></#if>
                <#if ! s.bb.title??><p><strong>${s.bb.trigger}:</strong> <a href="${s.bb.link}">${s.bb.link}</a></#if>
              </li>
            </@s.BestBets>
          </ul>
        </#if>
    
        <ol start="${response.resultPacket.resultsSummary.currStart}">
          <@s.Results>
            <#if s.result.class.simpleName != "TierBar">
              <li>
    
                <h4>
                  <a href="${s.result.clickTrackingUrl}" title="${s.result.liveUrl}">
                    <@s.boldicize><@s.Truncate length=70>${s.result.title}</@s.Truncate></@s.boldicize>
                  </a>
                </h4>
    
                <cite><@s.cut cut="http://"><@s.boldicize>${s.result.displayUrl}</@s.boldicize></@s.cut></cite>
    
                <#if s.result.summary??>
                  <p>
                    <#if s.result.date??>${s.result.date?date?string("d MMM yyyy")}:</#if>
                    <@s.boldicize><#noescape>${s.result.summary}</#noescape></@s.boldicize>
                  </p>
                </#if>
                <#if s.result.metaData["c"]??><p><@s.boldicize>${s.result.metaData["c"]!}</@s.boldicize></p></#if>
              </li>
            </#if>
          </@s.Results>
        </ol>
    
        <h2>Pagination</h2>
        <p>
          <@fb.Prev><a href="${fb.prevUrl}">Prev</a></@fb.Prev>
          <@fb.Page numPages=5><#if fb.pageCurrent>${fb.pageNumber}<#else><a href="${fb.pageUrl}">${fb.pageNumber}</a></#if></@fb.Page>
          <@fb.Next><a href="${fb.nextUrl}">Next</a></@fb.Next>
        </p>
      </@s.AfterSearchOnly>
    </body>
    </html>
    </#escape>
  3. Run a search against the collection, specifying the rest template and view the HTML source.

  4. Edit the template and remove the code that is highlighted in yellow at the top and bottom of the file (the DOCTYPE, html, head and body wrapper markup), then save and publish the template. Retain the import statements at the head of the file and the escape directives. Refresh the search and observe that the HTML source now contains only the chunk of HTML that was inside the body tag - this includes both the search box code and the results.

  5. Edit the template and remove the code that is highlighted in green (the search form), then save and publish the template. Refresh the search and observe that the HTML source now contains only the HTML for the search results.

Tutorial: Configure a Squiz Matrix REST asset for use with Funnelback
  1. Log into Squiz Matrix.

  2. Locate the Funnelback folder at the top level of the site hierarchy.

  3. Create a child REST resource: add > web services > REST resource

  4. Set the following details

    • page name: Funnelback Search via REST

    • Link type: TYPE_1

  5. Commit

  6. Acquire the lock on the REST asset.

  7. Update the configuration in the HTTP request section:

    1. Method: GET

    2. The value for URL(s) can be set to remove any variables that should be hard-coded and not repeated (typically 'collection', 'profile' and 'form'):

      http://<FUNNELBACK-SERVER>/s/search.html?%globals_server_query_string^replace:profile=<PROFILE-ID>:^replace:collection=<COLLECTION-ID>:^replace:form=<TEMPLATE-ID>:^replace:&+:&%&collection=<COLLECTION-ID>&profile=<PROFILE-ID>&form=<TEMPLATE-ID>
    3. Set a request header to pass the remote user’s IP address to Funnelback for search analytics:

      X-Forwarded-For:%globals_server_remote_addr%
    4. Check append query string to the request URL

  8. Set allow keyword replacement to yes to support the processing of Matrix keywords (such as snippets) present in the Funnelback HTML response

  9. Set the error response to something helpful:

    Search service is currently unavailable.
  10. Log into the Funnelback administration interface and edit the collection.cfg to set the UI links - this needs to be performed to ensure all the dynamic template links work

    1. Set the ui_cache_link to the absolute URL of the Funnelback server's cache controller so that cache links are generated correctly without needing to hardcode values within the template:

      ui_cache_link=http://funnelback-server/s/cache

      This will ensure views of the cache version are served directly by Funnelback.

    2. Set the ui.modern.search_link to an absolute URL referencing the REST asset in Matrix.

      ui.modern.search_link=http://matrix-server/search/funnelback

      This will ensure all generated links within the template (such as faceted navigation and pagination links) call the REST asset when clicked on.

    3. Set the ui.modern.click_link to an absolute URL referencing the redirect controller in Funnelback.

      ui.modern.click_link=http://funnelback-server/s/redirect

      This will ensure all result links are directed to Funnelback to log the click before redirecting to the final URL.
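
The URL(s) value set in step 7 uses Matrix keyword modifiers to strip any user-supplied collection, profile and form parameters from the query string and then append the hard-coded values. The Python sketch below reproduces the intended result of that rewriting. It is an illustration only: it does not show how Matrix evaluates keywords, and the function name and example values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Parameters that are hard-coded on the REST asset and must not be
# overridden by values supplied in the user's query string.
FIXED_PARAMS = ("collection", "profile", "form")


def build_funnelback_url(server: str, user_query_string: str,
                         collection: str, profile: str, form: str) -> str:
    """Return the Funnelback URL requested on the user's behalf."""
    # Keep every user parameter except the ones that are hard-coded.
    user_params = [
        (name, value)
        for name, value in parse_qsl(user_query_string, keep_blank_values=True)
        if name not in FIXED_PARAMS
    ]
    # Append the fixed values last, as in the URL(s) setting.
    fixed = [("collection", collection), ("profile", profile), ("form", form)]
    return f"http://{server}/s/search.html?{urlencode(user_params + fixed)}"


if __name__ == "__main__":
    print(build_funnelback_url(
        "funnelback.example.com",
        "query=king+lear&collection=other&start_rank=11",
        "shakespeare", "_default", "rest",
    ))
```

Comparing the URL produced by a sketch like this with the URL that Matrix actually requests is a quick way to spot parameters that have been dropped or duplicated by the keyword replacements.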

Tutorial: Configuring REST asset caching

In general caching should be disabled on the REST asset. However, if the Funnelback REST asset is being used to generate a canned search response, or if the page is otherwise suitable for heavy caching by Matrix, align the cache expiry with the corresponding Funnelback collection update schedule.

  1. HTTP Request Section

    1. Cache Options: Use Default Expiry

    2. Default Cache Expiry (seconds): 86400 (assuming a 24-hr update schedule for the corresponding Funnelback collection)

  2. Funnelback can be configured to call the _recache Matrix URL from a post update command

    post_update_command=curl 'http://matrixsite/url_to_canned_search_page/_recache'

Otherwise, the Matrix defaults (Cache Options: Use HTTP Expiry Headers) should suffice.

Tutorial: Nesting the REST asset
  1. Create a standard page (typically at the site’s root) in Matrix: New child > pages > standard page

  2. Page name: "Search"

  3. Link type: "TYPE_1"

  4. Commit

  5. Edit this page’s contents (Right click: 'edit contents > acquire lock(s)'), then edit the Properties of a <div> within the page’s contents where you’d like the Funnelback results to appear.

  6. DIV properties:

    1. Style information > presentation: raw HTML

    2. Content type > content type: nest content

  7. Commit

  8. Edit the page contents again. Use the asset ID of the REST asset within the search DIV.

  9. Commit.

  10. Within the DIV’s properties, ensure 'paint this asset' is set to 'raw (no Paint Layout)':

  11. Commit.

Tutorial: Testing and accessing the REST asset
  1. Visit your search results page in Matrix, using a test query: http://<MATRIX-SERVER>/search?query=TEST

  2. Test clicking through a result, using pagination and cached copy links (if applicable).

  3. The search is linked via a standard HTML search form using GET similar to the code below:

    <form action="/search" method="get">
      <input type="search" name="query" value="">
      <input type="submit">
    </form>
Review exercise
  1. Create and publish a new Funnelback template on the Shakespeare collection, rest.ftl, that returns a code chunk for nesting within a DIV.

  2. Within Matrix, create a REST asset that uses the response from Funnelback for the Shakespeare collection.

  3. Embed this response in your test Matrix site’s search page

4.2. Search results served from Funnelback

A template is configured in Funnelback to return the full search results page.

Funnelback IncludeUrl calls are optionally used to remotely source headers/footers/navigation from a nominated Matrix page.

Suitable for traditional 'full-page' search results, where facets, contextual navigation, blending, etc. are incorporated. The example diagram below shows several components being included from a remote location (typically a Matrix web server) using an approach similar to server-side inclusions.

Configuration of the search template is the same process as for any other website (see the FUNL201 training course examples).

[Diagram: search results served from Funnelback]

Advantages:

  • Provides the fastest response time. Funnelback is responsible for all rendering, so no Matrix render time is added (assuming the IncludeUrl calls succeed)

  • Able to use IncludeUrl template macros to reference and cache centralised site template chunks

  • Able to use Matrix-managed CSS, JS and other web resources.

  • Maximum re-use of existing Funnelback FTL macros

Disadvantages:

  • FTL stored on Funnelback server - changes to search results markup must be made within Funnelback.

  • Typically requires a different domain for search pages vs. site pages (e.g. http://www.example.com/ vs. http://search.example.com/)

  • No caching benefits for static markup components

Integration with the Funnelback search is achieved simply by creating a standard HTML form within your Matrix site template that directly references the Funnelback server. The search box code will be similar to:

<form action="http://FUNNELBACK-SERVER/s/search.html" method="get">
  <input type="search" name="query" value="">
  <input type="hidden" name="collection" value="COLLECTION_ID">
  <input type="submit">
</form>

Several search-specific properties are used:

Upon submission of this form, users would be redirected to a URL like:

http://<FUNNELBACK-SERVER>/s/search.html?collection=<COLLECTION-ID>&query=<QUERY>

4.3. Funnelback raw JSON or XML

This method involves using Funnelback’s raw JSON or XML interfaces which deliver the full data model to Matrix.

This method is suitable for developers comfortable dealing with complex JSON or XML responses and associated client-side or server-side JavaScript templating languages (e.g. AngularJS, HandlebarsJS, etc.)

[Diagram: Funnelback raw JSON or XML]

However there are a number of major disadvantages of using this approach:

  1. The integration is much tighter so changes between versions in the JSON or XML structure could break the integration (and require additional work within Matrix to resolve any issues).

  2. A number of features within Funnelback are implemented in the user interface layer - this means that the logic behind how these features operate is wholly the responsibility of the Matrix developer.

    Affected features include:

    • Faceted navigation: Facets are constructed from a series of element counts that are returned within the data model combined with the faceted navigation element that details how the faceted navigation is configured. All faceted navigation logic would require implementation within the Matrix code consuming the XML or JSON.

    • Pagination controls

    • Funnelback sessions

    • Quick links functionality

  3. There is potentially a significant amount of work required on the Matrix side to take advantage of new features that are introduced in future versions of Funnelback (for much the same reason as the point above).
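
To give a sense of the work involved, the Python sketch below reimplements only the pagination controls from the results summary of the data model. It is a minimal illustration, not a reference implementation: currStart and totalMatching are the fields used in the template earlier in this course, while the numRanks field, the start_rank URL parameter and the function name are assumptions that should be checked against the JSON returned by your Funnelback version.

```python
import math


def pagination_links(results_summary: dict, base_url: str, window: int = 5):
    """Build a list of page links from a results summary.

    base_url is expected to already contain a query string,
    e.g. "/search?query=lear".
    """
    total = results_summary["totalMatching"]
    curr_start = results_summary["currStart"]
    num_ranks = results_summary["numRanks"]  # results per page (assumed field)
    if total == 0 or num_ranks <= 0:
        return []

    total_pages = math.ceil(total / num_ranks)
    current_page = (curr_start - 1) // num_ranks + 1

    # Show a sliding window of page numbers around the current page.
    first = max(1, current_page - window // 2)
    last = min(total_pages, first + window - 1)
    first = max(1, last - window + 1)

    pages = []
    for page in range(first, last + 1):
        # Rank of the first result shown on this page.
        start_rank = (page - 1) * num_ranks + 1
        pages.append({
            "page": page,
            "current": page == current_page,
            "url": f"{base_url}&start_rank={start_rank}",
        })
    return pages


if __name__ == "__main__":
    summary = {"totalMatching": 95, "currStart": 21, "currEnd": 30, "numRanks": 10}
    for link in pagination_links(summary, "/search?query=lear"):
        print(link)
```

Faceted navigation, sessions and quick links each need comparable logic, which is why this approach carries a much higher maintenance cost than reusing Funnelback's template macros.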

Advantages:

  • No FTL template maintenance

  • Client-side or server-side templating feasible

Disadvantages:

  • Some Funnelback features must be reimplemented within Matrix (if used).

  • Much higher level of custom code is required meaning upgrades are at higher risk of causing issues.

  • It can be difficult and expensive to make use of features introduced in future versions of Funnelback.

  • Large JSON response packet may contain unnecessary information

  • Extensive Javascript templating skill required

4.4. Funnelback custom JSON or XML

This method involves using Funnelback’s Freemarker templating to create a custom JSON or XML response that is then integrated with Matrix.

This method is suitable for developers comfortable dealing with simple JSON or XML responses and associated client-side or server-side JavaScript templating languages (e.g. Angular, Handlebars, etc.)

[Diagram: Funnelback custom JSON or XML]

Advantages:

  • Minimal FTL template maintenance

  • Smaller, custom JSON response containing only required fields

  • Client-side or server-side templating feasible

Disadvantages:

  • Some Funnelback features must be reimplemented within Matrix (if used).

  • Much higher level of custom code is required meaning upgrades are at higher risk of causing issues.

  • It can be difficult and expensive to make use of features introduced in future versions of Funnelback.

  • Extensive Javascript templating skill required

To ensure that your custom template is returned as JSON, ensure the following collection-level setting is configured in collection.cfg:

ui.modern.form.custom-json-template-name.content_type=application/json

or if the template is configured to return as JSONP then set the content type to application/javascript.

Cross-domain JavaScript issues can also be overcome by setting appropriate CORS headers using the ui.modern.form.TEMPLATENAME.headers.X and ui.modern.form.TEMPLATENAME.headers.count collection.cfg parameters.
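
As an illustration of the server-side case, the Python sketch below turns a small custom JSON response into an HTML fragment. The JSON structure shown (a results array with title, url and summary fields) is entirely hypothetical - the structure is whatever your custom Freemarker template emits - and the function name is an assumption made for this example.

```python
import html
import json


def render_results(json_text: str) -> str:
    """Render a custom JSON search response as an HTML fragment."""
    data = json.loads(json_text)
    items = []
    for result in data.get("results", []):
        # Escape every value before placing it in markup.
        title = html.escape(result.get("title", ""))
        url = html.escape(result.get("url", ""), quote=True)
        summary = html.escape(result.get("summary", ""))
        items.append(f'<li><a href="{url}">{title}</a><p>{summary}</p></li>')
    if not items:
        return "<p>No results</p>"
    return "<ol>" + "".join(items) + "</ol>"


if __name__ == "__main__":
    sample = json.dumps({
        "total": 1,
        "results": [{
            "title": "King Lear",
            "url": "http://www.example.com/lear?act=1&scene=2",
            "summary": "A tragedy in five acts.",
        }],
    })
    print(render_results(sample))
```

Because the custom template controls the JSON, the consuming code only needs to handle the fields it actually uses, which is the main advantage over the raw JSON approach.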

4.4.1. Client side application

Used for dynamic regions that may be transient, heavily personalised or ad-hoc. Examples include:

  • Shopping carts and search short lists

  • Recommendations based on the current user’s previous search, browsing or purchasing history

  • Tailored content regions based on a user’s locale or IP address

  • Hotels near the postcode '4740' with a three-star rating or higher and BBQ facilities

4.4.2. Server side application

Used for dynamic regions that need to be indexed and/or followed by web search engines. Examples include:

  • Hotels near the postcode '4740'

  • Latest publications

  • A-Z listing of staff in the accounting department

  • Movies starring Russell Crowe

4.4.3. XML responses

Caveats similar to those for the native and custom JSON responses above apply to XML responses.

  • Funnelback provides native XML responses at /s/search.xml - Matrix XML Data source assets assume a simplistic XML structure, which may not be well-suited to Funnelback’s default XML response.

  • Custom XML responses can be rendered using /s/search.html

  • XSLT skills are required to transform native Funnelback XML into HTML

4.5. Review questions

  • Describe a scenario where the 'Full HTML' approach is preferable to the 'Partial HTML as REST asset' approach

  • Assume you’re working on a search-powered travel destination website, using Funnelback to derive listings at pages like /accommodation/region/area/town. Describe how your Matrix caching strategy may be applied, and the limitations of caching refinements to this page (e.g. /accommodation/region/area/town?star-rating=three&features=bbq&min-cost=250&type=backpacker)

5. Auto-completion integration

Auto-completion can be integrated with any search box that’s deployed on the Matrix site.

Funnelback provides a web service that returns auto-completion suggestions based on a partial query. A JavaScript library that ships with Funnelback offers an example implementation of interacting with the web service. This is the simplest method of deploying auto-completion, but you are not limited to using it.

See the example in the FUNL201 training course for the process required for setting up auto-completion using the JavaScript that ships with Funnelback (attach auto-completion to a search box).

6. Recommendations

Funnelback has a recommendation service that can be used to provide recommendations of similar pages (similar to the 'users who were interested in X were also interested in Y' suggestions on sites like amazon.com).

The recommendations service is accessed via an API that returns a JSON packet.

6.1. Recommendation web service

Funnelback’s recommendations service further leverages the search index. It takes a single URL and a collection as the seed item, and only provides a JSON response.

Figure: recommendation web service
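
A request to the service might be constructed as in the sketch below. The endpoint path (/s/recommender/similarItems.json) and the parameter names (collection, seedItem, maxRecommendations) are assumptions based on a default Funnelback installation, and the server, collection and seed URL are hypothetical. Check them against the documentation for your version.

```javascript
// Build a request URL for the recommendations web service.
// Assumptions (verify before use): endpoint /s/recommender/similarItems.json
// with the parameters collection, seedItem and maxRecommendations.
function buildRecommendationUrl(funnelbackBase, collection, seedUrl, maxRecommendations) {
  const params = new URLSearchParams({
    collection: collection,
    seedItem: seedUrl, // the single URL used as the seed item
    maxRecommendations: String(maxRecommendations)
  });
  return funnelbackBase.replace(/\/+$/, '') + '/s/recommender/similarItems.json?' + params.toString();
}

// Hypothetical values, for illustration only.
const seedPage = 'https://www.example.com/courses/accounting';
const recommendationUrl = buildRecommendationUrl('https://funnelback.example.com', 'example-web', seedPage, 5);
console.log(recommendationUrl);
```

URLSearchParams takes care of URL-encoding the seed URL, which matters because the seed item is itself a full URL passed as a parameter value.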

7. Debugging the user interface

It can be difficult to debug search issues when Funnelback is integrated into Matrix using a REST asset. If no results (or unexpected results) are returned, is the problem on the Matrix side or within Funnelback?

Steps can be taken both within Matrix and Funnelback to assist in debugging these problems.

The FUNL201 course provides more detailed information and exercises for debugging Funnelback templates.

7.1. Add debugging to Funnelback output

Funnelback template errors, processing times and query transformations can sometimes be opaque to Matrix implementers. Enabling the debug mode for the asset within Squiz can also trigger additional debugging output from Funnelback, if required.

  1. From the Admin UI, add the following snippet to your search template (Customise > Design Results Page).

    ...
    <#if question.inputParameterMap["SQ_DEBUG_MODE"]?? && (question.inputParameterMap["SQ_DEBUG_MODE"] == "true")>
      <!-- DEBUGGING OUTPUT -->
      <div id="funnelback-debugging-mode" style="background: rgba(255,0,0,0.75); color: #fff; position: fixed; top: 0; left: 0; z-index: 999999; display: block; width: 100%; text-align: center;">
        <h1>Debugging Mode Enabled</h1>
        <p>Check JavaScript console for output</p>
      </div>
      <#-- Output Funnelback debugging to JavaScript Console -->
        <script>
        <#-- Default Debugging -->
        console.group("Funnelback");
        <#-- ?js_string escapes quotes and other special characters so that user-supplied values cannot break out of the JavaScript strings -->
        console.debug('Request URL: ${(question.inputParameterMap["REQUEST_URL"]!"")?js_string}');
        console.debug('[DEBUG] Funnelback: Query: ${(response.resultPacket.query!"")?js_string}');
        console.debug('[DEBUG] Funnelback: Query as Processed: ${(response.resultPacket.queryAsProcessed!"")?js_string}');
        console.debug('[DEBUG] Funnelback: QueryString: ${(QueryString!"")?js_string}');
        <#if question.inputParameterMap["origin"]??>console.debug('Origin: ${question.inputParameterMap["origin"]?js_string}');</#if>
        <#if question.inputParameterMap["maxdist"]??>console.debug('MaxDist: ${question.inputParameterMap["maxdist"]?js_string}');</#if>
        console.debug('Matching Results: ${response.resultPacket.resultsSummary.totalMatching}');
        console.debug('Collection updated: ${response.resultPacket.details.collectionUpdated?datetime}');
    
        <#-- Meta Parameters and QPOs -->
        <#if question.inputParameterMap["SQ_DEBUG_VERBOSITY"]?? &&
          (question.inputParameterMap["SQ_DEBUG_VERBOSITY"] == "2" ||
          question.inputParameterMap["SQ_DEBUG_VERBOSITY"] == "3")>
    
          <#list question.metaParameters as metaParameter>
            console.debug('Meta Parameter: ${metaParameter}');
          </#list>
          <#list question.dynamicQueryProcessorOptions as qpo>
            console.debug('Dynamic QPO: ${qpo}');
          </#list>
        </#if>
        <#-- Performance Metrics -->
        <#if question.inputParameterMap["SQ_DEBUG_VERBOSITY"]?? &&
          question.inputParameterMap["SQ_DEBUG_VERBOSITY"] == "3">
    
          console.info('Performance - TOTAL: ${response.performanceMetrics.totalTimeMillis}');
          <#list response.performanceMetrics.taskInfo as task>
            console.info('Performance - ${task.taskName}: ${task.timeMillis}');
          </#list>
        </#if>
        console.groupEnd();
     </script>
    </#if>
    ...
  2. Save and publish once complete.

Different debugging modes may be required at different points in an implementation project - the snippet above allows for a SQ_DEBUG_VERBOSITY parameter to also be sent. Values include:

  • SQ_DEBUG_VERBOSITY=1: (Default) Display request URL, query, query as processed, query string, matching results, origin and maximum distance (if defined), collection update time

  • SQ_DEBUG_VERBOSITY=2: Additionally, display all meta parameters and dynamic query processor options

  • SQ_DEBUG_VERBOSITY=3: Additionally, display all performance metrics
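
The way the template maps verbosity values to output can be summarised by the sketch below. It is a plain JavaScript restatement of the template's conditions for reference only; the function name and section labels are hypothetical and are not part of Funnelback or Matrix.

```javascript
// Restates the SQ_DEBUG_VERBOSITY checks from the template above.
// The function name and the section labels are illustrative only.
function debugSections(verbosity) {
  // A missing value behaves like level 1 (default output only).
  const level = String(verbosity || '1');
  const sections = ['default'];
  if (level === '2' || level === '3') {
    sections.push('metaParameters', 'dynamicQueryProcessorOptions');
  }
  if (level === '3') {
    sections.push('performanceMetrics');
  }
  return sections;
}

console.log(debugSections('1'));
console.log(debugSections('2'));
console.log(debugSections('3'));
```

Note that, as in the template, any unrecognised value falls back to the default output because only the exact values 2 and 3 enable the additional sections.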

7.2. Add debugging to the Matrix output

Debugging Funnelback and Matrix integrations is best performed at both ends. The JavaScript below can be added to the JavaScript processing being performed on your Funnelback REST asset within Matrix, if desired.

  1. From the Matrix asset tree, locate your REST asset and open its Details screen.

  2. Acquire Lock

  3. Add the following code snippet to the JavaScript processing > JavaScript text area:

    ...
    ///////////////
    // Debugging //
    ///////////////
    // Log Funnelback debugging info to the JavaScript console if 'SQ_DEBUG_MODE=true' appears on the search URL
    
    if (_REST.request.urls[0].match(/SQ_DEBUG_MODE=true/)) {
        print("<script type=\"text/javascript\">\n\
        console.log('[DEBUG] REST Request URL: ','" + _REST.response.info.url + "');\n\
        console.log('[DEBUG] REST Request Headers: ','" + _REST.request.headers.toSource().replace(/\'/g, '') + "');\n\
        console.log('[DEBUG] REST Response Name Lookup Time: ','" + _REST.response.info.namelookup_time + "');\n\
        console.log('[DEBUG] REST Response Connect Time: ','" + _REST.response.info.connect_time + "');\n\
        console.log('[DEBUG] REST Response Pretransfer Time: ','" + _REST.response.info.pretransfer_time + "');\n\
        console.log('[DEBUG] REST Response Start Transfer Time: ','" + _REST.response.info.starttransfer_time + "');\n\
        console.log('[DEBUG] REST Response Total Time: ','" + _REST.response.info.total_time + "');\n\
        console.log('[DEBUG] REST Response Headers: ','" + _REST.response.headers.toSource().replace(/\'/g, '') + "');\n\
        console.log('[DEBUG] Estimated Funnelback Response Time: ','" + (_REST.response.info.total_time - _REST.response.info.namelookup_time) + "');\
        </script>");
    }
  4. Commit.

  5. Release Lock(s)

  6. Add the Squiz debugging parameters to the Matrix search URL and examine the output in your browser’s JavaScript console:

    http://<MATRIX-SERVER>/search?query=QUERY&SQ_DEBUG_MODE=true&SQ_DEBUG_VERBOSITY=3
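
When testing repeatedly, it can be convenient to add the debugging parameters to whatever search URL is currently open. The helper below is a small sketch of that idea; the function name and the example host are hypothetical, and only the SQ_DEBUG_MODE and SQ_DEBUG_VERBOSITY parameters come from the steps above.

```javascript
// Append the debugging parameters used in this section to a search URL.
// The function name and example host are illustrative only.
function withDebugParams(searchUrl, verbosity) {
  const url = new URL(searchUrl);
  url.searchParams.set('SQ_DEBUG_MODE', 'true');
  url.searchParams.set('SQ_DEBUG_VERBOSITY', String(verbosity));
  return url.toString();
}

const debugUrl = withDebugParams('http://matrix.example.com/search?query=hotels', 3);
console.log(debugUrl);
```

In a browser console this could be used as `location.href = withDebugParams(location.href, 3)` to reload the current search with full debugging output.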

8. Funnelback as a data source

This involves using Funnelback as a data source that populates dynamic content areas on a page in a similar way to how you might pull in content from a database.

  1. In Funnelback, create a profile (enabled as a frontend service) for this. Disable logging (by adding -log=off to the padre_opts.cfg of the profile) unless usage analytics is of particular interest.

  2. Create a suitable Funnelback template that returns the results in the desired format.

  3. Create a Matrix REST asset to make the relevant query. Depending on the application, the query might be completely 'static' (i.e. have all the parameters hardcoded in the request URL), or may have variable components that a user can set via controls within the web page. The URL called by the REST asset should be updated to reflect the desired query.

  4. Configure appropriate caching for the REST asset.
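
The difference between a 'static' query and one with variable components (step 3) can be sketched as follows. This is an illustration in plain JavaScript rather than Matrix REST asset configuration: the server, collection, profile and parameter values are hypothetical, and the whitelist approach is one possible design, not a Funnelback or Matrix requirement.

```javascript
// Hardcoded ('static') parameters for the data source query.
// All values are hypothetical placeholders.
const STATIC_PARAMS = {
  collection: 'example-web',
  profile: 'datasource',
  num_ranks: '5'
};

// Only these parameters may be set by the user via page controls.
const USER_PARAMS = ['query'];

function buildDataSourceUrl(funnelbackBase, userInput) {
  const params = new URLSearchParams(STATIC_PARAMS);
  for (const name of USER_PARAMS) {
    const value = userInput[name];
    if (value !== undefined && value !== '') {
      params.set(name, String(value));
    }
  }
  return funnelbackBase.replace(/\/+$/, '') + '/s/search.html?' + params.toString();
}

// The attempt to override 'collection' is ignored because it is not whitelisted.
const dataSourceUrl = buildDataSourceUrl('https://funnelback.example.com', {
  query: 'annual report',
  collection: 'other-collection'
});
console.log(dataSourceUrl);
```

Restricting the variable components to a known list keeps the set of possible request URLs small, which also tends to make the REST asset's caching more effective.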