This tutorial is only available to users of the Vagrant training VM as it requires you to log in via SSH.

Introduction

This workshop takes an experienced Funnelback implementer through the steps involved in building a knowledge graph.

Tips and hints appear throughout the exercises to provide extra knowledge. They look like this:

This box is used to provide links to further reading available on the current topic.
This box is used to provide tips or hints related to the current topic.
This box is used to provide important advice relating to the current topic. This includes advice to be aware of to avoid common errors.
This box contains advice specific to the Linux version of Funnelback.
This box contains advice specific to the Windows version of Funnelback.

What this workshop will cover:

  • Guided walkthrough of the process required to set up a working knowledge graph.

Prerequisites to completing the course:

  • FUNL201, FUNL202, FUNL203

1. Audience

This training is aimed at implementers and technical consultants. The following exercises assume a working knowledge of the Funnelback platform, including concepts such as metadata class mapping, external metadata, and familiarity with Funnelback’s filter framework. It is advised that you complete Funnelback training before attempting this tutorial.

2. Overview

The aim of this training workshop is to build a knowledge graph using a website and a set of related structured data. Funnelback knowledge graph was designed for use within multi-repository enterprise environments, but a website knowledge graph makes for a simpler tutorial, and it’s also expected there will be valuable applications for website graphs, such as content promotions/recommendations, helping organisations manage content, and helping end-users find information.

The knowledge graph in this tutorial will have nodes and relationships for a subset of pages on the Beatles Bible website, with certain web pages used to form a node in the graph. Funnelback knowledge graph is tightly integrated with Funnelback so that it can leverage the full functionality of the platform. This means that knowledge graphs can be created on top of legacy Funnelback implementations or on new collections created specifically for the knowledge graph.

In addition to indexing data, a key aspect of creating a knowledge graph is ensuring every knowledge graph record is assigned two elements of special-purpose metadata. This doesn’t require manually adding metadata to all content. In this workshop Funnelback features such as external metadata and filters will be used to generate and assign the metadata. Once the metadata is assigned, the remainder involves simply defining how the graph will look.

3. Graph databases

A graph database is made up of many nodes connected via relationships. Each node has two key attributes:

  1. A unique name, and optional synonyms, that together identify it. E.g. for a person node, the synonyms used to refer to the node could be 'John Lennon', 'Lennon, John', 'J Lennon', and 'john.lennon@thebeatles.com'. When any of these variants is seen in a document, a relationship is created back to the node identified by these synonyms. When we create a Funnelback knowledge graph, we need to configure the metadata mappings such that this list of names is stored in a special metadata class named FUNkgNodeNames.

  2. A node type that defines what kind of node it is. A node type can be a person, a document, a meeting, a project, a helpdesk ticket, a product, or anything else you decide. Node types should have no white space and no special characters (e.g. 'person', 'document', 'project', 'meeting', etc.). Each node can only have one type. In a Funnelback knowledge graph, the metadata classes need to be configured such that the node type is stored in the special metadata class named FUNkgNodeLabel.
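
To make the two attributes concrete, the following is a minimal Groovy sketch of a node and of how a name match might give rise to a relationship. The GraphNode class and the mentions method are hypothetical illustrations and are not part of Funnelback; the case-insensitive substring match is an assumption made to keep the example short.

```groovy
// Hypothetical model of a knowledge graph node (not a Funnelback class).
class GraphNode {
    String label          // the single node type, as stored in FUNkgNodeLabel
    List<String> names    // the name and its synonyms, as stored in FUNkgNodeNames
}

// Returns true when any of the node's names appears in the supplied text.
// Assumption: a simple case-insensitive substring match.
boolean mentions(String text, GraphNode node) {
    node.names.any { text.toLowerCase().contains(it.toLowerCase()) }
}

def lennon = new GraphNode(
    label: 'person',
    names: ['John Lennon', 'Lennon, John', 'J Lennon', 'john.lennon@thebeatles.com'])

// Any one of the synonyms is enough to relate the text back to the node.
assert mentions('This song was recorded by J Lennon.', lennon)
assert !mentions('This song was recorded by George Harrison.', lennon)
```

Because every synonym points at the same node, the more complete the list of names is, the more relationships can be detected.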

4. Knowledge graph implementation workshop

For this exercise a partial index of the Beatles Bible website will be used to create a knowledge graph.

Exercise 1: Create a web collection

The first step is to create a collection containing a working Funnelback index.

Create a collection to crawl the Beatles Bible website:

  1. Create a new collection with the following details:

    • Project group ID: Training workshops

    • Collection ID: kgworkshop

    • Collection type: web

  2. Add the following configuration settings:

  3. Run an update of the collection. The collection will take a few minutes to update. Once the update is complete run a search and confirm that it is returning search results.

    [Image: exercise create a web collection 02]
Exercise 2: Identify pages that will make suitable nodes

Recall that the graph is made up of entities or nodes, and these are all unique things that are related to each other. When indexing a website it is unlikely that every page will be suitable for a node.

Spend a bit of time looking over the website and identify classes of pages that will make suitable nodes in the knowledge graph.

  1. Open up the songs page in your browser and examine the site menus. There are several 'types' that immediately jump out as being suitable for nodes in our graph. Remember that for the graph to be useful the nodes will need to have some sort of relationship between them. For this tutorial we will build a basic graph that includes three types of nodes:

    • Songs

    • People

    • Albums

  2. Once you’ve identified suitable candidates in the site, take a closer look at each of the types and make a note of what can be used to identify a page as this type of node, e.g. a metadata field within the content, a URL pattern, or something defined within the page content. For the types we have identified, the type can be determined from the URL. E.g. Songs:

    http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com/songs/*
  3. For each of the types make a note of the different attributes that can be used to uniquely define the item. E.g. for a person that might be 'FirstName LastName', 'LastName, FirstName', 'userlogon', 'email address'.

    Inspecting the song entities reveals that we can use the song’s name as the identifier, and this can be obtained cleanly from the og:title metadata field.

  4. For each of the types also make a note of other attributes that can be recorded about the entity, and how this information will be sourced. This will ideally come from embedded metadata but could also be applied via external metadata or by writing filters to parse and extract metadata from the page. For a song we might include 'release date', 'author', 'producer' and so on.

Exercise 3: Configure the node label metadata

The node label metadata is used to identify the type of node when building the graph.

There are various sources for the type metadata:

  • Values from an embedded metadata field (e.g. the dc.type metadata field might contain suitable values for the different node types).

  • External metadata that uses a left-matching URL string that is matched against the URL to assign a static value.

  • Values obtained using a custom filter that implements some logic to identify and extract the relevant metadata.

For the workshop we’ll use external metadata to attach the node label metadata.

  1. From the administration interface select browse collection configuration files and create a new external_metadata.cfg from the create dropdown for the collection configuration section.

  2. Configure the following external metadata definitions by pasting the following three lines of text into the window:

    http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com/songs/ FUNkgNodeLabel:song
    http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com/people/ FUNkgNodeLabel:person
    http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com/albums/ FUNkgNodeLabel:album
  3. Save the file.

If you want to make changes to external metadata after the collection has crawled, you can quickly reflect the changes in the index by going to the Funnelback admin UI and clicking the update tab, followed by start advanced update and then re-index live view. There’s no need to re-crawl for external metadata configuration changes.
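
The left-matching behaviour of external metadata can be sketched in plain Groovy as follows. This is an illustration only: the externalMetadataFor method and the rule parsing are hypothetical and are not how Funnelback implements the feature, and the example page URLs are assumed. Note that every URL beginning with the prefix receives the value, including a section index page.

```groovy
def base = 'http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com'

// Parse "<url-prefix> <class>:<value>" lines, as in the three definitions above.
def rules = [
    base + '/songs/ FUNkgNodeLabel:song',
    base + '/people/ FUNkgNodeLabel:person',
    base + '/albums/ FUNkgNodeLabel:album'
].collect { String line ->
    def (prefix, assignment) = line.tokenize(' ')
    def (metadataClass, value) = assignment.tokenize(':')
    [prefix: prefix, metadataClass: metadataClass, value: value]
}

// Hypothetical helper: collect the metadata of every rule whose prefix
// matches the start of the URL.
Map<String, String> externalMetadataFor(String url, List<Map> rules) {
    rules.findAll { url.startsWith(it.prefix) }
         .collectEntries { [(it.metadataClass): it.value] }
}

assert externalMetadataFor(base + '/songs/let-it-be/index.html', rules) == [FUNkgNodeLabel: 'song']
// A prefix match cannot distinguish the section index from an individual song.
assert externalMetadataFor(base + '/songs/index.html', rules) == [FUNkgNodeLabel: 'song']
assert externalMetadataFor(base + '/history/index.html', rules) == [:]
```
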
Exercise 4: Configure the node names metadata

Every record that is intended to go into the knowledge graph needs to have one or more node name values which are used to identify a node.

  1. Inspect each of the node types and identify where the node name metadata will be sourced. It may be sourced from more than one location.

    • Songs: og:title metadata field

    • People: og:title metadata field

    • Albums: og:title metadata field

  2. Since all of the nodes have suitable values contained within page metadata we can directly map this field to the FUNkgNodeNames metadata field from the metadata mapping editor.

  3. From the administration interface select configure metadata mappings from the administer tab.

  4. Check to see if og:title is currently mapped to anything by typing it into the search box. You’ll see it’s already mapped to the t metadata class. og:title will need to be removed from the t metadata class definition since metadata fields can only be mapped to a single metadata class. Click on the t value and remove og:title from the list of sources by clicking on the delete icon then save the mapping.

    [Image: exercise configure the node names metadata 01]
  5. Edit the FUNkgNodeNames mapping and add og:title as a source then save it.

    [Image: exercise configure the node names metadata 02]

Note: we have reassigned the mapping for og:title from t to the FUNkgNodeNames metadata class. After the index is rebuilt, t will no longer contain any values sourced from og:title. Reassigning a metadata field like this may cause issues on an existing collection if the previous mapping was required for any functionality, such as ranking, or was used in the results template. A better solution is to clone the metadata field using a filter (we will cover this later in the tutorial).

Exercise 5: Update the search index

The search index needs to be updated to reflect the changes made to apply the node names and label metadata. The type of update that will be required will depend on the nature of the changes made to apply the metadata.

In our case the changes that we made only require a re-index, as we only changed metadata mappings and added external metadata.

  1. On the Funnelback admin page, click the update tab and then advanced update.

  2. Select re-index the live view then press the update button. This causes Funnelback to only run the update steps required to rebuild the index. Funnelback will not recrawl the website.

  3. When the update is finished it is worth checking that the metadata configuration is correct. This can be done by running some searches with the search configured to display the special knowledge graph metadata fields. Do this by adding a -SF option to the web collection’s query processor options. From the administration interface select the administer tab, then edit collection configuration, then select the interface tab from the left-hand list. Add the following to the query processor options then save the configuration.

    -SF=[.*]
  4. Run a search for let it be and observe that the FUNkgNodeNames and FUNkgNodeLabel metadata is returned below the results summary, containing the og:title value and the label assigned using external metadata.

    [Image: exercise update the search index 01]
Exercise 6: Build the knowledge graph
  1. On the administration home page click the graph tab, and then click update knowledge graph. The administration interface allows a graph to be built for both live and preview versions of the profile which allows changes to be made and tested on the knowledge graph before being published.

  2. Update the graph for the preview view.

  3. Monitor the graph build. While the graph is building open the knowledge graph log file - this will show you the progress of the graph update. Click the link to the log file that is shown below the message about the update, then click the knowledge_graph._default_preview.log to view the log.

    You can refresh the log to see the new log lines as the graph builds. When the graph build is near complete, toward the end of the log you’ll see a summary showing the number of each type of node. For example, the initial build of the kgworkshop graph should be similar to:

    2019-11-15 03:54:18,408 [main] WARN  file.ConfigEditor - Config strings can not be shared amongst the wars, This will result in a higher memory usage
    2019-11-15 03:54:30,135 [main] INFO  integration.CsvImporterNeo4j - Graph statistics before the update
    2019-11-15 03:54:30,429 [main] INFO  integration.PrintStatistics - Looks like this is the first run. Statistics will be displayed after the update
    2019-11-15 03:54:49,248 [main] INFO  integration.CsvImporterNeo4j - Processing of KG on kgworkshop is complete!
    2019-11-15 03:54:49,375 [main] INFO  integration.CsvImporterNeo4j - Graph statistics after the update
    2019-11-15 03:54:49,429 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 357
    2019-11-15 03:54:49,430 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 452
    2019-11-15 03:54:49,430 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 70
    2019-11-15 03:54:49,678 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 14160
    2019-11-15 03:54:49,680 [main] INFO  postUpdate.PostUpdateHookScriptRunner - KnowledgeGraphPostUpdateScript.groovy Groovy script not found in search home /opt/funnelback
Exercise 7: Define a default knowledge graph template

You should now have a working, but very basic, knowledge graph. The interface will not contain very much information as the knowledge graph templates have not yet been configured.

The knowledge graph templates need to be configured so that the widget knows how to display each type of entity that you have defined. This is very similar to how a search results template needs to be configured in order to get the most from your search results.

Knowledge graph templates allow the graph’s design to be customised without needing to modify HTML, CSS, or JavaScript. These templates provide convenient but limited control over presentation.

If there are no templates defined, or if the following fields are not defined in a template then some metadata classes will be used automatically by default:

  • Title: t

  • Subtitle: subtitle

  • Description: c

  • Image: image

  • URL: id (this inherits the data model’s liveUrl value).

We’ll now configure the default template.

  1. Return to the administration interface then select the graph tab and click the browse knowledge graph (preview) button.

    [Image: exercise define knowledge graph templates 01]
  2. A basic knowledge graph has been built and you can browse around the graph. The information showing is a result of the default template field mappings described above. We will now configure the default template. The default template is applied to any entity that does not have a specific template configured.

  3. From the administration interface click on the graph tab. Observe that there are three disabled buttons labelled edit relationships, edit UI labels and edit UI templates. The profile will need to be configured as a frontend service before these edit functions are available. From the administration interface click the create service button to enable the current profile as a frontend service.

  4. Click the graph tab again, and then click edit UI templates to open the template editing screen.

  5. Create a template by clicking the add new button.

  6. From the entity type field choose _default from the autocomplete list.

  7. Set an appropriate default icon for the selected entity type by browsing through the available icons listed in the Font Awesome 5.7.2 free icon set. Since this is a search of Beatles related content set the default icon to bug.

  8. Configure the metadata class to display for the title of this entity type. The t metadata class is the default field in Funnelback where titles are mapped. Enter t into the title field.

  9. Configure the display field to use the c metadata class which is the default metadata class for description metadata.

  10. Once finished, your template definition should look like this:

    [Image: exercise define knowledge graph templates 02]
  11. Click the add button at the bottom of the page.

  12. Return to the graph widget and observe that titles, descriptions and icons are now displayed.

    [Image: exercise define knowledge graph templates 03]

At this point you have a basic knowledge graph that you can browse around.

Looking at the graph you’ll notice many problems - the titles need to be cleaned up and there appears to be a lot of duplication in the entities.

Exercise 8: Clean up incorrectly identified nodes

After browsing through the graph it becomes clear that the rules defined to identify the nodes are not targeted enough resulting in many items in the graph that should not be there.

Recall that we used external metadata to assign the entity types based on a left-anchored pattern match against the URL.

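Looking more closely at the content pages shows that this needs to be tightened up so that only pages exactly one folder below each section are treated as nodes (these are the patterns implemented by the filter code later in this exercise):

  • People: URLs ending in beatlesbible.com/people/<name>/index.html

  • Songs: URLs ending in beatlesbible.com/songs/<name>/index.html

  • Albums: URLs ending in beatlesbible.com/albums/<name>/index.html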

Unfortunately, this means external metadata will not be suitable for defining the FUNkgNodeLabel metadata because it only supports a left-anchored substring match.

A filter that analyses the URLs and adds the appropriate metadata will need to be written to implement the rules defined above.
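
The stricter URL rules can be tried out in isolation with a sketch like the one below. The nodeLabelFor method is a hypothetical helper that mirrors the regular expressions used in the filter later in this exercise; the example paths are assumed for illustration and may not all exist on the site.

```groovy
// Hypothetical helper: returns the node label for a URL path, or null when
// the page should not become a node. Only pages exactly one folder below
// the section folder are accepted.
String nodeLabelFor(String path) {
    if (path =~ /beatlesbible\.com\/people\/[^\/]+\/index\.html$/) return 'person'
    if (path =~ /beatlesbible\.com\/songs\/[^\/]+\/index\.html$/) return 'song'
    if (path =~ /beatlesbible\.com\/albums\/[^\/]+\/index\.html$/) return 'album'
    return null
}

def root = '/training/training-data/beatles/www.beatlesbible.com'

assert nodeLabelFor(root + '/songs/let-it-be/index.html') == 'song'
assert nodeLabelFor(root + '/people/george-harrison/index.html') == 'person'
// Section index pages and deeper pages are rejected, unlike with a prefix match.
assert nodeLabelFor(root + '/songs/index.html') == null
assert nodeLabelFor(root + '/songs/let-it-be/2/index.html') == null
```

Anchoring the expression with $ is what excludes deeper pages, and requiring at least one character between the slashes is what excludes the section index.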

  1. Open a terminal window, which you will use to SSH to the training VM.

  2. Change to the folder that contains the Vagrant VM (the folder where you ran the vagrant up command to start the VM). Log in to your VM via SSH by entering the following command into the terminal:

    vagrant ssh

    If you are on Windows (or prefer to log in directly via SSH), SSH into the VM using the following details then skip step 3:

    • Host: localhost

    • Port: 2222

    • Username: search

    • Password: se@rch

  3. This will log you in to the VM via SSH as the vagrant user. Switch to the search user:

    sudo su - search
  4. Change to the kgworkshop collection configuration folder:

    cd /opt/funnelback/conf/kgworkshop
  5. Create a new folder for the filter:

    mkdir -p /opt/funnelback/conf/kgworkshop/@groovy/com/funnelback/training/beatles
  6. Create a new file called MetadataScraper.groovy:

    nano /opt/funnelback/conf/kgworkshop/@groovy/com/funnelback/training/beatles/MetadataScraper.groovy
  7. Add the following code then save the file. This code adds the node label metadata when the URL matches the patterns discussed above, and takes the node names value from og:title.

    package com.funnelback.training.beatles;
    
    import java.net.URI;
    import org.junit.*;
    import org.junit.Test;
    import com.google.common.collect.ListMultimap;
    import static com.funnelback.filter.api.DocumentType.*;
    import com.funnelback.filter.api.*;
    import com.funnelback.filter.api.documents.*;
    import com.funnelback.filter.api.filters.*;
    import com.funnelback.filter.api.mock.*;
    
    @groovy.util.logging.Log4j2
    public class MetadataScraper implements StringDocumentFilter {
    
        @Override
        public PreFilterCheck canFilter(NoContentDocument document, FilterContext context) {
            if(document.getDocumentType().isHTML()) {
                return PreFilterCheck.ATTEMPT_FILTER;
            }
            return PreFilterCheck.SKIP_FILTER;
        }
    
        @Override
        public FilterResult filterAsStringDocument(StringDocument document, FilterContext context) {
    
            //Ensure we get the existing metadata from the document, to preserve existing
            //metadata
            ListMultimap<String, String> metadata = document.getCopyOfMetadata();
    
            def url = document.getURI().getPath()
            String hdoc = document.getContentAsString();
    
            // Add node labels and names based on URL.  Can't use external metadata as the
            // match isn't precise enough.
            // The node names value is extracted from og:title for items identified as nodes.
            if (url =~ /beatlesbible.com\/people\/[^\/]+?\/index.html$/) {
                metadata.put("FUNkgNodeLabel", "person")
                log.info("Added metadata: FUNkgNodeLabel 'person'")
                if (hdoc =~ /<meta property="og:title" content="(.+?)"/) {
                    def nodename = (hdoc =~ /<meta property="og:title" content="(.+?)"/)
                    metadata.put("FUNkgNodeNames", nodename[0][1])
                    log.info("Added metadata: FUNkgNodeNames '{}'", nodename[0][1])
                }
            }
            else if (url =~ /beatlesbible.com\/songs\/[^\/]+?\/index.html$/) {
                metadata.put("FUNkgNodeLabel", "song")
                log.info("Added metadata: FUNkgNodeLabel 'song'")
                if (hdoc =~ /<meta property="og:title" content="(.+?)"/) {
                    def nodename = (hdoc =~ /<meta property="og:title" content="(.+?)"/)
                    metadata.put("FUNkgNodeNames", nodename[0][1])
                    log.info("Added metadata: FUNkgNodeNames '{}'", nodename[0][1])
                }
            }
            else if (url =~ /beatlesbible.com\/albums\/[^\/]+?\/index.html$/) {
                metadata.put("FUNkgNodeLabel", "album")
                log.info("Added metadata: FUNkgNodeLabel 'album'")
                if (hdoc =~ /<meta property="og:title" content="(.+?)"/) {
                    def nodename = (hdoc =~ /<meta property="og:title" content="(.+?)"/)
                    metadata.put("FUNkgNodeNames", nodename[0][1])
                    log.info("Added metadata: FUNkgNodeNames '{}'", nodename[0][1])
                }
            }
    
            return FilterResult.of(document.cloneWithMetadata(metadata));
        }
    }
  8. Edit the collection configuration and add the filter to the default filter chain. In the administration interface click the administer tab then edit collection configuration then click on the workflow tab in the left-hand menu. Add com.funnelback.training.beatles.MetadataScraper to the filter classes so the value is updated to:

    CombinerFilterProvider,TikaFilterProvider,ExternalFilterProvider:JSoupProcessingFilterProvider:DocumentFixerFilterProvider:com.funnelback.training.beatles.MetadataScraper
  9. Remove the external metadata entries that you added previously. From the administer tab select browse collection configuration files then delete the external_metadata.cfg file as we don’t want to keep any of the entries included in the file.

  10. Run a full update of the collection by clicking on the update tab then selecting advanced update and choosing the full update option from the list. A full update is required because we have added new filters that modify the content as it is gathered.

  11. Inspect the collection’s crawler.central.log while the collection is updating to see messages written by the filter. You’ll notice that FUNkgNodeLabel and FUNkgNodeNames metadata is being added to some of the pages, hopefully to all the ones that match the patterns we’ve defined earlier. For example:

    2019-05-20 13:56:05,135 [com.funnelback.crawler.NetCrawler 19 http://training-search.clients.funnelback.com/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/index.html] INFO  beatles.MetadataScraper - Added metadata: FUNkgNodeLabel 'person'
  12. Run a search of the kgworkshop collection and you’ll see that none of the search results seem to have FUNkgNodeLabel metadata. This is because Funnelback hasn’t been told to include the newly added metadata in the index.

  13. Any metadata that’s added via the filters needs to be mapped. From the administer tab select configure metadata mappings then edit the FUNkgNodeLabel mapping which is currently unmapped.

  14. Add a new source named FUNkgNodeLabel to the mapping. This maps the FUNkgNodeLabel metadata contained within the content (added via the filter) to the internal FUNkgNodeLabel metadata class in the Funnelback index. Save the metadata configuration.

  15. Since FUNkgNodeNames metadata is now inserted directly into the stored document by the filter, update the FUNkgNodeNames mapping to use the FUNkgNodeNames field as its source instead of og:title. og:title will be mapped to a display title field later.

  16. From the update tab run an advanced update on the collection and re-index the live view. This will re-index the crawled data with the updated settings.

  17. Run a search of the kgworkshop collection and note that there is now FUNkgNodeLabel metadata appearing on some of the items.

    [Image: exercise clean up incorrectly identified nodes 01]
  18. Update the knowledge graph for the preview profile and inspect the knowledge graph update log. Observe the before and after counts.

    2019-11-17 22:54:58,862 [main] WARN  file.ConfigEditor - Config strings can not be shared amongst the wars, This will result in a higher memory usage
    2019-11-17 22:55:05,023 [main] INFO  integration.CsvImporterNeo4j - Graph statistics before the update
    2019-11-17 22:55:05,235 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 357
    2019-11-17 22:55:05,235 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 452
    2019-11-17 22:55:05,235 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 70
    2019-11-17 22:55:05,338 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 14160
    2019-11-17 22:55:17,111 [main] INFO  integration.CsvImporterNeo4j - Processing of KG on kgworkshop is complete!
    2019-11-17 22:55:17,209 [main] INFO  integration.CsvImporterNeo4j - Graph statistics after the update
    2019-11-17 22:55:17,213 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 303
    2019-11-17 22:55:17,214 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 38
    2019-11-17 22:55:17,214 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 28
    2019-11-17 22:55:17,351 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 9571
    2019-11-17 22:55:17,355 [main] INFO  postUpdate.PostUpdateHookScriptRunner - KnowledgeGraphPostUpdateScript.groovy Groovy script not found in search home /opt/funnelback
  19. Browse the updated knowledge graph. You’ll notice the duplication seems to have gone now.
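
The og:title extraction performed by the filter can also be checked outside Funnelback. The sketch below is a stand-alone approximation: the ogTitle method is a hypothetical helper that reuses the filter's regular expression, and the HTML snippets are made-up samples rather than real page source.

```groovy
// Hypothetical helper: returns the og:title value, or null if the tag is absent.
// Like the filter, it assumes the property attribute comes before content and
// that both use double quotes.
String ogTitle(String html) {
    def matcher = (html =~ /<meta property="og:title" content="(.+?)"/)
    matcher.find() ? matcher.group(1) : null
}

def page = '''<html><head>
<title>Let It Be (song)</title>
<meta property="og:title" content="Let It Be" />
</head><body>Sample content</body></html>'''

assert ogTitle(page) == 'Let It Be'
assert ogTitle('<html><head><title>No Open Graph tags</title></head></html>') == null
```

A regular expression like this is sensitive to attribute order and quoting, so pages that write the tag differently would not receive a FUNkgNodeNames value.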

Exercise 9: Clean up the templates

The knowledge graph widget assumes that the metadata fields that it is configured with will contain clean data. It will also display metadata fields containing multiple values by concatenating them with a semicolon.

With this in mind we need to ensure our metadata is mapped in such a way that the display makes sense.

Having a quick look at the current graph shows a number of display issues. Let’s fix the issue with the page titles first. The widget is displaying the t metadata field which contains titles sourced from many different locations. To fix this we need to isolate the value we wish to display as the title into a separate metadata field.

Recall that we were using the og:title as the FUNkgNodeNames value as this had the correct title, and that we extracted it out and unmapped it in the previous step. We can now remap the og:title to a new metadata field. When fixing up your mappings you might find you need to remove the field you want to use from an existing mapping (like we did with og:title which was originally part of the t class mapping).

  1. In the administration interface select the administer tab then configure metadata mappings.

  2. Create a new metadata mapping for a field called displayTitle and map the og:title metadata field to this metadata class.

  3. Run an advanced update and re-index the live view.

  4. Update the preview view of the knowledge graph as you’ve changed the metadata mappings so the graph needs to be regenerated.

  5. After the update is finished navigate to the graph tab and select the UI templates editor and update the title mapping of the default template to use displayTitle instead of t.

  6. View the knowledge graph again and observe that the titles have now been cleaned up.

    [Image: exercise clean up the templates 01]
Exercise 10: Define relationship labels

You’ll notice that the tabs displayed above the list view contain labels such as Mentions.incoming and All.undirected.

We will now define human readable text to describe the knowledge graph relationships.

  1. On the administration home page, click the graph tab again, and then click edit UI labels.

  2. On this screen, we’re going to create a label for each relationship. Click the add new button to create the first one.

  3. In the category field, select RELATIONSHIP

  4. In the direction field, select undirected. Every relationship in the graph can be incoming, outgoing or undirected. For example, an incoming relationship could be "manages" and the outgoing relationship could be "manager of". Note: there is currently a bug in the interface that means you can’t update mentions relationships if you first select outgoing or incoming - so you need to select undirected first, select the label key then update the direction. This should be fixed soon.

  5. In the label key field, select mentions. Mentions is the default relationship that’s created whenever one entity is mentioned by another (i.e. as opposed to a named relationship, such as when a person name appears in an author field).

  6. Update the direction field to be outgoing.

  7. In the label field, type Refers to.

  8. Once finished it should look like this:

    [Image: exercise define relationship labels 01]
  9. The result of this relationship label is that outgoing mention labels will no longer have the default name of Mentions.outgoing and will instead be labelled Refers to. Refresh the knowledge graph widget and observe the updated label.

    [Image: exercise define relationship labels 02]
  10. Repeat to update the labels for the all.undirected and mentions.incoming relationships.

    • all.undirected: All

    • mentions.incoming: Mentioned by

  11. Refresh the widget and observe the new labels.

    [Image: exercise define relationship labels 03]
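
The effect of the labels configured in this exercise can be pictured as a simple lookup keyed by relationship and direction. The sketch below is purely illustrative: the labelFor method and the map are hypothetical and do not reflect how the widget stores or resolves labels. The fallback imitates the default tab names seen before any labels were defined.

```groovy
// Labels configured in this exercise, keyed by "<relationship>.<direction>".
def labels = [
    'mentions.outgoing': 'Refers to',
    'mentions.incoming': 'Mentioned by',
    'all.undirected'   : 'All'
]

// Hypothetical helper: use the configured label when there is one, otherwise
// fall back to a default of the form "Mentions.incoming".
String labelFor(Map<String, String> labels, String relationship, String direction) {
    def key = relationship + '.' + direction
    labels.get(key) ?: relationship.capitalize() + '.' + direction
}

assert labelFor(labels, 'mentions', 'outgoing') == 'Refers to'
assert labelFor(labels, 'all', 'undirected') == 'All'
// With no labels configured the default name is shown.
assert labelFor([:], 'mentions', 'incoming') == 'Mentions.incoming'
```
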
Exercise 11: Ensure the mentions relationships apply only to real content

As previously discussed mentions relationships are detected when an entity name is detected in the text of another entity. For this to work correctly it is important to ensure that Funnelback excludes textual content that does not form part of an item’s 'content'. e.g. if entities are being based off pages of a website you need to make sure that only the content region of the page is actually counted as page content. This means you need to configure Funnelback to ignore the header, footer, navigation and any other extra content areas (such as the related blog posts panel on the Beatles Bible website).

If you inspect the knowledge graph carefully and start to look at the detected relationships you will soon find that there are some mentions relationships that do not make sense. e.g. looking at the song entity: Yesterday shows a Refers to (mentions.outgoing) relationship to Yoko Ono.

[Image: exercise ensure the mentions relationships apply only to real content 01]

If you view the page for Yesterday you will find that the only mention of Yoko Ono is in the On this day in Beatles history right hand panel.

[Image: exercise ensure the mentions relationships apply only to real content 02]

In order to fix this we will need to add some Funnelback noindex tags to the page to tell Funnelback to ignore the header, footer, navigation and right panel regions of the page.
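
The intended effect of the noindex tags can be approximated with the sketch below. It is an illustration only and is not how the Funnelback indexer is implemented: the indexableText method is hypothetical, the HTML is a made-up sample, and it assumes that a noindex region left open runs to the end of the document.

```groovy
// Hypothetical helper: drop everything between <!--noindex--> and the next
// <!--endnoindex-->, or the end of the document if the region is never closed.
String indexableText(String html) {
    html.replaceAll(/(?s)<!--noindex-->.*?(?:<!--endnoindex-->|\z)/, '')
}

def page = '<body><!--noindex--><nav>Home | Songs | People</nav><!--endnoindex-->' +
           '<article>Yesterday was recorded in 1965.</article>' +
           '<!--noindex--><aside>On this day: Yoko Ono</aside></body>'

def text = indexableText(page)

// Only the article region remains, so the side panel can no longer
// produce a mentions relationship.
assert text.contains('Yesterday was recorded in 1965.')
assert !text.contains('Yoko Ono')
assert !text.contains('Home | Songs | People')
```
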

  1. Return to the SSH session and create a new file NoIndex.groovy in the same folder as the MetadataScraper.groovy filter you previously edited ($SEARCH_HOME/conf/kgworkshop/@groovy/com/funnelback/training/beatles/).

  2. We will add a Jsoup filter that adds the required noindex tags to exclude the non-content regions of the page. Add the following code to the file then save it.

    package com.funnelback.training.beatles
    
    import com.funnelback.common.filter.jsoup.*
    
    /**
     * Inject some noindex tags
     */
    
    @groovy.util.logging.Log4j2
    public class NoIndex implements IJSoupFilter {
    
      @Override
      void processDocument(FilterContext context) {
        def doc = context.getDocument()
        def url = doc.baseUri()
    
        try {
          // 1. Open a noindex region at the start of the body
          doc.select("body").prepend("<!--noindex-->")
    
          // 2. Close the noindex region immediately before the article element
          doc.select("article").before("<!--endnoindex-->")
    
          // 3. Open a new noindex region immediately after the article element
          doc.select("article").after("<!--noindex-->")
    
        } catch (e) {
          log.error("Error adding noindex tags to '{}'", url, e)
        }
      }
    }
  3. Add the filter to the collection’s jsoup filter chain. Add the following to the filter.jsoup.classes in the collection configuration. Note: we are adding the new filter (com.funnelback.training.beatles.NoIndex) to the default set of jsoup filters that run.

    com.funnelback.training.beatles.NoIndex
  4. Run a full update of the collection.

  5. After the update has completed, update the knowledge graph.

  6. Reload the widget and inspect the mentions relationships for the Yesterday song entity. Observe that there are fewer relationships and that the Refers to relationship on the person tab no longer includes Yoko Ono.

    exercise ensure the mentions relationships apply only to real content 03
Exercise 12: Enhance the graph

Now that the graph is starting to look a bit more usable we can think about how to enhance it.

We will first look at each of the node types and see what properties we can define. Properties in the knowledge graph are obtained from metadata classes available in the Funnelback index.

For an ideal graph the indexed content should have clean metadata associated with it, but this often isn’t the case. The Beatles Bible site exposes little useful metadata via metadata tags within the page.

For this training exercise we will extend the filter to extract fielded values (such as Written by, Recorded, Publisher) from the page content and inject these into the document as metadata.

The process we are using to extract the metadata is known as scraping and is not recommended for a production solution as any changes to the structure of the underlying pages will break the scraping.

For a production grade implementation this fielded information should be exposed as in-page metadata.

The first step is to inspect the source code of the page to figure out how the required data will be extracted. If the page is well structured and makes good use of CSS classes and IDs then you can write a Jsoup filter to parse the page and extract the values.

Always try to use a Jsoup filter when parsing HTML documents as Jsoup parses the HTML into an object representing the HTML. It allows you to use the HTML structure as well as CSS classes and IDs to interact with the HTML and select content. It also means that things in the text such as the order of attributes on a field, differences in case or extra whitespace won’t affect your ability to extract items.

The alternative is to use a standard string document filter (which gives you the document as a big string) and run string comparisons and regular expressions on the content. This is not recommended for the above reasons and should only be used if the HTML is so poorly structured that Jsoup will not provide a suitable way of selecting the content.
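The brittleness of string matching can be seen in a short stand-alone Groovy sketch. Both markup variants and both patterns below are hypothetical and are not taken from the Beatles Bible pages; the point is only that a small change in case, whitespace or tag style defeats a strictly written pattern.

```groovy
// A strict pattern that expects one exact rendering of the field.
def strict = /<strong>Written by:<\/strong> (.+?)<br>/

// A more tolerant pattern: ignores case, optional whitespace and <br /> vs <br>.
def tolerant = /(?i)Written by\s*:\s*(?:<\/strong>)?\s*(.+?)\s*(?:<br\s*\/?>|<\/p>)/

// Two hypothetical renderings of the same field.
def original = '<p><strong>Written by:</strong> Lennon-McCartney<br>'
def reworked = '<p><STRONG>Written by: </STRONG>Lennon-McCartney<br />'

assert (original =~ strict).find()
assert !(reworked =~ strict).find()      // same content, but the strict match is lost

def m = (reworked =~ tolerant)
assert m.find() && m.group(1) == 'Lennon-McCartney'
```

Even the tolerant pattern only covers the variations its author anticipated, which is why selecting by HTML structure with Jsoup is preferred whenever the page structure allows it.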

  1. From the administration interface run a search against the preview profile for yesterday. Locate the search result for Yesterday that’s a song entity (it will show a FUNkgNodeLabel: song value in the search result).

  2. Click on this search result and inspect the page.

    exercise enhance the graph 01

    You’ll notice some fielded information in the page (written by, recorded, producer, engineer, released) that looks like a suitable candidate for node properties.

    exercise enhance the graph 02
  3. View the source code for the page and see if there is any suitable metadata defined in the page. With a little luck some of those candidates might already be exposed as metadata. Observe that there is some useful metadata that we can make use of such as the og:image field which contains a thumbnail image, the og:title which contains a clean title (which we are already using in the graph template), the og:description which will be useful for the node’s description. Make a note of the fields containing useful info as we’ll want to ensure these are mapped to appropriate metadata classes so we can include them as properties in the template. Unfortunately, none of the fields we noted (written by, etc.) are exposed as metadata so we’ll need to scrape these in order to display them in the graph template.

  4. View the page source and locate the HTML source that displays the candidate fields and make a note of the code structure for these. Make sure you view the HTML source rather than the pretty rendered view that the browser debugger shows as the exact code text is important for the filter to work.

    exercise enhance the graph 03
  5. Check some of the other song pages and confirm that similar fields are available and that the code structure is the same. If the structure of the code is not consistent then you may not be able to scrape these fields from the text. Unfortunately, the overall structure of the document doesn’t include enough information (such as CSS classes and IDs) to allow us to extract the fields reliably using Jsoup, so we will need to use the non-preferred method of string comparison and regular expressions. This is a much less reliable way of extracting content, as very subtle changes in the actual HTML code can break the matches.

  6. Return to the SSH session and open the MetadataScraper.groovy file in an editor.

  7. Add the following code to scrape the written by, producer, recorded, engineer and released metadata. Add the code immediately before the return FilterResult.of(document.cloneWithMetadata(metadata)); line at the bottom of the file. This will extract the text from the five fields and emit fb.written_by, fb.producer, fb.recorded, fb.engineer and fb.released metadata fields if anything is extracted.

            // Additional metadata
            if (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def writtenBy = (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def writtenByClean = writtenBy[0][2].replaceAll(/<.+?>/,"")
                metadata.put("fb.written_by", writtenByClean)
                log.info("Added metadata: fb.written_by '{}'", writtenByClean)
            }
            if (hdoc =~ /Producer(s)?\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def producer = (hdoc =~ /Producer(s)?\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def producerClean = producer[0][3].replaceAll(/<.+?>/,"")
                metadata.put("fb.producer", producerClean)
                log.info("Added metadata: fb.producer '{}'", producerClean)
            }
            if (hdoc =~ /Recorded\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def recorded = (hdoc =~ /Recorded\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def recordedClean = recorded[0][2].replaceAll(/<.+?>/,"")
                metadata.put("fb.recorded", recordedClean)
                log.info("Added metadata: fb.recorded '{}'", recordedClean)
            }
            if (hdoc =~ /Engineer(s)?\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def engineer = (hdoc =~ /Engineer(s)?\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def engineerClean = engineer[0][3].replaceAll(/<.+?>/,"")
                metadata.put("fb.engineer", engineerClean)
                log.info("Added metadata: fb.engineer '{}'", engineerClean)
            }
            if (hdoc =~ /Released\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def released = (hdoc =~ /Released\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def releasedClean = released[0][2].replaceAll(/<.+?>/,"")
                metadata.put("fb.released", releasedClean)
                log.info("Added metadata: fb.released '{}'", releasedClean)
            }
  8. Save the filter. Because the filter is already in the filter chain all we need to do is update the collection. From the administration interface select the update tab, advanced update then run a full update. Remember it’s important to run a full update because you have changed the filter which means we need to recrawl and filter all of the data.

  9. While the update is running inspect the crawler.central.log in the offline logs to see messages about metadata that is extracted.

    2019-11-18 02:11:16,611 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/awaiting-on-you-all/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.released '30 November 1970 (UK), 27 November 1970 (US)'
    2019-11-18 02:11:16,701 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/let-it-down/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.written_by 'Harrison'
    2019-11-18 02:11:16,704 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/let-it-down/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.producer 'George Harrison, Phil Spector'
    2019-11-18 02:11:16,706 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/let-it-down/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.recorded 'May-October 1970'
    2019-11-18 02:11:16,709 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/let-it-down/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.engineer 'Ken Scott, Phil McDonald'
    2019-11-18 02:11:16,711 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/let-it-down/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.released '30 November 1970 (UK), 27 November 1970 (US)'
    2019-11-18 02:11:16,866 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/behind-that-locked-door/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.written_by 'Harrison'
    2019-11-18 02:11:16,869 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/behind-that-locked-door/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.producer 'George Harrison, Phil Spector'
    2019-11-18 02:11:16,873 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/behind-that-locked-door/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.recorded 'May-October 1970'
    2019-11-18 02:11:16,875 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/george-harrison/songs/behind-that-locked-door/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.engineer 'Ken Scott, Phil McDonald'
  10. Before we can make use of the additional metadata we need to map the new metadata fields. In the administration interface click the administer tab then select configure metadata mappings. Add mappings for the five new fields:

    • Class: writtenBy with a source metadata field of fb.written_by.

    • Class: producer with a source metadata field of fb.producer.

    • Class: recorded with a source metadata field of fb.recorded.

    • Class: released with a source metadata field of fb.released.

    • Class: engineer with a source metadata field of fb.engineer.

  11. Run an advanced update to re-index the live view so that the metadata changes are applied to the index.

  12. Run a search of the preview profile and inspect the song records. Observe that additional metadata for the extracted fields is now being returned.

    exercise enhance the graph 04
  13. Update the knowledge graph to ensure the new metadata is captured in the graph nodes.

  14. It’s time to start customising templates for specific entity types. Open up the template editor and copy the _default template. Click on the copy to edit it.

  15. Observe that the template copy is populated with all the values that you set previously in the _default template. Update the template to make it specific to song entities by changing the following:

    • Entity type: song

    • Icon: music

    • Primary set (both detail and list view): writtenBy

    • Secondary set (both detail and list view): engineer, producer, recorded, released

    exercise enhance the graph 05
  16. Repeat the above step but for the list view.

  17. Browse the knowledge graph for the preview view and confirm additional properties are now being displayed for song nodes. Also ensure you expand some of the nodes in the list view to see the secondary properties.

    exercise enhance the graph 06
  18. Repeat this section for the other node types.
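The five scraping blocks added to MetadataScraper.groovy in step 7 all follow the same pattern, so when repeating the exercise for other node types it may be convenient to fold the pattern into one helper. The following is a minimal stand-alone sketch of that idea. The scrapeField helper is hypothetical (it is not part of the Funnelback API) and the sample markup is an assumption modelled on the fields described above.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: returns the text that follows "<label>:" up to the next
// <br> or </p>, with any HTML tags removed, or null when the label is absent.
String scrapeField(String html, String labelRegex) {
    def tail = /\s*:\s*(?:<\/strong>)?\s*(.+?)\s*(?:<br>|<\/p>)/
    def matcher = Pattern.compile(labelRegex + tail).matcher(html)
    return matcher.find() ? matcher.group(1).replaceAll(/<.+?>/, '') : null
}

// Assumed sample markup; real pages should be checked as described in step 4.
def songHtml = '<p><strong>Written by:</strong> <a href="/x">Harrison</a><br>' +
               '<strong>Producers:</strong> George Harrison, Phil Spector</p>'

assert scrapeField(songHtml, 'Written by') == 'Harrison'
assert scrapeField(songHtml, 'Producer(?:s)?') == 'George Harrison, Phil Spector'
assert scrapeField(songHtml, 'Engineer(?:s)?') == null
```

Inside the filter, the returned value would be added with metadata.put(...) only when it is not null, which keeps the behaviour of the original if blocks.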

Exercise 13: Create custom relationships

Custom relationships can be defined where an entity name is the value of one of another node’s properties.

For example, if 'John Lennon' appears in the 'Written by' field of a song then we can define a relationship where 'John Lennon' is a writer of 'Yesterday'.

This also defines a reverse relationship where 'Yesterday' is written by 'John Lennon'.

To facilitate this we need to ensure the stored metadata values will exactly match the node names.
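The exact-match requirement can be modelled with a few lines of stand-alone Groovy. This is a conceptual sketch only, not how Funnelback builds the graph internally, and the node names and property values are hypothetical sample data.

```groovy
// Hypothetical person node names.
def personNames = ['George Martin', 'George Harrison', 'Phil Spector'] as Set

// Hypothetical song nodes with the values of their producer property.
def songs = [
    [name: 'Yesterday',   producer: ['George Martin']],
    [name: 'Let It Down', producer: ['George Harrison, Phil Spector']]  // one unsplit value
]

// A relationship is only created when a property value equals a node name.
def edges = []
songs.each { song ->
    song.producer.each { value ->
        if (value in personNames) {
            edges << [value, 'producerOfSong', song.name]
        }
    }
}

// The unsplit value matches no node name, so only one relationship results.
assert edges == [['George Martin', 'producerOfSong', 'Yesterday']]
```

In this model the second song produces no relationships because 'George Harrison, Phil Spector' is treated as one value rather than two names.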

Looking at the available metadata shows that with a bit of cleanup we could create several relationships between:

Source node    Target node    Target field
Person         Song           Written by
Person         Song           Producer
Person         Song           Engineer
Person         Album          Producer
Person         Album          Engineer

The producer and engineer metadata looks ok with the person names appearing in full within the metadata field. The written by field only contains last names so this will need a bit of extra work. Let’s start with the producer and engineer metadata fields.

  1. From the administration interface’s graph tab click the edit relationships button.

  2. Add a new relationship with the following details:

    • Relationship: producerOfSong

    • Source entity type: Person

    • Target entity: Song

    • Metadata class: producer

  3. Repeat for the other relationships.

    exercise create custom relationships 01
  4. Update the knowledge graph. View the knowledge graph update log and observe that the summary includes some extra relationship counts after the graph is updated. Don’t be concerned about warning messages relating to missing metadata - not all of the nodes have producer and engineer metadata fields. The summary is the important bit to view:

    2019-11-18 02:52:59,805 [main] WARN  neo4j.Relationship - engineer metadata is missing. unable to create relationship.
    2019-11-18 02:52:59,805 [main] WARN  neo4j.Relationship - producer metadata is missing. unable to create relationship.
    2019-11-18 02:52:59,964 [main] INFO  integration.CsvImporterNeo4j - Graph statistics before the update
    2019-11-18 02:53:00,111 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 303
    2019-11-18 02:53:00,111 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 38
    2019-11-18 02:53:00,111 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 28
    2019-11-18 02:53:00,140 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 8199
    2019-11-18 02:53:07,039 [main] INFO  integration.CsvImporterNeo4j - Processing of KG on kgworkshop is complete!
    2019-11-18 02:53:07,062 [main] INFO  integration.CsvImporterNeo4j - Graph statistics after the update
    2019-11-18 02:53:07,067 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 303
    2019-11-18 02:53:07,068 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 38
    2019-11-18 02:53:07,068 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 28
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of producerOfSong relationships: 209
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of engineerOfAlbum relationships: 5
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of engineerOfSong relationships: 158
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of producerOfAlbum relationships: 9
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 8199
    2019-11-18 02:53:07,120 [main] INFO  integration.PrintStatistics - Number of writerOfSong relationships: 8
    2019-11-18 02:53:07,128 [main] INFO  postUpdate.PostUpdateHookScriptRunner - KnowledgeGraphPostUpdateScript.groovy Groovy script not found in search home /opt/funnelback
  5. Browse the knowledge graph and observe that there are some additional tabs displayed above the list view when selecting the 'person' type. Browse to the 'yesterday' song node and note the producer and engineer incoming relationships and how they match the engineer and producer metadata listed in the node’s properties.

    exercise create custom relationships 02
  6. Note the reverse relationship by browsing to Norman Smith, who has an engineer outgoing relationship.

  7. This all seems good until you browse around a bit more and notice that some of the relationships are not being detected. For example, browsing to an album such as 'Abbey Road' does not display any incoming relationships. This is because a relationship is only created when a node name exactly matches a value of the metadata field. For the Abbey Road album, note that there are several engineers and producers listed. To detect the relationships we need to ensure that the metadata is split into multiple values within the index. This follows Funnelback’s normal metadata splitting rules: either use the ‘|’ character as the delimiter, or use the -facet_item_sepchars indexer option. We will use our filter to replace the commas in the field with the vertical bar character.

  8. Return to your SSH window and edit the MetadataScraper.groovy filter. Replace the line

    def producerClean = producer[0][3].replaceAll(/<.+?>/,"")

    with

    def producerClean = producer[0][3].replaceAll(/<.+?>/,"").replaceAll(/\s*,\s*/,"|")

    The first replace removes any HTML tags that might be in the string and the second replaces each comma (together with any surrounding whitespace) with a vertical bar.

  9. Repeat for the def engineerClean line, then run a full update of the collection. Observe that the log now shows the vertical bar in the metadata that’s being written.

  10. There’s a similar issue with missing relationships for the written by relationship due to the names in the written by field not matching the person node names. Some additional custom logic can be used to replace the shorthand names used in the written by field with the values that we can use in the knowledge graph. In this example we’ll hardcode the matches as the set of values we want to replace is known and small enough to deal with.

    Edit the MetadataScraper.groovy file and replace the block of code extracting the Written by field:

            if (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/) {
                def writtenBy = (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s*(.+?)\s*(<br>|<\/p>)/)
                def writtenByClean = writtenBy[0][2].replaceAll(/<.+?>/,"")
                metadata.put("fb.written_by", writtenByClean)
                log.info("Added metadata: fb.written_by '{}'", writtenByClean)
            }

    with:

        if (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s+(.+?)(<br>|<\/p>)/) {
            def writtenBy = (hdoc =~ /Written by\s*:\s*(<\/strong>)?\s+(.+?)(<br>|<\/p>)/)

                // Compare and extract names.  Extraction discards any embedded html tags and comparison is on lowercase value
                def wbclean = writtenBy[0][2].replaceAll(/<.+?>/,"").trim().toLowerCase()
                switch (wbclean) {
                    case "harrison":
                        metadata.put("fb.written_by", "George Harrison")
                        log.info("Added metadata: fb.written_by 'George Harrison'")
                        break
                    case "harrison-lennon":
                        metadata.put("fb.written_by", "George Harrison")
                        metadata.put("fb.written_by", "John Lennon")
                        log.info("Added metadata: fb.written_by 'George Harrison', 'John Lennon'")
                        break
                    case "lennon":
                        metadata.put("fb.written_by", "John Lennon")
                        log.info("Added metadata: fb.written_by 'John Lennon'")
                        break
                    case "lennon-mccartney":
                        metadata.put("fb.written_by", "John Lennon")
                        metadata.put("fb.written_by", "Paul McCartney")
                        log.info("Added metadata: fb.written_by 'John Lennon', 'Paul McCartney'")
                        break
                    case "lennon-mccartney-harrison-starr":
                        metadata.put("fb.written_by", "John Lennon")
                        metadata.put("fb.written_by", "Paul McCartney")
                        metadata.put("fb.written_by", "George Harrison")
                        metadata.put("fb.written_by", "Ringo Starr")
                        log.info("Added metadata: fb.written_by 'George Harrison', 'John Lennon', 'Paul McCartney', 'Ringo Starr'")
                        break
                    case "lennon-ono":
                        metadata.put("fb.written_by", "John Lennon")
                        metadata.put("fb.written_by", "Yoko Ono")
                        log.info("Added metadata: fb.written_by 'John Lennon', 'Yoko Ono'")
                        break
                    case "mccartney":
                        metadata.put("fb.written_by", "Paul McCartney")
                        log.info("Added metadata: fb.written_by 'Paul McCartney'")
                        break
                    case "mr & mrs mccartney":
                        metadata.put("fb.written_by", "Linda McCartney")
                        metadata.put("fb.written_by", "Paul McCartney")
                        log.info("Added metadata: fb.written_by 'Paul McCartney', 'Linda McCartney'")
                        break
                    case "paul and linda mccartney":
                        metadata.put("fb.written_by", "Linda McCartney")
                        metadata.put("fb.written_by", "Paul McCartney")
                        log.info("Added metadata: fb.written_by 'Paul McCartney', 'Linda McCartney'")
                        break
                    case "lennon-mccartney-starkey":
                        metadata.put("fb.written_by", "John Lennon")
                        metadata.put("fb.written_by", "Maureen Starkey")
                        metadata.put("fb.written_by", "Paul McCartney")
                        log.info("Added metadata: fb.written_by 'John Lennon', 'Paul McCartney', 'Maureen Starkey'")
                        break
                    case "lennon-mccartney-harrison-starkey":
                        metadata.put("fb.written_by", "John Lennon")
                        metadata.put("fb.written_by", "Paul McCartney")
                        metadata.put("fb.written_by", "George Harrison")
                        metadata.put("fb.written_by", "Maureen Starkey")
                        log.info("Added metadata: fb.written_by 'John Lennon', 'Paul McCartney', 'George Harrison', 'Maureen Starkey'")
                        break
                    default:
                        metadata.put("fb.written_by", writtenBy[0][2])
                        log.info("Added metadata: fb.written_by '{}' for '{}'", writtenBy[0][2], url)
                        break
                }
        }
  1. Save the filter then run a full update of the collection. Observe that the filter now replaces the written by field with the desired values.

    2019-11-18 02:59:27,101 [com.funnelback.crawler.NetCrawler 0 http://localhost:9080/training/training-data/beatles/www.beatlesbible.com/people/paul-mccartney/songs/dear-boy/index.html] INFO  beatles.MetadataScraper - Added metadata: fb.written_by 'Paul McCartney', 'Linda McCartney'
  2. Update the knowledge graph and observe the new counts.

    2019-11-18 03:03:54,914 [main] WARN  neo4j.Relationship - engineer metadata is missing. unable to create relationship.
    2019-11-18 03:03:54,914 [main] WARN  neo4j.Relationship - producer metadata is missing. unable to create relationship.
    2019-11-18 03:03:55,147 [main] INFO  integration.CsvImporterNeo4j - Graph statistics before the update
    2019-11-18 03:03:55,985 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 303
    2019-11-18 03:03:55,986 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 38
    2019-11-18 03:03:55,986 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 28
    2019-11-18 03:03:56,372 [main] INFO  integration.PrintStatistics - Number of producerOfSong relationships: 209
    2019-11-18 03:03:56,373 [main] INFO  integration.PrintStatistics - Number of engineerOfAlbum relationships: 5
    2019-11-18 03:03:56,373 [main] INFO  integration.PrintStatistics - Number of engineerOfSong relationships: 158
    2019-11-18 03:03:56,373 [main] INFO  integration.PrintStatistics - Number of producerOfAlbum relationships: 9
    2019-11-18 03:03:56,373 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 8199
    2019-11-18 03:03:56,373 [main] INFO  integration.PrintStatistics - Number of writerOfSong relationships: 8
    2019-11-18 03:04:04,665 [main] INFO  integration.CsvImporterNeo4j - Processing of KG on kgworkshop is complete!
    2019-11-18 03:04:04,687 [main] INFO  integration.CsvImporterNeo4j - Graph statistics after the update
    2019-11-18 03:04:04,693 [main] INFO  integration.PrintStatistics - Number of [song] nodes: 303
    2019-11-18 03:04:04,693 [main] INFO  integration.PrintStatistics - Number of [person] nodes: 38
    2019-11-18 03:04:04,693 [main] INFO  integration.PrintStatistics - Number of [album] nodes: 28
    2019-11-18 03:04:04,728 [main] INFO  integration.PrintStatistics - Number of producerOfSong relationships: 243
    2019-11-18 03:04:04,729 [main] INFO  integration.PrintStatistics - Number of engineerOfAlbum relationships: 13
    2019-11-18 03:04:04,729 [main] INFO  integration.PrintStatistics - Number of engineerOfSong relationships: 187
    2019-11-18 03:04:04,729 [main] INFO  integration.PrintStatistics - Number of producerOfAlbum relationships: 25
    2019-11-18 03:04:04,729 [main] INFO  integration.PrintStatistics - Number of mentions relationships: 8199
    2019-11-18 03:04:04,729 [main] INFO  integration.PrintStatistics - Number of writerOfSong relationships: 414
    2019-11-18 03:04:04,731 [main] INFO  postUpdate.PostUpdateHookScriptRunner - KnowledgeGraphPostUpdateScript.groovy Groovy script not found in search home /opt/funnelback
  3. Browse the knowledge graph observing that song entities now show writer of tabs.

    exercise create custom relationships 03
  4. Finally clean up the relationship labels by selecting edit UI labels from the graph tab. Create clean labels for the incoming and outgoing relationships that you have defined.

    Relationship                Label
    producerOfSong.incoming     Produced by
    producerOfSong.outgoing     Producer of
    writerOfSong.incoming       Written by
    writerOfSong.outgoing       Writer of
    engineerOfSong.incoming     Engineered by
    engineerOfSong.outgoing     Engineer of
    producerOfAlbum.incoming    Produced by
    producerOfAlbum.outgoing    Producer of
    engineerOfAlbum.incoming    Engineered by
    engineerOfAlbum.outgoing    Engineer of

  5. Browse the knowledge graph and observe the updated labels.

    exercise create custom relationships 04
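The two clean-up techniques used in this exercise (splitting comma-separated names and expanding shorthand writing credits) can be tried outside Funnelback with the stand-alone Groovy sketch below. The closure names and the lookup table are hypothetical; the table holds only a few of the credits handled by the switch statement, and a map lookup is shown here as one possible alternative to it rather than the workshop's method.

```groovy
// Hypothetical lookup table keyed on the lowercased 'Written by' value.
def writers = [
    'harrison'        : ['George Harrison'],
    'lennon'          : ['John Lennon'],
    'lennon-mccartney': ['John Lennon', 'Paul McCartney'],
    'lennon-ono'      : ['John Lennon', 'Yoko Ono']
]

// Strip tags, then turn a comma-separated list into '|'-delimited values.
def toMultiValue = { String raw ->
    raw.replaceAll(/<.+?>/, '').trim().replaceAll(/\s*,\s*/, '|')
}

// Expand a shorthand credit into full names; unknown credits pass through unchanged.
def expandWriters = { String raw ->
    def clean = raw.replaceAll(/<.+?>/, '').trim()
    writers[clean.toLowerCase()] ?: [clean]
}

assert toMultiValue('George Harrison, Phil Spector') == 'George Harrison|Phil Spector'
assert expandWriters('<a href="#">Lennon-McCartney</a>') == ['John Lennon', 'Paul McCartney']
assert expandWriters('Starkey') == ['Starkey']
```

In the filter, each name returned by a lookup like this would be added with its own metadata.put("fb.written_by", name) call so that every name is stored as a separate value.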
Exercise 14: Create search result filters

Faceted navigation defined on the profile will also be displayed as filters within the graph’s search results screen.

  1. Within the graph run a search for 'john lennon' and observe the search results.

    exercise create search result filters 01
  2. From the administration interface select customise faceted navigation from the customise tab.

  3. Create facets for the following:

    Facet type                   Name           Metadata source
    Filter on single category    Song writer    writtenBy
    Filter on single category    Type           FUNkgNodeLabel

  4. Refresh the knowledge graph and repeat the search for 'John Lennon' observing the filters that are now displayed.

    exercise create search result filters 02
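As a rough mental model of what the two facets display, each facet groups the matching results by the value of one metadata class and counts them. The stand-alone Groovy sketch below illustrates this with hypothetical result records; it does not call Funnelback and is not how the facet counts are computed internally.

```groovy
// Hypothetical search results with the two metadata classes used by the facets.
def results = [
    [title: 'John Lennon', FUNkgNodeLabel: 'person', writtenBy: []],
    [title: 'Imagine',     FUNkgNodeLabel: 'song',   writtenBy: ['John Lennon']],
    [title: 'Yesterday',   FUNkgNodeLabel: 'song',   writtenBy: ['John Lennon', 'Paul McCartney']]
]

// 'Type' facet: one category per FUNkgNodeLabel value.
def typeCounts = results.countBy { it.FUNkgNodeLabel }

// 'Song writer' facet: one category per individual writtenBy value.
def writerCounts = results.collectMany { it.writtenBy }.countBy { it }

assert typeCounts == [person: 1, song: 2]
assert writerCounts == ['John Lennon': 2, 'Paul McCartney': 1]
```

The writer counts depend on writtenBy holding separate values for each name, which is why the metadata was split in the previous exercise.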

5. Extended exercises

Here are a few things you might like to try to extend the knowledge graph.

  • For album nodes add a property that contains the tracks from the album. This can be scraped from the tracks field within the page content.

  • For people add some additional properties (such as date of birth, death). You could use external metadata for this and grab the details from a source such as Wikipedia.

  • Improve the accuracy of mentions relationships by adding in noindex tags to hide the headers, footers, navigation and the right hand side menus (which often will generate mentions relationships as they contain lists of recent blog posts).

  • Edit the collection’s search results template (i.e. the simple.ftl) and add an additional option to the menu containing the cache link that enables the knowledge graph to be accessed from the search results. The additional option should open the knowledge graph widget on the node that corresponds to the record and should only be displayed for the results that are nodes in the knowledge graph.

    Hint: add the following code to the template:

    <#if s.result.metaData["FUNkgNodeNames"]?? && s.result.metaData["FUNkgNodeLabel"]??><li><a href="/s/knowledge-graph/index.html?atab=Graph&collection=${question.collection.id}&profile=${question.profile}&targetUrl=${s.result.liveUrl}">Browse in knowledge graph</a></li></#if>
  • Add a post-update workflow command to trigger an update of the knowledge graph so that the graph is updated whenever the collection is updated. Hint: to do this you’ll need to call the API to trigger an update as a post update workflow command. See: Funnelback documentation - Knowledge graph: scheduling automatic updates