Extra Feature: The Arango View Wrapper

The arango view wrapper (cag.view_wrapper) is a tool to simplify the creation of Arango Analyzers. This tool can be used by the Analyzer component mentioned above. This wrapper has classes that facilitate the creation of arango view and all its properties and components.

The full example can be found in the examples folder

Create an arango analyzer

The valid attributes/values for the properties depend on the type used. For example, the delimiter type needs to know the desired delimiting character(s), whereas the text type takes a locale, stop-words, and more.

source

The analyzer class loads the required attributes of an analyzer based on its type. The supported types are:

  • _TYPE_IDENTITY -> “identity”, attributes to set: None

  • _TYPE_TEXT -> “text”, attributes to set: ‘locale’, ‘case’, ‘stopwords’, ‘accent’, ‘stemming’, ‘edge_ngram’

  • _TYPE_NGRAM -> “ngram”, attributes to set: ‘min’, ‘max’, ‘preserve_original’, ‘start_marker’, ‘end_marker’, ‘stem_type’

  • _TYPE_STEM -> “stem”, attributes to set: locale

  • _TYPE_DELIMITE -> “delimiter”, attributes to set: delimiter

from cag.view_wrapper.arango_analyzer import ArangoAnalyzer, EdgeNGram

analyzer = ArangoAnalyzer("sample_analyzer")
analyzer.type = ArangoAnalyzer._TYPE_TEXT
analyzer.set_stopwords(language="english", custom_stopwords=['hello'], include_default=False)

print(analyzer.get_type_fields())
## Returns: ['locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram']

analyzer.set_features(frequency=True, norm=True, position=True) # by defaults, all the features are set to True
analyzer.set_edge_ngrams(EdgeNGram(min=2,
                            max=4,
                            preserve_original=False))
print(analyzer.summary())

The summary returns the dictionary used to create the Analyzer:

{
    "name": "sample_analyzer",
    "type": "text",
    "features": [
        "Frequency",
        "norm",
        "position"
    ],
    "locale": "en",
    "case": "lower",
    "stopwords": [
        "hello"
    ],
    "accent": False,
    "stemming": True,
    "edgeNgram": {
        "min": {
            "min": 2,
            "max": 4,
            "preserveOriginal": False
        },
        "max": 5,
        "preserveOriginal": False
    }
}

The analyzer can simply be created as follows:

## Create
from arango import ArangoClient

client = ArangoClient()
database = client.db('_System', username='root', password='root')
analyzer.create(database)

Create the View

view = View('sample_view',
            view_type="arangosearch")
## add the link (can have 0 or 1 link)
view.add_link(link)

## can have 0..* primary sort
view.add_primary_sort("text", asc = False)
view.add_stored_value(["text", "timestamp"], compression="lz4")

print("Prints the *view* as a dict:", view.summary())

!!! Note: The links might need a few minutes to be created and to show in ArangoDB.

{
    "name": "sample_view",
    "viewType": "arangosearch",
    "properties": {
        "cleanupintervalstep": 0,
        "cleanupIntervalStep": 0,
        "commitIntervalMsec": 1000,
        "consolidationIntervalMsec": 0,
        "consolidationPolicy": {
            "type": "tier",
            "segmentsMin": 1,
            "segmentsMax": 10,
            "segmentsBytesMax": 5368709120,
            "segmentsBytesFloor": 2097152,
            "minScore": 0
        },
        "primarySortCompression": "lz4",
        "writebufferIdle": 64,
        "writebufferActive": 0,
        "writebufferMaxSize": 33554432
    },
    "links": {
        "TextNode": {
            "analyzer": [
                "identity"
            ],
            "fields": {
                "text": {
                    "analyzer": [
                        "text_en",
                        "analyzer_sample"
                    ]
                }
            },
            "includeAllFields": False,
            "trackListPositions": False,
            "inBackground": False
        }
    },
    "primarySort": [
        {
            "field": "text",
            "asc": False
        }
    ],
    "storedValues": [
        {
            "fields": [
                "text"
            ],
            "compression": "lz4"
        },
        {
            "fields": [
                "timestamp"
            ],
            "compression": "lz4"
        }
    ]
}