Extra Feature: The Arango View Wrapper

The Arango view wrapper (cag.view_wrapper) simplifies the creation of Arango views and analyzers. It can be used by the Analyzer component mentioned above. The wrapper provides classes for building an Arango view together with its properties and components (analyzers, links, and fields).

The full example can be found in the examples folder.

Create an Arango analyzer

The valid attributes/values for the properties depend on the type used. For example, the delimiter type needs to know the desired delimiting character(s), whereas the text type takes a locale, stop-words, and more.

(Source: ArangoDB documentation)

The ArangoAnalyzer class determines which attributes an analyzer requires based on its type. The supported types are:

_TYPE_IDENTITY -> "identity", attributes to set: none
_TYPE_TEXT -> "text", attributes to set: 'locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram'
_TYPE_NGRAM -> "ngram", attributes to set: 'min', 'max', 'preserve_original', 'start_marker', 'end_marker', 'stem_type'
_TYPE_STEM -> "stem", attributes to set: 'locale'
_TYPE_DELIMITE -> "delimiter", attributes to set: 'delimiter'
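
The relationship between a type and its attributes can be pictured as a simple lookup table. The sketch below is only an illustration of the list above: `TYPE_ATTRIBUTES` and `attributes_for` are hypothetical names and do not reflect how cag implements this internally.

```python
from typing import Dict, List

# Hypothetical lookup table that mirrors the list of supported types above.
# It is NOT the library's own code; the attribute names are copied from the text.
TYPE_ATTRIBUTES: Dict[str, List[str]] = {
    "identity": [],
    "text": ["locale", "case", "stopwords", "accent", "stemming", "edge_ngram"],
    "ngram": ["min", "max", "preserve_original", "start_marker", "end_marker", "stem_type"],
    "stem": ["locale"],
    "delimiter": ["delimiter"],
}


def attributes_for(analyzer_type: str) -> List[str]:
    """Return a copy of the attribute names expected for an analyzer type."""
    if analyzer_type not in TYPE_ATTRIBUTES:
        raise ValueError(f"unsupported analyzer type: {analyzer_type!r}")
    return list(TYPE_ATTRIBUTES[analyzer_type])


print(attributes_for("text"))
```

In the wrapper itself, the equivalent information is available through `analyzer.get_type_fields()`, as shown in the example that follows.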

from cag.view_wrapper.arango_analyzer import ArangoAnalyzer, EdgeNGram

analyzer = ArangoAnalyzer("sample_analyzer")
analyzer.type = ArangoAnalyzer._TYPE_TEXT
analyzer.set_stopwords(language="english", custom_stopwords=['hello'], include_default=False)

print(analyzer.get_type_fields())
# Prints: ['locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram']

analyzer.set_features(frequency=True, norm=True, position=True)  # by default, all features are set to True
analyzer.set_edge_ngrams(EdgeNGram(min=2, max=4, preserve_original=False))
print(analyzer.summary())

The summary returns the dictionary used to create the Analyzer:

{
    "name": "sample_analyzer",
    "type": "text",
    "features": [
        "frequency",
        "norm",
        "position"
    ],
    "locale": "en",
    "case": "lower",
    "stopwords": [
        "hello"
    ],
    "accent": False,
    "stemming": True,
    "edgeNgram": {
        "min": 2,
        "max": 4,
        "preserveOriginal": False
    }
}
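
To make the edge n-gram settings more concrete, the following sketch shows which prefixes a single token would yield for a given `min`, `max`, and `preserveOriginal`. It is a simplified model written for illustration: `edge_ngrams` is a hypothetical helper, and the real analyzer also applies the other text-processing steps (case conversion, stemming, stop-word removal) that are ignored here.

```python
from typing import List


def edge_ngrams(token: str, min_len: int, max_len: int, preserve_original: bool) -> List[str]:
    """Return the prefixes of `token` whose length lies between min_len and max_len."""
    upper = min(max_len, len(token))
    grams = [token[:n] for n in range(min_len, upper + 1)]
    # Optionally keep the full token as well, unless it is already one of the prefixes.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


print(edge_ngrams("search", 2, 4, preserve_original=False))
# prints ['se', 'sea', 'sear']
```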

The analyzer can simply be created as follows:

# Create the analyzer in the database
from arango import ArangoClient

client = ArangoClient()
database = client.db('_system', username='root', password='root')
analyzer.create(database)
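
For readers who want to see how such a summary dictionary relates to the underlying driver, the sketch below splits it into the arguments that python-arango's `create_analyzer(name, analyzer_type, properties, features)` accepts. Whether `analyzer.create` works this way internally is an assumption; `split_summary` is a hypothetical helper, and the driver call is left as a comment so the snippet runs without a database.

```python
from typing import Any, Dict, List, Tuple


def split_summary(summary: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any], List[str]]:
    """Split an analyzer summary into (name, type, properties, features)."""
    reserved = {"name", "type", "features"}
    # Everything that is not name/type/features is treated as a type-specific property.
    properties = {key: value for key, value in summary.items() if key not in reserved}
    features = list(summary.get("features", []))
    return summary["name"], summary["type"], properties, features


summary = {
    "name": "sample_analyzer",
    "type": "text",
    "features": ["frequency", "norm", "position"],
    "locale": "en",
    "stopwords": ["hello"],
}
name, analyzer_type, properties, features = split_summary(summary)
print(name, analyzer_type, properties, features)

# With a python-arango database handle, the values could then be passed on
# (assumption: this is roughly equivalent to analyzer.create(database)):
# database.create_analyzer(name=name, analyzer_type=analyzer_type,
#                          properties=properties, features=features)
```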

Create a link with fields

# Link, Field and AnalyzerList are assumed to be imported from cag.view_wrapper
# (see the examples folder for the exact imports)

# Create a link; a view can have 0..* links
link = Link(name="TextNode")  # name of a collection in the database
link_analyzers = AnalyzerList(["identity"])
link.analyzer = link_analyzers

# A link can have 0..* fields
# For the *text* field in the *TextNode* collection, add the analyzers below
field = Field("text", AnalyzerList(["text_en", "invalid_analyzer", "analyzer_sample"]))  # text_en is a predefined ArangoDB analyzer

# Filter out the analyzers that are not defined in the database
# (`database` is the handle created in the previous section)
field.analyzer.filter_invalid_analyzer(database, verbose=1)
print("current analyzer after filtering invalid ones: ", field.analyzer)

current analyzer after filtering invalid ones:  AnalyzerList(analyzerList=['text_en', 'analyzer_sample'])

link.add_field(field)

# Show the dict format of all the fields in a link
print(link.get_fields_dict())

{'text': {'analyzer': ['text_en', 'analyzer_sample']}}
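
The filtering step can be thought of as keeping only those analyzer names that the database already knows. The sketch below illustrates that idea with plain lists; `filter_known` and the `existing` list are hypothetical, and the wrapper's `filter_invalid_analyzer` may differ in how it queries the database and what it reports.

```python
from typing import Iterable, List


def filter_known(requested: Iterable[str], existing: Iterable[str], verbose: bool = False) -> List[str]:
    """Keep the requested analyzer names that exist, preserving their order."""
    known = set(existing)
    kept = []
    for name in requested:
        if name in known:
            kept.append(name)
        elif verbose:
            print(f"dropping unknown analyzer: {name}")
    return kept


# Hypothetical list of analyzers defined in the database. With python-arango it
# could be derived from database.analyzers(); note that the names returned there
# may carry a database prefix, which this sketch ignores.
existing = ["identity", "text_en", "analyzer_sample"]

print(filter_known(["text_en", "invalid_analyzer", "analyzer_sample"], existing, verbose=True))
```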

Create the View

view = View('sample_view',
            view_type="arangosearch")
# Add the link (can have 0 or 1 link)
view.add_link(link)

# Can have 0..* primary sort
view.add_primary_sort("text", asc=False)
view.add_stored_value(["text", "timestamp"], compression="lz4")

print("Prints the *view* as a dict:", view.summary())

Note: The links may take a few minutes to be created and to appear in ArangoDB.

{
    "name": "sample_view",
    "viewType": "arangosearch",
    "properties": {
        "cleanupIntervalStep": 0,
        "commitIntervalMsec": 1000,
        "consolidationIntervalMsec": 0,
        "consolidationPolicy": {
            "type": "tier",
            "segmentsMin": 1,
            "segmentsMax": 10,
            "segmentsBytesMax": 5368709120,
            "segmentsBytesFloor": 2097152,
            "minScore": 0
        },
        "primarySortCompression": "lz4",
        "writebufferIdle": 64,
        "writebufferActive": 0,
        "writebufferMaxSize": 33554432
    },
    "links": {
        "TextNode": {
            "analyzer": [
                "identity"
            ],
            "fields": {
                "text": {
                    "analyzer": [
                        "text_en",
                        "analyzer_sample"
                    ]
                }
            },
            "includeAllFields": False,
            "trackListPositions": False,
            "inBackground": False
        }
    },
    "primarySort": [
        {
            "field": "text",
            "asc": False
        }
    ],
    "storedValues": [
        {
            "fields": [
                "text"
            ],
            "compression": "lz4"
        },
        {
            "fields": [
                "timestamp"
            ],
            "compression": "lz4"
        }
    ]
}
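
As a rough illustration of how such a view summary could be handed to the driver, the sketch below flattens it into the `(name, view_type, properties)` arguments of python-arango's `create_view`, which expects links, primary sort, and stored values inside the properties dictionary. `view_args` is a hypothetical helper, the abbreviated `summary` is only sample data, and it is an assumption that the wrapper does something comparable when it creates the view.

```python
from typing import Any, Dict, Tuple


def view_args(summary: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Flatten a view summary into (name, view_type, properties)."""
    properties = dict(summary.get("properties", {}))
    # Move the top-level sections into the properties passed to the driver.
    for key in ("links", "primarySort", "storedValues"):
        if key in summary:
            properties[key] = summary[key]
    return summary["name"], summary["viewType"], properties


# Abbreviated version of the summary shown above (sample data only).
summary = {
    "name": "sample_view",
    "viewType": "arangosearch",
    "properties": {"commitIntervalMsec": 1000},
    "links": {"TextNode": {"analyzer": ["identity"]}},
    "primarySort": [{"field": "text", "asc": False}],
}
name, view_type, properties = view_args(summary)
print(name, view_type, sorted(properties))

# With a python-arango database handle, the view could then be created with:
# database.create_view(name, view_type, properties)
```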
