Extra Feature: The Arango View Wrapper
======================================

The Arango view wrapper (:py:mod:`cag.view_wrapper`) is a tool that simplifies the creation of Arango analyzers and views. It can be used by the *Analyzer* component mentioned above. The wrapper provides classes that facilitate the creation of an Arango view together with all of its properties and components (analyzers, links and fields).

The full example can be found `in the examples folder `_.

Create an Arango analyzer
-------------------------

.. epigraph::

   The valid attributes/values for the properties depend on the type used. For example, the delimiter type needs to know the desired delimiting character(s), whereas the text type takes a locale, stop-words, and more.

   -- `source `_

The ``ArangoAnalyzer`` class loads the required attributes of an analyzer based on its type. The supported types are:

* ``_TYPE_IDENTITY`` -> ``"identity"``, **attributes to set:** none
* ``_TYPE_TEXT`` -> ``"text"``, **attributes to set:** ``locale``, ``case``, ``stopwords``, ``accent``, ``stemming``, ``edge_ngram``
* ``_TYPE_NGRAM`` -> ``"ngram"``, **attributes to set:** ``min``, ``max``, ``preserve_original``, ``start_marker``, ``end_marker``, ``stem_type``
* ``_TYPE_STEM`` -> ``"stem"``, **attributes to set:** ``locale``
* ``_TYPE_DELIMITE`` -> ``"delimiter"``, **attributes to set:** ``delimiter``

The following example configures a ``text`` analyzer with custom stop words, features and edge n-grams:

.. code-block:: python

   from cag.view_wrapper.arango_analyzer import ArangoAnalyzer, EdgeNGram

   analyzer = ArangoAnalyzer("sample_analyzer")
   analyzer.type = ArangoAnalyzer._TYPE_TEXT
   analyzer.set_stopwords(language="english", custom_stopwords=['hello'], include_default=False)

   print(analyzer.get_type_fields())
   # Returns: ['locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram']

   # By default, all the features are set to True
   analyzer.set_features(frequency=True, norm=True, position=True)
   analyzer.set_edge_ngrams(EdgeNGram(min=2, max=4, preserve_original=False))

   print(analyzer.summary())

The summary is the dictionary used to create the analyzer in ArangoDB.
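The type-dependent lookup of required attributes can be pictured as a plain mapping from analyzer type to attribute names. The sketch below is hypothetical: ``TYPE_FIELDS``, ``type_fields`` and ``missing_fields`` are illustrative names that are not part of :py:mod:`cag.view_wrapper`, and the mapping simply copies the list of supported types above rather than the wrapper's actual implementation.

.. code-block:: python

   from typing import Dict, List, Mapping

   # Hypothetical mapping: analyzer type -> attributes to set,
   # copied from the list of supported types (not taken from cag's source).
   TYPE_FIELDS: Dict[str, List[str]] = {
       "identity": [],
       "text": ["locale", "case", "stopwords", "accent", "stemming", "edge_ngram"],
       "ngram": ["min", "max", "preserve_original", "start_marker", "end_marker", "stem_type"],
       "stem": ["locale"],
       "delimiter": ["delimiter"],
   }


   def type_fields(analyzer_type: str) -> List[str]:
       """Return a copy of the attribute names expected for `analyzer_type`."""
       if analyzer_type not in TYPE_FIELDS:
           raise ValueError(f"unsupported analyzer type: {analyzer_type!r}")
       return list(TYPE_FIELDS[analyzer_type])


   def missing_fields(analyzer_type: str, attributes: Mapping[str, object]) -> List[str]:
       """Return the expected attributes not present in `attributes`, in table order."""
       return [name for name in type_fields(analyzer_type) if name not in attributes]


   if __name__ == "__main__":
       print(type_fields("text"))
       # -> ['locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram']
       print(missing_fields("text", {"locale": "en", "case": "lower"}))
       # -> ['stopwords', 'accent', 'stemming', 'edge_ngram']

Keeping the table as data means that supporting another analyzer type only requires a new entry; the wrapper itself may organise this differently.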
For ``sample_analyzer``, ``analyzer.summary()`` returns:

.. code-block:: python

   {
       "name": "sample_analyzer",
       "type": "text",
       "features": ["Frequency", "norm", "position"],
       "locale": "en",
       "case": "lower",
       "stopwords": ["hello"],
       "accent": False,
       "stemming": True,
       "edgeNgram": {
           "min": {"min": 2, "max": 4, "preserveOriginal": False},
           "max": 5,
           "preserveOriginal": False
       }
   }

The analyzer can then be created in the database as follows:

.. code-block:: python

   from arango import ArangoClient

   # Connect to ArangoDB (python-arango defaults to http://127.0.0.1:8529)
   client = ArangoClient()
   database = client.db('_system', username='root', password='root')

   # Create the analyzer in the connected database
   analyzer.create(database)

Create a *link* with *fields*
-----------------------------

.. code-block:: python

   # Create a link; a view can have 0..* links
   link = Link(name="TextNode")  # name of a collection in the database
   link_analyzers = AnalyzerList(["identity"])
   link.analyzer = link_analyzers

   # A link can have 0..* fields.
   # For the "text" field of the "TextNode" collection, add the analyzers below
   # (text_en is a predefined ArangoDB analyzer).
   field = Field("text", AnalyzerList(["text_en", "invalid_analyzer", "analyzer_sample"]))

   # Filter out the analyzers that are not defined in the database
   field.analyzer.filter_invalid_analyzer(database, verbose=1)
   print("current analyzer after filtering invalid ones: ", field.analyzer)

After filtering, ``invalid_analyzer`` has been removed:

.. code-block:: python

   AnalyzerList(analyzerList=['text_en', 'analyzer_sample'])

Next, add the field to the link and print the fields of the link:

.. code-block:: python

   link.add_field(field)

   # Show the dict format of all the fields in the link
   print(link.get_fields_dict())

Output:

.. code-block:: python

   {'text': {'analyzer': ['text_en', 'analyzer_sample']}}

Create the *View*
-----------------

.. code-block:: python

   view = View('sample_view', view_type="arangosearch")

   # Add the link (a view can have 0..* links, at most one per collection)
   view.add_link(link)

   # A view can have 0..* primary sort fields
   view.add_primary_sort("text", asc=False)
   view.add_stored_value(["text", "timestamp"], compression="lz4")

   # Print the view as a dict
   print(view.summary())
``view.summary()`` returns:

.. code-block:: python

   {
       "name": "sample_view",
       "viewType": "arangosearch",
       "properties": {
           "cleanupintervalstep": 0,
           "cleanupIntervalStep": 0,
           "commitIntervalMsec": 1000,
           "consolidationIntervalMsec": 0,
           "consolidationPolicy": {
               "type": "tier",
               "segmentsMin": 1,
               "segmentsMax": 10,
               "segmentsBytesMax": 5368709120,
               "segmentsBytesFloor": 2097152,
               "minScore": 0
           },
           "primarySortCompression": "lz4",
           "writebufferIdle": 64,
           "writebufferActive": 0,
           "writebufferMaxSize": 33554432
       },
       "links": {
           "TextNode": {
               "analyzer": ["identity"],
               "fields": {
                   "text": {"analyzer": ["text_en", "analyzer_sample"]}
               },
               "includeAllFields": False,
               "trackListPositions": False,
               "inBackground": False
           }
       },
       "primarySort": [{"field": "text", "asc": False}],
       "storedValues": [
           {"fields": ["text"], "compression": "lz4"},
           {"fields": ["timestamp"], "compression": "lz4"}
       ]
   }

.. note::

   The links might need a few minutes to be created and to show up in ArangoDB.
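For comparison, the same analyzer and view can be defined without the wrapper by passing raw definitions to python-arango's ``create_analyzer`` and ``create_arangosearch_view`` database methods. This is a minimal sketch under stated assumptions, not how :py:mod:`cag.view_wrapper` is implemented: the helper names are hypothetical, the keys follow the ArangoDB HTTP API (so links use ``analyzers``), the field analyzers reference ``sample_analyzer`` (the analyzer created at the top of this page), and both stored fields are kept in a single stored-value entry.

.. code-block:: python

   from typing import Any, Dict, List

   # Hypothetical helpers (not part of cag): raw ArangoDB definitions for the
   # analyzer and view built above, using the HTTP API property names.


   def build_analyzer_properties() -> Dict[str, Any]:
       """Properties of a 'text' analyzer configured like sample_analyzer."""
       return {
           "locale": "en",
           "case": "lower",
           "stopwords": ["hello"],
           "accent": False,
           "stemming": True,
           "edgeNgram": {"min": 2, "max": 4, "preserveOriginal": False},
       }


   def build_view_properties(collection: str, field_analyzers: List[str]) -> Dict[str, Any]:
       """Properties of an arangosearch view with one link, a primary sort and stored values."""
       return {
           "links": {
               collection: {
                   "analyzers": ["identity"],  # analyzers applied to the whole link
                   "fields": {"text": {"analyzers": field_analyzers}},
                   "includeAllFields": False,
                   "trackListPositions": False,
                   "inBackground": False,
               }
           },
           "primarySort": [{"field": "text", "asc": False}],
           "primarySortCompression": "lz4",
           # One stored-value entry may hold several fields
           "storedValues": [{"fields": ["text", "timestamp"], "compression": "lz4"}],
       }


   def create_in_database(db: Any) -> None:
       """Create the analyzer, then the view; `db` is a python-arango StandardDatabase."""
       db.create_analyzer(
           name="sample_analyzer",
           analyzer_type="text",
           properties=build_analyzer_properties(),
           features=["frequency", "norm", "position"],
       )
       db.create_arangosearch_view(
           name="sample_view",
           properties=build_view_properties("TextNode", ["text_en", "sample_analyzer"]),
       )

Calling ``create_in_database(database)`` with the ``database`` handle from the analyzer section sends the two requests; it fails if an analyzer or view with the same name already exists, for example because the wrapper created it first.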