Extra Feature: The Arango View Wrapper
======================================


The Arango view wrapper (:py:mod:`cag.view_wrapper`) simplifies the creation of ArangoSearch views and analyzers. It can be used by the *Analyzer* component mentioned above. The wrapper provides classes for building a view together with its properties and components (analyzers, links, and fields).

The full example can be found `in the examples folder <examples/view_creation_example.py>`_.


Create an arango analyzer
-------------------------

.. epigraph::

    The valid attributes/values for the properties depend on the type used. For example, the delimiter type needs to know the desired delimiting character(s), whereas the text type takes a locale, stop-words, and more.

    -- `source <https://www.arangodb.com/docs/stable/analyzer.html>`_

The analyzer class loads the required attributes of an analyzer based on its type. The supported types are:

* ``_TYPE_IDENTITY`` -> ``"identity"``, **attributes to set:** none
* ``_TYPE_TEXT`` -> ``"text"``, **attributes to set:** ``locale``, ``case``, ``stopwords``, ``accent``, ``stemming``, ``edge_ngram``
* ``_TYPE_NGRAM`` -> ``"ngram"``, **attributes to set:** ``min``, ``max``, ``preserve_original``, ``start_marker``, ``end_marker``, ``stem_type``
* ``_TYPE_STEM`` -> ``"stem"``, **attributes to set:** ``locale``
* ``_TYPE_DELIMITE`` -> ``"delimiter"``, **attributes to set:** ``delimiter``
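
The mapping in the list above can be pictured as a plain lookup table. The following is only a minimal sketch and not the wrapper's implementation: ``TYPE_FIELDS`` and ``fields_for`` are hypothetical names, and the attribute names are copied from the list.

.. code-block:: python

    # Hypothetical lookup table; attribute names are copied from the list above.
    TYPE_FIELDS = {
        "identity": [],
        "text": ["locale", "case", "stopwords", "accent", "stemming", "edge_ngram"],
        "ngram": ["min", "max", "preserve_original", "start_marker", "end_marker", "stem_type"],
        "stem": ["locale"],
        "delimiter": ["delimiter"],
    }


    def fields_for(analyzer_type):
        """Return the attributes that can be set for the given analyzer type."""
        try:
            # Return a copy so callers cannot modify the table by accident.
            return list(TYPE_FIELDS[analyzer_type])
        except KeyError:
            raise ValueError(f"unsupported analyzer type: {analyzer_type!r}") from None


    print(fields_for("text"))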


.. code-block:: python

    from cag.view_wrapper.arango_analyzer import ArangoAnalyzer, EdgeNGram

    analyzer = ArangoAnalyzer("sample_analyzer")
    analyzer.type = ArangoAnalyzer._TYPE_TEXT
    analyzer.set_stopwords(language="english", custom_stopwords=['hello'], include_default=False)

    print(analyzer.get_type_fields())
    ## Returns: ['locale', 'case', 'stopwords', 'accent', 'stemming', 'edge_ngram']

    analyzer.set_features(frequency=True, norm=True, position=True)  # by default, all features are set to True
    analyzer.set_edge_ngrams(EdgeNGram(min=2,
                                max=4,
                                preserve_original=False))
    print(analyzer.summary())

The summary returns the dictionary used to create the Analyzer:

.. code-block:: python

    {
        "name": "sample_analyzer",
        "type": "text",
        "features": [
            "frequency",
            "norm",
            "position"
        ],
        "locale": "en",
        "case": "lower",
        "stopwords": [
            "hello"
        ],
        "accent": False,
        "stemming": True,
        "edgeNgram": {
            "min": 2,
            "max": 4,
            "preserveOriginal": False
        }
    }
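
The summary uses camelCase keys such as ``edgeNgram`` and ``preserveOriginal``, whereas the attributes listed earlier are written in snake_case. As an illustration only, such a renaming could look like the sketch below. ``to_camel`` is a hypothetical helper and not part of ``cag``, and this is not a claim about how the wrapper performs the conversion.

.. code-block:: python

    def to_camel(name):
        """Convert a snake_case name to camelCase (hypothetical helper)."""
        head, *tail = name.split("_")
        # Keep the first part as is and capitalize every following part.
        return head + "".join(part.capitalize() for part in tail)


    for attribute in ("locale", "edge_ngram", "preserve_original"):
        print(attribute, "->", to_camel(attribute))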


The analyzer can simply be created as follows:

.. code-block:: python

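  ## Create the analyzer in the database
  from arango import ArangoClient

  client = ArangoClient()
  # "_system" is ArangoDB's default database; database names are case-sensitive
  database = client.db('_system', username='root', password='root')
  analyzer.create(database)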
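
For comparison, python-arango itself offers ``create_analyzer(name, analyzer_type, properties, features)`` on a database object. The sketch below shows how a summary dictionary like the one above could be split into those arguments. It is an assumption about what such a call needs, not a description of what ``analyzer.create`` does internally, and ``split_summary`` is a hypothetical helper.

.. code-block:: python

    def split_summary(summary):
        """Split an analyzer summary into keyword arguments (hypothetical helper)."""
        reserved = {"name", "type", "features"}
        # Everything that is not name, type or features is a type-specific property.
        properties = {key: value for key, value in summary.items() if key not in reserved}
        return {
            "name": summary["name"],
            "analyzer_type": summary["type"],
            "properties": properties,
            "features": list(summary.get("features", [])),
        }


    summary = {
        "name": "sample_analyzer",
        "type": "text",
        "features": ["frequency", "norm", "position"],
        "locale": "en",
        "case": "lower",
    }
    kwargs = split_summary(summary)
    print(kwargs)
    # With a live connection, this could be passed on as:
    # database.create_analyzer(**kwargs)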


Create a *link* with *fields*
-----------------------------


.. code-block:: python

    # Create Link - a view can have 0 to * links
    link = Link(name="TextNode") # Name of a collection in the database
    linkAnalyzers = AnalyzerList(["identity"])
    link.analyzer = linkAnalyzers

    # A link can have 0..* fields
    # For the "text" field in the "TextNode" collection, add the analyzers below
    field = Field("text", AnalyzerList(["text_en", "invalid_analyzer", "analyzer_sample"]))  # text_en is a predefined ArangoDB analyzer

    # Filter out the analyzers that are not defined in the database
    # (`database` is the connection created in the previous section)
    field.analyzer.filter_invalid_analyzer(database, verbose=1)
    print("current analyzer after filtering invalid ones: ", field.analyzer)

After filtering, the field keeps only the analyzers that exist in the database:
    
.. code-block:: python
    
    AnalyzerList(analyzerList=['text_en', 'analyzer_sample'])  

.. code-block:: python
    
    link.add_field(field)

    ## Show the dict format of all the fields in a link
    print(link.get_fields_dict())
   

.. code-block:: python
    
    {'text': {'analyzer': ['text_en', 'analyzer_sample']}}
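
The filtering step in this section can be pictured as a membership test against the analyzers known to the database. The sketch below is an assumption about the general idea, not the wrapper's code. ``filter_known`` is a hypothetical helper, and ``known`` stands in for the names that python-arango's ``database.analyzers()`` would report, where user-defined analyzers carry a ``<database>::`` prefix.

.. code-block:: python

    def filter_known(candidates, known):
        """Keep only the candidate analyzers that appear in `known` (hypothetical helper)."""
        # Drop the "<database>::" prefix of user-defined analyzers before comparing.
        known_short = {name.split("::")[-1] for name in known}
        return [name for name in candidates if name in known_short]


    # Stand-in for the analyzer names reported by the database.
    known = ["identity", "text_en", "_system::analyzer_sample"]
    kept = filter_known(["text_en", "invalid_analyzer", "analyzer_sample"], known)
    print(kept)  # ['text_en', 'analyzer_sample']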


Create the *View*
-----------------

.. code-block:: python
    
    view = View('sample_view',
                view_type="arangosearch")
    ## add the link (a view can have 0..* links)
    view.add_link(link)

    ## a view can have 0..* primary sort fields
    view.add_primary_sort("text", asc=False)
    view.add_stored_value(["text", "timestamp"], compression="lz4")

    print("Prints the *view* as a dict:", view.summary())


.. note::

    The links might need a few minutes to be created and to show up in ArangoDB.

The summary printed by the code above:

.. code-block:: python

    {
        "name": "sample_view",
        "viewType": "arangosearch",
        "properties": {
            "cleanupIntervalStep": 0,
            "commitIntervalMsec": 1000,
            "consolidationIntervalMsec": 0,
            "consolidationPolicy": {
                "type": "tier",
                "segmentsMin": 1,
                "segmentsMax": 10,
                "segmentsBytesMax": 5368709120,
                "segmentsBytesFloor": 2097152,
                "minScore": 0
            },
            "primarySortCompression": "lz4",
            "writebufferIdle": 64,
            "writebufferActive": 0,
            "writebufferMaxSize": 33554432
        },
        "links": {
            "TextNode": {
                "analyzer": [
                    "identity"
                ],
                "fields": {
                    "text": {
                        "analyzer": [
                            "text_en",
                            "analyzer_sample"
                        ]
                    }
                },
                "includeAllFields": False,
                "trackListPositions": False,
                "inBackground": False
            }
        },
        "primarySort": [
            {
                "field": "text",
                "asc": False
            }
        ],
        "storedValues": [
            {
                "fields": [
                    "text"
                ],
                "compression": "lz4"
            },
            {
                "fields": [
                    "timestamp"
                ],
                "compression": "lz4"
            }
        ]
    }
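
Because the view summary is a plain dictionary, it can be inspected before the view is created. The sketch below collects every analyzer name referenced by the links, for example to compare them with the analyzers available in the database. ``referenced_analyzers`` is a hypothetical helper, and the key names (``links``, ``analyzer``, ``fields``) are taken from the summary above.

.. code-block:: python

    def referenced_analyzers(view_summary):
        """Collect all analyzer names used by the links of a view summary."""
        names = set()
        for link in view_summary.get("links", {}).values():
            # Analyzers set on the link itself.
            names.update(link.get("analyzer", []))
            # Analyzers set on the individual fields of the link.
            for field in link.get("fields", {}).values():
                names.update(field.get("analyzer", []))
        return sorted(names)


    # Reduced copy of the summary above, keeping only the links.
    summary = {
        "links": {
            "TextNode": {
                "analyzer": ["identity"],
                "fields": {"text": {"analyzer": ["text_en", "analyzer_sample"]}},
            }
        }
    }
    print(referenced_analyzers(summary))  # ['analyzer_sample', 'identity', 'text_en']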