.. role:: python(code)
   :language: python

Graph-Annotator
===============

Textmining annotator
--------------------

A simple pipeline, using existing pipes, can be created as follows (assuming you have an ArangoDB instance up and running):

.. code-block:: python

    from pyArango.collection import Collection
    from cag.framework.annotator.pipeline import Pipeline
    from cag.utils.config import Config

    ## set database configuration
    config = Config(
        url="http://127.0.0.1:8529",
        user="root",
        password="root",
        database="_system",
        graph="GenericGraph"
    )

    ## define the pipeline
    pipeline: Pipeline = Pipeline(database_config=config)
    pipeline.add_annotation_pipe("NamedEntityAnnotator", save=True)

    coll: Collection = pipeline.database_config.db["TextNode"]

    ## fetch data
    docs = coll.fetchAll(limit=500)
    processed = []
    for txt_node in docs:
        processed.append((txt_node.text, {"_key": txt_node._key}))

    ## annotating using the defined pipes
    pipeline.annotate(processed)

    ## save to the database
    pipeline.save()

General annotator
-----------------

These annotators fit a more general class, where we only provide basic functionality, similar to the graph creator. To ease filtering based on the parameters, we provide a simple base class where the documents can be checked in and easily filtered:

.. code-block:: python

    from cag.framework import GenericAnnotator
    from cag.utils.config import Config

    # AnyGraphCreator and some_algo are not defined here; they stand in for
    # your own graph creator class and annotation function.
    class AnyAnnotator(GenericAnnotator):
        def __init__(self, conf: Config, params={'mode': 'run-1'}, filter_annotatable=True):
            super().__init__(query=f"""FOR dp IN {AnyGraphCreator._ANY_DATASET_NODE_NAME}
                RETURN dp
                """, params=params, conf=conf, filter_annotatable=filter_annotatable)

        def update_graph(self, timestamp, data):
            for d in data:
                d['add-prop'] = some_algo(d['text'])
                self.upsert_node(d)  # will annotate the data!

You can disable the filtering by providing :python:`filter_annotatable=False`.
When returning more complex data, make sure that you also return a root-level field (in your data structure) called :python:`'_annotator_params'` (from a component that will be annotated), or provide your own field name in the parameter :python:`annotator_fieldname`. Each document that is upserted (or checked into :python:`complete_annotation`) will receive the parameters in this field, providing the next run with the necessary information to filter.

An example of annotation metadata as a :python:`dict()` for annotations produced by keyphrase extraction is given below:

.. code-block:: python

    {
        "analysis_component": "keyphrase_extraction",
        "parameters": {
            "algorithm": "text_rank",
            "relevance_threshold": 0.75
        }
    }
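
To illustrate how stored annotation metadata can drive filtering on the next run, here is a minimal sketch in plain Python. It is not the library's implementation: :python:`needs_annotation` and :python:`select_annotatable` are hypothetical helpers, and the assumption that a document counts as "already annotated" when its stored parameters equal the current ones is ours.

.. code-block:: python

    from typing import Any, Dict, List

    # Default field name described above; everything else in this sketch
    # is hypothetical and not part of the cag API.
    ANNOTATOR_FIELD = "_annotator_params"


    def needs_annotation(doc: Dict[str, Any], params: Dict[str, Any],
                         fieldname: str = ANNOTATOR_FIELD) -> bool:
        """Return True if the document lacks metadata or was annotated
        with different parameters (assumed filtering rule)."""
        return doc.get(fieldname) != params


    def select_annotatable(docs: List[Dict[str, Any]], params: Dict[str, Any],
                           fieldname: str = ANNOTATOR_FIELD) -> List[Dict[str, Any]]:
        """Keep only the documents that still need to be annotated."""
        return [d for d in docs if needs_annotation(d, params, fieldname)]


    params = {
        "analysis_component": "keyphrase_extraction",
        "parameters": {"algorithm": "text_rank", "relevance_threshold": 0.75},
    }
    older_params = {
        "analysis_component": "keyphrase_extraction",
        "parameters": {"algorithm": "text_rank", "relevance_threshold": 0.5},
    }

    docs = [
        {"_key": "1", "text": "already done", ANNOTATOR_FIELD: params},
        {"_key": "2", "text": "never annotated"},
        {"_key": "3", "text": "older settings", ANNOTATOR_FIELD: older_params},
    ]

    pending = select_annotatable(docs, params)
    pending_keys = [d["_key"] for d in pending]

In this sketch, the document annotated with identical parameters is skipped, while the unannotated document and the one annotated with a different :python:`relevance_threshold` are selected again.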