NER Module

Total files documented: 3


ner/main.py

Purpose

This file sets up the main FastAPI application, including CORS middleware and routing for NER (Named Entity Recognition) and merge operations.

Key Responsibilities

  • Sets up the FastAPI application instance
  • Configures CORS middleware for cross-origin resource sharing
  • Includes routers for NER and merge operations

Important Functions

  • app.add_middleware: Adds CORS middleware to the application
  • app.include_router: Includes routers for NER and merge operations

Important Classes

  • FastAPI: The main application instance
  • CORSMiddleware: Middleware for enabling CORS

System Fit

This file is the entry point of the NER service within the wider 3-Cubed Python system, which appears to be a FastAPI-based API for natural language processing tasks. It creates the application instance, attaches the CORS middleware, and mounts the NER and merge routers with the include_router method. The routers themselves are presumably defined in the other two files documented here, ner/ner.py and ner/merge.py.


ner/merge.py

Purpose

This file classifies input items into one of three categories, Activities, Control, or NVA (Non-Value-Adding), using a CrewAI agent. It defines a FastAPI endpoint that receives the input items, runs the classification, and returns the results in a strict JSON format.

Key Responsibilities

  • Define a CrewAI agent for classifying input items
  • Create a FastAPI endpoint to receive input items and trigger the classification process
  • Handle errors and exceptions during the classification process
  • Return the classification results in a strict JSON format

Important Functions

  • classify_activities: The main function that receives input items, runs the classification process, and returns the results
  • prepare_inputs: A helper function that normalizes the input items to a plain dict format for CrewAI
  • classifier_agent: A function that defines the CrewAI agent for classifying input items
  • classify_task: A function that defines the task for the CrewAI agent to classify input items
  • activity_classifier_crew: A function that defines the CrewAI crew for classifying input items
  • to_items: A helper function that extracts the classification results from the raw output and returns them in a strict JSON format
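
As a rough illustration of the two helpers, the sketch below normalizes input items to plain dicts and then pulls a JSON object out of free-form model output. It uses only the standard library. The payload shape (an "items" list whose entries carry a "category" key) and the function bodies are assumptions for illustration, not the actual implementation in ner/merge.py.

```python
import json
import re
from typing import Any

# The three categories named in this document.
ALLOWED = {"Activities", "Control", "NVA"}


def prepare_inputs(items: list[Any]) -> dict:
    """Normalize items (dicts or objects with id/name attributes) to a plain dict."""
    plain = []
    for item in items:
        if isinstance(item, dict):
            plain.append({"id": item["id"], "name": item["name"]})
        else:
            plain.append({"id": item.id, "name": item.name})
    return {"items": plain}


def to_items(raw: str) -> list[dict]:
    """Extract the JSON object embedded in raw model output and keep valid entries."""
    # Greedy match from the first "{" to the last "}" so nested objects survive.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(match.group(0))
    result = []
    for entry in payload.get("items", []):
        # Drop entries whose category is not one of the three allowed labels.
        if entry.get("category") in ALLOWED:
            result.append(
                {"id": entry["id"], "name": entry["name"], "category": entry["category"]}
            )
    return result
```

Filtering on a fixed set of labels is one way to keep the response strict even when the model adds commentary or invents a category.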

Important Classes

  • ItemIn: A Pydantic model that represents the input item with an ID and name
  • ActivityClassifyRequest: A Pydantic model that represents the input request with a list of items
  • ItemOut: A Pydantic model that represents the output item with an ID and name
  • ActivityClassifyResponse: A Pydantic model that represents the output response with a list of classified items
  • ActivityClassificationCrew: A CrewAI crew class that defines the agent, task, and process for classifying input items

System Fit

This file belongs to the Natural Language Processing (NLP) module of the wider 3-Cubed Python system. It uses the CrewAI library to define the classification agent and FastAPI to expose a REST endpoint that accepts input items and returns the classification results. Because the endpoint is exposed through its own router, other components of the 3-Cubed system can call it independently of the NER endpoint.


ner/ner.py

Purpose

This file detects organization mentions in text. It uses the Hugging Face Transformers library for named entity recognition and exposes the functionality through a FastAPI router.

Key Responsibilities

  • Load pre-trained NLP models and pipelines for organization detection
  • Define custom regular expressions for filtering organization mentions
  • Implement functions for logging API requests and responses
  • Provide a FastAPI router for handling API requests
  • Define Pydantic models for structured input and response data

Important Functions

  • _now_iso(): Returns the current timestamp in ISO format.
  • log_io(): Logs API requests and responses to a JSON file.
  • is_probably_real_org(): Filters organization mentions based on custom rules and regular expressions.
  • get_company_mentions(): Detects organization labels in text data using a combination of exact matches and Hugging Face NER.

Important Classes

  • StructuredInput: A Pydantic model for structured input data, containing industry name, activities, and teams.
  • OrgsOnlyResponse: A Pydantic model for response data, containing activity organizations and team organizations.

System Fit

This file gives the wider 3-Cubed Python system a web API for detecting organization mentions in text. Other modules in the 3-Cubed ecosystem can call it as one step in a larger pipeline, for example information extraction or text analysis.
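
The combination of exact matches and model-based NER can be sketched as follows. The signature, the known-organization list, and the score threshold are assumptions for illustration. The NER step is passed in as a callable so the sketch runs without downloading a model; in practice it could be a Hugging Face pipeline created with pipeline("ner", aggregation_strategy="simple"), whose results are dicts that include entity_group, word, and score. The "ORG" label is also an assumption, since label names depend on the model.

```python
import re
from typing import Callable, Iterable

# A callable that maps text to a list of entity dicts (entity_group, word, score).
NerFn = Callable[[str], list]


def get_company_mentions(
    text: str,
    known_orgs: Iterable[str],
    ner: NerFn,
    min_score: float = 0.8,  # hypothetical confidence threshold
) -> list[str]:
    """Return organization names found by exact match first, then by the model."""
    found: list[str] = []
    seen: set[str] = set()

    def add(name: str) -> None:
        key = name.casefold()  # de-duplicate case-insensitively
        if key not in seen:
            seen.add(key)
            found.append(name)

    # Pass 1: case-insensitive whole-word matches against a known list.
    for org in known_orgs:
        pattern = rf"(?<!\w){re.escape(org)}(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            add(org)

    # Pass 2: model predictions labelled as organizations.
    for ent in ner(text):
        if ent.get("entity_group") == "ORG" and ent.get("score", 0.0) >= min_score:
            add(ent["word"].strip())

    return found


def fake_ner(text: str) -> list:
    """Stand-in for a real NER pipeline; returns canned entities."""
    return [
        {"entity_group": "ORG", "word": "Globex", "score": 0.97},
        {"entity_group": "PER", "word": "Alice", "score": 0.99},
        {"entity_group": "ORG", "word": "acme corp", "score": 0.91},
    ]


mentions = get_company_mentions(
    "Alice moved from Acme Corp to Globex.", ["Acme Corp"], fake_ner
)
print(mentions)
```

Running the exact-match pass first means the spelling from the known list wins when the model reports the same organization in a different casing.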

