ML Module¶
Total files documented: 7
ml/cleantxt.py¶
Purpose¶
This file, cleantxt.py, contains a function to clean and preprocess text data for machine learning models. It removes punctuation, numbers, and converts text to lowercase.
Key Responsibilities¶
- Clean and preprocess text data for machine learning models
- Remove punctuation and numbers from text
- Convert text to lowercase
Important Functions¶
clean_text(text)¶
This function takes a string of text and returns the cleaned text. It performs the following operations:
- Converts the text to lowercase
- Removes punctuation using regular expressions
- Removes numbers using regular expressions
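The operations above can be sketched as follows. This is a minimal illustration, not the exact implementation in cleantxt.py, whose regular expressions may differ:

```python
import re
import string

def clean_text(text):
    """Lowercase the text, then strip punctuation and digits.

    A sketch of the behaviour described above; the exact patterns
    used in cleantxt.py may differ.
    """
    text = text.lower()
    # Remove punctuation characters.
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    # Remove digits.
    text = re.sub(r"\d+", "", text)
    return text
```

For example, `clean_text("Hello, World!")` yields `"hello world"`.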
Important Classes¶
None
System Fit¶
This file is part of the wider 3-Cubed Python system, which utilizes FastAPI and CrewAI for machine learning tasks. The cleantxt.py file is likely used as a utility module to preprocess text data before it is fed into machine learning models. The cleaned text can then be used for tasks such as natural language processing, text classification, or sentiment analysis.
ml/db.py¶
Purpose¶
This file, db.py, provides a set of functions for interacting with a database using the pyodbc library. It establishes a connection to the database, fetches forms, rules, and other data based on specific process types.
Key Responsibilities¶
- Establish a connection to the database using pyodbc.
- Fetch forms from the mstRef_Forms table based on process types.
- Fetch rules from the mstRef_BusinessRules table based on process types.
- Handle database connection errors and exceptions.
Important Functions¶
get_db_connection()¶
Establishes a connection to the database using pyodbc. It takes no arguments and returns a database connection object.
fetch_forms(process_type: str) -> list¶
Fetches forms from the mstRef_Forms table based on process types. It takes a comma-separated list of process types as input and returns a list of form names.
fetch_rules(process_type: str) -> list¶
Fetches rules from the mstRef_BusinessRules table based on process types. It takes a comma-separated list of process types as input and returns a list of rule names.
System Fit¶
This file fits into the wider 3-Cubed Python system as a utility for interacting with the database. It provides a set of functions that can be used by other parts of the system to fetch data from the database. The functions are designed to be reusable and can be easily integrated into other components of the system.
Notes¶
- The database connection string is hardcoded in the get_db_connection() function. This may not be suitable for production environments, where database credentials should be kept secure.
- The fetch_forms() and fetch_rules() functions use dynamic SQL to construct the query conditions based on the input process types. This may introduce security risks if not properly sanitized.
- The functions do not handle pagination or limit the number of results returned from the database, which may lead to performance issues if the database contains a large number of records.
ml/ner.py¶
Purpose¶
This file provides a function for removing company names from input text using spaCy's Named Entity Recognition (NER) model. It also includes a custom dictionary for fallback cases where spaCy's model is unable to detect company names.
Key Responsibilities¶
- Load spaCy's NER model for English language
- Define a function to remove company names from input text
- Provide a custom dictionary for fallback company name detection
Important Functions¶
remove_company_names(text: str) -> str: Removes company names (ORG entities) from the input text using spaCy's NER. It includes case normalization and fallback to a custom dictionary.
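The combination of NER removal and dictionary fallback can be sketched as below. The entries in custom_companies here are invented examples; the real list in ner.py will differ. The nlp parameter stands in for the loaded en_core_web_sm pipeline (spacy.load("en_core_web_sm")) and is optional so the fallback path can run without the model:

```python
import re

# Hypothetical fallback entries; the real custom_companies list
# in ner.py contains its own names.
custom_companies = ["Acme Corp", "Globex"]

def remove_company_names(text: str, nlp=None) -> str:
    """Strip ORG entities found by spaCy, then apply the custom
    dictionary as a case-insensitive regex fallback."""
    if nlp is not None:
        # Remove every span the NER model labels as an organization.
        for ent in nlp(text).ents:
            if ent.label_ == "ORG":
                text = text.replace(ent.text, "")
    # Fallback: case-insensitive replacement from the custom dictionary.
    for name in custom_companies:
        text = re.sub(re.escape(name), "", text, flags=re.IGNORECASE)
    # Collapse the double spaces left behind by removals.
    return re.sub(r"\s{2,}", " ", text).strip()
```

With the fallback alone, `remove_company_names("Report from acme corp today")` yields `"Report from today"`.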
Important Classes¶
None
System Fit¶
This file is part of the wider 3-Cubed Python system, which utilizes spaCy's NER model for text processing. The remove_company_names function can be used in various applications, such as data preprocessing, text analysis, or content generation. It is designed to be a reusable component that can be easily integrated into other parts of the system.
Notes¶
- The en_core_web_sm spaCy model is used for NER; it is a small English model suitable for general-purpose text processing.
- The custom dictionary custom_companies contains a list of company names that are not detected by spaCy's model. This dictionary can be extended or modified as needed.
- The remove_company_names function preserves the original text for case-sensitive replacement and uses regular expressions to replace detected company names.
ml/suggesttools.py¶
Purpose¶
This file, suggesttools.py, provides a set of custom tools for predicting forms and rules used in process activities, considering the predicted product. These tools utilize a large language model (LLM) to generate suggestions based on input prompts.
Key Responsibilities¶
- Predict forms used in a process activity, considering the predicted product.
- Predict rules used in a process activity, considering the predicted product.
- Suggest forms used in a process activity, considering the predicted product and available forms.
- Utilize a large language model (LLM) to generate suggestions based on input prompts.
Important Functions¶
predict_forms_tool¶
- Purpose: Predict forms used in a process activity, considering the predicted product.
- Input: process_type, activity_name, product, and an LLM instance.
- Output: A list of predicted forms.
predict_rules_tool¶
- Purpose: Predict rules used in a process activity, considering the predicted product.
- Input: process_type, activity_name, product, and an LLM instance.
- Output: A list of predicted rules, each with a skill level.
suggest_forms¶
- Purpose: Suggest forms used in a process activity, considering the predicted product and available forms.
- Input: process_type, activity_name, product, and an LLM instance.
- Output: A list of suggested forms, each with a mode.
System Fit¶
This file fits into the wider 3-Cubed Python system by providing a set of custom tools for predicting forms and rules used in process activities. These tools can be integrated with other components of the system to provide a comprehensive solution for process-based form and rule classification. The file utilizes the crewai library for interacting with the LLM and the db module for fetching forms from the database.
ml/aht/cleantxt.py¶
Purpose¶
This file, cleantxt.py, contains a function to clean and preprocess text data for machine learning models. It removes punctuation, converts text to lowercase, and removes numbers.
Key Responsibilities¶
- Clean and preprocess text data
- Remove punctuation and numbers from text
- Convert text to lowercase
Important Functions¶
clean_text(text)¶
This function takes in a string of text and returns the cleaned text. It uses regular expressions to remove punctuation and numbers, and converts the text to lowercase.
System Fit¶
This file is part of the 3-Cubed Python system, specifically within the ml/aht module. It is designed to be used in conjunction with machine learning models to preprocess text data before feeding it into the models. The clean_text function can be used as a preprocessing step in various machine learning pipelines within the system.
ml/ml_notebooks/cleantxt.py¶
Purpose¶
This file, cleantxt.py, contains functions for cleaning text data. It removes punctuation, converts text to lowercase, and removes digits from the input text.
Key Responsibilities¶
- Clean text data by removing punctuation and digits
- Convert text to lowercase
Important Functions¶
clean_text(text)¶
Removes punctuation and digits from the input text and converts it to lowercase. This function is a simple text cleaning utility.
clean_text1(text)¶
This function is identical to clean_text(text). It is unclear why there are two identical functions in this file.
Important Classes¶
None
System Fit¶
This file is part of the ML (Machine Learning) component of the 3-Cubed Python system. It is likely used as a utility function to preprocess text data before it is fed into machine learning models. The cleaned text data can then be used for tasks such as text classification, sentiment analysis, or topic modeling.
ml/ml_notebooks/main.py¶
Purpose¶
This file, main.py, is the main entry point for the 3-Cubed(ML) FastAPI application. It handles API requests for making NVA (Not Vital Activity) predictions using a machine learning model.
Key Responsibilities¶
- Load and verify API keys
- Load and initialize machine learning models for NVA predictions
- Handle API requests for making NVA predictions
- Return predictions in a standardized format
Important Functions¶
make_nva_predictions(text_inputs): This function takes a list of NVATextInput objects as input and returns a list of NVAPredictionResponse objects containing the predicted NVA labels and confidence scores.
Predict_Nva(text_inputs, api_key): This is the API endpoint for making NVA predictions. It takes a list of NVATextInput objects and an API key as input, verifies the API key, and calls the make_nva_predictions function to generate the predictions.
Important Classes¶
NVATextInput: This is a Pydantic model representing the input data for making NVA predictions. It has attributes for id, industry, activity_name, and systems_and_applications.
NVAPredictionResponse: This is a Pydantic model representing the output data for NVA predictions. It has attributes for id, nva, and confidence.
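The prediction flow can be sketched as below. Plain NamedTuples stand in for the Pydantic models so the sketch has no framework dependency, and the model argument stands in for the pickled classifier loaded from NVA_Type_pkl; the predict_proba/classes_ interface is an assumed scikit-learn-style API, and the feature-string construction is illustrative:

```python
from typing import List, NamedTuple

class NVATextInput(NamedTuple):
    # Mirrors the Pydantic request model described above.
    id: int
    industry: str
    activity_name: str
    systems_and_applications: str

class NVAPredictionResponse(NamedTuple):
    # Mirrors the Pydantic response model described above.
    id: int
    nva: str
    confidence: float

def make_nva_predictions(text_inputs: List[NVATextInput], model) -> List[NVAPredictionResponse]:
    responses = []
    for item in text_inputs:
        # Combine the text fields into a single feature string (illustrative).
        text = f"{item.industry} {item.activity_name} {item.systems_and_applications}"
        # Pick the highest-probability class and report its probability.
        probs = model.predict_proba([text])[0]
        best = max(range(len(probs)), key=probs.__getitem__)
        responses.append(NVAPredictionResponse(item.id, model.classes_[best], float(probs[best])))
    return responses
```

The Predict_Nva endpoint would verify the caller's API key against api_keys.json before delegating to this function.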
System Fit¶
This file fits into the wider 3-Cubed Python system as the main entry point for the machine learning API. It relies on the api_keys.json file for API key verification and the NVA_Type_pkl directory for loading the machine learning models. The predictions generated by this file can be used as input for other components of the system.