v1.1.6 update

batmen-lab · Nov 27, 2023 · 75b9237
1 parent aa4f47d

Showing 7 changed files with 228 additions and 54 deletions.
diff --git a/Git2APP.md b/Git2APP.md
@@ -143,7 +143,12 @@ Note: You have the option to alter the filtering rules by modifying filter_speci
 python Git2APP/get_API_docstring_from_sourcecode.py --LIB ${LIB}
 ```
 
-This script is designed to execute only on APIs lacking docstrings or well-documented parameters. It automatically skips APIs that meet the established documentation standards. Please note that running this script requires a paid OpenAI account.
+Tips:
+- This script is designed to execute only on APIs lacking docstrings or well-documented parameters. It automatically skips APIs that meet the established documentation standards. Please note that running this script requires a paid OpenAI account. 
+
+- This script rewrites docstrings based on LLM responses, so the quality of the results may not be entirely satisfactory. Make sure the types of the required parameters are provided for inference: a `None` type may make the API fail at execution because essential parameters are missing.
+
+- Because `args`, `kwargs`, and similar parameters are optional in most APIs, we currently filter them out during prediction. We therefore advise against passing parameters through `args` in your code.
 
 It is better that if you can design the docstrings by yourself as it is more accurate. `NumPy` format is preferred than `reStructuredText` and `Google` format. Here's a basic example of an effective docstring :
 
@@ -231,6 +236,8 @@ hugging_models
 
 Add a logo image to `BioMANIA/chatbot_ui_biomania/public/apps/` and modify the link in `BioMANIA/chatbot_ui_biomania/components/Chat/LibCardSelect.tsx`.
 
+Be mindful of capitalization in library names: it determines which paths the related models and data are loaded from.
+
 ### 2.3 Use UI service.
 
 Follow the steps in [`Run with script/Inference`](README.md#inference) section in `README` to start UI service. Don’t forget to set an OpenAI key in `.env` file as recommended in `README`.
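The `args`/`kwargs` tip in `Git2APP.md` can be illustrated with Python's standard `inspect` module. This is a minimal sketch under the assumption that the filtering works on the function signature; it is not the code in `get_API_docstring_from_sourcecode.py`, and `required_parameters` and `example_api` are hypothetical names.

```python
import inspect
from typing import Any, Dict


def required_parameters(func) -> Dict[str, Any]:
    """Hypothetical helper: map parameter names to annotations, skipping *args/**kwargs."""
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        # VAR_POSITIONAL is *args and VAR_KEYWORD is **kwargs; both are dropped
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue
        params[name] = param.annotation
    return params


def example_api(adata: str, n_top: int = 10, *args, **kwargs) -> None:
    """Hypothetical API used only for this illustration."""


print(required_parameters(example_api))  # {'adata': <class 'str'>, 'n_top': <class 'int'>}
```

Anything passed through `*args` disappears from this view, which is why required inputs are better declared as named, typed parameters.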

diff --git a/README.md b/README.md
@@ -1,14 +1,24 @@
 
-<h1 align="center">BioMANIA</h1>
-
-<a target="_blank" href="https://www.biorxiv.org/content/10.1101/2023.10.29.564479v1">
-<img style="height:22pt" src="https://img.shields.io/badge/-Paper-burgundy?style=flat&logo=arxiv">
-</a><a target="_blank" href="https://github.com/batmen-lab/BioMANIA">
-<img style="height:22pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"></a><a target="_blank" href="https://railway.app/template/WyEd-d">
-<img style="height:22pt" src="https://img.shields.io/badge/-Railway-purple?style=flat&logo=railway">
-</a><a target="_blank" href="https://hub.docker.com/repositories/chatbotuibiomania">
-<img style="height:22pt" src="https://img.shields.io/badge/-Docker-blue?style=flat&logo=docker">
-</a>
+<h1 align="center">BioMANIA</h1> 
+
+<p align="center">
+  <img src="./images/BioMANIA.png" alt="BioMANIA logo" width="150" height="150">
+</p>
+
+<p align="center">
+  <a target="_blank" href="https://www.biorxiv.org/content/10.1101/2023.10.29.564479v1">
+    <img style="height:22pt" src="https://img.shields.io/badge/-Paper-burgundy?style=flat&logo=arxiv">
+  </a>
+  <a target="_blank" href="https://github.com/batmen-lab/BioMANIA">
+    <img style="height:22pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github">
+  </a>
+  <a target="_blank" href="https://railway.app/template/WyEd-d">
+    <img style="height:22pt" src="https://img.shields.io/badge/-Railway-purple?style=flat&logo=railway">
+  </a>
+  <a target="_blank" href="https://hub.docker.com/repositories/chatbotuibiomania">
+    <img style="height:22pt" src="https://img.shields.io/badge/-Docker-blue?style=flat&logo=docker">
+  </a>
+</p>
 
 Welcome to the BioMANIA! This guide provides detailed instructions on how to set up, run, and interact with the BioMANIA chatbot interface, which connects seamlessly with various APIs to deliver information across numerous libraries and frameworks.
 
@@ -31,6 +41,7 @@ Tips:
 - We have implemented switching different libraries inside one dialog. You can 
 - Notice that the inference speed depends on OpenAI key and back-end device. A paid OpenAI key and running back-end on GPU will speed up the inference quite a lot!
 - All uploaded files are saved under `./tmp` folder. Please enter `./tmp/`+your_file_name when the API requires filename parameters.
+- Uploading large files can be quite slow.
 
 > **This has only one backend, which may lead to request confusion when multiple users request simultaneously. The stability of the operation is affected by the device's network. When it runs on the CPU, switching between different libraries takes about half a minute to load models and data. We recommend prioritizing running it locally with GPU, which takes only about 3 seconds to switch between different libraries!**
 
@@ -275,7 +286,7 @@ cp -r ./data/standard_process/${LIB}/API_init.json ./data/standard_process/${LIB
 
 4. Following this, create instructions, generate various JSON files, and split the data.
 ```bash
-python dataloader/preprocess_retriever_data.py --concurrency 80 --LIB ${LIB}
+python dataloader/preprocess_retriever_data.py --concurrency 200 --LIB ${LIB}
 ```
 
 Tips:
@@ -295,10 +306,9 @@ python inference/retriever_bm25_inference.py --LIB ${LIB} --top_k 3
 7. Fine-tune the retriever.
 You can finetune the retriever based on the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model
 ```bash
-export LIB=MIOSTONE
 CUDA_VISIBLE_DEVICES=0
 mkdir ./hugging_models/retriever_model_finetuned/${LIB}
-python models/train_retriever_multigpu.py \
+python models/train_retriever.py \
     --data_path ./data/standard_process/${LIB}/retriever_train_data/ \
     --model_name bert-base-uncased \
     --output_path ./hugging_models/retriever_model_finetuned/${LIB} \
@@ -484,17 +494,19 @@ report/Py2report.py
 ```
 
 ## Version History
-- v1.1.6 (comming soon!)
+- v1.1.6 (2023-11-27)
   - Support sharing your APP and install others' APP through [our issue](https://github.com/batmen-lab/BioMANIA/issues/2)!
-  - Provide data and pretrained models for batmen-lab developed tools MIOSTONE and SONATA, expanding our suite of available resources and functionalities.
-  - Support UI installation APP service!
-  - Add R inference code. Provide data and pretrained models for R tools.
+  - Enhance code robustness:
+    - When an API returns a tuple, split it into multiple variables by adding code `result_n+1, result_n+2, ... = result_n`.
+    - During parameter inference, replace a parameter type of `NoneType` with `Any` so that the code runs smoothly.
+    - Fix a bug in adding quotation marks around user-input values for `str`-type parameters.
+  - Release a package.
 - v1.1.5 (2023-11-25)
-  - Enhanced Docker Integration: Now featuring seamless packaging of both front-end and back-end components using Docker. This update simplifies deployment processes, ensuring a more streamlined development experience.
+  - Enhanced Docker Integration: Now featuring seamless packaging of both front-end and back-end components using Docker. This update simplifies deployment processes, ensuring a more streamlined development experience. Updated the Docker image `chatbotuibiomania/biomania-together:v1.1.3`.
   - Automated Docstring Addition: Users can now effortlessly convert GitHub source code to our tool with scripts that automatically add docstrings, freeing them from the manual effort previously required.
 - v1.1.4 (2023-11-22)
   - Add [`manual`](R2APP.md) support for converting R code to API_init.json. Will support for converting R code to APP later!
-  - Release docker v1.1.3 with support for 12 PyPI biotools. Notice that some tools are only available under their own conda environment!!
+  - Release Docker v1.1.3 with support for 12 PyPI biotools. Note that some tools are only available in their own conda environments! Updated the Docker images `chatbotuibiomania/biomania-frontend:v1.1.3` and `chatbotuibiomania/biomania-backend:v1.1.3`.
   - Resolved issues related to Docker networking and Docker CUDA.
   - Improved the stability of the server demo by ensuring that it runs continuously in the background.
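The first two robustness items under v1.1.6 can be sketched as below. The helpers `unpack_tuple_code` and `normalize_param_type` are hypothetical and only follow the patterns stated in the changelog (`result_n+1, result_n+2, ... = result_n`, and replacing `NoneType` with `Any`); BioMANIA's actual implementation may differ.

```python
from typing import List


def unpack_tuple_code(result_index: int, n_items: int) -> str:
    """Hypothetical helper: emit code splitting result_<n> into new variables.

    Follows the changelog pattern: result_n+1, result_n+2, ... = result_n
    """
    targets: List[str] = [f"result_{result_index + k}" for k in range(1, n_items + 1)]
    return f"{', '.join(targets)} = result_{result_index}"


def normalize_param_type(param_type: str) -> str:
    """Hypothetical helper: treat a 'NoneType' parameter type as 'Any' before inference."""
    return "Any" if param_type == "NoneType" else param_type


# Suppose the API call stored in result_1 returned a 2-tuple
code = unpack_tuple_code(1, 2)
print(code)  # result_2, result_3 = result_1

namespace = {"result_1": ("adata", 42)}
exec(code, namespace)  # run the generated unpacking statement
print(namespace["result_2"], namespace["result_3"])  # adata 42

print(normalize_param_type("NoneType"))  # Any
```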
 - v1.1.3 (2023-11-20)

diff --git a/chatbot_ui_biomania/components/Chat/LibCardSelect.tsx b/chatbot_ui_biomania/components/Chat/LibCardSelect.tsx
@@ -14,7 +14,7 @@ export const libImages: { [key: string]: string } = {
   'pyopenms': '/apps/pyopenms.png',
   'scenicplus': '/apps/SCENIC.png',
   'scvi-tools': '/apps/scvitools.svg',
-  'SONATA': '/apps/SONATA.png',
+  'sonata': '/apps/SONATA.png',
   'MIOSTONE': '/apps/MIOSTONE.jpg',
   //'custom': '/apps/customize.jpg',
 };
@@ -48,7 +48,7 @@ export const LibCardSelect = () => {
       { id: 'pyopenms', name: 'pyopenms' },
       { id: 'scenicplus', name: 'scenicplus' },
       { id: 'scvi-tools', name: 'scvi-tools' },
-      { id: 'SONATA', name: 'SONATA' },
+      { id: 'sonata', name: 'sonata' },
       { id: 'MIOSTONE', name: 'MIOSTONE' },
       //{ id: 'custom', name: 'custom' },
     ];

diff --git a/images/BioMANIA.png b/images/BioMANIA.png
diff --git a/src/Git2APP/get_API_init_from_sourcecode.py b/src/Git2APP/get_API_init_from_sourcecode.py
@@ -1,4 +1,5 @@
 import pydoc, argparse, json, re, os, collections, inspect, importlib, typing, functools
+from typing import Any, Type, List
 from docstring_parser import parse
 from langchain.document_loaders import BSHTMLLoader
 from configs.model_config import ANALYSIS_PATH, get_all_variable_from_cheatsheet, get_all_basic_func_from_cheatsheet
@@ -16,7 +17,17 @@ def process_html(html_path: str) -> str:
     content = re.sub(r'\s+', ' ', content) # remove large blanks
     return content
 
-def get_dynamic_types():
+def get_dynamic_types() -> List[Type]:
+    """
+    Retrieves a list of various basic and complex data types from Python's built-in, typing, 
+    collections, and collections.abc modules.
+    
+    Returns
+    -------
+    List[Type]
+        A list containing types such as int, float, str, list, dict, and more specialized types 
+        like typing.Union, collections.deque, collections.abc.Iterable, etc.
+    """
     basic_types = [int, float, str, bool, list, tuple, dict, set, type(None)]
     useful_types_from_typing = [typing.Any, typing.Callable, typing.Union, typing.Optional, 
         typing.List, typing.Dict, typing.Tuple, typing.Set, typing.Type, typing.Collection]
@@ -28,7 +39,22 @@ def get_dynamic_types():
     all_types = basic_types + useful_types_from_typing + useful_types_from_collections + useful_types_from_collections_abc
     return all_types
 
-def type_to_string(t):
+def type_to_string(t: Type[Any]) -> str:
+    """
+    Convert a type to its string representation.
+
+    Parameters
+    ----------
+    t : Type[Any]
+        The type to be converted to string.
+
+    Returns
+    -------
+    str
+        The string representation of the type. If the type is a Cython function or method,
+        it returns "method". If it's a class, it returns the class name. Otherwise, it returns
+        the type name or its string representation.
+    """
     type_str = str(t)
     if 'cython_function_or_method' in type_str:
         return "method" # label cython func/method as "method"
@@ -42,7 +68,23 @@ def type_to_string(t):
 type_strings = get_dynamic_types()
 typing_list = [type_to_string(t) for t in type_strings]
 
-def expand_types(param_type):
+def expand_types(param_type: str) -> List[str]:
+    """
+    Expands a string representing a type or multiple types separated by '|' or 'or' into a list 
+    of individual type strings.
+
+    Parameters
+    ----------
+    param_type : str
+        A string representing a single type or multiple types separated by '|' or 'or'.
+
+    Returns
+    -------
+    List[str]
+        A list of strings, where each string is a type extracted from the input string. 
+        The types are stripped of leading and trailing whitespace.
+
+    """
     if is_outer_level_separator(param_type, "|"):
         types = param_type.split('|')
     elif is_outer_level_separator(param_type, " or "):
@@ -51,7 +93,7 @@ def expand_types(param_type):
         types = [param_type]
     return [t.strip() for t in types]
 
-def is_outer_level_separator(s, sep="|"):
+def is_outer_level_separator(s: str, sep: str = "|") -> bool:
     """
     Check if the separator (like '|' or 'or') is at the top level (not inside brackets).
     """
@@ -65,9 +107,21 @@ def is_outer_level_separator(s, sep="|"):
             return True
     return False
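Only the last lines of `is_outer_level_separator` are visible in this diff. A minimal sketch of the behavior its docstring describes, assuming bracket-depth counting over `[` and `(`, paired with the `expand_types` logic shown above (the `" or "` split line is hidden in the diff, so `split(" or ")` is an assumption):

```python
from typing import List


def is_outer_level_separator(s: str, sep: str = "|") -> bool:
    """Sketch: return True if `sep` appears at bracket depth 0 in `s`."""
    depth = 0
    for i, ch in enumerate(s):
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif depth == 0 and s.startswith(sep, i):
            return True
    return False


def expand_types(param_type: str) -> List[str]:
    """Split a type string on a top-level '|' or ' or ' into individual types."""
    if is_outer_level_separator(param_type, "|"):
        types = param_type.split("|")
    elif is_outer_level_separator(param_type, " or "):
        types = param_type.split(" or ")  # assumed; this line is not shown in the diff
    else:
        types = [param_type]
    return [t.strip() for t in types]


print(expand_types("int | None"))             # ['int', 'None']
print(expand_types("str or list of str"))     # ['str', 'list of str']
print(expand_types("Optional[int | float]"))  # ['Optional[int | float]']
```

Once a top-level `|` is found, `str.split` splits on every `|`, including nested ones, so a string such as `int | Optional[int | float]` is also split inside the brackets.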
 
-def resolve_forwardref(forward_ref_str):
+def resolve_forwardref(forward_ref_str: str) -> Any:
     """
-    Resolve a string representation of a ForwardRef type into the actual type.
+    Resolves a string representing a ForwardRef type (like 'List[int]') into the actual Python type,
+    using a predefined namespace of common types and typing constructs.
+
+    Parameters
+    ----------
+    forward_ref_str : str
+        The string representation of a ForwardRef type.
+
+    Returns
+    -------
+    Any
+        The resolved type if the string can be evaluated successfully within the provided namespace;
+        otherwise, returns the input string itself indicating an unresolved ForwardRef.
     """
     namespace = {
         "int": int,
@@ -87,6 +141,10 @@ def resolve_forwardref(forward_ref_str):
         return forward_ref_str
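`resolve_forwardref` is documented as evaluating a type string within a predefined namespace and returning the input unchanged on failure. A minimal sketch, assuming `eval` against a whitelist namespace (only part of the repository's namespace is visible in this diff, so the entries below are illustrative):

```python
import typing
from typing import Any


def resolve_forwardref(forward_ref_str: str) -> Any:
    """Evaluate a type string against a whitelist namespace; return it unchanged on failure."""
    namespace = {
        "int": int, "float": float, "str": str, "bool": bool,
        "list": list, "dict": dict, "tuple": tuple, "set": set,
        "Any": typing.Any, "List": typing.List, "Dict": typing.Dict,
        "Tuple": typing.Tuple, "Optional": typing.Optional, "Union": typing.Union,
    }
    try:
        # An empty __builtins__ means only names in `namespace` can be looked up
        return eval(forward_ref_str, {"__builtins__": {}}, namespace)
    except Exception:
        # Unknown names (e.g. a library class) leave the ForwardRef unresolved
        return forward_ref_str


print(resolve_forwardref("List[int]"))  # typing.List[int]
print(resolve_forwardref("AnnData"))    # AnnData
```

Emptying `__builtins__` limits which names resolve, but `eval` should still not be fed untrusted strings.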
 
 def format_type_ori(annotation):
+    """
+    Formats a type annotation into a string representation, resolving forward references and
+    handling various special cases like None, Optional, and Union types.
+    """
     if not annotation:
         return None
     if annotation == inspect.Parameter.empty:
@@ -122,12 +180,22 @@ def format_type_ori(annotation):
     return str(annotation).replace("typing.", "")
 
 def format_type(annotation):
+    """
+    Formats a type annotation into a string representation, with specific handling for the
+    'NDArrayA' alias of NumPy arrays, which is converted into 'ndarray[Any, dtype[Any]]'.
+    """
     ans = format_type_ori(annotation)
     if ans:
         ans = ans.replace("NDArrayA", "ndarray[Any, dtype[Any]]")
     return ans
 
-def is_valid_member(obj):
+def is_valid_member(obj) -> bool:
+    """
+    Determines whether the given object is a valid member based on its type.
+    Valid members include callable objects, specific collections (dict, list, tuple, set),
+    classes, functions, methods, modules, and objects with a '__call__' method that are
+    identified as methods.
+    """
     return (
         callable(obj) or 
         isinstance(obj, (dict, list, tuple, set)) or  # , property
@@ -138,7 +206,12 @@ def is_valid_member(obj):
         (hasattr(obj, '__call__') and 'method' in str(obj))
     )
 
-def is_unwanted_api(member):
+def is_unwanted_api(member) -> bool:
+    """
+    Determines whether a member (typically a class) is considered an unwanted API.
+    Unwanted APIs are identified as either subclasses of BaseException or classes whose 
+    base classes are defined in a different module than the class itself.
+    """
     if inspect.isclass(member):
         if issubclass(member, BaseException):
             return True
@@ -163,7 +236,22 @@ def is_from_external_module(lib_name, member):
         return LIB not in module_name"""
         return False
 
-def are_most_strings_modules(api_strings):
+def are_most_strings_modules(api_strings: list) -> bool:
+    """
+    Determines whether the majority of strings in a given list represent valid Python modules.
+    It tries to import each string as a module and counts the successful imports.
+
+    Parameters
+    ----------
+    api_strings : list
+        A list of strings, each potentially representing a module name.
+
+    Returns
+    -------
+    bool
+        True if more than 50% of the strings in the list are valid module names, False otherwise.
+
+    """
     valid_modules = 0
     total_strings = len(api_strings)
     for api in api_strings:
@@ -174,7 +262,11 @@ def are_most_strings_modules(api_strings):
             continue
     return valid_modules / total_strings > 0.5
 
-def recursive_member_extraction(module, prefix, lib_name, visited=None, depth=None):
+def recursive_member_extraction(module, prefix: str, lib_name: str, visited=None, depth=None) -> list:
+    """
+    Recursively extracts members from a module, including classes and submodules, 
+    while avoiding duplicates and unwanted members.
+    """
     if visited is None:
         visited = set()
     members = []
@@ -190,7 +282,7 @@ def recursive_member_extraction(module, prefix, lib_name, visited=None, depth=No
         if inspect.isclass(member):
             if issubclass(member, Exception): #inspect.isclass(member) and 
                 continue
-        if inspect.isabstract(member):  # 排除抽象属性
+        if inspect.isabstract(member):  # skip abstract classes
             continue
         """if member.__module__ == 'builtins':
             continue"""
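`recursive_member_extraction` skips exception classes and any member for which `inspect.isabstract` is true; that function reports classes that still have unimplemented abstract methods. A small illustration of the two skip rules, using hypothetical classes:

```python
import inspect
from abc import ABC, abstractmethod


class Transform(ABC):
    """Hypothetical abstract base class a library might expose."""

    @abstractmethod
    def apply(self, x):
        ...


class IdentityTransform(Transform):
    """Concrete subclass: implements every abstract method."""

    def apply(self, x):
        return x


candidates = {"Transform": Transform, "IdentityTransform": IdentityTransform, "ValueError": ValueError}

kept = [
    name
    for name, member in candidates.items()
    # skip exception classes, then skip classes that are still abstract
    if not (inspect.isclass(member) and issubclass(member, Exception))
    and not inspect.isabstract(member)
]
print(kept)  # ['IdentityTransform']
```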
@@ -429,7 +521,7 @@ def filter_optional_parameters(api_data):
 def generate_api_callings(results, basic_types=['str', 'int', 'float', 'bool', 'list', 'dict', 'tuple', 'set', 'any', 'List', 'Dict']):
     updated_results = {}
     for api_name, api_info in results.items():
-        if api_info["api_type"] in ['function', 'method', 'class', 'functools.partial']:
+        if api_info["api_type"] in ['function', 'method', 'class', 'Class', 'functools.partial']:
             # Update the optional_value key for each parameter
             for param_name, param_details in api_info["Parameters"].items():
                 param_type = param_details.get('type')
@@ -493,7 +585,7 @@ def filter_specific_apis(data, lib_name):
         parameters = details['Parameters']
         Returns_type = details['Returns']['type']
         Returns_description = details['Returns']['description']
-        if api_type in ["module", "constant", "property", "getset_descriptor", "Class", "class"]:
+        if api_type in ["module", "constant", "property", "getset_descriptor"]:
             filter_counts["api_type_module_constant_property_getsetdescriptor"] += 1
             filter_API["api_type_module_constant_property_getsetdescriptor"].append(api)
             continue
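With this commit, `class`/`Class` entries are no longer dropped by the `api_type` check in `filter_specific_apis`, and `generate_api_callings` now accepts `Class` as well. A sketch of such a type-based filter that also records what was dropped; `filter_by_api_type`, `EXCLUDED_API_TYPES`, and the `mylib` entries are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# API kinds removed before calls are generated (classes are kept after this commit)
EXCLUDED_API_TYPES = {"module", "constant", "property", "getset_descriptor"}


def filter_by_api_type(data: Dict[str, dict]) -> Tuple[Dict[str, dict], Dict[str, List[str]]]:
    """Split APIs into kept entries and dropped names grouped by api_type."""
    kept: Dict[str, dict] = {}
    dropped: Dict[str, List[str]] = defaultdict(list)
    for api, details in data.items():
        api_type = details["api_type"]
        if api_type in EXCLUDED_API_TYPES:
            dropped[api_type].append(api)
            continue
        kept[api] = details
    return kept, dict(dropped)


apis = {
    "mylib.filter_cells": {"api_type": "function"},
    "mylib.Dataset": {"api_type": "class"},
    "mylib.VERSION": {"api_type": "constant"},
}
kept, dropped = filter_by_api_type(apis)
print(list(kept))  # ['mylib.filter_cells', 'mylib.Dataset']
print(dropped)     # {'constant': ['mylib.VERSION']}
```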