v1.1.6 update
DoraDong-2023 committed Nov 27, 2023
1 parent aa4f47d commit 75b9237
Showing 7 changed files with 228 additions and 54 deletions.
9 changes: 8 additions & 1 deletion Git2APP.md
Expand Up @@ -143,7 +143,12 @@ Note: You have the option to alter the filtering rules by modifying filter_speci
python Git2APP/get_API_docstring_from_sourcecode.py --LIB ${LIB}
```

This script is designed to execute only on APIs lacking docstrings or well-documented parameters. It automatically skips APIs that meet the established documentation standards. Please note that running this script requires a paid OpenAI account.
Tips
- This script is designed to execute only on APIs lacking docstrings or well-documented parameters. It automatically skips APIs that meet the established documentation standards. Please note that running this script requires a paid OpenAI account.

- This script relies on LLM responses to modify docstrings, so the quality of the results may vary. Make sure the types of the required parameters are provided for inference; a `None` type may cause the API call to fail because essential parameters are missing.

- Because `args`, `kwargs`, and similar parameters are generally optional, we currently filter them out during prediction. It is therefore advisable to avoid using `args` as parameters in your code.
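
The filtering of `args`/`kwargs` described in the tips above can be pictured with a minimal sketch. It assumes the filter is applied by parameter kind via the standard `inspect` module; the project may instead filter by parameter name, and `named_parameters` and `example_api` are hypothetical names used only for illustration.

```python
import inspect
from typing import Any, Callable, Dict


def named_parameters(func: Callable[..., Any]) -> Dict[str, inspect.Parameter]:
    """Return the parameters of `func`, dropping *args and **kwargs style entries."""
    skipped_kinds = (
        inspect.Parameter.VAR_POSITIONAL,  # e.g. *args
        inspect.Parameter.VAR_KEYWORD,     # e.g. **kwargs
    )
    signature = inspect.signature(func)
    return {
        name: param
        for name, param in signature.parameters.items()
        if param.kind not in skipped_kinds
    }


# Hypothetical API used only to demonstrate the filter.
def example_api(data, n_neighbors: int = 15, *args, **kwargs):
    return data


print(list(named_parameters(example_api)))  # ['data', 'n_neighbors']
```

Anything passed only through `*args` or `**kwargs` is invisible to such a filter, which is why explicit, named parameters are the safer choice.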

Writing the docstrings yourself is better, as the result is more accurate. The `NumPy` format is preferred over the `reStructuredText` and `Google` formats. Here's a basic example of an effective docstring:

Expand Down Expand Up @@ -231,6 +236,8 @@ hugging_models

Add a logo image to `BioMANIA/chatbot_ui_biomania/public/apps/` and modify the link in `BioMANIA/chatbot_ui_biomania/components/Chat/LibCardSelect.tsx`.

Be mindful of the capitalization in library names, as it affects the recognition of the related model data loading paths.
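
To make the capitalization point concrete, here is a minimal sketch of an exact-case directory lookup. It assumes library data lives in one directory per library (as in `./data/standard_process/${LIB}/`); `find_lib_dir` is a hypothetical helper, not part of BioMANIA.

```python
import os
import tempfile


def find_lib_dir(base_dir: str, lib_name: str) -> str:
    """Return the data directory for `lib_name`, requiring an exact-case match."""
    entries = os.listdir(base_dir)
    if lib_name in entries:
        return os.path.join(base_dir, lib_name)
    # Offer a hint when only the capitalization differs.
    near = [entry for entry in entries if entry.lower() == lib_name.lower()]
    hint = f" (did you mean {near[0]!r}?)" if near else ""
    raise FileNotFoundError(
        f"no data directory named {lib_name!r} in {base_dir}{hint}"
    )


with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "sonata"))
    print(find_lib_dir(base, "sonata"))  # resolves to the 'sonata' directory
    try:
        find_lib_dir(base, "SONATA")
    except FileNotFoundError as err:
        print(err)  # reports the mismatch and suggests 'sonata'
```

Comparing against `os.listdir` rather than calling `os.path.isdir` keeps the check strict even on case-insensitive file systems.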

### 2.3 Use UI service.

Follow the steps in [`Run with script/Inference`](README.md#inference) section in `README` to start UI service. Don’t forget to set an OpenAI key in `.env` file as recommended in `README`.
50 changes: 31 additions & 19 deletions README.md
@@ -1,14 +1,24 @@

<h1 align="center">BioMANIA</h1>

<a target="_blank" href="https://www.biorxiv.org/content/10.1101/2023.10.29.564479v1">
<img style="height:22pt" src="https://img.shields.io/badge/-Paper-burgundy?style=flat&logo=arxiv">
</a><a target="_blank" href="https://github.com/batmen-lab/BioMANIA">
<img style="height:22pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"></a><a target="_blank" href="https://railway.app/template/WyEd-d">
<img style="height:22pt" src="https://img.shields.io/badge/-Railway-purple?style=flat&logo=railway">
</a><a target="_blank" href="https://hub.docker.com/repositories/chatbotuibiomania">
<img style="height:22pt" src="https://img.shields.io/badge/-Docker-blue?style=flat&logo=docker">
</a>
<h1 align="center">BioMANIA</h1>

<p align="center">
<img src=./images/BioMANIA.png width="150" height="150">
</p>

<p align="center">
<a target="_blank" href="https://www.biorxiv.org/content/10.1101/2023.10.29.564479v1">
<img style="height:22pt" src="https://img.shields.io/badge/-Paper-burgundy?style=flat&logo=arxiv">
</a>
<a target="_blank" href="https://github.com/batmen-lab/BioMANIA">
<img style="height:22pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github">
</a>
<a target="_blank" href="https://railway.app/template/WyEd-d">
<img style="height:22pt" src="https://img.shields.io/badge/-Railway-purple?style=flat&logo=railway">
</a>
<a target="_blank" href="https://hub.docker.com/repositories/chatbotuibiomania">
<img style="height:22pt" src="https://img.shields.io/badge/-Docker-blue?style=flat&logo=docker">
</a>
</p>

Welcome to BioMANIA! This guide provides detailed instructions on how to set up, run, and interact with the BioMANIA chatbot interface, which connects seamlessly with various APIs to deliver information across numerous libraries and frameworks.

Expand All @@ -31,6 +41,7 @@ Tips:
- We have implemented switching different libraries inside one dialog. You can
- Note that the inference speed depends on the OpenAI key and the back-end device. A paid OpenAI key and a GPU back-end speed up inference considerably!
- All uploaded files are saved under the `./tmp` folder. Please enter `./tmp/`+your_file_name when the API requires filename parameters.
- Transferring large files will be quite slow.

> **There is only one backend, so requests may get mixed up when multiple users send them simultaneously. Stability depends on the device's network. On a CPU, switching between libraries takes about half a minute to load models and data. We recommend running it locally with a GPU, where switching between libraries takes only about 3 seconds!**
Expand Down Expand Up @@ -275,7 +286,7 @@ cp -r ./data/standard_process/${LIB}/API_init.json ./data/standard_process/${LIB

4. Following this, create instructions, generate various JSON files, and split the data.
```bash
python dataloader/preprocess_retriever_data.py --concurrency 80 --LIB ${LIB}
python dataloader/preprocess_retriever_data.py --concurrency 200 --LIB ${LIB}
```

Tips:
Expand All @@ -295,10 +306,9 @@ python inference/retriever_bm25_inference.py --LIB ${LIB} --top_k 3
7. Fine-tune the retriever.
You can finetune the retriever based on the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model
```bash
export LIB=MIOSTONE
export CUDA_VISIBLE_DEVICES=0  # export so the python process below sees it
mkdir ./hugging_models/retriever_model_finetuned/${LIB}
python models/train_retriever_multigpu.py \
python models/train_retriever.py \
--data_path ./data/standard_process/${LIB}/retriever_train_data/ \
--model_name bert-base-uncased \
--output_path ./hugging_models/retriever_model_finetuned/${LIB} \
Expand Down Expand Up @@ -484,17 +494,19 @@ report/Py2report.py
```

## Version History
- v1.1.6 (comming soon!)
- v1.1.6 (2023-11-27)
- Support sharing your APP and installing others' APPs through [our issue](https://github.com/batmen-lab/BioMANIA/issues/2)!
- Provide data and pretrained models for the batmen-lab developed tools MIOSTONE and SONATA, expanding our suite of available resources and functionalities.
- Support an APP installation service in the UI!
- Add R inference code. Provide data and pretrained models for R tools.
- Enhance code robustness:
- When an API returns a tuple, split it into multiple variables by adding the code `result_n+1, result_n+2, ... = result_n`.
- During parameter inference, if a parameter is of 'NoneType', replace it with 'Any' so that execution proceeds.
- Fix a bug in adding quotation marks when the user inputs a value for `str`-type parameters.
- Release a package.
- v1.1.5 (2023-11-25)
- Enhanced Docker Integration: Now featuring seamless packaging of both front-end and back-end components using Docker. This update simplifies deployment processes, ensuring a more streamlined development experience.
- Enhanced Docker Integration: Now featuring seamless packaging of both front-end and back-end components using Docker. This update simplifies deployment processes, ensuring a more streamlined development experience. We update `chatbotuibiomania/biomania-together:v1.1.3`.
- Automated Docstring Addition: Users can now effortlessly convert GitHub source code to our tool with scripts that automatically add docstrings, freeing them from the manual effort previously required.
- v1.1.4 (2023-11-22)
- Add [`manual`](R2APP.md) support for converting R code to API_init.json. Converting R code to an APP will be supported later!
- Release docker v1.1.3 with support for 12 PyPI biotools. Notice that some tools are only available under their own conda environment!!
- Release docker v1.1.3 with support for 12 PyPI biotools. Notice that some tools are only available under their own conda environment!! We update `chatbotuibiomania/biomania-frontend:v1.1.3` and `chatbotuibiomania/biomania-backend:v1.1.3`.
- Resolved issues related to Docker networking and Docker CUDA.
- Improved the stability of the server demo by ensuring that it runs continuously in the background.
- v1.1.3 (2023-11-20)
4 changes: 2 additions & 2 deletions chatbot_ui_biomania/components/Chat/LibCardSelect.tsx
Expand Up @@ -14,7 +14,7 @@ export const libImages: { [key: string]: string } = {
'pyopenms': '/apps/pyopenms.png',
'scenicplus': '/apps/SCENIC.png',
'scvi-tools': '/apps/scvitools.svg',
'SONATA': '/apps/SONATA.png',
'sonata': '/apps/SONATA.png',
'MIOSTONE': '/apps/MIOSTONE.jpg',
//'custom': '/apps/customize.jpg',
};
Expand Down Expand Up @@ -48,7 +48,7 @@ export const LibCardSelect = () => {
{ id: 'pyopenms', name: 'pyopenms' },
{ id: 'scenicplus', name: 'scenicplus' },
{ id: 'scvi-tools', name: 'scvi-tools' },
{ id: 'SONATA', name: 'SONATA' },
{ id: 'sonata', name: 'sonata' },
{ id: 'MIOSTONE', name: 'MIOSTONE' },
//{ id: 'custom', name: 'custom' },
];
Binary file added images/BioMANIA.png
118 changes: 105 additions & 13 deletions src/Git2APP/get_API_init_from_sourcecode.py
@@ -1,4 +1,5 @@
import pydoc, argparse, json, re, os, collections, inspect, importlib, typing, functools
from typing import Any, Type, List
from docstring_parser import parse
from langchain.document_loaders import BSHTMLLoader
from configs.model_config import ANALYSIS_PATH, get_all_variable_from_cheatsheet, get_all_basic_func_from_cheatsheet
Expand All @@ -16,7 +17,17 @@ def process_html(html_path: str) -> str:
content = re.sub(r'\s+', ' ', content) # remove large blanks
return content

def get_dynamic_types():
def get_dynamic_types() -> List[Type]:
"""
Retrieves a list of various basic and complex data types from Python's built-in, typing,
collections, and collections.abc modules.
Returns
-------
List[Type]
A list containing types such as int, float, str, list, dict, and more specialized types
like typing.Union, collections.deque, collections.abc.Iterable, etc.
"""
basic_types = [int, float, str, bool, list, tuple, dict, set, type(None)]
useful_types_from_typing = [typing.Any, typing.Callable, typing.Union, typing.Optional,
typing.List, typing.Dict, typing.Tuple, typing.Set, typing.Type, typing.Collection]
Expand All @@ -28,7 +39,22 @@ def get_dynamic_types():
all_types = basic_types + useful_types_from_typing + useful_types_from_collections + useful_types_from_collections_abc
return all_types

def type_to_string(t):
def type_to_string(t: Type[Any]) -> str:
"""
Convert a type to its string representation.
Parameters
----------
t : Type[Any]
The type to be converted to string.
Returns
-------
str
The string representation of the type. If the type is a Cython function or method,
it returns "method". If it's a class, it returns the class name. Otherwise, it returns
the type name or its string representation.
"""
type_str = str(t)
if 'cython_function_or_method' in type_str:
return "method" # label cython func/method as "method"
Expand All @@ -42,7 +68,23 @@ def type_to_string(t):
type_strings = get_dynamic_types()
typing_list = [type_to_string(t) for t in type_strings]

def expand_types(param_type):
def expand_types(param_type: str) -> List[str]:
"""
Expands a string representing a type or multiple types separated by '|' or 'or' into a list
of individual type strings.
Parameters
----------
param_type : str
A string representing a single type or multiple types separated by '|' or 'or'.
Returns
-------
List[str]
A list of strings, where each string is a type extracted from the input string.
The types are stripped of leading and trailing whitespace.
"""
if is_outer_level_separator(param_type, "|"):
types = param_type.split('|')
elif is_outer_level_separator(param_type, " or "):
Expand All @@ -51,7 +93,7 @@ def expand_types(param_type):
types = [param_type]
return [t.strip() for t in types]

def is_outer_level_separator(s, sep="|"):
def is_outer_level_separator(s: str, sep: str = "|") -> bool:
"""
Check if the separator (like '|' or 'or') is at the top level (not inside brackets).
"""
Expand All @@ -65,9 +107,21 @@ def is_outer_level_separator(s, sep="|"):
return True
return False
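
The idea behind `is_outer_level_separator` and `expand_types` can be illustrated with a small depth-tracking sketch. The diff elides the body of `is_outer_level_separator`, so the code below is an assumption about the approach, not the repository's implementation; `split_top_level` is a hypothetical helper. Unlike the plain `param_type.split('|')` shown in the hunk, it splits only on separators that sit outside brackets.

```python
from typing import List


def split_top_level(type_str: str, sep: str = "|") -> List[str]:
    """Split `type_str` on `sep`, ignoring separators nested inside [] or ()."""
    parts: List[str] = []
    depth = 0   # current bracket nesting level
    start = 0   # start index of the part being collected
    i = 0
    while i < len(type_str):
        ch = type_str[i]
        if ch in "[(":
            depth += 1
            i += 1
        elif ch in "])":
            depth -= 1
            i += 1
        elif depth == 0 and type_str.startswith(sep, i):
            parts.append(type_str[start:i].strip())
            i += len(sep)
            start = i
        else:
            i += 1
    parts.append(type_str[start:].strip())
    return parts


print(split_top_level("int | str"))                      # ['int', 'str']
print(split_top_level("Dict[str, int | float] | None"))  # ['Dict[str, int | float]', 'None']
print(split_top_level("list or tuple", " or "))          # ['list', 'tuple']
```

A string with no top-level separator comes back as a single-element list, which matches the fallback branch `types = [param_type]` in `expand_types`.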

def resolve_forwardref(forward_ref_str):
def resolve_forwardref(forward_ref_str: str):
"""
Resolve a string representation of a ForwardRef type into the actual type.
Resolves a string representing a ForwardRef type (like 'List[int]') into the actual Python type,
using a predefined namespace of common types and typing constructs.
Parameters
----------
forward_ref_str : str
The string representation of a ForwardRef type.
Returns
-------
Any
The resolved type if the string can be evaluated successfully within the provided namespace;
otherwise, returns the input string itself indicating an unresolved ForwardRef.
"""
namespace = {
"int": int,
Expand All @@ -87,6 +141,10 @@ def resolve_forwardref(forward_ref_str):
return forward_ref_str

def format_type_ori(annotation):
"""
Formats a type annotation into a string representation, resolving forward references and
handling various special cases like None, Optional, and Union types.
"""
if not annotation:
return None
if annotation == inspect.Parameter.empty:
Expand Down Expand Up @@ -122,12 +180,22 @@ def format_type_ori(annotation):
return str(annotation).replace("typing.", "")

def format_type(annotation):
"""
Formats a type annotation into a string representation with specific handling for NumPy's
'NDArrayA' type, converting it into 'ndarray[Any, dtype[Any]]'.
"""
ans = format_type_ori(annotation)
if ans:
ans = ans.replace("NDArrayA", "ndarray[Any, dtype[Any]]")
return ans

def is_valid_member(obj):
def is_valid_member(obj) -> bool:
"""
Determines whether the given object is a valid member based on its type.
Valid members include callable objects, specific collections (dict, list, tuple, set),
classes, functions, methods, modules, and objects with a '__call__' method that are
identified as methods.
"""
return (
callable(obj) or
isinstance(obj, (dict, list, tuple, set)) or # , property
Expand All @@ -138,7 +206,12 @@ def is_valid_member(obj):
(hasattr(obj, '__call__') and 'method' in str(obj))
)

def is_unwanted_api(member):
def is_unwanted_api(member) -> bool:
"""
Determines whether a member (typically a class) is considered an unwanted API.
Unwanted APIs are identified as either subclasses of BaseException or classes whose
base classes are defined in a different module than the class itself.
"""
if inspect.isclass(member):
if issubclass(member, BaseException):
return True
Expand All @@ -163,7 +236,22 @@ def is_from_external_module(lib_name, member):
return LIB not in module_name"""
return False

def are_most_strings_modules(api_strings):
def are_most_strings_modules(api_strings: list) -> bool:
"""
Determines whether the majority of strings in a given list represent valid Python modules.
It tries to import each string as a module and counts the successful imports.
Parameters
----------
api_strings : list
A list of strings, each potentially representing a module name.
Returns
-------
bool
True if more than 50% of the strings in the list are valid module names, False otherwise.
"""
valid_modules = 0
total_strings = len(api_strings)
for api in api_strings:
Expand All @@ -174,7 +262,11 @@ def are_most_strings_modules(api_strings):
continue
return valid_modules / total_strings > 0.5

def recursive_member_extraction(module, prefix, lib_name, visited=None, depth=None):
def recursive_member_extraction(module, prefix: str, lib_name: str, visited=None, depth=None) -> list:
"""
Recursively extracts members from a module, including classes and submodules,
while avoiding duplicates and unwanted members.
"""
if visited is None:
visited = set()
members = []
Expand All @@ -190,7 +282,7 @@ def recursive_member_extraction(module, prefix, lib_name, visited=None, depth=No
if inspect.isclass(member):
if issubclass(member, Exception): #inspect.isclass(member) and
continue
if inspect.isabstract(member): # exclude abstract attributes
if inspect.isabstract(member): # remove abstract attribute
continue
"""if member.__module__ == 'builtins':
continue"""
Expand Down Expand Up @@ -429,7 +521,7 @@ def filter_optional_parameters(api_data):
def generate_api_callings(results, basic_types=['str', 'int', 'float', 'bool', 'list', 'dict', 'tuple', 'set', 'any', 'List', 'Dict']):
updated_results = {}
for api_name, api_info in results.items():
if api_info["api_type"] in ['function', 'method', 'class', 'functools.partial']:
if api_info["api_type"] in ['function', 'method', 'class', 'Class', 'functools.partial']:
# Update the optional_value key for each parameter
for param_name, param_details in api_info["Parameters"].items():
param_type = param_details.get('type')
Expand Down Expand Up @@ -493,7 +585,7 @@ def filter_specific_apis(data, lib_name):
parameters = details['Parameters']
Returns_type = details['Returns']['type']
Returns_description = details['Returns']['description']
if api_type in ["module", "constant", "property", "getset_descriptor", "Class", "class"]:
if api_type in ["module", "constant", "property", "getset_descriptor"]:
filter_counts["api_type_module_constant_property_getsetdescriptor"] += 1
filter_API["api_type_module_constant_property_getsetdescriptor"].append(api)
continue