# Add API example and doc #44

**Merged** · 8 commits · Oct 20, 2022

**README.md** (176 changes: 51 additions & 125 deletions)

## Preparation

**Install:** First, install the client.

```bash
pip install explainaboard_client
```

**Acquiring a Login and API Key:**
Create an account on the [ExplainaBoard](https://explainaboard.inspiredco.ai)
site and log in. Once you are logged in, click on the upper-right corner of the
screen to display your email and API key, which you can copy-paste.

You can save these into environment variables for convenient use in the commands
below:
```shell
export EB_USERNAME="[your username]"
export EB_API_KEY="[your API key]"
```

## Use in Python

The most common usage of this client will probably be to
evaluate systems on the ExplainaBoard server.
Below is an example of how you can do this in Python.

```python
import os
import explainaboard_client

# Set up your environment
explainaboard_client.username = os.environ['EB_USERNAME']
explainaboard_client.api_key = os.environ['EB_API_KEY']
client = explainaboard_client.ExplainaboardClient()

# Do the evaluation
evaluation_result = client.evaluate_system_file(
    task='text-classification',
    system_name='text-classification-test',
    system_output_file='example/data/sst2-lstm-output.txt',
    system_output_file_type='text',
    dataset='sst2',
    split='test',
    source_language='en',
)
```

For more details on precisely how to specify all the variables, as well as how to do
other things such as search for and delete systems, see the
[documentation of the Python API](docs/python_api.md).

## Use from the Command Line

You can also evaluate systems from the command line like this:

```shell
python -m explainaboard_client.cli.evaluate_system \
--task text-classification \
--system_name text-classification-test \
--system_output_file example/data/sst2-lstm-output.txt \
--system_output_file_type text \
--dataset sst2 \
--split test \
--source_language 'en'
```

For more details, see the [command line documentation](docs/cli.md).

## Having Trouble?

Please [open an issue](https://github.com/neulab/explainaboard_client/issues) on the
issues page and we'll be happy to help!

**docs/cli.md** (114 changes: 114 additions & 0 deletions)

# Command Line Access to ExplainaBoard

You can use `explainaboard_client` to evaluate, browse, and delete systems.

## Evaluation

The most common usage of this client will probably be to
evaluate systems on the ExplainaBoard server. You can do that from the
command line.

If you are using a pre-existing dataset viewable from the
[ExplainaBoard datasets](https://explainaboard.inspiredco.ai/datasets)
page then you can use something like the following command:

```shell
python -m explainaboard_client.cli.evaluate_system \
--username $EB_USERNAME --api_key $EB_API_KEY \
--task [TASK_ID] \
--system_name [MODEL_NAME] \
--system_output_file [SYSTEM_OUTPUT] --system_output_file_type [FILE_TYPE] \
--dataset [DATASET] --sub_dataset [SUB_DATASET] --split [SPLIT] \
--source_language [SOURCE] --target_language [TARGET] \
[--public]
```

You will need to fill in all the settings appropriately, for example:
* `[TASK_ID]` is the ID of the task you want to perform. A full list is [here](https://github.com/neulab/explainaboard_web/blob/main/backend/src/impl/tasks.py).
* `[MODEL_NAME]` is whatever name you want to give to your model.
* `[SYSTEM_OUTPUT]` is the file that you want to evaluate. The file format depends
  on the task; see the list of
  [ExplainaBoard task file formats](https://github.com/neulab/ExplainaBoard/blob/main/docs/task_file_formats.md)
  for more details.
* `[FILE_TYPE]` is the type of the file, "text", "tsv", "csv", "conll", or "json".
* `[DATASET]`, `[SUB_DATASET]` and `[SPLIT]` indicate which dataset you're evaluating
a system output for.
* `[SOURCE]` and `[TARGET]` indicate the language codes of the input and output of
  the system. Please refer to the [ISO 639-3](https://iso639-3.sil.org/code_tables/639/data) list for the 3-character 639-3 language codes. Enter `other-[your custom languages]` if the dataset uses custom languages. Enter `none` if the dataset uses other modalities like images. If the inputs and outputs are in the same language, you only need to
  specify one or the other.
* By default your systems will be private, but if you add the `--public` flag they
will be made public on the public leaderboards and system listing.
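
For concreteness, here is one way the placeholders above might be filled in, reusing
the sst2 text classification example from the README (the output path is
illustrative):

```shell
python -m explainaboard_client.cli.evaluate_system \
--username $EB_USERNAME --api_key $EB_API_KEY \
--task text-classification \
--system_name text-classification-test \
--system_output_file example/data/sst2-lstm-output.txt \
--system_output_file_type text \
--dataset sst2 --split test \
--source_language en
```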

## Evaluation with Custom Datasets

You can also evaluate results for custom datasets
that are not supported by ExplainaBoard yet:

```shell
python -m explainaboard_client.cli.evaluate_system \
--username $EB_USERNAME --api_key $EB_API_KEY \
--task [TASK_ID] \
--system_name [MODEL_NAME] \
--system_output_file [SYSTEM_OUTPUT] --system_output_file_type [FILE_TYPE] \
--custom_dataset_file [CUSTOM_DATASET] --custom_dataset_file_type [FILE_TYPE] \
--source_language [SOURCE] --target_language [TARGET]
```

with similar file and file-type arguments to the system output above. If you're
interested in getting your datasets directly supported within ExplainaBoard, please
open an issue or send a PR to [DataLab](https://github.com/expressai/datalab), and we'll
be happy to help out!
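
For example, a run on a hypothetical custom text classification dataset might look
like the following (the dataset and output file names are purely illustrative):

```shell
python -m explainaboard_client.cli.evaluate_system \
--username $EB_USERNAME --api_key $EB_API_KEY \
--task text-classification \
--system_name my-custom-model \
--system_output_file my-model-output.txt --system_output_file_type text \
--custom_dataset_file my-dataset.tsv --custom_dataset_file_type tsv \
--source_language en
```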

## Finding Uploaded Systems

You can also find systems that have already been evaluated
using the following syntax:
```shell
python -m explainaboard_client.cli.find_systems \
--username $EB_USERNAME --api_key $EB_API_KEY --output_format tsv
```
By default this outputs in a summarized TSV format (similar to the online system
browser), but you can set `--output_format json` to get more extensive information.
There are many options for how you can specify which systems you want to find, which you
can take a look at by running `python -m explainaboard_client.cli.find_systems` without
any arguments.

## Deleting System Outputs

You can delete existing system outputs using the following
command:
```shell
python -m explainaboard_client.cli.delete_systems \
--username $EB_USERNAME --api_key $EB_API_KEY --system_ids XXX YYY
```
Here the `system_ids` are the unique identifiers of the systems, returned in the
`system_id` field of the JSON output of the `find_systems` command above. The system
IDs are *not* the system names displayed in the interface.
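
For instance, one way to collect system IDs for deletion is to chain the two commands,
assuming you have `jq` installed and that the JSON output is a list of objects, each
with a `system_id` field:

```shell
# Look up matching systems, then extract their IDs from the JSON output.
python -m explainaboard_client.cli.find_systems \
--username $EB_USERNAME --api_key $EB_API_KEY --output_format json \
| jq -r '.[].system_id'
```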

## Evaluating Systems on Benchmarks from the Command Line
Instead of evaluating a single system, another common scenario is
to submit a group of systems to a benchmark (e.g., GLUE). To do this,
you can use the command below:

```shell
python -m explainaboard_client.cli.evaluate_benchmark \
--username $EB_USERNAME \
--api_key $EB_API_KEY \
--system_name your_system \
--system_outputs submissions/* \
--benchmark benchmark_config.json \
--server local
```
where
* `--username`: the email address of your ExplainaBoard account
* `--api_key`: your API key
* `--system_name`: the system name of your submission. Note: this assumes that all
system outputs share one system name.
* `--benchmark`: the benchmark config file (you can check out this [doc](TBC) to see how to configure the benchmark.)
* `--system_outputs`: system output files. Note that the order of `system_outputs` files must
strictly correspond to the dataset order of `datasets` in `benchmark_config.json`
(see the check below).
* By default, your systems will be private, but if you add the `--public` flag, they
will be made public on the public leaderboards and system listing.
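
Because the shell expands `submissions/*` in lexicographic order, it is worth
previewing the expansion before submitting, to confirm that it matches the `datasets`
order in your benchmark config. A quick sanity check:

```shell
# Preview the order in which the glob will be passed to evaluate_benchmark.
ls -1 submissions/*
```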

Here is one [example](../example/benchmark/gaokao/) for the `Gaokao` benchmark.

**docs/development.md** (21 changes: 21 additions & 0 deletions)

# Developer Details

This documentation stores notes for developers.

## Installation

If you would like to install the client for development, you can run the following
command.

```shell
pip install ".[dev]"
```
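
If you want local changes to be picked up without reinstalling, an editable install
may also work (assuming the project's build backend supports it):

```shell
pip install -e ".[dev]"
```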

## client and api_client packages

There are two packages associated with this CLI: `explainaboard_api_client` and `explainaboard_client`.
- `explainaboard_api_client`: auto-generated from the OpenAPI definition specified in [openapi.yaml](https://github.com/neulab/explainaboard_web/tree/main/openapi). The version of this client is specified in the same YAML file (`info.version`).
  - To update: `pip install -U explainaboard_api_client`, or specify a particular version
  - To check the API version used in the live environment: `curl https://explainaboard.inspiredco.ai/api/info` (this information will be added to the UI in the future)
- `explainaboard_client`: a thin wrapper around the API client that makes it easy to use. It helps users configure API keys, choose host names, load files from the local filesystem, etc. This package is relatively stable, so you usually don't need to update it unless a new feature of the CLI is released.
  - To update: `pip install -U explainaboard_client`
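
Putting this together, a typical update workflow might look like the following sketch
(the pinned version number is purely illustrative):

```shell
# Check which API version the live environment is running.
curl https://explainaboard.inspiredco.ai/api/info

# Update the auto-generated API client, or pin the version reported above.
pip install -U explainaboard_api_client
# pip install explainaboard_api_client==0.2.8  # hypothetical pinned version

# Update the wrapper client (only needed when new CLI features are released).
pip install -U explainaboard_client
```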