Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support CDM data import #1

Merged
merged 20 commits into from
Aug 14, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
name: tests

# This action is triggered:
# 1. when someone creates a pull request for a merge to the main branch
# 2. when changes are merged into the main branch (via a pull request)
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved

jobs:
test:
runs-on: ${{ matrix.os }}
container: ${{ matrix.container }}

# we support Linux and macOS
strategy:
matrix:
os: [ubuntu-latest, macos-latest]

# Steps for running tests and analysis.
steps:
- name: Checking out repository (${{ matrix.os }})
uses: actions/checkout@v4
with:
token: ${{ secrets.GITHUB_TOKEN }}
submodules: recursive

- name: Setting up Python 3.10 (${{ matrix.os }})
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
uses: actions/[email protected]
with:
python-version: "3.10"
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved

- name: Installing dtspy dependencies (${{ matrix.os }})
run: python3 -m pip install -r requirements.txt

- name: Running tests (${{ matrix.os }})
run: coverage run -m unittest discover
env:
DTS_KBASE_DEV_TOKEN: ${{ secrets.DTS_KBASE_DEV_TOKEN }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should already be some kbase dev tokens in the environment if you check in the settings for this repo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried KBASE_CI_TOKEN, but it doesn't work. The GitHub documentation is entering the Enterprise Software Heat Death stage of its lifecycle, so it's hard to figure out the difference between org secrets and repo secrets, or even whether anyone at GitHub knows what the difference is supposed to be. So I'm reverting my change here for the time being. I'll revisit when we have this other stuff sorted.


# add this when ready
#- if: ${{ matrix.os == 'ubuntu-latest' }}
# name: Uploading coverage report to codecov.io
# uses: codecov/[email protected]
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,5 @@
# dtspy

![Tests](https://github.com/kbase/dtspy/actions/workflows/tests.yml/badge.svg)

Python client for the Data Transfer Service
95 changes: 65 additions & 30 deletions dts/client.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
import base64
from frictionless.resources import JsonResource
import io
import requests
from requests.auth import AuthBase
import logging
Expand All @@ -22,8 +21,9 @@ def __init__(self, api_key):
self.api_key = api_key

def __call__(self, r):
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
token = base64.b64encode(bytes(self.api_key + '\n', 'utf-8'))
r.headers['Authorization'] = f'Bearer {token.decode('utf-8')}'
b64_token = base64.b64encode(bytes(self.api_key + '\n', 'utf-8'))
token = b64_token.decode('utf-8')
r.headers['Authorization'] = f'Bearer {token}'
return r

class Client(object):
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -39,6 +39,8 @@ def __init__(self,
self.connect(server = server, port = port, api_key = api_key)
else:
self.uri = None
self.name = None
self.version = None

def connect(self,
api_key = None,
Expand Down Expand Up @@ -104,47 +106,69 @@ def search(self,
status = None,
offset = 0,
limit = None,
specific = None,
):
"""
`client.search(database = None,
query = None,
status = None,
offset = 0,
limit = None) -> `list` of `frictionless.DataResource` objects
limit = None,
specific = None) -> `list` of `frictionless.DataResource` objects

* Performs a synchronous search of the database with the given name using the
given query string.
Optional arguments:
* query: a search string that is directly interpreted by the database
* status: filters for files based on their status:
* `"staged"` means "search only for files that are already in the source database staging area"
* `"archived"` means "search only for files that are archived and not staged"
* `"unstaged"` means "search only for files that are not staged"
* offset: a 0-based index from which to start retrieving results (default: 0)
* limit: if given, the maximum number of results to retrieve
* specific: a dictionary mapping database-specific search parameters to their values
"""
if not self.uri:
raise RuntimeError('dts.Client: not connected.')
if type(query) != str:
raise RuntimeError('search: missing or invalid query.')
if type(database) != str:
raise TypeError('search: database must be a string.')
if status and status not in ['staged', 'unstaged']:
raise TypeError(f'search: invalid status: {status}.')
if type(offset) != int or offset < 0:
raise TypeError('search: invalid offset: %s.'%offset)
raise TypeError(f'search: invalid offset: {offset}.')
if limit:
if type(limit) != int:
raise TypeError('search: limit must be an int.')
elif limit < 1:
raise TypeError(f'search: invalid number of retrieved results: {N}')
if specific and type(specific) != dict:
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
raise TypeError('search: specific must be a dict.')
try:
params = {
'database': database,
'query': query,
'status': status,
'offset': offset,
'limit': limit,
}
response = requests.get(url=f'{self.uri}/files', params=params, auth=self.auth)
for name in ['status', 'offset', 'limit']:
val = eval(name)
if val:
params[name] = val
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
if specific:
params['specific'] = specific
response = requests.post(url=f'{self.uri}/files',
json=params,
auth=self.auth)
else:
response = requests.get(url=f'{self.uri}/files',
params=params,
auth=self.auth)
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
response.raise_for_status()
except HTTPError as http_err:
logger.error(f'HTTP error occurred: {http_err}')
return None
except requests.exceptions.HTTPError as err:
logger.error(f'HTTP error occurred: {err}')
return None
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
except Exception as err:
logger.error(f'Other error occurred: {err}')
return None
Expand All @@ -154,11 +178,13 @@ def search(self,
def transfer(self,
file_ids = None,
source = None,
destination = None):
destination = None,
timeout = None):
"""
`client.transfer(file_ids = None,
source = None,
destination = None) -> UUID
destination = None,
timeout = None) -> UUID

* Submits a request to transfer files from a source to a destination database. the
files in the source database are identified by a list of string file_ids.
Expand All @@ -170,26 +196,33 @@ def transfer(self,
if type(destination) != str:
raise TypeError('transfer: destination database name must be a string.')
if type(file_ids) != list:
raise TypeError('batch: sequences must be a list of string file IDs.')
raise TypeError('transfer: file_ids must be a list of string file IDs.')
if timeout and type(timeout) != int and type(timeout) != float:
raise TypeError('transfer: timeout must be a number of seconds.')
try:
response = requests.post(f'{self.uri}/transfers',
data={
source: source,
destination: destination,
file_ids: file_ids,
})
response = requests.post(url=f'{self.uri}/transfers',
json={
'source': source,
'destination': destination,
'file_ids': file_ids,
},
auth=self.auth,
timeout=timeout)
response.raise_for_status()
except HTTPError as http_err:
logger.error(f'HTTP error occurred: {http_err}')
return None
except requests.exceptions.HTTPError as err:
logger.error(f'HTTP error occurred: {err}')
return None
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
except Exception as err:
logger.error(f'Other error occurred: {err}')
return None
else:
return uuid.UUID(response.json()["id"])
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved

def transferStatus(self, id):
"""`client.transferStatus(id)` -> TransferStatus
def transfer_status(self, id):
"""`client.transfer_status(id)` -> TransferStatus

* Returns status information for the transfer with the given identifier.
Possible statuses are:
Expand All @@ -205,7 +238,8 @@ def transferStatus(self, id):
if not self.uri:
raise RuntimeError('dts.Client: not connected.')
try:
response = requests.get(f'{self.uri}/transfers/{str(id)}')
response = requests.get(url=f'{self.uri}/transfers/{str(id)}',
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
auth=self.auth)
response.raise_for_status()
except HTTPError as http_err:
logger.error(f'HTTP error occurred: {http_err}')
Expand All @@ -216,23 +250,24 @@ def transferStatus(self, id):
else:
results = response.json()
return TransferStatus(
id = response['id'],
status = response['status'],
message = response['message'] if 'message' in response else None,
num_files = response['num_files'],
num_files_transferred = response['num_files_transferred'],
id = results['id'],
status = results['status'],
message = results['message'] if 'message' in results else None,
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
num_files = results['num_files'],
num_files_transferred = results['num_files_transferred'],
)

def deleteTransfer(self, id):
def cancel_transfer(self, id):
"""
`client.deleteTransfer(id) -> None
`client.cancel_transfer(id) -> None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably worth using one of the recognised python documentation formats -- depending on what IDE you use, you may be able to automatically generate documentation skeletons for functions and if you add a linter, you won't get a bunch of errors because the docs don't follow pydocstyle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've been hesitant to strap myself all the way into Python dev mode, but it's probably a good time to address some of this stuff. I don't use an IDE, though, because I find they make it easier to produce mountains of rubbish with almost no effort.

"one of the recognised python documentation formats" gets at the key issue -- there's 10M ways to do everything in Python. In this it takes after my other "favorite" language, C++, in which everyone strives to prove how clever they are by trotting out esoterically cute ways of doing trivial things that don't pertain to any other language.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: #7

Copy link
Collaborator

@ialarmedalien ialarmedalien Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I use the sphinx documentation style, and ruff for python code linting/formatting. I use VSCode for working with python -- there's an extension for generating python documentation for functions and a nice ruff integration, so you can handle all that stuff whilst you're working on the code. I'm sure that there are also emacs / vi / etc. integrations for ruff, too.


* Deletes a file transfer, canceling
"""
if not self.uri:
raise RuntimeError('dts.Client: not connected.')
try:
response = requests.delete(f'{self.uri}/transfers/{str(id)}')
response = requests.delete(url=f'{self.uri}/transfers/{str(id)}',
jeff-cohere marked this conversation as resolved.
Show resolved Hide resolved
auth=self.auth)
response.raise_for_status()
except HTTPError as http_err:
logger.error(f'HTTP error occurred: {http_err}')
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ certifi==2024.2.2
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
coverage==7.6.0
frictionless==5.17.0
humanize==4.9.0
idna==3.7
Expand Down
Empty file added test/__init__.py
Empty file.
65 changes: 65 additions & 0 deletions test/test_client.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# unit tests for the dts.client package

import dts
import os
import unittest

class TestClient(unittest.TestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would switch to using pytest for tests -- it is a lot more flexible that unittest as you can easily parametrise your tests (run the same test on a bunch of different inputs) and the test output reporting is much more helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, one of the 10M testing packages. Thanks for the recommendation!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made an issue for this.

"""Unit tests for dts.client.Client"""

def setUp(self):
self.token = os.getenv('DTS_KBASE_DEV_TOKEN')
if not self.token:
raise ValueError('Environment variable DTS_KBASE_DEV_TOKEN must be set!')
self.server = "https://lb-dts.staging.kbase.us"

def test_ctor(self):
client = dts.Client(api_key = self.token, server = self.server)
self.assertTrue(client.uri)
self.assertTrue(client.name)
self.assertTrue(client.version)
Comment on lines +18 to +20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these aren't really worthwhile tests -- maybe they can be improved when you move to pytest, but I would expect the tests to check the values of these attributes, not just whether they are truthy or falsy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can definitely address these when we move to pytest.

Copy link
Collaborator

@ialarmedalien ialarmedalien Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest uses assert so these self.assertTrue and self.assertFalse statements would turn into something like:

assert client.uri == "some_client_uri"
assert client.name
assert isinstance(client.version, str)
assert len(client.name) > 15


def test_connect(self):
client = dts.Client()
self.assertFalse(client.uri)
self.assertFalse(client.name)
self.assertFalse(client.version)
client.connect(api_key = self.token, server = self.server)
self.assertTrue(client.uri)
self.assertTrue(client.name)
self.assertTrue(client.version)
client.disconnect()
self.assertFalse(client.uri)
self.assertFalse(client.name)
self.assertFalse(client.version)

def test_databases(self):
client = dts.Client(api_key = self.token, server = self.server)
dbs = client.databases()
self.assertTrue(isinstance(dbs, list))
self.assertEqual(2, len(dbs))
self.assertTrue(any([db.id == 'jdp' for db in dbs]))
self.assertTrue(any([db.id == 'kbase' for db in dbs]))

def test_basic_jdp_search(self):
client = dts.Client(api_key = self.token, server = self.server)
results = client.search(database = 'jdp', query = '3300047546')
self.assertTrue(isinstance(results, list))
self.assertTrue(len(results) > 0)
self.assertTrue(all([result.to_dict()['id'].startswith('JDP:')
for result in results]))

def test_jdp_search_for_taxon_oid(self):
client = dts.Client(api_key = self.token, server = self.server)
taxon_oid = '2582580701'
params = {'f': 'img_taxon_oid', 'extra': 'img_taxon_oid'}
results = client.search(database = 'jdp',
query = taxon_oid,
specific = params)
self.assertTrue(isinstance(results, list))
self.assertTrue(len(results) > 0)
self.assertTrue(any([result.to_dict()['extra']['img_taxon_oid'] == int(taxon_oid)
for result in results]))

if __name__ == '__main__':
unittest.main()