Merge branch 'main' into fix-1596
roll authored Apr 29, 2024
2 parents fb0fb0c + 925cbb3 commit cfe11eb
Showing 368 changed files with 439 additions and 670 deletions.
16 changes: 7 additions & 9 deletions .github/workflows/general.yaml
@@ -20,12 +20,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
# TODO: recover 3.8 and 3.9 when pytest-vcr is fixed
# https://github.com/ktosiek/pytest-vcr/issues/53
# TODO: recover 3.12 when duck is fixed
# https://github.com/duckdb/duckdb/issues/9563
# python-version: [3.8, 3.9, "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -76,7 +71,10 @@ jobs:

test-macos:
if: github.event_name != 'schedule' || github.repository_owner == 'frictionlessdata'
runs-on: macos-latest
# TODO: migrate to macos-latest after figuring out how to
# make `postgres/pg_config` work in the environment. Currently, it fails
# with the following error: "pg_config not found"
runs-on: macos-12
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -90,7 +88,7 @@ jobs:
run: cp .env.example .env
- name: Test software
# https://stackoverflow.com/questions/9678408/cant-install-psycopg2-with-pip-in-virtualenv-on-mac-os-x-10-7
run: LDFLAGS=`echo $(pg_config --ldflags)` make test
run: LDFLAGS=`echo $(pg_config --ldflags)` hatch run +py=3.10 ci:test

# Test (Windows)

@@ -109,7 +107,7 @@ jobs:
- name: Prepare variables
run: cp .env.example .env
- name: Test software
run: make test
run: hatch run +py=3.10 ci:test

# Deploy

28 changes: 18 additions & 10 deletions CONTRIBUTING.md
@@ -49,12 +49,10 @@ hatch shell
Use the following command to build the container:

```bash tabs=CLI
make docker
hatch run image
```

This should take care of setting up everything. If the container is
built without errors, you can then run commands like `make` inside the
container to accomplish various tasks (see the next section for details).
This should take care of setting up everything. If the container is built without errors, you can then run commands like `hatch` inside the container to accomplish various tasks (see the next section for details).

To make things easier, we can create an alias:

@@ -65,7 +63,7 @@ alias "frictionless-dev=docker run --rm -v $PWD:/home/frictionless -it frictionl
Then, for example, to run the tests, we can use:

```bash tabs=CLI
frictionless-dev make test
frictionless-dev hatch run test
```

## Development
@@ -74,13 +72,11 @@ frictionless-dev make test

Frictionless is a Python3.8+ framework, and it uses some common Python tools for the development process (we recommend enabling support of these tools in your IDE):

- code linting: `ruff`
- import sorting: `isort`
- code formatting: `black`
- linting/formatting: `ruff`
- type checking: `pyright`
- code testing: `pytest`

You also need `git` to work on the project, and `make` is recommended.
You also need `git` to work on the project.

### Documentation

@@ -117,33 +113,44 @@ def vcr_config():
- Setup CKAN local instance: https://github.com/okfn/docker-ckan
- Create a sysadmin account and generate api token
- Set apikey token in .env file

```
CKAN_APIKEY=***************************
```

#### Regenerating cassettes for Zenodo

**Read**

- Reading requires the live site; the API library uses it by default.
- Log in to Zenodo (if you have an account) and create an access token.
- Set the access token in the `.env` file.

```
ZENODO_ACCESS_TOKEN=***************************
```

**Write**

- To write, we can use either the live site or the sandbox. We recommend using the sandbox (https://sandbox.zenodo.org/api/).
- Log in to the Zenodo sandbox (if you have an account) and create an access token.
- Set the access token in the `.env` file.

```
ZENODO_SANDBOX_ACCESS_TOKEN=***************************
```

- Set base_url in the control params

```
base_url="https://sandbox.zenodo.org/api/"
```
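
As a rough illustration of the Zenodo steps above, the sketch below reads `KEY=VALUE` pairs such as `ZENODO_SANDBOX_ACCESS_TOKEN` from `.env`-style text and pairs the token with the sandbox `base_url`. The `parse_env_text` helper and the shape of `control_params` are hypothetical and exist only for this example; the project may load `.env` and pass the values differently.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Placeholder token: never commit a real one.
env_text = """
# Zenodo sandbox credentials
ZENODO_SANDBOX_ACCESS_TOKEN=placeholder-token
"""

env = parse_env_text(env_text)

# Hypothetical parameter bundle; the real control object may look different.
control_params = {
    "apikey": env["ZENODO_SANDBOX_ACCESS_TOKEN"],
    "base_url": "https://sandbox.zenodo.org/api/",
}
```

Keeping the token in `.env` rather than in code means regenerated cassettes can be produced without the secret ever appearing in the repository.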

#### Regenerating cassettes for Github

- Login to github if you have an account and create an access token(Developer settings > Personal access tokens > Tokens).
- Set access token and other details in .env file. If email/name of the user is hidden we need to provide those details as well.

```
GITHUB_NAME=FD
[email protected]
@@ -153,8 +160,9 @@ GITHUB_ACCESS_TOKEN=***************************
## Releasing

To release a new version:

- check that you have push access to the `main` branch
- run `hatch version <major|minor|micro>` to update the version
- add changes to `CHANGELOG.md` if it's not a patch release (major or minor)
- run `make release`, which creates a release commit and tag and pushes them to GitHub
- run `hatch run release`, which creates a release commit and tag and pushes them to GitHub
- an actual release will happen on the Github CI platform after running the tests
40 changes: 0 additions & 40 deletions Makefile

This file was deleted.

109 changes: 42 additions & 67 deletions frictionless/__init__.py
@@ -1,68 +1,43 @@
from .actions import convert, describe, extract, index, list, transform, validate
from .analyzer import Analyzer
from .catalog import Catalog, Dataset
from .checklist import Check, Checklist
from .detector import Detector
from .dialect import Control, Dialect
from .error import Error
from .exception import FrictionlessException
from .indexer import Indexer
from .inquiry import Inquiry, InquiryTask
from .metadata import Metadata
from .package import Package
from .pipeline import Pipeline, Step
from .platform import Platform, platform
from .report import Report, ReportTask
from .resource import Resource
from .schema import Field, Schema
from .actions import convert as convert
from .actions import describe as describe
from .actions import extract as extract
from .actions import list as list
from .actions import transform as transform
from .actions import validate as validate
from .analyzer import Analyzer as Analyzer
from .catalog import Catalog as Catalog
from .catalog import Dataset as Dataset
from .checklist import Check as Check
from .checklist import Checklist as Checklist
from .detector import Detector as Detector
from .dialect import Control as Control
from .dialect import Dialect as Dialect
from .error import Error as Error
from .exception import FrictionlessException as FrictionlessException
from .indexer import Indexer as Indexer
from .inquiry import Inquiry as Inquiry
from .inquiry import InquiryTask as InquiryTask
from .metadata import Metadata as Metadata
from .package import Package as Package
from .pipeline import Pipeline as Pipeline
from .pipeline import Step as Step
from .platform import Platform as Platform
from .platform import platform as platform
from .report import Report as Report
from .report import ReportTask as ReportTask
from .resource import Resource as Resource
from .schema import Field as Field
from .schema import Schema as Schema
from .settings import VERSION as __version__
from .system import Adapter, Loader, Mapper, Parser, Plugin, System, system
from .table import Header, Lookup, Row
from .transformer import Transformer
from .validator import Validator

__all__ = [
"Adapter",
"Analyzer",
"Catalog",
"Check",
"Checklist",
"Control",
"Dataset",
"Detector",
"Dialect",
"Error",
"Field",
"FrictionlessException",
"Header",
"Indexer",
"Inquiry",
"InquiryTask",
"Loader",
"Lookup",
"Mapper",
"Metadata",
"Package",
"Parser",
"Pipeline",
"Platform",
"Plugin",
"Report",
"ReportTask",
"Resource",
"Row",
"Schema",
"Step",
"System",
"Transformer",
"Validator",
"convert",
"describe",
"extract",
"index",
"list",
"platform",
"system",
"transform",
"validate",
]
from .system import Adapter as Adapter
from .system import Loader as Loader
from .system import Mapper as Mapper
from .system import Parser as Parser
from .system import Plugin as Plugin
from .system import System as System
from .system import system as system
from .table import Header as Header
from .table import Lookup as Lookup
from .table import Row as Row
from .transformer import Transformer as Transformer
from .validator import Validator as Validator
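
Note on the hunk above: the `from .x import Name as Name` form replaces the explicit `__all__` list. The sketch below is an illustration under stated assumptions, not the project's code: `demo_pkg` is a throwaway module built in memory, and `json` stands in for the package's submodules. It shows that at runtime a module without `__all__` star-exports every public name, whether or not it was imported with the redundant alias. The alias matters mainly to type checkers such as `pyright`, which (to my understanding) treat the redundant alias as an intentional re-export in typed packages.

```python
import sys
import types

# Source of a throwaway module that mixes both import styles.
source = """
from json import dumps as dumps   # redundant alias: signals an intentional re-export
from json import loads            # plain import: still visible at runtime
_VERSION = "0.0.0"                # leading underscore: skipped by star-imports
"""

# Build the module in memory and register it so it can be imported by name.
module = types.ModuleType("demo_pkg")
exec(source, module.__dict__)
sys.modules["demo_pkg"] = module

# Without __all__, `import *` copies every name that has no leading underscore.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Under these assumptions both `dumps` and `loads` are star-exported, so dropping `__all__` changes what static tools consider public rather than what Python itself exposes.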
File renamed without changes.
@@ -134,6 +134,10 @@ def test_analyze_resource_detailed_descriptive_statistics_with_outliers():
assert analysis["fieldStats"]["average_grades"]["outliers"] == [10000.0]


@pytest.mark.skipif(
sys.version_info >= (3, 12),
reason="Fix for Python3.12+ (possible bug to investigate)",
)
def test_analyze_resource_detailed_descriptive_statistics_variables_correlation():
resource = TableResource(path="data/analysis-data.csv")
analysis = resource.analyze(detailed=True)
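
Note on the hunk above: the marker skips the test whenever `sys.version_info >= (3, 12)`, which is a plain tuple comparison. The sketch below reproduces the same gating with the standard library's `unittest.skipIf` instead of `pytest.mark.skipif`, so it runs without third-party packages. The `is_at_least` helper and the `CorrelationTest` class are hypothetical names used only for illustration.

```python
import sys
import unittest


def is_at_least(version_info, minimum):
    """Compare the leading fields of a version tuple with a minimum (hypothetical helper)."""
    return tuple(version_info[: len(minimum)]) >= tuple(minimum)


class CorrelationTest(unittest.TestCase):
    # Same condition as in the diff, expressed with unittest rather than pytest.
    @unittest.skipIf(
        sys.version_info >= (3, 12),
        "Fix for Python3.12+ (possible bug to investigate)",
    )
    def test_placeholder(self):
        # Stand-in body; the real test analyzes data/analysis-data.csv.
        self.assertTrue(True)


# Run the single test and collect the outcome instead of printing a report.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CorrelationTest)
result = unittest.TestResult()
suite.run(result)
```

On interpreters older than 3.12 the placeholder runs; on 3.12 and later it lands in `result.skipped` together with the reason string.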
12 changes: 9 additions & 3 deletions frictionless/analyzer/analyzer.py
@@ -84,7 +84,9 @@ def analyze_table_resource(
_statistics(rows_without_nan_values) # type: ignore
)
analysis_report["fieldStats"][field.name]["outliers"] = []
analysis_report["fieldStats"][field.name]["missingValues"] = resource.stats.rows - len(rows_without_nan_values) # type: ignore
analysis_report["fieldStats"][field.name]["missingValues"] = (
resource.stats.rows - len(rows_without_nan_values) # type: ignore
)

# calculate correlation between variables(columns/fields)
for field_y in resource.schema.fields:
@@ -123,10 +125,14 @@ def analyze_table_resource(
"outliers"
].append(cell)

analysis_report["notNullRows"] = resource.stats.rows - analysis_report["rowsWithNullValues"] # type: ignore
analysis_report["notNullRows"] = ( # type: ignore
resource.stats.rows - analysis_report["rowsWithNullValues"] # type: ignore
)
analysis_report["averageRecordSizeInBytes"] = 0
if resource.stats.rows and resource.stats.bytes:
analysis_report["averageRecordSizeInBytes"] = resource.stats.bytes / resource.stats.rows # type: ignore
analysis_report["averageRecordSizeInBytes"] = (
resource.stats.bytes / resource.stats.rows
) # type: ignore
analysis_report["timeTaken"] = timer.time
return {
**analysis_report,
File renamed without changes.
16 changes: 1 addition & 15 deletions frictionless/checks/__init__.py
@@ -1,18 +1,4 @@
from .baseline import baseline
from .baseline import baseline as baseline
from .cell import *
from .row import *
from .table import *

__all__ = [
"ascii_value",
"baseline",
"deviated_cell",
"deviated_value",
"duplicate_row",
"forbidden_value",
"required_value",
"row_constraint",
"sequential_value",
"table_dimensions",
"truncated_value",
]
File renamed without changes.
4 changes: 3 additions & 1 deletion tests/conftest.py → frictionless/conftest.py
@@ -129,7 +129,9 @@ def vcr_cassette_dir(request):
def populate_db(engine):
with engine.begin() as conn:
conn.execute(sa.text('CREATE TABLE "table" (id INTEGER PRIMARY KEY, name TEXT)'))
conn.execute(sa.text("INSERT INTO \"table\" VALUES (1, 'english'), (2, '中国人')"))
conn.execute(
sa.text("INSERT INTO \"table\" VALUES (1, 'english'), (2, '中国人')")
)
conn.execute(
sa.text(
"CREATE TABLE fruits (uid INTEGER PRIMARY KEY, fruit_name TEXT, calories INTEGER)"
File renamed without changes.
17 changes: 15 additions & 2 deletions frictionless/console/commands/__init__.py
@@ -1,3 +1,16 @@
# Register modules
from . import convert, describe, explore, extract, index, inspect, list, publish, query
from . import script, summary, transform, validate
from . import (
convert,
describe,
explore,
extract,
index,
inspect,
list,
publish,
query,
script,
summary,
transform,
validate,
)
File renamed without changes.