update documentation
mohayemin committed Jan 27, 2023
1 parent d71041b commit ddd9ec3
Showing 3 changed files with 72 additions and 9 deletions.
9 changes: 4 additions & 5 deletions README.md
@@ -1,7 +1,6 @@
# PyMigBench
PyMigBench is a benchmark of Python Library Migrations.
The PyMigBench is developed by [Mohayeminul Islam](https://mohayemin.github.io/), [Ajay Kumar Jha](https://hifromajay.github.io/), [Sarah Nadi](https://sarahnadi.org/) and [Ildar Akhmetov](https://ildarakhmetov.com/).
This repository contains the benchmark data and the source code of the tool to explore the data.
This repository contains the library migration data and a tool to explore the data.
Please visit [the PyMigBench website](https://ualberta-smr.github.io/PyMigBench) for detailed instructions on using the data and the tool.

Please visit [the PyMigBench website](https://ualberta-smr.github.io/PyMigBench) to learn about the data structure and how to install and use the tool.
For any queries, please contact [email protected].
Contributors: [Mohayeminul Islam](https://mohayemin.github.io/), [Ajay Kumar Jha](https://hifromajay.github.io/), [Sarah Nadi](https://sarahnadi.org/) and [Ildar Akhmetov](https://ildarakhmetov.com/).
For any queries, please contact [email protected].
70 changes: 67 additions & 3 deletions docs/dataset.md
@@ -3,8 +3,9 @@ nav_order: 1
---
# PyMigBench dataset
The PyMigBench dataset is in the [data]({{ site.vars.repo }}/tree/msr-2023-datatrack/data){:target="_blank"} directory.
There are three types of data: analogous library pairs, valid migrations, and migration-related code changes located in respective subdirectories.
Each YAML file in these subdirectories contains information about one data item.
There are two types of data: analogous library pairs and valid migrations, located in the `libpair` and `migration` subdirectories respectively.
Each YAML file in the `libpair` and `migration` folders contains information about one data item.
Additionally, the `codefile` subdirectory has the diff files of the code changes and the code files before and after migration.
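
As a rough sketch of how this layout could be explored, the snippet below counts the YAML data files in each of the two subdirectories. The function name `count_data_items` and the `data_dir` argument are illustrative and not part of the PyMigBench tool; the `.yaml` extension is assumed from the sample file name `0a70f2b_flask,quart.yaml`.

```python
from pathlib import Path


def count_data_items(data_dir):
    """Count the YAML files in the libpair and migration subdirectories.

    Hypothetical helper: assumes one `.yaml` file per data item, as the
    documentation describes.
    """
    root = Path(data_dir)
    return {
        name: sum(1 for _ in (root / name).glob("*.yaml"))
        for name in ("libpair", "migration")
    }
```

Calling `count_data_items("data")` from the repository root would return a dictionary with one count per subdirectory.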

## Library pair
* Location: [data/libpair]({{ site.vars.repo }}/tree/msr-2023-datatrack/data/libpair){:target="_blank"}
@@ -42,7 +43,7 @@ domain: Development framework/extension
- `lines`: a list of line-number ranges where the code was changed for the migration.

### Sample data file
Migration from flask to quart at commit 0a70f2b: [0a70f2b_flask,quart.yaml]({{ site.vars.repo }}/blob/msr-2023-datatrack/data/migration/0a70f2b_flask,quart.yaml){:target="_blank"}
Migration from flask to quart at commit 0a70f2b: [0a70f2b_flask,quart.yaml]({{ site.vars.repo }}/blob/main/data/migration/0a70f2b_flask,quart.yaml){:target="_blank"}
```yaml
id: 0a70f2b_flask,quart
source: flask
@@ -66,3 +67,66 @@ code_changes:
- '8:8'
```
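
The sample suggests two small conventions that a script consuming the data may want to decode: the `id` appears to combine the short commit hash with the source and target libraries, and each `lines` entry appears to be a `start:end` pair. The following is a minimal sketch under those assumptions. The helper names are hypothetical, and treating the end line as inclusive is a guess based on the `'8:8'` entry.

```python
from typing import NamedTuple


class MigrationId(NamedTuple):
    commit: str
    source: str
    target: str


def parse_migration_id(migration_id: str) -> MigrationId:
    """Split an id such as '0a70f2b_flask,quart' into its parts.

    Assumes the pattern '{commit}_{source},{target}' seen in the sample.
    """
    # The commit hash is hexadecimal, so the first underscore ends it.
    commit, _, libs = migration_id.partition("_")
    source, _, target = libs.partition(",")
    return MigrationId(commit, source, target)


def parse_line_range(text: str) -> range:
    """Turn a 'start:end' string into a range of line numbers.

    Assumes both ends are inclusive, so '8:8' covers line 8 only.
    """
    start, end = (int(part) for part in text.split(":"))
    return range(start, end + 1)
```

For the sample above, `parse_migration_id("0a70f2b_flask,quart")` yields the commit `0a70f2b`, source `flask`, and target `quart`.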

### Sample diff file
The diff file below shows the changes to `app/run.py` in the migration mentioned above: [pgjones@faster_than_flask_article__0a70f2b__app$run.py.diff]({{ site.vars.repo }}/blob/main/data/codefile/pgjones@faster_than_flask_article__0a70f2b__app$run.py.diff).
The diff file name format is: `{repouser}@{reponame}__{8_characters_commit_hash}__{filepath-in-repo}.diff`.
Each slash (`/`) or backslash (`\`) in the file path is replaced with a dollar sign (`$`).

```diff
diff --git a/app/run.py b/app/run.py
index 253538aa8cd65a3ed48563c2ea4594d998286293..0a70f2bddae90da13da5bce2b77ea56355ecc5d1 100644
--- a/app/run.py
+++ b/app/run.py
@@ -1,44 +1,21 @@
 import os
 from contextlib import contextmanager
-from flask import Flask
-from psycopg2.extras import RealDictCursor
-from psycopg2.pool import ThreadedConnectionPool
+import asyncpg
+from quart import Quart
 from films import blueprint as films_blueprint
 from reviews import blueprint as reviews_blueprint
-class PoolWrapper:
-    """Exists to provide an acquire method for easy usage.
-
-        pool = PoolWrapper(...)
-        with pool.acquire() as conneciton:
-            connection.execute(...)
-    """
-
-    def __init__(self, max_pool_size: int, *, dsn):
-        self._pool = ThreadedConnectionPool(
-            1, max_pool_size, dsn=dsn, cursor_factory=RealDictCursor,
-        )
-
-    @contextmanager
-    def acquire(self):
-        try:
-            connection = self._pool.getconn()
-            yield connection
-        finally:
-            self._pool.putconn(connection)
-
-
 def create_app():
-    app = Flask(__name__)
+    app = Quart(__name__)
     app.config['JSONIFY_PRETTYPRINT_REGULAR'] = False
     @app.before_first_request
-    def create_db():
-        dsn = 'host=0.0.0.0 port=5432 dbname=dvdrental user=dvdrental password=dvdrental'
-        app.pool = PoolWrapper(20, dsn=dsn) #os.environ['DB_DSN'])
+    async def create_db():
+        dsn = 'postgres://dvdrental:dvdrental@0.0.0.0:5432/dvdrental'
+        app.pool = await asyncpg.create_pool(dsn, max_size=20) #os.environ['DB_DSN'])
     app.register_blueprint(films_blueprint)
     app.register_blueprint(reviews_blueprint)
```
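
The naming rule for diff files can be sketched as a small helper. This is only an illustration of the format described above, not code from the PyMigBench tool, and the function and parameter names are made up. Note that the sample file name uses the 7-character hash `0a70f2b`, so the sketch takes the short hash as given rather than truncating it.

```python
def diff_file_name(repo_user, repo_name, commit_hash, file_path):
    """Build a diff file name following the documented format.

    Hypothetical helper: slashes and backslashes in the file path are
    replaced with a dollar sign, as the documentation states.
    """
    flat_path = file_path.replace("/", "$").replace("\\", "$")
    return f"{repo_user}@{repo_name}__{commit_hash}__{flat_path}.diff"
```

With the sample migration, `diff_file_name("pgjones", "faster_than_flask_article", "0a70f2b", "app/run.py")` reproduces the file name linked above.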
2 changes: 1 addition & 1 deletion docs/tool.md
@@ -6,7 +6,7 @@ The repository contains a command line tool to easily query the benchmark.
The source code of the tool is in the [code]({{ site.vars.repo }}/tree/msr-2023-datatrack/code){:target="_blank"} folder.

## Install
1. Install Python from [here](https://www.python.org/). We developed the tool in Python 3.10.0, but a later version should also work.
1. Install Python from [here](https://www.python.org/). We developed the tool in Python 3.11.0, but a later version should also work.
2. Clone the [repository]({{site.vars.repo}}){:target="_blank"} and check out the `msr-2023-datatrack` branch. Alternatively,
[download the zip](https://github.com/ualberta-smr/PyMigBench/archive/refs/heads/msr-2023-datatrack.zip)
and extract it.