From ddd9ec30f5df2556808400af32e1c38c0721b9d5 Mon Sep 17 00:00:00 2001
From: Mohayemin
Date: Thu, 26 Jan 2023 21:20:57 -0700
Subject: [PATCH] update documentation

---
 README.md       |  9 +++----
 docs/dataset.md | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 docs/tool.md    |  2 +-
 3 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 72aec79..a794adf 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,6 @@
-# PyMigBench
 PyMigBench is a benchmark of Python Library Migrations.
-The PyMigBench is developed by [Mohayeminul Islam](https://mohayemin.github.io/), [Ajay Kumar Jha](https://hifromajay.github.io/), [Sarah Nadi](https://sarahnadi.org/) and [Ildar Akhmetov](https://ildarakhmetov.com/).
-This repository contains the benchmark data and the source code of the tool to explore the data.
+This repository contains the library migration data and a tool to explore the data.
+Please visit [the PyMigBench website](https://ualberta-smr.github.io/PyMigBench) for detailed instructions on using the data and the tool.
 
-Please visit [the PyMigBench website](https://ualberta-smr.github.io/PyMigBench) to learn about the data structure, and usages and installation of the tool.
-For any queries, please contact mohayemin@ualberta.ca.
+Contributors: [Mohayeminul Islam](https://mohayemin.github.io/), [Ajay Kumar Jha](https://hifromajay.github.io/), [Sarah Nadi](https://sarahnadi.org/) and [Ildar Akhmetov](https://ildarakhmetov.com/).
+For any queries, please contact mohayemin@ualberta.ca.
\ No newline at end of file
diff --git a/docs/dataset.md b/docs/dataset.md
index 31f2069..92f399e 100644
--- a/docs/dataset.md
+++ b/docs/dataset.md
@@ -3,8 +3,9 @@ nav_order: 1
 ---
 # PyMigBench dataset
 The PyMigBench dataset is in the [data]({{ site.vars.repo }}/tree/msr-2023-datatrack/data){:target="_blank"} directory.
-There are three types of data: analogous library pairs, valid migrations, and migration-related code changes located in respective subdirectories.
-Each YAML file in these subdirectories contains information about one data item.
+There are two types of data: analogous library pairs and valid migrations, located in the `libpair` and `migration` subdirectories, respectively.
+Each YAML file in the `libpair` and `migration` folders contains information about one data item.
+Additionally, the `codefile` subdirectory contains the diff files of the code changes and the code files before and after migration.
 
 ## Library pair
 * Location: [data/libpair]({{ site.vars.repo }}/tree/msr-2023-datatrack/data/libpair){:target="_blank"}
@@ -42,7 +43,7 @@ domain: Development framework/extension
 - `lines`: a list of range of line numbers where the code was changed for migration.
 
 ### Sample data file
-Migration from flask to quart at commit 0a70f2b: [0a70f2b_flask,quart.yaml]({{ site.vars.repo }}/blob/msr-2023-datatrack/data/migration/0a70f2b_flask,quart.yaml){:target="_blank"}
+Migration from flask to quart at commit 0a70f2b: [0a70f2b_flask,quart.yaml]({{ site.vars.repo }}/blob/main/data/migration/0a70f2b_flask,quart.yaml){:target="_blank"}
 ```yaml
 id: 0a70f2b_flask,quart
 source: flask
@@ -66,3 +67,66 @@ code_changes:
   - '8:8'
 ```
 
+
+### Sample diff file
+The diff file below shows the changes to `app/run.py` in the migration mentioned above: [pgjones@faster_than_flask_article__0a70f2b__app$run.py.diff]({{ site.vars.repo }}/blob/main/data/codefile/pgjones@faster_than_flask_article__0a70f2b__app$run.py.diff).
+The diff file name format is `{repouser}@{reponame}__{8_characters_commit_hash}__{filepath-in-repo}.diff`.
+Any slash (`/`) or backslash (`\`) in the file path is replaced with a dollar sign (`$`).
+
+```diff
+diff --git a/app/run.py b/app/run.py
+ index 253538aa8cd65a3ed48563c2ea4594d998286293..0a70f2bddae90da13da5bce2b77ea56355ecc5d1 100644
+ --- a/app/run.py
+ +++ b/app/run.py
+@@ -1,44 +1,21 @@
+ import os
+ from contextlib import contextmanager
+
+-from flask import Flask
+-from psycopg2.extras import RealDictCursor
+-from psycopg2.pool import ThreadedConnectionPool
++import asyncpg
++from quart import Quart
+
+ from films import blueprint as films_blueprint
+ from reviews import blueprint as reviews_blueprint
+
+
+-class PoolWrapper:
+-    """Exists to provide an acquire method for easy usage.
+-
+-    pool = PoolWrapper(...)
+-    with pool.acquire() as conneciton:
+-        connection.execute(...)
+-    """
+-
+-    def __init__(self, max_pool_size: int, *, dsn):
+-        self._pool = ThreadedConnectionPool(
+-            1, max_pool_size, dsn=dsn, cursor_factory=RealDictCursor,
+-        )
+-
+-    @contextmanager
+-    def acquire(self):
+-        try:
+-            connection = self._pool.getconn()
+-            yield connection
+-        finally:
+-            self._pool.putconn(connection)
+-
+-
+ def create_app():
+-    app = Flask(__name__)
++    app = Quart(__name__)
+     app.config['JSONIFY_PRETTYPRINT_REGULAR'] = False
+
+     @app.before_first_request
+-    def create_db():
+-        dsn = 'host=0.0.0.0 port=5432 dbname=dvdrental user=dvdrental password=dvdrental'
+-        app.pool = PoolWrapper(20, dsn=dsn) #os.environ['DB_DSN'])
++    async def create_db():
++        dsn = 'postgres://dvdrental:dvdrental@0.0.0.0:5432/dvdrental'
++        app.pool = await asyncpg.create_pool(dsn, max_size=20) #os.environ['DB_DSN'])
+
+     app.register_blueprint(films_blueprint)
+     app.register_blueprint(reviews_blueprint)
+```
diff --git a/docs/tool.md b/docs/tool.md
index 8bbd1b6..cea3a85 100644
--- a/docs/tool.md
+++ b/docs/tool.md
@@ -6,7 +6,7 @@ The repository contains a command line tool to easily query the benchmark.
 The source code of the tool is in the [code]({{ site.vars.repo }}/tree/msr-2023-datatrack/code){:target="_blank"} folder.
 
 ## Install
-1. Install Python from [here](https://www.python.org/). We developed the tool in Python 3.10.0, but a later version should also work.
+1. Install Python from [here](https://www.python.org/). We developed the tool in Python 3.11.0, but a later version should also work.
 2. Clone the [repository]({{site.vars.repo}}){:target="_blank"} and checkout to `msr-2023-datatrack` branch.
    Alternatively, [download the zip](https://github.com/ualberta-smr/PyMigBench/archive/refs/heads/msr-2023-datatrack.zip) and extract it.
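The diff file naming convention that this patch adds to `docs/dataset.md` (under `### Sample diff file`) can be illustrated with a small Python sketch. The helpers `build_diff_filename`, `parse_diff_filename`, and `DiffFileName` are hypothetical and are not part of the PyMigBench tool. The sketch assumes the commit hash is a lowercase hexadecimal prefix, and it does not enforce a hash length, because the documentation says 8 characters while the sample file name uses the 7-character prefix `0a70f2b`. Parsing maps every `$` back to `/`, so it cannot tell whether the original path used slashes or backslashes.

```python
import re
from typing import NamedTuple


class DiffFileName(NamedTuple):
    """Parts of a codefile diff name (hypothetical helper type)."""
    repo_user: str
    repo_name: str
    commit_hash: str
    file_path: str


# Assumption: the hash is a lowercase hex prefix of 7 to 40 characters.
# The non-greedy repo group stops at the first "__<hex>__" separator, so
# paths such as "pkg$__init__.py" are still split correctly.
_DIFF_NAME = re.compile(
    r"^(?P<user>[^@]+)@(?P<repo>.+?)__(?P<hash>[0-9a-f]{7,40})__(?P<path>.+)\.diff$"
)


def build_diff_filename(repo_user: str, repo_name: str,
                        commit_hash: str, file_path: str) -> str:
    """Compose {repouser}@{reponame}__{hash}__{filepath}.diff."""
    # Both "/" and "\" in the path become "$", as the documentation states.
    flat_path = file_path.replace("/", "$").replace("\\", "$")
    return f"{repo_user}@{repo_name}__{commit_hash}__{flat_path}.diff"


def parse_diff_filename(name: str) -> DiffFileName:
    """Split a diff file name back into its parts.

    "$" is always mapped back to "/", so a Windows-style path (or a path
    that contained a literal "$") is not recovered exactly.
    """
    match = _DIFF_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised diff file name: {name!r}")
    return DiffFileName(
        repo_user=match["user"],
        repo_name=match["repo"],
        commit_hash=match["hash"],
        file_path=match["path"].replace("$", "/"),
    )


if __name__ == "__main__":
    name = build_diff_filename("pgjones", "faster_than_flask_article",
                               "0a70f2b", "app/run.py")
    print(name)  # pgjones@faster_than_flask_article__0a70f2b__app$run.py.diff
    print(parse_diff_filename(name).file_path)  # app/run.py
```

Splitting on `__` alone would break on paths such as `__init__.py`, which is why the sketch anchors the split on the hexadecimal hash instead.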