# -*- coding: utf-8 -*-
from setuptools import setup  # distutils is deprecated (removed in Python 3.12); setuptools honors install_requires
packages = \
['conceptnet_lite']
package_data = \
{'': ['*']}
install_requires = \
['lmdb>=0.97.0,<0.98.0',
'peewee>=3.10,<4.0',
'pysmartdl>=1.3,<2.0',
'tqdm>=4.35,<5.0']
setup_kwargs = {
'name': 'conceptnet-lite',
'version': '0.1.27',
'description': 'Python library to work with ConceptNet offline without the need of PostgreSQL',
'long_description': '# conceptnet-lite\n[![License](https://img.shields.io/pypi/l/conceptnet-lite.svg)](https://www.apache.org/licenses/LICENSE-2.0)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/conceptnet-lite.svg)\n[![PyPI](https://img.shields.io/pypi/v/conceptnet-lite.svg)](https://pypi.org/project/conceptnet-lite/)\n[![Documentation Status](https://img.shields.io/readthedocs/conceptnet-lite.svg)](http://conceptnet-lite.readthedocs.io/en/latest/)\n\nConceptnet-lite is a Python library for working with ConceptNet offline without the need for PostgreSQL.\n\nThe library is released under the Apache License 2.0 and is separate from ConceptNet itself. ConceptNet is available under the [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/) license, which also applies to the formatted database file that we provide. See [here](https://github.com/commonsense/conceptnet5/wiki/Copying-and-sharing-ConceptNet) for the list of conditions for using ConceptNet data.\n\nThis is the official citation for ConceptNet if you use it in research:\n\n> Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. "ConceptNet 5.5: An Open Multilingual Graph of General Knowledge." In proceedings of AAAI 31.\n\n## Installation\n\nTo install `conceptnet-lite` use `pip`:\n\n```shell\n$ pip install conceptnet-lite\n```\n\n## Connecting to the database\n\nBefore you can use `conceptnet-lite`, you will need to obtain a ConceptNet database file. You have two options: download a pre-made one or build it yourself from the raw ConceptNet assertions file.\n\n### Downloading the ConceptNet database \n\nConceptNet releases happen once a year. You can use `conceptnet-lite` to build your own database from the raw assertions file (see below), but if there is a pre-built file it will be faster to just get that one. 
`conceptnet-lite` can download and unpack it to the specified folder automatically.\n\nHere is a [link](https://conceptnet-lite.fra1.cdn.digitaloceanspaces.com/conceptnet.db.zip) to a compressed database for ConceptNet 5.7. This link is used automatically if you do not supply an alternative.\n\n```python\nimport conceptnet_lite\n\nconceptnet_lite.connect("/path/to/conceptnet.db")\n```\n\nThis command both downloads the resource (our build for ConceptNet 5.7) and connects to the database. If the path specified as the first argument does not exist, it will be created (unless there is a permissions problem). Note that the database file is quite large (over 9 GB). \n\nIf your internet connection is intermittent, the built-in download function may give you errors. If so, just download the file separately, unpack it to the directory of your choice and provide the path to the `.connect()` method as described below.\n\n### Building the database for a new release\n\nIf a database file is not found in the folder specified in the `db_path` argument, `conceptnet-lite` will attempt to automatically download the raw assertions file from [here](https://github.com/commonsense/conceptnet5/wiki/Downloads) and build the database. This takes a couple of hours, so we recommend getting the pre-built file.\n\nIf you provide a path, this is where the database will be built. Note that the database file is quite large (over 9 GB). You also need to pass `db_download_url=None` to force the library to build the database from the dump.\n\n```python\nimport conceptnet_lite\n\nconceptnet_lite.connect("/path/to/conceptnet.db", db_download_url=None)\n```\n\nIf the specified path does not exist, it will be created (unless there is a permissions problem). If no path is specified and no database file is found in the current working directory, `conceptnet-lite` will attempt to build one in the current working directory. 
\n\nOnce the database is built, `conceptnet-lite` will connect to it automatically.\n\n### Loading the ConceptNet database \n\nOnce you have the database file, all you need to do is to pass the path to it to the `.connect()` method.\n\n```python\nimport conceptnet_lite\n\nconceptnet_lite.connect("/path/to/conceptnet.db")\n```\n\nIf no path is specified, `conceptnet-lite` will check if a database file exists in the current working directory. If it is not found, it will trigger the process of downloading the pre-built database (see above).\n\n## Accessing concepts\n\nConcept objects are created by looking for every entry that matches the input string exactly.\nIf none is found, the `peewee.DoesNotExist` exception will be raised.\n\n```python\nfrom conceptnet_lite import Label\n\ncat_concepts = Label.get(text=\'cat\').concepts \nfor c in cat_concepts:\n    print(" Concept URI:", c.uri)\n    print(" Concept text:", c.text)\n```\n```console\nConcept URI: /c/en/cat\nConcept text: cat\nConcept URI: /c/en/cat/n\nConcept text: cat\nConcept URI: /c/en/cat/n/wn/animal\nConcept text: cat\nConcept URI: /c/en/cat/n/wn/person\n...\n```\n\n`concept.uri` provides access to ConceptNet URIs, as described [here](https://github.com/commonsense/conceptnet5/wiki/URI-hierarchy). You can also retrieve only the text of the entry with `concept.text`.\n\n## Working with languages\n\nYou can limit the languages to search for matches. 
`Label.get()` takes an optional `language` argument that is expected to be an instance of `Language`, which in turn is created by calling `Language.get()` with the `name` argument.\nThe available languages and their codes are listed [here](https://github.com/commonsense/conceptnet5/wiki/Languages).\n\n```python\nfrom conceptnet_lite import Label\n\ncat_concepts = Label.get(text=\'cat\', language=\'en\').concepts \nfor c in cat_concepts:\n    print(" Concept URI:", c.uri)\n    print(" Concept text:", c.text)\n    print(" Concept language:", c.language.name)\n```\n\n```console\n Concept URI: /c/en/cat\n Concept text: cat\n Concept language: en\n Concept URI: /c/en/cat/n\n Concept text: cat\n Concept language: en\n Concept URI: /c/en/cat/n/wn/animal\n Concept text: cat\n Concept language: en\n Concept URI: /c/en/cat/n/wn/person\n Concept text: cat\n Concept language: en\n...\n```\n\n\n## Querying edges between concepts\n\nTo retrieve the set of relations between two concepts, you need to create the concept objects (optionally specifying the language as described above). The `edges_between()` function retrieves all edges between the specified concepts. For each edge, you can access its URI and a number of attributes, as shown below.\n\nSome ConceptNet relations are symmetrical: for example, the antonymy between *white* and *black* works both ways. Some relations are asymmetrical: e.g. the relation between *cat* and *mammal* is either hyponymy or hyperonymy, depending on the direction. 
The `two_way` argument lets you choose whether the query should be symmetrical or not.\n\n```python\nfrom conceptnet_lite import Label, edges_between\n\nintrovert_concepts = Label.get(text=\'introvert\', language=\'en\').concepts\nextrovert_concepts = Label.get(text=\'extrovert\', language=\'en\').concepts\nfor e in edges_between(introvert_concepts, extrovert_concepts, two_way=False):\n    print(" Edge URI:", e.uri)\n    print(" Edge name:", e.relation.name)\n    print(" Edge start node:", e.start.text)\n    print(" Edge end node:", e.end.text)\n    print(" Edge metadata:", e.etc)\n```\n```console\n Edge URI: /a/[/r/antonym/,/c/en/introvert/n/,/c/en/extrovert/]\n Edge name: antonym\n Edge start node: introvert\n Edge end node: extrovert\n Edge metadata: {\'dataset\': \'/d/wiktionary/en\', \'license\': \'cc:by-sa/4.0\', \'sources\': [{\'contributor\': \'/s/resource/wiktionary/en\', \'process\': \'/s/process/wikiparsec/2\'}, {\'contributor\': \'/s/resource/wiktionary/fr\', \'process\': \'/s/process/wikiparsec/2\'}], \'weight\': 2.0}\n```\n\n* **e.relation.name**: the name of the ConceptNet relation. Full list [here](https://github.com/commonsense/conceptnet5/wiki/Relations).\n\n* **e.start.text, e.end.text**: the source and target concepts of the edge\n\n* **e.etc**: the ConceptNet [metadata](https://github.com/commonsense/conceptnet5/wiki/Edges) dictionary contains the source dataset, sources, weight, and license. 
For example, the introvert:extrovert edge for English contains the following metadata:\n\n```json\n{\n\t"dataset": "/d/wiktionary/en",\n\t"license": "cc:by-sa/4.0",\n\t"sources": [{\n\t\t"contributor": "/s/resource/wiktionary/en",\n\t\t"process": "/s/process/wikiparsec/2"\n\t}, {\n\t\t"contributor": "/s/resource/wiktionary/fr",\n\t\t"process": "/s/process/wikiparsec/2"\n\t}],\n\t"weight": 2.0\n}\n```\n\n## Accessing all relations for a given concept\n\nYou can also retrieve all relations between a given concept and all other concepts, with the same options as above:\n\n```python\nfrom conceptnet_lite import Label, edges_for\n\nfor e in edges_for(Label.get(text=\'introvert\', language=\'en\').concepts, same_language=True):\n    print(e.start.text, "::", e.end.text, "|", e.relation.name)\n```\n```console\nextrovert :: introvert | antonym\nintrovert :: extrovert | antonym\noutrovert :: introvert | antonym\nreflection :: introvert | at_location\nintroverse :: introvert | derived_from\nintroversible :: introvert | derived_from\nintroversion :: introvert | derived_from\nintroversion :: introvert | derived_from\nintroversive :: introvert | derived_from\nintroverted :: introvert | derived_from\n...\n```\n\nThe same set of edge attributes is available for `edges_between` and `edges_for` (e.uri, e.relation.name, e.start.text, e.end.text, e.etc).\n\nNote that we have used the optional argument `same_language=True`. With this argument, `edges_for` returns only\nrelations whose two ends are in the same language. If this argument is omitted, it is possible to get edges to\nconcepts in languages other than the language of the source concept. 
For example, the same command as above with `same_language=False` will include the following in the output:\n\n```console\nkääntyä_sisäänpäin :: introvert | synonym\nsulkeutua :: introvert | synonym\nsulkeutunut :: introvert | synonym\nintroverti :: introvert | synonym\nasociale :: introvert | synonym\nintroverso :: introvert | synonym\nintrovertito :: introvert | synonym\n内向 :: introvert | synonym\n```\n\n## Accessing concept edges with a given relation direction\n\nYou can also query the relations that have a specific concept as target or source. This is achieved with `concept.edges_out` and `concept.edges_in`, as follows:\n\n```python\nfrom conceptnet_lite import Label\n\nconcepts = Label.get(text=\'introvert\', language=\'en\').concepts \nfor c in concepts:\n    print(" Concept text:", c.text)\n    if c.edges_out:\n        print(" Edges out:")\n        for e in c.edges_out:\n            print(" Edge URI:", e.uri)\n            print(" Relation:", e.relation.name)\n            print(" End:", e.end.text)\n    if c.edges_in:\n        print(" Edges in:")\n        for e in c.edges_in:\n            print(" Edge URI:", e.uri)\n            print(" Relation:", e.relation.name)\n            print(" End:", e.end.text)\n```\n```console\n Concept text: introvert\n Edges out:\n Edge URI: /a/[/r/etymologically_derived_from/,/c/en/introvert/,/c/la/introvertere/]\n Relation: etymologically_derived_from\n End: introvertere\n...\n Edges in:\n Edge URI: /a/[/r/antonym/,/c/cs/extrovert/n/,/c/en/introvert/]\n Relation: antonym\n End: introvert\n...\n```\n\n## Traversing all the data for a language\n\nYou can go over all concepts for a given language. 
For illustration, let us try Old Norse, a "small" language with the code "non" and vocab size of 7868, according to the [ConceptNet language statistics](https://github.com/commonsense/conceptnet5/wiki/Languages).\n\n```python\nfrom conceptnet_lite import Language\n\nmylanguage = Language.get(name=\'non\')\nfor l in mylanguage.labels:\n    print(" Label:", l.text)\n    for c in l.concepts:\n        print(" Concept URI:", c.uri)\n        if c.edges_out:\n            print(" Edges out:")\n            for e in c.edges_out:\n                print(" Edge URI:", e.uri)\n        if c.edges_in:\n            print(" Edges in:")\n            for e in c.edges_in:\n                print(" Edge URI:", e.uri)\n```\n```console\n Label: andsœlis\n Concept URI: /c/non/andsœlis/r\n Edges out:\n Edge URI: /a/[/r/antonym/,/c/non/andsœlis/r/,/c/non/réttsœlis/]\n Edge URI: /a/[/r/related_to/,/c/non/andsœlis/r/,/c/en/against/]\n Edge URI: /a/[/r/related_to/,/c/non/andsœlis/r/,/c/en/course/]\n Edge URI: /a/[/r/related_to/,/c/non/andsœlis/r/,/c/en/sun/]\n Edge URI: /a/[/r/related_to/,/c/non/andsœlis/r/,/c/en/widdershins/]\n Edge URI: /a/[/r/synonym/,/c/non/andsœlis/r/,/c/non/rangsœlis/]\n Concept URI: /c/non/andsœlis\n Edges out:\n Edge URI: /a/[/r/external_url/,/c/non/andsœlis/,/c/en.wiktionary.org/wiki/andsœlis/]\n Label: réttsœlis\n Concept URI: /c/non/réttsœlis\n Edges in:\n Edge URI: /a/[/r/antonym/,/c/non/andsœlis/r/,/c/non/réttsœlis/]\n...\n```\n\n## Accessing Concepts by URI\n\nYou can access concept ORM objects directly by providing a desired ConceptNet URI. This is done as follows:\n\n```python\nfrom conceptnet_lite import Concept, Edge\n\nedge_object = Edge.get(start=\'/c/en/example\')\nconcept_object = Concept.get(uri=\'/c/en/example\')\n```\n',
'author': 'Roman Inflianskas',
'author_email': '[email protected]',
'url': 'https://github.com/ldtoolkit/conceptnet-lite',
'packages': packages,
'package_data': package_data,
'install_requires': install_requires,
'python_requires': '>=3.6,<4.0',
}
setup(**setup_kwargs)