Add documentation on how to use memray
pedro93 committed Sep 26, 2023
1 parent 20904b2 commit e434e7a
Showing 3 changed files with 57 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs-website/sidebars.js
@@ -133,6 +133,7 @@ module.exports = {
"metadata-ingestion/docs/dev_guides/classification",
"metadata-ingestion/docs/dev_guides/add_stateful_ingestion_to_source",
"metadata-ingestion/docs/dev_guides/sql_profiles",
"metadata-ingestion/docs/dev_guides/profiling_ingestions",
],
},
],
55 changes: 55 additions & 0 deletions metadata-ingestion/docs/dev_guides/profiling_ingestions.md
@@ -0,0 +1,55 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Profiling ingestions

<FeatureAvailability/>

**🤝 Version compatibility**
> Open Source DataHub: **0.10.6** | Acryl: **0.2.12**

This page describes how to capture memory profiles of ingestion runs.
Profiles are useful for sizing the resources needed to ingest a given source, and for developing new features or sources.

## How to use
Install the `debug` plugin for DataHub's CLI wherever the ingestion runs:

```bash
pip install 'acryl-datahub[debug]'
```

This installs [memray](https://github.com/bloomberg/memray) in your Python environment.

Add a flag to your ingestion recipe to generate a memray memory dump of the ingestion run:

```yaml
source:
  ...

sink:
  ...

flags:
  generate_memory_profiles: "<path to folder where dumps will be written to>"
```
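
When a recipe is assembled in Python rather than written by hand, the same flag can be set on a dictionary that mirrors the YAML structure. The following is only a sketch under that assumption: `with_memory_profiling` is a hypothetical helper, not part of DataHub, and the source and sink entries are placeholders.

```python
import copy
from pathlib import Path


def with_memory_profiling(recipe: dict, dump_dir: str) -> dict:
    """Return a copy of `recipe` with memory profiling enabled.

    Hypothetical helper: it only sets the `flags.generate_memory_profiles`
    key used in the YAML recipe.
    """
    # Create the dump folder up front; this sketch does not assume that
    # the ingestion run creates it on its own.
    Path(dump_dir).mkdir(parents=True, exist_ok=True)

    profiled = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    profiled.setdefault("flags", {})["generate_memory_profiles"] = dump_dir
    return profiled


if __name__ == "__main__":
    import tempfile

    # Placeholder source and sink; substitute a real recipe.
    recipe = {"source": {"type": "..."}, "sink": {"type": "..."}}
    with tempfile.TemporaryDirectory() as tmp:
        print(with_memory_profiling(recipe, tmp)["flags"])
```

Copying the recipe keeps the unprofiled configuration available, which is convenient when profiling is only wanted for selected runs.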

Once the ingestion run starts, a binary file is created and appended to for the duration of the run.

File names follow the pattern `file-<ingestion-run-urn>.bin`, so each run's dump can be uniquely identified.
Once the ingestion has finished, you can use `memray` to render the memory dump as a flamegraph:

`$ memray flamegraph file-None-file-2023_09_18-21_38_43.bin`

This will generate an interactive HTML file for analysis:

<p align="center">
<img width="70%" src="https://github.com/datahub-project/static-assets/blob/ps-memray-example/imgs/metadata-ingestion/memray-example.png?raw=true"/>
</p>
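
If the dump folder collects files from many runs, picking the newest dump and rendering it can be scripted. The following is a minimal sketch, not part of DataHub or memray: `latest_dump`, `flamegraph_command`, and `render_latest` are hypothetical helpers, and the script assumes the `memray` executable is on `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def latest_dump(dump_dir: str) -> Optional[Path]:
    """Return the most recently modified `.bin` file in `dump_dir`, if any."""
    dumps = sorted(Path(dump_dir).glob("*.bin"), key=lambda p: p.stat().st_mtime)
    return dumps[-1] if dumps else None


def flamegraph_command(dump: Path) -> List[str]:
    """Build the `memray flamegraph` invocation for one dump file."""
    return ["memray", "flamegraph", str(dump)]


def render_latest(dump_dir: str) -> None:
    """Render the newest dump in `dump_dir` as an HTML flamegraph."""
    dump = latest_dump(dump_dir)
    if dump is None:
        raise FileNotFoundError(f"no .bin dumps found in {dump_dir}")
    # Runs the same command as the manual step; fails if memray is missing
    # or exits with a non-zero status.
    subprocess.run(flamegraph_command(dump), check=True)
```

Calling `render_latest("<path to folder where dumps were written>")` then runs the same `memray flamegraph` command as the manual step.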


`memray` has an extensive set of features for memory investigation. See its [documentation](https://bloomberg.github.io/memray/overview.html) for the full feature set.


## Questions

If you've got any questions on configuring profiling, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
2 changes: 1 addition & 1 deletion metadata-ingestion/setup.py
@@ -697,7 +697,7 @@ def get_long_description():
},
entry_points=entry_points,
# Dependencies.
- install_requires=list(base_requirements | framework_common | debug_requirements),
+ install_requires=list(base_requirements | framework_common),
extras_require={
"base": list(framework_common),
**{