[intel-npu] Adding NPU_DYNAMIC_QUANTIZATION property #28316

Open
wants to merge 3 commits into
base: master
@@ -143,6 +143,7 @@ offer a limited set of supported OpenVINO features.
ov::enable_profiling
ov::workload_type
ov::intel_npu::compilation_mode_params
ov::intel_npu::compiler_dynamic_quantization
ov::intel_npu::turbo
ov::intel_npu::tiles
ov::intel_npu::max_tiles
@@ -61,6 +61,14 @@ static constexpr ov::Property<uint32_t, ov::PropertyMutability::RO> driver_versi
*/
static constexpr ov::Property<std::string> compilation_mode_params{"NPU_COMPILATION_MODE_PARAMS"};

/**
* @brief [Only for NPU compiler]
* Type: boolean
* Set or verify state of dynamic quantization in the NPU compiler
* @ingroup ov_runtime_npu_prop_cpp_api
*/
static constexpr ov::Property<bool> compiler_dynamic_quantization{"NPU_COMPILER_DYNAMIC_QUANTIZATION"};
Review comment (Contributor): Please add the Python API for this property as well.


/**
* @brief [Only for NPU plugin]
* Type: boolean
1 change: 1 addition & 0 deletions src/plugins/intel_npu/README.md
@@ -172,6 +172,7 @@ The following properties are supported:
| `ov::intel_npu::device_total_mem_size`/</br>`NPU_DEVICE_TOTAL_MEM_SIZE` | RO | Size of available NPU DDR memory | `N/A` | `N/A` |
| `ov::intel_npu::driver_version`/</br>`NPU_DRIVER_VERSION` | RO | NPU driver version. | `N/A` | `N/A` |
| `ov::intel_npu::compilation_mode_params`/</br>`NPU_COMPILATION_MODE_PARAMS` | RW | Set various parameters supported by the NPU compiler. (See below) | `<std::string>`| `N/A` |
| `ov::intel_npu::compiler_dynamic_quantization`/</br>`NPU_COMPILER_DYNAMIC_QUANTIZATION` | RW | Enable/Disable dynamic quantization by NPU compiler | `YES` / `NO` | `N/A` |
| `ov::intel_npu::turbo`/</br>`NPU_TURBO` | RW | Set Turbo mode on/off | `YES`/ `NO`| `NO` |
| `ov::intel_npu::tiles`/</br>`NPU_TILES` | RW | Sets the number of npu tiles to compile the model for | `[0-]` | `-1` |
| `ov::intel_npu::max_tiles`/</br>`NPU_MAX_TILES` | RW | Maximum number of tiles supported by the device we compile for. Can be set for offline compilation. If not set, it will be populated by driver.| `[0-]` | `[1-6] depends on npu platform` |
@@ -357,4 +357,26 @@ struct COMPILATION_NUM_THREADS final : OptionBase<COMPILATION_NUM_THREADS, int32
}
};

//
// NPU_COMPILER_DYNAMIC_QUANTIZATION
//

struct COMPILER_DYNAMIC_QUANTIZATION final : OptionBase<COMPILER_DYNAMIC_QUANTIZATION, bool> {
    static std::string_view key() {
        return ov::intel_npu::compiler_dynamic_quantization.name();
    }

    static bool defaultValue() {
        return false;
    }

    static OptionMode mode() {
        return OptionMode::CompileTime;
    }

    static bool isPublic() {
        return true;
    }
};

} // namespace intel_npu
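
The option descriptor above follows a static-interface pattern: the derived struct supplies `key()` and `defaultValue()` and overrides `mode()` and `isPublic()`. The following is a minimal, self-contained sketch of that pattern, not the plugin's actual `OptionBase`. The names `OptionBaseSketch`, `CompilerDynamicQuantizationSketch`, `parseYesNo`, and `valueOrDefault`, the base-class defaults, and the members of `OptionMode` other than `CompileTime` are all assumptions made for illustration. The `YES`/`NO` spelling is taken from the README table in this PR.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

// Assumed set of modes; only CompileTime appears in the diff.
enum class OptionMode { Both, CompileTime, RunTime };

// Hypothetical simplified base: provides defaults that a descriptor may shadow.
template <typename Derived, typename T>
struct OptionBaseSketch {
    using ValueType = T;
    static OptionMode mode() { return OptionMode::Both; }  // assumed default
    static bool isPublic() { return false; }               // assumed default
};

// Descriptor shaped like the one added in the diff.
struct CompilerDynamicQuantizationSketch final
    : OptionBaseSketch<CompilerDynamicQuantizationSketch, bool> {
    static std::string_view key() { return "NPU_COMPILER_DYNAMIC_QUANTIZATION"; }
    static bool defaultValue() { return false; }
    static OptionMode mode() { return OptionMode::CompileTime; }
    static bool isPublic() { return true; }
};

// Parses the YES/NO spelling listed in the README table.
inline bool parseYesNo(std::string_view text) {
    if (text == "YES") return true;
    if (text == "NO") return false;
    throw std::invalid_argument("expected YES or NO, got: " + std::string(text));
}

// Returns the parsed user value if the key is present, else the descriptor default.
template <typename Opt>
bool valueOrDefault(const std::map<std::string, std::string>& raw) {
    const auto it = raw.find(std::string(Opt::key()));
    return it == raw.end() ? Opt::defaultValue() : parseYesNo(it->second);
}
```

With an empty map, `valueOrDefault<CompilerDynamicQuantizationSketch>` falls back to `defaultValue()`, which is `false` here as in the diff.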
@@ -538,6 +538,16 @@ std::string DriverCompilerAdapter::serializeConfig(const Config& config,
content = std::regex_replace(content, std::regex(batchstr.str()), "");
}

// COMPILER_DYNAMIC_QUANTIZATION is not supported in compiler versions < 6.3 - need to remove it
if ((compilerVersion.major < 6) || (compilerVersion.major == 6 && compilerVersion.minor < 3)) {
    std::ostringstream dqstr;
    dqstr << ov::intel_npu::compiler_dynamic_quantization.name() << KEY_VALUE_SEPARATOR << VALUE_DELIMITER << "\\S+"
          << VALUE_DELIMITER;
    logger.warning("COMPILER_DYNAMIC_QUANTIZATION property is not supported by this compiler version. Removing from "
                   "parameters");
    content = std::regex_replace(content, std::regex(dqstr.str()), "");
}

// NPU_DEFER_WEIGHTS_LOAD is needed at runtime only
{
std::ostringstream batchstr;
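
The version gate in this hunk can be tried in isolation. The sketch below assumes the serialized config is a string of space-separated `KEY="VALUE"` entries, so `=` and `"` stand in for `KEY_VALUE_SEPARATOR` and `VALUE_DELIMITER`; the real values of those constants are not shown in the diff. `CompilerVersionSketch`, `predatesDynamicQuantization`, `stripOption`, and `serializeFor` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical stand-in for the compiler version pair read in the diff.
struct CompilerVersionSketch {
    unsigned majorNumber;
    unsigned minorNumber;
};

// Mirrors the condition in the diff: true for any version older than 6.3.
bool predatesDynamicQuantization(const CompilerVersionSketch& v) {
    return v.majorNumber < 6 || (v.majorNumber == 6 && v.minorNumber < 3);
}

// Assumed serialization tokens (not confirmed by the diff).
const std::string kKeyValueSeparator = "=";
const std::string kValueDelimiter = "\"";

// Removes every KEY="non-whitespace-value" entry for the given key.
std::string stripOption(const std::string& content, const std::string& key) {
    assert(!key.empty());
    std::ostringstream pattern;
    pattern << key << kKeyValueSeparator << kValueDelimiter << "\\S+" << kValueDelimiter;
    return std::regex_replace(content, std::regex(pattern.str()), "");
}

// Drops the dynamic quantization entry only for compilers that predate it.
std::string serializeFor(const CompilerVersionSketch& v, std::string content) {
    if (predatesDynamicQuantization(v)) {
        content = stripOption(content, "NPU_COMPILER_DYNAMIC_QUANTIZATION");
    }
    return content;
}
```

Under these assumptions the entry is removed but the whitespace around it is not, so a stripped string keeps two consecutive spaces where the entry used to be.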
1 change: 1 addition & 0 deletions src/plugins/intel_npu/src/plugin/include/metrics.hpp
@@ -58,6 +58,7 @@ class Metrics final {
};
const std::vector<ov::PropertyName> _cachingProperties = {ov::device::architecture.name(),
ov::intel_npu::compilation_mode_params.name(),
ov::intel_npu::compiler_dynamic_quantization.name(),
ov::intel_npu::tiles.name(),
ov::intel_npu::dpu_groups.name(),
ov::intel_npu::dma_engines.name(),
6 changes: 6 additions & 0 deletions src/plugins/intel_npu/src/plugin/src/compiled_model.cpp
@@ -260,6 +260,12 @@ void CompiledModel::initialize_properties() {
[](const Config& config) {
return config.get<COMPILATION_MODE_PARAMS>();
}}},
{ov::intel_npu::compiler_dynamic_quantization.name(),
{true,
ov::PropertyMutability::RO,
[](const Config& config) {
return config.get<COMPILER_DYNAMIC_QUANTIZATION>();
}}},
{ov::intel_npu::turbo.name(),
{isPropertySupported(ov::intel_npu::turbo.name()),
ov::PropertyMutability::RO,
6 changes: 6 additions & 0 deletions src/plugins/intel_npu/src/plugin/src/plugin.cpp
@@ -460,6 +460,12 @@ Plugin::Plugin()
[](const Config& config) {
return config.get<COMPILATION_MODE_PARAMS>();
}}},
{ov::intel_npu::compiler_dynamic_quantization.name(),
{true,
ov::PropertyMutability::RW,
[](const Config& config) {
return config.get<COMPILER_DYNAMIC_QUANTIZATION>();
}}},
{ov::intel_npu::turbo.name(),
{_backends->isCommandQueueExtSupported(),
ov::PropertyMutability::RW,
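
The two registration hunks differ only in mutability: the plugin registers the property as `RW`, while the compiled model registers it as `RO`. A simplified sketch of how such a table could enforce that difference follows. It is an illustration under assumptions, not the plugin's implementation: `Mutability`, `ConfigSketch`, `PropertyEntry`, `makeTable`, and `setProperty` are hypothetical names, and the entry layout (supported flag, mutability, getter) is inferred from the shape of the initializers in the diff.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for ov::PropertyMutability.
enum class Mutability { RO, RW };

// Hypothetical config holding only the value discussed in this PR.
struct ConfigSketch {
    bool dynamicQuantization = false;
};

// Assumed entry layout: supported flag, mutability, getter.
struct PropertyEntry {
    bool supported;
    Mutability mutability;
    std::function<bool(const ConfigSketch&)> getter;
};

using PropertyTable = std::map<std::string, PropertyEntry>;

// Builds a one-entry table; the plugin would pass RW, the compiled model RO.
PropertyTable makeTable(Mutability mutability) {
    PropertyTable table;
    table.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION",
                  PropertyEntry{true, mutability,
                                [](const ConfigSketch& c) { return c.dynamicQuantization; }});
    return table;
}

// Rejects unknown or unsupported names and writes only through RW entries.
void setProperty(const PropertyTable& table, ConfigSketch& config,
                 const std::string& name, bool value) {
    const auto it = table.find(name);
    if (it == table.end() || !it->second.supported) {
        throw std::invalid_argument("unsupported property: " + name);
    }
    if (it->second.mutability == Mutability::RO) {
        throw std::logic_error("read-only property: " + name);
    }
    config.dynamicQuantization = value;
}
```

In this sketch a write through the `RW` table succeeds, while the same write through the `RO` table throws and leaves the config untouched; the getter works in both cases.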