All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Calendar Versioning.
- #1038 Fix environment in ingress-healthz return.
- #1025 Enable Spegel mirroring for private ACR registry.
- #1033 Migrate ingress-healthz to install with flux.
- #1035 Fix ingress-healthz kustomization health check.
- #1027 Add purge task to remove old images from ACR.
- #1024 Update provider versions.
- #1023 Set OS upgrade explicitly to Unmanaged.
- #1017 Add support for kubernetes 1.26.
- #1016 Add variable for VMSS diff disk placement for GitHub Runners.
- #1010 Add azureFile CSI storage classes.
- #1009 Set allow_nested_items_to_be_public in SAs false.
- #1020 Enable Azure Disk Encryption for Key Vaults.
- #1000 Add OTLP support in datadog-agent.
- #1001 Migrate node-ttl to install with Flux.
- #1002 Migrate spegel to install with Flux.
- #1006 Update Git provider.
- #1005 Migrate node-local-dns to install with Flux.
- #1007 Migrate vpa to install with Flux.
- #1003 Migrate gatekeeper to install with Flux.
- #1003 Make Availability Zones configurable for AKS.
- #995 Fix Kubernetes version validation.
- #989 Update Azad-Kube-Proxy to v0.0.47.
- #996 Rename Datadog agent.
- #992 Add AKS cluster principal_id to aksmsi group.
- #997 Add health checks to Datadog.
- #998 Update GitHub Terraform provider to 5.28.0.
- #994 Update Datadog agent config.
- #991 Add vnet role assignment.
- #988 Update Azurerm provider version and enable AKS workload identities.
- #982 Update datadog-operator to 1.0.2 and agent to v2alpha1.
- #972 Update Datadog to install with flux.
- #985 Manage Flux notification provider.
- #980 Re-enable option to disable unique suffixes for resource group key vaults.
- #973 Switch to Standard sku_tier due to deprecations in the Azure API.
- #970 Update Azurerm provider to 3.51.0.
- #974 Update Spegel to v0.0.6.
- #971 Add variable for EBS volume size.
- #958 Make azure/governance and azure/core use the aztfmod/azurecaf provider for names.
- #957 Update Spegel to v0.0.5 and set resources.
- #959 Update node-local-dns to 1.22.20 and move to using registry.k8s.io.
- #962 Update Cluster Autoscaler Helm chart to move to using registry.k8s.io.
- #960 Update Goldilocks and VPA and move to using registry.k8s.io.
- #961 Update CSI secret store and move to using registry.k8s.io.
- #963 Update Kube State Metrics and move to using registry.k8s.io.
- #965 Update AAD Pod Identity to chart version 4.1.16 with app version 1.8.15.
- #964 Fix CRD versions.
- #945 Increase flux gitrepository timeout to 120s.
- #947 Bump git-auth-proxy to v0.8.2.
- #946 [Breaking] Enable configuration for private and public ingress controllers.
- #949 Update audit log alert criteria.
- #954 Make audit log alert have bigger window_size and frequency.
- #952 Fix issues created by #943 and change core to use new private_endpoint_network_policies_enabled in core subnet config.
- #939 Update Spegel to v0.0.4 and fix misspelled Spegel namespace.
- #934 Add certificate permissions for resource group AAD group.
- #906 Add support for kubernetes 1.25 in Azure.
- #936 Add Spegel to AKS and EKS.
- #928 Enable Node TTL by default.
- #929 Make allow_gateway_transit configurable.
- #935 Update Node TTL to v0.0.6 and enable monitoring.
- #933 Change from starboard to trivy-operator.
- #917 Remove datasource for azuread_groups in xkf_governance_global.
- #920 Increase default AKS audit retention to 365 days.
- #926 Make disable_bgp_route_propagation configurable.
- #916 Update Node TTL to v0.0.5.
- #856 Update falco to v0.33.0 and falco-exporter to v0.8.0.
- #918 Update workflows with new action versions
- #897 Add Datadog APM ignore resources
- #921 Add Azure Alerts if no data gets sent to log storage account.
- #922 Enable use of spot instances in AKS
- #911 Fix Node TTL status ConfigMap namespace for AKS.
- #894 Add x509-certificate-exporter helm chart.
- #892 Change the default Prometheus scrape interval to every minute.
- #896 Update external-dns and metrics-server.
- #900 Trigger upgrade pipeline in xkf-templates at release
- #902 Update cluster-autoscaler to 1.24.
- #903 Update Node TTL to v0.0.4.
- #905 Update Prometheus to v2.41.0.
- #907 Add node labels and taints as tags to ASG.
- #890 Include specific api server metrics.
- #882 Platform workloads ignore taints and labels.
- #883 Fix promtail configuration.
- #884 Fix cluster-role-binding for get-nodes role in eks.
- #887 Add label and taint tags to eks node group.
- #877 [Breaking] Update Kube Prometheus Stack to 42.1.1.
- #878 Disable collecting API Server metrics.
- #879 Update Promtail Helm chart to 6.6.2.
- #881 Set OPA mutatingWebhookReinvocationPolicy: IfNeeded.
- #885 OPA mutatingWebhookCustomRules trigger ephemeralContainer.
- #886 [Breaking] Update OPA lib to 0.20.1, use Xenit pspReadOnlyRoot and new assign rule for ephemeral containers.
- #888 Upgrade ingress-nginx helm to 4.4.0.
- #857 Grafana agent kubelet metrics.
- #871 Change ingress-nginx affinity match labels.
- #873 Add linkerd exception to default-deny networkpolicy.
- #874 Update linkerd to 2.12.2.
- #865 Make azad-kube-proxy AD group filter more specific.
- #821 Default ingressClassName for ingress_healthz.
- #847 Fix linkerd certificate forced recreation.
- #850 Allow overriding ACR name.
- #849 Bump azad-kube-proxy version to 0.0.36.
- #848 Allow aks name suffix to be set to null.
- #841 Change ACR SKU to Standard.
- #846 Add XKF prefix to node pool label.
- #853 Upgrade azad-kube-proxy version.
- #854 Explicitly set Prometheus version that should be used.
- #837 Update TFLint to 0.42.
- #838 Update git-auth-proxy to v0.8.1.
- #840 Add support for ARM VMs in AKS.
- #836 Make purge protection configurable per keyvault.
- #835 Make falco use priorityClassName again.
- #839 Remove deprecated Flux V1 module.
- #830 Add unique_suffix to core delegate azurerm_role_definition service_endpoint_join.
- #832 Upgrade azurerm provider to 3.28.0.
- #823 Add secrets-store.csi.x-k8s.io to EKS tenants.
- #812 Upgrade terraform to 1.3.0.
- #810 Update provider versions.
- #814 Possible to use data source as policy input to Irsa.
- #816 Update node local dns version.
- #815 Make datadog tolerate all node taints.
- #822 Update secrets-store.csi.x-k8s.io from v1alpha1 to v1.
- #787 Add support for kubernetes 1.24 in Azure.
- #797 [Breaking] Add option to configure extra headers in Ingress NGINX.
- #796 Add custom resource edit rights to tenant service account.
- #791 Add control-plane output solution for Azure.
- #783 Upgrade azurerm and azuread providers.
- #789 [Breaking] Image gallery support for Azure Pipelines module.
- #801 [Breaking] Image gallery support for Github Runners module.
- #802 Ignore commit message changes in Flux installations.
- #807 Increase NGINX Ingress min available from one to two.
- #788 Stop role assignment recreation on AKS cluster update.
- #793 Downgrade AWS calico to 3.19 and upgrade ingress-nginx config and version.
- #800 Deprecate Fluxcd v1 module.
- #806 [Breaking] Remove creation of service accounts for tenant namespaces.
- #774 [Breaking] Add extra_config to ingress nginx config object.
- #780 Add AWS CSI driver to EKS cluster.
- #778 Upgrade AWS calico to 3.24.
- #776 [Breaking] Remove default value for unique_suffix in Azure core module.
- #775 Update falco helm chart to 2.0.16.
- #777 Update Flux 2.0 to v0.33.0 which bumps the source controller from v1beta1 to v1beta2.
- #715 Long term storage of AKS audit logs.
- #767 Helm-crd-oci module to support helm charts located in OCI.
- #764 Fix Linkerd cert expiring 8 years too early.
- #765 Replace Linkerd image registry with ghcr.
- #766 Skip Linkerd proxy for Ingress Nginx webhook.
- #768 Use linkerd-fork OCI helm charts and update linkerd to 2.12.0.
- #771 Make region optional in ingress-health fqdn.
- #772 Use correct linkerd-cni chart name.
- #691 [Breaking] Refactor modules to support multi region setup.
- #756 Update terraform and tooling.
- #759 Update Terraform tls provider to 4.0.1.
- #760 Move secrets-store-csi-driver-provider-azure helm chart location.
- #743 Add cert permissions to group owners.
- #744 Add configurable external_dns_hostname annotation for ingress-nginx.
- #753 Add support for kubernetes version 1.23.
- #745 Update ingress-nginx to 4.2.0 and disable chroot image in AWS.
- #748 Enable chroot image on AWS and set a custom internal-logger-address when running in AWS and multiple internal_load_balancer.
- #749 Update gatekeeper to 3.9.0.
- #746 Install VPA crd from correct helm chart.
- #747 Cert-manager AWS webhook config was broken due to replica of yaml config in helm.
- #735 [Breaking] Add the possibility to ignore unique suffix in key-vault creation.
- #739 Enable tenants to read VPA config in their namespace.
- #740 [Breaking] Make AWS assume EKS Admin role configurable.
- #741 Update linkerd control-plane to 1.5.4-edge.
- #738 Disable datadog-operator crd installation.
- #736 Update FluxV1 and Helm Operator.
- #728 Add the possibility to override public ip prefix name in aks global.
- #714 Ingress-nginx helm chart 4.1.4 and use the chroot functionality.
- #721 Allow datadog ingress by default to tenant namespace.
- #724 Upgrade azad-kube-proxy to 0.0.34.
- #727 Upgrade git-auth-proxy to v0.7.2.
- #732 Set resource request and limits to Node TTL.
- #716 Set resource requests for datadog-cluster-agent, starboard-operator, ingress-nginx, external-dns, azure-metrics and goldilocks-controller.
- #717 Remove force conflicts from CRD resource.
- #718 Remove node pool create before destroy.
- #719 Update Flux v1 helm operator rbac to v1.
- #733 Add resource definitions for datadog agent, cert-manager, reloader, prometheus kube-state-metrics.
- #640 [Breaking] AKS set kubelet config default max pod pid to 1000.
- #709 [Breaking] Upgrade linkerd CNI to 2.11.2 and control-plane to edge-22.6.1.
- #710 Increase Prometheus resource request and limit.
- #712 Set resolve conflicts to overwrite for EKS addons.
- #690 Helm metrics-server extraArgs as list.
- #697 Set default environment in datadog agent.
- #700 Fix node-ttl OCI registry.
- #701 Datadog nginx-ingress-controller log config.
- #703 Exclude prometheus ns from gatekeeper config.
- #692 [Breaking] Add Node TTL to EKS and AKS.
- #636 Make Node Local DNS enabled by default in AWS and Azure.
- #688 Fix Kubernetes version check and update supported versions.
- #654 AWS specify last addon version in EKS.
- #698 Add premium ZRS storage class to AKS.
- #699 Update Helm Terraform provider to support OCI charts.
- #707 Update bitnami/nginx helm chart to 12.0.3 for ingress-healthz.
- #684 Update aad pod identity.
- #685 Update csi secrets store.
- #686 Update Datadog Operator, Kube Prometheus Stack and Metrics Server.
- #664 Manage Helm chart CRDs outside of Helm.
- #678 Update OPA to 3.8.1, gatekeeper-library to 0.12.1 and add k8srequireingressclass constraint.
- #679 Update AzureRM provider version.
- #680 Disable AKS run command.
- #682 Fix CRD server side apply conflicts.
- #683 Fix datadog cluster agent pdb.
- #666 Enable ingress-nginx logs in promtail.
- #670 Ingress-nginx default wildcard certificate enabled.
- #671 [Breaking] Governance delegate-se from regional to global module.
- #672 Fix EKS version validation
- #651 OPA add seccomp profile and disable default mount of SA token.
- #645 [Breaking] Refactor AKS node configuration with default values.
- #659 [Breaking] Bring AKS and EKS config inline with each other.
- #653 Add validation of Kubernetes version in EKS and AKS.
- #662 Modify FluxV2 installation to never remove applied resource.
- #663 Set max history to Helm releases missing configuration.
- #661 Fix cluster-role-binding for get-nodes role.
- #658 Remove 'use-forwarded-headers: "true"' from ingress-nginx
- #648 [Breaking] Make it possible to exclude namespaces from Datadog
- #656 Update Ingress Nginx version to mitigate security disclosure.
- #650 Make it possible to enable Promtail metrics in eks-core
- #647 [Breaking] Create Azure AD Application for azad-kube-proxy using eks-global.
- #639 Create Azure AD Application for azad-kube-proxy using aks-global
- #635 Upgrade azurerm provider to v3.1.0.
- #637 [Breaking] Add tenant namespace default deny network policy by default.
- #638 Set default empty config for promtail_config in aks-core and eks-core.
- #642 Add toleration for Promtail to make it run on all nodes.
- #643 Change tenant label name for Promtail.
- #644 Update OPA Gatekeeper Library to v0.12.0.
- #633 Remove deprecated modules xenit, loki, and new-relic.
- #624 Add Promtail for platform logs in Azure.
- #630 Add variable for excluding Promtail namespaces.
- #632 Make Promtail work for AWS/EKS.
- #622 [Breaking] Hardcode prometheus and trivy storage class.
- #617 Upgrade falco to 0.31.1
- #536 Update OPA Gatekeeper Helm charts
- #626 [Breaking] Add support for multiple DNS zones.
- #590 Drop CAP_SYS_ADMIN through OPA and use gatekeeper-library v0.10.0.
- #627 Deprecate loki and xenit modules.
- #618 Create new EKS and AKS node pools before deleting existing node pools.
- #614 Upgrade AWS provider to 4.6.0
- #606 Fix electionID on ingress-nginx when using private and public ingress-nginx.
- #608 Include node-local-dns IP in default-deny networkpolicy CIDR block.
- #611 Add option to turn off ingress-nginx metrics in grafana.
- #610 Add prefetch and serve_stale dns config to node-local-dns.
- #607 [Breaking] Upgrade terraform to 1.1.7.
- #616 Require Terraform version >= 1.1.7 instead of explicit version.
- #612 Update azure-metrics to 22.3.0 and gather more metrics.
- #593 Change Prometheus remote write settings to align with best practices.
- #574 [Breaking] Update cert-manager version.
- #595 Update Terraform provider versions.
- #600 Hardcode trivy starboard image to 0.24.3 and update trivy helm chart.
- #601 Fix api-group for get nodes role.
- #594 Remove deprecated goldpinger module.
- #597 Deprecate New Relic module.
- #589 Update git-auth-proxy to 0.6.0 to include case-insensitive path matching.
- #571 Add storageClass in AKS to enable StandardSSD_ZRS.
- #537 Support private repository scanning with starboard in AWS & Azure.
- #565 [Breaking] Update Ingress Nginx major version.
- #573 Update External DNS version.
- #568 Add kube-state-metrics for tenant namespaces to grafana-agent
- #577 Add ingress-nginx metrics and logs scraping to grafana-agent
- #579 Send EKS audit and API logs to cloudwatch
- #583 Use annotation-value-word-blocklist by default in Ingress-nginx
- #570 Only add network policy for Datadog / Grafana-Agent if default deny is true
- #582 Add the coreDNS ip to tenant networkpolicy CIDR block to work with node-local-dns. Use variable for node-local-dns.
- #541 Add support to enable private endpoints on subnets
- #549 Add resource requests & limits for goldilocks.
- #548 Enable grafana-agent in Prometheus.
- #558 Add ClusterRole kubectl get nodes to tenants.
- #560 Add SecretProviderClass CRD to ClusterRole custom_resource_edit.
- #563 Upgrade azurerm provider in aks to 2.97.0
- #553 Remove Secrets and ConfigMaps from collected Kube State Metrics resources.
- #551 Fix pod label selector for Prometheus monitor.
- #542 Add node local DNS to resolve throughput issues related to slow DNS queries.
- #545 Set prometheus disk size to 10Gi.
- #543 [Breaking] Allow setting os_disk_type on kubernetes node pools. We recommend setting Ephemeral.
- #540 Add podAntiAffinity to Ingress-nginx.
- #522 Add networkpolicy for datadog and grafana-agent to tenant namespace.
- #524 Update grafana-agent to 0.1.5
- #531 Make prefix configurable for Azure role definition names
- #533 Update cert manager version to 1.6.1
- #535 Azad-kube-proxy define resources
- #536 Update OPA Gatekeeper Helm charts
- #532 [Breaking] Fix bug in route table association (does not affect XKF by default)
- #527 Add kubernetes resource definitions for grafana-agent-operator.
- #523 Update starboard to 0.14.0, only scan the latest deployments and set a TTL on the vulnerability reports to be recreated after 25 hours.
- #513 EKS opinionated module eks-core added.
- #517 Change VPA storage from prometheus to checkpoint.
- #519 Fix nginx ingress service monitor selector when running multiple controllers.
- #506 Add VPA (Vertical Pod Autoscaling) as a module.
- #510 [Breaking] Run prometheus in agent mode and update kube-prometheus-stack to v30.0.0.
- #514 Starboard enable scanning of MEDIUM,HIGH,CRITICAL severity CVEs and disable configAuditScannerEnabled and kubernetesBenchmarkEnabled.
- #504 Give developers access to starboard report.
- #502 Add externalLabels to logs for Grafana Agent
- #497 Remove namespaces config option for kube-state-metrics.
- #498 Set AKS cluster autoscaler expander strategy to least waste.
- #491 Add Grafana Agent for observability with Grafana Cloud
- #486 Enable the option to create AKS node pools backed by spot instances.
- #481 Use upstream starboard exporter.
- #482 Add support for non-VirtualAppliance routes
- #478 Only set annotation blocklist when allow annotation is false.
- #470 Set max history for Helm releases to reduce the amount of secrets created.
- #471 Update azad-kube-proxy from v0.0.27 to v0.0.30 and remove dashboard (k8dash/skooner).
- #472 [Breaking] Update ingress-nginx to 3.40.0 and disable allow-snippet-annotations by default. Add annotation-value-word-blocklist.
- #463 Add azure-metrics to monitor azure specific metrics.
- #473 Add starboard-exporter to gather trivy metrics from starboard CRDs.
- #469 Remove deprecated modules azure/governance, kubernetes/external-secrets, and kubernetes/kyverno.
- #474 Adjust prometheus resource requests to fix OOM Kill.
- #475 Fix multi doc separator in prometheus monitors.
- #476 Remove extra separator in prometheus monitors.
- #437 Add podmonitor for secrets-store-csi-driver
- #465 [Breaking] Move xenit credentials to Prometheus and remove Xenit proxy.
- #466 Replace misspelled variable kube_state_metrics_namepsaces with kube_state_metrics_namespaces
- #453 Add role for kubectl top pod
- #461 Set resource request for Prometheus and update remote write config.
- #460 Increase gatekeeper-audit memory request.
- #456 Deprecate goldpinger and remove it from aks-core and eks-core.
- #459 Decrease Prometheus remote write max shards to reduce concurrent requests.
- #457 Increase Prometheus remote write max back off to mitigate DDOS.
- #454 Set prometheus remote write queue config, lowering default max shards and increasing default min back off.
- #451 Set revision history for all certificates to limit the amount of certificate requests.
- #448 [Breaking] Define namespaces that kube-state-metrics should gather metrics from. This is a breaking change: users who do not include all namespaces they want metrics from in kube_state_metrics_namepsaces_extras will lose metrics. The default values are set in aks-core/eks-core so they are adjusted to our current platform namespaces. We hope this way of working can be improved in future kube-state-metrics releases.
- #445 Re-enable resource limit ranges in EKS.
- #441 Fix dependency between tenant namespaces and resources in namespace.
- #432 Add deletion protection to Flux components to prevent unwanted removal of critical components.
- #439 Add information about Azure AD Graph deprecation.
- #440 Allow scale down for nodes with local storage.
- #442 Fix Datadog monitoring of ingress-nginx and enable x-forwarded-for headers.
- #431 Downgrade external-dns helm chart to 5.4.8 and external-dns to 0.9.0
- #416 Enable Prometheus pod monitoring for azad-kube-proxy.
- #420 Add support for New Relic metrics and log exporting. This feature is optional opt-in and will have no effect on current deployments.
- #424 Add CI step to check if CHANGELOG.md is updated in your PR. If you want to ignore it add "ignore-changelog" label to your PR.
- #413 Add flow-log option to AWS. This is only meant for debugging and is therefore disabled by default; running it in production gets expensive quickly.
- #415 Migrate from azdo-proxy to git-auth-proxy and update GitHub FluxV2 module to work with git-auth-proxy.
- #418 [Breaking] Update the Flux provider version to 0.4.0. Check the provider release for migration instructions.
- #423 Fix enabling monitors from aks-core and eks-core.
- #425 Switch to using https endpoint when scraping kubelet metrics in EKS.
- #426 Remove CPU limit in csi secrets driver as it could cause high throttling.
- #428 Deprecate kyverno and external-secrets modules.