
Including SessionToken in url with v2 driver causes SQLException: IAM error retrieving temp credentials #96

Open
Y-Asahi-dev opened this issue Sep 1, 2023 · 4 comments


@Y-Asahi-dev

Driver version

2.1.0.9

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.54052

Client Operating System

Amazon EMR version 6.9

OS info:
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

JAVA/JVM version

openjdk version "1.8.0_382"
OpenJDK Runtime Environment Corretto-8.382.05.1 (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM Corretto-8.382.05.1 (build 25.382-b05, mixed mode)

Table schema

Problem description

I am using the amazon-redshift-jdbc-driver v2 with PySpark (Spark version 3.3.2).
I get a SQLException when I run the code below.
The error seems to occur when the URL contains a SessionToken parameter.
After replacing the JDBC driver with v1, ret.count() completes successfully without any errors.

Has the behavior changed between v1 and v2 when a SessionToken is present?

//-----------------------------------------------------------
from pyspark import SparkContext
from pyspark.sql import SQLContext
import boto3

redshift_cluster_id = 'sample_cluster'
redshift_dbname = 'sample_db'
bucket_name = 'sample_bucket'

# Forward the current AWS credentials to Hadoop for S3 access.
sc = SparkContext.getOrCreate()
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", credentials.access_key)
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", credentials.secret_key)

# Build an IAM JDBC URL; the SessionToken parameter appended below triggers the error.
url = f'jdbc:redshift:iam://{redshift_cluster_id}:{region}/{redshift_dbname}?DbUser=test_user&DbGroups=test_users_group&AutoCreate=true&AccessKeyID={credentials.access_key}&SecretAccessKey={credentials.secret_key}&user=&password='
token = credentials.token
sc._jsc.hadoopConfiguration().set("fs.s3.awsSessionToken", token)
url = f'{url}&SessionToken={token}'

sql_context = SQLContext(sc)
dfr = (sql_context.read
    .format('io.github.spark_redshift_community.spark.redshift')
    .option('url', url)
    .option('tempdir', f's3a://{bucket_name}/redshift_temp_dir')
    .option('forward_spark_s3_credentials', 'true')
    .option('fetchsize', 10000))

query = 'select * from sample_table'
ret = dfr.option('query', query).load(schema=None)
ret.count()
-----------------------------------------------------------//

  1. Expected behaviour: the connection succeeds and ret.count() returns the row count, as it does with the v1 driver.
  2. Actual behaviour: Driver.connect throws a SQLException while retrieving temporary IAM credentials.
  3. Error message/stack trace:
    : java.sql.SQLException: IAM error retrieving temp credentials: The security token included in the request is invalid (Service: AmazonRedshift; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 78fc4d63-34c5-45a1-af4f-454c122d93e5; Proxy: null)
    at com.amazon.redshift.util.RedshiftException.getSQLException(RedshiftException.java:56)
    at com.amazon.redshift.Driver.connect(Driver.java:339)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:46)
    at io.github.spark_redshift_community.spark.redshift.JDBCWrapper.getConnector(RedshiftJDBCWrapper.scala:270)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.$anonfun$schema$1(RedshiftRelation.scala:69)
    at scala.Option.getOrElse(Option.scala:189)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.schema$lzycompute(RedshiftRelation.scala:66)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.schema(RedshiftRelation.scala:65)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:440)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
    Caused by: com.amazonaws.services.redshift.model.AmazonRedshiftException: The security token included in the request is invalid (Service: AmazonRedshift; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 78fc4d63-34c5-45a1-af4f-454c122d93e5; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.doInvoke(AmazonRedshiftClient.java:9234)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.invoke(AmazonRedshiftClient.java:9201)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.invoke(AmazonRedshiftClient.java:9190)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.executeDescribeClusters(AmazonRedshiftClient.java:4581)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.describeClusters(AmazonRedshiftClient.java:4549)
    at com.amazon.redshift.core.IamHelper.setClusterCredentials(IamHelper.java:546)
    at com.amazon.redshift.core.IamHelper.setIAMCredentials(IamHelper.java:518)
    at com.amazon.redshift.core.IamHelper.setIAMProperties(IamHelper.java:279)
    at com.amazon.redshift.jdbc.RedshiftConnectionImpl.<init>(RedshiftConnectionImpl.java:260)
    at com.amazon.redshift.Driver.makeConnection(Driver.java:502)
    at com.amazon.redshift.Driver.connect(Driver.java:315)
    ... 24 more
  4. Any other details that can be helpful:

JDBC trace logs

Reproduction code

@bhvkshah
Contributor

bhvkshah commented Sep 1, 2023

Hi @Y-Asahi-dev , thanks for reporting this issue. Can you try:
1/ with the latest driver, version 2.1.0.18
2/ passing the session token as part of a Properties object, instead of as part of the URL?
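
A minimal sketch of option 2/ in Python, assuming the jaydebeapi package (which turns a Python dict into a java.util.Properties for DriverManager); the jar path, cluster endpoint, and DbUser are illustrative:

//-----------------------------------------------------------
import boto3
import jaydebeapi

credentials = boto3.Session().get_credentials()

# All IAM parameters go into the properties dict; the URL carries none of them.
conn = jaydebeapi.connect(
    'com.amazon.redshift.Driver',
    'jdbc:redshift:iam://sample_cluster:us-east-1/sample_db',
    {
        'DbUser': 'test_user',
        'AccessKeyID': credentials.access_key,
        'SecretAccessKey': credentials.secret_key,
        'SessionToken': credentials.token,  # session token as a property, not a URL parameter
    },
    'redshift-jdbc42-2.1.0.18.jar',  # illustrative path to the driver jar
)
-----------------------------------------------------------//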

@Y-Asahi-dev
Author

Y-Asahi-dev commented Sep 5, 2023

@bhvkshah

1/ with the latest driver, version 2.1.0.18

There was no change in the results.

2/ passing the session token as part of a Properties object, instead of as part of the URL?

https://docs.aws.amazon.com/ja_jp/redshift/latest/mgmt/spark-redshift-connector-other-config.html

I removed the IAM parameters from the URL and instead set

temporary_aws_access_key_id
temporary_aws_secret_access_key
temporary_aws_session_token

in the connector configuration, and the process completed successfully (sketched below). However, the job also completes normally when these values are empty, so they are probably not actually being used. With the IAM parameters removed from the URL, it seems Redshift is simply auto-loading a valid IAM user.
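
For reference, a minimal sketch of the configuration that succeeded, continuing from the variables in the reproduction code above (the plain, non-IAM JDBC URL is my assumption based on removing the IAM parameters; the option names come from the AWS docs linked above):

//-----------------------------------------------------------
# No IAM parameters in the JDBC URL; temporary credentials go through connector options.
url = f'jdbc:redshift://{redshift_cluster_id}:5439/{redshift_dbname}'
dfr = (sql_context.read
    .format('io.github.spark_redshift_community.spark.redshift')
    .option('url', url)
    .option('temporary_aws_access_key_id', credentials.access_key)
    .option('temporary_aws_secret_access_key', credentials.secret_key)
    .option('temporary_aws_session_token', credentials.token)
    .option('tempdir', f's3a://{bucket_name}/redshift_temp_dir'))
-----------------------------------------------------------//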

Thanks.

@ravi-Mode

Hi @bhvkshah
I am also trying to use IAM role-based authentication with the AssumeRole API, using JDBC driver version 2.1.0.30, and I am observing strange behavior. I am able to generate an access key, secret access key, and session token with the AssumeRole API.

But when I pass these as URL parameters, the connection fails with: Caused by: com.amazonaws.services.redshift.model.AmazonRedshiftException: The security token included in the request is invalid

However, when I pass them via a properties file, it gets past that point but then fails with: java.sql.SQLException: FATAL: user "IAM:" does not exist.

The same access key, secret access key, and session token work fine with the AWS CLI and the Redshift Data API:
aws redshift-data execute-statement --sql "select 1" --database dev --cluster-identifier <clusterId>
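
For reference, a minimal sketch of generating such temporary credentials with boto3's AssumeRole call (the role ARN and session name are illustrative):

//-----------------------------------------------------------
import boto3

sts = boto3.client('sts')
resp = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/redshift-access',  # illustrative role ARN
    RoleSessionName='redshift-jdbc-test',
)
creds = resp['Credentials']
# creds['AccessKeyId'], creds['SecretAccessKey'], and creds['SessionToken']
# are the values passed to the driver via the URL or the properties file.
-----------------------------------------------------------//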

@ravi-Mode

Adding to the above: I have granted S3 access as part of this role, and the S3 AWS SDK API works with these credentials, which confirms the generated credentials are valid; something seems to be missing in the JDBC driver. A quick check is sketched below.
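
A minimal sketch of that S3 sanity check with boto3, continuing from the AssumeRole sketch above (the bucket name is illustrative):

//-----------------------------------------------------------
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
# Listing a single key succeeds with the assumed-role credentials,
# confirming the token itself is valid.
s3.list_objects_v2(Bucket='sample-bucket', MaxKeys=1)
-----------------------------------------------------------//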
