Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid utf-8 byte sequences in varchar #252

Open
chulkilee opened this issue May 7, 2019 · 4 comments
Open

Invalid utf-8 byte sequences in varchar #252

chulkilee opened this issue May 7, 2019 · 4 comments

Comments

@chulkilee
Copy link

chulkilee commented May 7, 2019

I have an invalid utf-8 string in varchar, and mariaex returns it as binary, not string.

  • mariaex: 0.9.1
  • proxysql: 1.4.12 (severalnines/proxysql:1.4.12 docker image)
  • table: InnoDB / CHARSET=utf8 COLLATE=utf8_unicode_ci
  • column: varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci

Query:

query = "SELECT name, LENGTH(name), BIT_LENGTH(name) FROM my_table WHERE id = 1;"

When talking to ProxySQL:

{:ok, %Mariaex.Result{
    columns: ["name", "HEX(name)", "LENGTH(name)", "BIT_LENGTH(name)"],
    connection_id: #PID<0.458.0>,
    last_insert_id: nil,
    num_rows: 1,
    rows: [[<<199, 224, 226>>, "C387C3A0C3A2", 6, 48]]
  }
}

When talking directly to MySQL:

Mariaex.query(pid, query)

{:ok, %Mariaex.Result{
    columns: ["name", "HEX(name)", "LENGTH(name)", "BIT_LENGTH(name)"],
    connection_id: #PID<0.1256.0>,
    last_insert_id: nil,
    num_rows: 1,
    rows: [["Çàâ", "C387C3A0C3A2", 6, 48]]
  }
}

Examining the value:

str = "Çàâ"
raw = <<199, 224, 226>>

String.valid?(raw)
# => false

byte_size(str)
# => 6

byte_size(raw)
# => 3

to_charlist(str)
# => [199, 224, 226]

Base.decode16("C387C3A0C3A2")
# => {:ok, "Çàâ"}
@wojtekmach
Copy link
Contributor

@chulkilee hello, could you provide a bit more context what the problem is? Perhaps you can reproduce this problem with just Mariaex function calls?

@chulkilee
Copy link
Author

chulkilee commented May 9, 2019

@wojtekmach updated the issue description with more tests. I found that it only happens when I talk to ProxySQL. If I run the query directly MySQL, it works. Interesting..

I don't know much details how ProxySQL handles it. Probably it's a ProxySQL bug?

@IanLuites
Copy link

IanLuites commented May 24, 2019

We actually experienced the exact same issue.

UTF issues when Mariaex connects through ProxySQL.

Unable to read existing UTF fields and when writing writes garbled latin1.

@StanAnsems
Copy link

It looks like we have the same problem, for now we found a workaround that works for us. we added BINARY before the field we select, update or insert:

Mariaex.query("SELECT BINARY data FROM mytable WHERE id = ?", [id])
Mariaex.query("UPDATE mytable SET data = BINARY ?, [data])
Mariaex.query("INSERT INTO mytable (id, data) VALUES (?,BINARY ?), [id,data])

Maybe this helps for finding the issue or moving forward for now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants