RMySQL encoding issue on Windows

来源:互联网 发布:游戏里被广为人知的梗 编辑:程序博客网 时间:2024/06/03 19:14

https://stackoverflow.com/questions/43091376/rmysql-encoding-issue-on-windows-spanish-character-%C3%B1

While using RMySQL::dbWriteTable function in R to write a table to MySQL on Windows I get an error message concerning the character [ñ].

The simplified example is:

    table <- data.frame(a=seq(1:3), b=c("És", "España", "Compañía"))    table a        b1 1       És2 2   España3 3 Compañíadb <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")RMySQL::dbWriteTable(db, name="test1", table, overwrite=T, append=F )Error in .local(conn, statement, ...) :   could not run statement: Invalid utf8 character string: 'Espa'

As you can see, there is no problem with the accents ("És") but there is with the ñ character ("España").

On the other hand, there is no problem with MySQL since this query works fine:

INSERT INTO test.test1 (a,b)values (1, "España");

Things I have already tried previous to write the table:

  1. Encoding(x) <- "UTF-8" for all table.

  2. iconv(x, "UTF-8", "UTF-8") for all table.

  3. Sent pre-query: dbSendQuery(db, "SET NAMES UTF8;")

  4. Change MySQL table Collation to: "utf-8-general, latin-1, latin-1-spanish...)

*Tried "Latin-1" encoding and didn't work either.

I have been looking for an answer to this question for a while with no luck.
Please help!

Versions:

MySQL 5.7.17

R version 3.3.0

Sys.getlocale()

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=C"

PS: Works fine in Linux environment but I am stuck with Windows in my current project :(

shareimprove this question
 
 
There's nothing wrong with Windows, an OS that uses UTF16 since its inception (OK, UCS2 up to ~2000). MySQL does have issues since it doesn't support Unicode natively (ie as nvarchar) but depends on varchar fields with a UTF8 collation. That does lead to problems, if an application saves ASCII data with a different codepage to the field. – Panagiotis Kanavos Mar 29 at 11:01
 
Most of the time, the problem is the application itself, especially in C and C++. Instead of using multi-byte or Unicode types (char16_t, utf16string), applications use ASCII buffers and types and to store UTF8-encoded strings. To be fair, these were added in C++11. R is notorious for this - R itself compiles with Unicode support on Windows, but third-party packages are often inconsistent. Some use Unicode, some allow you to specify a locale, and the worst case, some depend on the user's locale to specify the codepage, just like any non-Unicode application – Panagiotis Kanavos Mar 29 at 11:04 
 
Before saying "it's a Windows problem", imagine having to work with data from multiple locales. You can't hard-code just one locale in LC_COLLATE when you have to deal with multiple codepages – Panagiotis Kanavos Mar 29 at 11:08 
 
Check How to set charset for MySQL in RODBC? RODBC allows you to set the connection's character set to UTF8. – Panagiotis Kanavos Mar 29 at 12:01
 
I was able to isolate the problem to the GUI. RStudio 1.0.136 and Rgui silently convert Unicode literals. Whatever you type is treated as a non-Unicode string. On my machine (Greek Locale), the ñ was converted to n. R Tools for Visual Studio on the other hand doesn't have this issue, the characters were preserved – Panagiotis Kanavos Mar 29 at 13:20 

1 Answer

activeoldestvotes
up vote0down vote

This works for me in Windows:

write.csv(table, file = "tmp.csv", fileEncoding = "utf8", quote = FALSE, row.names = FALSE)db <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")dbWriteTable( db, value = "tmp.csv", name = "test1", append = TRUE, row.names = FALSE, sep = ",", quote='\"', eol="\r\n")
shareimprove this answer

原创粉丝点击