Digital signature (EDS) error: unable to parse the encoded bytes
This fails because I get the following error message:
Msg 9402, Level 16, State 1, Line 14
XML parsing: line 1, character 38, unable to switch the encoding
Is there a way that I can cast @DataSheetXML to @DataSheetXML2, i.e. from NVARCHAR(MAX) to XML?
June 4, 2021 at 5:55 pm
Was doing a little bit of random testing on this and I think the problem is that utf-8 maps to VARCHAR, while utf-16 maps to NVARCHAR.
This is easy to test by taking your SET statement and changing the second convert from NVARCHAR(MAX) to VARCHAR(MAX). Alternatively, you can change your encoding to utf-16 in the XML.
Now, to handle this in code, I think you would need an IF statement. Something along the lines of this:
If the string "utf-8" exists in your XML, convert to VARCHAR(MAX); otherwise leave it as NVARCHAR(MAX). No need to CONVERT @DataSheetXML from NVARCHAR(MAX) to NVARCHAR(MAX), as that isn't doing anything.
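A minimal sketch of that IF logic (the variable names are taken from the question; treat this as an untested outline, not the exact code):

```sql
-- Sketch only: @DataSheetXML is assumed to be the NVARCHAR(MAX) variable from the question
DECLARE @DataSheetXML2 XML;

IF @DataSheetXML LIKE N'%encoding="utf-8"%'
    -- utf-8 declaration: go through VARCHAR(MAX) so the declared encoding matches the string type
    SET @DataSheetXML2 = CONVERT(XML, CONVERT(VARCHAR(MAX), @DataSheetXML));
ELSE
    -- no utf-8 declaration: the 2-byte NVARCHAR string converts as-is
    SET @DataSheetXML2 = CONVERT(XML, @DataSheetXML);
```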
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 7, 2021 at 2:25 pm
Thanks Brian, that worked perfectly for me.
June 7, 2021 at 2:57 pm
Not a problem. I imagine there are other solutions as well. If you KNOW you want NVARCHAR for sure, you could also do a REPLACE on the utf-8 to make it utf-16. This MAY break the XML though (it shouldn't, but never say never). Or if you are ALWAYS going to be working with utf-8 XML, then having the first parameter as a VARCHAR would save some conversions.
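For completeness, that REPLACE idea might look like this (a sketch, assuming the declaration literally reads encoding="utf-8"):

```sql
-- Sketch: rewrite the declared encoding so it matches the NVARCHAR (2-byte) string type
DECLARE @DataSheetXML2 XML = CONVERT(XML,
    REPLACE(@DataSheetXML, N'encoding="utf-8"', N'encoding="utf-16"'));
```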
You can see there are a lot of encoding types. My approach above ONLY handles utf-8 and casts that to VARCHAR first. I am not certain whether any of the other types (such as US-ASCII) would need to be converted to VARCHAR or whether NVARCHAR can handle them. I expect that if you don't know what encoding(s) will be used, you will need to do trial and error, with my above method, on all the different encoding types - and there are a lot.
As it is a large list to parse through if you needed to check all of them, I would take a different approach. A safer approach would be something like:
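A sketch of that safer approach, reconstructed from the description (variable names assumed from the question):

```sql
DECLARE @DataSheetXML2 XML;

BEGIN TRY
    -- First attempt: treat it as a 1-byte string (covers utf-8 and similar declarations)
    SET @DataSheetXML2 = CONVERT(XML, CONVERT(VARCHAR(MAX), @DataSheetXML));
END TRY
BEGIN CATCH
    BEGIN TRY
        -- Second attempt: 2-byte string (utf-16 declarations, or no declaration at all)
        SET @DataSheetXML2 = CONVERT(XML, CONVERT(NVARCHAR(MAX), @DataSheetXML));
    END TRY
    BEGIN CATCH
        -- Neither worked: surface the offending document for investigation
        SELECT @DataSheetXML AS UnconvertibleXml;
    END CATCH
END CATCH
```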
The above approach ends up using 2 try catch blocks which I generally try to avoid in SQL if I can, but I think this is going to be easier than trying to test and capture all possible encoding types. Plus it handles the case where it can't convert to VARCHAR or NVARCHAR. I left the explicit NVARCHAR conversion in even though we have the NVARCHAR at the start because to me it makes it clearer to see what we are doing. That is, it increases readability for future developers. It is not required.
Plus, that last CONVERT for handling the error could be handled by parsing the XML rather than converting it. Parsing will likely give better performance, but I expect it to be very minimal performance gain. In this case, the XML is short, so converting it and doing a LIKE comparison on it should be quick and allows you to handle the error in the event some encoding type can't be converted to VARCHAR or NVARCHAR.
However, that same code brings back the below error when run in Test.
Msg 9402, Level 16, State 1, Line 9 XML parsing: line 1, character 38, unable to switch the encoding
I have seen the fix provided by this site of converting the UTF encoding, and this works in both prod and test. See below. However, I need to give the developers an answer for why this behavior is occurring, and a rationale for why they should change their code (if that is the case).
I have compared both DBs and looked for anything obvious such as ANSI_NULLS and ANSI_PADDING. Everything is the same, including the version of SQL Server (2012, 11.0.5388). Data between environments is different, but the table schema is identical and the data type for col1xml is ntext.
Because the encoding of the XML is (implicitly or explicitly) fully determined by the underlying string type, your XML documents should not contain encoding directives -- these do nothing but take up space and potentially trip up the parser. You're best off stripping these entirely on storing them, if you can't avoid getting them (that is, don't replace them with encoding="utf-8" , replace them with nothing). Most XML libraries can be convinced to not output an XML declaration, or at least not one with an encoding.
UPDATE Differences in environment
You write: Data between environments is different
If your column stored the XML in the native type, you would not need a cast (which is very expensive!) at all. But in your case this cast depends on the actual XML. As this is stored in NTEXT, it is a 2-byte string. If your XML starts with a declaration stating a non-supported encoding (in most cases utf-8), this will fail.
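A minimal repro of that failure mode, as a sketch you can run in isolation:

```sql
DECLARE @x NVARCHAR(MAX) = N'<?xml version="1.0" encoding="utf-8"?><root/>';

-- Fails: a 2-byte NVARCHAR string carrying a utf-8 declaration
-- Msg 9402 ... XML parsing: ... unable to switch the encoding
-- SELECT CONVERT(XML, @x);

-- Works: route through VARCHAR(MAX) so the declared encoding matches the string type
SELECT CONVERT(XML, CONVERT(VARCHAR(MAX), @x));
```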
I'm trying to insert into XML column (SQL SERVER 2008 R2), but the server's complaining:
System.Data.SqlClient.SqlException (0x80131904):
XML parsing: line 1, character 39, unable to switch the encoding
I found out that the XML column has to be UTF-16 in order for the insert to succeed.
The code I'm using is:
How can I serialize object to be in UTF-8 string?
EDIT: Ok, sorry for the mixup - the string needs to be in UTF-8. You were right - it's UTF-16 by default, and if I try to insert in UTF-8 it passes. So the question is how to serialize into UTF-8.
Example
This causes errors while trying to insert into SQL Server:
Update
I figured out when SQL Server 2008 needs utf-8 and when it needs utf-16 in the encoding property of the XML declaration you're trying to insert into its Xml column type:
When you want to add utf-8, add parameters to the SQL command like this:
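A sketch of what such a parameter looks like (cmd and xmlValueToAdd are assumed names from this answer, not verified code):

```csharp
// utf-8 route: pass the XML as a 1-byte string type
cmd.Parameters.Add("@xml", SqlDbType.VarChar).Value = xmlValueToAdd;
```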
If you try to add xmlValueToAdd with encoding=utf-16 using the previous row, it produces errors on insert. Also, VarChar means that national characters aren't recognized (they turn out as question marks).
To add utf-16 to the db, either use SqlDbType.NVarChar or SqlDbType.Xml in the previous example, or just don't specify the type at all:
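Sketches of those three variants (again assuming the cmd and xmlValueToAdd names):

```csharp
// utf-16 route: a 2-byte string type...
cmd.Parameters.Add("@xml", SqlDbType.NVarChar).Value = xmlValueToAdd;
// ...or the native Xml type...
cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xmlValueToAdd;
// ...or let ADO.NET infer the type
cmd.Parameters.AddWithValue("@xml", xmlValueToAdd);
```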
Can you not keep everything as XML, rather than converting it into a string in your application, only to have SQL Server try to convert it back into XML?
This question is a near-duplicate of 2 others, and surprisingly - while this one is the most recent - I believe it is missing the best answer.
In the end, it doesn't matter what encoding is declared or used, as long as the XmlReader can parse it locally within the application server.
By using SqlXml , XML will be sent pre-parsed to the database, and then the DB doesn't need to know anything about character encodings - UTF-16 or otherwise. In particular, note that the XML declarations aren't even persisted with the data in the database, regardless of which method is used to insert it.
Please refer to the above-linked answers for methods that look very similar to this, but this example is mine:
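A hedged reconstruction of that kind of example (the connection, table, and column names here are invented stand-ins):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;

// Assumed: conn is an open SqlConnection, dbo.Docs has a column Body of type XML.
string xmlString = "<?xml version=\"1.0\" encoding=\"utf-8\"?><doc>value</doc>";

using (var cmd = new SqlCommand("INSERT INTO dbo.Docs (Body) VALUES (@xml)", conn))
{
    // Safer variant: wrap the StringReader and XmlReader in using statements.
    // using (var sr = new StringReader(xmlString))
    // using (var xr = XmlReader.Create(sr))
    //     cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = new SqlXml(xr);

    // Concise variant: SqlXml ships pre-parsed XML, so the declared encoding no longer matters.
    cmd.Parameters.Add("@xml", SqlDbType.Xml).Value =
        new SqlXml(XmlReader.Create(new StringReader(xmlString)));

    cmd.ExecuteNonQuery();
}
```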
Note that I would not consider the last (non-commented) example to be "production-ready", but left it as-is to be concise and readable. If done properly, both the StringReader and the created XmlReader should be initialized within using statements to ensure that their Close() methods are called when complete.
Please don't waste effort by running XML through extra conversions (deserializations and serializations - to DOM, strings, or otherwise), as shown in other answers here and elsewhere.
Fretice commented Sep 13, 2018
I found that when I use certbot-auto on a Windows remote it throws this error, but when I use it on a Linux remote it works great.
Here is the relevant nginx server block or Apache virtualhost for the domain I am configuring:
dsignr commented Apr 29, 2021
After reading a comment on another thread by user @egberts, I ran the following command:
grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx
That command found the offending character "´" in one .conf file in the comment. After removing it (you can edit comments as you wish) and reloading nginx, everything worked again.
Three years later and this is still a life saver. THANK YOU!
In SQL Server you should store XML in a column typed XML. This native type has got a lot of advantages. It is much faster and has implicit validity checks.
From your question I take it that you store your XML in NTEXT. This type has been deprecated for ages and will not be supported in future versions! You ought to change this soon!
SQL-Server knows two kinds of strings:
- 1-byte strings (CHAR or VARCHAR), which are extended ASCII. Important: this is not UTF-8! Native UTF-8 support will be part of a coming version.
- 2-byte strings (NCHAR or NVARCHAR), which are UTF-16 (UCS-2).
If the XML has a leading declaration with an encoding (in most cases utf-8 or utf-16), you can get into trouble.
If the XML is stored as 2-byte-string (at least the NTEXT tells me this), the declaration has to be utf-16 . With a 1-byte-string it should be utf-8 .
The best (and easiest) way is to omit the declaration completely. You do not need it. Storing the XML in the appropriate type will kill this declaration automatically.
What you should do: Create a new column of type XML and shuffle all your XMLs to this column. Get rid of any TEXT , NTEXT and IMAGE columns you might have!
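A migration sketch along those lines (the table name is invented; col1xml is taken from the question above):

```sql
-- Add a native XML column alongside the legacy NTEXT one
ALTER TABLE dbo.YourTable ADD col1xml_new XML NULL;

-- Shuffle the data across; going via VARCHAR(MAX) sidesteps utf-8 declarations
UPDATE dbo.YourTable
SET col1xml_new = CONVERT(XML, CONVERT(VARCHAR(MAX), CONVERT(NVARCHAR(MAX), col1xml)));

-- Once verified, retire the NTEXT column and rename the new one
-- ALTER TABLE dbo.YourTable DROP COLUMN col1xml;
-- EXEC sp_rename 'dbo.YourTable.col1xml_new', 'col1xml', 'COLUMN';
```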
The next step is: Be happy and enjoy the fast and easy going with the native XML type :-D
mauryakrishna commented Jan 4, 2021 •
Running the grep command as mentioned above did find an unwanted character in my nginx .conf file. Replacing that with a valid character resolved the issue.
Finesse commented Sep 15, 2018 •
I had the same problem. I've found the following workaround (on Ubuntu):
Run in a console:
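(The original command did not survive in this copy; a workaround commonly reported for this error is forcing a UTF-8 locale before running certbot - treat the exact values as an assumption:)

```shell
# Force a UTF-8 locale for the current shell session
export LC_ALL="C.UTF-8"
export LANG="C.UTF-8"
```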
After doing that, running certbot --nginx succeeds.
mardiros commented Jul 5, 2018
For instance, the mime type file contains these lines:
The g³ is what raises the UnicodeDecodeError.
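The failure is easy to reproduce in isolation: a byte like the "³" in g³, written in a legacy single-byte encoding, is not valid UTF-8, so a strict decode blows up. A minimal sketch (the exact codec certbot uses may differ):

```python
# "g³" encoded as Latin-1 gives b'g\xb3' - the kind of bytes found in an old mime.types file
data = "g³".encode("latin-1")

try:
    data.decode("utf-8")  # 0xb3 is a bare continuation byte: invalid UTF-8
except UnicodeDecodeError as e:
    print(e)
```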
Here is a Certbot log showing the issue (if available):
Certbot's behavior differed from what I expected because:
joohoi commented Jul 6, 2018
Thanks for the additional information!
Bill0412 commented Jan 22, 2020
Thanks, it worked for me.
Kassan424kh commented Aug 20, 2019
Hi ✌🏻,
you should just use a different terminal.
I had the same problem while using the Mac terminal; then I used Windows bash and got it to work.
I installed Certbot with (certbot-auto, OS package manager, pip, etc):
pacman -S cerbot-nginx
A34 commented Apr 25, 2019 •
I got a similar issue, it was in /etc/nginx/conf.d/default.conf line 13 to 17:
I removed those comments and it worked.
Thanks @egberts & @TommyZG for the grep tip.
TommyZG commented Oct 29, 2018
It was in one of my .conf files. You have others. It was in the comment line.
da3020 commented Sep 18, 2018
Hi! The reason is that if you have some non-ASCII letters in the nginx config (even in comments!) it will not work.
I ran this command and it produced this output:
fandigunawan commented Feb 3, 2022
ohemorange commented Sep 18, 2018
Fretice commented Sep 7, 2018
@TommyZG can you tell me which file to change?
TommyZG commented Sep 2, 2018
After reading a comment on another thread by user @egberts, I ran the following command:
grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx
That command found the offending character "´" in one .conf file in the comment. After removing it (you can edit comments as you wish) and reloading nginx, everything worked again.
shiroorg commented May 5, 2018