parsing unicode characters #1

Open
szmcdull opened this issue Aug 13, 2019 · 2 comments

Comments

@szmcdull

szmcdull commented Aug 13, 2019

Here is a patch that:
1) parses Unicode (multi-byte) characters;
2) saves/restores the offset before/after ToString().
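For context on the Unicode change: PHP's serialize() writes the length in `s:N:"...";` as a byte count, not a character count, so slicing a .NET string by that length overruns as soon as the payload contains multi-byte characters. A minimal sketch of the difference follows; `ByteLengthDemo` and its method names are made up for illustration and are not part of PHPDeserializer.

```csharp
using System;
using System.Text;

// Illustrative only: ByteLengthDemo is a hypothetical helper,
// not a class from the PHPDeserializer project.
public static class ByteLengthDemo
{
    // Treats `length` as a count of UTF-16 code units,
    // which is what a Substring-based read does.
    public static string ReadByChars(string data, int offset, int length)
    {
        return data.Substring(offset, length);
    }

    // Treats `length` as a count of bytes and decodes them as UTF-8,
    // which matches how PHP counts the string length.
    public static string ReadByBytes(byte[] data, int offset, int length)
    {
        return Encoding.UTF8.GetString(data, offset, length);
    }
}
```

For the serialized value `s:6:"中文";` (two characters, three UTF-8 bytes each), `ReadByBytes(bytes, 5, 6)` returns both characters, while `ReadByChars(text, 5, 6)` throws ArgumentOutOfRangeException because only four characters remain after index 5.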

From: 
Date: Tue, 13 Aug 2019 12:42:08 +0800
Subject: [PATCH] parse MBCS characters; save/restore offset before/after
 ToString();

---
 PHPDeserializer/PHPDeserializer.cs | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/PHPDeserializer/PHPDeserializer.cs b/PHPDeserializer/PHPDeserializer.cs
index 377d675..ca7d3ba 100644
--- a/PHPDeserializer/PHPDeserializer.cs
+++ b/PHPDeserializer/PHPDeserializer.cs
@@ -232,7 +232,7 @@ namespace SickSixtySix.PHPDeserializer
             string str = null;
             try
             {
-                str = m_string.Substring(m_offset, length);
+                str = Encoding.UTF8.GetString(m_string, m_offset, length);
                 m_offset += length;
             }
             catch (ArgumentOutOfRangeException)
@@ -308,12 +308,12 @@ namespace SickSixtySix.PHPDeserializer
         /// <summary>
         /// String containing serialized data
         /// </summary>
-        private string m_string;
+        private byte[] m_string;
 
         /// <summary>
         /// Current parsing offset
         /// </summary>
-        private int m_offset;
+        private int m_offset = 0;
 
         /// <summary>
         /// Character at current offset
@@ -324,7 +324,7 @@ namespace SickSixtySix.PHPDeserializer
             {
                 try
                 {
-                    return m_string[m_offset];
+                    return (char) m_string[m_offset];
                 }
                 catch (IndexOutOfRangeException)
                 {
@@ -340,6 +340,11 @@ namespace SickSixtySix.PHPDeserializer
         /// </summary>
         /// <param name="str"></param>
         public PHPDeserializer(string str)
+        {
+            m_string = Encoding.UTF8.GetBytes(str);
+        }
+
+        public PHPDeserializer(byte[] str)
         {
             m_string = str;
         }
@@ -400,8 +405,11 @@ namespace SickSixtySix.PHPDeserializer
         /// <returns>PHP-like associative array representation</returns>
         public override string ToString()
         {
+            var offset = m_offset;
+            m_offset = 0;
             var stringBuilder = new StringBuilder();
             traverse(Deserialize(), stringBuilder);
+            m_offset = offset;
             return stringBuilder.ToString();
         }
 
-- 
2.20.1.windows.1
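One possible refinement of the ToString() hunk, not included in the patch: if Deserialize() throws, the saved offset is never written back. Wrapping the restore in try/finally would cover that case. The sketch below uses a hypothetical stand-in class (`OffsetRestoreDemo`, with made-up `Advance` and `Parse` members) that models only the offset handling, not the real parser.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the parser; only the offset handling is modeled.
public class OffsetRestoreDemo
{
    private readonly byte[] m_data;

    // Current parsing offset, exposed so the effect can be observed.
    public int Offset { get; private set; }

    public OffsetRestoreDemo(byte[] data)
    {
        m_data = data;
    }

    // Moves the offset forward, as a partial parse would.
    public void Advance(int count)
    {
        Offset += count;
    }

    // Simulated parse: fails unless the data starts with 's',
    // otherwise consumes everything and returns it as text.
    private string Parse()
    {
        if (m_data.Length == 0 || m_data[0] != (byte)'s')
            throw new FormatException("unexpected type tag");
        Offset = m_data.Length;
        return Encoding.UTF8.GetString(m_data, 0, m_data.Length);
    }

    public override string ToString()
    {
        int saved = Offset;
        Offset = 0;
        try
        {
            return Parse();
        }
        finally
        {
            // Runs on both success and failure, so the caller's
            // position survives an exception inside Parse().
            Offset = saved;
        }
    }
}
```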

@SickSixtySix
Owner

Hi! Thank you for improving the project! I see you have forked it. Can you create a pull request so I can apply your changes? If not, I will add these changes myself and credit you in the commit description. My apologies for the late reply on this issue; I was very busy.

@szmcdull
Author

Sure. I also added handling for the 'd' type. There is a parseDouble(), but I did not use it because my projects use decimal most of the time, so I implemented a new parseDecimal(). I'm not sure whether you can accept it.
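The parseDecimal() mentioned above is not shown in this thread. As a rough idea of what reading a `d:<number>;` token into a decimal could look like, here is a sketch under stated assumptions: `DecimalTokenDemo.ParseDecimal` is a hypothetical helper, it works on a string rather than the patched byte[] buffer for brevity, and it does not handle PHP's `INF` or `NAN` values, which decimal cannot represent.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; not the parseDecimal() from the fork.
public static class DecimalTokenDemo
{
    // Parses a token such as d:12.50; starting at `offset`
    // and advances `offset` past the terminating ';'.
    public static decimal ParseDecimal(string data, ref int offset)
    {
        if (offset + 2 > data.Length || data[offset] != 'd' || data[offset + 1] != ':')
            throw new FormatException("expected 'd:' at offset " + offset);

        int start = offset + 2;
        int end = data.IndexOf(';', start);
        if (end < 0)
            throw new FormatException("unterminated 'd' value");

        string number = data.Substring(start, end - start);

        // NumberStyles.Float allows a sign, a decimal point and an exponent;
        // the invariant culture keeps '.' as the decimal separator.
        decimal value = decimal.Parse(number, NumberStyles.Float, CultureInfo.InvariantCulture);

        offset = end + 1;
        return value;
    }
}
```

Parsing into decimal avoids a binary floating-point round trip, at the cost of rejecting values outside decimal's range.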
