Wednesday, December 11, 2013

Writing a STL-style UTF-8 String Class Part 5

Update: The latest code can now be found on GitHub. Check the post "STL-Style UTF-8 String Class now on GitHub" for more information.
____________________________________________________________________________

In this final part of my blog series "Writing a STL-style UTF-8 String Class", I will continue to make this string class behave more like std::string. First, I added support for different allocators. How did I go about doing this? I took a lesson from std::basic_string and did it using templates, which means the entire utf8string class is now a class template. This is the basic setup now:
namespace sd_utf8
{

template <class Alloc = std::allocator<_uchar8bit>>
class _utf8string
{
 public:
  // some types that we need to define to make this work like an stl object
  // internally this is a std::basic_string, but outwardly, it returns _char32bit
  typedef _char32bit   value_type;
  typedef _char32bit   *pointer;
  typedef const _char32bit *const_pointer;
  typedef _char32bit   &reference;
  typedef const _char32bit &const_reference;
  typedef size_t    size_type;
  typedef ptrdiff_t   difference_type;

  ...

  // make our iterator types here
  typedef utf8string_iterator<value_type>   iterator;
  typedef utf8string_iterator<const value_type> const_iterator;
  typedef value_reverse_iterator<iterator>  reverse_iterator;
  typedef value_reverse_iterator<const_iterator> const_reverse_iterator;

 private:
  std::basic_string<_uchar8bit, std::char_traits<unsigned char>, Alloc> utfstring_data;

 public:
  ...


};

typedef _utf8string<> utf8string;

}
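
To see the pattern in isolation, here is a minimal sketch of it. This is not the actual utf8string code: byte_string, counting_allocator and default_byte_string are names made up for illustration, and I store plain char instead of _uchar8bit because the standard only guarantees std::char_traits for the built-in character types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

namespace sketch
{

// Hypothetical allocator that counts how many times allocate() is called.
template <class T>
struct counting_allocator
{
  typedef T value_type;

  counting_allocator() {}
  template <class U> counting_allocator(const counting_allocator<U>&) {}

  T* allocate(std::size_t n)
  {
    ++allocations();
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }

  static int& allocations() { static int count = 0; return count; }
};

template <class T, class U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

// Hypothetical miniature of the pattern: the allocator is a template
// parameter that is forwarded to the internal std::basic_string.
template <class Alloc = std::allocator<char>>
class byte_string
{
 public:
  typedef Alloc       allocator_type;
  typedef std::size_t size_type;

  explicit byte_string(const Alloc& alloc = Alloc()) : data_(alloc) {}

  void push_back(char b) { data_.push_back(b); }
  size_type size() const { return data_.size(); }

  // Unchecked in release builds, asserted in debug builds.
  char at(size_type i) const { assert(i < data_.size()); return data_[i]; }

 private:
  std::basic_string<char, std::char_traits<char>, Alloc> data_;
};

// Same trick as in the post: a typedef for the default parameters.
typedef byte_string<> default_byte_string;

}
```

With this, sketch::default_byte_string behaves like the non-template class did before, while sketch::byte_string<sketch::counting_allocator<char>> routes every heap allocation of the internal string through the custom allocator.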


I've added the code "typedef _utf8string<> utf8string;" so I'll still be able to use utf8string with the default template parameters. Using templates in this way makes adding an allocator very simple. Of course, this also meant some changes were necessary in utf8utils.h:

// use a template method because the 16bit and 32 bit implementations are identical
// except for the type
template <typename char_type, typename Alloc>
inline void MakeUTF8StringImpl(const char_type* instring, std::basic_string<_uchar8bit, std::char_traits<unsigned char>, Alloc> &out, bool appendToOut)
{
...
}
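
Since the body is elided above, here is a rough sketch of what a function with this shape could look like. It is not the real MakeUTF8StringImpl: make_utf8_sketch and append_code_point are hypothetical names, the output is a char-based string rather than _uchar8bit, and every input unit is treated as a complete code point, so UTF-16 surrogate pairs are not combined.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append one code point to `out` as UTF-8 bytes.
template <typename Alloc>
inline void append_code_point(unsigned long cp,
    std::basic_string<char, std::char_traits<char>, Alloc>& out)
{
  assert(cp <= 0x10FFFFul);  // largest valid Unicode code point
  if (cp < 0x80ul) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800ul) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000ul) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// One template covers 16-bit and 32-bit input; the allocator is deduced
// from the output string, so callers never have to spell it out.
// Simplification: each input unit is encoded as its own code point.
template <typename CharT, typename Alloc>
inline void make_utf8_sketch(const CharT* in,
    std::basic_string<char, std::char_traits<char>, Alloc>& out,
    bool appendToOut)
{
  if (!appendToOut) out.clear();
  for (; *in != 0; ++in) {
    append_code_point(static_cast<unsigned long>(*in), out);
  }
}
```

The point of the extra Alloc parameter is that the compiler deduces it from whatever string is passed as out, so the same function works for the default allocator and for a custom one.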

Next I added all of the other std::string member functions, but not all of the overloads. I learned something working on this project: std::string has a ton of functions. Since the class uses a std::basic_string internally, I was able to use many of its functions to do the work. I just had to make sure the parameters were correct. replace() was a little trickier, but then I realized that a replace is just an erase followed by an insert.

_utf8string<Alloc>& replace (size_type pos, size_type len, const _utf8string<Alloc>& str)
{
 // make copy so exceptions won't change string
 _utf8string<Alloc> temp_copy(*this);

 // erase
 temp_copy.erase(pos, len);

 // insert
 temp_copy.insert(pos, str);

 assign(temp_copy);

 return *this;
}
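
The same copy, erase, insert idea can be tried out on a plain std::string. This is only a sketch under a few assumptions: replace_via_erase_insert and replace_demo are hypothetical names, positions here are byte positions rather than utf8string positions, and the result is committed with swap() instead of assign().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical free function: replace implemented as erase + insert on a copy.
inline std::string& replace_via_erase_insert(std::string& target,
    std::string::size_type pos, std::string::size_type len,
    const std::string& str)
{
  // Work on a copy so an exception leaves `target` untouched.
  std::string temp_copy(target);

  temp_copy.erase(pos, len);   // throws std::out_of_range if pos > size()
  temp_copy.insert(pos, str);  // may throw std::length_error or std::bad_alloc

  // Commit the result by swapping the finished copy into place.
  target.swap(temp_copy);
  return target;
}

// Usage (hypothetical): a successful replace, then a failing one.
inline void replace_demo()
{
  std::string s = "hello world";
  replace_via_erase_insert(s, 6, 5, "there");
  assert(s == "hello there");

  try {
    replace_via_erase_insert(s, 100, 1, "x");  // pos is out of range
  } catch (const std::out_of_range&) {
    assert(s == "hello there");                // original is unchanged
  }
}
```

The cost of this approach is one extra copy of the string per call, which is the price paid for not leaving the string half-modified when something throws.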

Most of the function overloads that I didn't implement were ones that took iterators as parameters. Since my iterators are always constant and don't have a reference to the actual utf8string object, adding those functions would have been difficult, but not impossible. I could have implemented them, but I'd have had to re-scan the string each time, and I thought that would be a bit wasteful.
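
To give an idea of what such a re-scan involves, here is a sketch of how a character position could be mapped back to a byte offset. It is not code from utf8string: byte_offset_of and offset_demo are hypothetical names, and the input is assumed to be valid UTF-8 held in a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the code point at index cp_index.
// It walks the bytes from the start and counts every byte that is not a
// continuation byte (10xxxxxx), so the cost grows with the string length.
inline std::size_t byte_offset_of(const std::string& utf8, std::size_t cp_index)
{
  std::size_t seen = 0;
  for (std::size_t offset = 0; offset < utf8.size(); ++offset) {
    const unsigned char b = static_cast<unsigned char>(utf8[offset]);
    const bool is_lead = (b & 0xC0) != 0x80;
    if (is_lead) {
      if (seen == cp_index) return offset;
      ++seen;
    }
  }
  return utf8.size();  // cp_index is at or past the end
}

// Usage (hypothetical): "A", U+00E9 (2 bytes), U+20AC (3 bytes).
inline void offset_demo()
{
  const std::string s = "A\xC3\xA9\xE2\x82\xAC";
  assert(byte_offset_of(s, 0) == 0);
  assert(byte_offset_of(s, 1) == 1);
  assert(byte_offset_of(s, 2) == 3);
  assert(byte_offset_of(s, 3) == s.size());
}
```

Every iterator-based overload would need a walk like this for each iterator it receives, which is the waste I wanted to avoid.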

This was a fun project for me. Here's a tip if you ever want to build something like this: there are so many functions that you need to reuse code as much as you can and find easy ways to do things.

What's next? Over the next few weeks, I'll continue to tinker with the code. My plan is to put this up on SourceForge and make it open source. I'll try to get that set up within the next two weeks, but in the meantime, I need to work on my main project, Auxnet: Battlegrounds. Thanks for reading this long series. I hope you were able to benefit from it.

For the complete source code:

utf8string.h
utf8utils.h

-------------
For the other posts in this series:

Writing a STL-Style UTF-8 String Class Part 1
Writing a STL-Style UTF-8 String Class Part 2
Writing a STL-Style UTF-8 String Class Part 3
Writing a STL-style UTF-8 String Class Part 4
Writing a STL-style UTF-8 String Class Part 5