Q. What is Bstrlib?
A. Bstrlib is a string data type library for C and C++. It was written
primarily with safety in mind, but also has performance and functionality
advantages while remaining totally interoperable with ordinary char *
usage.
Q. What makes Bstrlib so much safer than using the standard C/C++ library?
A. Bstrlib contains these mechanisms as part of its API:
memory management
The memory required to hold a string as it is modified is
automatically allocated and managed in the Bstrlib functions.
buffer overflow detection
The Bstrlib API has well defined interpretations for all legal values
of its parameters including character indexes and lengths that fall
outside the boundaries of a given bstring.
error propagation
Well defined error conditions are detected rather than leading to
undefined action. So scenarios with multiple failure conditions don't
need to be littered with large amounts of error detection -- it is
typically sufficient to put a single check at the end of a long series
of Bstrlib operations. (The C++ API uses exception handling.)
write protection
bstrings declared statically or
constructed from unqualified sources are write protected. bstrings
can also be made write protected dynamically.
aliasing support
Unlike most C libraries, aliased parameters are detected and
supported and are given the most natural interpretation.
reduction of undefined scenarios
C's standard library (as well as many other libraries implemented in
C) suffers from being littered with a minefield of undefined
behaviors that result from a myriad of semantic conditions even when
passing parameters to functions which in and of themselves are legal.
Bstrlib, as a matter of policy, does not allow this to happen. If
your parameters are legal with respect to their type (independent of
their meaning as that parameter), then using the Bstrlib API function
will not lead to any undefined behavior. (What a concept!)
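The error propagation and aliasing points above can be illustrated with a small, self-contained sketch. This is not Bstrlib's source: the sk_str type and the sk_* functions are hypothetical names invented for this example. It only shows the general idea that every function accepts NULL and reports an error, so a chain of operations needs just one check at the end.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SK_OK   0
#define SK_ERR (-1)

/* Hypothetical miniature string type; NOT Bstrlib's actual definition. */
struct sk_str {
    int mlen;            /* bytes allocated for data */
    int slen;            /* current length, excluding the trailing '\0' */
    unsigned char *data; /* heap buffer holding slen bytes plus a '\0' */
};

/* Build a string from a C string. A NULL input or a failed allocation
   yields NULL, which every other function here accepts and reports. */
struct sk_str *sk_from_cstr(const char *s)
{
    struct sk_str *b;
    size_t n;

    if (s == NULL) return NULL;
    n = strlen(s);
    if (n >= (size_t)INT_MAX) return NULL;
    b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(n + 1);
    if (b->data == NULL) { free(b); return NULL; }
    memcpy(b->data, s, n + 1);
    b->slen = (int)n;
    b->mlen = (int)n + 1;
    return b;
}

/* Release a string; passing NULL is a harmless no-op. */
void sk_free(struct sk_str *b)
{
    if (b != NULL) { free(b->data); free(b); }
}

/* Append src to dst. NULL arguments and allocation failures return
   SK_ERR and leave dst unchanged. Passing the same string as both
   arguments (full aliasing) is supported. */
int sk_append(struct sk_str *dst, const struct sk_str *src)
{
    int add, need;
    unsigned char *p;

    if (dst == NULL || src == NULL) return SK_ERR;
    add = src->slen;
    if (add > INT_MAX - dst->slen - 1) return SK_ERR;
    need = dst->slen + add + 1;
    if (need > dst->mlen) {
        p = realloc(dst->data, (size_t)need);
        if (p == NULL) return SK_ERR;
        dst->data = p;
        dst->mlen = need;
    }
    /* If src == dst, src->data now refers to the reallocated buffer. */
    memmove(dst->data + dst->slen, src->data, (size_t)add);
    dst->slen += add;
    dst->data[dst->slen] = '\0';
    return SK_OK;
}

/* Builds "<greeting>, <name>!" with a single error check at the end.
   Returns NULL if any step failed. */
struct sk_str *sk_greet(const char *greeting, const char *name)
{
    struct sk_str *out = sk_from_cstr(greeting);
    struct sk_str *sep = sk_from_cstr(", ");
    struct sk_str *who = sk_from_cstr(name);
    struct sk_str *end = sk_from_cstr("!");
    int rc = SK_OK;

    rc |= sk_append(out, sep);
    rc |= sk_append(out, who);
    rc |= sk_append(out, end);

    sk_free(sep);
    sk_free(who);
    sk_free(end);
    if (rc != SK_OK) { sk_free(out); return NULL; }
    return out;
}
```

Note that sk_greet never tests an intermediate result. A failure anywhere, including a NULL name, surfaces in the one check on rc. This sketch only handles the case where src and dst are the same object; partial overlap is outside its scope.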
Q. Doesn't all the safety of Bstrlib lead to increased overhead versus
standard char buffer usage?
A. Bstrlib uses a (length, data) internal representation rather than '\0'
termination. Because of this, functions which require length
determinations have dramatically improved performance versus the
corresponding standard C library functions (see the benchmarks on the
feature comparisons page). Where performance is concerned, this means that
manipulations of strings that are not very small will favor Bstrlib.
In addition, the Bstrlib API is substantially more functional than the C
library. This means that function call overhead is better amortized (by
virtue of not needing to call as many such functions) than the C library.
A minimum usage of Bstrlib measured on a variety of compilers shows an
additional object code size of between 18K and 28K.
All this being said, manipulations that are extremely trivial, on very
small strings may execute marginally faster just using the straight C
library.
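To make the (length, data) point concrete, here is a minimal sketch. It assumes a hypothetical lv_str pair invented for this example, not Bstrlib's actual struct. It contrasts a length that must be found by scanning with one that is simply read back, and shows how a stored length lets a comparison bail out early.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (length, data) pair; not Bstrlib's actual struct. */
struct lv_str {
    size_t slen;       /* number of bytes in data */
    const char *data;  /* may contain embedded '\0' bytes */
};

/* O(n): a '\0' terminated string must be scanned to find its end. */
size_t cstr_length(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/* O(1): the stored length is simply read back. */
size_t lv_length(const struct lv_str *s)
{
    return s != NULL ? s->slen : 0;
}

/* Strings of different length are rejected without reading a single
   character; only equal-length strings are compared byte by byte. */
int lv_equal(const struct lv_str *a, const struct lv_str *b)
{
    if (a == NULL || b == NULL) return 0;
    if (a->slen != b->slen) return 0;
    if (a->slen == 0) return 1;
    return memcmp(a->data, b->data, a->slen) == 0;
}
```

A side effect of storing the length is that the data may contain '\0' bytes, which a terminated string cannot represent.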
Q. By always allocating memory for strings, does this open Bstrlib up to
denial of service attacks when receiving user input?
A. Bstrlib contains a function
bSecureInput () in the bstraux module which addresses this issue;
it takes an optional maximum length parameter for user input. So
malformed input cannot cause unusually large amounts of resources to be
wasted.
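The idea of capping input can be sketched as follows. This is not bSecureInput itself: read_bounded, next_char_fn, mem_src and mem_next are hypothetical names for this example. It assumes a caller-supplied character source, and reads until a terminator, end of input, or the length cap, whichever comes first.

```c
#include <stdint.h>
#include <stdlib.h>

/* Character source: returns the next byte (0..255), or -1 at end of input. */
typedef int (*next_char_fn)(void *ctx);

/* Hypothetical helper, NOT bSecureInput. Reads at most maxlen characters,
   stopping early at termchar or end of input. The buffer grows on demand
   but never beyond maxlen + 1 bytes. Returns a '\0' terminated heap
   string that the caller must free, or NULL on failure. */
char *read_bounded(size_t maxlen, int termchar, next_char_fn next,
                   void *ctx, size_t *out_len)
{
    size_t cap = 16, len = 0, ncap;
    char *buf, *p;
    int c;

    if (next == NULL) return NULL;
    if (maxlen < cap - 1) cap = maxlen + 1;
    buf = malloc(cap);
    if (buf == NULL) return NULL;

    while (len < maxlen && (c = next(ctx)) >= 0 && c != termchar) {
        if (len + 1 >= cap) {              /* need room for c and '\0' */
            if (cap > SIZE_MAX / 2) { free(buf); return NULL; }
            ncap = cap * 2;
            if (maxlen < ncap - 1) ncap = maxlen + 1;
            p = realloc(buf, ncap);
            if (p == NULL) { free(buf); return NULL; }
            buf = p;
            cap = ncap;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}

/* Example character source backed by a block of memory. */
struct mem_src { const char *p; size_t left; };

int mem_next(void *ctx)
{
    struct mem_src *m = ctx;
    if (m->left == 0) return -1;
    m->left--;
    return (unsigned char)*m->p++;
}
```

With a cap of 8, a sender that floods the source with thousands of characters and never sends a terminator still costs at most 9 bytes of buffer.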
Q. What is the relationship between bstrings and CBString?
A. CBString
is a C++ class that substantially uses bstrings to implement its
functionality. Both provide nearly the same functionality. Like STL's
std::string or MFC's CString, CBString
uses operator overloading, exception handling, and STL to maximally
leverage C++ functionality. The [] operator is additionally safe via
bounds checking. CBString throws
exceptions as a result of any error encountered, while Bstrlib propagates
such errors.
Q. Doesn't the bounds protection decrease the performance of Bstrlib?
A. For most operations, no. This only
affects per-character operations, where by default Bstrlib favors safety
over performance. However, one can always achieve higher performance by
gaining direct access to the buffer as necessary. I.e., it's easy to be
safe and fast most of the time, and it requires just a little effort (it's
just a little more same-line typing) to be less safe but always fast.
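The trade-off between checked per-character access and direct buffer access might look like the sketch below. The ck_str type and ck_* functions are hypothetical names for this example and do not reproduce Bstrlib's own accessors.

```c
#include <stddef.h>

/* Hypothetical read-only string view; not Bstrlib's actual type. */
struct ck_str {
    int slen;                   /* number of valid bytes in data */
    const unsigned char *data;
};

/* Safe per-character read: every index is legal. An out-of-range index
   (or a NULL string) yields the caller's fallback value. */
int ck_char_at(const struct ck_str *s, int pos, int fallback)
{
    if (s == NULL || s->data == NULL || pos < 0 || pos >= s->slen)
        return fallback;
    return s->data[pos];
}

/* Safe by default: each access is bounds checked. */
int ck_count_checked(const struct ck_str *s, int ch)
{
    int i, n = 0;
    int len = (s != NULL) ? s->slen : 0;

    for (i = 0; i < len; i++)
        if (ck_char_at(s, i, -1) == ch) n++;
    return n;
}

/* Opt-in fast path: validate once, then walk the raw buffer directly. */
int ck_count_direct(const struct ck_str *s, int ch)
{
    const unsigned char *p, *end;
    int n = 0;

    if (s == NULL || s->data == NULL || s->slen <= 0) return 0;
    for (p = s->data, end = p + s->slen; p < end; p++)
        if (*p == ch) n++;
    return n;
}
```

Both counting functions return the same result. The direct version simply moves the range check out of the loop, which is the "little effort" needed to trade safety for speed.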
Q. What advantage does Bstrlib::CBString have over std::string?
A. std::string is an STL generic which will
be somewhat slower than CBString because
of it. std::string also does not contain a lot of standard character
string manipulations like format, findreplace, split and join. CBString
implements useful write protection (that extends into per character
protection.) With std::string it is cumbersome to be bounds protected,
and easy to be unsafe (this is the opposite of Bstrlib::CBString, where
a cast and a dereference are required to drop safety.)
Q. Isn't using some of the string library extensions such as strlcat and
snprintf sufficient to overcome the main safety problems of char * buffer
usage so as to make Bstrlib unnecessary?
A. No. String manipulation, by its very
nature requires memory management to go hand in hand with each operation.
These solutions do not provide any such thing and continue to defer the
problem to the programmer. Arbitrary preconceived length cut-offs are
just not adequate. They lead to the typical double problem of "make the
buffer large enough for all reasonable cases", which wastes memory in
the most typical cases while still not functioning correctly in uncommon
cases.
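The difference between a fixed cut-off and memory management that travels with the operation can be sketched in plain C99. The format_alloc function below is a hypothetical helper written for this example, not part of Bstrlib. It relies on the C99 rule that vsnprintf returns the length the full output would have needed.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a buffer sized to fit the result.
   Returns a heap string the caller must free, or NULL on failure. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int n;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    n = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (n < 0) { va_end(ap2); return NULL; }

    buf = malloc((size_t)n + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)n + 1, fmt, ap2);   /* second pass: write */
    va_end(ap2);
    return buf;
}
```

By contrast, calling snprintf with an 8 byte buffer and the arguments "%s-%d", "overflowing", 42 avoids the overflow but silently keeps only "overflo". The caller still has to notice the truncation and decide what to do about it.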
Buffer overflows only represent the very worst problem with string support
in the C/C++ languages. The other safety features in Bstrlib still give
it a notable advantage versus any typical function augmentation based on
the C library as a foundation.
A survey of other modern programming languages also shows that a larger
complement of string manipulation functions is required, something that
neither the C library nor the STL nor their extensions provide.
Q. Aren't all these safety mechanisms really only an issue with beginner
programmers? Is there any advantage for experienced programmers?
A. The safety features are not meant
merely to coddle the beginner programmer. Bstrlib uses
policies that minimize the number of required reallocs,
implements correct alias detection, performs no action when trying to
destroy a write protected bstring, accepts NULL parameters, deals with
out-of-range indexes, does not inhibit thread safety and adds
functionality found in other more modern programming languages.
While any sufficiently skilled programmer can duplicate (and possibly
improve on) any of this functionality, Bstrlib offers all these
features in one well tested package without the need for "reinventing the
wheel". The fact that other string libraries exist but are unable to match
Bstrlib feature for feature is evidence that doing so is, perhaps,
easier said than done.
Q. Isn't Bstrlib really appropriate just for applications that are being
written from scratch?
A. Not at all! Bstrlib is highly
interoperable with standard char * strings. Complete conversion from
char *'s to bstrings is usually not necessary, since the library contains
key functions for mixing the two, and extracting a NUL ('\0') terminated
char * from a bstring is a trivial operation. Bstrlib also contains
macros which replace the semantics of "pointer arithmetic" with a safe
superset (string segment arithmetic) of such functionality. So migrating
from char * usage to Bstrlib can be done incrementally without semantic
impediments.
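As a rough illustration of what "string segment arithmetic" can mean, the sketch below wraps an existing char * in a read-only (data, length) segment and takes sub-segments with clamping instead of raw pointer arithmetic. The seg type and the seg_* functions are hypothetical names for this example. They are not Bstrlib's macros, and the clamping rules shown are an assumption of this sketch.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical read-only segment referring to memory owned elsewhere. */
struct seg {
    const unsigned char *data;  /* NULL for the empty segment */
    int slen;                   /* number of bytes in the segment */
};

/* Wrap an existing C string without copying it. */
struct seg seg_from_cstr(const char *s)
{
    struct seg r = { NULL, 0 };
    size_t n;

    if (s == NULL) return r;
    n = strlen(s);
    if (n > (size_t)INT_MAX) return r;
    r.data = (const unsigned char *)s;
    r.slen = (int)n;
    return r;
}

/* Sub-segment starting at pos with up to len bytes. Unlike "s + pos",
   any pos and len are legal: the request is clamped to the source
   segment, and whatever falls outside it is dropped. */
struct seg seg_mid(struct seg s, int pos, int len)
{
    struct seg r = { NULL, 0 };

    if (s.data == NULL || s.slen < 0 || len < 0) return r;
    if (pos < 0) { len += pos; pos = 0; }
    if (pos > s.slen) return r;
    if (len > s.slen - pos) len = s.slen - pos;
    if (len <= 0) return r;
    r.data = s.data + pos;
    r.slen = len;
    return r;
}
```

A segment produced this way is not '\0' terminated, so it has to be handled through its length rather than passed to functions such as strlen.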
Q. Isn't using a different programming language a better answer than using Bstrlib?
A. The C (and C++) language has been much
maligned for its severe lack of safety. And when one takes into account
the C standard library as well, one is inevitably led to the conclusion
that this criticism is justified. While basically every other modern high
level language in existence is far safer than C in all modes of usage,
they also all pretty universally find themselves compromising in one way
or another:
Performance
Even if some alternative language can theoretically perform comparably
with C (or if someone rigs a benchmark to make it suggest that the
performance of C can be equalled or beaten), there simply is no
denying the centuries (if not millennia) of man-years of research and
effort put into C compiler optimizers coupled with the more natural
mapping of the C language to native machine language versus other
languages. This explains the rise of C++ over other languages like
Eiffel or Ada -- C++ is able to leverage all of the results from the
C world, while those languages are not.
Portability and Availability
Actually, from a semantics point of view, C/C++ are horribly
unportable. But this does not detract from the fact that just about
every serious computing platform today has a C or C++ compiler
available for it. C/C++ at best can be described as *syntactically*
portable, but for most development projects (which will typically be
limited to one or at most a few target platforms) this is definitely
good enough.
While Java is semantically portable (as part of the standard, which in
and of itself is far more rigorously and usefully enforced than the
ANSI C standard) its availability does not include legacy platforms
and has been maligned and actively attacked by certain aggressive
proprietary software development houses. Regardless of the
righteousness of the cause, one must make practical decisions which
may require taking a pass on using Java.
Developer familiarity
There is simply a larger and better skill base amongst C/C++
programmers than others. While Java's programmer base is certainly
growing to challenge this, it has not yet achieved a situation where
the available pool of programmers of comparable skill level favors
it.
Libraries and tools
Given its 20+ year head start over languages like Perl, Python and
Java, there is clearly much more available in terms of libraries and
tools.
None of these points should be taken as the final word on other programming
languages. I make them only to demonstrate that there certainly exist
compelling reasons for some to not switch to another programming language
even for modern software development.
All that being said, Bstrlib helps solve a problem in what is one of the
weakest areas for the C language -- the C standard library's pathetic
string support. Other languages like Perl and Python have made a point of
having really good and highly functional string support without suffering
from anything akin to a "buffer overflow" that really puts the C standard
library to shame. Use of Bstrlib substantially reduces this advantage.
So if one is compelled to switch languages because they believe that C is
just an unsafe language, the existence of Bstrlib is in effect saying
"maybe not".
Q. Does Bstrlib include regular expression searching?
A. No. There is no single de facto regular
expression standard, regular expressions are weaker than other parsing
mechanisms (such as context-free or LALR grammars), and each variant
requires a fairly non-trivial implementation in and of itself.
That said, Bstrlib is totally interoperable with char * usage. Any other
char * compatible library can be used in conjunction with Bstrlib. Thus,
most available regular expression libraries (in particular including
PCRE) can be used with Bstrlib.
Q. Does Bstrlib support Unicode?
A.
The library includes modules for rudimentary UTF8 support. Conversion to
and from UTF16 blocks is also supported. Normalization and collation are
not currently supported. Conversions of other obsolete international
encodings and GB18030 are planned for a future release.
Q. Does Bstrlib support garbage collection?
A. No. Certainly not directly. bstrings and CBStrings need to be correctly destroyed much
like any allocated memory or object in C/C++ to avoid memory leaks. That
said, both implementations allocate memory via malloc, realloc or new, so
the garbage collection mechanisms that do exist for C/C++ (such as the
Boehm garbage collector) should integrate with Bstrlib without issue.
Schemes such as "reference counting" do not work in a language like C
without a lot of hand holding (ADT construction, copying and destruction
would have to precisely track references to any contained strings) and can
inhibit the creation of thread safe solutions.
Q. Is Bstrlib thread safe?
A. The thread safety in Bstrlib is
comparable to that of a linked list or any other self-contained ADT
(abstract data type) rather than that of a system-defined atomic data-type
such as sig_atomic_t. I.e., reading and writing to the same bstring at
the same time leads to undefined results. However, there is no
restriction whatsoever on manipulating or reading two different
bstrings. Also, there can clearly be any number of readers for a single
bstring. In particular, Bstrlib has no static/globally written state, so
using it won't lead to hidden or unavoidable race conditions. Every
function in Bstrlib is also re-entrant, and can be called even by
dynamically linked code provided it has a proper context for calling
malloc/realloc/free.
To support shared read/write semantics, it is sufficient and recommended
that exclusion/critical-sections be handled at higher layers in your code
by just making sure that the same bstring is not being modified by more
than one thread at once (this will bring the thread safety characteristics
to the level of, say, malloc/realloc/free with respect to the shared heap.)
This consideration is typical for ADT manipulation in multithreaded
programming, and therefore should not lead to any burden that is not
expected or already present.
Q. Why does Bstrlib implement abstracted stream reading functionality, but not
abstracted stream writing functionality?
A. Stream based writing was added to the
bstraux module of Bstrlib.
Q. How do I convince my programming organization to use Bstrlib?
A. Bstrlib is easy to migrate to and has numerous good properties:
- Bstrlib/CBString has a short learning curve (there are not that many
functions/methods)
- Bstrlib allows for very concise code. This will inevitably lead to much
more maintainable code.
- Bstrlib's safety features will lead to improved reliability. It will
raise the level of less experienced programmers, while complementing the
capabilities of more experienced programmers.
- Bstrlib uses a safety model that can be educational to those that use
it. Some of the ideas used in Bstrlib (read-only strings, complete
alias support, absolutely minimized undefined behaviors) are rare or
just completely unseen in other existing libraries. Programmers who use
Bstrlib can be motivated to use its techniques in other code they write.
- Bstrlib has complete interoperability with ordinary '\0' terminated
char * buffers. This means that using Bstrlib does not burn any bridges
or in any way compromise the ability to use other libraries which rely on
char * buffers. This also provides a way to migrate to use of Bstrlib in
an incremental fashion.
- Bstrlib/CBString is totally portable from compilation to run time
behavior. It does not require UNIX tools or MFC or any other common but
non-standard mechanisms. (Use of STL and exception handling can be turned
off.)
- Bstrlib is well tested and in fact comes with an extensive unit test.
- Bstrlib comes with various utility functions which support "net
strings", CSV, base 64, UU and Y codecs which make it ideal for dealing
with MIME.
- It is dual licensed under both the BSD license and the GNU General
Public License. This means it can be used on any project and with any vendor
without serious issue.
Q. How does Bstrlib compare to other solutions?
A.
See the Bstrlib feature comparison table or
refer to the Bstrlib documentation which gives more detailed comparisons
between existing string library alternatives as well as with the standard C
library.
Q. Have you thought of proposing Bstrlib to the ANSI C committee?
A.
I have, and at one point I was tempted. Then I realized that the fact that
Bstrlib is open source and is the same everywhere actually makes it much
more portable and reliable. Some closed source compiler vendors offer the
source code of their libraries as a "value add", and as such I would just be
playing into their hands. Very few real world developers follow the
developments of the ANSI C committee, and not all compiler vendors even
bother to implement the standard as it comes out. So it is not obvious that
developers would really use Bstrlib just because it was in the standard.