Welcome to the first post of the C# In Simple Terms Addendum! I've added a few new topics to this mega series to better cover all the features C# has to offer.

In this post, let's take a look at the string type and how we can manipulate it. We'll also have a brief overview of C# cultures and their purpose, to prepare us for a later article.

A closeup of two people playing violas in concert.
Our string manipulation will be without a bow. Photo by Manuel Nägeli / Unsplash

The Sample Project

exceptionnotfound/CSharpInSimpleTerms
Project for this post: 19StringManipulation

Cultures

C# and the .NET Framework have many different "cultures", which are collections of rules about how text, dates, and times are meant to be written and can be compared. These rules include:

  • The letters or symbols used in text.
  • The format of dates and times (e.g. month/day/year vs day/month/year).
  • The "word breakers" that determine what a word is (you can imagine that English and German word breakers are very different).
  • Plus many other things.

Cultures are often identified by a code such as "en-US" (American English), "en-GB" (British English), "zu" (isiZulu), "zh-Hant" (Traditional Chinese), or "ar-SA" (Saudi Arabian Arabic).

By default, a C# program will use the culture that is set in the operating system of the machine it is running on.
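As a quick sketch of this, we can read the culture our program picked up through the CultureInfo.CurrentCulture property, and we can also override it for the current thread. The first line of output depends on the machine's settings, so no expected value is shown for it, and "en-GB" is just an arbitrary choice for illustration.

```csharp
using System;
using System.Globalization;

// The culture the program inherited from the operating system.
// The value printed here varies from machine to machine.
Console.WriteLine($"Current culture: {CultureInfo.CurrentCulture.Name}");

// Override the culture for the current thread ("en-GB" is an arbitrary example).
CultureInfo.CurrentCulture = new CultureInfo("en-GB");

string overriddenName = CultureInfo.CurrentCulture.Name;
Console.WriteLine($"Overridden culture: {overriddenName}"); // Overridden culture: en-GB
```

Overriding the culture like this only affects our own program, not the operating system's settings.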

We should be aware that it is possible, though rare, for cultures to change their desired formats and standards. If the United States were to decide that their standard date format is now yyyy/dd/MM, for some reason, the culture for "en-US" would change. Also, users of a system can make changes to the culture that system runs on (in the operating system), and C#/.NET will respect those changes, so it is theoretically possible for any given culture to have different rules on different machines.

Namespace

Culture information is in the System.Globalization namespace.

Creating and Using a Culture Object

We can create a culture object using the culture's code and the CultureInfo class.

CultureInfo myCulture = new CultureInfo("es-ES"); //Spain - Spanish

When converting objects to strings, we can pass in the format object of the culture to an overload of ToString().

CultureInfo myCulture = new CultureInfo("es-ES"); //Spain - Spanish

DateTime now = DateTime.Now;

//Will output in dd/MM/yyyy format with a 24-hour clock
Console.WriteLine(now.ToString(myCulture.DateTimeFormat));

myCulture = new CultureInfo("en-US"); //English - United States

//Will output in MM/dd/yyyy format with a 12-hour clock
//(we reuse the "now" variable declared above)
Console.WriteLine(now.ToString(myCulture.DateTimeFormat));
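Cultures affect more than dates. Here is a small sketch showing the same number formatted with two different cultures. The amount and the choice of "de-DE" and "en-US" are assumptions made purely for illustration.

```csharp
using System;
using System.Globalization;

decimal amount = 1234567.89M;

var german = new CultureInfo("de-DE");   // German - Germany
var american = new CultureInfo("en-US"); // English - United States

// "N2" is a standard numeric format: group separators plus two decimal places.
string germanText = amount.ToString("N2", german);
string americanText = amount.ToString("N2", american);

Console.WriteLine(germanText);   // 1.234.567,89
Console.WriteLine(americanText); // 1,234,567.89
```

Notice that the two cultures swap the roles of the period and the comma, which is exactly the kind of rule a culture exists to capture.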

The Invariant Culture

There is a special culture we can use called the invariant culture. This culture is not associated with any particular region of the world, but is based on English. Further, this culture does not change.

We access this culture using the CultureInfo.InvariantCulture property.

var invariantCulture = CultureInfo.InvariantCulture;

//Will output in the MM/dd/yyyy format
Console.WriteLine(now.ToString(invariantCulture.DateTimeFormat));
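Because the invariant culture does not change, one place it can be handy is when we write a value to text and need to read it back later, possibly on a different machine. The following is a minimal sketch of that idea; the date is made up for the example.

```csharp
using System;
using System.Globalization;

var appointment = new DateTime(2024, 3, 15, 14, 30, 0);

// Write the value using the invariant culture so the text looks the same on every machine.
string stored = appointment.ToString(CultureInfo.InvariantCulture);
Console.WriteLine(stored); // 03/15/2024 14:30:00

// Read it back with the same culture to get the original value again.
DateTime restored = DateTime.Parse(stored, CultureInfo.InvariantCulture);
Console.WriteLine(restored == appointment); // True
```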

Escape Characters

When writing a string, it is possible to use specific characters to represent text features such as tabs or newlines; we call these escape characters.

For example, if we wanted to create a string with a newline character in it, we would use the \n escape character.

string withNewline = "This is on the first line.\nThis is on the second.";

Escape characters are always marked by a backslash (\). There are several kinds of escape characters we can use, including:

  • \t - Horizontal tab
  • \r - Carriage return
  • \b - Backspace

Escaping Literals

Sometimes we want to include characters that normally mark the beginning and end of strings in the string itself, or the escape characters within. We can escape such characters, including:

  • \' - Single quote
  • \" - Double quote
  • \\ - Backslash

string dialogue = "She said, \"I didn't know that was him!\".";
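One thing that may help make escape characters less mysterious: although we type two characters in our code (a backslash plus something else), each one becomes a single character in the resulting string. This small sketch, using made-up sample strings, checks that with the Length property.

```csharp
using System;

string tabbed = "Name:\tJack";        // \t is a single tab character
string quoted = "She said, \"Hi!\"";  // \" is a single double-quote character
string windowsPath = "C:\\temp";      // \\ is a single backslash

Console.WriteLine(tabbed.Length);      // 10
Console.WriteLine(quoted.Length);      // 15
Console.WriteLine(windowsPath.Length); // 7
Console.WriteLine(quoted);             // She said, "Hi!"
```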

String Operators ($ and @)

There are two special operators we can use with strings. One is $, the interpolated string operator, which we saw in the previous post. It allows us to insert variables or values directly into the string:

string name1 = "Jack";
string name2 = "Quint";

Console.WriteLine($"Good morning {name1} and {name2}!");

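The other operator is @, the literal string operator (Microsoft's documentation calls the result a verbatim string literal), which tells C# to treat backslashes in the string as ordinary characters rather than as the start of an escape character. This is particularly useful for file paths, such as this: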

string filePath = @"C:\this\is\a\file\path";

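Without the @ operator, each \ in the string above would be read as the start of an escape character. Some of them, such as \i, are not valid escapes at all, so the string would not even compile.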
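The two operators can also be combined. Here is a brief sketch, with a hypothetical user name, that builds a path using both at once. It also shows how to write a double quote inside an @ string, since \" is not treated as an escape there.

```csharp
using System;

string user = "jack"; // hypothetical user name, for illustration only

// $ and @ together: backslashes are literal, and {user} is still replaced.
string documents = $@"C:\Users\{user}\Documents";
Console.WriteLine(documents); // C:\Users\jack\Documents

// Inside an @ string, a double quote is written by doubling it.
string quoted = @"She said ""hello"".";
Console.WriteLine(quoted); // She said "hello".
```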

Formatting Numeric Strings

We can use special character sequences, called format strings, with the ToString() method to control how numbers are converted to strings.

Standard Numeric Formats

For numeric values, we can use a set of standard formats. Note that the exact output (such as the currency symbol) depends on the culture being used.

decimal money = 5.67M;
Console.WriteLine(money.ToString("C")); //Currency, e.g. $5.67

double percentage = 0.67;
Console.WriteLine(percentage.ToString("P")); //Percentage, e.g. 67.00%

The rest of the formats are found here:

Standard numeric format strings
In this article, learn to use standard numeric format strings to format common numeric types into text representations in .NET.
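To give a feel for a few of those other standard formats, here is a short sketch. It passes the invariant culture explicitly so that the output does not depend on the machine; the sample values are arbitrary.

```csharp
using System;
using System.Globalization;

var invariant = CultureInfo.InvariantCulture;

int count = 42;
double ratio = 3.14159;
decimal total = 1234567.891M;

string padded = count.ToString("D8", invariant);     // whole number, at least 8 digits
string hex = 255.ToString("X", invariant);           // hexadecimal
string fixedPoint = ratio.ToString("F1", invariant); // fixed-point, 1 decimal place
string grouped = total.ToString("N2", invariant);    // group separators, 2 decimal places

Console.WriteLine(padded);     // 00000042
Console.WriteLine(hex);        // FF
Console.WriteLine(fixedPoint); // 3.1
Console.WriteLine(grouped);    // 1,234,567.89
```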

Custom Numeric Formats

Let's imagine that we have a phone number that is stored as an integer:

int phoneNumber = 2125559731;

We can use a custom format to output this number as a string to look like a phone number:

int phoneNumber = 2125559731;
string format = "(###) ###-####";
Console.WriteLine(phoneNumber.ToString(format));
//Output: (212) 555-9731
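The # character used above is an optional digit placeholder. Its sibling, 0, is a required digit placeholder that fills in zeros when the number is too short. The sketch below compares the two, again with arbitrary sample values and the invariant culture so the separators are predictable.

```csharp
using System;
using System.Globalization;

var invariant = CultureInfo.InvariantCulture;

int small = 7;
double price = 1234.5678;

string zeroPadded = small.ToString("000", invariant);   // 0 always produces a digit
string notPadded = small.ToString("###", invariant);    // # only shows digits that exist
string rounded = price.ToString("#,##0.00", invariant); // grouping plus two decimals

Console.WriteLine(zeroPadded); // 007
Console.WriteLine(notPadded);  // 7
Console.WriteLine(rounded);    // 1,234.57
```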

More information about custom numeric formats is available in the Microsoft documentation:

Custom numeric format strings
Learn how to create a custom numeric format string to format numeric data in .NET. A custom numeric format string has one or more custom numeric specifiers.

String Concatenation

Many times when developing a C# app, we will want to combine strings to form a new string; this is called string concatenation. There are several ways to do this, and some are better suited to particular situations than others.

Naive Concatenation

A naive way to concatenate strings is to use the + operator.

int value = 6;

Console.WriteLine("The value is " + value.ToString());

For many scenarios, this is perfectly fine. However, if you need to deal with more than just a few strings, there are better options available.

String.Concat

Another option is the method String.Concat(), which combines a variable number of strings into a single string. This method uses the params keyword we discussed in the Methods, Parameters, and Arguments article.

string hello = "Hello ";
string firstName1 = "Jack, ";
string firstName2 = "Quint, ";
string firstName3 = "June, ";
string firstName4 = "and Dirk!";

var combined = string.Concat(hello, 
                             firstName1, 
                             firstName2, 
                             firstName3, 
                             firstName4);
Console.WriteLine(combined);
Extra special bonus points to whoever knows what these names are from.
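Because String.Concat() uses params, we can also hand it an array directly. That might be useful when we don't know how many strings there will be until the program runs. A minimal sketch, reusing the same sample strings:

```csharp
using System;

string[] parts = { "Hello ", "Jack, ", "Quint, ", "June, ", "and Dirk!" };

// The whole array is passed as the params argument.
string combined = string.Concat(parts);

Console.WriteLine(combined); // Hello Jack, Quint, June, and Dirk!
```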

However, this method gets harder to use and harder to read the more strings you have to deal with. For large numbers of strings, there's a better way.

StringBuilder

.NET provides us with a class called StringBuilder (in the System.Text namespace) which is intended to deal with large numbers of strings efficiently. To use it, we instantiate an object, add strings, and have it generate the combined result with the ToString() method.

StringBuilder builder = new StringBuilder();

for(int i = 0; i < 100; i++)
{
    builder.Append(i);
    builder.Append(" ");
}

Console.WriteLine(builder.ToString());
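As one more sketch of the same idea, here is a StringBuilder being used to build a comma-separated list. Checking the builder's Length property is just one possible way to avoid a leading separator; the names are sample data.

```csharp
using System;
using System.Text;

string[] names = { "Jack", "Quint", "June", "Dirk" };

var listBuilder = new StringBuilder();

foreach (string name in names)
{
    // Only add a separator once something is already in the builder.
    if (listBuilder.Length > 0)
    {
        listBuilder.Append(", ");
    }

    listBuilder.Append(name);
}

string list = listBuilder.ToString();
Console.WriteLine(list); // Jack, Quint, June, Dirk
```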

Searching in Strings

C# and .NET provide a few methods that allow us to search within strings for particular values.

Contains()

The string.Contains() method returns a boolean that says whether or not the string contains another string. The string that is being searched for must be an exact, case-sensitive match for text in the string being searched.

string sentence = "This is a sentence.";

Console.WriteLine(sentence.Contains("is a")); //true
Console.WriteLine(sentence.Contains("isa")); //false
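If we want a search that ignores case, one approach that should work across .NET versions is to use IndexOf() with a StringComparison value and check that the result is not negative. This is only a sketch of one option; newer versions of .NET also offer a Contains() overload that accepts a StringComparison.

```csharp
using System;

string sentence = "This is a sentence.";

// Contains() is case-sensitive, so lowercase "this" is not found.
bool exactMatch = sentence.Contains("this");

// IndexOf() returns -1 when nothing is found, so >= 0 means "found".
bool ignoreCaseMatch =
    sentence.IndexOf("this", StringComparison.OrdinalIgnoreCase) >= 0;

Console.WriteLine(exactMatch);      // False
Console.WriteLine(ignoreCaseMatch); // True
```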

IndexOf()

We can get the first index of a given substring using the string.IndexOf() method.

string sentence = "This is a sentence.";

int index = sentence.IndexOf("is");
Console.WriteLine($"Found the substring 'is' at position {index}");
//Position 2 (positions start at zero). Note that this is the position of "is" in the word "This".
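A few related details are worth a quick sketch: IndexOf() returns -1 when the substring is not found, it can start searching from a given position, and LastIndexOf() searches from the end. StringComparison.Ordinal is passed here as an assumption to keep the search independent of culture.

```csharp
using System;

string sentence = "This is a sentence.";

int first = sentence.IndexOf("is", StringComparison.Ordinal);
// Start looking just past the first match to find the next one.
int second = sentence.IndexOf("is", first + 1, StringComparison.Ordinal);
int last = sentence.LastIndexOf("is", StringComparison.Ordinal);
int missing = sentence.IndexOf("xyz", StringComparison.Ordinal);

Console.WriteLine(first);   // 2
Console.WriteLine(second);  // 5
Console.WriteLine(last);    // 5
Console.WriteLine(missing); // -1
```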

StartsWith() and EndsWith()

Finally, we can check if a string either starts with or ends with a given substring using the string.StartsWith() and string.EndsWith() methods.

string sentence = "This is a sentence.";

bool startsWith = sentence.StartsWith("This"); //true
bool endsWith = sentence.EndsWith("tence"); //false, missing the period
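Both methods also accept a StringComparison value. A common use, sketched here with a hypothetical file name, is checking a file extension without caring about its case.

```csharp
using System;

string fileName = "Report.TXT"; // hypothetical file name

bool isTextFile = fileName.EndsWith(".txt", StringComparison.OrdinalIgnoreCase);
bool exactCase = fileName.EndsWith(".txt", StringComparison.Ordinal);
bool isReport = fileName.StartsWith("report", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(isTextFile); // True
Console.WriteLine(exactCase);  // False
Console.WriteLine(isReport);   // True
```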

Trimming

We can remove all whitespace from the beginning, end, or both sides of a string using the Trim() methods. This is called trimming.

string sentenceWhitespace = "     This has some whitespace.     ";

//Removes starting whitespace
Console.WriteLine(sentenceWhitespace.TrimStart() + "End of line.");

//Removes ending whitespace
Console.WriteLine(sentenceWhitespace.TrimEnd() + "End of line.");

//Removes both
Console.WriteLine(sentenceWhitespace.Trim() + "End of line.");
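The Trim() methods are not limited to whitespace. If we pass in a character, that character is removed instead. Here is a short sketch with a made-up string:

```csharp
using System;

string decorated = "***Important***";

string bothSides = decorated.Trim('*');
string startOnly = decorated.TrimStart('*');
string endOnly = decorated.TrimEnd('*');

Console.WriteLine(bothSides); // Important
Console.WriteLine(startOnly); // Important***
Console.WriteLine(endOnly);   // ***Important

// Strings are immutable, so the original is unchanged.
Console.WriteLine(decorated); // ***Important***
```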

Padding

We can add characters to either the start or end of a string using the PadLeft() and PadRight() methods. Note that each method takes the total length of the string after padding, so if you want a 7-character string to be padded to 10 characters, you pass 10 for this parameter.

string sevenDigitPhone = "1234567";

Console.WriteLine(sevenDigitPhone.PadLeft(10, '0')); //0001234567
Console.WriteLine(sevenDigitPhone.PadRight(10, '0')); //1234567000
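Two small behaviors are worth sketching here: if we leave out the padding character, a space is used, and if the string is already as long as (or longer than) the total length, it comes back unchanged. The sample values are arbitrary, and the brackets are only there to make the spaces visible.

```csharp
using System;

string shortText = "42";

string rightAligned = shortText.PadLeft(5);  // pads with spaces by default
string leftAligned = shortText.PadRight(5);

Console.WriteLine($"[{rightAligned}]"); // [   42]
Console.WriteLine($"[{leftAligned}]");  // [42   ]

// Already 11 characters, so padding to 10 changes nothing.
string alreadyLong = "12345678901".PadLeft(10, '0');
Console.WriteLine(alreadyLong); // 12345678901
```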

Change Case

We can change the case of a string to all-uppercase or all-lowercase using their corresponding methods.

string mixedCaseString = "ThIs Is A mIxEd cAsE StRiNg.";

Console.WriteLine(mixedCaseString.ToUpper()); //THIS IS A MIXED CASE STRING.
Console.WriteLine(mixedCaseString.ToLower()); //this is a mixed case string.
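Changing case is one of the places where cultures matter. ToUpper() and ToLower() use the current culture unless we tell them otherwise, and in Turkish the uppercase of "i" is a dotted "İ". The sketch below contrasts that with ToUpperInvariant(); it assumes the runtime has Turkish culture data available, and the sample word is arbitrary.

```csharp
using System;
using System.Globalization;

string word = "title";
var turkish = new CultureInfo("tr-TR"); // Turkish - Türkiye

string invariantUpper = word.ToUpperInvariant();
string turkishUpper = word.ToUpper(turkish);

Console.WriteLine(invariantUpper); // TITLE
Console.WriteLine(turkishUpper);   // TİTLE (with a dotted capital İ)
Console.WriteLine(invariantUpper == turkishUpper); // False
```

If the result is going to be compared or stored rather than shown to a person, the invariant versions are usually the more predictable choice.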

Equality Comparisons

Strings are a bit trickier to compare for equality than, say, numbers. When we compare two strings, we must decide:

  • whether we care about the difference between UPPER and lower case letters,
  • whether we want to compare the strings using the same culture, and if not, which culture to use, and
  • whether we want to use an ordinal or a linguistic comparison.

Information about ordinal or linguistic comparisons can be found in Microsoft's official documentation. Both terms are also defined in the glossary below.

How to compare strings - C# Guide
Learn how to compare and order string values, with or without case, with or without culture specific ordering

Naive Comparison

We can do a simple comparison of two strings using the == and != operators. This kind of comparison requires that the two strings are either both null or have exactly the same length and identical characters at each position.

string test1 = "This is a semicolon ;";
string test2 = "This is a semicolon ;";
string test3 = "This is a semicolon \u037E"; //This one ends with a Greek question mark (U+037E), which looks just like a semicolon.

Console.WriteLine($"test1 and test2 are {(test1 == test2 ? "equal" : "NOT equal")}"); //equal
Console.WriteLine($"test2 and test3 are {(test2 == test3 ? "equal" : "NOT equal")}"); //NOT equal
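To see why two characters that look the same are not equal, we can peek at the numeric value behind each one. This sketch writes the Greek question mark with the \u037E escape so there is no doubt about which character is in the string.

```csharp
using System;

string semicolon = ";";
string greekQuestionMark = "\u037E"; // looks like a semicolon, but is a different character

int semicolonValue = (int)semicolon[0];
int greekValue = (int)greekQuestionMark[0];

Console.WriteLine(semicolonValue); // 59
Console.WriteLine(greekValue);     // 894
Console.WriteLine(semicolon == greekQuestionMark); // False
```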

String.Equals()

We can use the String.Equals() method to check for equality between two strings. We can optionally pass values from the StringComparison enumeration to specify how we want to check for equality. In particular, we can choose to ignore the casing of the string using options that include IgnoreCase.

string firstString = "This is the First String.";
string secondString = "This Is The First String.";

Console.WriteLine(firstString.Equals(secondString)); //False

Console.WriteLine(firstString
                  .Equals(secondString, 
                          StringComparison.OrdinalIgnoreCase)); //True
                          
Console.WriteLine(firstString
                  .Equals(secondString, 
                          StringComparison.InvariantCultureIgnoreCase)); //True
                          
Console.WriteLine(firstString
                  .Equals(secondString, 
                          StringComparison.InvariantCulture)); //False
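There is also a static version, string.Equals(a, b, comparison), which can be convenient when either string might be null, because calling the instance method on a null string would throw an exception. A minimal sketch with made-up values:

```csharp
using System;

string answer = "YES";
string missing = null; // pretend nothing was entered

bool answerMatches = string.Equals(answer, "yes", StringComparison.OrdinalIgnoreCase);
bool missingMatches = string.Equals(missing, "yes", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(answerMatches);  // True
Console.WriteLine(missingMatches); // False
```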

Splitting Strings

We can split strings into substrings using the String.Split() method and specifying the character to split on. For example, we can split a string into a collection of words by specifying a space as a delimiter.

string toBeSplit = "This is a bunch of words and we will split this sentence.";

var words = toBeSplit.Split(' ');

foreach (var word in words)
{
    Console.WriteLine(word);
}

//OUTPUT
//This
//is
//a
//bunch
//of
//words
//and
//we
//will
//split
//this
//sentence.

Note that the character we are splitting by will not be included in the elements of the resulting array.
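One thing to watch for: when delimiters appear back to back, Split() produces empty strings for the gaps between them. Passing StringSplitOptions.RemoveEmptyEntries removes those. Here is a short sketch with a made-up string:

```csharp
using System;

string spaced = "too   many  spaces";

// Consecutive spaces produce empty entries.
string[] raw = spaced.Split(' ');

// RemoveEmptyEntries drops them.
string[] cleaned = spaced.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(raw.Length);     // 6
Console.WriteLine(cleaned.Length); // 3
```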

We can also split by multiple characters by passing in an array of them to the Split() method.

string toParse = "This:is another\tstrange/sentence.";

char[] delimiters = { ':', ' ', '\t', '/' };

var words = toParse.Split(delimiters);

foreach (var word in words)
{
    Console.WriteLine(word);
}

//OUTPUT
//This
//is
//another
//strange
//sentence.

We can even split by using string delimiters:

string toSplit = "This...is our final>>odd sentence.";

string[] stringDelimiters = { ">>", "..." };

var words = toSplit.Split(stringDelimiters, StringSplitOptions.RemoveEmptyEntries);

foreach (var word in words)
{
    Console.WriteLine(word);
}

//OUTPUT
//This
//is our final
//odd sentence.
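Split() can also be told the maximum number of pieces to return, which may help when only the first delimiter matters. In this sketch, the setting string is hypothetical; we split it into a name and a value even though the value itself contains more "=" characters.

```csharp
using System;

string setting = "connection=Server=localhost;Port=5432"; // hypothetical setting

// Return at most 2 pieces; everything after the first '=' stays together.
string[] pair = setting.Split(new[] { '=' }, 2);

Console.WriteLine(pair[0]); // connection
Console.WriteLine(pair[1]); // Server=localhost;Port=5432
```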

Glossary

  • Escape characters - Special characters in a string marked by a backslash \, such as \n for a newline and \' for a literal single quote.
  • String concatenation - The process of joining two or more strings together to form a new string.
  • Trimming - The act of removing whitespace, such as spaces or tabs, from the beginning or end of a string.
  • Padding - The act of adding characters to the beginning or end of a string until that string is of a specified length.
  • Ordinal comparison - Comparing two or more strings by looking at the numeric value of the character at each position, without applying any culture-specific rules.
  • Linguistic comparison - Comparing two or more strings by using culture-specific rules, such as whether or not to ignore the "-" character in the word "co-op".
  • Delimiter - An identifying mark or character used to separate pieces of information from one another.

New Keywords and Operators

  • @ - Literal string operator (the official term is verbatim). Tells the compiler to ignore escape characters in a string and treat them as though they are part of the string.

Summary

Cultures are collections of rules about how to display and format text, dates, times, and more. In C#, we can create them with the CultureInfo class and pass them to methods such as ToString().

When dealing with strings, we can manipulate them in many ways, including with:

  • Operators like $ and @.
  • Escape characters such as \n, \t, or \\.
  • Standard numeric formats like "C" for currency or "P" for percentages.
  • Custom numeric format strings such as "(###) ###-####" for a phone number.
  • Concatenation with +, String.Concat, or the StringBuilder class.
  • Searching in strings with Contains(), IndexOf(), StartsWith(), and EndsWith().
  • Trimming with Trim(), TrimStart() or TrimEnd().
  • Padding with PadLeft() and PadRight().
  • Changing the case with ToUpper() and ToLower().
  • Equality comparisons with ==, !=, and String.Equals().
  • Splitting strings using String.Split() and char or string delimiters.

A bunch of different colors of yarn, sorted by color, sit bundled in cubbies.
StringBuilder to the rescue! Photo by Paul Hanaoka / Unsplash

Need some more help with string manipulation, or how to use cultures? I want to help! Let me know in the comments below!

In the next part of this series, we will continue with the data manipulation theme and dive into how C# can represent dates and times using cultures and formats. Stick around!

Happy Coding!