Read HTML table with C#

Question

Kmcnet 1,181

Hello everyone, and thanks in advance for the help. I wrote a screen-scraping program that returns an HTML table with a variable number of rows, and I need to extract the data from it. I'm not sure of the best way to proceed, i.e. regex or some other method. Here is what the table looks like:

                    <table id="pcpHistoryTable" class="table table-striped table-bordered table-condensed unit size1of2">
                        <thead>
                        <tr>
                            <th>Name</th>
                            <th width="28%">Start Date</th>
                            <th width="28%">End Date</th>
                        </tr>
                        </thead>
                        <tbody>
			<tr>
				<td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
				<tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td>
			</tr>
			</tbody>
                    </table>

Any help would be appreciated.

Viorel 123.8K Reputation points

2025-08-22T22:46:53.69+00:00

If the markup is valid XML, then XDocument, XmlDocument, XPath, or LINQ to XML can be used.

Starry Night 110 Reputation points

2025-08-26T12:24:52.3833333+00:00

You can refer to this thread here: Read an HTML-table in C#.
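
For reference, the LINQ to XML option mentioned in the comments might look roughly like the sketch below. It assumes the table markup is well-formed XML, which happens to be true for the sample in the question but is not guaranteed for a real scraped page. The class and method names (LinqToXmlTable, ReadRows) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlTable
{
    // Returns one string array per <tr> inside <tbody>, one entry per <td>.
    // XDocument.Parse throws XmlException if the markup is not well-formed XML.
    public static string[][] ReadRows(string tableMarkup)
    {
        XDocument doc = XDocument.Parse(tableMarkup);
        return doc.Descendants("tbody")
                  .Elements("tr")
                  .Select(tr => tr.Elements("td")
                                  .Select(td => td.Value.Trim())
                                  .ToArray())
                  .ToArray();
    }

    public static void Main()
    {
        // Trimmed copy of the table from the question.
        string markup = @"<table id=""pcpHistoryTable"">
            <tbody>
                <tr><td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
                <tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td></tr>
            </tbody>
        </table>";

        foreach (string[] row in ReadRows(markup))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

If the page contains anything that is legal HTML but not legal XML, Parse will throw, so this only fits sources you know to be strict.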

Accepted answer

Answer 1

Since you're scraping a table with structured data, regex is likely not the best choice: HTML can get messy, and regex doesn't handle nested tags or malformed markup well. Instead, use an HTML parser library, which gives you the rows and cells cleanly.

C# (using HtmlAgilityPack)

using HtmlAgilityPack;
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var html = @"<table id='pcpHistoryTable'>
            <tbody>
                <tr><td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
                <tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td></tr>
            </tbody>
        </table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

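        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode
                      .SelectNodes("//table[@id='pcpHistoryTable']//tbody//tr");
        if (rows == null) return;

        foreach (var row in rows)
        {
            var cellNodes = row.SelectNodes("td");
            if (cellNodes == null) continue; // row has no <td> cells
            var cells = cellNodes.Select(td => td.InnerText.Trim()).ToList();
            Console.WriteLine(string.Join(" | ", cells));
        }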
    }
}

Output:

John Doe | Feb 1, 2025 | Current
Jane Doe | Apr 1, 2022 | Jan 31, 2025
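
If you need typed values rather than strings, the same approach can be extended along the lines of the sketch below. It rests on a few assumptions that are not confirmed anywhere in the thread: that dates always look like "Feb 1, 2025", that any non-date text in the End Date column (such as "Current") means there is no end date, and that every data row has three cells. PcpEntry and PcpTableReader are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

// Hypothetical record type for illustration only.
public record PcpEntry(string Name, DateTime Start, DateTime? End);

public static class PcpTableReader
{
    public static List<PcpEntry> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<PcpEntry>();
        // tr[td] keeps only rows that have at least one <td>, which skips the header row.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='pcpHistoryTable']//tr[td]");
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 3) continue; // skip rows with an unexpected shape

            string name = HtmlEntity.DeEntitize(cells[0].InnerText).Trim();
            string startText = HtmlEntity.DeEntitize(cells[1].InnerText).Trim();
            string endText = HtmlEntity.DeEntitize(cells[2].InnerText).Trim();

            if (!TryParseDate(startText, out DateTime start)) continue; // start date is required

            // Assumption: an unparseable end date (e.g. "Current") means the entry is still open.
            DateTime? end = TryParseDate(endText, out DateTime parsedEnd) ? parsedEnd : (DateTime?)null;
            result.Add(new PcpEntry(name, start, end));
        }
        return result;
    }

    private static bool TryParseDate(string text, out DateTime value) =>
        DateTime.TryParseExact(text, "MMM d, yyyy", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out value);
}

public static class Program
{
    public static void Main()
    {
        string html = @"<table id='pcpHistoryTable'>
            <tbody>
                <tr><td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
                <tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td></tr>
            </tbody>
        </table>";

        foreach (PcpEntry entry in PcpTableReader.Parse(html))
        {
            string start = entry.Start.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
            string end = entry.End.HasValue
                ? entry.End.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                : "present";
            Console.WriteLine(entry.Name + ": " + start + " to " + end);
        }
    }
}
```

Parsing with an explicit format and the invariant culture keeps the result independent of the machine's regional settings. Adjust the format string if the site renders dates differently.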

hth

Marcin

Kmcnet 1,181 Reputation points

2025-08-27T20:32:51.92+00:00

Worked like a charm. The XML approach kept throwing errors. Thanks for the help.

Answer 2

Castorix31 91,066

One way is with System.Xml.

With help from ChatGPT:

    string sHtml = @"<table id=""pcpHistoryTable"" class=""table table-striped table-bordered table-condensed unit size1of2"">
            <thead>
            <tr>
                <th>Name</th>
                <th width=""28%"">Start Date</th>
                <th width=""28%"">End Date</th>
            </tr>
            </thead>
            <tbody>
<tr>
	<td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
	<tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td>
</tr>
</tbody>
        </table>";

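    // Needs: using System.Diagnostics; using System.Linq;
    // LoadXml throws XmlException unless the markup is well-formed XML.
    // The fragment is wrapped in a single <root> element before loading.
    var xml = new System.Xml.XmlDocument();
    xml.LoadXml($"<root>{sHtml}</root>");
    foreach (System.Xml.XmlNode row in xml.SelectNodes("//table[@id='pcpHistoryTable']//tr"))
    {
        // Header rows contain <th> cells, data rows contain <td> cells.
        var cells = row.SelectNodes("th|td");
        if (cells == null) continue;
        Debug.WriteLine(string.Join(" | ", cells.Cast<System.Xml.XmlNode>().Select(c => c.InnerText.Trim())));
    }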

Bruce (SqlWork.com) 79,526 Reputation points Volunteer Moderator

2025-08-23T19:23:32.6866667+00:00

Generally, HTML is not fully XML-compliant, and most XML readers will fail to read an HTML document properly. As suggested, you should probably use an HTML parsing library such as HtmlAgilityPack.

Kmcnet 1,181 Reputation points

2025-08-27T20:31:56.1466667+00:00

You are correct. The XML approach failed.
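
The thread does not show the exact error or the full scraped page, so the cause of the failure is a guess. As an illustration only, the sketch below runs a few hypothetical fragments through XmlDocument to show common HTML constructs that an XML parser rejects. XmlVsHtmlDemo and IsWellFormedXml are invented names.

```csharp
using System;
using System.Xml;

public static class XmlVsHtmlDemo
{
    // Returns true when the fragment loads as XML, false when XmlDocument rejects it.
    public static bool IsWellFormedXml(string fragment)
    {
        try
        {
            var xml = new XmlDocument();
            xml.LoadXml("<root>" + fragment + "</root>");
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical fragments; none of these come from the actual scraped page.
        string clean = "<table><tr><td>John Doe</td></tr></table>";
        string voidTag = "<table><tr><td>John<br>Doe</td></tr></table>";       // <br> is never closed
        string entity = "<table><tr><td>John&nbsp;Doe</td></tr></table>";     // &nbsp; is not declared in XML
        string bareAttr = "<table><tr><td nowrap>John Doe</td></tr></table>";  // attribute without a value

        Console.WriteLine(IsWellFormedXml(clean));    // True
        Console.WriteLine(IsWellFormedXml(voidTag));  // False
        Console.WriteLine(IsWellFormedXml(entity));   // False
        Console.WriteLine(IsWellFormedXml(bareAttr)); // False
    }
}
```

All three failing fragments are ordinary HTML that browsers accept, which is why an HTML parser is usually the safer choice for scraped pages.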
