Since you're scraping a table of structured data, regex is a poor fit: real-world HTML is often messy, and regular expressions cannot reliably match nested tags or cope with malformed markup. Use an HTML parser library instead, which gives you the rows and cells directly.
C# (using HtmlAgilityPack)
using HtmlAgilityPack;
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var html = @"<table id='pcpHistoryTable'>
            <tbody>
                <tr><td>John Doe</td><td>Feb 1, 2025</td><td>Current</td></tr>
                <tr><td>Jane Doe</td><td>Apr 1, 2022</td><td>Jan 31, 2025</td></tr>
            </tbody>
        </table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode
            .SelectNodes("//table[@id='pcpHistoryTable']//tbody//tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found.");
            return;
        }

        foreach (var row in rows)
        {
            // Skip rows without <td> cells, such as header rows that use <th>.
            var tds = row.SelectNodes("td");
            if (tds == null) continue;

            var cells = tds.Select(td => td.InnerText.Trim()).ToList();
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
Output:
John Doe | Feb 1, 2025 | Current
Jane Doe | Apr 1, 2022 | Jan 31, 2025
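If you need the values as objects rather than printed text, something along these lines might work. This is only a sketch: the PcpEntry record, the PcpTableParser class, and the meaning of the three columns (name, start, end) are my assumptions based on the sample data, not anything from your page. It uses //tr instead of //tbody//tr because HtmlAgilityPack does not add a missing tbody the way a browser does, and it needs C# 9 or later for the record type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical record; the column meanings are assumed from the sample rows.
record PcpEntry(string Name, string Start, string End);

static class PcpTableParser
{
    public static List<PcpEntry> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<PcpEntry>();

        // "//tr" rather than "//tbody//tr" so tables without <tbody> also match.
        var rows = doc.DocumentNode
            .SelectNodes("//table[@id='pcpHistoryTable']//tr");
        if (rows == null) return result; // null means no matching rows

        foreach (var row in rows)
        {
            var tds = row.SelectNodes("td");
            if (tds == null || tds.Count < 3) continue; // header or short row

            // Decode entities such as &amp; and trim surrounding whitespace.
            var text = tds
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList();

            result.Add(new PcpEntry(text[0], text[1], text[2]));
        }

        return result;
    }
}
```

Calling PcpTableParser.Parse(html) on the sample table above should give two entries, the first with Name "John Doe" and End "Current". The dates are kept as strings on purpose, since the last column can hold text like "Current" instead of a date.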
hth
Marcin