MarkdownExtractionService.cs
Service that extracts structured sections from Markdown files using strategy-based extraction.
What it does
Reads a Markdown file from disk, splits it into lines, and applies the extraction strategies defined in a SourceConfig to produce named sections. It returns the same ExtractionResult type as the PDF extraction service, which makes the two source types interchangeable.
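A minimal sketch of that flow: read the file, split it into lines, run each configured strategy, and collect named sections. This is built on assumptions. `SectionRule`, `ExtractionStrategy`, and `ExtractionPipelineSketch` are hypothetical stand-ins, because the real `SourceConfig` and `ExtractionResult` shapes are not documented on this page. Sections are returned here as a plain dictionary keyed by section name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, simplified stand-in for one section rule in a SourceConfig.
public sealed record SectionRule(string Key, string Strategy, IReadOnlyDictionary<string, string> Params);

// Hypothetical strategy signature: takes the file's lines and the rule's
// parameters, and returns the extracted text or null when nothing matched.
public delegate string? ExtractionStrategy(string[] lines, IReadOnlyDictionary<string, string> parameters);

public static class ExtractionPipelineSketch
{
    public static Dictionary<string, string> ExtractSections(
        string filePath,
        IEnumerable<SectionRule> rules,
        IReadOnlyDictionary<string, ExtractionStrategy> strategies)
    {
        // File.ReadAllLines treats "\n", "\r\n", and "\r" as line terminators.
        string[] lines = File.ReadAllLines(filePath);
        var sections = new Dictionary<string, string>();

        foreach (SectionRule rule in rules)
        {
            // Unknown strategy names are skipped in this sketch; a real
            // service would likely log them through its ILogger.
            if (!strategies.TryGetValue(rule.Strategy, out ExtractionStrategy? strategy))
                continue;

            string? value = strategy(lines, rule.Params);
            if (value != null)
                sections[rule.Key] = value;
        }
        return sections;
    }
}
```

Keeping strategies in a table keyed by name means a new strategy can be added without touching the loop. Whether the actual service dispatches this way is not stated here.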
Interface
```csharp
public interface IMarkdownExtractionService
{
    ExtractionResult ExtractSectionsFromConfig(string filePath, SourceConfig sourceConfig);
}
```

Extraction Strategies
| Strategy | Implementation |
|---|---|
| firstHeading | Finds the first line that starts with the `#` prefix for the configured `level` (e.g. `##` for level 2). Returns the heading text without the `#` prefix. |
| firstParagraph | Finds the first non-empty, non-heading line after a heading. Returns the text. |
| betweenPatterns | Scans lines, starts capturing at the first line matching the `startPattern` regex, and stops at the first line matching any `stopPatterns` regex. Respects `includeStartLine`. |
| regex | Matches a regex pattern against each line. Returns the first match (supports capture groups). |
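The `firstParagraph` row can be sketched as follows. `FirstParagraphSketch.FindFirstParagraph` is a hypothetical helper, not the service's code. It treats any line starting with `#` as a heading, which simplifies Markdown's actual ATX heading rules.

```csharp
#nullable enable
using System;

public static class FirstParagraphSketch
{
    // Returns the first non-empty, non-heading line that appears after a
    // heading, or null if no such line exists.
    public static string? FindFirstParagraph(string[] lines)
    {
        bool seenHeading = false;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            // Simplification: any line beginning with '#' counts as a heading.
            if (line.StartsWith("#", StringComparison.Ordinal))
            {
                seenHeading = true;
                continue;
            }
            // Text before the first heading is ignored; blank lines are skipped.
            if (seenHeading && line.Length > 0)
            {
                return line;
            }
        }
        return null;
    }
}
```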
Key difference from PDF extraction
Markdown is already structured text, so the service needs no PdfPig, no column detection, and no font analysis. It is simple line-based text parsing (about 170 lines).
Strategy Parameters
firstHeading
- `level` (int): Heading level to match (1 = `#`, 2 = `##`, etc.). Default: 1.
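A sketch of how `level` might select a heading. It assumes a space is required between the run of `#` characters and the heading text, so `## Sub` is not mistaken for a level-1 heading. `FirstHeadingSketch.FindFirstHeading` is a hypothetical name.

```csharp
#nullable enable
using System;

public static class FirstHeadingSketch
{
    // Returns the text of the first heading at exactly `level`
    // (1 = "#", 2 = "##", ...), or null when no such heading exists.
    public static string? FindFirstHeading(string[] lines, int level = 1)
    {
        // Assumption: the prefix is `level` '#' characters followed by a space.
        string prefix = new string('#', level) + " ";
        foreach (string line in lines)
        {
            if (line.StartsWith(prefix, StringComparison.Ordinal))
            {
                // Strip the '#' prefix and any surrounding whitespace.
                return line.Substring(prefix.Length).Trim();
            }
        }
        return null;
    }
}
```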
betweenPatterns
- `startPattern` (string): Regex to match the start line.
- `stopPatterns` (string[]): Regexes that stop capturing when matched.
- `includeStartLine` (bool): Whether to include the matched start line. Default: true.
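A sketch of the capture loop for `betweenPatterns`. `BetweenPatternsSketch.ExtractBetween` is hypothetical, and two behaviors are assumptions rather than documented facts: the stop line is excluded from the result, and surrounding blank lines are trimmed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BetweenPatternsSketch
{
    // Captures lines from the first match of startPattern up to the first
    // later line matching any stop pattern. Returns null if startPattern
    // never matches.
    public static string? ExtractBetween(
        string[] lines, string startPattern, string[] stopPatterns, bool includeStartLine = true)
    {
        var start = new Regex(startPattern);
        var stops = new List<Regex>();
        foreach (string p in stopPatterns) stops.Add(new Regex(p));

        var captured = new List<string>();
        bool capturing = false;
        foreach (string line in lines)
        {
            if (!capturing)
            {
                if (start.IsMatch(line))
                {
                    capturing = true;
                    if (includeStartLine) captured.Add(line);
                }
                // The start line is never tested against the stop patterns.
                continue;
            }
            // Assumption: the stop line itself is not part of the section.
            if (stops.Exists(r => r.IsMatch(line))) break;
            captured.Add(line);
        }
        // Assumption: leading/trailing blank lines are trimmed from the result.
        return capturing ? string.Join("\n", captured).Trim() : null;
    }
}
```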
regex
- `pattern` (string): Regex pattern to match.
- `flags` (string): Regex flags (e.g. `"i"` for case-insensitive).
- `captureGroup` (int): Which capture group to return. Default: 0 (full match).
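A sketch of how the `regex` parameters could map onto .NET's `Regex`. Only `"i"` is documented above; translating the other flag letters to `RegexOptions` is an assumption, and `RegexStrategySketch`, `ParseFlags`, and `MatchFirst` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class RegexStrategySketch
{
    // Assumed mapping from JavaScript-style flag letters to RegexOptions.
    // Unrecognized letters are ignored.
    public static RegexOptions ParseFlags(string? flags)
    {
        var options = RegexOptions.None;
        foreach (char c in flags ?? "")
        {
            switch (c)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 's': options |= RegexOptions.Singleline; break;
                case 'x': options |= RegexOptions.IgnorePatternWhitespace; break;
            }
        }
        return options;
    }

    // Tests the pattern against each line and returns the requested group of
    // the first successful match, or null when no line matches. An
    // out-of-range group index yields an empty string (Group.Value is empty).
    public static string? MatchFirst(string[] lines, string pattern, string? flags = null, int captureGroup = 0)
    {
        var regex = new Regex(pattern, ParseFlags(flags));
        foreach (string line in lines)
        {
            Match m = regex.Match(line);
            if (m.Success)
            {
                return m.Groups[captureGroup].Value;
            }
        }
        return null;
    }
}
```

With `captureGroup` left at 0, the whole match is returned; a value such as 1 selects the first parenthesized group.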
Registration
Registered as scoped via `UpDocComposer`:

```csharp
builder.Services.AddScoped<IMarkdownExtractionService, MarkdownExtractionService>();
```

Dependencies
- `ILogger<MarkdownExtractionService>`: for logging extraction progress and errors.
Namespace
```csharp
namespace UpDoc.Services;
```