PdfExtractionService.cs

Service that extracts text content from PDF files using UglyToad.PdfPig.

The service extracts text in four steps:

  1. Opens PDF files from file path or stream
  2. Iterates through all pages
  3. Extracts text content from each page
  4. Returns combined text with page break markers

The service is exposed through the following interface and result type:

    public interface IPdfExtractionService
    {
        PdfExtractionResult ExtractFromFile(string filePath);
        PdfExtractionResult ExtractFromStream(Stream stream);
    }

    public class PdfExtractionResult
    {
        public string RawText { get; set; }  // Combined text from all pages
        public int PageCount { get; set; }   // Number of pages in PDF
        public string? Error { get; set; }   // Error message if extraction failed
    }

UglyToad.PdfPig is a .NET library for reading and extracting content from PDF files. It provides:

  • Cross-platform PDF parsing
  • Text extraction with positioning information
  • No external dependencies
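
As a rough illustration of the positioning information mentioned above, the following sketch lists every word in a document together with the lower-left corner of its bounding box. The class and method names (PdfPigSketch, DumpWords) are made up for this example and are not part of the service. It assumes the UglyToad.PdfPig NuGet package is referenced.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

// Hypothetical helper, for illustration only.
public static class PdfPigSketch
{
    // Writes each word of each page to the console, prefixed with the
    // page number and the lower-left corner of the word's bounding box.
    public static void DumpWords(string filePath)
    {
        using (PdfDocument document = PdfDocument.Open(filePath))
        {
            foreach (Page page in document.GetPages())
            {
                foreach (Word word in page.GetWords())
                {
                    Console.WriteLine(
                        $"p{page.Number} " +
                        $"({word.BoundingBox.Left:F1}, {word.BoundingBox.Bottom:F1}): " +
                        word.Text);
                }
            }
        }
    }
}
```

Coordinates are in PDF units with the origin at the bottom-left of the page, so a larger Bottom value means the word sits higher on the page.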

The service wraps all operations in try-catch to provide graceful error handling:

  • Returns error message in result instead of throwing
  • Allows caller to check Error property for failures
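
The actual implementation is not reproduced on this page. A minimal sketch of how the steps and the error handling described above might fit together is shown below. It reuses the IPdfExtractionService and PdfExtractionResult types documented earlier; the class name PdfExtractionServiceSketch, the private Extract helper, and the "--- Page N ---" marker format are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

// Illustrative sketch only; relies on IPdfExtractionService and
// PdfExtractionResult as documented on this page.
public class PdfExtractionServiceSketch : IPdfExtractionService
{
    public PdfExtractionResult ExtractFromFile(string filePath)
    {
        try
        {
            using (PdfDocument document = PdfDocument.Open(filePath))
            {
                return Extract(document);
            }
        }
        catch (Exception ex)
        {
            // Report the failure through the result instead of throwing.
            return new PdfExtractionResult { RawText = string.Empty, Error = ex.Message };
        }
    }

    public PdfExtractionResult ExtractFromStream(Stream stream)
    {
        try
        {
            using (PdfDocument document = PdfDocument.Open(stream))
            {
                return Extract(document);
            }
        }
        catch (Exception ex)
        {
            return new PdfExtractionResult { RawText = string.Empty, Error = ex.Message };
        }
    }

    // Hypothetical helper: walks all pages and joins their text.
    private static PdfExtractionResult Extract(PdfDocument document)
    {
        var text = new StringBuilder();
        foreach (Page page in document.GetPages())
        {
            // Assumed page break marker; the real format may differ.
            text.AppendLine($"--- Page {page.Number} ---");
            text.AppendLine(page.Text);
        }

        return new PdfExtractionResult
        {
            RawText = text.ToString(),
            PageCount = document.NumberOfPages
        };
    }
}
```

Catching the exception at the service boundary keeps callers free of try-catch blocks, at the cost of reducing every failure to a single message string.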
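
A caller, such as a controller, can check Error before using RawText:

    using Microsoft.AspNetCore.Mvc;

    public class MyController : ControllerBase
    {
        private readonly IPdfExtractionService _pdfService;

        // The service is supplied through constructor injection.
        public MyController(IPdfExtractionService pdfService)
        {
            _pdfService = pdfService;
        }

        public IActionResult ExtractPdf(string path)
        {
            var result = _pdfService.ExtractFromFile(path);

            // Extraction failures are reported via Error rather than an exception.
            if (!string.IsNullOrEmpty(result.Error))
                return BadRequest(result.Error);

            return Ok(result.RawText);
        }
    }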

The service is registered via UpDocComposer as a scoped service.
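
The body of UpDocComposer is not shown here. As a generic sketch, a scoped registration with Microsoft.Extensions.DependencyInjection might look like the following. The extension class PdfServiceRegistration and the method AddPdfExtraction are hypothetical names, and the sketch assumes the concrete class is called PdfExtractionService, as the file name suggests.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Hypothetical registration helper, for illustration only.
public static class PdfServiceRegistration
{
    public static IServiceCollection AddPdfExtraction(this IServiceCollection services)
    {
        // Scoped lifetime: one instance per DI scope, which in ASP.NET Core
        // means one instance per HTTP request.
        return services.AddScoped<IPdfExtractionService, PdfExtractionService>();
    }
}
```

If UpDocComposer is an Umbraco composer (an assumption based on its name), the same AddScoped call would presumably be made on the builder's service collection inside its Compose method.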