PdfExtractionService.cs
Service that extracts text content from PDF files using UglyToad.PdfPig.
What it does
Provides PDF text extraction functionality:
- Opens PDF files from file path or stream
- Iterates through all pages
- Extracts text content from each page
- Returns combined text with page break markers
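The steps above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the class name `PdfTextSketch`, its methods, and the `--- Page N ---` marker format are assumptions, since the real marker is not shown here. Only `PdfDocument.Open`, `GetPages()`, and `Page.Text` come from PdfPig.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using UglyToad.PdfPig;

public static class PdfTextSketch
{
    // Hypothetical marker format; the real service may use a different one.
    public static string PageBreak(int pageNumber) => $"--- Page {pageNumber} ---";

    // Joins per-page text, placing a marker line before each page (1-based numbering).
    public static string Combine(IEnumerable<string> pageTexts) =>
        string.Join("\n", pageTexts.Select((text, index) => PageBreak(index + 1) + "\n" + text));

    // Opens the PDF from a stream and combines the text of every page.
    public static string ReadAllText(Stream stream)
    {
        using PdfDocument document = PdfDocument.Open(stream);
        return Combine(document.GetPages().Select(page => page.Text));
    }

    // Same as above, starting from a file path.
    public static string ReadAllText(string filePath)
    {
        using PdfDocument document = PdfDocument.Open(filePath);
        return Combine(document.GetPages().Select(page => page.Text));
    }
}
```

`Combine` enumerates the pages before the `using` scope ends, so the document is still open while each page's text is read.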
Interface
public interface IPdfExtractionService
{
    PdfExtractionResult ExtractFromFile(string filePath);
    PdfExtractionResult ExtractFromStream(Stream stream);
}
Result class
public class PdfExtractionResult
{
    public string RawText { get; set; }   // Combined text from all pages
    public int PageCount { get; set; }    // Number of pages in the PDF
    public string? Error { get; set; }    // Error message if extraction failed
}
Key concepts
UglyToad.PdfPig
A .NET library for reading and extracting content from PDF files. It provides:
- Cross-platform PDF parsing
- Text extraction with positioning information
- No external dependencies
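To illustrate the positioning information PdfPig exposes, the sketch below lists each word with the lower-left corner of its bounding box. The class `WordPositionSketch` and its method are hypothetical names for this example. The service described here only needs the plain text, so this is background rather than part of its behavior.

```csharp
using System.Collections.Generic;
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

public static class WordPositionSketch
{
    // Returns one description per word: page number, text, and lower-left corner.
    public static List<string> DescribeWords(Stream stream)
    {
        var lines = new List<string>();
        using PdfDocument document = PdfDocument.Open(stream);
        foreach (Page page in document.GetPages())
        {
            foreach (Word word in page.GetWords())
            {
                // BoundingBox is expressed in PDF points, measured from the page's bottom-left.
                var box = word.BoundingBox;
                lines.Add($"page {page.Number}: '{word.Text}' at ({box.Left:F1}, {box.Bottom:F1})");
            }
        }
        return lines;
    }
}
```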
Error handling
The service wraps all operations in a try-catch block to handle errors gracefully:
- Returns an error message in the result instead of throwing
- Allows the caller to check the Error property for failures
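The pattern described above might look like the following inside the service. This is a sketch under assumptions: the `SafeExtraction.Run` helper and the wording of the error message are invented for illustration, and `PdfExtractionResult` is repeated only so the example is self-contained.

```csharp
#nullable enable
using System;

// Copy of the result class from this page, with a default so RawText is never null.
public class PdfExtractionResult
{
    public string RawText { get; set; } = string.Empty;
    public int PageCount { get; set; }
    public string? Error { get; set; }
}

public static class SafeExtraction
{
    // Runs the extraction delegate and converts any exception into an Error value.
    public static PdfExtractionResult Run(Func<PdfExtractionResult> extract)
    {
        try
        {
            return extract();
        }
        catch (Exception ex)
        {
            // Hypothetical message format; the caller only needs Error to be non-empty.
            return new PdfExtractionResult { Error = $"PDF extraction failed: {ex.Message}" };
        }
    }
}
```

Returning the failure in the result keeps callers free of try-catch blocks, at the cost of requiring them to check `Error` before using `RawText`.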
A caller, such as a controller, can check Error before using the extracted text:
public class MyController : ControllerBase
{
    private readonly IPdfExtractionService _pdfService;

    public MyController(IPdfExtractionService pdfService)
    {
        _pdfService = pdfService;
    }

    public IActionResult ExtractPdf(string path)
    {
        var result = _pdfService.ExtractFromFile(path);

        if (!string.IsNullOrEmpty(result.Error))
            return BadRequest(result.Error);

        return Ok(result.RawText);
    }
}
Registration
Registered via UpDocComposer as a scoped service.
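The contents of UpDocComposer are not shown here, so the following is only a sketch of what a scoped registration looks like with Microsoft.Extensions.DependencyInjection. The `AddPdfExtraction` extension method is a hypothetical name, and the empty interface and class are stand-ins for the real types.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Stand-ins for the real types so the example compiles on its own.
public interface IPdfExtractionService { }
public class PdfExtractionService : IPdfExtractionService { }

public static class RegistrationSketch
{
    // Scoped lifetime: one instance per scope (per HTTP request in ASP.NET Core).
    public static IServiceCollection AddPdfExtraction(this IServiceCollection services) =>
        services.AddScoped<IPdfExtractionService, PdfExtractionService>();
}
```

If UpDocComposer is an Umbraco composer (an assumption), the equivalent call would be made on `builder.Services` inside its `Compose` method.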