This is part two of the series. You can find the other parts here:
In part one I showed how to create and use a VNDocumentCameraViewController
from SwiftUI and capture images of documents. In this part I want to demonstrate how to convert the captured images into searchable PDF files. Let's recap what is needed:
- Part 1 ✅
- Foundations ✅
- Scan the document ✅
- Part 2
- Recognize the text on scanned images
- Part 3
- Create a PDF with the images
- Place the recognized text behind each image on the PDF
- Save and display the PDF
Recognize text
In the last part we left off with the following method of the ScanDocumentView.Coordinator, where we hid the presented view after the user finished the scan.
func documentCameraViewController(_: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
// Toggle the binding of the parent
parent.isPresented.toggle()
}
The Vision framework comes with a pre-built request for recognizing text on CGImage instances. Therefore it is necessary to convert the scan into an array of UIImage instances, which then gives us access to the underlying CGImage of each page. So the next thing we do is iterate over the number of pages of the scan and invoke imageOfPage(at:), a built-in method of VNDocumentCameraScan that returns a UIImage instance.
let images = (0..<scan.pageCount).map {scan.imageOfPage(at: $0)}
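The range-plus-map pattern above can be tried in isolation. In the following sketch, pageImage(at:) is a hypothetical stand-in for scan.imageOfPage(at:), since the real call only exists on a live scan:

```swift
// Hypothetical stand-in for scan.imageOfPage(at:):
// maps a page index to a placeholder value instead of a UIImage.
func pageImage(at index: Int) -> String {
    "page-\(index)"
}

let pageCount = 3
// Same pattern as above: iterate over the page indices and collect one value per page.
let images = (0..<pageCount).map { pageImage(at: $0) }
// images == ["page-0", "page-1", "page-2"]
```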
After obtaining the images, it is time to make use of Vision in order to recognize the text. The method we want to define accepts the mentioned CGImage instance and returns an array of recognized text.
func recognizeText(from image: CGImage) -> [VNRecognizedText] {
}
First, we create an array in which we save the recognized text parts of the image.
var textObservations: [VNRecognizedText] = []
Next up we create a VNRecognizeTextRequest
, which takes a closure that defines what we want to execute when the text recognition has finished. For the sake of this tutorial I do not move the work away from the main queue into a background queue. In a production application I would definitely recommend using a background queue to do the text recognition.
let recognizeTextRequest = VNRecognizeTextRequest { request, error in
guard error == nil else { return }
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
let maximumRecognitionCandidates = 1
for observation in observations {
guard let candidate = observation.topCandidates(maximumRecognitionCandidates).first else { continue }
textObservations.append(candidate)
}
}
The code above iterates over all found text observations; for each one we take the best candidate and append it to our textObservations array. Next up we define the recognition level of the request. It can either be .accurate, which takes more time but gives better results, or .fast, which is faster but less accurate. Since we want to create a PDF, it is important to get accurate results. That's why we choose the .accurate level here.
recognizeTextRequest.recognitionLevel = .accurate
We are done with configuring the request to recognize text on the image and can instantiate a VNImageRequestHandler, which handles the execution of our VNRecognizeTextRequest. After the request has been performed, the found text observations are returned.
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
try? requestHandler.perform([recognizeTextRequest])
return textObservations
The full method declaration looks like this:
func recognizeText(from image: CGImage) -> [VNRecognizedText] {
// Array to save all found texts on the image
var textObservations: [VNRecognizedText] = []
let recognizeTextRequest = VNRecognizeTextRequest { request, error in
// Errors are not handled in this example
guard error == nil else { return }
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
let maximumRecognitionCandidates = 1
for observation in observations {
// Take the best candidate of the observation and append it to the array
guard let candidate = observation.topCandidates(maximumRecognitionCandidates).first else { continue }
textObservations.append(candidate)
}
}
// Set the recognition level
recognizeTextRequest.recognitionLevel = .accurate
// Handler to perform the text recognition request
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
try? requestHandler.perform([recognizeTextRequest])
return textObservations
}
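As a sketch of the background-queue recommendation from earlier: heavy work like text recognition can be dispatched off the calling queue and its result handed back through a completion handler. The queue choice and all names below are assumptions for illustration, not part of the tutorial:

```swift
import Dispatch

// Sketch: run expensive work on a background queue and hand the result
// back through a completion handler. Queue and QoS choice are assumptions.
func performOffMainQueue<T>(_ work: @escaping () -> T,
                            completion: @escaping (T) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Execute the expensive work off the calling queue,
        // then deliver its result to the caller.
        completion(work())
    }
}
```

In a real app the completion handler would typically hop back to the main queue before touching any UI state.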
Create helper class
Now that we have created our first function that is necessary to create a searchable PDF, we create a small helper class for handling the PDF creation.
import UIKit
import Vision
import VisionKit
class PDFCreator {
static let shared = PDFCreator()
private init() { }
private func recognizeText(from image: CGImage) -> [VNRecognizedText] {
...
}
}
I know that the usage of singletons can be quite controversial. I decided to use one here because it keeps the example simple and we do not need to care about dependency injection or anything else. You are free to use any structure that fits your project.
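For readers who would rather avoid the singleton, a minimal dependency-injection sketch could look like the following. The protocol and type names here are hypothetical, and the page-count-based signature is a deliberate simplification so the idea can be shown without UIKit:

```swift
import Foundation

// Hypothetical protocol describing what the scanning view needs
// from a PDF creator. Names are illustrative, not from the tutorial.
protocol PDFCreating {
    func createSearchablePDF(fromPageCount pageCount: Int) -> Data
}

struct StubPDFCreator: PDFCreating {
    // Returns one placeholder byte per page so the flow is observable.
    func createSearchablePDF(fromPageCount pageCount: Int) -> Data {
        Data(repeating: 0, count: pageCount)
    }
}

// The consumer receives its dependency instead of reaching for a shared instance.
struct ScanHandler {
    let pdfCreator: PDFCreating
    func finishScan(pageCount: Int) -> Data {
        pdfCreator.createSearchablePDF(fromPageCount: pageCount)
    }
}
```

Swapping StubPDFCreator for a real implementation then requires no changes to ScanHandler, which also makes the flow straightforward to unit test.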
In addition to the method just added to our PDFCreator
class, let us define a method signature that will create the searchable PDF data in the end. It should accept an array of UIImage instances, returned from the VNDocumentCameraScan, and return some Data which can be converted into a file representation later.
class PDFCreator {
//...
func createSearchablePDF(from images: [UIImage]) -> Data {
// fill out
}
}
With the bare minimum in place, we can implement the connection from our ScanDocumentView
to the PDFCreator
.
struct ScanDocumentView: UIViewControllerRepresentable {
//...
class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
// ...
func documentCameraViewController(_: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
let images: [UIImage] = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
let data: Data = PDFCreator.shared.createSearchablePDF(from: images)
// Hide the document view after the PDF is created
parent.isPresented.toggle()
}
}
}
Please keep in mind that, since the createSearchablePDF(from:) method is not implemented yet, the code will not compile at this stage.
Conclusion
In this part I showed how to recognize text on images with the help of the Vision framework. We had a detailed look at creating a VNRecognizeTextRequest and executing it on a given CGImage instance in order to recognize the text on it. Finally, we implemented a little helper class and connected it to the ScanDocumentView from part one.
In the third and last part of this series we are going to combine the text recognition with UIImages
and create our first searchable PDF file. I hope I can get the final part of the series out quicker than this one.
If you have suggestions or questions, don't hesitate to reach out to me!
See you next time 👋