This is part two of the series. You can find the other parts here:
In part one I showed how to create and use a VNDocumentCameraViewController
from SwiftUI and capture images of documents. In this part I want to demonstrate how to convert the captured images into searchable PDF files. Let's recap what is needed:
- Part 1 ✅
- Foundations ✅
- Scan the document ✅
- Part 2
- Recognize the text on scanned images
- Part 3
- Create a PDF with the images
- Place the recognized text behind each image on the PDF
- Save and display the PDF
Recognize text
In the last part we left off with the following method of the ScanDocumentView.Coordinator, where we hid the presented view after the user finished the scan.
func documentCameraViewController(_: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
// Toggle the binding of the parent
parent.isPresented.toggle()
}
The Vision framework comes with a pre-built request for recognizing text on CGImage instances. Therefore it is necessary to convert the scan into an array of UIImage instances, which then gives us access to the underlying CGImage of each page. So the next thing we do is iterate over the number of pages of the scan and invoke imageOfPage(at:), a built-in method of VNDocumentCameraScan that returns a UIImage instance.
let images = (0..<scan.pageCount).map {scan.imageOfPage(at: $0)}
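The range-plus-map pattern above can be tried in isolation. In the following sketch, pageImage(at:) is a hypothetical stand-in for scan.imageOfPage(at:), since the real call only exists on a live scan:

```swift
// Hypothetical stand-in for scan.imageOfPage(at:):
// maps a page index to a placeholder value instead of a UIImage.
func pageImage(at index: Int) -> String {
    "page-\(index)"
}

let pageCount = 3
// Same pattern as above: iterate over the page indices and collect one value per page.
let images = (0..<pageCount).map { pageImage(at: $0) }
// images == ["page-0", "page-1", "page-2"]
```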
After obtaining the images, it is time to make use of Vision in order to recognize the text. The method we want to define accepts the mentioned CGImage instance and returns an array of recognized text.
func recognizeText(from image: CGImage) -> [VNRecognizedText] {
}
First, we create an array in which we save the recognized text parts of the image.
var textObservations: [VNRecognizedText] = []
Next up we create a VNRecognizeTextRequest
, which takes a closure that defines what we want to execute when the text recognition has finished. For the sake of this tutorial I do not move the work away from the main queue into a background queue. In a production application I would definitely recommend using a background queue to do the text recognition.
let recognizeTextRequest = VNRecognizeTextRequest { request, error in
guard error == nil else { return }
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
let maximumRecognitionCandidates = 1
for observation in observations {
guard let candidate = observation.topCandidates(maximumRecognitionCandidates).first else { continue }
textObservations.append(candidate)
}
}
The code above iterates over all found text observations; for each one we take the best candidate and append it to our textObservations array. Next up we define the recognition level of the request. It can either be .accurate, which takes more time but gives better results, or .fast, which is faster but less accurate. Since we want to create a PDF, it is important to get accurate results. That's why we choose the .accurate level here.
recognizeTextRequest.recognitionLevel = .accurate
We are done with configuring the request to recognize text on the image and can instantiate a VNImageRequestHandler, which handles the execution of our VNRecognizeTextRequest. After the request has been performed, the found text observations are returned.
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
try? requestHandler.perform([recognizeTextRequest])
return textObservations
The full method declaration looks like this:
func recognizeText(from image: CGImage) -> [VNRecognizedText] {
// Array to save all found texts on the image
var textObservations: [VNRecognizedText] = []
let recognizeTextRequest = VNRecognizeTextRequest { request, error in
// Errors are not handled in this example
guard error == nil else { return }
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
let maximumRecognitionCandidates = 1
for observation in observations {
// Take the best candidate of the observation and append it to the array
guard let candidate = observation.topCandidates(maximumRecognitionCandidates).first else { continue }
textObservations.append(candidate)
}
}
// Set the recognition level
recognizeTextRequest.recognitionLevel = .accurate
// Handler to perform the text recognition request
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
try? requestHandler.perform([recognizeTextRequest])
return textObservations
}
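As a sketch of the background-queue recommendation from earlier: heavy work like text recognition can be dispatched off the calling queue and its result handed back through a completion handler. The queue choice and all names below are assumptions for illustration, not part of the tutorial:

```swift
import Dispatch

// Sketch: run expensive work on a background queue and hand the result
// back through a completion handler. Queue and QoS choice are assumptions.
func performOffMainQueue<T>(_ work: @escaping () -> T,
                            completion: @escaping (T) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Execute the expensive work off the calling queue,
        // then deliver its result to the caller.
        completion(work())
    }
}
```

In a real app the completion handler would typically hop back to the main queue before touching any UI state.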
Create helper class
Now that we have created our first function that is necessary to create a searchable PDF, we create a small helper class for handling the PDF creation.
import UIKit
import Vision
import VisionKit
class PDFCreator {
static let shared = PDFCreator()
private init() { }
private func recognizeText(from image: CGImage) -> [VNRecognizedText] {
...
}
}
I know that the usage of singletons can be quite controversial. I decided to use one here because it keeps the example simple and we do not need to care about dependency injection or anything else. You are free to use any structure that fits your project.
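For readers who would rather avoid the singleton, a minimal dependency-injection sketch could look like the following. The protocol and type names here are hypothetical, and the page-count-based signature is a deliberate simplification so the idea can be shown without UIKit:

```swift
import Foundation

// Hypothetical protocol describing what the scanning view needs
// from a PDF creator. Names are illustrative, not from the tutorial.
protocol PDFCreating {
    func createSearchablePDF(fromPageCount pageCount: Int) -> Data
}

struct StubPDFCreator: PDFCreating {
    // Returns one placeholder byte per page so the flow is observable.
    func createSearchablePDF(fromPageCount pageCount: Int) -> Data {
        Data(repeating: 0, count: pageCount)
    }
}

// The consumer receives its dependency instead of reaching for a shared instance.
struct ScanHandler {
    let pdfCreator: PDFCreating
    func finishScan(pageCount: Int) -> Data {
        pdfCreator.createSearchablePDF(fromPageCount: pageCount)
    }
}
```

Swapping StubPDFCreator for a real implementation then requires no changes to ScanHandler, which also makes the flow straightforward to unit test.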
In addition to the method just added to our PDFCreator
class, let us define a method signature that will create the searchable PDF data in the end. It should accept an array of UIImage instances, returned from the VNDocumentCameraScan, and return some Data which can be converted into a file representation later.
class PDFCreator {
//...
func createSearchablePDF(from images: [UIImage]) -> Data {
// fill out
}
}
With the bare minimum in place, we can implement the connection from our ScanDocumentView
to the PDFCreator
.
struct ScanDocumentView: UIViewControllerRepresentable {
//...
class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
// ...
func documentCameraViewController(_: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
let images: [UIImage] = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
let data: Data = PDFCreator.shared.createSearchablePDF(from: images)
// Hide the document view after the PDF is created
parent.isPresented.toggle()
}
}
}
Please keep in mind that, since the createSearchablePDF(from:) method is not implemented yet, the code will not compile at this stage.
Conclusion
In this part I showed how to recognize text on images with the help of the Vision framework. We had a detailed look at creating a VNRecognizeTextRequest and executing it on a given CGImage instance in order to recognize the text on it. Finally, we implemented a little helper class and connected it to the ScanDocumentView from part one.
In the third and last part of this series we are going to combine the text recognition with UIImages
and create our first searchable PDF file. I hope I can get the final part of the series out quicker than this one.
If you have suggestions or questions, don't hesitate to reach out to me!
See you next time 👋