Natural Language Framework: Sentence Embedding with Swift

Natural Language Processing (NLP) has been on a tear since 2018, though it is still regarded as an unsolved problem. Deep learning techniques have produced breakthroughs: OpenAI’s GPT-3, for example, can “produce human-like text”, but building such a language model requires extensive computing resources and a massive amount of data. Transfer learning is the technique that lets us leverage state-of-the-art models without having to build them from scratch. For Apple developers, the Natural Language Framework provides access to a sentence embedding model, which ships as part of all Apple platforms and provides on-device machine learning capability.

Human language is complicated, so how does a machine learning model recognise meaning in a sentence? If we are searching for a particular phrase, we can calculate the semantic distance between our query sentence and any number of stored sentences to find the closest match. This requires the sentences to be encoded, because machine learning models only recognise numeric input. Apple’s Natural Language Framework provides a function to encode a sentence, returning a vector (an array of 512 Double values) that represents the semantic meaning of the sentence.

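import NaturalLanguage

/// Convert a string into a vector using Sentence Embedding
///
/// - Parameters:
///     - string: The string to convert
/// - Returns: The embedding vector for the string
public func vector(for string: String) -> [Double] {
    // Both calls return nil on failure: the embedding may be unavailable
    // for the language, or the string may not be encodable.
    guard let sentenceEmbedding = NLEmbedding.sentenceEmbedding(for: .english),
          let vector = sentenceEmbedding.vector(for: string) else {
        fatalError("Could not create a sentence embedding vector for: \(string)")
    }
    return vector
}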

Two vectors can be compared using cosine similarity to determine how closely related their meanings are.

/// Returns the similarity between two vectors
///
/// - Parameters:
///     - a: The first vector
///     - b: The second vector
public func cosineSimilarity(a: [Double], b: [Double]) -> Double {
    return dot(a, b) / (mag(a) * mag(b))
}
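
The function above calls `dot` and `mag`, which are not shown in this post. A minimal sketch of what they might look like, assuming plain Swift implementations of the dot product and the Euclidean norm (the playground's own helpers may differ, for example by using Accelerate):

```swift
/// Dot product of two equal-length vectors.
/// Hypothetical helper, written for illustration; not taken from the original post.
func dot(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Vectors must have the same length")
    return zip(a, b).reduce(0.0) { sum, pair in sum + pair.0 * pair.1 }
}

/// Magnitude (Euclidean norm) of a vector.
/// Hypothetical helper, written for illustration; not taken from the original post.
func mag(_ a: [Double]) -> Double {
    return dot(a, a).squareRoot()
}
```

Note that a vector of all zeros has a magnitude of 0, so dividing by it in `cosineSimilarity` would not give a meaningful score.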

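Mathematically, cosine similarity ranges from -1 to 1. A score of 1 means the two vectors point in the same direction (identical meaning, as far as the model is concerned), and a score near 0 means the sentences are not at all similar.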

let first = vector(for: "Where are coffee beans grown?")
let second = vector(for: "Which country is the leading grower of coffee beans?")

let similarity = cosineSimilarity(a: first, b: second) // 0.7379489081092887
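
To find the closest match among several stored sentences, as described earlier, the same score can be computed against each stored vector and the highest one kept. A minimal sketch, assuming a hypothetical `closestMatch` helper and small hand-written vectors standing in for real 512-element embeddings, so that it runs without the Natural Language Framework:

```swift
/// Cosine similarity, repeated here so the sketch is self-contained.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    let dotProduct = zip(a, b).reduce(0.0) { sum, pair in sum + pair.0 * pair.1 }
    let magA = a.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    let magB = b.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    return dotProduct / (magA * magB)
}

/// Returns the stored sentence whose vector is most similar to the query vector,
/// or nil if there are no stored sentences.
/// `closestMatch` is a hypothetical name used for illustration only.
func closestMatch(
    to query: [Double],
    in stored: [(sentence: String, vector: [Double])]
) -> (sentence: String, score: Double)? {
    return stored
        .map { (sentence: $0.sentence, score: cosine(query, $0.vector)) }
        .max { $0.score < $1.score }
}

// Toy 3-element vectors (made up); real vectors would come from vector(for:).
let stored: [(sentence: String, vector: [Double])] = [
    (sentence: "Where are coffee beans grown?", vector: [0.9, 0.1, 0.0]),
    (sentence: "How do I reset my password?", vector: [0.0, 0.2, 0.9]),
]
let query = [0.8, 0.2, 0.1]

if let best = closestMatch(to: query, in: stored) {
    print(best.sentence, best.score)
}
```

This is a linear scan over every stored vector, which is simple and likely adequate for small collections; it recomputes nothing about the stored sentences as long as their vectors are encoded once and kept.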

Cosine similarity is not a new technique, but its effectiveness has been enhanced by state-of-the-art machine learning models generated using supercomputers and massive amounts of data. It’s great to see Apple bring these innovations into the Natural Language Framework and make this technology available across macOS, iOS, Mac Catalyst, watchOS and tvOS. This means that what previously required an API call to a backend server can now be done on device. Learn more about sentence embeddings from the WWDC 2020 session “Make apps smarter with Natural Language”.

View the source code in an Xcode Playground available on GitHub.
