Introduction to Azure Notebooks: Applying Cognitive Services with Jupyter

Introduction

In this new post, you will discover Azure Notebooks, Microsoft's hosted Jupyter notebook service on Azure.

In this introduction, we will see an example of the summarization service running in Jupyter to summarize the text of a hotel customer review.

Step by Step

First, we can access the Azure Notebooks service by visiting notebooks.azure.com, and then we can create a new Azure Notebooks project from “My Projects”.

Before creating our own functions, we must install a set of Python libraries that includes the Azure Machine Learning SDK (azureml-sdk), ONNX Runtime (onnxruntime), and the Natural Language Toolkit (nltk).

The cells of the “.ipynb” file or “init” script look like this:

    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "!pip install --upgrade azureml-sdk[notebooks]",
        "execution_count": null,
        "outputs": []
    },
    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "%%sh\npip install onnxruntime",
        "execution_count": null,
        "outputs": []
    },
    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "import nltk\nnltk.download('all')",
        "execution_count": null,
        "outputs": []
    }

After that, we need to upload the file to our Azure Notebooks project.

Then, we can open the notebook, review the code, and run it from Jupyter to install all the libraries. This step may take a few minutes to complete.
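Once the install cells finish, a quick sanity-check cell can confirm that the packages import cleanly. This small helper is our own addition, not part of the original notebook:

```python
import importlib

def check_imports(names=("azureml.core", "onnxruntime", "nltk")):
    """Report which of the freshly installed packages import cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = "OK"
        except ImportError:
            status[name] = "missing"
    return status

print(check_imports())
```

If any package reports "missing", re-run the corresponding install cell before continuing.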

For our example, we will create a new “.ipynb” file that imports some of the libraries we installed before.

    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "import nltk\nimport re\nimport unicodedata\nimport numpy as np\nfrom gensim.summarization import summarize",
        "execution_count": null,
        "outputs": []
    }

Then, we need to create two functions: one to normalize the text and another to summarize it.

    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "def normalize_text(text):\n    text = re.sub('\\n', ' ', text)\n    text = text.strip()\n    sentences = nltk.sent_tokenize(text)\n    sentences = [sentence.strip() for sentence in sentences]\n    return sentences",
        "execution_count": null,
        "outputs": []
    },
    {
        "metadata": { "trusted": true },
        "cell_type": "code",
        "source": "def summarize_text(text, summary_ratio=None, word_count=30):\n    sentences = normalize_text(text)\n    cleaned_text = ' '.join(sentences)\n    summary = summarize(cleaned_text, split=True, ratio=summary_ratio, word_count=word_count)\n    return summary",
        "execution_count": null,
        "outputs": []
    }
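Stripped of the notebook JSON, the two cells above are plain Python. One caveat: the `gensim.summarization` module was removed in gensim 4.0, so the sketch below swaps in a minimal frequency-based extractive summarizer and a regex sentence splitter; both are our own assumptions, not the article's code:

```python
import re

def normalize_text(text):
    """Collapse newlines and split the text into clean sentences."""
    text = re.sub(r'\n', ' ', text).strip()
    # Regex sentence split as a stand-in for nltk.sent_tokenize
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]

def summarize_text(text, word_count=30):
    """Rank sentences by average word frequency and keep the top ones
    until roughly word_count words are selected (a simple stand-in for
    gensim's TextRank-based summarize)."""
    sentences = normalize_text(text)
    words = re.findall(r'\w+', text.lower())
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq.get(t, 0) for t in tokens) / max(len(tokens), 1)

    summary, total = [], 0
    for s in sorted(sentences, key=score, reverse=True):
        if total >= word_count:
            break
        summary.append(s)
        total += len(s.split())
    summary.sort(key=sentences.index)  # restore original sentence order
    return summary
```

Calling `summarize_text` on a multi-sentence review returns a list of the highest-scoring sentences, in their original order.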

Now, we can pass a sample text to our main function, run the cells, and see the summarization in action.

You can see the whole process of this introduction in the following animated image, from the project setup to the final summarized text output.

Links: 

Jupyter Notebooks Documentation

Azure Machine Learning Summarize Data

Written by: Idiwork’s team
