EXTENDED OUTLINE OF FINAL PROJECT

TITLE

Classification of documents by using Naive Bayes

ABSTRACT

This project identifies the class of a document based on a set of documents used for training. Specifically, a few documents and their corresponding classes are given, and these documents are analyzed using the Multinomial Naive Bayes algorithm. Naive Bayes uses conditional probabilities derived from Bayes' theorem. When a test document is given, the multinomial algorithm analyzes every word in it with respect to each class and assigns each class a score. Finally, the class with the highest score is the predicted class of the test document.

RELATED DOMAIN OF STUDY

Bayes' theorem gives the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if a disease is related to geographical location, then Bayes' theorem allows a person's location to be used to assess the probability that they have the disease more accurately than an assessment made without knowing their location.
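As a worked illustration of the disease example, take some purely hypothetical numbers: 1% of the population has the disease, 50% of people with the disease live in a given region, and 10% of the whole population lives there. Writing D for "has the disease" and L for "lives in the region", Bayes' theorem updates the estimate for a person known to live there:

```latex
P(D \mid L) = \frac{P(L \mid D)\,P(D)}{P(L)} = \frac{0.5 \times 0.01}{0.1} = 0.05
```

Under these assumed numbers, knowing the person's location raises the estimated probability of disease from 1% to 5%.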

Multinomial Naive Bayes uses Bayes' theorem to find the probability that a document belongs to a particular class or category, based on an analysis of the labeled documents previously used to train the algorithm.
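For a test document made of the words w_1, ..., w_n, the multinomial model picks the class with the highest score. One common formulation, sketched here under the assumption of add-one (Laplace) smoothing and log probabilities (to avoid numerical underflow when many small probabilities are multiplied), is:

```latex
\hat{c} = \arg\max_{c \in C} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big],
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{\sum_{w' \in V} \operatorname{count}(w', c) + |V|}
```

Here P(c) is the fraction of training documents in class c, count(w, c) is how often word w occurs in the training documents of class c, and V is the vocabulary of all words seen in training. The bracketed sum is the per-class "score" mentioned in the abstract.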

ALGORITHMS

  • The Multinomial Naive Bayes algorithm is used to classify the given test document. This algorithm works based on conditional probability.
  • Training data is given to the training algorithm, which calculates the conditional probability of each word given each class.
  • The algorithm can be trained offline using the local training dataset.
  • The result of the training algorithm is the conditional probability of every word given each class. The final application algorithm then uses these probabilities to determine the class of the given test document.
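The training and classification steps listed above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the whitespace tokenizer, add-one (Laplace) smoothing, log probabilities, the function names `tokenize`, `train`, and `classify`, and the tiny training set are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(documents):
    """Train on (text, label) pairs; return log priors, smoothed log likelihoods, vocabulary."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> frequency of each word in that class
    vocabulary = set()
    for text, label in documents:
        tokens = tokenize(text)
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    total_docs = sum(class_doc_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}

    log_likelihood = {}
    for c in class_doc_counts:
        # Add-one smoothing: every vocabulary word gets a pseudo-count of 1 per class.
        denominator = sum(word_counts[c].values()) + len(vocabulary)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denominator) for w in vocabulary
        }
    return log_prior, log_likelihood, vocabulary


def classify(text, log_prior, log_likelihood, vocabulary):
    """Return the class with the highest score (log prior plus word log likelihoods)."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokenize(text):
            if w in vocabulary:  # words never seen in training are ignored
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training documents.
TRAINING_DOCS = [
    ("the team won the football match", "sports"),
    ("the striker scored a goal in the match", "sports"),
    ("the new phone has a faster processor", "tech"),
    ("the laptop processor and memory were upgraded", "tech"),
]

if __name__ == "__main__":
    model = train(TRAINING_DOCS)
    print(classify("a late goal won the match", *model))  # sports
```

Summing log probabilities instead of multiplying raw probabilities keeps the scores from underflowing to zero on long documents; it does not change which class wins.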

 

DATA SOURCES

The data sources for my project are documents containing text. I need different sets of documents: one of categorized documents for training the algorithm and another of uncategorized documents for testing.

For example, here is a sample text file from my dataset:

[Figure: sample text file from the dataset]

For reference, my dataset looks like this:

[Figure: sample dataset]

GRAPHICS

I would like to show the results as bar graphs using Matplotlib in Python.

CURRENT CHALLENGES

Finding an existing dataset was the main challenge, but I was able to create a dataset for my project by collecting text documents from the web.

REFERENCES

NLP Stanford

Wikipedia

Pillar Global

Sebastian Raschka

SCIKIT-Learn

SciKit Library

Basics of Naive Bayes Algorithm

FINAL PROJECT

TITLE

Classification of documents by using Naive Bayes

ABSTRACT

This project identifies the class of a document based on a set of documents used for training. Specifically, a few documents and their corresponding classes are given, and these documents are analyzed using the Multinomial Naive Bayes algorithm. Naive Bayes uses conditional probabilities derived from Bayes' theorem. When a test document is given, the multinomial algorithm analyzes every word in it with respect to each class and assigns each class a score. Finally, the class with the highest score is the predicted class of the test document.

RELATED DOMAIN OF STUDY

Bayes' theorem gives the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if a disease is related to geographical location, then Bayes' theorem allows a person's location to be used to assess the probability that they have the disease more accurately than an assessment made without knowing their location.

Multinomial Naive Bayes uses Bayes' theorem to find the probability that a document belongs to a particular class or category, based on an analysis of the labeled documents previously used to train the algorithm.

DATA SOURCES

The data sources for my project are documents containing text. I need two sets of documents: one of categorized documents for training the algorithm and another of uncategorized documents for testing.

For example, here is my sample dataset:

[Figure: sample dataset]

REFERENCES

NLP Stanford

Wikipedia

Pillar Global

Sebastian Raschka

SCIKIT-Learn

SciKit Library

Basics of Naive Bayes Algorithm

BIG DATA ANALYTICS FOR INSURANCE

WHAT IS THE INSURANCE INDUSTRY

Customers take out policies based on their assessment of the likelihood of something particularly bad happening to them, and insurers offer them cover based on their assessment of the cost of covering any claims.

CHALLENGES IN INSURANCE INDUSTRY

[Figure: challenges in the insurance industry]

WHY BIG DATA FOR INSURANCE

The insurance industry is founded on estimating future events and measuring the risk or value of those events, so the volume, velocity, veracity, and variety of massive datasets have made big data an essential tool for insurers. With new data sources such as telematics, sensors, government records, customer interactions, and social media, the opportunity to use big data has become more appealing across new areas of the industry.

Big data technologies are used extensively to assess risk, process claims, and enhance customer experience, allowing insurance companies to achieve higher predictive accuracy.

USES OF BIG DATA IN THE INSURANCE INDUSTRY

Let’s take a look at the major uses of big data and its technologies in the insurance industry.

1. Risk Assessment 

One of the most important uses for insurers is determining policy premiums. Many insurers, mostly in automobile, home, and health insurance, use telematics (in-vehicle telecommunication devices), IoT devices, and wearables (Fitbit, Apple Watch, etc.) to track their customers in order to predict and calculate risks.

By using predictive modeling, the insurers can identify whether the drivers are likely to be involved in an accident, or have their car stolen, by combining their behavioral data with the exogenous factors such as road conditions or safe neighborhoods.

A similar use can be seen in the world of health and life insurance due to the growing use of wearable technology. Activity trackers can monitor users’ behaviors and habits and provide ongoing assessments of their activity levels.

2. Fraud Detection

Insurers use big data to improve the detection of fraud and criminal activity through data management and predictive modeling. They match the variables in every claim against the profiles of past fraudulent claims, so that when there is a match, the claim is flagged for further investigation.

These matches can also involve the behavior of the person making the claim, the network of people they associate with (via social media, credit reference agencies, etc.), and partner agencies involved in the claim (e.g., vehicle repair shops). Such complicated matches might escape a human reviewer's notice; however, big data analysis can detect them.
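The profile matching described above can be illustrated with a deliberately simplified Python sketch. The attribute names, the profile values, the exact-match rule, and the 0.75 threshold are all hypothetical; a real insurer would more likely rely on statistical or machine-learning models over far richer data.

```python
def match_score(claim, profile):
    """Fraction of the profile's attributes whose values the claim shares exactly."""
    matched = sum(1 for key, value in profile.items() if claim.get(key) == value)
    return matched / len(profile)


def flag_for_investigation(claim, fraud_profiles, threshold=0.75):
    """Flag the claim if it matches any past fraudulent profile closely enough."""
    return any(match_score(claim, profile) >= threshold for profile in fraud_profiles)


# Hypothetical profile distilled from past fraudulent claims.
FRAUD_PROFILES = [
    {
        "claim_type": "theft",
        "policy_age": "under_30_days",
        "repair_shop": "shop_17",
        "prior_claims": "3_or_more",
    },
]

if __name__ == "__main__":
    claim = {
        "claim_type": "theft",
        "policy_age": "under_30_days",
        "repair_shop": "shop_17",
        "prior_claims": "none",
    }
    print(flag_for_investigation(claim, FRAUD_PROFILES))  # True: 3 of 4 attributes match
```

A flagged claim is not declared fraudulent; it is only routed to a human for further investigation, which is why a partial match is enough to trigger the flag.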

3. Customer Insights

Acquiring a comprehensive understanding of customer behaviors, habits, and needs from various sources is strategically important for insurers, because it helps them anticipate future behaviors, offer relevant products, and identify the right customer segments.

Information gained from call center data, customer e-mails, social media, user forums, and user behavior while logged in to the insurers' sites enables insurers to build a unique profile of each customer. For example, analytic systems can spot a customer who is about to leave by flagging a high number of calls to a helpline.

Gaining customer insight with big data analytics does more than predict when a customer is likely to leave or shape a customer's policy; it can also help insurers develop trusted relationships and engage customers in the right way with accurate information. As a result of this strategic learning, insurers achieve positive outcomes such as solving customer problems in real time with the right approach, as well as upselling and cross-selling products.

4. Marketing

After gaining a full understanding of customer behavior, insurance companies have become more efficient at offering targeted products and services. They do this by offering personalized services and products such as lower-priced premiums (mostly in automobile, home, and health insurance), contacting customers with special offers when they are likely to leave, or even offering a family package when a family is likely to have a baby.

5. Customer Experience

Insurers now build personalized offers to their customers based on their preferences and behavioral data as well as offering them innovative services that streamline the purchase process.

Health insurance companies in particular use apps and wearables data to track their customers proactively while helping them manage their health conditions or chronic diseases. “Scipt Hub Plus” is a project that lets customers see the prices of the drugs covered by their insurance plan at the requested location when they get their medication from a physician. “Cigna” has partnered with BodyMedia to use its armband tracker, integrated with the customer's insurance plan, for diabetes prevention and management.

Another example is the life insurance sector. “Haven Life” (an online provider of term life insurance) uses big data technologies to let users make quick decisions on policies of up to $1 million, drawing on online questionnaires, prescription histories, state motor-vehicle records, and other data sources.

P&C insurers also enhance their customer experience by helping customers improve safety. The Driver Feedback app (owned by State Farm Insurance Company) evaluates customers' driving behavior and shares tips to improve their driving habits.

6. Automation

Insurers used to automate only simple processes such as compliance checks, data entry, or repetitive tasks that require little initiative. With the rise of big data technologies, automation has extended to more complex work such as loan underwriting, reconciliation, property assessment, claims verification, gathering customer insights, customer interactions, and fraud detection, to name a few.

With a move toward more intelligent automation, insurers can save a vast amount of time and money with the help of machine learning, which uses data to train and improve models, and of course predictive analytics.

7. Smarter Labor and Finance

With the help of real-time analysis, insurers can now make daily adjustments to premium rates, premium strategies, and underwriting limits by combining internal data (policies, regulations) with external data (social media, press, analyst comments) to optimize their finances and make instant payouts.

Data mining techniques are also used to cluster and score claims so that they can be prioritized and assigned to the most appropriate employee, based on that employee's experience and the claim's complexity. This saves insurers a significant amount of labor time and helps them avoid high settlement amounts.

Overall, big data is undoubtedly a tool that brings positive outcomes such as enhanced customer experience, innovative products, and better risk management leading the insurance industry to make better strategic decisions.

HOW WILL INSURERS USE BIG DATA

Here is an analysis of a few factors in the insurance industry.

[Figure: analysis of factors in the insurance industry]

INSURANCE STARTUPS

MetroMile offers “pay as you go” car insurance, drivers pay by the mile. A device tracks mileage and customers are billed monthly according to how far they have driven. The company claims that this saves low-mileage drivers an average of $500 a year.

Oscar Health Insurance, currently available only in New York, uses big data, modern apps, and web interfaces to give customers real-time information on which doctors and medicines are available to them in their area, and to build a 360-degree view of each customer.

FUTURE WORK

  • Ability to adjust the business model quickly to new trends.
  • Improved financial reporting and control systems.
  • Enhanced data security systems and tools.
  • Predictive analytics and modeling tools to manage, synthesize, analyze, and leverage massive data volumes.
  • Stronger digital capabilities complemented by new skills, refined metrics, upgraded tools and an innovation culture.

CONCLUSION

Thus, big data is undoubtedly a tool that brings positive outcomes such as enhanced customer experience, innovative products, and better risk management, helping the insurance industry make better strategic decisions.


Matplotlib Tutorial

This tutorial demonstrates a bar chart and a pie chart with sample diagrams.

Bar Chart Demonstration:

The program below draws a bar chart of randomly generated numbers. The Y-axis shows the ranges of integers, and the X-axis shows how many times a number from each range was encountered.

 

[Figure: bar chart program code]

[Figure: resulting bar chart]
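Since the original program survives only as an image, here is a hedged Python sketch of the same idea. The range width of 10, the upper bound of 100, and the function names are assumptions; horizontal bars (`barh`) are used so that the ranges sit on the Y-axis and the counts on the X-axis, as described above.

```python
def count_by_range(numbers, bin_width=10, upper=100):
    """Count how many integers fall into each range 0-9, 10-19, ... below `upper`."""
    starts = range(0, upper, bin_width)
    labels = [f"{start}-{start + bin_width - 1}" for start in starts]
    counts = [0] * len(labels)
    for n in numbers:
        if 0 <= n < upper:  # ignore values outside the plotted ranges
            counts[n // bin_width] += 1
    return labels, counts


def plot_bar_chart(labels, counts):
    """Draw horizontal bars: ranges on the Y-axis, counts on the X-axis."""
    import matplotlib.pyplot as plt  # imported here so counting works without matplotlib

    fig, ax = plt.subplots()
    ax.barh(labels, counts)
    ax.set_ylabel("Range of integers")
    ax.set_xlabel("Number of occurrences")
    ax.set_title("Random numbers per range")
    plt.show()
```

To reproduce the demonstration, generate values such as `[random.randint(0, 99) for _ in range(1000)]` and call `plot_bar_chart(*count_by_range(values))`.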

 

Pie Chart Demonstration:

As with the bar chart, the pie chart shows the same result: the count of numbers in each range, drawn as a share of the whole.

[Figure: resulting pie chart]
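A pie chart of the same kind of counts could be sketched like this. The range width of 25 and the function names are assumptions; empty ranges are simply left out so that they do not produce zero-size slices.

```python
from collections import Counter


def range_counts(numbers, bin_width=25):
    """Map each non-empty range label (e.g. "0-24") to how many integers fall in it."""
    counter = Counter(n // bin_width for n in numbers)
    return {
        f"{b * bin_width}-{(b + 1) * bin_width - 1}": counter[b]
        for b in sorted(counter)  # keep the slices in range order
    }


def plot_pie_chart(counts):
    """Draw each range's share of all numbers as a pie slice."""
    import matplotlib.pyplot as plt  # imported here so counting works without matplotlib

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.axis("equal")  # keep the pie circular
    ax.set_title("Share of random numbers per range")
    plt.show()
```

The `autopct` format string labels each slice with its percentage, which is the main thing a pie chart adds over the bar chart of raw counts.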

References:

Sample Plots in Matplotlib website

BIG DATA DOMAIN

Big data has become a game changer in most, if not all, modern industries over the last few years. The importance of big data lies not in how much data we have, but in what we do with it.

Many industries are using big data, including:

  • Communications, Media and Entertainment
  • Banking
  • Securities
  • Education
  • Healthcare Providers
  • Insurance
  • Government, etc.

Among all these domains, I chose Insurance, because big data has been used in insurance to provide customer insights for transparent and simpler products by analyzing and predicting customer behavior from the data collected.

Nowadays, the opportunity to use big data is increasingly appealing across new areas of this industry. Big data technologies are used to assess risk, process claims, and enhance customer experience, allowing insurance companies to achieve higher predictive accuracy.


Hello, welcome to my blog!

The following are the references that helped me in my coursework (Internet of Things):

From Temperature & Humidity Sensor

From Digital Control with PI

Setting PI as WiFi Access point

IN CLASS HOSTAP SETUP

SETTING HOSTAP & HTTP SERVER ON PI

NODEMCU Experiments

Domain Discussions

Final Project- Smart Parking

Final Project- Smart Parking

Project Information

Complete project details can be found at the links given below:

Poster Presentation of the Project

[Figure: project poster]

The Application Process and Architecture of Smart Parking Application

The circuit connections are shown in the following figure.

[Figure: circuit connection]

Results

After the user logs in to the website, the number of available slots is displayed based on the location of the parking area, as shown in the following figure.

[Figure: available parking slots displayed after login]

When the user reserves a slot, the user is redirected to a success page, and the number of available slots decreases by one. The result is shown in the following figure.

[Figure: reservation success page]

The output of the sensor code is shown in the following figure.

[Figure: sensor code output]

Demonstration

This is the demonstration that I presented in the class.

Link to Software

I used Microsoft Visual Studio, an IDE from Microsoft used to develop programs as well as websites, web apps, web services, and mobile apps. I used C# and the .NET stack for programming in the IDE. The code for the web application is in my GitHub account. The IDE is shown in the following figure.

[Figure: Visual Studio IDE]

To deploy and manage all my applications and services, I used Microsoft Azure, a cloud computing service. It can create websites in PHP, ASP.NET, Node.js, or Python, or deploy one of several open-source applications from a gallery. This is one aspect of the platform-as-a-service offering of the Microsoft Azure platform.

[Figure: Microsoft Azure deployment]

The code for working with the RFID reader and the distance sensor is in my GitHub account.

Conclusion

“Smart Parking” determines whether a parking slot is empty or filled using a distance sensor, which sends the information to servers deployed in the cloud (Azure); the availability is then displayed through the web application using RESTful web services. Security is provided by an RFID reader, which allows only registered users to park their vehicles in the slot.

Future Work

I would like to enhance this project by adding features such as pre-booking a parking slot, dynamic pricing, payment methods for private parking areas, and notifying a towing company. I could also combine the same sensors with a Pi camera and a buzzer to build a parking assistant for the car, which would display the distance to the car behind you and alert you.

Personal Insight

This course gave me the opportunity to gain IoT domain knowledge; I did not know before that IoT is such a big field and that the world is moving toward it. I also deployed databases and web services to the cloud for the first time. Finally, I implemented a single application (a web application) using different technology stacks (Python, C#).


Other Details of the Project Report

Service Specification:

Service specifications define the services in the IOT system.

[Figure: service specification]

Functional View Specification:

The functional view defines the functions of the IoT system grouped into various functional groups. Several of these functional groups build on each other, following the relations identified in the IoT domain model.

The functional groups involved in the functional view are shown in the figure below:

[Figure: functional groups]

Devices: I am going to use a Raspberry Pi and an Arduino as computing devices, and a PIR motion sensor to implement my system.

Communication: The communication block handles communication for the IoT system. Communication protocols allow devices to exchange data over the network. I will be using WiFi at the link layer to send data over the network.

Services: The service functional group includes various services involved in the IoT system such as services for device monitoring, device control services and data publishing services. In my system, I am going to use native service called the controller service and the web services.

Application: The application functional group provides an interface to the users to control and monitor various aspects of the IOT system. I am going to use a web application to display the sensor values.

Operational View Specification:

The operational view specifications for Smart Parking are as follows:

Devices: The computing devices I am going to use are a Raspberry Pi and an Arduino, and the sensor I am going to use is a PIR motion sensor.

Communication Protocols: link layer: WiFi; network layer: IPv4; application layer: web application. All these protocols are shown in the figure above.

Services: Controller service that is hosted on my device runs as a native service.

Application: Web application.

Security: Authentication and authorization.

Management: Raspberry Pi device management.

Device & Component Integration:

In this step, we integrate the devices and components. The devices and components used in Smart Parking are a Raspberry Pi and an ultrasonic distance sensor. The schematic diagram of Smart Parking is shown in the figure below:

[Figure: schematic diagram of Smart Parking]

 

Fundamental Parts of the Final Project

Purpose & Requirements Specification

Purpose of the project:

The purpose of Smart Parking is to help people locate available parking spots and reserve one.

Behaviour:

The user downloads and installs this application on a mobile phone and can then check parking availability or see the next time a parking spot is expected to become available. This parking system can be installed anywhere in the city where parking is in demand, such as shopping malls, airports, and hotels.

Hardware Requirements:

  • Raspberry Pi
  • Ultrasonic Distance Sensor
  • Two-way LED
  • 1k Ohm Resistor
  • Jumper wires

System Management Requirement:

This system provides remote monitoring and control functions. Depending on the size of the institution, an administrator is appointed to monitor the sensors, payment issues, and timing issues (such as towing a vehicle that is not moved out in time). The administrator also monitors the accuracy of the system and updates information accordingly.

Data Analysis Requirement:

The system will maintain a repository of information to support decision making on pricing and planning. The system will also calculate how long the user occupied the parking slot and report it to the user.

Application Deployment Requirement:

Smart parking detectors can be installed anywhere. In an institution, the Smart Parking System is installed incrementally, block by block, instead of closing all the parking at once; depending on the size of the institution, installing each block may require shutting it down for a few hours. An Internet connection, via WiFi or Ethernet, is mandatory for these devices. They fetch information from, and update it in, a database located on a web server. The user must be connected to the Internet to get updates on available parking slots.

Process Specification:

Use case diagram:

[Figure: use case diagram]

Sensor level diagram:

[Figure: sensor level diagram]

The sensor continuously checks for the presence of a vehicle in the parking slot. If its state changes, it updates the details in the database: when the slot becomes empty, it adds one to the total availability count, and when the slot becomes occupied, it subtracts one. When the parking area is full, the system checks the expiry timestamp (estimated time to vacate) in the database to determine when the next parking slot will become available.

If the distance calculated by the sensor is more than 50 cm, the parking slot is available; if it is less than that, the slot is occupied.
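The sensor logic above can be sketched in Python. This is a hypothetical illustration, not the project's sensor code: it assumes an HC-SR04-style ultrasonic sensor that reports the round-trip echo time, uses an in-memory count in place of the database update, treats a reading of exactly 50 cm as available, and leaves out both the GPIO reading and the expiry-timestamp check so the logic can run anywhere.

```python
SPEED_OF_SOUND_CM_PER_S = 34300  # about 343 m/s in air at roughly 20 °C
OCCUPIED_THRESHOLD_CM = 50       # threshold from the description above


def echo_to_distance_cm(echo_duration_s):
    """Convert a round-trip echo time (seconds) into a one-way distance in cm."""
    # The pulse travels to the obstacle and back, so halve the total path.
    return echo_duration_s * SPEED_OF_SOUND_CM_PER_S / 2


def is_occupied(distance_cm):
    """A slot counts as occupied when the measured distance is below the threshold."""
    return distance_cm < OCCUPIED_THRESHOLD_CM


def update_availability(available, was_occupied, now_occupied):
    """Change the availability count only when the slot's state changes."""
    if was_occupied and not now_occupied:
        return available + 1  # a car left: one more free slot
    if now_occupied and not was_occupied:
        return available - 1  # a car arrived: one fewer free slot
    return available          # no state change, nothing to update


if __name__ == "__main__":
    available = 3
    was_occupied = False
    # Hypothetical echo times: empty slot, car parks, car stays, car leaves.
    for echo_s in [0.010, 0.002, 0.002, 0.010]:
        now_occupied = is_occupied(echo_to_distance_cm(echo_s))
        available = update_availability(available, was_occupied, now_occupied)
        was_occupied = now_occupied
        print(f"echo={echo_s}s occupied={now_occupied} available={available}")
```

Updating the count only on a state change, rather than on every reading, keeps repeated readings of a parked car from decrementing the availability more than once.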

Process Diagram:

The process diagrams for Smart Parking are as shown below:

[Figure: process diagram]

The process flow diagram shows the relationships between the major components of the system. It can also tabulate process design values for the components in different operating modes.

In Smart Parking, users first request the available parking slots through a web page. The system checks the database for available parking slots and confirms a slot to the user accordingly. Payment is controlled by the administrator.
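The request-check-confirm flow above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the actual project used a C#/.NET web application on Azure, and the `slots` table, its columns, and the function names here are hypothetical stand-ins for that database.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical table of parking slots."""
    conn.execute(
        "CREATE TABLE slots (slot_id INTEGER PRIMARY KEY, location TEXT NOT NULL, "
        "occupied INTEGER NOT NULL DEFAULT 0, reserved_by TEXT)"
    )


def available_slots(conn, location):
    """Return the IDs of slots at a location that are neither occupied nor reserved."""
    rows = conn.execute(
        "SELECT slot_id FROM slots WHERE location = ? AND occupied = 0 "
        "AND reserved_by IS NULL ORDER BY slot_id",
        (location,),
    )
    return [row[0] for row in rows]


def reserve_slot(conn, location, user):
    """Reserve the first free slot; return its ID, or None when the location is full."""
    free = available_slots(conn, location)
    if not free:
        return None
    with conn:  # commit the update as one transaction
        cur = conn.execute(
            # Re-check reserved_by so two concurrent users cannot take the same slot.
            "UPDATE slots SET reserved_by = ? WHERE slot_id = ? AND reserved_by IS NULL",
            (user, free[0]),
        )
    return free[0] if cur.rowcount == 1 else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany("INSERT INTO slots (location) VALUES (?)", [("Mall",), ("Mall",)])
    print(reserve_slot(conn, "Mall", "alice"))  # 1
    print(available_slots(conn, "Mall"))        # [2]
```

Confirming the slot with a conditional `UPDATE` and checking `rowcount` means a reservation only succeeds if the slot was still free at the moment of writing.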

Domain Model Specification:

The domain model specification of Smart Parking is as shown below:

[Figure: domain model]

IoT Level Specification:

I am going to implement Smart Parking as an IoT Level-4 system. It uses a single level of sensors to collect data and stores the data in a cloud database. It also performs local analysis and uses REST services.

[Figure: IoT level diagram]

 

 

SETTING HOSTAP & HTTP SERVER ON PI

I followed the same steps as in the IN CLASS HOSTAP SETUP post to set up HOSTAP.

 

To access the WiFi network, connect to it with the following credentials:

network name (SSID): pavani

password: raspberry

The figure below shows the WiFi connection on the laptop.

[Figure: WiFi connected on the laptop]

The following figure shows the wifi connection on the mobile phone.

[Figure: WiFi connection on the mobile phone]

Accessing WiFi from the Arduino

Upload the code from the GitHub link and open the serial monitor.

Code: GitHub

The output is as shown in the following figure.

[Figure: Arduino WiFi output on the serial monitor]

Arduino on Pi

1. The first step is to prepare the Pi for the Arduino environment by updating it with the following commands:
  • $ sudo apt-get update
  • $ sudo apt-get dist-upgrade

2. Install the Arduino IDE with:

  • $ sudo apt-get install arduino (press Y to accept any dependencies)

3. To disable the serial login (on the Raspberry Pi 3):

  • $ sudo systemctl stop serial-getty@ttyS0.service
    $ sudo systemctl disable serial-getty@ttyS0.service

4. Next, disable the boot info on the serial console. To do this, edit the /boot/cmdline.txt file and delete “console=serial0,115200”, so that the line looks like this:

  • dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p6 rootfstype=ext4 elevator=deadline rootwait

5. To link the serial port to the Arduino IDE, create a permanent link that maps ttyAMA0 to ttyS0.

  • Create a new file with “$ sudo nano”.
  • In the new file, type:
    • KERNEL=="ttyAMA0", SYMLINK+="ttyS0", GROUP="dialout", MODE:="0666"
      KERNEL=="ttyACM0", SYMLINK+="ttyS1", GROUP="dialout", MODE:="0666"

and save it in /etc/udev/rules.d/ (udev only reads files whose names end in .rules).

6. We are almost there. Now we need to set up the Reset (DTR) pin so that the Arduino resets automatically when a sketch is uploaded from the Pi, instead of relying on the serial cable. Download the avrdude-rpi files from GitHub. Select Download ZIP, which downloads all the files to the /home/pi directory.

To download and unzip the files:

  • $ wget https://github.com/PavaniJagarlamudi/avrdude-rpi/archive/master.zip
    $ sudo unzip master.zip

To copy the files into /usr/bin and keep the original avrdude as a backup:

  • $ cd ./avrdude-rpi-master/
    $ sudo cp autoreset /usr/bin
    $ sudo cp avrdude-autoreset /usr/bin
    $ sudo mv /usr/bin/avrdude /usr/bin/avrdude-original

Then link the new avrdude-autoreset to avrdude so that the new version runs instead:

  • $ sudo ln -s /usr/bin/avrdude-autoreset /usr/bin/avrdude

7. Add the Sleepy Pi to the Arduino environment. To do this, create the sketchbook folder and then the “hardware” and “Sleepy_pi” folders inside it:

  • $ mkdir /home/pi/sketchbook
    $ mkdir /home/pi/sketchbook/hardware
    $ mkdir /home/pi/sketchbook/hardware/Sleepy_pi
  • Then create a file named “boards.txt” in the Sleepy_pi folder with the following contents:
    • sleepypi.name=Sleepy Pi
      sleepypi.upload.protocol=arduino
      sleepypi.upload.maximum_size=30720
      sleepypi.upload.speed=57600
      sleepypi.bootloader.low_fuses=0xFF
      sleepypi.bootloader.high_fuses=0xDA
      sleepypi.bootloader.extended_fuses=0x05
      sleepypi.bootloader.path=arduino:atmega
      sleepypi.bootloader.file=ATmegaBOOT_168_atmega328_pro_8MHz.hex
      sleepypi.bootloader.unlock_bits=0x3F
      sleepypi.bootloader.lock_bits=0x0F
      sleepypi.build.mcu=atmega328p
      sleepypi.build.f_cpu=8000000L
      sleepypi.build.core=arduino:arduino
      sleepypi.build.variant=arduino:standard

8. Finally, reboot the Pi to load all the changes.

  • $ sudo reboot

HTTP Server on PI

We can use a web server on the Pi to host a full website or to display information we wish to share with other machines on our network. Various web servers are available, with different advantages and uses. Among them, I chose Apache, a popular web server application, and installed it on the Pi.

  • Install the Apache2 package with “sudo apt-get install apache2 -y”.
  • To test the web server:
    • Apache installs a default test HTML page (/var/www/html/index.html), so when we type our Pi's IP address into a browser (like http://198.162.10.22), we get the default web page shown in the following figure.

[Figure: Apache default “It works” page]

If we see the default page in the browser, it means that the Apache is working.
