Abstract: This paper presents Web Logs Streaming, a system for analyzing both streaming and batch data. It reads data from a web server and analyzes it according to the scenario. The system implements streaming data analysis for error extraction: it identifies the types of errors in log files and stores the results on a single host. The solution streams real-time error logs or the IP addresses of the systems accessing the website. Using Spark processing logic, it produces a file containing the error-type keywords used for error identification or the IP addresses of the clients accessing the website. After processing, the result file is loaded into an Oracle table on AWS. The processing logic is written in Scala on the Spark ecosystem, using Spark SQL and Spark Streaming to obtain the desired output. Kafka is used to send and receive the data from the web server, and it relies on ZooKeeper for coordination. A Kafka producer publishes the data from the input file to a given topic; a Kafka consumer subscribed to that topic receives the data and passes it to Spark Streaming, which creates a DStream and processes the data seamlessly.
Keywords: Web Logs, Hadoop, Scala, Spark Streaming, Kafka, ZooKeeper.
Title: Web Log Analysis
Author: G.Santhoshi, H.Meenal, Dr.D.Shravani
International Journal of Interdisciplinary Research and Innovations
ISSN 2348-1218 (print), ISSN 2348-1226 (online)
Research Publish Journals
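
The keyword-based error identification and IP address extraction described in the abstract can be illustrated with a minimal sketch in plain Scala. It is not the authors' implementation: the object name `WebLogSketch`, the keyword list, and the assumed log line layout are hypothetical, since the paper's abstract does not specify them. The sketch uses only the Scala standard library so that it runs without Spark or Kafka.

```scala
// Hypothetical sketch: names, keywords, and log layout are assumptions,
// not taken from the paper.
object WebLogSketch {
  // Assumed error-type keywords; the abstract does not list the real ones.
  val errorKeywords: Seq[String] = Seq("ERROR", "WARN", "FATAL")

  // Matches an IPv4-looking token anywhere in a line.
  // It does not check that each octet is in the range 0-255.
  private val ipPattern = """\b(?:\d{1,3}\.){3}\d{1,3}\b""".r

  /** Returns the first configured keyword found in the line, if any. */
  def errorType(line: String): Option[String] =
    errorKeywords.find(keyword => line.contains(keyword))

  /** Returns the first IPv4-looking token in the line, if any. */
  def clientIp(line: String): Option[String] =
    ipPattern.findFirstIn(line)

  /** Counts lines per error type; lines without a keyword are ignored. */
  def countByErrorType(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => errorType(line).toList)
      .groupBy(identity)
      .map { case (keyword, hits) => keyword -> hits.size }
}
```

In a streaming job, functions like these could plausibly be applied to each record of a DStream through its `map` and `filter` operations, with the aggregated result then written to the target table; that wiring is left out here because it depends on cluster and connection details the abstract does not give.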