VISVESVARAYA TECHNOLOGICAL UNIVERSITY Jnana Sangama, Belagavi-590018
A Project Report On
“MOVIE RECOMMENDER SYSTEM USING MACHINE LEARNING”
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering
In
Information Science and Engineering
Submitted by
Jones G
1HK16IS116
Zahra Fathima
1HK16IS110
Uzma Farheen
1HK16IS107
Zaiba Afreen
1HK16IS111
Under the guidance of
Prof. Devi Sivasankari
Associate Professor, Department of Information Science & Engineering
2019-2020
Department of Information Science & Engineering
HKBK COLLEGE of ENGINEERING
(Accredited by NAAC, Approved by AICTE & Affiliated to VTU)
22/1, Nagawara, Arabic College Post, Bangalore-45, Karnataka
Email: [email protected] URL: www.hkbk.edu.in
HKBK COLLEGE of ENGINEERING
Nagawara, Bangalore-560045
Accredited by NAAC, Approved by AICTE & Affiliated to VTU
Department of Information Science and Engineering
Certificate
Certified that the Project Work entitled “Movie Recommender System Using Machine Learning” was carried out by Jones G (1HK16IS116), Zahra Fathima (1HK16IS110), Uzma Farheen (1HK16IS107), and Zaiba Afreen (1HK16IS111), who are bonafide students of HKBK COLLEGE of ENGINEERING, in partial fulfillment for the award of Bachelor of Engineering in Information Science and Engineering of the Visvesvaraya Technological University, Belagavi, during the year 2019-20. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited in the departmental library. The project report has been approved as it satisfies the academic requirements in respect of 15ISP85 - Evaluation of Project Work and Viva-voce prescribed for the said Degree.
Prof. Devi Sivasankari Guide
Dr. A Syed Mustafa
HOD
Prof. Naseela Jehan Co-Guide
Name of the Examiners
Dr. M S Bhagyashekar
Principal
Prof. Devi Sivasankari
Coordinator
External Viva
1. 2.
Signature with Date
ACKNOWLEDGEMENT
We would like to express our regards and acknowledgement to all who helped us in completing this project successfully. First of all, we take this opportunity to express our heartfelt gratitude to the personalities of HKBK College of Engineering, Mr. C M Ibrahim, Chairman, HKBKGI and Mr. Faiz Mohammed, Director, HKBKGI, for providing facilities throughout the course. We express our sincere gratitude to Dr. M S Bhagyashekar, Principal, HKBKCE, for his support, which inspired us towards the attainment of knowledge. We consider it a great privilege to convey our sincere regards to Dr. A Syed Mustafa, Professor and HOD, Department of ISE, HKBKCE, for his constant encouragement throughout the course of the project. We would especially like to thank our guide, Prof. Devi Sivasankari, Associate Professor, Department of ISE, for his vigilant supervision and his constant encouragement. He spent his precious time reviewing the project work and provided many insightful comments and constructive criticism. Finally, we thank the Almighty, all the staff members of the ISE Department, our family members and friends for their constant support and encouragement in carrying out the project work.
Jones G
1HK16IS116
Zahra Fathima
1HK16IS110
Uzma Farheen
1HK16IS107
Zaiba Afreen
1HK16IS111
ABSTRACT
A framework for deep learning-based movie recommender systems, DeepMovRS, is proposed. The framework accepts various heterogeneous inputs, ranging from user and movie entities and their knowledge to external and implicit feedback. To keep the deep architecture of the framework unified, so that retrieving and ranking movies is easier, it uses suitable machine learning tools to improve the quality of recommendations. The framework is also flexible and modular, and it can be generalized and distributed easily, which makes it a rational choice for movie recommender systems. It can further be extended to other entities.
TABLE OF CONTENTS
ACKNOWLEDGEMENT  iii
ABSTRACT  iv
TABLE OF CONTENTS  v-vi
LIST OF FIGURES  vii
CHAPTER 1  Introduction  01-02
1.1  Introduction  1
1.2  Objectives  2
CHAPTER 2  Literature Survey  03-06
2.1  Literature Survey  3
2.2  Problem Statement  5
2.3  Existing System  5
2.4  Proposed System  6
CHAPTER 3  Requirements  07-10
3.1  Software Requirements  07
3.2  Hardware Requirements  07
3.3  Specific Requirements  07
3.4  Functional Requirements  07
3.5  Non-Functional Requirements  08
3.6  Safety and Security Requirements  08
3.7  Software Quality Attributes  09
CHAPTER 4  System Design  11-14
4.1  Introduction  11
4.2  Existing System  11
4.3  Proposed System  12
4.4  System Architecture  13
4.5  Data Flow Diagram  13
4.6  Sequence Diagram  14
4.7  Use Case Diagram  14
CHAPTER 5  Implementation  15-20
5.1  Introduction  15
5.2  Software Implementation  16
CHAPTER 6  System Testing  21-25
6.1  Test Objective  21
6.2  Testing Principle  21
6.3  Testing Design  21
6.4  Testing Strategies  22
6.5  Testing Methodology  23
CHAPTER 7  Snapshots  26-30
7.1  Experiment Analysis  26
7.2  Software Interfaces  26
Conclusion  31
Literature survey and Implementation paper  32
References  36
LIST OF FIGURES
Figure No.  Figure name  Page No.
4.1  System Architecture  1
4.2  Data Flow Diagram  4
4.3  Sequence Diagram  5
4.4  Use Case Diagram  6
5.  Fast Search OCR  7
7.1  Home Page  7
7.2  Movies Page  16
7.3  Sign Up Page  18
7.4  Login Page  20
7.5  Detail Description  21
7.6  Recommendations  22
7.7  Movies Database  23
7.8  User Database  24
7.9  Ratings  26
Movie Recommender System Using Machine Learning
Introduction
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
This system provides secure movie recommendations and spares movie enthusiasts the tedious process of searching for films. The first step is to log in so that the user can be verified; recommendation does not start unless the user is logged in and has rated at least one movie. A movie recommendation application has two entities: users and items. This paper focuses on movie recommender systems, which are a core functionality of websites and e-commerce applications; here, the items are movies. The framework aims to overcome drawbacks such as scalability, sparsity and cold-start problems. Although it is intended for movie recommender systems, it can easily be extended to other domains, such as a hospital recommendation system. In movie recommender systems, users have preferences for certain items, and these preferences must be obtained from the data [8]. One main difficulty is that designing features manually (e.g. genre in movie recommenders), especially for a huge number of items, is intractable. Machine learning plays an important role in addressing this issue. Deep learning, a recently emerging branch of machine learning within artificial intelligence, offers an approach well suited to recommender systems. In this paper, we propose a novel unified framework which has certain advantages over current frameworks and which further evolves the recommendation system, in this case a movie recommendation system.
Motivation to choose the project
What today's audience wants is a movie that entertains them or conveys a message relevant to their social life. Audiences waste their money on movies that are not good enough, and the awards given to a movie are based on a numerical rating given by users. An approach is needed that can find the weightage of a movie with respect to direction, production, actor, actress, songs and dialogues by performing feature vector computation and sentiment analysis, both of which are data mining techniques.
Dept. Of ISE, HKBKCE
1
2019-2020
1.2 OBJECTIVES
[1] Design and development of a Review Collection Algorithm, which collects reviews from IMDB using a web crawler (a data mining technique) or accepts customized reviews submitted in the application.
[2] Design and development of a Data Cleaning Algorithm, which removes unwanted data known as stop words.
[3] Design and development of a Frequency Algorithm, which obtains the frequency of each token in a review.
[4] Design and development of Feature-based Frequency Computation, which obtains the frequency per feature across all movies and all reviews. The feature can be direction, production, etc.
[5] Design and development of a Sentiment Analysis Algorithm for each of the features.
[6] Design and development of a Feature Extraction Matrix (FEM) generation algorithm. In the FEM, each row is an observation for a movie and each column represents a feature.
[7] Design and development of a Ranking Algorithm, which ranks the movies by matching the search criteria against the FEM.
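Objectives [2] to [4] can be illustrated with a minimal sketch. The stop-word list, the feature keywords and the function names below are assumptions made for illustration; they are not taken from the project code.

```python
import re
from collections import Counter

# Hypothetical stop-word list and feature keywords, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "of", "in", "this"}
FEATURE_KEYWORDS = {
    "direction": {"direction", "director", "directed"},
    "songs": {"song", "songs", "music"},
    "dialogues": {"dialogue", "dialogues"},
}


def clean_review(text):
    """Lower-case the review, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def token_frequency(tokens):
    """Count how often each token occurs in one cleaned review."""
    return Counter(tokens)


def feature_frequency(reviews):
    """Count keyword hits per feature across all reviews."""
    counts = Counter()
    for review in reviews:
        freq = token_frequency(clean_review(review))
        for feature, keywords in FEATURE_KEYWORDS.items():
            counts[feature] += sum(freq[k] for k in keywords)
    return counts


reviews = [
    "The direction was brilliant and the songs were catchy.",
    "Weak dialogues, but the director saved this movie.",
]
print(feature_frequency(reviews))
```

In this sketch a feature is matched through a small keyword set; a real system would likely need a richer vocabulary per feature.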
CHAPTER 2
LITERATURE SURVEY
2.1 LITERATURE SURVEY
[1] Deep Learning, Nature
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
[2] Distributed representations of words and phrases and their compositionality
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
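The subsampling of frequent words mentioned in [2] can be sketched as follows. The discard probability 1 - sqrt(t / f(w)) is the formula given in the cited paper, where f(w) is the fraction of the corpus made up of word w and t is a threshold; the function names, the toy corpus and the threshold value used here are assumptions for illustration.

```python
import math
import random
from collections import Counter


def discard_probability(word_fraction, threshold=1e-5):
    """Probability of dropping one occurrence: 1 - sqrt(t / f), floored at 0."""
    return max(0.0, 1.0 - math.sqrt(threshold / word_fraction))


def subsample(tokens, threshold=1e-5, seed=0):
    """Randomly drop occurrences of frequent words; rare words are always kept."""
    rng = random.Random(seed)
    total = len(tokens)
    counts = Counter(tokens)
    kept = []
    for tok in tokens:
        p_drop = discard_probability(counts[tok] / total, threshold)
        if rng.random() >= p_drop:
            kept.append(tok)
    return kept


# Toy corpus: one very frequent word and two rare ones.
corpus = ["the"] * 98 + ["air", "canada"]
kept = subsample(corpus, threshold=0.01)
print(len(corpus), len(kept))
```

Words whose corpus fraction is at or below the threshold get a discard probability of zero, so only the frequent word is thinned out.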
[3] Collaborative Deep Learning for Recommender Systems
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendation. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
[4] Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback.
Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modelling the key factor in collaborative filtering, the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance.
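The difference between the inner product used by matrix factorization and the multi-layer perceptron used in [4] can be sketched in a few lines. This is only an illustration of the idea: the embeddings and weights are made-up numbers rather than learned values, the layer sizes are arbitrary, and the output activation used for implicit feedback is omitted.

```python
def dot_score(user_vec, item_vec):
    """Matrix-factorization style score: inner product of the latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def mlp_score(user_vec, item_vec, hidden_w, hidden_b, out_w, out_b):
    """NCF-style score: concatenate both embeddings and pass them through an MLP."""
    x = user_vec + item_vec  # list concatenation
    h = relu(dense(x, hidden_w, hidden_b))
    return dense(h, [out_w], [out_b])[0]


# Made-up embeddings and weights; in practice these would be learned from data.
user = [0.5, 1.0]
item = [2.0, -1.0]
hidden_w = [[1.0, 0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [1.0, 1.0]
out_b = 0.0

print(dot_score(user, item))
print(mlp_score(user, item, hidden_w, hidden_b, out_w, out_b))
```

The inner product is fixed in form, while the MLP can represent other interaction functions depending on its weights.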
2.2 PROBLEM STATEMENT
What today's audience wants is a movie that entertains them or conveys a message relevant to their social life. Audiences waste their money on movies that are not good enough, and the awards given to a movie are based on a numerical rating given by users. An approach is needed that can find the weightage of a movie with respect to direction, production, actor, actress, songs and dialogues by performing feature vector computation and sentiment analysis, both of which are data mining techniques.
2.3 EXISTING SYSTEM
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics.
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. Extensions to this model improve both the quality of the vectors and the training speed: subsampling of the frequent words gives a significant speedup and yields more regular word representations, and negative sampling is a simple alternative to the hierarchical softmax.
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance.
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. Neural collaborative filtering develops techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback.
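The sparsity of ratings mentioned above can be quantified as the fraction of user-movie pairs that have no rating. The following is a minimal sketch with made-up data; the function name and the dictionary layout are assumptions.

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - len(ratings) / (n_users * n_items)


# Hypothetical ratings: (user, movie) -> rating; 3 users and 4 movies in total.
ratings = {
    ("u1", "m1"): 4.0,
    ("u1", "m3"): 5.0,
    ("u2", "m2"): 3.0,
}
print(sparsity(ratings, n_users=3, n_items=4))
```

Here 3 of the 12 possible pairs are rated, so three quarters of the matrix is empty.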
2.4 PROPOSED SYSTEM
In the proposed approach, the algorithm first gets the reviews of movies from the given URL, then parses and cleans them. It finds the positive and negative polarity of each review of the movie. The movie is then rated on its various attributes (such as direction, production, actors, songs and dialogues), and the overall sentiment distribution of the movie is provided. Today the sheer volume of data being generated is enormous, and making sense of that data is a tedious task. However, constant efforts and research in this area have led to the automation of the process to some extent. With this project, we aim to further this automation process. Using a combination of data aggregation techniques, NLP, linguistic analysis and popular visualization techniques, we generate visually appealing and easy to understand graphs which provide summarized feedback. This is done by performing detailed sentiment analysis on the data. The fields of opinion mining and sentiment analysis are distinct but deeply related. Opinion mining focuses on polarity detection (positive, negative or neutral), whereas sentiment analysis involves emotion recognition. The two are closely linked because detecting the polarity of text is often a step in sentiment analysis.
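Polarity detection and the overall sentiment distribution can be sketched with a simple word-list approach. This is a stand-in chosen for brevity, not necessarily the method used in the project; the lexicons and function names are assumptions.

```python
import re

# Tiny hypothetical lexicons; a real system would use a much larger resource.
POSITIVE = {"good", "great", "brilliant", "catchy", "excellent"}
NEGATIVE = {"bad", "weak", "boring", "poor", "dull"}


def polarity(review):
    """Label a review by comparing its positive and negative word counts."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(reviews):
    """Return the share of positive, negative and neutral reviews."""
    dist = {"positive": 0, "negative": 0, "neutral": 0}
    for review in reviews:
        dist[polarity(review)] += 1
    total = len(reviews) or 1  # avoid division by zero for an empty list
    return {label: count / total for label, count in dist.items()}


sample_reviews = [
    "Brilliant direction and great songs",
    "Boring and dull",
    "Watched it yesterday",
    "Good story",
]
print(sentiment_distribution(sample_reviews))
```

The same counting could be restricted to sentences that mention a given attribute in order to obtain a per-attribute distribution.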
CHAPTER 3
REQUIREMENTS
3.1 SOFTWARE REQUIREMENTS
Debian Linux OS / Windows OS
Atom
Coding Language: Python 3.7
Browser
Terminal
MySQL
SQLite Studio
Django, HTML, CSS, Bootstrap, JavaScript
3.2 HARDWARE REQUIREMENTS
2 GB RAM
Touchpad/Mouse
3 GB Disk Space
3.3 SPECIFIC REQUIREMENTS
The requirements specification lists every requirement needed for the development of the project. To derive the requirements, we need a clear and thorough understanding of the product to be developed. This is established after detailed communication with the project team and the users.
3.4 FUNCTIONAL REQUIREMENTS
Wi-Fi
Browser Compatibility
Internet Connectivity
3.5 NON-FUNCTIONAL REQUIREMENTS
Non-functional requirements are constraints on the functions offered by the system. They include timing constraints and constraints on the development process and standards. The non-functional requirements are given below:
Speed: The system should turn the given input into output within an appropriate time.
Ease of use: The software should be user-friendly, so that customers can use it easily without much training.
Reliability: The lower the rate of failures, the more reliable the system is.
Accuracy: The system should provide high accuracy in terms of giving regular and periodic updates.
Portability: It should be easy to deploy on any system.
3.5.1 Specific Requirements
The specific requirements are:
Hardware Interfaces: The external hardware interface used for indexing and searching is the personal computer of each client. The PCs may be laptops with wireless LAN, as the internet connections provided will be wireless.
Software Interfaces: Any version of Windows can be used.
Performance Requirements: The desktop on which the system runs should be at least a Pentium 4 machine so that the product performs optimally.
3.6 SAFETY AND SECURITY REQUIREMENTS
The most important factor in any system deployed in industries or homes is safety and security. Systems should be developed with the safety of the users and of the surroundings in mind. In the proposed system there are many solenoids that are used as Braille; users touch the Braille surface, which does not harm them. As for the audio part, the speakers are ordinary speakers of the kind used by everyone, and the sound level will also be normal.
3.7 SOFTWARE QUALITY ATTRIBUTES
Software quality attributes are used to gauge the performance of a product. We need to make sure that the software being developed is up to industry standards, and we have examined each of the attributes below to ensure that our system meets them.
Reliability
Reliability is a significant quality in any product. Since a product is used for a specific purpose, there is no point in using it if it is not reliable; people can go for other similar products. In the case of the system we are developing, the main aim is to increase the reliability of the existing systems by making use of our system, so as to provide a system that is as reliable as possible.
Maintainability
The system being developed should be easy to maintain. Any issues that occur should not cause large-scale damage, and any repairs should be easy to perform.
Usability
Usability is the factor that centres on convenience, that is, how easy it is for people to use the system.
Portability
Portability is one of the main concerns with any system. A system that can be moved to any place without problems has a great advantage. This is the case with our system: since it is small and can be moved around without any issues, and since no specific system needs to be connected to this hardware for it to work, we can say that the system is portable.
Correctness
The values displayed by the system and the truthfulness of the results shown are very important. If the system fails to show correct results or correct values, the entire reason for installing it is lost.
Efficiency
Efficiency is a major system quality attribute. If the system is not efficient, the whole point of introducing a solution to a problem is moot. A system designed for a specific problem has to be prepared for any efficiency problems that may occur. In our case, the system was designed so that existing systems can be incorporated alongside it, with the two complementing each other and thereby improving the efficiency of the whole setup. By merging the systems, the possible errors of the existing system are compensated for by this system and the drawbacks of the current system are overcome, thereby providing an effective solution to the problem.
CHAPTER 4
SYSTEM DESIGN
4.1 Introduction
A movie recommendation application has two entities: users and items. This paper focuses on movie recommender systems, which are a core functionality of websites and e-commerce applications; here, the items are movies. The framework aims to overcome drawbacks such as scalability, sparsity and cold-start problems. Although it is intended for movie recommender systems, it can easily be extended to other domains, such as a hospital recommendation system. In movie recommender systems, users have preferences for certain items, and these preferences must be obtained from the data [8]. One main difficulty is that designing features manually (e.g. genre in movie recommenders), especially for a huge number of items, is intractable. Machine learning plays an important role in addressing this issue. Deep learning, a recently emerging branch of machine learning within artificial intelligence, offers an approach well suited to recommender systems. In this paper, we propose a novel unified framework which has certain advantages over current frameworks and which further evolves the recommendation system, in this case a movie recommendation system.
4.2 Existing System
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics.
Extensions to the Skip-gram word-embedding model improve both the quality of the word vectors and the training speed: subsampling of the frequent words gives a significant speedup and yields more regular word representations, and negative sampling is a simple alternative to the hierarchical softmax.
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance.
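A conventional ratings-only CF method of the kind described above can be sketched as user-based collaborative filtering with cosine similarity. This is a generic textbook variant, not the project's algorithm; the users, movie names and ratings are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two rating dicts; unrated movies count as 0."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank unseen movies by the similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie in ratings[user]:
                continue  # skip movies the user has already rated
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]


# Made-up ratings: user -> {movie: rating}
ratings = {
    "alice": {"Movie A": 5, "Movie B": 1},
    "bob": {"Movie A": 5, "Movie B": 1, "Movie C": 5},
    "carol": {"Movie A": 1, "Movie B": 5, "Movie D": 4},
    "dave": {},
}
print(recommend(ratings, "alice"))
print(recommend(ratings, "dave"))
```

The user with no ratings receives an empty list, which is one concrete form of the cold-start problem.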
4.3 Proposed System
In the proposed approach, the algorithm first gets the reviews of movies from the given URL, then parses and cleans them. It finds the positive and negative polarity of each review of the movie. The movie is then rated on its various attributes (such as direction, production, actors, songs and dialogues), and the overall sentiment distribution of the movie is provided. Today the sheer volume of data being generated is enormous, and making sense of that data is a tedious task. However, constant efforts and research in this area have led to the automation of the process to some extent. With this project, we aim to further this automation process. Using a combination of data aggregation techniques, NLP, linguistic analysis and popular visualization techniques, we generate visually appealing and easy to understand graphs which provide summarized feedback. This is done by performing detailed sentiment analysis on the data. The fields of opinion mining and sentiment analysis are distinct but deeply related. Opinion mining focuses on polarity detection (positive, negative or neutral), whereas sentiment analysis involves emotion recognition. The two are closely linked because detecting the polarity of text is often a step in sentiment analysis.
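The per-attribute ratings described above can be arranged as a matrix with one row per movie and one column per feature, and movies can then be ranked against a user's search criteria. The sketch below assumes a weighted sum as the matching rule; the feature names, scores and weights are hypothetical.

```python
FEATURES = ["direction", "songs", "dialogues"]  # hypothetical feature columns

# Each row is one movie; each column is a made-up sentiment score in [0, 1].
fem = {
    "Movie A": [0.9, 0.4, 0.7],
    "Movie B": [0.5, 0.9, 0.6],
    "Movie C": [0.8, 0.8, 0.2],
}


def rank_movies(matrix, weights):
    """Rank movies by the weighted sum of their feature scores."""
    def score(row):
        return sum(weights.get(f, 0.0) * v for f, v in zip(FEATURES, row))
    return sorted(matrix, key=lambda movie: score(matrix[movie]), reverse=True)


# A user who only cares about direction, and one who prefers songs to dialogues.
print(rank_movies(fem, {"direction": 1.0}))
print(rank_movies(fem, {"songs": 0.7, "dialogues": 0.3}))
```

Features missing from the weights are treated as having zero weight, so a search can mention only the attributes the user cares about.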
4.4 System Architecture
Fig. 4.1 System Architecture
4.5 Data Flow Diagrams
Data flow diagrams (DFDs) are used to graphically represent the flow of data in a business information system. A DFD describes the processes that are involved in a system to transfer data from the input to the file storage and reports generation. Data flow diagrams can be divided into logical and physical. The logical data flow diagram describes the flow of data through a system to perform certain functionality of a business. The physical data flow diagram describes the implementation of the logical data flow.
[Figure: data flow between the User Module, the Movie Module and Recommendation]
Fig. 4.2 Data Flow Diagram
4.6 Sequence Diagram
Sequence diagrams are interaction diagrams that detail how operations are carried out. They capture the interaction between objects in the context of a collaboration. Sequence diagrams are time-focused: they show the order of the interaction visually by using the vertical axis of the diagram to represent time, indicating which messages are sent and when.
[Figure: sequence of interactions between the User Module, the Movie Module and Recommendation]
Fig. 4.3 Sequence Diagram
4.7 Use Case Diagram
[Figure: use cases Movie, Genre, Rating, Review and Recommend]
Fig. 4.4 Use Case Diagram
CHAPTER 5
IMPLEMENTATION
5.1 Introduction
The implementation stage is where we convert our design into a working real-world system. We need to put together all the details collected in the requirements and design stages and devise a plan to give shape to the system. With the designs and requirements in place, we implement the system by coding it according to the architecture and the functions and system properties devised in the sequence diagrams, and we ensure that all the use cases are covered by the implementation. The implementation of the project begins with the installation of the required software on a system that has the basic hardware discussed in Chapter 3.
Step-by-step implementation:
1. Open a terminal and create a virtual environment with the command: virtualenv .
2. Activate the virtual environment with the command: source bin/activate
3. Run the Django project with the command: python manage.py runserver
4. Open the browser and go to the URL: http://127.0.0.1:8000/
5. Browse through the movie list on the website.
6. Register and sign up with a mail ID and username.
7. Log in with the credentials and browse through the movie list.
8. A user must be logged in to rate a movie.
9. A user must be logged in and must have rated at least one movie to get recommendations.
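The conditions in steps 8 and 9 can be expressed as two small checks. This is a framework-independent sketch: the User class and function names are hypothetical and do not correspond to the project's Django models or views.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical stand-in for the application's user record."""
    username: str
    is_authenticated: bool = False
    ratings: dict = field(default_factory=dict)  # movie title -> rating


def can_rate(user):
    """Step 8: only a logged-in user may rate a movie."""
    return user.is_authenticated


def can_get_recommendations(user):
    """Step 9: the user must be logged in and have rated at least one movie."""
    return user.is_authenticated and len(user.ratings) >= 1


guest = User("guest")
new_user = User("new_user", is_authenticated=True)
rater = User("rater", is_authenticated=True, ratings={"Movie A": 4})

for u in (guest, new_user, rater):
    print(u.username, can_rate(u), can_get_recommendations(u))
```

In a Django view, the same conditions would typically be checked on the request's user object before the recommendation code is called.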
5.2 Software Implementation Code Snippets:
{# "staticfiles" was removed in Django 3.0; "static" provides the same tag #}
{% load static %}
{% block title %}Movies{% endblock title %}