This project showcases content-based (item-item) movie recommendation. First, the IMDb Top 250 movies are crawled with the BeautifulSoup package and stored as JSON. Then a cosine-similarity-based recommendation system is built using pandas, NumPy and scikit-learn.
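To make the idea concrete, here is a minimal sketch of item-item recommendation with cosine similarity. It is not the project's implementation: the project relies on scikit-learn, while this sketch uses only the standard library so it runs without extra dependencies. The toy catalogue, the bag-of-words features and the names `MOVIES`, `vectorize`, `cosine` and `recommend` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: title -> free-text description (not the crawled IMDb data).
MOVIES = {
    "Movie A": "crime drama prison guard",
    "Movie B": "crime drama courtroom trial",
    "Movie C": "animated family adventure",
}


def vectorize(text):
    """Turn a description into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(liked, movies=MOVIES, top_n=2):
    """Rank movies not in `liked` by their summed similarity to the liked ones."""
    vectors = {title: vectorize(text) for title, text in movies.items()}
    scores = {
        title: sum(cosine(vec, vectors[other]) for other in liked)
        for title, vec in vectors.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend(["Movie A"])` ranks "Movie B" first because it shares two of its four terms with "Movie A", while "Movie C" shares none.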
- Create an AWS access key and secret key.
- Create an S3 bucket.
- Set the environment variables in `docker-compose-recsys.yml` and `docker-compose-aws.yml`, changing them from the `None` value to appropriate values.
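A quick way to catch a setting that was left at `None` is a small check like the one below. This is only a sketch: it assumes the variables are named `ACCESS_KEY`, `SECRET_ACCESS_KEY` and `AWS_BUCKET_NAME` (the names used later in this README) and that they reach the process as ordinary environment variables. The helper `missing_settings` is hypothetical and not part of the repository.

```python
import os

# Assumed variable names, taken from the compose-file instructions in this README.
REQUIRED_VARS = ("ACCESS_KEY", "SECRET_ACCESS_KEY", "AWS_BUCKET_NAME")


def missing_settings(env=None):
    """Return the required variables that are unset, empty, or still the string 'None'."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if env.get(name) in (None, "", "None")]
```

For example, `missing_settings({"ACCESS_KEY": "abc", "SECRET_ACCESS_KEY": "None"})` reports the secret key and the bucket name as still unconfigured.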
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Follow `COMMON_STEP`
- Use `docker-compose`
docker-compose -f docker-compose-recsys.yml build
docker-compose -f docker-compose-recsys.yml up
- For Crawling :
curl -H 'Content-Type: application/json' -X POST -d {} http://localhost:6006/api/v1/schemas/crawl_imdb
- For Training :
curl -H 'Content-Type: application/json' -X POST -d {} http://localhost:6006/api/v1/schemas/train
- For Recommendation :
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://localhost:6006/api/v1/schemas/score
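The same three calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above. The helper names `build_request` and `call` are hypothetical, the 60-second timeout is an arbitrary choice, and decoding the reply as JSON is an assumption for the `crawl_imdb` and `train` endpoints, since only the `score` response format is shown in this README.

```python
import json
import urllib.request

# Base URL taken from the curl commands above.
BASE_URL = "http://localhost:6006/api/v1/schemas"


def build_request(endpoint, payload=None):
    """Build a JSON POST request for one of the service endpoints."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint, payload=None, timeout=60):
    """Send the request and decode the JSON reply (the service must be running)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers up, `call("crawl_imdb")`, `call("train")` and `call("score", {"movie_list": ["The Green Mile", "Witness for the Prosecution"]})` correspond to the crawling, training and recommendation steps in that order.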
- Choose EC2 instance Ubuntu 16.04 LTS - Xenial (HVM)
- Configure security group - SSH - custom, HTTP - anywhere
- Launch instance using key pair - tuhin-aws
- ssh into EC2 machine - `ssh -i "tuhin-aws.pem" ubuntu@ec2-54-234-224-219.compute-1.amazonaws.com`
sudo apt update --fix-missing
sudo apt install -y python3-pip
sudo apt install -y nginx
- Open the `nginx.conf` file → `sudo vi /etc/nginx/nginx.conf` → change the `user` directive to `user ubuntu;` and add `server_names_hash_bucket_size 128;` in the `http` block
- Open the `virtual.conf` file → `sudo vi /etc/nginx/conf.d/virtual.conf` → add
server {
    listen 80;
    server_name ec2-54-234-224-219.compute-1.amazonaws.com;
    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
sudo systemctl start nginx
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Follow `COMMON_STEP`
sudo pip3 install -r requirements.txt
cp ./rec_platform/deployment/app.py ./app.py
sudo systemctl restart nginx
gunicorn --pythonpath / -b localhost:8000 -k gevent -t 900 app:app -w 5 &
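In the gunicorn command, `app:app` means "the WSGI callable named `app` inside the module `app`", which is why `app.py` is copied to the working directory first. The sketch below shows what such a callable looks like in its simplest form. It is a hypothetical stand-in, not the repository's `app.py`: it only echoes the submitted `movie_list` instead of running the recommender, and the `call_locally` helper is an assumption added so the callable can be exercised without gunicorn or nginx.

```python
import io
import json


def app(environ, start_response):
    """Minimal WSGI callable of the kind gunicorn loads via `app:app`."""
    is_score = (
        environ.get("REQUEST_METHOD") == "POST"
        and environ.get("PATH_INFO") == "/api/v1/schemas/score"
    )
    if is_score:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        # Placeholder: echo the request instead of computing recommendations.
        body = {"movies": [], "received": payload.get("movie_list", [])}
        status = "200 OK"
    else:
        body = {"error": "not found"}
        status = "404 Not Found"
    data = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(data)))]
    start_response(status, headers)
    return [data]


def call_locally(path, payload):
    """Invoke the WSGI callable in-process and return (status, decoded JSON body)."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))
```

nginx listens on port 80 and proxies to `127.0.0.1:8000`, where gunicorn serves the callable, so a request to the EC2 host name ends up in a function shaped like `app` above.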
- On the local system:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should be:
{
    "movies": [
        "L.A. Confidential",
        "Salinui chueok",
        "Les diaboliques",
        "12 Angry Men",
        "Double Indemnity",
        "Chinatown",
        "On the Waterfront",
        "A Wednesday",
        "Se7en",
        "The Usual Suspects"
    ]
}
- On the remote system, run `pkill gunicorn` and `sudo systemctl stop nginx` if the service is no longer needed.
- Choose EC2 instance `Ubuntu 16.04 LTS - Xenial (HVM)`
- Configure security group - `SSH - custom`, `HTTP - anywhere`
- Launch instance using key pair - `tuhin-aws`
- ssh into EC2 machine - `ssh -i "tuhin-aws.pem" ubuntu@ec2-54-234-224-219.compute-1.amazonaws.com`
sudo apt update --fix-missing
sudo apt install -y docker.io
sudo apt install -y docker-compose
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Update `docker-compose-recsys.yml` with suitable `ACCESS_KEY`, `SECRET_ACCESS_KEY` and `AWS_BUCKET_NAME`. The port mapping should be `"80:6006"`
sudo docker-compose -f docker-compose-recsys.yml build
sudo docker-compose -f docker-compose-recsys.yml up
- On the local system:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should be:
{
    "movies": [
        "L.A. Confidential",
        "Salinui chueok",
        "Les diaboliques",
        "12 Angry Men",
        "Double Indemnity",
        "Chinatown",
        "On the Waterfront",
        "A Wednesday",
        "Se7en",
        "The Usual Suspects"
    ]
}
- Configure `aws` with `ACCESS_KEY` and `SECRET_ACCESS_KEY`
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
aws ecr create-repository --repository-name recsys-ubuntu
$(aws ecr get-login --no-include-email --region us-east-1)
docker build -t recsys-ubuntu -f Dockerfile.ubuntu .
docker tag recsys-ubuntu:latest 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu:latest
docker push 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu:latest
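The long image name in the `docker tag` and `docker push` commands follows the ECR registry pattern of account ID, region, repository and tag. The hypothetical helper below (not part of the repository) spells that pattern out, which can help when adapting the commands to a different account or region.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Compose the ECR image URI used by `docker tag` and `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

With the values from the commands above, `ecr_image_uri("184213940252", "us-east-1", "recsys-ubuntu")` reproduces the image name that is tagged and pushed.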
- Update `docker-compose-aws.yml` with suitable `ACCESS_KEY`, `SECRET_ACCESS_KEY` and `AWS_BUCKET_NAME`. The port mapping should be `"80:6006"`. `image` should be `184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu`
ecs-cli configure --region us-east-1 --cluster fastfilmz-analytics-cluster
ecs-cli up --keypair tuhin-aws --capability-iam --size 1 --instance-type t2.micro --force --cluster fastfilmz-analytics-cluster --region us-east-1
ecs-cli compose --project-name imdb-recsys --file docker-compose-aws.yml up
- In case of an outdated ECS agent:
aws ecs update-container-agent --cluster fastfilmz-analytics-cluster --container-instance bc7e2a68-1be6-48d2-85a6-7f08232f298b
- On the local system:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should be:
{
    "movies": [
        "L.A. Confidential",
        "Salinui chueok",
        "Les diaboliques",
        "12 Angry Men",
        "Double Indemnity",
        "Chinatown",
        "On the Waterfront",
        "A Wednesday",
        "Se7en",
        "The Usual Suspects"
    ]
}
- When done with the service:
ecs-cli down