This is a website that gathers today's top news from the following newspapers:
- 조선일보
- 경향신문
- BBC
- The Korea Times
- Washington Post
Moreover, you can write a comment or your thoughts on each news article. The top news is crawled every 3 hours, starting from the moment you start the server.
See here for demonstration.
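The three-hour crawl cycle mentioned above could be driven by a small loop like the one below. This is only a minimal sketch, not necessarily how this project schedules its crawler: `run_every`, `CRAWL_INTERVAL`, and `crawl_all_papers` are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper: call the block immediately, then again after every
# `interval` seconds, for at most `times` runs (nil means run forever).
# Returns the number of runs that were performed.
def run_every(interval, times: nil)
  runs = 0
  loop do
    yield
    runs += 1
    break if times && runs >= times
    sleep interval
  end
  runs
end

CRAWL_INTERVAL = 3 * 60 * 60 # three hours, in seconds

# In the app, this could wrap the real crawling code and run in a
# background thread started when the server boots, for example:
#   Thread.new { run_every(CRAWL_INTERVAL) { crawl_all_papers } }
# (crawl_all_papers is a hypothetical method that crawls every newspaper.)
```

Running the first crawl immediately and sleeping afterwards matches the "from the moment you start the server" behavior; a cron job or a scheduling gem would be a reasonable alternative.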
- Responsive Web Design
- One comment per news article
- Stores newspaper content on the server
- Ubuntu 16.04.2 LTS server
- ruby 2.3.3p222
- Rails 5.1.1
- Shim, Sehee (심세희)
- Computer Science and Engineering, SNU
- dgssh@naver.com / https://www.facebook.com/dgssh
- Lecture "Introduction to Web Programming" 2017 Spring, M1313.000200
- Teacher: withkali
Run the following commands in order:
- bundle install
- rake db:migrate
- rails s

Then it runs on localhost:3000
sqlite3 :: table papers with columns url:string, company:string, content:string, comment:text
If you want to collect news from another newspaper website, there are two steps: add the crawling code, then register the company name.
- Get the URL of the top news article of the newspaper you want to add
- Read the article content into the string article
- Save all the related data to the DB
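# Example koreatimes
require 'open-uri' # lets open(url) fetch HTTP pages on Ruby 2.3 (use URI.open on Ruby 2.5+)
require 'nokogiri'

# Find the link to the top article on the front page
paper_url = "http://www.koreatimes.co.kr/www/index.asp"
paper_data = Nokogiri::HTML(open(paper_url))
article_page_url = paper_data.css('.top1_headline').css('a')[0]['href']
article_page_url = "http://www.koreatimes.co.kr" + article_page_url

# Fetch the article page and collect its text spans into one string
article_page_data = Nokogiri::HTML(open(article_page_url))
article = ""
article_page_data.css('.view_article').css('span').each do |temp|
  article = article + " .. " + temp.text
end

# Save the result as a row in the papers table
paper = Paper.new
paper.url = paper_url
paper.company = "The Korean Times"
paper.content = article
paper.save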
#end of koreatimes
- Add the newspaper's company name to the array ary
<% ary = ["조선일보", "경향신문", "BBC", "The Korean Times", "Washington Post"] %>

- You can track and store the top news every 3 hours for your personal use.
- By writing a comment on each article, you can save your ideas or feelings about the news content.
- Currently the crawler is set up for only five newspaper websites (Chosun Ilbo, Kyunghyang Shinmun, BBC, The Korea Times, Washington Post). In the future, it might work for any newspaper website.
- For English-language newspapers, a word cloud of important keywords could be generated so that you can track the day's hot topics in a short time.
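
The word-cloud idea in the last item would need a keyword count per article as its input. The following is a minimal sketch of that counting step only, assuming plain English text such as the stored content string; `STOP_WORDS` and `top_keywords` are hypothetical names, and the stop-word list is deliberately tiny.

```ruby
# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = %w[the a an and or of to in on for is are was were it that this with as at by].freeze

# Count how often each non-stop word appears and return the `limit` most
# frequent ones as [word, count] pairs, most frequent first.
def top_keywords(text, limit: 10)
  words = text.downcase.scan(/[a-z']+/)
  counts = Hash.new(0)
  words.each do |word|
    # Skip very short words and common stop words
    next if word.length < 3 || STOP_WORDS.include?(word)
    counts[word] += 1
  end
  # Sort by descending count, then alphabetically for a stable order
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

sample = "The election results are in. Election officials said the results were final."
p top_keywords(sample, limit: 2)
# prints [["election", 2], ["results", 2]]
```

The resulting pairs could then be passed to whatever word-cloud renderer is chosen, with the count controlling the font size of each word.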


