Building a Reading-Highlights Reminder with Python

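We often find it hard to remember what we have read. As the figure above shows, what we read fades over time, and only repeated review makes it stick. I sometimes intend to review my notes regularly, but I keep forgetting to do it. It would help to have a system that keeps reminding me. I suspect this is a problem many people run into.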

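There are websites that already do this, such as readwise.io, which emails you a reminder every day. That made me wonder whether we could build something similar ourselves. Since I have been learning Python lately, let's try to implement it.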

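First, here is what we want to build:

  1. Find the notes and highlights in your dataset
  2. Send the selected notes to a specified email address
  3. Send the email at a time defined by the user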

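First we need some data, and that part has to be done by hand. I use a PDF reader that can export all annotations. I put them into an Excel sheet and then converted it to JSON. Here is a fragment of the JSON: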

# JSON data
{
    "Sheet1": [
        {
            "date_added": "May 12, 8:59 AM, by Ankush Garg",
            "source": "Book",
            "title": "Fundamentals of Software Architecture",
            "chapter": "N/A",
            "note": "N/A",
            "highlight": "The microkernel architecture style is a relatively simple monolithic architecture consisting of two architecture components: a core system and plug-in components.",
            "page_number": "Page 165",
            "has_been_chosen_before": "0",
            "id": "48"
        },
        {
            "date_added": "Apr 12, 10:50 AM, by Ankush Garg",
            "source": "Book",
            "title": "Genetic Algorithms with Python",
            "chapter": "Chapter 4: Combinatorial Optimization - Search problems and combinatorial optimization",
            "note": "N/A",
            "highlight": "A search algorithm is focused on solving a problem through methodic evaluation of states and state transitions, aiming to find a path from the initial state to a desirable final (or goal) state. Typically, there is a cost or gain involved in every state transition, and the objective of the corresponding search algorithm is to find a path that minimizes the cost or maximizes the gain. Since the optimal path is one of many possible ones, this kind of search is related to combinatorial optimization, a topic that involves finding an optimal object from a finite, yet often extremely large, set of possible objects.",
            "page_number": "Page 109",
            "has_been_chosen_before": "0",
            "id": "21"
        }
    ]
}
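
The manual Excel-to-JSON conversion could also be scripted. The following is a minimal sketch, not the method used in this article: it assumes the notes are first exported from Excel as CSV text whose header row matches the JSON keys, and `csv_to_sheet_json` is a hypothetical helper name.

```python
import csv
import io
import json


def csv_to_sheet_json(csv_text, sheet_name="Sheet1"):
    """Convert CSV text (header row = field names) into the {"Sheet1": [...]} layout."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value stays a string, matching the sample data (e.g. "id": "48")
    rows = [dict(row) for row in reader]
    return json.dumps({sheet_name: rows}, indent=4)


# Hypothetical two-column export, shortened for illustration
sample_csv = "title,page_number\nGenetic Algorithms with Python,Page 109\n"
print(csv_to_sheet_json(sample_csv))
```

Writing the returned string to data.json would give a file in the same shape as the fragment above.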

Next, we create a few files to handle each step:

database.py

This file is very simple: its read_json_data function loads the data stored locally so that the rest of the program can use it.

import json

# Ended up using http://beautifytools.com/excel-to-json-converter.php to convert Excel to Json

# URL where data is stored - local on my computer for now
url = '/Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/data/data.json'

def read_json_data():
    with open(url) as json_file:
        response = json.load(json_file)
    return response
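
`read_json_data` only loads the file, so any change made to the entries in memory (for example to the `has_been_chosen_before` field) is lost when the script exits. If such changes should persist, a write counterpart could look roughly like the sketch below. `write_json_data` and its `path` parameter are assumptions for illustration and are not part of the original code.

```python
import json
import os
import tempfile


def write_json_data(data, path):
    """Save `data` as JSON at `path` (hypothetical helper, not in the original)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then swap it in, so a crash mid-write
    # does not leave a truncated data file behind
    with tempfile.NamedTemporaryFile('w', dir=directory, delete=False,
                                     suffix='.tmp', encoding='utf-8') as tmp:
        json.dump(data, tmp, indent=4)
        tmp_path = tmp.name
    os.replace(tmp_path, path)
```

Calling `write_json_data(response, url)` after the entries have been updated would store the new values next to the rest of the data.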

selector_service.py

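This file first loads the data with the read_json_data function defined above and assigns it to self.raw_response. It then randomly selects three entries and stores them in self.sampled_object. For each selected entry we increment its selection count, so that the count can later be used to make weighted random selections.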

# This script reads in the data (a local JSON file for now) and selects highlights
import numpy as np
from database import read_json_data


def increment_has_chosen_before(item):
    count_now = int(item['has_been_chosen_before'])
    item['has_been_chosen_before'] = count_now + 1


class SelectorService:
    def __init__(self):
        self.raw_response = read_json_data() # Read in JSON data
        self.sampled_object = None
        self.sheet_name_to_sample_by = 'Sheet1'
        self.num_of_entries_to_sample = 3 # Number of entries to select

    def select_random_entries(self):
        # Randomly choose entries from the dataset
        # replace=False so the same entry is not picked twice for one email
        self.sampled_object = np.random.choice(self.raw_response[self.sheet_name_to_sample_by],
                                               self.num_of_entries_to_sample,
                                               replace=False)

        # For each selection increment the field "has_been_chosen_before"
        # In the future can use probability to make selections to notes that haven't gotten selected
        for note in self.sampled_object:
            increment_has_chosen_before(note)
        return self.sampled_object
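
The comment in `select_random_entries` mentions using probabilities to favor notes that have not been selected yet. One possible way to do that, sketched with only the standard library, is to give each entry a weight of 1 / (1 + has_been_chosen_before) and draw without repetition. The weighting formula, the `pick_weighted` helper, and the entry with id "7" are assumptions for illustration; the article does not specify them.

```python
import random


def pick_weighted(entries, k, rng=random):
    """Pick up to k distinct entries, favoring those chosen less often before."""
    pool = list(entries)
    picked = []
    for _ in range(min(k, len(pool))):
        # Assumed weighting: an entry chosen n times before gets weight 1 / (1 + n)
        weights = [1.0 / (1 + int(e['has_been_chosen_before'])) for e in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        # Remove the picked entry from the pool so it cannot be drawn again
        picked.append(pool.pop(index))
    return picked


# Shortened entries; id "7" is made up for this example
notes = [
    {"id": "48", "has_been_chosen_before": "0"},
    {"id": "21", "has_been_chosen_before": "5"},
    {"id": "7", "has_been_chosen_before": "2"},
]
print([n["id"] for n in pick_weighted(notes, 2)])
```

With these weights an entry that has never been chosen is six times as likely to be drawn first as one that has been chosen five times, while every entry still has a chance of appearing.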

parse_content.py

This class takes the randomly selected entries from the previous step, stores them in self.sample_entries, and turns them into the text used for the email body through its parse_selected_entries method.

from selector_service import SelectorService


class ContentParser:
    def __init__(self):
        self.sample_entries = SelectorService().select_random_entries()
        self.content = None

    def parse_selected_entries(self):
        # Build one plain-text block per selected entry, ending with a divider line
        content = ''
        for entry in self.sample_entries:
            content += "DATE-ADDED: " + entry['date_added'] + "\n"
            content += "HIGHLIGHT: " + entry['highlight'] + "\n"
            content += "TITLE: " + entry['title'] + "\n"
            content += "CHAPTER: " + entry['chapter'] + "\n"
            content += "SOURCE: " + entry['source'] + "\n"
            content += "PAGE-NUMBER: " + entry['page_number'] + "\n"
            content += "------------" + "\n"
        self.content = content
        return self.content

mail_service.py

MailerService gets the content from parse_selected_entries above and stores it in self.content. define_email_parameters sets the email parameters, such as the subject, the sender, and the recipient.

# This service emails whatever it gets back from Content Parser
import smtplib
from email.message import EmailMessage

from parse_content import ContentParser


class MailerService:
    def __init__(self):
        self.msg = EmailMessage()
        self.content = ContentParser().parse_selected_entries()

    def define_email_parameters(self):
        self.msg['Subject'] = 'Your Highlights and Notes for today'
        self.msg['From'] = "example@gmail.com"  # your email
        self.msg['To'] = ["example@gmail.com"]  # recipient email

    def send_email(self):
        self.msg.set_content(self.content)
        # Gmail's SMTP server over SSL (port 465)
        with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
            # Placeholder credentials for the account that sends the email;
            # avoid hard-coding a real password in the script
            smtp.login("example@gmail.com", 'password')
            smtp.send_message(self.msg)
        return True

    def run_mailer(self):
        self.define_email_parameters()
        self.send_email()


def run_job():
    composed_email = MailerService()
    composed_email.run_mailer()


# Only send when the file is run as a script (as cron does), not when imported
if __name__ == '__main__':
    run_job()

Both methods are called from run_mailer, and the application is started by the run_job function at the bottom. Here is what the resulting email looks like:

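That's it, you can now send the email. The only thing left is how to invoke mail_service.py: we need something that runs it on a schedule. Here we use cron, a continuously running process that launches specified tasks at specified times. Add the following line to your crontab: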

0 19 * * * /Users/ankushgarg/.pyenv/shims/python /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/mail_service.py >> /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/cron.log 2>&1

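In short, this line runs the script once a day at 7 PM and appends its output and errors to cron.log. See https://crontab.guru/ for the details of the schedule fields.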
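
To make the schedule part of that line easier to read: a crontab entry starts with five fields (minute, hour, day of month, month, day of week), followed by the command. The small helper below is a hypothetical illustration only; it just labels those five fields for the expression used above and does not validate their values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def label_cron_fields(expression):
    """Map the five schedule fields of a cron expression to their names."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected exactly five schedule fields")
    return dict(zip(CRON_FIELDS, parts))


print(label_cron_fields("0 19 * * *"))
# {'minute': '0', 'hour': '19', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Read this as minute 0 of hour 19, on every day of the month, every month, and every day of the week.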

All the functionality is now in place. A few things are left for you to implement:

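  1. Preparing the data is currently manual. You could automate the parsing with Python, or build a web interface where you add entries yourself and then read them back through an API.
  2. The email body is fairly plain; you could format it with HTML.
  3. Use has_been_chosen_before to make a better random selection.
  4. Use NLP to analyze the content and generate summaries or overviews.
  5. Anything else you can think of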
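
As a starting point for item 2 above, the standard library's `EmailMessage` can carry an HTML version next to the plain-text one. This is a sketch under assumptions: `build_html_message` and the markup it produces are hypothetical, and the entry fields follow the JSON sample shown earlier.

```python
from email.message import EmailMessage
from html import escape


def build_html_message(entries, subject, sender, recipient):
    """Build an email with a plain-text body and an HTML alternative."""
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = recipient

    # Plain-text fallback for mail clients that do not render HTML
    plain = "\n\n".join(e['title'] + "\n" + e['highlight'] for e in entries)
    msg.set_content(plain)

    # Escape the text so characters like < or & in a highlight cannot break the markup
    blocks = "".join(
        "<h3>{}</h3><blockquote>{}</blockquote><p><small>{}</small></p>".format(
            escape(e['title']), escape(e['highlight']), escape(e['page_number']))
        for e in entries)
    msg.add_alternative("<html><body>" + blocks + "</body></html>", subtype='html')
    return msg
```

The returned message could then be passed to `smtp.send_message` in place of the plain-text message built in mail_service.py.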

Reference article: https://hackernoon.com/get-the-most-out-of-everything-you-read-using-python-kw1o3uiz
