Regular expressions are used very often in NLP. I usually use them to find well-defined patterns in text. Regex has lots of symbols and rules. Here I tabulate some frequently used symbols that I always need to look up online, selected from this webpage. Character Legend \d one digit from 0 to 9 […]
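As a quick illustration of the `\d` symbol, here is a minimal Python sketch. The sample sentence is made up for this example, and the `re.ASCII` flag is an assumption on my part: without it, Python's `\d` on a `str` pattern also matches non-ASCII Unicode digits, not only 0 to 9.

```python
import re

# Hypothetical sample text, invented for illustration only.
text = "Comment 12 was posted on 2016-04-07 and got 3 replies."

# \d matches a single digit; re.ASCII restricts it to the characters 0-9.
single_digits = re.findall(r"\d", text, flags=re.ASCII)

# \d+ matches a run of one or more consecutive digits.
numbers = re.findall(r"\d+", text, flags=re.ASCII)

# A well-defined pattern: a date written as YYYY-MM-DD.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text, flags=re.ASCII)

print(single_digits)
print(numbers)
print(dates)
```

`re.findall` returns every non-overlapping match as a list of strings, which is handy when the goal is simply to pull all occurrences of a pattern out of a text.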
I’ve procrastinated for so long. Let me write about what I’ve done so far. The first step was to convert the SANDAG public comments from PDF to CSV. Although there are some open-source tools for PDF-to-CSV conversion, the file I’m converting exceeds their file size limits. Therefore, I chose […]
http://cnpolitics.org/2016/04/jennifer-pan/ 2016 is the year we witnessed Brexit and the heated American election. For me, it’s also the year I became interested in politics, as I started working with Prof. Iris Hui on using data to analyze massive volumes of public comments online. I can’t help feeling that data + internet is […]
Goal: to build a tool that parses online comments on policies, to help political science researchers get a rough sense of public opinion.