
Saturday, September 5, 2020

A Weekend of Writing

Just a quick update post. I'm spending my weekend doing something I've wanted to do for years - I decided to join the International 3-Day Novel Contest. Every year, people around the world spend Labor Day weekend hunched over their computer or notebook, trying to write approximately 100 double-spaced pages (or more) of a complete novel. Writers submit their work, and in the Spring, the winner gets their book published by Anvil Press.

I'm stocked up on groceries, my dog is staying with a friend (who has also agreed to sign my witness affidavit attesting that I followed the contest rules - most importantly, that all writing occurred between 12:00 am Saturday and 11:59 pm Monday), and I've got 27 pages written. Let's do this.



Wednesday, August 12, 2020

Creating Things

 Normally, this time of year, we'd be getting excited for my choir's new season and rehearsals to begin in early September. Sadly, with the pandemic, it's unlikely we'll be getting together then, and I'm not sure how long it will take before it's safe and people begin feeling comfortable gathering in such a way. So I've been seeking out ways to keep some creativity in my life.

I've started drawing again, something I haven't done in years. I'm a bit rusty but hey - practice practice, right? I started with some pretty flowers from my parents' backyard, in a combination of soft chalk pastels (my favorite medium) and colored pencil:

And my next project is going to be a self-portrait, something I've never done before. Some early work with pencil that I'll fill in soon (thinking again a combo of colored pencil and chalk pastels):

I also had some fun putting together a Lego Architecture set of Paris:




What mainly sparked this round of creativity was writing and recording an arrangement for my choir's virtual benefit. I had so much fun with that, I'm going to keep doing it! I'm planning to share that video soon, and have also started recording some other a cappella arrangements I plan on sharing. 

And lastly, because I needed to bring Zep into the fun too, I've finally set up an Instagram for him. If you're on the 'gram, you can follow him here: https://www.instagram.com/zeppelinblackdog/

Friday, March 20, 2020

Talking with Numbers



Illinois residents have been asked to shelter-in-place. I've got plans for writing projects, gardening, and an office to clean/reorganize, so I should be able to keep busy when I'm not working (from home). A few of my writing friends are talking about organizing CoroNaNo - National Novel Writing during COVID-19 quarantine. Anyone want to participate?

Thursday, September 27, 2018

All Thumbs

Recently, I was diagnosed with flexor tendinitis in my left thumb. After weeks of pain, I finally saw an orthopedic specialist yesterday, who confirmed the original diagnosis, let me peek at my hand x-rays, and gave me my first (and hopefully last) cortisone injection. It left my thumb puffy, with a rather interesting almond-shaped bruise at the injection point.

If I thought the pain was bad before, it was nothing compared to the pain that showed up about an hour after injection. So severe, I could feel it in my teeth, and I spent the whole day feeling nauseous and jittery from it. Any movement of my thumb was excruciating. I stupidly ordered a dinner that required a knife and fork last night, but I discovered how to hold my fork between my forefinger and middle finger, with my pinky to steady it, much to the amusement of my dinner companion. I'm sure I looked like a kid just learning to use a fork. Who knows? Perhaps that will be a useful skill in the future.

I slept in my thumb brace with my hand resting flat on a pillow, and even still, woke up after every sleep cycle with my hand curled up and on fire.

Today, my thumb is stiff and sore, but I can almost make a fist before it starts to hurt, and I can grasp objects with it for short periods of time. The pain is minimal. It kind of makes yesterday's suffering worth it.

Best case scenario is that this injection cures my tendinitis. Worst case is that it does nothing, but based on how much better I'm feeling, I'm hopeful that isn't what's happening. Somewhere between best and worst case is that I may need another injection in the future. If I keep recovering and feel even better tomorrow, I'd probably be willing to do it again. But next time, I'll know not to try going to work after. I was useless and barely able to type.


I'm moving out of my apartment this weekend and I have a draft of a chapter I'm writing due Monday, so sadly, there will probably be no Statistics Sunday post this week. Back to regular posting in October.

Thursday, September 13, 2018

Don't Upset a Writer

There's an old joke among writers: don't piss us off. You'll probably end up as an unflattering character or a murder victim in our next project.

There's another joke among writers: if you ever need to bump someone off or dispose of a body, ask a writer. Chances are he or she has thought through a hundred different ways to do it.

The thing about these jokes, though, is that writers usually get our retribution through writing. We don't tend to do the things we write about in real life. If someone legitimately encouraged us to cause real harm to another person or actually help in the commission of a crime, the answer would likely be a resounding "hell no." That's not how we deal with life's slings and arrows. Writing about them is generally enough to satisfy the desire and alleviate the pain.

But not in all cases.

If there were any rules about committing a crime (and I'm sure there are), rule #1 should be: don't write about it on the internet before you do it. And yet, a romance novelist may have done just that. Nancy Crampton Brophy has been charged with murdering her husband; this same author also wrote a blog post called "How to Murder Your Husband":
Crampton Brophy, 68, was arrested Sept. 5 on charges of murdering her husband with a gun and unlawful use of a weapon in the death of her husband, Daniel Brophy, according to the Portland Police Bureau.

The killing puzzled police and those close to Daniel Brophy from the start. Brophy, a 63-year-old chef, was fatally shot at his workplace at the Oregon Culinary Institute on the morning of June 2. Students were just beginning to file into the building for class when they found him bleeding in the kitchen, KATU2 news reported. Police had no description of the suspect.

In Crampton Brophy’s “How to Murder Your Husband” essay, she had expressed that although she frequently thought about murder, she didn’t see herself following through with something so brutal. She wrote she would not want to “worry about blood and brains splattered on my walls,” or “remembering lies.”

“I find it easier to wish people dead than to actually kill them,” she wrote. “. . . But the thing I know about murder is that every one of us have it in him/her when pushed far enough.”
It's kind of surprising to me that Brophy's books are romances and yet apparently involve a lot of murder and death. I don't read romance, so maybe murder and death are common themes and I just don't know it. Of course, as I was thinking about last year's NaNoWriMo book, I surprised myself when I realized that the genre may, in fact, be romance. One without any murder, though.

This year's book will be an adventure/superhero story. And I'm excited to say I'm getting a very clear picture in my head of the story and its key scenes, an important milestone for me if I want to write something I consider good. I didn't have that for last year's project, which is why no one has seen it and I haven't touched it since November 30th. It was by design that I went into it without much prep - that's encouraged for NaNo, to write without bringing out the inner editor. And I'll admit, there are some sections in the book that are beautifully written. I surprised myself, in a good way, with some of it. But all-in-all, it's a mess in need of a LOT of work.

Sunday, September 9, 2018

Statistics Sunday: What is Standard Setting?

In a past post, I talked about content validation studies, a big part of my job. Today, I'm going to give a quick overview of standard setting, another big part of my job, and an important step in many testing applications.

In any kind of ability testing application, items will be written with identified correct and incorrect answers. This means you can generate overall scores for your examinees, whether the raw score is simply the number of correct answers or generated with some kind of item response theory/Rasch model. But what isn't necessarily obvious is how to use those scores to categorize candidates and, in credentialing and similar applications, who should pass and who should fail.

This is the purpose of standard setting: to identify cut scores for different categories, such as pass/fail, basic/proficient/advanced, and so on.

There are many different methods for conducting standard setting. Overall, approaches can be thought of as item-based or holistic/test-based.

For item-based methods, standard setting committee members go through each item and categorize it in some way (the precise way depends on which method is being used). For instance, they may categorize it as basic, proficient, or advanced, or they may generate the likelihood that a minimally qualified candidate (i.e., the person who should pass) would get it right.

For holistic/test-based methods, committee members make decisions about cut scores within the context of the whole test. Holistic/test-based methods still require review of the entire exam, but don't require individual judgments about each item. For instance, committee members may have a booklet containing all items in order of difficulty (based on pretest data), and place a bookmark at the item that reflects the transition from proficient to advanced or from fail to pass.
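
To make the item-based approach concrete, here's a minimal sketch in R of a modified-Angoff style calculation. The ratings below are entirely made up (a tiny five-item exam with three judges), purely for illustration.

library(tidyverse)

# hypothetical ratings: each value is a judge's estimate of the probability that
# a minimally qualified candidate answers that item correctly
angoff_ratings <- tibble(
  item = 1:5,
  judge1 = c(.60, .75, .40, .85, .55),
  judge2 = c(.65, .70, .45, .90, .50),
  judge3 = c(.55, .80, .50, .80, .60)
)

# average across judges for each item, then sum across items: the expected raw
# score for a borderline candidate, which becomes the recommended cut score
angoff_ratings %>%
  mutate(item_mean = (judge1 + judge2 + judge3) / 3) %>%
  summarize(cut_score = sum(item_mean))
# cut_score works out to 3.2 out of 5 items here

In a real study, there would be many more items and judges, and usually multiple rounds of ratings with discussion and item performance data in between.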

The importance of standard setting comes down to defensibility. In licensure, for instance, failing a test may mean being unable to work in one's field at all. For this reason, definitions of who should pass and who should fail (in terms of knowledge, skills, and abilities) should be very strong and clearly tied to exam scores. And licensure and credentialing organizations are frequently required to prove, in a court of law, that their standards are fair, rigorously derived, and meaningful.

For my friends and readers in academic settings, this step may seem unnecessary. After all, you can easily categorize students into A, B, C, D, and F with the percentage of items correct. But this is simply a standard (that is, the cut score for pass/fail is 60%), set at some point in the past, and applied throughout academia.

I'm currently working on a chapter on standard setting with my boss and a coworker. And for anyone wanting to learn more about standard setting, two great books are Cizek and Bunch's Standard Setting and Zieky, Perie, and Livingston's Cut Scores.

Wednesday, September 5, 2018

Lots of Writing, Just Not On the Blog

Hi all! Again, it's been a while since I've blogged something. I'm currently keeping busy with multiple writing projects, and I'm hoping to spin one of them into a blog post soon:

  • I'm still analyzing a huge survey dataset for my job, and writing up new analyses requested by our boards and advisory councils, as well as creating a version for laypeople (i.e., general public as opposed to people with a stats/research background)
  • My team submitted a proposal to write a chapter for the 3rd edition of The Institute for Credentialing Excellence Handbook, an edited volume about important topics in credentialing/high-stakes testing; I'm leading our chapter on standard setting, which covers methods for standard setting (selecting exam pass points) and the logistics of conducting a standard-setting study
  • I've already begun research for my NaNoWriMo novel, which will be a superhero story, so elements of sci-fi and fantasy and lots of fun world-building
  • Lastly, I'm developing two new surveys, one for a content validation study and the other a regular survey we send out to assess the dental assisting workforce

Sunday, August 19, 2018

Statistics Sunday: Using Text Analysis to Become a Better Writer

We all have words we love to use, and that we perhaps use too much. As an example: I have a tendency to use the same transitional statements, to the point that, before I submit a manuscript, I do a "find all" to see how many times I've used some of my favorites, e.g., additionally, though, and so on.

I'm sure we all have our own words we use way too often.


Text analysis can also be used to discover patterns in writing, and for a writer, it may be helpful in spotting when we depend too much on certain words and phrases. For today's demonstration, I read in my (still in-progress) novel - a murder mystery called Killing Mr. Johnson - and did the same type of text analysis I've been demonstrating in recent posts.

To make things easier, I copied the document into a text file, and used the read_lines and tibble functions to prepare data for my analysis.

setwd("~/Dropbox/Writing/Killing Mr. Johnson")

library(tidyverse)
KMJ_text <- read_lines('KMJ_full.txt')

KMJ <- tibble(KMJ_text) %>%
  mutate(linenumber = row_number())

I kept my line numbers, which I could use in some future analysis. For now, I'm going to tokenize my data, drop stop words, and examine my most frequently used words.

library(tidytext)
KMJ_words <- KMJ %>%
  unnest_tokens(word, KMJ_text) %>%
  anti_join(stop_words)
## Joining, by = "word"
KMJ_words %>%
  count(word, sort = TRUE) %>%
  filter(n > 75) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() + xlab(NULL) + coord_flip()


Fortunately, my top 5 words are the names of the 5 main characters, with the star character at number 1: Emily is named almost 600 times in the book. It's a murder mystery, so I'm not too surprised that words like "body" and "death" are also common. But I know that, in my fiction writing, I often depend on a word type that draws a lot of disdain from authors I admire: adverbs. Not all adverbs, mind you, but specifically (pun intended) the "-ly adverbs."

ly_words <- KMJ_words %>%
  filter(str_detect(word, ".ly")) %>%   # note: ".ly" matches "ly" anywhere in the word, not just at the end
  count(word, sort = TRUE)

head(ly_words)
## # A tibble: 6 x 2
##   word         n
##   <chr>    <int>
## 1 emily      599
## 2 finally     80
## 3 quickly     60
## 4 emily’s     53
## 5 suddenly    39
## 6 quietly     38

Since my main character is named Emily, she was accidentally picked up by my string detect function. A few other top words also pop up in the list that aren't actually -ly adverbs. I'll filter those out then take a look at what I have left.

filter_out <- c("emily", "emily's", "emily’s","family", "reply", "holy")

ly_words <- ly_words %>%
  filter(!word %in% filter_out)

ly_words %>%
  filter(n > 10) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() + xlab(NULL) + coord_flip()


I use "finally", "quickly", and "suddenly" far too often. "Quietly" is also up there. I think the reason so many writers hate on adverbs is because it can encourage lazy writing. You might write that someone said something quietly or softly, but is there a better word? Did they whisper? Mutter? Murmur? Hiss? Did someone "move quickly" or did they do something else - run, sprint, dash?

At the same time, sometimes adverbs are necessary. I mean, can I think of a complete sentence that only includes an adverb? Definitely. Still, it might become tedious if I keep depending on the same words multiple times, and when a fiction book (or really any kind of writing) is tedious, we often give up. These results give me some things to think about as I edit.

Still have some big plans on the horizon, including some new statistics videos, a redesigned blog, and more surprises later! Thanks for reading!

Friday, August 17, 2018

Diagramming Sentences

Confession: I never learned to diagram sentences.

Second confession: I'm really upset about this fact.

Solution: This great article taught me how to do it!

Last confession: This GIF came up when I searched for "Happy nerd"

Tuesday, August 7, 2018

Statistics Sunday: Highlighting a Subset of Data in ggplot2

Here's my belated Statistics Sunday post, using a cool technique I just learned about: gghighlight. This R package works with ggplot2 to highlight a subset of data. To demonstrate, I'll use a dataset I analyzed for a previous post about my 2017 reading habits. [Side note: My reading goal for this year is 60 books, and I'm already at 43! I may have to increase my goal at some point.]

setwd("~/R")
library(tidyverse)
books<-read_csv("2017_books.csv", col_names = TRUE)
## Warning: Duplicated column names deduplicated: 'Author' => 'Author_1' [13]
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   Title = col_character(),
##   Author = col_character(),
##   G_Rating = col_double(),
##   Started = col_character(),
##   Finished = col_character()
## )
## See spec(...) for full column specifications.

One analysis I conducted with this dataset was to look at the correlation between book length (number of pages) and read time (number of days it took to read the book). We can also generate a scatterplot to visualize this relationship.

cor.test(books$Pages, books$Read_Time)
## 
## 	Pearson's product-moment correlation
## 
## data:  books$Pages and books$Read_Time
## t = 3.1396, df = 51, p-value = 0.002812
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1482981 0.6067498
## sample estimates:
##       cor 
## 0.4024597
scatter <- ggplot(books, aes(Pages, Read_Time)) +
  geom_point(size = 3) +
  theme_classic() +
  labs(title = "Relationship Between Reading Time and Page Length") +
  ylab("Read Time (in days)") +
  xlab("Number of Pages") +
  theme(legend.position="none",plot.title=element_text(hjust=0.5))

There's a significant positive correlation here, meaning longer books take more days to read. It's a moderate correlation, and there are certainly other variables that may explain why a book took longer to read. For instance, nonfiction books may take longer. Books read in October or November (while I was gearing up for and participating in NaNoWriMo, respectively) may also take longer, since I had less spare time to read. I can conduct regressions and other analyses to examine which variables impact read time, but one of the most important parts of sharing results is creating good data visualizations. How can I show the impact these other variables have on read time in an understandable and visually appealing way?
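
Just to sketch what that kind of regression might look like - a minimal example using only page count as a predictor, since other predictors (like genre or the month I read the book) would need to be added to the data first:

# simple regression of read time on page count, using the books data read in above
pages_model <- lm(Read_Time ~ Pages, data = books)
summary(pages_model)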

gghighlight will let me draw attention to different parts of the plot. For example, I can ask gghighlight to draw attention to books that took longer than a certain amount of time to read, and I can even ask it to label those books.

library(gghighlight)
scatter + gghighlight(Read_Time > 14) +
  geom_label(aes(label = Title),
             hjust = 1,
             vjust = 1,
             fill = "blue",
             color = "white",
             alpha = 0.5)


Here, the gghighlight function identifies the subset (books that took more than 2 weeks to read) and labels those books with the Title variable. Three of the four books with long read time values are non-fiction, and one was read for a course I took, so reading followed a set schedule. But the fourth is a fiction book, which took over 20 days to read. Let's see how month impacts reading time, by highlighting books read in November. To do that, I'll need to alter my dataset somewhat. The dataset contains a starting date and finish date, which were read in as characters. I need to convert those to dates and pull out the month variable to create my indicator.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
books$Started <- mdy(books$Started)
books$Start_Month <- month(books$Started)
books$Month <- ifelse(books$Start_Month == 11, 1, 0)
scatter + gghighlight(books$Month == 1) +
  geom_label(aes(label = Title), hjust = 1, vjust = 1, fill = "blue", color = "white", alpha = 0.5)


The book with the longest read time was, in fact, read during November, when I was spending most of my time writing.

Wednesday, July 25, 2018

Blogging Break

You may have noticed I haven't been blogging as much recently. Though in some ways I'm busier than I've been in a while, I still have a lot of downtime, but sadly not as much inspiration to write on my blog. I've got a few stats side projects in the works, but nothing at a point I can blog about, and I'm having difficulty with some of the code on projects I've been planning to write about. Hopefully I'll have something soon, and will get back to posting weekly Statistics Sunday posts.

Here's what's going on with me currently:

  • I had my first conference call with my company's Research Advisory Committee last night, a committee I imagine I'll inherit as my own now that I'm Director of Research
  • I submitted my first novel query to an agent earlier today and received a confirmation email that she got it
  • I've been reading a ton and apparently am 5 books ahead of schedule on my Goodreads reading challenge: 38 books so far this year
  • The research center I used to work for was not renewed, so they'll be shutting their doors in 14 months; I'm sad for my colleagues
  • Today is my work anniversary: I've been at my current job 1 year! My boss emailed me about it this morning, along with this picture:

Friday, July 6, 2018

Lots Going On

So much going on right now that I haven't had much time for blogging.
  • I'm almost completely transitioned from my old department, Exam Development, to my new one, Research, of which I am Director and currently the only employee. But my first direct report will be coming on soon! We also have a newly hired Director of Exam Development, and we've already started chatting about some Research-Exam Development collaborative projects.
  • I'm preparing for multiple content validation studies, including one for a brand new certification we'll be offering. So I've been reviewing blueprints for similar exams, curriculum for related programs, and job descriptions from across the US to help build potential topics for the exam, to be reviewed by our expert panel.
  • I've been participating in Camp NaNoWriMo, which happens in April and July, and allows you to set whatever word/page/time/etc. goals you'd like for your manuscript or project. My goal is to finish the novel I wrote for 2016 NaNoWriMo, so I'm spending most of the month editing as well as writing toward a goal of 8,000 additional words.
  • Also related to my book, I got some feedback from an agent that I need to play up the mystery aspect of my book, and try to think of some comparative titles and/or authors - that is, "If you liked X, you'll like my book." So in addition to doing a lot of writing, I'm doing a lot of reading, looking for some good comparative works. I've asked a few friends to read some chapters and see if they come up with something as well.
  • I started recording the promised Mixed-Effects Meta-Analysis video earlier this week, but when I listened to what I recorded, you can clearly hear my neighbors shooting off fireworks in the background. So I need to re-record that part and record the rest. Hopefully this weekend.
Bonus writing-related pic, just for fun: I found a book title generator online, and this is the result I got for Mystery:


That's all for now! Hopefully back to regular blogging soon!

Saturday, June 30, 2018

What I Did on My Week's Vacation

It's been quite a week. I traveled to Kansas City Monday to spend time with my family. Tuesday was my parents' 47th wedding anniversary. Unfortunately, we ended up spending part of Tuesday at the ER when my dad's breathing problems were exacerbated by the hot weather and their air conditioner dying over the weekend. In fact, I was at the ER when a tornado warning was issued in Kansas City, MO, and I spent that time hunkered down in a back hallway waiting for it to pass. It turned out to be a small tornado, mostly knocking over some trees and dropping golf ball sized hail on part of the city. We returned home to restored air conditioning and a beautiful (but hot) rest of the day.

Wednesday, we stayed in and got take-out from Joe's Kansas City BBQ.

Thursday, I finally took the Boulevard Brewery tour, starting with a glass of Tank 7, and got to see the brewing operation.

Our tour guide showing where the brewery began.
This section was the entire operation in 1989, but now is used for experimental beers.

The centrifuge that filters the beer. If this thing ever broke down, we're told it would send a stream of beer at such force, we'd be sliced in half. We decided to refer to that phenomenon as a "beersaber."

The tour ended with a beer and food tasting, featuring The Calling Double IPA and an open-faced ham sandwich with Clementine-Thyme marmalade, The Sixth Glass Quadrupel with brie and fig, The Bourbon Barrel-Aged Quadrupel (BBQ, made from The Sixth Glass aged in bourbon barrels and my favorite from the tasting) and goat-cheese cheesecake with blackberry, and The Dark Truth Imperial Stout and a chocolate chip cookie. All food came from KC restaurants and bakeries.

I picked up some yummy beer, including a four-pack of the BBQ, and a jar of the marmalade to bring home to Chicago.

I did a lot of writing - more on that (and hopefully some good news) later - and finished a couple books. Now, back to Chicago!

Friday, June 22, 2018

Thanks for Reading!

As I've been blogging more about statistics, R, and research in general, I've been trying to increase my online presence, sharing my blog posts in groups of like-minded people. Those efforts seem to have paid off, based on my view counts over the past year:


And based on read counts, here are my top 10 blog posts, most of which are stats-related:
  1. Beautiful Asymmetry - none of us is symmetrical, and that's okay 
  2. Statistical Sins: Stepwise Regression - just step away from stepwise regression
  3. Statistics Sunday: What Are Degrees of Freedom? (Part 1) - and read Part 2 here
  4. Working with Your Facebook Data in R
  5. Statistics Sunday: Free Data Science and Statistics Resources
  6. Statistics Sunday: What is Bootstrapping?
  7. Statistical Sins: Know Your Variables (A Confession) - we all make mistakes, but we should learn from them
  8. Statistical Sins: Not Making it Fun (A Thinly Veiled Excuse to Post a Bunch of XKCD Cartoons) - the subtitle says it all
  9. Statistics Sunday: Taylor Swift vs. Lorde - Analyzing Song Lyrics - analyzing song lyrics is my jam
  10. How Has Taylor Swift's Word Choice Changed Over Time? - ditto
It's so nice to see people are enjoying the posts, even sharing them and reaching out with additional thoughts and questions. Thanks, readers!

Thursday, June 14, 2018

Working with Your Facebook Data in R

I recently learned that you can download all of your Facebook data, so I decided to check it out and bring it into R. To access your data, go to Facebook and click on the white down arrow in the upper-right corner. From there, select Settings, then, from the column on the left, "Your Facebook Information." When you get to the Facebook Information screen, select "View" next to "Download Your Information." On this screen, you'll be able to select the kind of data you want, a date range, and a format. I only wanted my posts, so under "Your Information," I deselected everything but the first item on the list, "Posts." (Note that this will still download all photos and videos you posted, so it will be a large file.) To make it easy to bring into R, I selected JSON under Format (the other option is HTML).


After you click "Create File," it will take a while to compile - you'll get an email when it's ready. You'll need to reenter your password when you go to download the file.

The result is a Zip file, which contains folders for Posts, Photos, and Videos. Posts includes your own posts (on your and others' timelines) as well as posts from others on your timeline. And, of course, the file needed a bit of cleaning. Here's what I did.

Since the post data is a JSON file, I need the jsonlite package to read it.

setwd("C:/Users/slocatelli/Downloads/facebook-saralocatelli35/posts")
library(jsonlite)

FBposts <- fromJSON("your_posts.json")

This creates a large list object, with my data in a data frame. So as I did with the Taylor Swift albums, I can pull out that data frame.

myposts <- FBposts$status_updates

The resulting data frame has 5 columns: timestamp, which is in UNIX format; attachments, any photos, videos, URLs, or Facebook events attached to the post; title, which always starts with the author of the post (you or your friend who posted on your timeline) followed by the type of post; data, the text of the post; and tags, the people you tagged in the post.

First, I converted the timestamp to datetime, using the anytime package.

library(anytime)

myposts$timestamp <- anytime(myposts$timestamp)

Next, I wanted to pull out post author, so that I could easily filter the data frame to only use my own posts.

library(tidyverse)
myposts$author <- word(string = myposts$title, start = 1, end = 2, sep = fixed(" "))

Finally, I was interested in extracting URLs I shared (mostly from YouTube or my own blog) and the text of my posts, which I did with some regular expression functions and some help from Stack Overflow (here and here).

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

myposts$links <- str_extract(myposts$attachments, url_pattern)

library(qdapRegex)
myposts$posttext <- myposts$data %>%
  rm_between('"','"',extract = TRUE)

There's more cleaning I could do, but this gets me a data frame I could use for some text analysis. Let's look at my most frequent words.

myposts$posttext <- as.character(myposts$posttext)
library(tidytext)
mypost_text <- myposts %>%
  unnest_tokens(word, posttext) %>%
  anti_join(stop_words)
## Joining, by = "word"
counts <- mypost_text %>%
  filter(author == "Sara Locatelli") %>%
  drop_na(word) %>%
  count(word, sort = TRUE)

counts
## # A tibble: 9,753 x 2
##    word         n
##    <chr>    <int>
##  1 happy     4702
##  2 birthday  4643
##  3 today's    666
##  4 song       648
##  5 head       636
##  6 day        337
##  7 post       321
##  8 009f       287
##  9 ð          287
## 10 008e       266
## # ... with 9,743 more rows

These data include all my posts, including writing "Happy birthday" on others' timelines. I also frequently post the song in my head when I wake up in the morning (over 600 times, it seems). If I wanted to remove those, and only include times I said happy or song outside of those posts, I'd need to apply the filter in a previous step. There are also some strange characters that I want to clean from the data before I do anything else with them. I can easily remove pure numbers and stray characters with str_detect, but cells that contain both numbers and letters, such as "008e", won't be caught that way. So I'll just filter them out separately.

drop_nums <- c("008a","008e","009a","009c","009f")

counts <- counts %>%
  filter(str_detect(word, "[a-z]+"),   # keep only tokens that contain letters
         !word %in% drop_nums)         # drop the leftover letter-and-number codes

Now I could, for instance, create a word cloud.

library(wordcloud)
counts %>%
  with(wordcloud(word, n, max.words = 50))

In addition to posting for birthdays and head songs, I talk a lot about statistics, data, analysis, and my blog. I also post about beer, concerts, friends, books, and Chicago. Let's see what happens if I mix some sentiment analysis into my word cloud.

library(reshape2)
## 
## Attaching package: 'reshape2'
counts %>%
  inner_join(get_sentiments("bing")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red","blue"), max.words = 100)
## Joining, by = "word"

Once again, a few words are likely being misclassified - regression and plot are both negatively-valenced, but I imagine I'm using them in the statistical sense instead of the negative sense. I also apparently use "died" or "die" but I suspect in the context of, "I died laughing at this." And "happy" is huge, because it includes birthday wishes as well as instances where I talk about happiness. Some additional cleaning and exploration of the data is certainly needed. But that's enough to get started with this huge example of "me-search."
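
If I wanted to handle those misclassifications, one option is to drop the offending words before joining with the sentiment lexicon. Here's a minimal sketch; the word list is just a hypothetical starting point and would need adjusting after inspecting the data.

# hypothetical list of words whose bing sentiment doesn't match how I actually use them
reclassify <- c("regression", "plot", "died", "die")

counts %>%
  filter(!word %in% reclassify) %>%
  inner_join(get_sentiments("bing")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red","blue"), max.words = 100)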

Tuesday, May 15, 2018

Happy Birthday, L. Frank Baum!

Today is the 162nd birthday of L. Frank Baum who, as author of The Wonderful Wizard of Oz and 13 other "Oz" books, had a profound effect on my childhood and may even be responsible for my love of writing.

By George Steckel - Los Angeles Times photographic archive, UCLA Library, Public Domain, Link
And two days from now, on May 17, it will be 118 years since the first book of the Oz series was published. I was obsessed with the book series as a kid, and still collect antique copies of the books (and a few other Oz collectibles).

Thursday, May 10, 2018

Keep Writing, Keep Sharing

I have some big plans coming up in the future. (Stay tuned for those.) In preparation, I was pulling together some statistics for my blog and was curious to see how my blog views have increased over time as I try to do more with social media and sharing my posts in places where target readers tend to congregate. I had a feeling my monthly readership has increased but I wasn't sure by how much.

Turns out, quite a bit. Here's a line graph I put together showing monthly readership over the last 12 months, May 2017 to April 2018.


I've jumped from a low of 3,712 last May to over 34,000 last month. There are definite dips in different months, which I know align with times I haven't been blogging as much and/or wasn't sharing my posts as widely, but things are on a definite upward trend.
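
For anyone curious how a graph like this comes together, here's a minimal sketch in R. The views column below is placeholder data (only the first and last values reflect the numbers mentioned above); you'd substitute the actual monthly counts exported from your blog's stats page.

library(tidyverse)
library(lubridate)

# placeholder data: one row per month from May 2017 through April 2018
monthly_views <- tibble(
  month = seq(ymd("2017-05-01"), ymd("2018-04-01"), by = "month"),
  views = c(3712, 5000, 8000, 9500, 12000, 15000,
            13000, 16000, 20000, 24000, 29000, 34000)
)

ggplot(monthly_views, aes(month, views)) +
  geom_line() +
  geom_point() +
  theme_classic() +
  labs(title = "Monthly Blog Views, May 2017 to April 2018",
       x = NULL, y = "Views")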

Keep writing, keep sharing. And stay tuned for more!

Wednesday, May 9, 2018

Updates

Since 2005 or 2006, I've been going to a place called Beer Bistro. We hang out there after (and sometimes before) choir rehearsal, and I met my husband there in 2007. I found out it's closing this week. After all the memories I have of this place, it feels like a chapter of my life is closing. I had to go back one last time, so I visited last night after work, enjoying it in my favorite way to enjoy a bar: with a beer and a book.


This morning, I received my feedback from the 2nd round of the NYC Midnight Short Story Competition - my 1st round story came in 3rd in my heat, with the top 5 in each heat advancing. In round 2, only the top 3 advance. My story came in 1st in my heat!


Round 3 assignments release Friday night and I'll have 24 hours to write a 1500 word short story. This is also the weekend I'll be working all day Saturday, so this is going to be a busy weekend. I suspect I'll be staying up late Friday to get as much writing as I can done, then work on it more during my commute Saturday morning and breaks/lunch during the day Saturday.

Tuesday, May 1, 2018

Keeping Notebooks

A friend and fellow writer commented on Facebook that she regularly teaches this essay: On Keeping a Notebook by Joan Didion. Since I've never read it, I decided to hunt it down. You can read it yourself here.

Didion shares how she uses her notebook and why it's been so useful for writing and understanding her own life:
The impulse to write things down is a peculiarly compulsive one, inexplicable to those who do not share it, useful only accidentally, only secondarily, in the way that any compulsion tries to justify itself. I suppose that it begins or does not begin in the cradle. Although I have felt compelled to write things down since I was five years old, I doubt that my daughter ever will, for she is a singularly blessed and accepting child, delighted with life exactly as life presents itself to her, unafraid to go to sleep and unafraid to wake up. Keepers of private notebooks are a different breed altogether, lonely and resistant rearrangers of things, anxious malcontents, children afflicted apparently at birth with some presentiment of loss.

How it felt to me: that is getting closer to the truth about a notebook. I sometimes delude myself about why I keep a notebook, imagine that some thrifty virtue derives from preserving everything observed. See enough and write it down, I tell myself, and then some morning when the world seems drained of wonder, some day when I am only going through the motions of doing what I am supposed to do, which is write — on that bankrupt morning I will simply open my notebook and there it will all be, a forgotten account with accumulated interest
I've kept a notebook for a long time, in which I write down random observations, ideas for blog posts, lines for stories (even when I have no idea where they belong - just combinations of words that strike me as particularly beautiful, profound, or silly), and funny conversations I've overheard. A favorite is a conversation in which one person insisted she hated fruits, and the other person said that, because she has not tried all fruits, she can't categorically say she hates fruits, only the ones she's tried.

I recently started carrying a notebook with me again, and I made sure to print off a copy of Didion's essay to add to it. I also have random bits typed into my phone, which is great for when I'm on the go and need to talk-to-text something, but there's something really nice about writing things down on paper.

Friday, April 27, 2018

WRiTE CLUB 2018

This month is an awesome writing contest called WRiTE CLUB, which faces off 500-word excerpts each day over 15 days. (This is round 1 of the event. There will be new bouts and authors who advance will also have the chance to share additional excerpts.) Ten bouts have been posted so far with 5 more next week. Readers are invited to vote each day, with the chance to win prizes just for voting. And the winning author also wins prizes.

Voting on each bout is open for a week, so voting is still open for many of the bouts. You can access them here.