Word Counts

Discussion related to "Everything" 1.5 Alpha.
ChrisGreaves
Posts: 678
Joined: Wed Jan 05, 2022 9:29 pm

Word Counts

Post by ChrisGreaves »

This is not a problem in need of a solution.
I had cause to deliver a report on five articles I've been commissioned to write.
Everything  Word2003  VBA    Title
554         667       682    Diesel-Electrics
499         497       497    Riding on the Footplate from Southern Cross to Ghooli and Back
330         539       559    The Longest Wheat-Bin in the Southern Hemisphere
2,851       2,765     2,802  Three Times a Year to and from School in Perth on the Kalgoorlie Express
4,234       4,468     4,540  (column totals; average 4,387)
0.965       1.018     1.034  (each total as a fraction of the average)
Four titles appear to the right.
I measured "word-count" in three ways:-
(1) By including the Word-Count in column labels in Everything: Right-click the column-heading bar, Document, Content, Word-count.
(2) By inspecting within Word2003 (File, Properties, Statistics)
(3) By a cute bit of VBA code that calculates readability statistics on a document.

The sums of word counts average out to 4,387 words and vary between 0.965 of the average and 1.034 of the average.
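
The arithmetic can be re-checked with a short Python sketch. The variable names are illustrative; the figures are the ones reported in the post. Two things show up when the numbers are run: 4,387 is the midpoint between the lowest and highest totals (the plain mean of the three totals is 4,414), and the published ratios match the computed values truncated, not rounded, to three decimals.

```python
import math

# Word counts per article, in table order, as reported in the post
counts = {
    "Everything": [554, 499, 330, 2851],
    "Word2003": [667, 497, 539, 2765],
    "VBA": [682, 497, 559, 2802],
}

# Column totals for each counting method
totals = {method: sum(values) for method, values in counts.items()}

# Two candidate "averages" of the three totals
mean_of_totals = sum(totals.values()) / len(totals)
midrange = (min(totals.values()) + max(totals.values())) / 2

# Each total as a fraction of the midrange, truncated to three decimals
ratios = {
    method: math.floor(total / midrange * 1000) / 1000
    for method, total in totals.items()
}

print(totals)          # column totals
print(mean_of_totals)  # plain mean of the three totals
print(midrange)        # midpoint of lowest and highest total
print(ratios)
```

Either way the spread is only a few percent, which is the point of the post.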

The VBA methods are always worth investigating, as are methods for calculating sentence counts. If a sentence is defined as "terminated by a period", then "Dr.", "Mr.", "Mrs." and "etc." inflate the sentence count. Likewise for words: are words delimited by spaces (usually yes), by hyphens (maybe not), and so on? The programmer makes a decision and lives with it.
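
To make that concrete, here is a minimal Python sketch of how the counting rule changes the answer. The sample text, the abbreviation list, and the rule names are all made up for illustration; this is not how Everything, Word, or the VBA code actually counts.

```python
import re

# Made-up sample text containing abbreviations and a hyphenated word
text = (
    "Dr. Smith met Mr. Jones at the wheat-bin. "
    "They talked about trains, etc. Then they left."
)

# Word rule A: a word is anything separated by whitespace
words_by_space = len(text.split())

# Word rule B: hyphens separate words as well
words_by_space_and_hyphen = len(re.split(r"[\s\-]+", text.strip()))

# Sentence rule C: every period ends a sentence
sentences_by_period = text.count(".")

# Sentence rule D: periods after a few known abbreviations do not count
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "etc."}
sentences_skipping_abbreviations = sum(
    1
    for token in text.split()
    if token.endswith(".") and token not in ABBREVIATIONS
)

print(words_by_space)                    # 16
print(words_by_space_and_hyphen)         # 17
print(sentences_by_period)               # 5
print(sentences_skipping_abbreviations)  # 2
```

A human reader would say the sample has three sentences, so rule C overcounts and rule D undercounts (the "etc." that really does end a sentence is skipped). Neither rule is wrong as code; each is just a different decision.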

I thought to post this in case anyone starts questioning Everything's means of calculating Word-Count, or worse, stakes the company's future on Word-Counts (grin!).
Cheers, Chris
horst.epp
Posts: 1430
Joined: Fri Apr 04, 2014 3:24 pm

Re: Word Counts

Post by horst.epp »

So the few people who are interested in word counts
should select whichever method gives the larger number :)
void
Developer
Posts: 16428
Joined: Fri Oct 16, 2009 11:31 pm

Re: Word Counts

Post by void »

Thank you for the issue report ChrisGreaves,

It's a bug with Office.

The word count is also incorrect under Right click -> Properties -> Details -> Word Count
(Everything is using the same word count value)

Please use the word count from Everything as a guide only.
ChrisGreaves
Posts: 678
Joined: Wed Jan 05, 2022 9:29 pm

Re: Word Counts

Post by ChrisGreaves »

void wrote: Thu Feb 29, 2024 4:49 am Thank you for the issue report ChrisGreaves, It's a bug with Office. The word count is also incorrect under Right click -> Properties -> Details -> Word Count. (Everything is using the same word count value) Please use the word count from Everything as a guide only.
Thank you David.
[pedant]Strictly speaking it's a bug in Office2003, and I, for one, have no plans to move on from Office2003 until they have fixed all the bugs :lol: [/pedant]

Regardless of who is wrong and who is right, my point is that different programming code can and will deliver slightly different word-counts (and other counts).
I think that no one is "right" and no one is "wrong". I would trust any device that returns a result that is consistent with other results.
Cheers, Chris
void
Developer
Posts: 16428
Joined: Fri Oct 16, 2009 11:31 pm

Re: Word Counts

Post by void »

The bug is present in Office 2013 and 2016 too.
Unsure if it is still present in Office 2019.

I will add some functionality to Everything to get the correct word count.
It will look something like: add-column:a a:=WORDCOUNT($content:)



You can find the correct word count in Microsoft Word from File -> Properties -> Statistics
Why the Office property handler doesn't use this value is beyond me.
void
Developer
Posts: 16428
Joined: Fri Oct 16, 2009 11:31 pm

Re: Word Counts

Post by void »

Everything 1.5.0.1370a adds a WORDCOUNT() formula function.

The following search will now work as expected:

ext:doc;docx add-column:a a-label:="Word Count" a:=WORDCOUNT($content:)

This word count column will show the number of words from the content.
It may differ slightly from what Word reports under File -> Properties -> Statistics (typically it is the same, but it may ignore comments and other hidden text).
It should be more accurate than the stock word count property.



WORDCOUNT(text) will return the number of words in the specified text.

A word is one or more alpha-numeric characters or punctuation.



Please note: $content: will load the entire file text content.
This may take a long time.
ChrisGreaves
Posts: 678
Joined: Wed Jan 05, 2022 9:29 pm

Blank Word Counts

Post by ChrisGreaves »

void wrote: Fri Mar 01, 2024 5:39 am ext:doc;docx add-column:a a-label:="Word Count" a:=WORDCOUNT($content:)
Hi void. I'm back here because last week I uncovered a set of what I thought were Word2003 documents with blank word counts. Where the files were apparently empty of text I had expected a word count of zero.
But then I found some of the documents contained what I think of as regular text.
I have attached " .. Episode 45 ..." as an example document.

First off, the file is a mere 2 KB, which suggests that I created the document from a web page by selecting the web page, Copy, then Paste into Notepad (to drop images and hyperlinks) and saving it as a file with the extension DOC. That is sufficient for my purposes when I am analyzing gobs of text from web pages.
[Image: word-count.jpg]
In this image the blank column "Word Count" caught my eye and a forum search brought me back to this thread.
[Image: word-count-function.jpg]
When I tried the function method I saw the extra column (far right), but the word count was still blank.

Even if this is a bug, it is not a big issue for me; I am still analyzing documents word by word.
But it might suggest an alternate way for Everything to express a truer word count?

I am so aware that I am pushing boundaries here; these files are not properly MSWord documents; they are text files (as in Notepad) masquerading as Word documents.
Thanks, Chris
Attachment: Episode45ToCoinaPhraseand.doc (1.78 KiB)
void
Developer
Posts: 16428
Joined: Fri Oct 16, 2009 11:31 pm

Re: Word Counts

Post by void »

The iFilter fails for these .doc files as they are really .txt files.

$content: will be empty to Everything.

Please change the extension of these files to .txt
-or-
Convert them to doc files.
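
To find which files need this treatment, one option is to check the first bytes of each file. Genuine Word 97-2003 .doc files are OLE compound files and start with the signature D0 CF 11 E0 A1 B1 1A E1. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and a file that fails the check is not necessarily plain text (it could be RTF or HTML saved with a .doc extension).

```python
from pathlib import Path

# Signature at the start of an OLE compound file (binary .doc, .xls, .ppt)
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_real_doc(path):
    """Return True if the file starts with the OLE compound-file signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE_MAGIC)) == OLE_MAGIC


def find_masquerading_docs(folder):
    """List .doc files under folder that lack the binary Word signature."""
    return sorted(
        path
        for path in Path(folder).rglob("*.doc")
        if path.is_file() and not looks_like_real_doc(path)
    )
```

Calling `find_masquerading_docs` on a folder (any path you choose) returns the candidates for renaming to .txt or re-saving from Word.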
ChrisGreaves
Posts: 678
Joined: Wed Jan 05, 2022 9:29 pm

Re: Word Counts

Post by ChrisGreaves »

void wrote: Tue Aug 13, 2024 10:17 pm Please change the extension of these files to .txt
-or-
Convert them to doc files.
Understood. Thanks Void.
Cheers, Chris