Channel: Hot Weekly Questions - Web Applications Stack Exchange

Is there some way to reliably strip away all "quoted text" parts of both plaintext and HTML-based e-mails?

I'm trying to interpret responses from people via e-mail.

If this were 1985 or so, it would be easy: I would just strip every line beginning with > and that would be it.
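
A minimal sketch of that naive approach, written in Python purely for illustration (the question asks for PHP, and the function name `strip_quoted_lines` is hypothetical). It assumes every quoted line starts with `>` after optional leading whitespace, which is exactly the assumption that stops holding for modern mail:

```python
def strip_quoted_lines(body: str) -> str:
    """Drop every line whose first non-space character is '>'."""
    kept = [
        line
        for line in body.splitlines()
        if not line.lstrip().startswith(">")
    ]
    # Trim the blank lines left behind at the top and bottom.
    return "\n".join(kept).strip()
```

Nested quotes (`>>`, `> >`) also begin with `>`, so they are removed by the same check without any extra handling.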

However, the year is 2020 and e-mail is an absolute mess of multiple layers of madness. For one thing, many e-mails aren't plaintext at all, but instead use HTML formatting, and I very strongly doubt that these consistently use <blockquote>s for quotes. I fear that there are numerous different styles of quotes and markup used for HTML e-mail quotes.
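
For the subset of HTML mail that does wrap quotes in `<blockquote>`, a sketch using only Python's standard `html.parser` module might look like the following. The class and function names are hypothetical, and the sketch deliberately ignores every other quoting style a client might use, so it illustrates the idea rather than solving the problem:

```python
from html.parser import HTMLParser


class BlockquoteStripper(HTMLParser):
    """Collect text that is not nested inside any <blockquote> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # current <blockquote> nesting level
        self.chunks = []  # text seen outside all blockquotes

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def strip_blockquotes(html: str) -> str:
    parser = BlockquoteStripper()
    parser.feed(html)
    parser.close()
    # Join the surviving text and collapse whitespace runs; element
    # boundaries are not converted into spaces or line breaks.
    return " ".join("".join(parser.chunks).split())
```

Tracking a depth counter instead of a boolean flag is what lets nested quotes be skipped as a unit.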

Even plaintext e-mails may not consistently use > quotes.
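
One common case where `>` is absent is a reply that simply appends the original message below an attribution line. A heuristic sketch for that case, in Python for illustration, is shown here. The two patterns are assumptions about frequently seen English-language markers ("On ... wrote:" and "-----Original Message-----"), not an exhaustive or authoritative list, and the function name is hypothetical:

```python
import re

# Assumed markers; real mail varies by client, locale, and line wrapping.
REPLY_MARKERS = [
    re.compile(r"^On .+ wrote:\s*$"),
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE),
]


def cut_at_reply_marker(body: str) -> str:
    """Keep only the text above the first line that looks like a reply marker."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if any(pattern.match(stripped) for pattern in REPLY_MARKERS):
            break
        kept.append(line)
    return "\n".join(kept).strip()
```

This only helps when the new text sits above the quoted message. A reply written below the quote would be discarded along with it, which is part of why such heuristics are unreliable on their own.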

This immediately strikes me as something I do not wish to sit and attempt to code on my own. Is there some existing, reliable PHP library/function for this task?

I already use MailMimeParser, but it doesn't appear to have this feature. Its job appears to be parsing the MIME blobs into plaintext/HTML bodies, not doing anything further with those bodies once they are properly extracted.

To make it crystal clear: I'm trying to turn this:

> I shall have the business proposal ready tomorrow.

OK. Great.

Into:

OK. Great.

And:

<whateverunknownmarkup>I shall have the business proposal ready tomorrow.</whateverunknownmarkup>

OK. Great.

Into:

OK. Great.

Of course, those are just basic examples. Quotes can be nested many levels deep, mixed with replies, and so on.

I don't know how the most popular e-mail clients and e-mail services do this, but it feels like yet another task which has been solved in private a million times but never released to the public.

