I have several PDF files. Using a Windows application (C#), I need to find out whether each PDF file has overlapping text or not. How can I do it? Are there any free third-party DLLs to achieve this? All I have now are third-party DLLs that can extract the text and images from a PDF.
My PDFs are full of text and images. In some places, one line of text is printed on top of another line, or some text is printed on top of an image. These kinds of overlap need to be found.
As you can see in the image, the overlaps might have occurred because bounding boxes overlap as well as because glyph contours overlap. Both kinds of occurrence in the PDF need to be found. My PDFs don't contain any annotations, so overlapping occurs only in the page content. We don't use the poor-man's-bold technique for fatter glyphs; if it occurs, it should be considered overlapping.
There are not going to be any transparent images in the PDFs. The only image we might have is the logo or the digital signature at the bottom of the page; any text that overlaps it should be considered overlapping.
The PDFs are not created from images (scans). They were created from a text editor.
DaveInCaz
Karthik Jaganathan
3 Answers
It may be as easy as the example above, or you may have to implement your own reader for this.
If you do not have full control over your PDF files, you have no chance of solving your problem. The defined boxes can be transformed later on, so you have to parse the whole file to keep track of each box's position and shape. Additionally, some boxes may lie on top of other boxes but still render without any collision at the pixel level.
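To illustrate the point about transformed boxes, here is a minimal Java sketch using only `java.awt.geom`. It is not taken from any PDF library: the class name `TransformedBoxCheck`, the rotated strip, and the two test boxes are made up for illustration. It shows that two items can have intersecting axis-aligned bounds while their actual transformed shapes do not touch.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

final class TransformedBoxCheck {

    /** Coarse test: do the axis-aligned bounds of the two shapes intersect? */
    static boolean boundsOverlap(Shape a, Shape b) {
        return a.getBounds2D().intersects(b.getBounds2D());
    }

    /** Exact test on the shapes themselves, after all transforms were applied. */
    static boolean shapesOverlap(Shape a, Shape b) {
        Area common = new Area(a);
        common.intersect(new Area(b));
        return !common.isEmpty();
    }

    /** Hypothetical example box: a 100 x 2 strip rotated by 45 degrees about the origin. */
    static Shape rotatedStrip() {
        AffineTransform rotate = AffineTransform.getRotateInstance(Math.toRadians(45));
        return rotate.createTransformedShape(new Rectangle2D.Double(0, 0, 100, 2));
    }

    public static void main(String[] args) {
        Shape strip = rotatedStrip();
        // Lies inside the strip's bounding rectangle but far from the strip itself.
        Shape farBox = new Rectangle2D.Double(50, 0, 10, 10);
        // Sits directly on the diagonal strip.
        Shape nearBox = new Rectangle2D.Double(20, 18, 5, 5);

        System.out.println("far:  bounds=" + boundsOverlap(strip, farBox)
                + " shapes=" + shapesOverlap(strip, farBox));
        System.out.println("near: bounds=" + boundsOverlap(strip, nearBox)
                + " shapes=" + shapesOverlap(strip, nearBox));
    }
}
```

In a real PDF the transform would come from the content stream (the current transformation matrix and text matrix), which is why the whole file has to be parsed before any such comparison makes sense.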
Then you will run into the next problem. Each PDF implementation has different errors, so your system may render the text perfectly while your customer's printer does not.
Welcome to hell ;)
Each support guy will tell you that they obey the standard, so the others must have implemented their PDF library incorrectly. Because your customers' data will be confidential, you cannot prove them wrong. You may find some errors with your test data, but never the same errors as in your customers' documents.
Run and hide as long as you have not become the PDF expert of your company.
Here is a dirty 'general' method: render the page without your text into a bitmap, render the page with your text into another bitmap, and compare the area containing your text. This needs a monochrome background, and the load will be really high. But this document looks like a form. Create a form and fill out the form boxes. That way you will have no problems, and you will even get correct results if someone fills in the form with another program.
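One possible reading of the bitmap comparison, sketched in plain Java with `BufferedImage`. This is an interpretation, not the answerer's code: the class `BitmapOverlapCheck`, the helper `page`, and the toy "logo" and "text" rectangles are hypothetical, and actually rendering a PDF page to a bitmap is left to whatever renderer you use. A pixel counts as a collision when drawing the text changed it and it was already something other than the background colour before the text was drawn.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class BitmapOverlapCheck {

    /**
     * Counts pixels inside 'area' that the text changed (before != after)
     * and that were not plain background before the text was drawn.
     */
    static int countCollisions(BufferedImage withoutText, BufferedImage withText,
                               Rectangle area, Color background) {
        int bg = background.getRGB();
        int collisions = 0;
        for (int y = area.y; y < area.y + area.height; y++) {
            for (int x = area.x; x < area.x + area.width; x++) {
                int before = withoutText.getRGB(x, y);
                int after = withText.getRGB(x, y);
                if (before != after && before != bg) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    /** Toy stand-in for a rendered page: optional grey "logo" and black "text" bar. */
    static BufferedImage page(int width, int height, Color background,
                              boolean drawLogo, boolean drawText) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(background);
        g.fillRect(0, 0, width, height);
        if (drawLogo) {
            g.setColor(Color.GRAY);
            g.fillRect(5, 5, 4, 4);
        }
        if (drawText) {
            g.setColor(Color.BLACK);
            g.fillRect(0, 6, width, 2);
        }
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        BufferedImage before = page(20, 20, Color.WHITE, true, false);
        BufferedImage after = page(20, 20, Color.WHITE, true, true);
        int hits = countCollisions(before, after, new Rectangle(0, 0, 20, 20), Color.WHITE);
        System.out.println("colliding pixels: " + hits);
    }
}
```

Note the limits of this reading: text drawn in exactly the same colour as what lies underneath leaves the pixel unchanged and is missed, and anti-aliased rendering would need a tolerance instead of exact pixel equality.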
crumble
The OP clarified in comments:
An approach using iText 7
As I'm more into Java, I first created a proof of concept in Java and ported it to .Net later.
Both for Java and .Net the line of action is the same:
iText 7 for .Net
The event listener might look like this:
This event listener can be used like this:
iText 7 for Java
The event listener might look like this:
This event listener can be used like this:
Remarks
As you can see I don't store the text bounding boxes as they are but instead
i.e. slightly smaller boxes. This is done to prevent false positives which otherwise might occur for tightly set text or text with kerning applied. You may have to fine-tune the margin values here.
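The shrinking idea from these remarks can be sketched independently of iText with `java.awt.geom.Rectangle2D`. This is not the answer's listener code: the class `ShrunkBoxOverlap`, its method names, and the margin of 0.5 used in the example are assumptions chosen for illustration. Each box is reduced by a margin on every side before the pairwise intersection test, so boxes that merely touch or graze each other are not reported.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class ShrunkBoxOverlap {

    /** Returns a copy of 'box' reduced by the given margin on every side. */
    static Rectangle2D shrink(Rectangle2D box, double marginX, double marginY) {
        double width = Math.max(0, box.getWidth() - 2 * marginX);
        double height = Math.max(0, box.getHeight() - 2 * marginY);
        return new Rectangle2D.Double(
                box.getCenterX() - width / 2,
                box.getCenterY() - height / 2,
                width, height);
    }

    /** Returns index pairs {i, j} with i < j whose shrunken boxes intersect. */
    static List<int[]> findOverlaps(List<Rectangle2D> boxes, double marginX, double marginY) {
        List<Rectangle2D> shrunk = new ArrayList<>();
        for (Rectangle2D box : boxes) {
            shrunk.add(shrink(box, marginX, marginY));
        }
        List<int[]> overlaps = new ArrayList<>();
        for (int i = 0; i < shrunk.size(); i++) {
            for (int j = i + 1; j < shrunk.size(); j++) {
                if (shrunk.get(i).intersects(shrunk.get(j))) {
                    overlaps.add(new int[] {i, j});
                }
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        List<Rectangle2D> boxes = new ArrayList<>();
        boxes.add(new Rectangle2D.Double(0, 0, 10, 10));    // box 0
        boxes.add(new Rectangle2D.Double(9.5, 0, 10, 10));  // box 1 grazes box 0 by 0.5
        boxes.add(new Rectangle2D.Double(5, 5, 10, 10));    // box 2 clearly overlaps both

        for (int[] pair : findOverlaps(boxes, 0.5, 0.5)) {
            System.out.println("overlap: " + pair[0] + " and " + pair[1]);
        }
    }
}
```

The pairwise loop is quadratic in the number of text chunks, which is fine for a single page but may need a spatial index for very dense pages.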
mkl
Hello, I have a code sample that uses a non-free library, but I think other libraries should offer similar functionality, so you may use it as an idea. Before using the following code sample, please ensure that you use the latest version of the Apitron PDF Kit.
Alexander S