Commit eabb0329 authored by Zsolt Kajtar's avatar Zsolt Kajtar Committed by Helge Deller
Browse files

fbdev: Refactoring the fbcon packed pixel drawing routines



The original version duplicated more or less the same algorithms for
both system and i/o memory.

In this version the drawing algorithms (copy/fill/blit) are separate
from the memory access (system and i/o). The two parts are getting
combined in the loadable module sources. This also makes it more robust
against wrong memory access type or alignment mistakes as there's no
direct pointer access or arithmetic in the algorithm sources anymore.

Due to liberal use of inlining the compiled result is a single function
in all 6 cases, without unnecessary function calls. Unlike earlier the
use of macros could be minimized as apparently both gcc and clang is
capable now to do the same with inline functions just as well.

What wasn't quite the same in the two variants is the support for pixel
order reversing. This version is capable to do that for both system and
I/O memory, and not only for the latter. As demand for low bits per
pixel modes isn't high there's a configuration option to enable this
separately for the CFB and SYS modules.

The pixel reversing algorithm is different than earlier and was designed
so that it can take advantage of bit order reversing instructions on
architectures which have them. And even for higher bits per pixel modes
like four bpp.

One of the shortcomings of the earlier version was the incomplete
support for foreign endian framebuffers. Now all three drawing
algorithms produce correct output on both endians with native and
foreign framebuffers. This is one of the important differences even if
otherwise the algorithms don't look too different than before.

All three routines work now with aligned native word accesses. As a
consequence blitting isn't limited to 32 bits on 64 bit architectures as
it was before.

The old routines silently assumed that rows are a multiple of the word
size. Due to how the new routines function this isn't a requirement any
more and access will be done aligned regardless. However if the
framebuffer is configured like that then some of the fast paths won't be
available.

As this code is supposed to be running on all supported architectures it
wasn't optimized for a particular one. That doesn't mean I haven't
looked at the disassembly. That's where I noticed that it isn't a good
idea to use the fallback bitreversing code for example.

The low bits per pixel modes should be faster than before as the new
routines can blit 4 pixels at a time.

On the higher bits per pixel modes I retained the specialized aligned
routines so it should be more or less the same, except on 64 bit
architectures. There the blitting word size is double now which means 32
BPP isn't done a single pixel a time now.

The code was tested on x86, amd64, mips32 and mips64. The latter two in
big endian configuration. Originally thought I can get away with the
first two, but with such bit twisting code byte ordering is tricky and
not really possible to get right without actually verifying it.

While writing such routines isn't rocket science a lot of time was spent
on making sure that pixel ordering, foreign byte order, various bits per
pixels, cpu endianness and word size will give the expected result in
all sorts of combinations without making it overly complicated or full
with special cases.

Signed-off-by: default avatarZsolt Kajtar <soci@c64.rulez.org>
Signed-off-by: default avatarHelge Deller <deller@gmx.de>
parent 1a78d9a3
Loading
Loading
Loading
Loading
+9 −1
Original line number Diff line number Diff line
@@ -69,7 +69,7 @@ config FB_CFB_REV_PIXELS_IN_BYTE
	bool
	depends on FB_CORE
	help
	  Allow generic frame-buffer functions to work on displays with 1, 2
	  Allow I/O memory frame-buffer functions to work on displays with 1, 2
	  and 4 bits per pixel depths which has opposite order of pixels in
	  byte order to bytes in long order.

@@ -97,6 +97,14 @@ config FB_SYS_IMAGEBLIT
	  blitting. This is used by drivers that don't provide their own
	  (accelerated) version and the framebuffer is in system RAM.

config FB_SYS_REV_PIXELS_IN_BYTE
	bool
	depends on FB_CORE
	help
	  Allow virtual memory frame-buffer functions to work on displays with 1, 2
	  and 4 bits per pixel depths which has opposite order of pixels in
	  byte order to bytes in long order.

config FB_PROVIDE_GET_FB_UNMAPPED_AREA
	bool
	depends on FB
+11 −417
Original line number Diff line number Diff line
// SPDX-License-Identifier: GPL-2.0-only
/*
 *  Generic function for frame buffer with packed pixels of any depth.
 *
 *      Copyright (C)  1999-2005 James Simmons <jsimmons@www.infradead.org>
 *
 *  This file is subject to the terms and conditions of the GNU General Public
 *  License.  See the file COPYING in the main directory of this archive for
 *  more details.
 *
 * NOTES:
 *
 *  This is for cfb packed pixels. Iplan and such are incorporated in the
 *  drivers that need them.
 *
 *  FIXME
 *
 *  Also need to add code to deal with cards endians that are different than
 *  the native cpu endians. I also need to deal with MSB position in the word.
 *
 *  The two functions or copying forward and backward could be split up like
 *  the ones for filling, i.e. in aligned and unaligned versions. This would
 *  help moving some redundant computations and branches out of the loop, too.
 *	Copyright (C)  2025 Zsolt Kajtar (soci@c64.rulez.org)
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/fb.h>
#include <linux/bitrev.h>
#include <asm/types.h>
#include <asm/io.h>
#include "fb_draw.h"

#if BITS_PER_LONG == 32
#  define FB_WRITEL fb_writel
#  define FB_READL  fb_readl
#else
#  define FB_WRITEL fb_writeq
#  define FB_READL  fb_readq
#endif

    /*
     *  Generic bitwise copy algorithm
     */

static void
bitcpy(struct fb_info *p, unsigned long __iomem *dst, unsigned dst_idx,
		const unsigned long __iomem *src, unsigned src_idx, int bits,
		unsigned n, u32 bswapmask)
{
	unsigned long first, last;
	int const shift = dst_idx-src_idx;

#if 0
	/*
	 * If you suspect bug in this function, compare it with this simple
	 * memmove implementation.
	 */
	memmove((char *)dst + ((dst_idx & (bits - 1))) / 8,
		(char *)src + ((src_idx & (bits - 1))) / 8, n / 8);
	return;
#ifdef CONFIG_FB_CFB_REV_PIXELS_IN_BYTE
#define FB_REV_PIXELS_IN_BYTE
#endif

	first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask);
	last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask);

	if (!shift) {
		// Same alignment for source and dest

		if (dst_idx+n <= bits) {
			// Single word
			if (last)
				first &= last;
			FB_WRITEL( comp( FB_READL(src), FB_READL(dst), first), dst);
		} else {
			// Multiple destination words

			// Leading bits
			if (first != ~0UL) {
				FB_WRITEL( comp( FB_READL(src), FB_READL(dst), first), dst);
				dst++;
				src++;
				n -= bits - dst_idx;
			}

			// Main chunk
			n /= bits;
			while (n >= 8) {
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				FB_WRITEL(FB_READL(src++), dst++);
				n -= 8;
			}
			while (n--)
				FB_WRITEL(FB_READL(src++), dst++);

			// Trailing bits
			if (last)
				FB_WRITEL( comp( FB_READL(src), FB_READL(dst), last), dst);
		}
	} else {
		/* Different alignment for source and dest */
		unsigned long d0, d1;
		int m;

		int const left = shift & (bits - 1);
		int const right = -shift & (bits - 1);

		if (dst_idx+n <= bits) {
			// Single destination word
			if (last)
				first &= last;
			d0 = FB_READL(src);
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			if (shift > 0) {
				// Single source word
				d0 <<= left;
			} else if (src_idx+n <= bits) {
				// Single source word
				d0 >>= right;
			} else {
				// 2 source words
				d1 = FB_READL(src + 1);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);
				d0 = d0 >> right | d1 << left;
			}
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			FB_WRITEL(comp(d0, FB_READL(dst), first), dst);
		} else {
			// Multiple destination words
			/** We must always remember the last value read, because in case
			SRC and DST overlap bitwise (e.g. when moving just one pixel in
			1bpp), we always collect one full long for DST and that might
			overlap with the current long from SRC. We store this value in
			'd0'. */
			d0 = FB_READL(src++);
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			// Leading bits
			if (shift > 0) {
				// Single source word
				d1 = d0;
				d0 <<= left;
				n -= bits - dst_idx;
			} else {
				// 2 source words
				d1 = FB_READL(src++);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);

				d0 = d0 >> right | d1 << left;
				n -= bits - dst_idx;
			}
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			FB_WRITEL(comp(d0, FB_READL(dst), first), dst);
			d0 = d1;
			dst++;

			// Main chunk
			m = n % bits;
			n /= bits;
			while ((n >= 4) && !bswapmask) {
				d1 = FB_READL(src++);
				FB_WRITEL(d0 >> right | d1 << left, dst++);
				d0 = d1;
				d1 = FB_READL(src++);
				FB_WRITEL(d0 >> right | d1 << left, dst++);
				d0 = d1;
				d1 = FB_READL(src++);
				FB_WRITEL(d0 >> right | d1 << left, dst++);
				d0 = d1;
				d1 = FB_READL(src++);
				FB_WRITEL(d0 >> right | d1 << left, dst++);
				d0 = d1;
				n -= 4;
			}
			while (n--) {
				d1 = FB_READL(src++);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);
				d0 = d0 >> right | d1 << left;
				d0 = fb_rev_pixels_in_long(d0, bswapmask);
				FB_WRITEL(d0, dst++);
				d0 = d1;
			}

			// Trailing bits
			if (m) {
				if (m <= bits - right) {
					// Single source word
					d0 >>= right;
				} else {
					// 2 source words
					d1 = FB_READL(src);
					d1 = fb_rev_pixels_in_long(d1,
								bswapmask);
					d0 = d0 >> right | d1 << left;
				}
				d0 = fb_rev_pixels_in_long(d0, bswapmask);
				FB_WRITEL(comp(d0, FB_READL(dst), last), dst);
			}
		}
	}
}

    /*
     *  Generic bitwise copy algorithm, operating backward
     */

static void
bitcpy_rev(struct fb_info *p, unsigned long __iomem *dst, unsigned dst_idx,
		const unsigned long __iomem *src, unsigned src_idx, int bits,
		unsigned n, u32 bswapmask)
{
	unsigned long first, last;
	int shift;

#if 0
	/*
	 * If you suspect bug in this function, compare it with this simple
	 * memmove implementation.
	 */
	memmove((char *)dst + ((dst_idx & (bits - 1))) / 8,
		(char *)src + ((src_idx & (bits - 1))) / 8, n / 8);
	return;
#endif

	dst += (dst_idx + n - 1) / bits;
	src += (src_idx + n - 1) / bits;
	dst_idx = (dst_idx + n - 1) % bits;
	src_idx = (src_idx + n - 1) % bits;

	shift = dst_idx-src_idx;

	first = ~fb_shifted_pixels_mask_long(p, (dst_idx + 1) % bits, bswapmask);
	last = fb_shifted_pixels_mask_long(p, (bits + dst_idx + 1 - n) % bits, bswapmask);

	if (!shift) {
		// Same alignment for source and dest

		if ((unsigned long)dst_idx+1 >= n) {
			// Single word
			if (first)
				last &= first;
			FB_WRITEL( comp( FB_READL(src), FB_READL(dst), last), dst);
		} else {
			// Multiple destination words

			// Leading bits
			if (first) {
				FB_WRITEL( comp( FB_READL(src), FB_READL(dst), first), dst);
				dst--;
				src--;
				n -= dst_idx+1;
			}

			// Main chunk
			n /= bits;
			while (n >= 8) {
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				FB_WRITEL(FB_READL(src--), dst--);
				n -= 8;
			}
			while (n--)
				FB_WRITEL(FB_READL(src--), dst--);

			// Trailing bits
			if (last != -1UL)
				FB_WRITEL( comp( FB_READL(src), FB_READL(dst), last), dst);
		}
	} else {
		// Different alignment for source and dest
		unsigned long d0, d1;
		int m;

		int const left = shift & (bits-1);
		int const right = -shift & (bits-1);

		if ((unsigned long)dst_idx+1 >= n) {
			// Single destination word
			if (first)
				last &= first;
			d0 = FB_READL(src);
			if (shift < 0) {
				// Single source word
				d0 >>= right;
			} else if (1+(unsigned long)src_idx >= n) {
				// Single source word
				d0 <<= left;
			} else {
				// 2 source words
				d1 = FB_READL(src - 1);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);
				d0 = d0 << left | d1 >> right;
			}
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			FB_WRITEL(comp(d0, FB_READL(dst), last), dst);
		} else {
			// Multiple destination words
			/** We must always remember the last value read, because in case
			SRC and DST overlap bitwise (e.g. when moving just one pixel in
			1bpp), we always collect one full long for DST and that might
			overlap with the current long from SRC. We store this value in
			'd0'. */

			d0 = FB_READL(src--);
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			// Leading bits
			if (shift < 0) {
				// Single source word
				d1 = d0;
				d0 >>= right;
			} else {
				// 2 source words
				d1 = FB_READL(src--);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);
				d0 = d0 << left | d1 >> right;
			}
			d0 = fb_rev_pixels_in_long(d0, bswapmask);
			if (!first)
				FB_WRITEL(d0, dst);
			else
				FB_WRITEL(comp(d0, FB_READL(dst), first), dst);
			d0 = d1;
			dst--;
			n -= dst_idx+1;

			// Main chunk
			m = n % bits;
			n /= bits;
			while ((n >= 4) && !bswapmask) {
				d1 = FB_READL(src--);
				FB_WRITEL(d0 << left | d1 >> right, dst--);
				d0 = d1;
				d1 = FB_READL(src--);
				FB_WRITEL(d0 << left | d1 >> right, dst--);
				d0 = d1;
				d1 = FB_READL(src--);
				FB_WRITEL(d0 << left | d1 >> right, dst--);
				d0 = d1;
				d1 = FB_READL(src--);
				FB_WRITEL(d0 << left | d1 >> right, dst--);
				d0 = d1;
				n -= 4;
			}
			while (n--) {
				d1 = FB_READL(src--);
				d1 = fb_rev_pixels_in_long(d1, bswapmask);
				d0 = d0 << left | d1 >> right;
				d0 = fb_rev_pixels_in_long(d0, bswapmask);
				FB_WRITEL(d0, dst--);
				d0 = d1;
			}

			// Trailing bits
			if (m) {
				if (m <= bits - left) {
					// Single source word
					d0 <<= left;
				} else {
					// 2 source words
					d1 = FB_READL(src);
					d1 = fb_rev_pixels_in_long(d1,
								bswapmask);
					d0 = d0 << left | d1 >> right;
				}
				d0 = fb_rev_pixels_in_long(d0, bswapmask);
				FB_WRITEL(comp(d0, FB_READL(dst), last), dst);
			}
		}
	}
}
#include "cfbmem.h"
#include "fb_copyarea.h"

void cfb_copyarea(struct fb_info *p, const struct fb_copyarea *area)
{
	u32 dx = area->dx, dy = area->dy, sx = area->sx, sy = area->sy;
	u32 height = area->height, width = area->width;
	unsigned int const bits_per_line = p->fix.line_length * 8u;
	unsigned long __iomem *base = NULL;
	int bits = BITS_PER_LONG, bytes = bits >> 3;
	unsigned dst_idx = 0, src_idx = 0, rev_copy = 0;
	u32 bswapmask = fb_compute_bswapmask(p);

	if (p->state != FBINFO_STATE_RUNNING)
		return;

	if (p->flags & FBINFO_VIRTFB)
		fb_warn_once(p, "Framebuffer is not in I/O address space.");

	/* if the beginning of the target area might overlap with the end of
	the source area, be have to copy the area reverse. */
	if ((dy == sy && dx > sx) || (dy > sy)) {
		dy += height;
		sy += height;
		rev_copy = 1;
	}

	// split the base of the framebuffer into a long-aligned address and the
	// index of the first bit
	base = (unsigned long __iomem *)((unsigned long)p->screen_base & ~(bytes-1));
	dst_idx = src_idx = 8*((unsigned long)p->screen_base & (bytes-1));
	// add offset of source and target area
	dst_idx += dy*bits_per_line + dx*p->var.bits_per_pixel;
	src_idx += sy*bits_per_line + sx*p->var.bits_per_pixel;
		fb_warn_once(p, "%s: framebuffer is not in I/O address space.\n", __func__);

	if (p->fbops->fb_sync)
		p->fbops->fb_sync(p);

	if (rev_copy) {
		while (height--) {
			dst_idx -= bits_per_line;
			src_idx -= bits_per_line;
			bitcpy_rev(p, base + (dst_idx / bits), dst_idx % bits,
				base + (src_idx / bits), src_idx % bits, bits,
				width*p->var.bits_per_pixel, bswapmask);
		}
	} else {
		while (height--) {
			bitcpy(p, base + (dst_idx / bits), dst_idx % bits,
				base + (src_idx / bits), src_idx % bits, bits,
				width*p->var.bits_per_pixel, bswapmask);
			dst_idx += bits_per_line;
			src_idx += bits_per_line;
	fb_copyarea(p, area);
}
	}
}

EXPORT_SYMBOL(cfb_copyarea);

MODULE_AUTHOR("James Simmons <jsimmons@users.sf.net>");
MODULE_DESCRIPTION("Generic software accelerated copyarea");
MODULE_AUTHOR("Zsolt Kajtar <soci@c64.rulez.org>");
MODULE_DESCRIPTION("I/O memory packed pixel framebuffer area copy");
MODULE_LICENSE("GPL");
+11 −351
Original line number Diff line number Diff line
// SPDX-License-Identifier: GPL-2.0-only
/*
 *  Generic fillrect for frame buffers with packed pixels of any depth.
 *
 *      Copyright (C)  2000 James Simmons (jsimmons@linux-fbdev.org)
 *
 *  This file is subject to the terms and conditions of the GNU General Public
 *  License.  See the file COPYING in the main directory of this archive for
 *  more details.
 *
 * NOTES:
 *
 *  Also need to add code to deal with cards endians that are different than
 *  the native cpu endians. I also need to deal with MSB position in the word.
 *
 *	Copyright (C)  2025 Zsolt Kajtar (soci@c64.rulez.org)
 */
#include <linux/module.h>
#include <linux/string.h>
#include <linux/fb.h>
#include <linux/bitrev.h>
#include <asm/types.h>
#include "fb_draw.h"

#if BITS_PER_LONG == 32
#  define FB_WRITEL fb_writel
#  define FB_READL  fb_readl
#else
#  define FB_WRITEL fb_writeq
#  define FB_READL  fb_readq
#ifdef CONFIG_FB_CFB_REV_PIXELS_IN_BYTE
#define FB_REV_PIXELS_IN_BYTE
#endif

    /*
     *  Aligned pattern fill using 32/64-bit memory accesses
     */

static void
bitfill_aligned(struct fb_info *p, unsigned long __iomem *dst, int dst_idx,
		unsigned long pat, unsigned n, int bits, u32 bswapmask)
{
	unsigned long first, last;

	if (!n)
		return;

	first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask);
	last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask);

	if (dst_idx+n <= bits) {
		// Single word
		if (last)
			first &= last;
		FB_WRITEL(comp(pat, FB_READL(dst), first), dst);
	} else {
		// Multiple destination words

		// Leading bits
		if (first!= ~0UL) {
			FB_WRITEL(comp(pat, FB_READL(dst), first), dst);
			dst++;
			n -= bits - dst_idx;
		}

		// Main chunk
		n /= bits;
		while (n >= 8) {
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			FB_WRITEL(pat, dst++);
			n -= 8;
		}
		while (n--)
			FB_WRITEL(pat, dst++);

		// Trailing bits
		if (last)
			FB_WRITEL(comp(pat, FB_READL(dst), last), dst);
	}
}


    /*
     *  Unaligned generic pattern fill using 32/64-bit memory accesses
     *  The pattern must have been expanded to a full 32/64-bit value
     *  Left/right are the appropriate shifts to convert to the pattern to be
     *  used for the next 32/64-bit word
     */

static void
bitfill_unaligned(struct fb_info *p, unsigned long __iomem *dst, int dst_idx,
		  unsigned long pat, int left, int right, unsigned n, int bits)
{
	unsigned long first, last;

	if (!n)
		return;

	first = FB_SHIFT_HIGH(p, ~0UL, dst_idx);
	last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits));

	if (dst_idx+n <= bits) {
		// Single word
		if (last)
			first &= last;
		FB_WRITEL(comp(pat, FB_READL(dst), first), dst);
	} else {
		// Multiple destination words
		// Leading bits
		if (first) {
			FB_WRITEL(comp(pat, FB_READL(dst), first), dst);
			dst++;
			pat = pat << left | pat >> right;
			n -= bits - dst_idx;
		}

		// Main chunk
		n /= bits;
		while (n >= 4) {
			FB_WRITEL(pat, dst++);
			pat = pat << left | pat >> right;
			FB_WRITEL(pat, dst++);
			pat = pat << left | pat >> right;
			FB_WRITEL(pat, dst++);
			pat = pat << left | pat >> right;
			FB_WRITEL(pat, dst++);
			pat = pat << left | pat >> right;
			n -= 4;
		}
		while (n--) {
			FB_WRITEL(pat, dst++);
			pat = pat << left | pat >> right;
		}

		// Trailing bits
		if (last)
			FB_WRITEL(comp(pat, FB_READL(dst), last), dst);
	}
}

    /*
     *  Aligned pattern invert using 32/64-bit memory accesses
     */
static void
bitfill_aligned_rev(struct fb_info *p, unsigned long __iomem *dst,
		    int dst_idx, unsigned long pat, unsigned n, int bits,
		    u32 bswapmask)
{
	unsigned long val = pat, dat;
	unsigned long first, last;

	if (!n)
		return;

	first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask);
	last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask);

	if (dst_idx+n <= bits) {
		// Single word
		if (last)
			first &= last;
		dat = FB_READL(dst);
		FB_WRITEL(comp(dat ^ val, dat, first), dst);
	} else {
		// Multiple destination words
		// Leading bits
		if (first!=0UL) {
			dat = FB_READL(dst);
			FB_WRITEL(comp(dat ^ val, dat, first), dst);
			dst++;
			n -= bits - dst_idx;
		}

		// Main chunk
		n /= bits;
		while (n >= 8) {
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
			n -= 8;
		}
		while (n--) {
			FB_WRITEL(FB_READL(dst) ^ val, dst);
			dst++;
		}
		// Trailing bits
		if (last) {
			dat = FB_READL(dst);
			FB_WRITEL(comp(dat ^ val, dat, last), dst);
		}
	}
}


    /*
     *  Unaligned generic pattern invert using 32/64-bit memory accesses
     *  The pattern must have been expanded to a full 32/64-bit value
     *  Left/right are the appropriate shifts to convert to the pattern to be
     *  used for the next 32/64-bit word
     */

static void
bitfill_unaligned_rev(struct fb_info *p, unsigned long __iomem *dst,
		      int dst_idx, unsigned long pat, int left, int right,
		      unsigned n, int bits)
{
	unsigned long first, last, dat;

	if (!n)
		return;

	first = FB_SHIFT_HIGH(p, ~0UL, dst_idx);
	last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits));

	if (dst_idx+n <= bits) {
		// Single word
		if (last)
			first &= last;
		dat = FB_READL(dst);
		FB_WRITEL(comp(dat ^ pat, dat, first), dst);
	} else {
		// Multiple destination words

		// Leading bits
		if (first != 0UL) {
			dat = FB_READL(dst);
			FB_WRITEL(comp(dat ^ pat, dat, first), dst);
			dst++;
			pat = pat << left | pat >> right;
			n -= bits - dst_idx;
		}

		// Main chunk
		n /= bits;
		while (n >= 4) {
			FB_WRITEL(FB_READL(dst) ^ pat, dst);
			dst++;
			pat = pat << left | pat >> right;
			FB_WRITEL(FB_READL(dst) ^ pat, dst);
			dst++;
			pat = pat << left | pat >> right;
			FB_WRITEL(FB_READL(dst) ^ pat, dst);
			dst++;
			pat = pat << left | pat >> right;
			FB_WRITEL(FB_READL(dst) ^ pat, dst);
			dst++;
			pat = pat << left | pat >> right;
			n -= 4;
		}
		while (n--) {
			FB_WRITEL(FB_READL(dst) ^ pat, dst);
			dst++;
			pat = pat << left | pat >> right;
		}

		// Trailing bits
		if (last) {
			dat = FB_READL(dst);
			FB_WRITEL(comp(dat ^ pat, dat, last), dst);
		}
	}
}
#include "cfbmem.h"
#include "fb_fillrect.h"

void cfb_fillrect(struct fb_info *p, const struct fb_fillrect *rect)
{
	unsigned long pat, pat2, fg;
	unsigned long width = rect->width, height = rect->height;
	int bits = BITS_PER_LONG, bytes = bits >> 3;
	u32 bpp = p->var.bits_per_pixel;
	unsigned long __iomem *dst;
	int dst_idx, left;

	if (p->state != FBINFO_STATE_RUNNING)
		return;

	if (p->flags & FBINFO_VIRTFB)
		fb_warn_once(p, "Framebuffer is not in I/O address space.");
		fb_warn_once(p, "%s: framebuffer is not in I/O address space.\n", __func__);

	if (p->fix.visual == FB_VISUAL_TRUECOLOR ||
	    p->fix.visual == FB_VISUAL_DIRECTCOLOR )
		fg = ((u32 *) (p->pseudo_palette))[rect->color];
	else
		fg = rect->color;

	pat = pixel_to_pat(bpp, fg);

	dst = (unsigned long __iomem *)((unsigned long)p->screen_base & ~(bytes-1));
	dst_idx = ((unsigned long)p->screen_base & (bytes - 1))*8;
	dst_idx += rect->dy*p->fix.line_length*8+rect->dx*bpp;
	/* FIXME For now we support 1-32 bpp only */
	left = bits % bpp;
	if (p->fbops->fb_sync)
		p->fbops->fb_sync(p);
	if (!left) {
		u32 bswapmask = fb_compute_bswapmask(p);
		void (*fill_op32)(struct fb_info *p,
				  unsigned long __iomem *dst, int dst_idx,
		                  unsigned long pat, unsigned n, int bits,
				  u32 bswapmask) = NULL;

		switch (rect->rop) {
		case ROP_XOR:
			fill_op32 = bitfill_aligned_rev;
			break;
		case ROP_COPY:
			fill_op32 = bitfill_aligned;
			break;
		default:
			printk( KERN_ERR "cfb_fillrect(): unknown rop, defaulting to ROP_COPY\n");
			fill_op32 = bitfill_aligned;
			break;
		}
		while (height--) {
			dst += dst_idx >> (ffs(bits) - 1);
			dst_idx &= (bits - 1);
			fill_op32(p, dst, dst_idx, pat, width*bpp, bits,
				  bswapmask);
			dst_idx += p->fix.line_length*8;
		}
	} else {
		int right, r;
		void (*fill_op)(struct fb_info *p, unsigned long __iomem *dst,
				int dst_idx, unsigned long pat, int left,
				int right, unsigned n, int bits) = NULL;
#ifdef __LITTLE_ENDIAN
		right = left;
		left = bpp - right;
#else
		right = bpp - left;
#endif
		switch (rect->rop) {
		case ROP_XOR:
			fill_op = bitfill_unaligned_rev;
			break;
		case ROP_COPY:
			fill_op = bitfill_unaligned;
			break;
		default:
			printk(KERN_ERR "cfb_fillrect(): unknown rop, defaulting to ROP_COPY\n");
			fill_op = bitfill_unaligned;
			break;
	fb_fillrect(p, rect);
}
		while (height--) {
			dst += dst_idx / bits;
			dst_idx &= (bits - 1);
			r = dst_idx % bpp;
			/* rotate pattern to the correct start position */
			pat2 = le_long_to_cpu(rolx(cpu_to_le_long(pat), r, bpp));
			fill_op(p, dst, dst_idx, pat2, left, right,
				width*bpp, bits);
			dst_idx += p->fix.line_length*8;
		}
	}
}

EXPORT_SYMBOL(cfb_fillrect);

MODULE_AUTHOR("James Simmons <jsimmons@users.sf.net>");
MODULE_DESCRIPTION("Generic software accelerated fill rectangle");
MODULE_AUTHOR("Zsolt Kajtar <soci@c64.rulez.org>");
MODULE_DESCRIPTION("I/O memory packed pixel framebuffer area fill");
MODULE_LICENSE("GPL");
+11 −346

File changed.

Preview size limit exceeded, changes collapsed.

+43 −0
Original line number Diff line number Diff line
/* SPDX-License-Identifier: GPL-2.0-only
 *
 *	I/O memory framebuffer access for drawing routines
 *
 *	Copyright (C) 2025 Zsolt Kajtar (soci@c64.rulez.org)
 */

/* keeps track of a bit address in framebuffer memory */
struct fb_address {
	void __iomem *address;
	int bits;
};

/* initialize the bit address pointer to the beginning of the frame buffer */
static inline struct fb_address fb_address_init(struct fb_info *p)
{
	void __iomem *base = p->screen_base;
	struct fb_address ptr;

	ptr.address = PTR_ALIGN_DOWN(base, BITS_PER_LONG / BITS_PER_BYTE);
	ptr.bits = (base - ptr.address) * BITS_PER_BYTE;
	return ptr;
}

/* framebuffer write access */
static inline void fb_write_offset(unsigned long val, int offset, const struct fb_address *dst)
{
#if BITS_PER_LONG == 32
	fb_writel(val, dst->address + offset * (BITS_PER_LONG / BITS_PER_BYTE));
#else
	fb_writeq(val, dst->address + offset * (BITS_PER_LONG / BITS_PER_BYTE));
#endif
}

/* framebuffer read access */
static inline unsigned long fb_read_offset(int offset, const struct fb_address *src)
{
#if BITS_PER_LONG == 32
	return fb_readl(src->address + offset * (BITS_PER_LONG / BITS_PER_BYTE));
#else
	return fb_readq(src->address + offset * (BITS_PER_LONG / BITS_PER_BYTE));
#endif
}
Loading