{"version":"https://jsonfeed.org/version/1","title":"Micro.blog - sherlock","home_page_url":"https://micro.blog","feed_url":"https://micro.blog/posts/sherlock","_microblog":{"about":"https://micro.blog/about/api","id":"1603824","username":"sherlock","bio":"","pronouns":"","is_following":false,"is_you":false,"following_count":1,"discover_count":0},"author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://www.gravatar.com/avatar/082d0f794dd75f7c2f7ae4d85c0b7c2e?s=96&d=https%3A%2F%2Fmicro.blog%2Fimages%2Fblank_avatar.png"},"items":[{"id":"39851183","content_html":"<p>Parallel Reduction Optimization with CUDA: <a href=\"https://sherlock.micro.blog/2024/06/19/parallel-reduction-optimization.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/06/19/parallel-reduction-optimization.html","date_published":"2024-06-19T02:33:54+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-06-19 02:33","date_timestamp":1718764434,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"38537772","content_html":"<p>Some Thoughts on W4A8KV4: <a 
href=\"https://sherlock.micro.blog/2024/05/30/wakv.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/05/30/wakv.html","date_published":"2024-05-30T14:34:50+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-05-30 14:34","date_timestamp":1717079690,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"36898151","content_html":"<p>Quantization Evaluation Proposal: <a href=\"https://sherlock.micro.blog/2024/05/06/proposal.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/05/06/proposal.html","date_published":"2024-05-06T03:29:24+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-05-06 03:29","date_timestamp":1714966164,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"36850531","content_html":"<p>LLM Speculative Sampling: <a 
href=\"https://sherlock.micro.blog/2024/05/05/llm-speculative-sampling.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/05/05/llm-speculative-sampling.html","date_published":"2024-05-05T07:51:24+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-05-05 07:51","date_timestamp":1714895484,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"36198992","content_html":"<p>SmoothQuant Optimization Notes: <a href=\"https://sherlock.micro.blog/2024/04/25/154204.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/04/25/154204.html","date_published":"2024-04-25T07:42:04+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-04-25 07:42","date_timestamp":1714030924,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"36198813","content_html":"<p>SmoothQuant Issue Notes: <a 
href=\"https://sherlock.micro.blog/2024/04/25/153838.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/04/25/153838.html","date_published":"2024-04-25T07:38:38+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-04-25 07:38","date_timestamp":1714030718,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}},{"id":"35736535","content_html":"When we write a CUDA kernel at work, we need to compute its theoretical utilization and theoretical bandwidth, so that we know how much room for optimization the kernel still has.\n\nFirst, get two definitions straight: what are TFLOPS and memory bandwidth?\nMemory bandwidth is the amount of data accessed between HBM and registers per unit time.\n\nIn LLMs, the most important kernel is attention. Below, for a causal LM, we compute the TFLOPS and bandwidth of self-attention in the forward pass.\n\nFirst, recall how attention is computed. Assume the query, key, and value are Q, K, and V; then the overall at... 
<a href=\"https://sherlock.micro.blog/2024/04/18/self-attention-tlops.html\">sherlock.micro.blog</a>","summary":"","url":"https://sherlock.micro.blog/2024/04/18/self-attention-tlops.html","date_published":"2024-04-18T03:52:21+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-04-18 03:52","date_timestamp":1713412341,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":false,"is_mention":false,"note":"","syndication":[]}},{"id":"35674570","content_html":"<p>Triton Tutorial #2: <a href=\"https://sherlock.micro.blog/2024/04/17/triton-tutorial.html\">sherlock.micro.blog</a></p>","summary":"","url":"https://sherlock.micro.blog/2024/04/17/triton-tutorial.html","date_published":"2024-04-17T08:18:47+00:00","author":{"name":"sherlock","url":"https://sherlock.micro.blog/","avatar":"https://cdn.micro.blog/photos/96/https%3A%2F%2Fwww.gravatar.com%2Favatar%2F082d0f794dd75f7c2f7ae4d85c0b7c2e%3Fs%3D96%26d%3Dhttps%253A%252F%252Fmicro.blog%252Fimages%252Fblank_avatar.png","_microblog":{"username":"sherlock"}},"_microblog":{"date_relative":"2024-04-17 08:18","date_timestamp":1713341927,"is_favorite":false,"is_bookmark":false,"is_deletable":false,"is_conversation":false,"is_linkpost":true,"is_mention":false,"note":"","syndication":[]}}]}