What Is This?

This is an HTML+RDFa representation of metadata describing this Web-addressable resource.

Why Is This Important?

The property links (attribute-value pairs) on this page expose a different kind of link, one that enables the following on HTTP networks such as the Web:

  1. Unambiguous identity for entities (a.k.a. strong identifiers)
  2. Implicit binding of an entity and its metadata via strong identifiers
  3. Multiple metadata representations that enable a variety of presentations
  4. High-precision Search and Find queries that use the metadata documents (referenced by entity URI) as the query's Data Source Name

How Do I Discover Alternative Metadata Representations?

This document exposes metadata in the following formats: (X)HTML+RDFa, Turtle, N3, RDF/JSON, and RDF/XML. In the simplest case, view the (X)HTML source of this page and go to the <head/> section, which contains one <link/> tag per format, each carrying relationship (rel) and type properties.
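
The lookup described above can be sketched in Python with the standard library's html.parser. This is a minimal sketch, not the site's own tooling: the class name AlternateLinkFinder, the SAMPLE_PAGE markup, and the example.org URLs are hypothetical and serve only as illustration.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("type"):
            self.alternates.append((attr["type"], attr.get("href")))


# Hypothetical page: two metadata representations plus an unrelated stylesheet.
SAMPLE_PAGE = """
<html><head>
<link rel="alternate" type="text/turtle" href="http://example.org/data.ttl" />
<link rel="alternate" type="application/rdf+xml" href="http://example.org/data.rdf" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head><body></body></html>
"""

finder = AlternateLinkFinder()
finder.feed(SAMPLE_PAGE)
for media_type, href in finder.alternates:
    print(media_type, href)
```

Only links whose rel includes "alternate" and that declare a type are collected, so the stylesheet link is skipped.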

You can also explicitly request a desired metadata representation for a given resource via an HTTP GET request that uses the entity's strong identifier as the call target.
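
One common way to make such a request is HTTP content negotiation through the Accept header. The sketch below only builds the request and does not send it. It assumes the server honors the Accept header, which this page does not confirm; the helper build_metadata_request, the MEDIA_TYPES table, and the example.org URI are hypothetical.

```python
from urllib.request import Request

# Registered media types for some of the formats listed above.
MEDIA_TYPES = {
    "turtle": "text/turtle",
    "n3": "text/n3",
    "rdfxml": "application/rdf+xml",
}


def build_metadata_request(entity_uri, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(entity_uri, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")


# Hypothetical entity URI, used only for illustration.
request = build_metadata_request("http://example.org/id/thing", "turtle")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual fetch.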

How Can I Expose My Web Resources In This Manner?

Simply include the following in the <head/> section of your (static or dynamically generated) (X)HTML page:

<link rel="alternate" title="My Data in RDF Linked Data form"
type="application/rdf+xml"
href="http://linkeddata.uriburner.com/about/id/<this-page-URL>" />

How Is This Related To The Linked Data Meme?

As stated above, the links on this page expose strong identifiers for its primary topic, secondary topics, attributes, and some values. Through implicit association, these links act as conduits to their metadata-bearing documents, which are available in a variety of formats.

[OpenLink Software]

About: PALU MLRD (Feature) by KeremTurgutlu · Pull Request #4 · AnswerDotAI/vllm · GitHub

An Entity of Type : Document, from Data Source : https://github.com/AnswerDotAI/vllm/pull/4/commits/a64559fecbf4cd7d69c9510a0ecf8c38e1635388, within Data Space : dev.restore.ovi.cnr.it:8890


Description
  • This PR implements PALU on top of the existing XFormers(CLA) attention backend decode and prefill kernels.

    Our implementation follows Figure 2 of the paper and implements the paper's MLRD (multi-head low-rank decomposition) variant, which is easier to fit to the existing paged attention kernels. For example, the paged attention kernel is launched with Grid: (num_heads, num_seqs, max_num_partitions), so the first grid dimension (blockIdx.x) selects a single head, which makes it easier to work with one head at a time during up projection.

    The kernel implementations below are responsible only for the (QK^T) @ V portion of the computation; the fused output projection is handled in the model layer.

    query: keeps the original head_size without compression, since it is computed every time.
    key: down projected by the fused Kd_proj at the model layer before caching. Inside the attention kernel it is up projected on the fly and RoPE is applied.
    value: like the key, down projected by the fused Vd_proj at the model layer before caching. It needs no up projection inside the kernel because a fused output projection layer O_proj is used at the model layer.

    1) PALU Paged Attention Decode CUDA Kernel

    Implemented csrc/attention/attention_kernels_palu.cu based on csrc/attention/attention_kernels.cu. Followed implementation details from the docs.

    Only BLOCK_SIZE=32 is supported (this is the paged attention block size, not the CUDA grid!). Making it equal to WARP_SIZE ensures THREAD_GROUP_SIZE=1, in which case each thread processes all the elements of one key token of one head at a given time. A single thread can therefore up project the elements of one key token of a given head. This also keeps the implementation simple and avoids synchronizing across multiple threads during the dot product of the up projection.

    Added initial tests in a notebook which currently fail.
    Fix implementation and pass the tests.
    Add RoPE. Here we modify

    2) PALU Paged Attention Prefill Triton Kernel

    TODO.

    3) Remaining changes required at higher level

    Such as handling paged attention KV cache allocation based on palu_head_size, which can be passed as a config param. Also, other model-related code changes as needed. TODO.
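
The data flow in the description can be illustrated with a small single-head sketch in plain Python. It is not the PR's kernel code: it omits RoPE and paging, uses random weights, and every name in it (k_down, k_up, v_down, v_up, w_o, fused_o, the sizes) is a hypothetical stand-in for the projections named above. It only checks the algebra that lets the value up projection be folded into the output projection, so the kernel can attend directly over the compressed value cache.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_tokens, hidden, head_size, rank = 5, 8, 8, 3  # assumed toy sizes, one head

x = rand_matrix(num_tokens, hidden, rng)   # token hidden states
q = rand_matrix(1, head_size, rng)         # query keeps the full head_size
k_down = rand_matrix(hidden, rank, rng)    # stand-in for Kd_proj
k_up = rand_matrix(rank, head_size, rng)   # key up projection
v_down = rand_matrix(hidden, rank, rng)    # stand-in for Vd_proj
v_up = rand_matrix(rank, head_size, rng)   # value up projection
w_o = rand_matrix(head_size, hidden, rng)  # original output projection

# Model layer: only the compressed keys and values are cached.
k_cache = matmul(x, k_down)
v_cache = matmul(x, v_down)

# Kernel: keys are up projected on the fly before the dot product with q.
k_full = matmul(k_cache, k_up)
scale = 1.0 / math.sqrt(head_size)
scores = [[s * scale for s in row] for row in matmul(q, transpose(k_full))]
probs = softmax_rows(scores)

# Reference path: up project the values, attend, then apply the output projection.
reference = matmul(matmul(probs, matmul(v_cache, v_up)), w_o)

# Fused path: attend over compressed values, then apply v_up @ w_o once.
fused_o = matmul(v_up, w_o)
fused = matmul(matmul(probs, v_cache), fused_o)

max_abs_diff = max(abs(a - b) for ra, rb in zip(reference, fused) for a, b in zip(ra, rb))
print(max_abs_diff)
```

Both paths agree up to floating-point rounding because matrix multiplication is associative. The same folding is not available for the keys in this sketch, since they must be at full head_size before the dot product with the query.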
Title
  • PALU MLRD (Feature) by KeremTurgutlu · Pull Request #4 · AnswerDotAI/vllm · GitHub
container of
  • http://dev.restore.ovi.cnr.it:8890/abo...510a0ecf8c38e1635388#id0x7f1970ecb118
  • http://dev.restore.ovi.cnr.it:8890/abo...510a0ecf8c38e1635388#id0x7f1970bed368
  • http://dev.restore.ovi.cnr.it:8890/abo...510a0ecf8c38e1635388#id0x7f65831868b8
  • http://dev.restore.ovi.cnr.it:8890/abo...510a0ecf8c38e1635388#id0x7f6580a8a258
links to
  • https://github.com/vllm-project/vllm
  • https://github.com/AnswerDotAI
  • https://github.com/AnswerDotAI/vllm/pulls
  • https://github.com/AnswerDotAI/vllm/actions
  • https://github.com/AnswerDotAI/vllm/projects
  • https://github.com/AnswerDotAI/vllm/security
  • https://github.com/AnswerDotAI/vllm/pulse
  • https://github.com/AnswerDotAI/vllm/pull/4
  • https://github.com/AnswerDotAI/vllm
  • https://github.com/AnswerDotAI/vllm/pu...d5a34b0222cef273b7c3a2af62eb747f9d20a
  • https://github.com/AnswerDotAI/vllm/pu...233df5a3f0f660e6eac03d7b1a329262be64f
  • https://github.com/AnswerDotAI/vllm/pu...0c65b3c44d9958afba1a208dcf0dfe92330e8
  • https://github.com/AnswerDotAI/vllm/bl...c9510a0ecf8c38e1635388/CMakeLists.txt
  • https://github.com/AnswerDotAI/vllm/bl...c/attention/attention_kernels_palu.cu
  • https://github.com/KeremTurgutlu
  • https://github.com/AnswerDotAI/vllm/tree/torchao
  • https://github.com/AnswerDotAI/vllm/tree/palu
  • https://github.com/AnswerDotAI/vllm/pull/4/commits
  • https://github.com/AnswerDotAI/vllm/pull/4/checks
  • https://github.com/AnswerDotAI/vllm/pull/4/files
  • https://github.com/AnswerDotAI/vllm/pu...dd2361ba5b5349195a1f8c0fee757e9eb11ee
  • https://github.com/AnswerDotAI/vllm/pu...d89644d88dfaa04e7f7cdb43c1e7a2b1a571a
  • https://github.com/AnswerDotAI/vllm/pu...559fecbf4cd7d69c9510a0ecf8c38e1635388
  • https://github.com/AnswerDotAI/vllm/pu...9fb6a55575bac9eb2e3d6e7e6f74f48044783
  • https://github.com/AnswerDotAI/vllm/pu...f41f4974dcc7911b0f0cadbbf2a3648e69501
  • https://github.com/AnswerDotAI/vllm/pu...1ea94051432c856b887e01d8f782662381438
  • https://github.com/AnswerDotAI/vllm/pu...dd4e94c4911a1f0857ee6b8037dd280f3e294
  • https://github.com/AnswerDotAI/vllm/pu...f61e5a65eb831a209ff01213c87d68b6c15b5
type
  • Document
xhv:alternate
  • https://github.com/AnswerDotAI/vllm/pull/4.diff
  • https://github.com/AnswerDotAI/vllm/pull/4.patch
described by
  • https://github.com/AnswerDotAI/vllm/pull/4/commits/a64559fecbf4cd7d69c9510a0ecf8c38e1635388
Subject
  • https://github.com/AnswerDotAI/vllm/pull/4/commits/a64559fecbf4cd7d69c9510a0ecf8c38e1635388
container of
  • https://github.com/AnswerDotAI/vllm/pull/4/commits/a64559fecbf4cd7d69c9510a0ecf8c38e1635388
primary topic
  • https://github.com/AnswerDotAI/vllm/pull/4/commits/a64559fecbf4cd7d69c9510a0ecf8c38e1635388
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
OpenLink Virtuoso version 07.20.3231, on Linux (x86_64-generic_glibc25-linux-gnu), Single Edition